Buy Data Engineering for Large Foundation Models: A Handbook by Jun Yu and Chang Wen Chen as a book at 1Advd.ch

Because of its size, this item counts as 3 items for shipping!

Overview

Delivery status:	Pre-announcement
Publication:	ANNOUNCED (January 2027)
Genre:	Computing / Computer Science
	AI Ops / Artificial Intelligence / Data Engineering / Data Science / Large Language Models (LLMs) / LLM Data Engineering / LLM Pre-training / Machine Learning / Multimodal Alignment / Multimodal Data Processing / Natural Language Processing (NLP) / Natural Languages and Machine Translation / Pre-training Data / RAG / Retrieval-Augmented Generation / Synthetic Data
ISBN:	9789819228492
EAN code:	9789819228492
Publisher:	Springer EN
Binding:	Hardcover
Language:	English
Dimensions:	H 235 mm / W 155 mm / D
Illustration:	Approx. 2000 p.
Additional info:	EUDR exemption - product or manufacturing materials placed on the market prior to 31.12.2025.
Rating:	No rating possible before publication.

Contents:

Data quality has become a decisive foundation for large foundation models, shaping their capability, reliability, alignment, and real-world applicability. Data Engineering for Large Foundation Models: A Handbook provides a systematic and practice-oriented guide to data engineering for foundation models. Moving beyond a narrow focus on large language models, the book covers the data lifecycle behind language models, vision-language models, multimodal understanding systems, text-to-image and text-to-video generative models, reasoning models, agentic systems, and domain-specific AI applications.

The book presents a full-stack framework for building high-quality data pipelines for foundation-model development. It covers large-scale pre-training data engineering, including data sourcing, acquisition, cleaning, deduplication, decontamination, tokenization, serialization, efficient loading, and quality evaluation. It also addresses multimodal data engineering for image-text, document, video, and audio data, as well as post-training and alignment data construction, including SFT, preference data, RLHF, Chain-of-Thought reasoning data, tool-use data, agent memory, and multi-turn interaction data.

The book further examines data-centric AI systems, including synthetic data factories, knowledge distillation, enterprise-grade RAG and multimodal RAG pipelines, online feedback loops, knowledge updating, DataOps platforms, data governance, privacy protection, federated learning, and compliance-aware data engineering. Through end-to-end projects and reproducible system designs, readers gain hands-on experience with distributed pre-training data pipelines, domain-specific SFT datasets, multimodal instruction data factories, reasoning data flywheels, agent tool-use data factories, enterprise DataOps platforms, privacy-preserving pipelines, open-source model reproduction, and text-to-video training data pipelines. Using modern tools such as Ray, Spark, Dask, Parquet, WebDataset, vector databases, DVC, MLflow, and Airflow, this handbook equips data engineers, MLOps and DataOps professionals, AI researchers, and technical product teams to build reliable, scalable, and continuously improving foundation-model systems.
