TAKE YOUR DATA ENGINEERING SKILLS TO THE NEXT LEVEL BY LEARNING HOW TO
UTILIZE SCALA AND FUNCTIONAL PROGRAMMING TO CREATE CONTINUOUS AND
SCHEDULED PIPELINES THAT INGEST, TRANSFORM, AND AGGREGATE DATA
KEY FEATURES
* Transform data into a clean and trusted source of information for
your organization using Scala
* Build streaming and batch-processing pipelines with step-by-step
explanations
* Implement and orchestrate your pipelines by following CI/CD best
practices and test-driven development (TDD)
* Purchase of the print or Kindle book includes a free PDF eBook
BOOK DESCRIPTION
Most data engineers know that performance problems in a distributed
computing environment can easily undermine the overall efficiency and
effectiveness of data engineering tasks. While Python
remains a popular choice for data engineering due to its ease of use,
Scala shines in scenarios where the performance of distributed data
processing is paramount. This book will teach you how to leverage the
Scala programming language on the Spark framework and use the latest
cloud technologies to build continuous and triggered data pipelines.
You’ll do this by setting up a data engineering environment for
local development and scalable distributed cloud deployments using
data engineering best practices, test-driven development, and CI/CD.
You’ll also get to grips with the DataFrame API, the Dataset API, and the
Spark SQL API, and how to use them. Data profiling and quality in Scala will also be
covered, alongside techniques for orchestrating and performance tuning
your end-to-end pipelines to deliver data to your end users. By the
end of this book, you will be able to build streaming and batch data
pipelines using Scala while following software engineering best
practices.
WHAT YOU WILL LEARN
* Set up your development environment to build pipelines in Scala
* Get to grips with polymorphic functions, type parameterization, and
Scala implicits
* Use Spark DataFrames, Datasets, and Spark SQL with Scala
* Read and write data to object stores
* Profile and clean your data using Deequ
* Tune the performance of your data pipelines using Scala
WHO THIS BOOK IS FOR
This book is for data engineers who have experience in working with
data and want to understand how to transform raw data into a clean,
trusted, and valuable source of information for their organization
using Scala and the latest cloud technologies.
Product details
ISBN
9781804614327
Published
2024
Edition
1st edition
Publisher
Packt Publishing
Language
English
Format
Digital book
Author