In this course, you will learn the basics of the Scala programming
language and how Apache Spark operates on a cluster. You will set up
discretized streams with Spark Streaming and transform them as data is
received, analyze streaming data over sliding windows of time, and
maintain stateful information across streams of data. You will connect
Spark Streaming to highly scalable sources of data, including Kafka,
Flume, and Kinesis; write streams of data in real time to NoSQL
databases such as Cassandra; run SQL queries on streamed data in real
time; and train machine learning models on streaming data, using them
to make predictions that keep improving over time. Finally, you will
package, deploy, and run self-contained Spark Streaming code on a real
Hadoop cluster using Amazon Elastic MapReduce.
This course is very hands-on, filled with achievable activities and
exercises to reinforce your learning. By the end of this course, you
will be confidently creating Spark Streaming scripts in Scala and be
prepared to tackle massive streams of data in a whole new way. You
will be surprised at how easy Spark Streaming makes it!
All the code and supporting files for this course are available at
https://github.com/packtpublishing/streaming-big-data-with-spark-streaming-scala-and-spark-3-
Product details
ISBN
9781787123915
Published
2023
Edition
1st edition
Publisher
Packt Publishing
Language
English
Format
Digital book
Author