Course Outline

  1. Scala Primer

    • An introductory overview of Scala
    • Labs: Getting Started with Scala
  2. Spark Fundamentals

    • Historical background
    • Relationship between Spark and Hadoop
    • Core concepts and architecture
    • The Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
    • Labs: Installing and Running Spark
  3. Introduction to Spark

    • Executing Spark in local mode
    • Navigating the Spark Web UI
    • Utilizing the Spark Shell
    • Data analysis - Part 1
    • Examining RDDs
    • Labs: Exploring the Spark Shell
  4. Resilient Distributed Datasets (RDDs)

    • RDD concepts
    • Partitions
    • RDD operations and transformations
    • Different RDD types
    • Key-Value pair RDDs
    • Implementing MapReduce patterns on RDDs
    • Caching and persistence strategies
    • Labs: Creating and Inspecting RDDs; Caching RDDs
  5. Spark API Programming

    • Introduction to the Spark API and RDD API
    • Submitting your first Spark program
    • Debugging and logging techniques
    • Configuration properties
    • Labs: Programming with the Spark API; Submitting Jobs
  6. Spark SQL

    • SQL capabilities within Spark
    • Working with DataFrames
    • Defining tables and importing datasets
    • Querying DataFrames using SQL
    • Storage formats: JSON and Parquet
    • Labs: Creating and Querying DataFrames; Evaluating Data Formats
  7. MLlib

    • Introduction to MLlib
    • MLlib algorithms
    • Labs: Developing MLlib Applications
  8. GraphX

    • Overview of the GraphX library
    • GraphX APIs
    • Labs: Processing Graph Data with Spark
  9. Spark Streaming

    • Streaming overview
    • Evaluating streaming platforms
    • Streaming operations
    • Sliding window operations
    • Labs: Writing Spark Streaming Applications
  10. Spark and Hadoop Integration

    • Introduction to Hadoop (HDFS and YARN)
    • Hadoop and Spark architecture
    • Running Spark on Hadoop YARN
    • Processing HDFS files using Spark
  11. Spark Performance and Tuning

    • Broadcast variables
    • Accumulators
    • Memory management and caching
  12. Spark Operations

    • Deploying Spark in production environments
    • Sample deployment templates
    • Configuration best practices
    • Monitoring strategies
    • Troubleshooting techniques

Requirements

PRE-REQUISITES

Proficiency in at least one of the following languages: Java, Scala, or Python (the labs use Scala and Python).
Basic knowledge of the Linux development environment, including command-line navigation and file editing with vi or nano.

Duration: 21 Hours
