Course Outline
Introduction, Objectives, and Migration Strategy
- Course goals, alignment with participant profiles, and success criteria
- High-level migration approaches and associated risk considerations
- Configuration of workspaces, repositories, and lab datasets
Day 1 — Migration Fundamentals and Architecture
- Core Lakehouse concepts, Delta Lake overview, and Databricks architecture
- Differences between SMP and MPP architectures and their implications for migration
- Medallion (Bronze to Silver to Gold) design principles and Unity Catalog overview
Day 1 Lab — Translating a Stored Procedure
- Hands-on migration of a sample stored procedure into a notebook
- Mapping temporary tables and cursors to DataFrame transformations
- Validation and comparison against original output
Day 2 — Advanced Delta Lake & Incremental Loading
- ACID transactions, commit logs, versioning, and time travel capabilities
- Auto Loader, MERGE INTO patterns, upserts, and schema evolution
- OPTIMIZE, VACUUM, Z-ORDER, partitioning, and storage tuning techniques
Day 2 Lab — Incremental Ingestion & Optimization
- Implementation of Auto Loader ingestion and MERGE workflows
- Application of OPTIMIZE, Z-ORDER, and VACUUM; validation of results
- Measurement of read and write performance improvements
Day 3 — SQL in Databricks, Performance & Debugging
- Analytical SQL features: window functions, higher-order functions, and JSON/array handling
- Interpreting the Spark UI (DAGs, shuffles, stages, and tasks) and diagnosing bottlenecks
- Query tuning patterns: broadcast joins, hints, caching, and spill reduction
Day 3 Lab — SQL Refactoring & Performance Tuning
- Refactoring a complex SQL process into optimized Spark SQL
- Utilizing Spark UI traces to identify and resolve skew and shuffle issues
- Benchmarking before-and-after results and documenting tuning steps
Day 4 — Tactical PySpark: Replacing Procedural Logic
- Spark execution model: driver, executors, lazy evaluation, and partitioning strategies
- Transforming loops and cursors into vectorized DataFrame operations
- Modularization, UDFs/pandas UDFs, widgets, and reusable libraries
Day 4 Lab — Refactoring Procedural Scripts
- Refactoring a procedural ETL script into modular PySpark notebooks
- Introducing parametrization, unit-style tests, and reusable functions
- Code review and application of best-practice checklists
Day 5 — Orchestration, End-to-End Pipeline & Best Practices
- Databricks Workflows: job design, task dependencies, triggers, and error handling
- Designing incremental Medallion pipelines with quality rules and schema validation
- Integration with Git (GitHub/Azure DevOps), CI, and testing strategies for PySpark logic
Day 5 Lab — Build a Complete End-to-End Pipeline
- Assembling a Bronze to Silver to Gold pipeline orchestrated with Workflows
- Implementing logging, auditing, retries, and automated validations
- Running the full pipeline, validating outputs, and preparing deployment notes
Operationalization, Governance, and Production Readiness
- Best practices for Unity Catalog governance, lineage, and access controls
- Cost management, cluster sizing, autoscaling, and job concurrency patterns
- Deployment checklists, rollback strategies, and runbook creation
Final Review, Knowledge Transfer, and Next Steps
- Participant presentations of migration work and lessons learned
- Gap analysis, recommended follow-up activities, and training materials handoff
- References, further learning paths, and support options
Requirements
- A foundational understanding of data engineering concepts
- Practical experience with SQL and stored procedures (Synapse or SQL Server)
- Familiarity with ETL orchestration concepts (such as ADF or similar tools)
Target Audience
- Technology managers with a data engineering background
- Data engineers seeking to transition procedural OLAP logic to Lakehouse patterns
- Platform engineers tasked with driving Databricks adoption
Duration
- 35 hours