Watch the videos below
Azure Data Factory
- Azure Data Factory by WafaStudies [Mandatory]
- Azure Data Factory by Adam Marczak [Mandatory]
Azure Synapse
- Azure Synapse Analytics by WafaStudies [Optional]
Azure Databricks
- Azure Databricks by WafaStudies [Mandatory]
- Azure Databricks by Adam Marczak [Mandatory]
PySpark
- PySpark wiki [Mandatory]
- PySpark Notes [Interview Questions]
SQL
- SQL by kudvenkat [Mandatory]
- SQL Practice HackerRank [Mandatory]
- SQL Practice Naukri [Mandatory]
- SQL Practice DataLemur [Mandatory]
Python
- Python by Corey Schafer [Mandatory]
Git
- Git Essentials by Corey Schafer [Preferred]
- Git Hindi by CodeWithHarry [Optional]
Azure Fundamentals
- Azure Data Fundamentals [Preferred]
- Making Apache Spark Better with Delta Lake [Presentation slides here]
- Understanding Query Plans and Spark UIs - Xiao Li Databricks [Presentation slides here]
- Optimizing Delta Parquet Data Lakes for Apache Spark - Matthew Powers [Presentation slides here]
- Everyday I'm Shuffling - Tips for Writing Better Apache Spark Programs [Presentation slides here]
- Optimizing Apache Spark SQL Joins: Spark Summit East talk by Vida Ha [Presentation slides here]
- Apache Spark Core - Deep Dive - Proper Optimization Daniel Tomes Databricks [Presentation slides here]
- The Parquet Format and Performance Optimization Opportunities Boudewijn Braams [Presentation slides here]
- Easy, Scalable, Fault Tolerant Stream Processing with Structured Streaming in Apache Spark [Presentation slides here]
- Spark Architecture, Alexey Grishchenko [Presentation slides here]
- Deeper Understanding of Spark Internals - Aaron Davidson [Presentation slides here]
- Advanced Apache Spark Training - Sameer Farooqui [Presentation slides here]
- Top 5 Mistakes When Writing Spark Applications [Presentation slides here]
- Spark SQL: A Compiler from Queries to RDDs with Sameer Agarwal [Presentation slides here]
- Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenchen Fan [Presentation slides here]
- A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai [Presentation slides here]
- Spark + Parquet In Depth: Spark Summit East talk by: Emily Curtin and Robbie Strickland [Presentation slides here]
- Change Data Feed in Delta [Presentation slides here]
- Deep Dive into Delta Lake [Presentation slides here]
- Diving into Delta Lake: Unpacking the Transaction Log [Presentation slides here]
- Delta Lake 2.0 Overview [Presentation slides here]
- Accelerating Data Ingestion with Databricks Autoloader Simon [Presentation slides here]
- Tuning and Debugging in Apache Spark Patrick Wendell [Presentation slides here]
- Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia [Presentation slides here]
- Understanding the Performance of Spark Applications - Patrick Wendell [Presentation slides here]
- SQL, DataFrames, Datasets And Streaming - by Michael Armbrust [Presentation slides here]
- Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das [Presentation slides here]
- Designing ETL Pipelines with Structured Streaming and Delta Lake How to Architect Things Right [Presentation slides here]
- Deep Dive: Apache Spark Memory Management [Presentation slides here]
WIP
- Azure IoT Hub Part 1
- Azure IoT Hub Part 2
- Modern Industrial IoT Analytics on Azure - Part 1
- Modern Industrial IoT Analytics on Azure - Part 2
- Modern Industrial IoT Analytics on Azure - Part 3
- Azure Stream Analytics
- Azure Stream Analytics by Adam
- Azure Stream Analytics by Pragmatics
- Exam DP-203: Data Engineering on Microsoft Azure [Preferred] [Questions]
- Databricks Certified Data Engineer Associate [Preferred] [Questions]
- Databricks Certified Data Engineer Professional [Preferred] [Questions]
- Exam DP-900: Microsoft Azure Data Fundamentals [Optional]
- Delta Lake Essentials [Preferred]
- Delta Lake blogs [Preferred]
- Azure Free Account [Use a new credit card]
- Databricks Community [Choose "Get started with Community Edition"]