Become A Big Data Engineer 1
Course Instructor:
A.K.M. Alfaz Uddin
Enterprise Data Engineering Lead Engineer,
Banglalink Digital Communications Ltd.
Former Lead Engineer, bKash Limited.
Former Senior Software Engineer, IMpulse (BD) Ltd
Former Specialist, BI/DW & CLM Systems, Robi Axiata Limited
www.aiquest.org
Module 1: Introduction to Data Engineering: 1 hour
• What is Data? Importance of data.
• Introduction to Data Engineering
• Importance of Data-Driven Decision Making
• Components of Big Data
• Big Data Tools
• Data Engineering vs. Data Science vs. Data Analysis
• Skills required for Data Engineers
• Daily Roles and Responsibilities of a Data Engineer
• Challenges and Opportunities in Data Engineering
• Data Engineering Lifecycle
• Key Concepts: Big Data, Databases, Data Warehousing
• Question & Answer Session
• Aggregating Data
▪ Aggregate functions: SUM, AVG, COUNT, MIN, MAX.
▪ Grouping data with GROUP BY clause.
▪ Filtering grouped data with HAVING clause.
• Subqueries
• Modifying Data
▪ INSERT.
▪ UPDATE.
▪ DELETE.
▪ MERGE.
• Working with Views
▪ Creating and managing views.
▪ Advantages of using views.
• Introduction to PL/pgSQL
▪ Overview of PL/pgSQL as the procedural language for PostgreSQL.
▪ Importance of stored procedures, functions, and triggers.
• PL/pgSQL Syntax Basics
▪ Structure of PL/pgSQL blocks.
▪ Declaration of variables and data types.
▪ Comments in PL/pgSQL code.
• Flow Control Statements
▪ Conditional statements
▪ Looping
• Creating and Calling Functions
▪ Syntax for creating user-defined functions in PL/pgSQL.
▪ Defining function parameters and return types.
▪ Calling functions from SQL queries or other PL/pgSQL code.
• Stored Procedures
▪ Creating stored procedures in PL/pgSQL.
▪ Difference between functions and stored procedures.
▪ Advantages of using stored procedures for application logic.
• Normalization (1NF, 2NF, 3NF & BCNF)
• Indexes and Performance Optimization
▪ Importance of indexes in database performance.
▪ Creating and managing indexes.
▪ Query optimization techniques.
• Question & Answer Session
• Assignment
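As a rough illustration of the aggregation topics listed in this module (aggregate functions, GROUP BY, and HAVING), here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server; the `sales` table and its rows are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("Dhaka", 120.0), ("Dhaka", 80.0), ("Sylhet", 50.0), ("Khulna", 200.0)],
)

# GROUP BY forms one group per region; HAVING filters groups after aggregation.
query = """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) >= 100
    ORDER BY region
"""
rows = conn.execute(query).fetchall()
for region, orders, total in rows:
    print(region, orders, total)
conn.close()
```

The same SELECT statement should also work in PostgreSQL, although the connection code would differ (for example, psycopg2 instead of sqlite3).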
Module 3: Python for Data Engineering: 8 hours
• Python Basics
▪ Introduction to Python and its relevance in Data Engineering.
▪ Setting up Python development environment.
▪ Basic syntax, variables, data types, and operators.
• Data Structures in Python
▪ Lists, tuples, dictionaries, and sets
• Control Flow Structures and Functions
▪ Conditional statements
▪ Looping
▪ Writing and calling functions.
▪ Function parameters and return values.
• File Handling and Input/Output
▪ Reading from and writing to files.
• Working with Libraries
▪ Introduction to Python standard libraries.
▪ Exploring Python libraries: NumPy, Pandas, Polars, etc.
▪ Installing and managing libraries using pip.
• Data Manipulation with Pandas
▪ Introduction to Pandas library.
▪ DataFrame basics.
▪ Loading and manipulating data using DataFrames.
▪ Data cleaning, filtering, and transformation.
▪ Handling missing data.
• NumPy for Numerical Computing
▪ Basics of NumPy arrays.
▪ Mathematical operations with NumPy.
• Working with SQL Databases in Python
▪ Connecting to PostgreSQL using SQLAlchemy/psycopg2.
▪ Executing SQL queries from Python.
• Question & Answer Session
• Assignment
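To give a feel for how several of this module's topics fit together (functions, dictionaries, file handling, and skipping missing values), here is a small sketch that uses only the Python standard library. The file name, the student names, and the `average` helper are hypothetical and chosen only for illustration.

```python
import csv
import os
import tempfile


def average(values):
    """Return the mean of a list of numbers, or None for an empty list."""
    return sum(values) / len(values) if values else None


# Write a small CSV file (hypothetical data; one score is missing).
path = os.path.join(tempfile.mkdtemp(), "scores.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "score"])
    writer.writerows([["Ayesha", "82"], ["Rahim", ""], ["Karim", "74"]])

# Read it back into a dictionary, skipping rows whose score is missing.
scores = {}
with open(path, newline="") as f:
    for row in csv.DictReader(f):
        if row["score"]:
            scores[row["name"]] = int(row["score"])

mean_score = average(list(scores.values()))
print(scores, mean_score)
```

Pandas covers the same load, clean, and aggregate steps with less code; the standard-library version is shown here only because it runs without installing anything.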
Module 4: Data Warehousing & ETL: 2 hours
▪ Understanding Hadoop Distributed File System (HDFS) for distributed storage.
▪ Introduction to Hadoop MapReduce.
• Apache Spark
▪ Introduction to Apache Spark.
▪ Hadoop vs. Apache Spark.
▪ Basics of Spark programming using Python (PySpark).
• Databricks
▪ Introduction to Databricks
▪ Databricks architecture
▪ Delta Lake.
▪ Creating a Databricks account and setting up the Community Edition.
▪ Magic Commands in Databricks
• Question & Answer Session
• Assignment
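The MapReduce model introduced in this module can be sketched in plain Python as a word count. This is a single-process illustration of the map, shuffle, and reduce phases, not Hadoop or Spark code; the function names and the input lines are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


lines = ["big data tools", "big data engineering"]  # hypothetical input
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(word_counts)
```

In a real cluster the map and reduce steps run in parallel on different nodes, and the framework performs the shuffle over the network; here everything runs in one process to keep the idea visible.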
▪ Overview of GCP and understanding GCP services
• Introduction to Google BigQuery
▪ What is BigQuery and its key features?
▪ Exploring the BigQuery UI and running queries
• Data Loading & Manipulation
▪ Loading data into BigQuery from various sources.
▪ Writing basic SQL queries in BigQuery
▪ Filtering, sorting, and aggregating data
• Question & Answer Session
• Assignment
Contact Details:
Mr. Sohan Khan
www.aiquest.org