Data Engineering & Interview Grooming Course
COMMUNITY
WEEKEND DATA ENGINEERING / DATA SCIENCE COURSE
BATCH 6
Class Already Started, 2023
Course Overview
This is a complete end-to-end Data Engineering / Data Science course covering
Spark, Hive, SQL, Python, AWS Cloud, Airflow, and Git, along with Guesstimates and
Problem Solving. The course is particularly helpful for freshers, college students, or
anyone who wants to transition into the engineering, science, and analytics field. It is
equally useful for anyone who wants to upskill or brush up their knowledge, since it is
comprehensive yet short in duration.
A recording of each live session, with lifetime access, will also be provided to you.
However, we urge you to attend the live lectures for better understanding.
SPARK
Spark Overview
Why Spark is used everywhere instead of MapReduce
Advantages & Disadvantages of Spark
Spark Components
Spark Architecture
Spark RDDs and DataFrames in detail
Different File Formats used in Spark
Spark Operations (Transformations & Actions)
Shuffling in Spark
Parallelism in Spark
Spark Built-in Functions
Spark SQL in detail
Spark Joins
Spark Optimization techniques
Shared Variables in Spark
Spark Computations
Real-time problem and solution
Spark Assignment
HIVE
AWS CLOUD
Amazon S3 Overview
Different S3 buckets overview
S3 Lifecycle
Real-time use case of S3
EMR
Autoscaling & Cooldown
Real-time use of EMR
Amazon Athena Overview
Tables & View Creation
MSCK REPAIR
Glue
Redshift
Practice Problems
AIRFLOW
Airflow Overview
Why Airflow
What is a DAG
DAG Creation
Operators & Sensors in Airflow
Integrating Spark jobs with Airflow
Real-time problem statement
SQL
● Introduction to SQL
● What databases and SQL are, and how they can be used together to
dive in
● How to store and modify the data in a database:
● DDL Commands: CREATE, ALTER, DROP, TRUNCATE, etc.
● Data types: VARCHAR, INT, DECIMAL, DATE, BOOLEAN, etc.
● Constraints: PRIMARY KEY, UNIQUE KEY, NOT NULL, etc.
● DML Operations: INSERT, UPDATE, DELETE, etc.
● How to retrieve data: SELECT Statement
● Basic SELECT clause operations: DISTINCT, LIMIT, ORDER BY
● The filter (WHERE) clause: logical operators, comparison
operators, advanced filters
● Aggregation and Advanced Aggregation: GROUP BY, PARTITION BY,
ROWS BETWEEN clause, rolling calculations, filtering with the HAVING
clause
● SQL JOINS: INNER, LEFT, RIGHT, FULL OUTER, SELF, CROSS
● Set Operations: UNION, UNION ALL, MINUS, INTERSECT
● Calculated Columns and SQL Functions: CASE WHEN, date
functions, string functions, data type conversion functions, etc.
● Queries within queries: Subqueries and CTE (With Clause)
● Window Analytical Functions: RANK, ROW_NUMBER,
DENSE_RANK, LEAD/LAG, NTILE
● Performance tuning: Clustered and non-clustered indexes, best
practices for SQL optimization
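To give a flavour of the CTE and window-function topics listed above, here is a minimal sketch using Python's built-in sqlite3 module. The sales table, its columns, and the sample rows are hypothetical, and the sketch assumes the bundled SQLite is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

# In-memory database with a hypothetical "sales" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT NOT NULL, rep TEXT NOT NULL, amount INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Asha", 300),
        ("East", "Bala", 500),
        ("East", "Chen", 500),
        ("West", "Dina", 200),
        ("West", "Evan", 400),
    ],
)

# A CTE (WITH clause) that ranks reps inside each region.
# RANK leaves gaps after ties; ROW_NUMBER always counts 1, 2, 3, ...
query = """
WITH ranked AS (
    SELECT region, rep, amount,
           RANK()       OVER (PARTITION BY region ORDER BY amount DESC)      AS rnk,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC, rep) AS rn
    FROM sales
)
SELECT region, rep, amount, rnk, rn
FROM ranked
ORDER BY region, rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In the East region the two reps tied at 500 both get rank 1 and the next rep gets rank 3, while ROW_NUMBER breaks the tie by name.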
PYTHON
● Introduction to Python
● Variables, keywords, indentation, quotes
● Operators: comparison, arithmetic, and logical
● Loops
● pass, break, and continue
● String (type casting, string formatting, slicing, string methods)
● List (type casting, string formatting, slicing, string methods)
● Set (type casting, different operations)
● map (use cases)
● lambda (uses of lambda functions)
● NumPy, pandas (Python libraries in detail)
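As a small taste of the type casting, map, and lambda topics listed above, the sketch below uses only the Python standard library. The sample values and variable names are hypothetical and chosen purely for illustration.

```python
# Hypothetical sample data: order amounts read in as strings.
raw_amounts = ["120", "45", "300"]

# map applies a function to every element; here int() does the type casting.
amounts = list(map(int, raw_amounts))

# A lambda is a small anonymous function, handy as a one-off argument.
doubled = list(map(lambda x: x * 2, amounts))

# Lambdas are also commonly used as sort keys.
orders = [("pen", 45), ("desk", 300), ("lamp", 120)]
by_amount = sorted(orders, key=lambda order: order[1])

print(amounts)
print(doubled)
print(by_amount)
```

map returns a lazy iterator in Python 3, which is why each call is wrapped in list() before printing.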
GUESSTIMATES
● What is a guesstimate?
● Key points about answering guesstimate questions
● Steps for solving a guesstimate question
● Guesstimate interview questions and answer examples
● Conclusion
PROBLEM SOLVING
WHAT ELSE?
Regards,
SUBHADIP DAS
7+ years of experience in the industry with top product companies
THANK YOU