Data Engineering Bootcamp

DATA ENGINEERING COURSE

MODULE - 1 PYTHON PROGRAMMING

● Python Fundamentals
● Why Python and how it is different from the R programming language
● Variables, Identifiers, and Keywords in Python
● Data Structures in Python
● Strings, Arrays, Lists, Tuples, Sets, and Dictionaries
● Python Conditionals and Loops
● If, Nested If, Indentations
● Loops in Python
● Basic Operations and Operators in Python
● Operators in Python
● OOP Concepts
● Python Functions and Classes
● Functions and their types
● Classes in Python
● Type Conversion
● Lambda Functions
● Data Wrangling using Numpy
● Numpy for Data Engineers
● Data Wrangling using Pandas
● Pandas for Data Engineers
● Python for Visualisation
● Matplotlib
● Seaborn
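
As a small illustration of two topics in this module, type conversion and lambda functions, here is a minimal sketch. The sample records and field names (`order_id`, `amount`) are hypothetical and only serve the example.

```python
# Hypothetical sample records: every field arrives as a string,
# as it commonly does when data is read from a CSV file.
raw_rows = [
    {"order_id": "101", "amount": "19.99"},
    {"order_id": "102", "amount": "5.50"},
    {"order_id": "103", "amount": "42.00"},
]

# Type conversion: cast each string field to the type later steps expect.
typed_rows = [
    {"order_id": int(row["order_id"]), "amount": float(row["amount"])}
    for row in raw_rows
]

# Lambda function as a sort key: order rows by amount, largest first.
by_amount_desc = sorted(typed_rows, key=lambda row: row["amount"], reverse=True)

# Lambda function with filter(): keep only orders of 10.00 or more.
large_orders = list(filter(lambda row: row["amount"] >= 10.0, typed_rows))

print([row["order_id"] for row in by_amount_desc])  # [103, 101, 102]
print(len(large_orders))                            # 2
```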

MODULE - 2 INTRODUCTION TO SQL

● Introduction to SQL
● Introduction to Databases
● What is a Database
● Introduction to MySQL and NoSQL
● DDL vs. DML vs. DCL vs. TCL
● Datatypes in SQL
● Basics of SQL
● Basic SQL statements (SELECT, DELETE and UPDATE)
● How to convert data into tables
● COMMIT and ROLLBACK statements
● Filtering Data using SQL
● Filter Data using the WHERE and ORDER BY Clause
● Usage of Filtering Operators – IN, NOT IN, IS NULL, BETWEEN
● Regular Expression for Filtering
● Functions in Database
● Basics of Function
● Boolean Expressions and Concatenation
● String Function
● Grouping Function
● Introduction to SQL
● Grouping Data and Computing Aggregates
● Introduction to Grouping
● Using GROUP BY & HAVING
● Subqueries and Nested queries in SQL
● Single-Row, Multiple-Row Subqueries
● Subqueries with ANY and ALL Operators
● Conditional Expressions using CASE Clause
● Correlated Subqueries
● Window Functions in SQL
● Intro to window functions
● Basic windowing syntax
● The usual suspects: SUM, COUNT, and AVG
● ROW_NUMBER(), RANK(), and DENSE_RANK()
● NTILE
● LAG and LEAD
● Defining a window alias
● Advanced windowing techniques
● Displaying Data from Multiple tables
● Introduction to Joins and its types
● Using UNION, UNION ALL, and EXCEPT Clause
● Views, Sequences, and Indexes in SQL

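
The set operators covered in this module can be tried without a database server. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL; the tables `online_customers` and `store_customers` and their contents are hypothetical, and support for individual operators may differ between database engines.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE online_customers (name TEXT);
    CREATE TABLE store_customers (name TEXT);
    INSERT INTO online_customers VALUES ('Asha'), ('Ben'), ('Chen');
    INSERT INTO store_customers VALUES ('Ben'), ('Chen'), ('Dana');
""")

def run(query):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(query)]

# UNION combines both result sets and removes duplicates.
union_rows = run(
    "SELECT name FROM online_customers UNION "
    "SELECT name FROM store_customers ORDER BY name"
)

# UNION ALL combines both result sets and keeps duplicates.
union_all_rows = run(
    "SELECT name FROM online_customers UNION ALL "
    "SELECT name FROM store_customers ORDER BY name"
)

# EXCEPT returns rows from the first query that are absent from the second.
except_rows = run(
    "SELECT name FROM online_customers EXCEPT "
    "SELECT name FROM store_customers ORDER BY name"
)

print(union_rows)      # ['Asha', 'Ben', 'Chen', 'Dana']
print(union_all_rows)  # ['Asha', 'Ben', 'Ben', 'Chen', 'Chen', 'Dana']
print(except_rows)     # ['Asha']
```
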
MODULE - 3 BIG DATA WITH HADOOP AND SPARK

● Big Data with Hadoop


● Introduction to Big Data and Hadoop
● Hadoop Architecture, Distributed Storage (HDFS)
● Data Ingestion into Big Data Systems and ETL
● Distributed Processing: MapReduce Framework and Pig
● Apache Hive
● NoSQL Database
● Big Data with PySpark
● Introduction to PySpark
● Resilient Distributed Datasets
● PySpark UDF
● Broadcast and Accumulator
● PySpark StorageLevel
● DataFrames and Transformations
● Data Processing with Spark DataFrames
● Sorting Technique
● PySpark RDD
● Broadcast & Accumulator
● PySpark SparkFiles
● PySpark StorageLevel
● PySpark Profiler
● PySpark StatusTracker
● PySpark Serializer
● Dataframes and Spark SQL
● Spark SQL
● PySpark SQL (Running SQL queries on Spark)
● Creating DataFrames
● Transforming and Querying DataFrames
● Saving DataFrame
● DataFrames and RDDs
● Comparing Spark SQL, Impala, and Hive-on-Spark
● Big Data with Spark
● Apache Spark Next Generation
● Spark Core Processing
● Spark SQL – Processing Data Frames
● Stream Processing Frameworks and Spark Streaming
● RDD Operations
● RDD Persistence Overview
● Key-value based RDDs
● Comparing Spark SQL, Impala, Hive on Spark
● Deploying Spark SQL
● Creating the SparkContext
● Building a Spark Application using PySpark
● The Spark Application Web UI
● Configuring Spark Properties
● Running Spark on Cluster
● RDD Partitions
● Executing Parallel Operations
● Stages and Tasks
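
The MapReduce model listed in this module can be sketched in plain Python before moving to a cluster. The example below is a single-process illustration only, not Hadoop or Spark code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the sample documents are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: each string stands in for one input split.
documents = ["spark and hadoop", "hadoop and hive", "spark streaming"]

def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

# Run the three phases in order on the sample input.
mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_counts["spark"], word_counts["hive"])  # 2 1
```

In a real cluster the map and reduce calls would run in parallel on different nodes, and the shuffle would move data across the network; this sketch only shows the data flow between the phases.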

MODULE - 4 CLOUD ENGINEERING - AMAZON WEB SERVICES (AWS)

● How to build data pipelines in the cloud


● AWS Glue, AWS Step Functions, etc.
● Hands-on with various services
● AWS EC2, AWS S3, etc.
● Introduction to AWS
● AWS Overview: History and Evolution of AWS
● Knowledge Check: Overview of AWS Products and Services
● How to choose and configure a service
● Ex: App service configuration within AWS
● AWS EC2 – how to choose the machine based on the requirement?
● How to build a data lake using AWS S3 / Blob Storage?
● Best Practices
● Folder naming conventions etc.
● Cloud Data Warehouse: AWS Redshift
● Back-end architecture
● How to create & work with objects
● Databases
● Clone
● Time travel
● Un-drop
● Keys
● Schema
● Access Control - Important
● Cloud Engineering - Amazon Web Services (AWS)
● Best Practices of Cloud Platform
● What are the best practices?
● Integration using Cloud engineering services
● How to establish an API to the data warehouse (DW)
● Performance considerations
● Working with AWS Redshift and Kinesis as a Database

MODULE - 5 ORCHESTRATION TOOL - AIRFLOW

● Orchestration Tool - Airflow


● Introduction to Airflow
● Apache Airflow
● Installation on Cloud
● Usage
● Connectivity
● Troubleshooting
● Upgrade
● End-to-End data pipeline using Airflow
