Bigdata Engineer Complete Syllabus: Presented by

The document provides an overview of a BigData Engineer Complete Syllabus that covers topics related to Hadoop, AWS, Azure, Databricks, and machine learning. The course includes 14 topics that will be covered over approximately 60 hours of online training on weekends. Topic areas include Python fundamentals, Hadoop, Hive, Spark, Kafka, HBase, Airflow, AWS, Azure, Databricks, statistics, and machine learning. Hands-on practice is a core part of the training with materials like practice Hadoop clusters and recorded video lessons provided.

Uploaded by

Chepuri Sravan Kumar


BigData Engineer Complete Syllabus

HADOOP | AWS | AZURE | DATABRICKS | MACHINE LEARNING

Presented By

Contact us @ +91 9715 010 010

Our Policy

▪ No Fast Track (30% Theory + 70% Hands-on)
▪ 100% Refund if you are not satisfied
▪ Interactive and concept-based
▪ Latest version upgrades (Hadoop 3.x & Spark 3.x)
▪ Private WhatsApp Group for your queries

Session Details

▪ Course Duration : 60 Hours (approx.)
▪ Mode of Training : Online
▪ Programming : Python 3.x
▪ Session Timing : Weekend (Saturday and Sunday)
▪ Per Class : 3 Hours

Course Materials

▪ Practice Hadoop VM Cluster (with Latest Version)
▪ Recorded Videos
▪ Practice Materials
▪ Online eLibraries
▪ Sample Interview Questions with Answers
Course Overview

Topic 1 – Python3 Fundamentals
Topic 2 – Hadoop 3x
Topic 3 – Hive Query Language
Topic 4 – SQOOP (Recording)
Topic 5 – Spark (RDD, DF, SQL & ML)
Topic 6 – Kafka Streaming
Topic 7 – HBase (NoSQL)
Topic 8 – Airflow Scheduler
Topic 9 – AWS (Amazon Web Services)
Topic 10 – Azure Services
Topic 11 – Databricks Cluster
Topic 12 – Statistics Fundamentals
Topic 13 – Machine Learning
Topic 14 – Certification & Interview Tips

Topic - 1
Python Fundamentals Hands-on

▪ Python 3 Installation
▪ Variables
▪ Data Types (Standard, Int, String, Float, Char, Boolean)
▪ Collections (List, Tuple, Dictionary, Set)
▪ Classes (Objects, Method, Inheritance)
▪ Functions (Lambda, Built-in Functions, UDF)
▪ Control Statements (if else, elif, for, while)
▪ String Slicing
▪ NumPy
▪ Pandas
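As a rough illustration of the Topic 1 items (collections, lambda functions, user-defined functions, control statements and string slicing), here is a minimal sketch. All names and sample values are made up for the example and are not taken from the course material.

```python
# Illustrative only: every name and value below is a made-up sample.

# Collections
languages = ["Python", "Scala", "Java"]   # list: ordered, mutable
version = (3, 10)                         # tuple: ordered, immutable
ports = {"hive": 10000, "kafka": 9092}    # dictionary: key -> value
tools = {"spark", "hive", "spark"}        # set: duplicates collapse

# Lambda and a user-defined function (UDF)
square = lambda x: x * x


def word_lengths(words):
    """Return a dictionary mapping each word to its length."""
    return {w: len(w) for w in words}


# Control statements: if / elif / else
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


# String slicing
text = "BigData Engineer"
first_word = text[:7]        # characters at index 0..6
reversed_text = text[::-1]   # a step of -1 walks the string backwards

print(first_word, square(4), classify(-2), sorted(tools))
```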
Topic - 2
Hadoop Introduction

▪ Brief Introduction to Big Data
▪ Why Bigdata needed now?
▪ 4V's of Bigdata
▪ Google Concepts
▪ Hadoop VS Google Architecture
▪ History of Databases
▪ History of Hadoop & Ecosystems
▪ Hadoop 1x vs 2x vs 3x
▪ Hadoop Layers
▪ Hadoop Daemons:
▪ Name Node
▪ Secondary Name Node
▪ Data Node
▪ Resource Manager / Job Tracker
▪ Node Manager / Task Tracker
▪ Heart Beat
▪ Block Report
▪ High Availability (HA)
▪ Replication versus Erasure Coding
▪ Block size
▪ Input Split
▪ Special File format
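The block size and replication-versus-erasure-coding items come down to simple arithmetic, sketched below. The figures are assumptions chosen for illustration (a 128 MB block size, a replication factor of 3, and a Reed-Solomon layout with 6 data and 3 parity units). The helper names are hypothetical, and the estimate ignores padding and metadata.

```python
import math

# Assumed values for illustration; a real cluster may be configured differently.
BLOCK_SIZE_MB = 128          # assumed HDFS block size
REPLICATION = 3              # assumed replication factor
EC_DATA, EC_PARITY = 6, 3    # assumed Reed-Solomon data/parity units


def block_count(file_mb, block_mb=BLOCK_SIZE_MB):
    """Number of blocks a file of file_mb megabytes is split into."""
    return math.ceil(file_mb / block_mb)


def replicated_storage_mb(file_mb, replication=REPLICATION):
    """Raw storage used when every block is stored `replication` times."""
    return file_mb * replication


def erasure_coded_storage_mb(file_mb, data=EC_DATA, parity=EC_PARITY):
    """Approximate raw storage used with data + parity units."""
    return file_mb * (data + parity) / data


file_mb = 1024  # a 1 GB sample file
print(block_count(file_mb))               # 8
print(replicated_storage_mb(file_mb))     # 3072
print(erasure_coded_storage_mb(file_mb))  # 1536.0
```

Under these assumptions, replication triples the raw footprint, while the erasure-coded layout adds only half of the original size.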
Topic - 2
BigData Hadoop & YARN (cont.)

▪ MR1 VS MR2
▪ Mapper
▪ Reducer
▪ Combiner
▪ Hadoop Commands Hands-on
▪ Container
▪ YARN
▪ Application Master
▪ Hadoop Job Opportunities
Topic - 3
Hive Introduction

▪ Introduction to Hive
▪ Hive Architecture
▪ Hive Meta Store
▪ Hive Server 1 vs 2
▪ Beeline VS Hive CLI
▪ Thrift Server
▪ Create Table with ORC
▪ Create Table with Parquet
▪ Create Table using Avro
▪ Managed VS External Table
▪ Connect Hive via Beeline
▪ Partitioning
▪ Static Partition
▪ Dynamic Partition
▪ Bucketing
▪ SerDe
▪ Hive Joins
▪ Complex Data Types (JSON)
▪ Top 10 Hive Performance Tuning
▪ Sample Project 1
Topic - 5
Apache Spark Introduction

▪ Introduction to Spark
▪ Spark Architecture
▪ Spark Components
▪ RDD Transformation
▪ RDD Actions
▪ RDD to DataFrame
▪ Spark DataFrames
▪ DataFrameReader
▪ DataFrameWriter
▪ DataFrame Transformation
▪ DataFrame Actions
▪ Create a DataFrame via JDBC
▪ Create a DataFrame using Parquet file
▪ Create a DataFrame using ORC
▪ Create a DataFrame using Avro
▪ Create a DataFrame using Hive Tables
▪ Create a DataFrame using JSON
▪ Sample Project 2
Topic - 5
Apache Spark Introduction (cont.)

▪ Spark Cache()
▪ Storage Level Persist
▪ Spark-SQL introduction
▪ Spark-SQL Transformations
▪ Spark-Streaming (DStreams)
▪ Spark-Structured Streaming
▪ Stateful Transformation
▪ Stateless Transformation
▪ Spark – Kafka Integration
▪ Spark ML Libraries
▪ Spark Job Submission via Local
▪ Spark Job Submission via Client Mode
▪ Spark Job Submission via Cluster Mode
▪ Top 10 Performance Tuning in Spark
▪ Sample Project 3
Topic - 6
Kafka Introduction

▪ What is Kafka?
▪ Kafka API Connector
▪ Kafka VS Flume
▪ Kafka Architecture
▪ Producer
▪ Offset
▪ Consumer
▪ Broker
▪ In Sync Replica
▪ Kafka Serialization
▪ Kafka Topic Creation
▪ Kafka - Spark Integration
▪ Sample Project 3
Topic - 7
HBase Introduction

▪ Introduction to HBase
▪ CAP Theorem
▪ HMaster
▪ HRegionServer
▪ Row Key
▪ Column Family
▪ WAL
▪ HQuorumPeer
▪ Data Model Operations
▪ Sample Project 4
Topic - 8
Airflow Introduction

▪ What is Airflow?
▪ How a DAG works
▪ WebServer
▪ Scheduler
▪ Task Instances
▪ Bit shift upstream and Downstream
▪ Sensors
▪ Executors
▪ Data Profiling
▪ Adhoc Queries
▪ Operators
▪ Sqoop Operator
▪ Hive Operator
▪ Spark Submit Operator
▪ Sample Project 5
Topic - 9
Amazon Web Services

▪ Introduction to AWS
▪ Simple Storage Service (S3) Creation
▪ Data Analytics Services
▪ Serverless computing (Lambda)
▪ AWS Redshift
▪ Elastic MapReduce (EMR)
▪ AWS RDS
▪ Elastic Compute Cloud (EC2)
▪ Sample Project 6
Topic - 10
Azure Services

▪ Introduction to Azure services
▪ Blob Storage
▪ Data Analytics Services
▪ Virtual Machine (VM)
▪ Azure Data Lake Gen 2
▪ Azure Data Lake Gen 1
▪ SQL Databases
▪ HD Insight
▪ Sample Project 7
Topic - 11
Databricks

▪ Introduction to Databricks
▪ Create an Automated Cluster
▪ Integrate with Azure
▪ Integrate with AWS
▪ DBFS Storages
▪ Schedule a Spark Job
▪ Create an Interactive Cluster
▪ DBFS Magic functions
▪ Sample Project 8
Topic - 12
Statistics Fundamentals

▪ Why is Statistics needed?
▪ Types of Data
▪ Uni-variate Analysis
▪ Bi-variate Analysis
▪ Descriptive Statistics
▪ Mean, Median, Mode
▪ VAR, Std Dev and IQR
▪ Skewness
▪ Kurtosis
▪ Inferential Statistics
▪ Central Limit Theorem
▪ Correlation
▪ Probability Distribution
▪ Hypothesis Testing
▪ Summarize data
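The descriptive measures listed under Topic 12 (mean, median, mode, variance, standard deviation and IQR) can be sketched with Python's standard `statistics` module. The data set is a made-up sample, and the quartiles use the module's default "exclusive" method, which is only one of several conventions for computing the IQR.

```python
import statistics

# Made-up sample data, used only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Sample variance and sample standard deviation (divide by n - 1).
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

# Quartiles with the default "exclusive" method (Python 3.8+).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"mean={mean} median={median} mode={mode}")
print(f"variance={variance} std_dev={std_dev:.3f} IQR={iqr}")
```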
Topic - 13
Machine Learning

▪ Introduction to Machine Learning
▪ Types of Machine Learning
▪ ML Jargons
▪ Data Preprocessing
▪ Missing Values Handling
▪ EDA
▪ Uniform Scaling
▪ Overfitting
▪ Underfitting
▪ Confusion Matrix
▪ Project 9: Build a Model using Python Libraries
▪ Project 10: Build a Model using Spark ML Libraries
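To make the Confusion Matrix item concrete, here is a minimal sketch for a binary classifier in plain Python. The labels are made-up sample values, the function names are hypothetical, and the code assumes every label is either 0 or 1.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1          # correctly predicted positive
        elif a == 0 and p == 1:
            fp += 1          # predicted positive, actually negative
        elif a == 0 and p == 0:
            tn += 1          # correctly predicted negative
        else:
            fn += 1          # predicted negative, actually positive
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def metrics(cm):
    """Derive accuracy, precision and recall from the counts."""
    total = cm["tp"] + cm["fp"] + cm["tn"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "precision": cm["tp"] / (cm["tp"] + cm["fp"]),
        "recall": cm["tp"] / (cm["tp"] + cm["fn"]),
    }


# Made-up labels for illustration.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
print(cm)
print(metrics(cm))
```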
Topic - 14
Certification & Interview Tips

❑ Tips and tricks for the Cloudera Certification for Spark and Hadoop Developer (CCA 175).
❑ Bigdata interview tips and tricks, and Bigdata interview questions with answers.

Please feel free to reach us if you have any queries.
+91 9715 010 010