Bigdata Engineer Complete Syllabus: Presented by

The document provides an overview of a BigData Engineer Complete Syllabus that covers topics related to Hadoop, AWS, Azure, Databricks, and machine learning. The course includes 14 topics that will be covered over approximately 60 hours of online training on weekends. Topic areas include Python fundamentals, Hadoop, Hive, Spark, Kafka, HBase, Airflow, AWS, Azure, Databricks, statistics, and machine learning. Hands-on practice is a core part of the training with materials like practice Hadoop clusters and recorded video lessons provided.

BigData Engineer Complete Syllabus

HADOOP | AWS | AZURE | DATABRICKS | MACHINE LEARNING

Presented By

Contact us @ +91 9715 010 010


Our Policy

▪ No Fast Track (30% Theory + 70% Hands-on)
▪ 100% Refund if you are not satisfied
▪ Interactive and concept-based
▪ Latest Version upgrades (Hadoop 3x & Spark 3x)
▪ Private WhatsApp Group for your queries
Session Details

▪ Course Duration : 60 Hours (approx.)
▪ Mode of Training : Online
▪ Programming : Python 3x
▪ Session Timing : Weekend (Saturday and Sunday)
▪ Per Class : 3 Hours
Course Materials

▪ Practice Hadoop VM Cluster (with Latest Version)
▪ Recorded Videos
▪ Practice Materials
▪ Online eLibraries
▪ Sample Interview Questions with Answers
Course Overview
Topic 1 – Python3 Fundamentals
Topic 2 – Hadoop 3x
Topic 3 – Hive Query Language
Topic 4 – SQOOP (Recording)
Topic 5 – Spark (RDD, DF, SQL & ML)
Topic 6 – Kafka Streaming
Topic 7 – HBase (NoSQL)
Topic 8 – Airflow Scheduler
Topic 9 – AWS (Amazon Web Services)
Topic 10 – Azure Services
Topic 11 – Databricks Cluster
Topic 12 – Statistics Fundamentals
Topic 13 – Machine Learning
Topic 14 – Certification & Interview Tips


Topic - 1
Python Fundamentals Hands-on
▪ Python 3 Installation
▪ Variables
▪ Data Types: Standard, Int, String, Float, Char, Boolean
▪ Collections: List, Tuple, Dictionary, Set
▪ Classes: Objects, Method, Inheritance
▪ Functions: Lambda, Built-in Functions, UDF
▪ Control Statements: if else, elif, for, while
▪ String Slicing
▪ NumPy
▪ Pandas
Topic - 2
Hadoop Introduction
▪ Brief Introduction to Big Data
▪ Why Bigdata needed now?
▪ 4V's of Bigdata
▪ Google Concepts
▪ History of Databases
▪ Hadoop vs Google Architecture
▪ Hadoop 1x vs 2x vs 3x
▪ Hadoop Layers
▪ History of Hadoop & Ecosystems
▪ Hadoop Daemons:
▪ Name Node
▪ Secondary Name Node
▪ Data Node
▪ Resource Manager / Job Tracker
▪ Node Manager / Task Tracker
▪ Heart Beat
▪ Block Report
▪ High Availability (HA)
▪ Replication versus Erasure Encoding
▪ Block size
▪ Input Split
▪ Special File format
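
As a rough illustration of the block size and replication versus erasure encoding items above, the sketch below estimates how many blocks a file occupies and how much raw storage each scheme consumes. The 128 MB block size, the replication factor of 3, and the Reed-Solomon layout of 6 data units plus 3 parity units are assumed values chosen to match common Hadoop 3x defaults; the function names are hypothetical and the erasure-coding figure ignores padding of partial stripes.

```python
import math

BLOCK_SIZE_MB = 128  # assumed HDFS block size


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to hold a file (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)


def replicated_storage_mb(file_size_mb, replication=3):
    """Raw storage used when every block is stored `replication` times."""
    return file_size_mb * replication


def erasure_coded_storage_mb(file_size_mb, data_units=6, parity_units=3):
    """Approximate raw storage under Reed-Solomon coding (data plus parity)."""
    return file_size_mb * (data_units + parity_units) / data_units


file_mb = 1024
print(block_count(file_mb))               # 8 blocks
print(replicated_storage_mb(file_mb))     # 3072 MB, i.e. 3x the file size
print(erasure_coded_storage_mb(file_mb))  # 1536.0 MB, i.e. 1.5x the file size
```

Under these assumptions, erasure coding halves the raw storage compared with three-way replication, at the cost of extra computation on reads and writes.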
Topic - 2
BigData Hadoop & YARN (cont..)

▪ MR1 vs MR2
▪ Mapper
▪ Reducer
▪ Combiner
▪ Hadoop Commands Hands-on
▪ Application Master
▪ Container
▪ YARN
▪ Hadoop Job Opportunities
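
The mapper, combiner, and reducer roles listed above can be sketched with a word count. This is a plain-Python simulation of the data flow, not the Hadoop API; the function names and the sample input splits are hypothetical.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combiner(pairs):
    """Pre-aggregate counts on the map side to cut the data sent to reducers."""
    partial = defaultdict(int)
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def reducer(word, counts):
    """Sum all partial counts that arrived for one key."""
    return word, sum(counts)


def run_job(splits):
    """Run map and combine per input split, then shuffle by key and reduce."""
    shuffled = defaultdict(list)
    for split in splits:
        map_output = []
        for line in split:
            map_output.extend(mapper(line))
        for word, count in combiner(map_output):
            shuffled[word].append(count)
    return dict(reducer(word, counts) for word, counts in shuffled.items())


# Two input splits, each processed by its own (simulated) map task.
splits = [["big data big"], ["data engineer"]]
print(run_job(splits))  # {'big': 2, 'data': 2, 'engineer': 1}
```

The combiner is optional: it only reduces shuffle traffic, so the job must give the same result with or without it.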
Topic - 3
Hive Introduction
▪ Introduction on Hive
▪ Hive Architecture
▪ Hive Meta Store
▪ Hive Server 1 vs 2
▪ Beeline vs Hive CLI
▪ Thrift Server
▪ Create Table with ORC
▪ Create Table with Parquet
▪ Create Table using Avro
▪ Managed vs External Table
▪ Connect hive via Beeline
▪ Partitioning
▪ Static Partition
▪ Dynamic Partition
▪ Bucketing
▪ SerDe
▪ Hive Joins
▪ Complex Data Types (JSON)
▪ Top 10 Hive Performance Tuning
▪ Sample Project 1
Topic - 5
Apache Spark Introduction
▪ Introduction of Spark
▪ Spark Architecture
▪ Spark Components
▪ RDD Transformation
▪ RDD Actions
▪ RDD to DataFrame
▪ Spark DataFrames
▪ DataFrame Transformation
▪ DataFrame Actions
▪ DataFrameReader
▪ DataFrameWriter
▪ Create a DataFrame via JDBC
▪ Create a DataFrame using Parquet file
▪ Create a DataFrame using ORC
▪ Create a DataFrame using Avro
▪ Create a DataFrame using Hive Tables
▪ Create a DataFrame using JSON
▪ Sample Project 2
Topic - 5
Apache Spark Introduction (cont..)

▪ Spark Cache()
▪ Storage Level Persist
▪ Spark-SQL introduction
▪ Spark-SQL Transformations
▪ Spark-Streaming (DStreams)
▪ Spark-Structured Streaming
▪ Stateful Transformation
▪ Stateless Transformation
▪ Spark – Kafka Integration
▪ Spark ML Libraries
▪ Spark Job Submission via Local
▪ Spark Job Submission via Client Mode
▪ Spark Job Submission via Cluster Mode
▪ Top 10 Performance Tuning in Spark
▪ Sample Project 3
Topic - 6
Kafka Introduction

▪ What is Kafka?
▪ Kafka API Connector
▪ Kafka vs Flume
▪ Kafka Architecture
▪ Producer
▪ Offset
▪ Consumer
▪ Broker
▪ In Sync Replica
▪ Kafka Serialization
▪ Kafka Topic Creation
▪ Kafka - Spark Integration
▪ Sample Project 3
Topic - 7
HBase Introduction
▪ Introduction of HBase
▪ CAP Theorem
▪ HMaster
▪ HRegionServer
▪ Row Key
▪ Column Family
▪ WAL
▪ HQuorumPeer
▪ Data Model Operations
▪ Sample Project 4
Topic - 8
Airflow Introduction

▪ What is Airflow?
▪ How a DAG works
▪ WebServer
▪ Scheduler
▪ Task Instances
▪ Bit shift upstream and downstream
▪ Sensors
▪ Executors
▪ Data Profiling
▪ Adhoc Queries
▪ Operators
▪ Hive Operator
▪ Sqoop Operator
▪ Spark Submit Operator
▪ Sample Project 5
Topic - 9
Amazon Web Services
▪ Introduction of AWS
▪ Data Analytics Services
▪ Simple Storage Service (S3) Creation
▪ Serverless computing (Lambda)
▪ AWS Redshift
▪ Elastic MapReduce (EMR)
▪ AWS RDS
▪ Elastic Compute Cloud (EC2)
▪ Sample Project 6
Topic - 10
Azure Services
▪ Introduction of Azure services
▪ Data Analytics Services
▪ Blob Storage
▪ Virtual Machine (VM)
▪ Azure Data Lake Gen 2
▪ Azure Data Lake Gen 1
▪ SQL Databases
▪ HDInsight
▪ Sample Project 7
Topic - 11
Databricks
▪ Introduction of Databricks
▪ Create an Automated Cluster
▪ Integrate with Azure
▪ Integrate with AWS
▪ DBFS Storages
▪ Schedule a Spark Job
▪ Create an Interactive Cluster
▪ DBFS Magic functions
▪ Sample Project 8
Topic - 12
Statistics Fundamentals

▪ Why Statistics needed?
▪ Types of Data
▪ Uni-variate Analysis
▪ Bi-variate Analysis
▪ Descriptive Statistics
▪ Mean, Median, Mode
▪ VAR, Std Dev and IQR
▪ Skewness
▪ Kurtosis
▪ Inferential Statistics
▪ Central Limit Theorem
▪ Probability Distribution
▪ Hypothesis Testing
▪ Correlation
▪ Summarize data
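
Several of the descriptive statistics listed above (mean, median, mode, variance, standard deviation, and IQR) can be computed with Python's standard `statistics` module. The sketch below uses a made-up sample; note that `variance` and `stdev` are the sample (n - 1) versions, and that `quantiles` uses the module's default "exclusive" method, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical sample, sorted only for readability.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)          # 6
median = statistics.median(data)      # 5
mode = statistics.mode(data)          # 4 (most frequent value)
variance = statistics.variance(data)  # 10 (sample variance, divides by n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance

# Quartiles: n=4 returns the three cut points Q1, Q2 (the median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # interquartile range: 9.0 - 4.0 = 5.0

print(mean, median, mode, variance, round(std_dev, 4), iqr)
```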
Topic - 13
Machine Learning

▪ Introduction of Machine Learning
▪ Types of Machine Learning
▪ ML Jargons
▪ Data Preprocessing

Missing Values EDA Handling Uniform Scaling Overfitting

▪ Underfitting
▪ Confusion Matrix
▪ Project 9: Build a Model using Python Libraries
▪ Project 10: Build a Model using Spark ML Libraries
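
To make the confusion matrix item above concrete, here is a minimal pure-Python sketch for a binary classifier. The labels are invented for illustration, `confusion_matrix` is a hypothetical helper (not a library function), and 1 is assumed to be the positive class.

```python
def confusion_matrix(actual, predicted):
    """Count TP, FP, FN, TN for binary labels where 1 is the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1  # correctly predicted positive
        elif a == 0 and p == 1:
            counts["FP"] += 1  # predicted positive, actually negative
        elif a == 1 and p == 0:
            counts["FN"] += 1  # predicted negative, actually positive
        else:
            counts["TN"] += 1  # correctly predicted negative
    return counts


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
precision = cm["TP"] / (cm["TP"] + cm["FP"])
recall = cm["TP"] / (cm["TP"] + cm["FN"])

print(cm)                           # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
print(accuracy, precision, recall)  # 0.75 0.75 0.75
```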
❑ Tips and tricks for the Cloudera Certification for Spark and Hadoop Developer (CCA 175).

❑ Bigdata interview tips and tricks, plus Bigdata interview questions with answers.
Please feel free to reach us if you have any queries…
+91 9715 010 010