Edureka Training - Data Engineer Masters Program
edureka!
Discover Learning
About Edureka
Edureka is one of the world’s largest and most effective online education platforms for
technology professionals. In a span of 10 years, 100,000+ students from over 176 countries
have upskilled themselves with the help of our online courses. Since our inception, we have
been dedicated to helping technology professionals from all corners of the world learn
Programming, Data Science, Big Data, Cloud Computing, DevOps, Business Analytics, Java &
Mobile Technologies, Software Testing, Web Development, System Engineering, Project
Management, Digital Marketing, Business Intelligence, Cybersecurity, RPA and more.
We have an easy and affordable learning solution that is accessible to millions of learners. With
our learners spread across countries like the US, India, UK, Canada, Singapore, Australia, Middle
East, Brazil, and many others, we have built a community of over 1 million learners across the
globe.
Index
1 Linux Fundamentals
2 Apache Spark and Scala Certification Training Course
3 MongoDB Certification Training Course
4 Azure Fundamentals
5 Big Data Hadoop Certification Training Course
6 PySpark Certification Training Course
7 Microsoft SQL Server Certification Course
8 DP 203: Data Engineering on Microsoft Azure
9 Microsoft Power BI Certification Training Course
*Depending on industry requirements, Edureka may make changes to the course curriculum
Linux Fundamentals (Self-paced)
Course Curriculum
Course Outline
Topics:
• History of Linux
• Linux vs Unix
• Features of Linux
• Components of Linux OS
• Architecture of Linux OS
• Linux Distribution
• Shell Scripting
• User Interface in Linux
• Linux Commands
Topics:
• Process Management
• Process Synchronization
• Scripting
• BASH Scripting
• Expect Scripting
Module 4: Networking
Topics:
• OSI Layers
• Protocols
• DNS
• ICMP
• Linux Firewalls
• Iptables
• Linux Security
Apache Spark and Scala Certification Training Course
Course Outline
Topics:
• What is Scala?
• Scala in other Frameworks
• Basic Scala Operations
• Control Structures in Scala
• Collections in Scala: Array
• Why Scala for Spark?
• Introduction to Scala REPL
• Variable Types in Scala
• Foreach loop, Functions, and Procedures
• ArrayBuffer, Map, Tuples, Lists, and more
Topics:
• Functional Programming
• Anonymous Functions
• Getters and Setters
• Properties with only Getters
• Singletons
• Overriding Methods
• Higher Order Functions
• Class in Scala
• Custom Getters and Setters
• Auxiliary Constructor and Primary Constructor
• Extending a Class
• Traits as Interfaces
• Layered Traits
Topics:
• Spark’s Place in Hadoop Ecosystem
• Spark Components & its Architecture
• Spark Deployment Modes
• Introduction to Spark Shell
• Writing your first Spark Job Using SBT
• Submitting Spark Job
• Spark Web UI
• Data Ingestion using Sqoop
Topics:
• Challenges in Existing Computing Methods
• Probable Solution & How RDD Solves the Problem
• What is RDD, Its Functions, Transformations & Actions?
• Data Loading and Saving Through RDDs
• Key-Value Pair RDDs
• Other Pair RDDs
• RDD Lineage
• RDD Persistence
• WordCount Program Using RDD Concepts
• RDD Partitioning & How It Helps Achieve Parallelization
• Passing Functions to Spark
Topics:
• Need for Spark SQL
• What is Spark SQL?
• Spark SQL Architecture
• SQL Context in Spark SQL
• User Defined Functions
• Data Frames & Datasets
• Interoperating with RDDs
• JSON and Parquet File Formats
• Loading Data through Different Sources
• Spark – Hive Integration
Topics:
• Why Machine Learning?
• What is Machine Learning?
• Where is Machine Learning Used?
• Face Detection: USE CASE
• Different Types of Machine Learning Techniques
• Introduction to MLlib
• Features of MLlib and MLlib Tools
• Various ML algorithms supported by MLlib
Topics:
• Supervised Learning - Linear Regression, Logistic Regression, Decision Tree, Random Forest
• Unsupervised Learning - K-Means Clustering & How It Works with MLlib
• Analysis on US Election Data using MLlib (K-Means)
Topics:
• Need for Kafka
• Core Concepts of Kafka
• Where is Kafka Used?
• What is Kafka?
• Kafka Architecture
• Understanding the Components of Kafka Cluster
• Configuring Kafka Cluster
• Need for Apache Flume
• What is Apache Flume?
• Flume Sources
• Flume Channels
• Integrating Apache Flume and Apache Kafka
• Basic Flume Architecture
• Flume Sinks
• Flume Configuration
Topics:
• Drawbacks in Existing Computing Methods
• Why is Streaming Necessary?
• What is Spark Streaming?
• Spark Streaming Features
• Spark Streaming Workflow
• How Uber Uses Streaming Data
• Streaming Context & DStreams
• Transformations on DStreams
• Windowed Operators and Why They Are Useful
• Important Windowed Operators
• Slice, Window and ReduceByWindow Operators
• Stateful Operators
Topics:
• Apache Spark Streaming: Data Sources
• Streaming Data Source Overview
• Apache Flume and Apache Kafka Data Sources
• Example: Using a Kafka Direct Data Source
• Perform Twitter Sentiment Analysis Using Spark Streaming
MongoDB Certification Training Course
Course Outline
Topics:
• MongoDB Development Architecture
• MongoDB Production Architecture
• MongoDB CRUD Introduction, MongoDB CRUD Concepts
• MongoDB CRUD Concerns (Read & Write Operations)
• Concern Levels, Journaling, etc.
• Cursor Query Optimizations, Query Behavior in MongoDB
• Distributed Read & Write Queries
• MongoDB Datatypes
• MongoDB CRUD Syntax & Queries (Live Hands on)
Topics:
• Performance Tuning
Topics:
• Security Introduction
• Security Concepts
• Integration of MongoDB with Jaspersoft
• Integration of MongoDB with Pentaho
• Integration of MongoDB with Hadoop/Hive
• Integration of MongoDB with Java
• Integration of MongoDB with GUI Tool Robomongo
• Case Study MongoDB and Java
Topics:
• Overview of tools
• MongoDB Diagnostic Tools
• Diagnostics Commands
• MongoDB Deployment
• Setup & Configuration, Scalability, Management & Security
• Slow Queries
• Connectivity
Azure Fundamentals
Course Curriculum
Course Outline
Topics:
• What is Cloud?
• Cloud Computing Patterns
• Service Models
• What is Azure
• Azure Features
• Azure Platform
• Azure Services
• What is Virtual Machine?
• Why Virtual Networks?
• Virtual Networks and its Components
• Azure Portal
Hands-On:
Topics:
• Why Storage?
• What is Azure Storage?
• Components of Azure Storage
• Blobs
• Queues
• File System
• Tables
Hands-On:
Topics:
• What is AWS?
• What is Azure?
• AWS vs Azure
• AWS vs Azure vs GCP
• General Cloud Questions
• General Azure Questions
• Azure Interview Questions
Big Data Hadoop Certification Training Course
Course Curriculum
Course Outline
Topics:
• Traditional way vs MapReduce way
• Why MapReduce
• YARN Components
• YARN Architecture
• YARN MapReduce Application Execution Flow
• YARN Workflow
• Anatomy of MapReduce Program
• MapReduce: Combiner & Partitioner
• Input Splits, Relation between Input Splits and HDFS Blocks
• Demo of Health Care Dataset
• Demo of Weather Dataset
Topics:
• Counters
• Distributed Cache
• MRUnit
• Reduce Join
Topics:
• Hive vs Pig
• Hive Architecture and Components
• Hive Metastore
• Limitations of Hive
• Hive Partition
• Comparison with Traditional Database
• Hive Data Types and Data Models
• Hive Tables (Managed Tables & External Tables)
• Hive Bucketing
• Importing Data Hive Script & Hive UDF
• Querying Data & Managing Outputs
• Retail use case in Hive
• Hive Demo on Healthcare Dataset
Topics:
• What is Spark
• Spark Ecosystem
• Spark Components
• What is Scala
• Why Scala
• SparkContext
• Spark RDD
Module 10: Oozie and Hadoop Project
Topics:
• Oozie
• Oozie Components
• Oozie Workflow
• Demo of Oozie Workflow
Topics:
• Find out the frequency of books published each year. (Hint: Sample dataset provided)
• Find out the year in which the maximum number of books was published
• Find out how many books were published based on ranking in the year 2002
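The first two tasks above can be sketched in plain Python as a map-and-reduce style count. This is only an illustration under assumptions: the sample records, the (title, year) layout, and the function names are hypothetical, because the course dataset and its columns are not shown in this outline. It is not the course's own solution, which would run on Hadoop.

```python
from collections import Counter

# Hypothetical sample records: (title, publication_year).
# The real course dataset and its columns are not shown in this outline.
SAMPLE_BOOKS = [
    ("Book A", 2001),
    ("Book B", 2002),
    ("Book C", 2002),
    ("Book D", 2003),
    ("Book E", 2002),
]


def books_per_year(records):
    """Count how many books were published in each year."""
    # "Map" each record to its year, then "reduce" by counting per key.
    return Counter(year for _title, year in records)


def busiest_year(records):
    """Return (year, count) for the year with the most books."""
    counts = books_per_year(records)
    return max(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(dict(books_per_year(SAMPLE_BOOKS)))
    print(busiest_year(SAMPLE_BOOKS))
```

The third task depends on how the ranking field is stored in the dataset, so it is left out of this sketch.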
PySpark Certification Training Course
Course Outline
Topics:
• Overview of Python
• Different Applications where Python is Used
• Values, Types, Variables
• Operands and Expressions
• Conditional Statements
• Loops
• Command Line Arguments
• Writing to the Screen
• Python File I/O Functions
• Numbers
• Strings and related operations
• Tuples and related operations
• Lists and related operations
• Dictionaries and related operations
• Sets and related operations
Topics:
• Spark Web UI
Topics:
• RDD Lineage
• RDD Persistence
Topics:
• Schema RDDs
• Spark-Hive Integration
Topics:
• Introduction to MLlib
Topics:
• Supervised Learning: Linear Regression, Logistic Regression, Decision Tree, Random Forest
• Unsupervised Learning: K-Means Clustering & How It Works with MLlib
• Analysis of US Election Data using MLlib (K-Means)
Topics:
• Need for Kafka
• What is Kafka
• Core Concepts of Kafka
• Kafka Architecture
• Where is Kafka Used
• Understanding the Components of Kafka Cluster
• Configuring Kafka Cluster
• Kafka Producer and Consumer Java API
• Need for Apache Flume
• What is Apache Flume
• Basic Flume Architecture
• Flume Sources
• Flume Sinks
• Flume Channels
• Flume Configuration
• Integrating Apache Flume and Apache Kafka
Topics:
• Drawbacks in Existing Computing Methods
• Why Streaming is Necessary
• What is Spark Streaming
• Spark Streaming Features
• Spark Streaming Workflow
• How Uber Uses Streaming Data
• Streaming Context & DStreams
• Transformations on DStreams
• Windowed Operators and Why They Are Useful
• Important Windowed Operators
• Slice, Window and ReduceByWindow Operators
• Stateful Operators
Topics:
• Apache Spark Streaming: Data Sources
• Streaming Data Source Overview
• Apache Flume and Apache Kafka Data Sources
• Example: Using a Kafka Direct Data Source
Topics:
• Introduction to Spark GraphX
• Information about a Graph
Microsoft SQL Server Certification Course
Course Outline
Topics:
• Database Systems
• RDBMS
• Properties of Databases
• Introduction to SQL
• E-R Model
• Client Server Model
• MS SQL Server
• Microsoft SQL Server Management Studio (SSMS)
Topics:
• Ranking Functions
• Date and Time Functions
• UDFs (User Defined Functions)
• Backup and Restore Databases
• Triggers
• Index
Topics:
• Introduction to Optimization
• Understanding Performance
• Optimizing Queries
• Indexing for Performance
• Performance Tuning
Topics:
• Introduction to Azure
• Creating Azure Account
• Creating and configuring Azure VMs
• Azure SQL Database
• Accessing Azure Services
• Query SQL database in Azure
Topics:
DP 203: Data Engineering on Microsoft Azure
Course Curriculum
Course Outline
Topics:
• Azure Subscriptions
• Azure Resources
• Azure Free Tier Account
• Azure Resource Manager
• Azure Resource Manager Template
• Azure Storage
• Types of Azure Storage
Topics:
• Create datasets
Topics:
• Functions
• Function Parameters
• Global variables
• Variable scope and Returning Values
• Lambda Functions
• Object Oriented Concepts
• Standard Libraries
• Modules Used in Python (OS, Sys, Date and Time etc.)
• The Import statements
• Module search path
• Package installation ways
• Errors and Exception Handling
• Handling multiple exceptions
Module 8: Work with Data Warehouses using Azure Synapse Analytics - Part I
Topics:
Module 9: Work with Data Warehouses using Azure Synapse Analytics - Part II
Topics:
Microsoft Power BI Certification Training Course
Course Curriculum
Course Outline
• Introduction to DAX
• Importance of DAX
• Data Types in DAX
• DAX Operators
• DAX Calculation Types
• Steps to Create Calculated Columns
• Steps to Create Calculated Tables
• Measures in DAX
• DAX Syntax
• DAX Functions
• DAX Tables and Filtering
• Use Parameters
• Create a data flow
• Introduction to Anomaly Detection
• Introduction to Smart Narrative
• Introduction to Sensitivity labels in Power BI
• Deployment Pipeline
Hands-On:
• Connecting with Power BI service
• Creating Data flow
• Creating scorecard
Topics:
• Power BI Service
• Row Level Security
• Visuals and Charts
• Power BI Desktop
• Handling Workspaces
• Power BI Gateway
• Data Visualization
• Dashboard Management