Edureka Training - Data Engineer Masters Program

Uploaded by

Chaudry Umer

edureka!

Discover Learning

Data Engineer Masters Program

About Edureka
Edureka is one of the world’s largest and most effective online education platforms for
technology professionals. In a span of 10 years, 100,000+ students from over 176 countries
have upskilled themselves with the help of our online courses. Since our inception, we have
been dedicated to helping technology professionals from all corners of the world learn
Programming, Data Science, Big Data, Cloud Computing, DevOps, Business Analytics, Java &
Mobile Technologies, Software Testing, Web Development, System Engineering, Project
Management, Digital Marketing, Business Intelligence, Cybersecurity, RPA, and more.
We have an easy and affordable learning solution that is accessible to millions of learners. With
our learners spread across countries like the US, India, UK, Canada, Singapore, Australia, Middle
East, Brazil, and many others, we have built a community of over 1 million learners across the
globe.

About the Program


Edureka’s Data Engineer Masters program is curated by industry experts to provide learners
with a deep understanding of the principles and practices of data engineering through its
extensive coursework and hands-on projects. The well-researched curriculum enables learners
to design and build data pipelines, manage databases, and develop data infrastructure to meet
the requirements of any organization. Unleash the power of data and accelerate your career.
Join the global revolution now!

www.edureka.co © Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.



Index
1 Linux Fundamentals
2 Apache Spark and Scala Certification Training Course
3 MongoDB Certification Training Course
4 Azure Fundamentals
5 Big Data Hadoop Certification Training Course
6 PySpark Certification Training Course
7 Microsoft SQL Server Certification Course
8 DP 203: Data Engineering on Microsoft Azure
9 Microsoft Power BI Certification Training Course

*Depending on industry requirements, Edureka may make changes to the course curriculum


Linux Fundamentals (Self-paced)
Course Curriculum

Course Outline

Module 1: Linux Fundamentals

Topics:

• History of Linux
• Linux vs Unix
• Features of Linux
• Components of Linux OS
• Architecture of Linux OS
• Linux Distribution
• Shell Scripting
• User Interface in Linux
• Linux Commands

Module 2: User Administration

Topics:

• File Systems and its Types


• Software Package Management
• Users in Linux

• User Groups in Linux


• File/Folder Permissions
• Special Permissions

Module 3: Shell Scripting

Topics:

• Process Management

• Process Synchronization

• Some Basic Linux Commands

• Scripting

• BASH Scripting

• Expect Scripting

Module 4: Networking

Topics:

• OSI Layers

• Protocols

• DNS

• ICMP

• Packet Capturing Tools

• Linux Firewalls

• Iptables

• Linux Security


Apache Spark and Scala Certification Training (Self-Paced)
Course Curriculum

Course Outline

Module 1: Introduction to Big Data Hadoop and Spark

Topics:

• What is Big Data?


• Big Data Customer Scenarios
• Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
• How Hadoop Solves the Big Data Problem?
• What is Hadoop?
• Hadoop’s Key Characteristics
• Hadoop Ecosystem and HDFS
• Hadoop Core Components
• Rack Awareness and Block Replication
• YARN and its Advantage
• Hadoop Cluster and its Architecture
• Hadoop: Different Cluster Modes
• Big Data Analytics with Batch & Real-time Processing
• Why Spark is needed?
• What is Spark?
• How Spark differs from other frameworks?
• Spark at Yahoo!


Module 2: Introduction to Scala and Apache Spark

Topics:

• What is Scala?
• Scala in other Frameworks
• Basic Scala Operations
• Control Structures in Scala
• Collections in Scala- Array
• Why Scala for Spark?
• Introduction to Scala REPL
• Variable Types in Scala
• Foreach loop, Functions, and Procedures
• ArrayBuffer, Map, Tuples, Lists, and more

Module 3: Functional Programming and OOPs Concepts in Scala

Topics:
• Functional Programming
• Anonymous Functions
• Getters and Setters
• Properties with only Getters
• Singletons
• Overriding Methods
• Higher Order Functions
• Class in Scala
• Custom Getters and Setters
• Auxiliary Constructor and Primary Constructor
• Extending a Class
• Traits as Interfaces
• Layered Traits


Module 4: Deep Dive into Apache Spark Framework

Topics:
• Spark’s Place in Hadoop Ecosystem
• Spark Components & its Architecture
• Spark Deployment Modes
• Introduction to Spark Shell
• Writing your first Spark Job Using SBT
• Submitting Spark Job
• Spark Web UI
• Data Ingestion using Sqoop

Module 5: Playing with Spark RDDs

Topics:
• Challenges in Existing Computing Methods
• Probable Solution & How RDD Solves the Problem
• What is RDD, Its Functions, Transformations & Actions?
• Data Loading and Saving Through RDDs
• Key-Value Pair RDDs
• Other Pair RDDs
• RDD Lineage
• RDD Persistence
• WordCount Program Using RDD Concepts
• RDD Partitioning & How It Helps Achieve Parallelization
• Passing Functions to Spark


Module 6: DataFrames and Spark SQL

Topics:
• Need for Spark SQL
• What is Spark SQL?
• Spark SQL Architecture
• SQL Context in Spark SQL
• User Defined Functions
• Data Frames & Datasets
• Interoperating with RDDs
• JSON and Parquet File Formats
• Loading Data through Different Sources
• Spark – Hive Integration

Module 7: Machine Learning using Spark MLlib

Topics:
• Why Machine Learning?
• What is Machine Learning?
• Where Machine Learning is Used?
• Face Detection: USE CASE
• Different Types of Machine Learning Techniques
• Introduction to MLlib
• Features of MLlib and MLlib Tools
• Various ML algorithms supported by MLlib


Module 8: Deep Dive into Spark MLlib

Topics:
• Supervised Learning - Linear Regression, Logistic Regression, Decision Tree, Random
Forest
• Unsupervised Learning - K-Means Clustering & How It Works with MLlib
• Analysis on US Election Data using MLlib (K-Means)

Module 9: Understanding Apache Kafka & Apache Flume

Topics:
• Need for Kafka
• Core Concepts of Kafka
• Where is Kafka Used?
• What is Kafka?
• Kafka Architecture
• Understanding the Components of Kafka Cluster
• Configuring Kafka Cluster
• Need for Apache Flume
• What is Apache Flume?
• Flume Sources
• Flume Channels
• Integrating Apache Flume and Apache Kafka
• Basic Flume Architecture
• Flume Sinks
• Flume Configuration


Module 10: Apache Spark Streaming- Processing Multiple Batches

Topics:
• Drawbacks in Existing Computing Methods
• Why Streaming is Necessary?
• What is Spark Streaming?
• Spark Streaming Features
• Spark Streaming Workflow
• How Uber Uses Streaming Data
• Streaming Context & DStreams
• Transformations on DStreams
• Windowed Operators and Why They Are Useful
• Important Windowed Operators
• Slice, Window and ReduceByWindow Operators
• Stateful Operators

Module 11: Apache Spark Streaming- Data Sources

Topics:
• Apache Spark Streaming: Data Sources
• Streaming Data Source Overview
• Apache Flume and Apache Kafka Data Sources
• Example: Using a Kafka Direct Data Source
• Perform Twitter Sentiment Analysis Using Spark Streaming

Module 12: In Class Project


Learning Objectives
Work on an end-to-end Financial domain project covering all the major concepts of Spark
taught during the course.


Module 13: Spark GraphX (Self-Paced)


Learning Objectives
In this module, you will be learning the key concepts of Spark GraphX programming and
operations along with different GraphX algorithms and their implementations.


MongoDB Certification Training Course (Self-paced)
Course Curriculum

Course Outline

Module 1: Introduction to MongoDB - Architecture and Installation

Topics:

• Understanding the basic concepts of a Database


• Database categories: What is NoSQL? Why NoSQL? Benefit over RDBMS
• Types of NoSQL Databases, NoSQL vs. SQL Comparison, ACID & BASE Properties
• CAP Theorem, implementing NoSQL and what is MongoDB?
• Overview of MongoDB, Design Goals for MongoDB Server and Database, MongoDB tools
• Understanding the following: Collection, Documents and Key/ Values, etc.
• Introduction to JSON and BSON documents
• Environment setup (live Hands-on) and using various MongoDB tools available in the
MongoDB Package
• Case study discussion

Module 2: Schema Design and Data Modelling

Topics:

• Data Modelling Concepts


• Why Data Modelling? Data Modelling Approach
• Analogy between RDBMS & MongoDB Data Model, MongoDB Data Model (Embedding
& Linking)

• Challenges for Data Modelling in MongoDB


• Data Model Examples and Patterns
• Model Relationships between Documents
• Model Tree Structures
• Model Specific Application Contexts
• Use Case discussion of Data modeling

Module 3: CRUD Operations

Topics:
• MongoDB Development Architecture
• MongoDB Production Architecture
• MongoDB CRUD Introduction, MongoDB CRUD Concepts
• MongoDB CRUD Concerns (Read & Write Operations)
• Concern Levels, Journaling, etc.
• Cursor Query Optimizations, Query Behavior in MongoDB
• Distributed Read & Write Queries
• MongoDB Datatypes
• MongoDB CRUD Syntax & Queries (Live Hands on)

Module 4: Indexing and Aggregation Framework

Topics:

• Index Introduction, Index Concepts, Index Types, Index Properties


• Index Creation and Indexing Reference
• Introduction to Aggregation
• Approach to Aggregation
• Types of Aggregation (Pipeline, MapReduce & Single Purpose)

• Performance Tuning

Module 5: MongoDB Administration

Topics:

• Administration concepts in MongoDB


• Monitoring issues related to Database
• Monitoring at Server, Database, Collection level, and various Monitoring tools related to
MongoDB
• Database Profiling, Locks, Memory Usage, Number of Connections, Page Faults, etc.
• Backup and Recovery Methods for MongoDB
• Export and Import of Data to and from MongoDB
• Run time configuration of MongoDB
• Production notes/ best practices
• Data Management in MongoDB (Capped Collections/ Expiring Data with TTL), Hands-on
Administrative Tasks

Module 6: Scalability and Availability

Topics:

• Introduction to Replication (High Availability)


• Concepts around Replication
• What is Replica Set and Master Slave Replication?
• Type of Replication in MongoDB
• How to set up a replicated cluster and manage replica sets
• Introduction to Sharding (Horizontal Scaling)
• Concepts around Sharding: Shards, Shard Key
• Config Server, Query Router, etc.
• How to set up Sharding
• Types of Sharding (Hash-Based, Range-Based, etc.) and Managing Shards


Module 7: MongoDB Security


Topics:

• Security Introduction
• Security Concepts
• Integration of MongoDB with Jaspersoft
• Integration of MongoDB with Pentaho
• Integration of MongoDB with Hadoop/Hive
• Integration of MongoDB with Java
• Integration of MongoDB with GUI Tool Robomongo
• Case Study MongoDB and Java

Module 8: Application Engineering and MongoDB Tools

Topics:

• MongoDB Package Components


• Configuration File Options
• MongoDB Limits and Thresholds
• Connection String URI Format/ Integration of any compatible tool with MongoDB API
and Drivers for MongoDB
• MMS (MongoDB Monitoring Service)
• HTTP and Rest Interface
• Integration of MongoDB with Hadoop and Data Migration MongoDB with Hadoop
(MongoDB to Hive)
• Integration with R


Module 9: MongoDB on the Cloud

Topics:

• Overview of MongoDB Cloud products


• Using Cloud Manager to monitor MongoDB deployments
• Introduction to MongoDB Stitch
• MongoDB Cloud Atlas
• MongoDB Cloud Manager
• Working with MongoDB Ops Manager

Module 10: Diagnostics and Fixes


Topics:

• Overview of tools
• MongoDB Diagnostic Tools
• Diagnostics Commands
• MongoDB Deployment
• Setup & Configuration, Scalability, Management & Security
• Slow Queries
• Connectivity


Azure Fundamentals

Course Curriculum

Course Outline

Module 1: Introduction to Azure and Azure VM

Topics:

• What is Cloud?
• Cloud Computing Patterns
• Service Models
• What is Azure
• Azure Features
• Azure Platform
• Azure Services
• What is Virtual Machine?
• Why Virtual Networks?
• Virtual Networks and its Components
• Azure Portal

Hands-On:

• Exploring Microsoft Azure Portal


Module 2: Azure Storage

Topics:

• Why Storage?
• What is Azure Storage?
• Components of Azure Storage
• Blobs
• Queues
• File System
• Tables

Hands-On:

• Creating a Storage Account


• Creating Blobs
• Creating Queues

Module 3: Azure Virtual Network


Topics:

• Why Virtual Networks?


• What is a Virtual Network?
• Azure Subnets
• Network Security Groups
• Virtual Network Architecture

Hands-On:

• Creating Network Security Groups


• Create a Virtual Network
• Create a Webserver VM and Database VM
• Configure the Network Security Groups for respective VMs


Module 4: Azure Comparison

Topics:

• What is AWS?
• What is Azure?
• AWS vs Azure
• AWS vs Azure vs GCP
• General Cloud Questions
• General Azure Questions
• Azure Interview Questions


Big Data Hadoop Certification Training Course
Course Curriculum

Course Outline

Module 1: Understanding Big Data and Hadoop

Topics:

• Introduction to Big Data & Big Data Challenges


• Limitations & Solutions of Big Data Architecture
• Data types and Operations
• Hadoop Storage: HDFS (Hadoop Distributed File System)
• Hadoop & its Features
• Hadoop Processing: MapReduce Framework
• Different Hadoop Distributions
• Hadoop Ecosystem
• Hadoop 2.x Core Components

Module 2: Hadoop Architecture and HDFS

Topics:

• Typical Production Hadoop Cluster


• Common Hadoop Shell Commands
• Hadoop 2.x Cluster Architecture
• Hadoop Cluster Modes

• Federation and High Availability Architecture


• Hadoop 2.x Configuration Files
• Single Node Cluster & Multi-Node Cluster set up
• Basic Hadoop Administration

Module 3: Hadoop MapReduce Framework

Topics:
• Traditional way vs MapReduce way
• Why MapReduce
• YARN Components
• YARN Architecture
• YARN MapReduce Application Execution Flow
• YARN Workflow
• Anatomy of MapReduce Program
• MapReduce: Combiner & Partitioner
• Input Splits, Relation between Input Splits and HDFS Blocks
• Demo of Health Care Dataset
• Demo of Weather Dataset


Module 4: Advanced Hadoop MapReduce

Topics:

• Counters

• Distributed Cache

• MRUnit

• Reduce Join

• Custom Input Format

• Sequence Input Format

• XML file Parsing using MapReduce

Module 5: Apache Pig

Topics:

• Introduction to Apache Pig


• MapReduce vs Pig
• Pig Components & Pig Execution
• Pig Latin Programs
• Pig Data Types & Data Models in Pig
• Shell and Utility Commands
• Pig UDF & Pig Streaming
• Testing Pig scripts with PigUnit
• Aviation use case in Pig
• Pig Demo of Healthcare Dataset

Module 6: Apache Hive

Topics:

• Introduction to Apache Hive

• Hive vs Pig
• Hive Architecture and Components
• Hive Metastore
• Limitations of Hive
• Hive Partition
• Comparison with Traditional Database
• Hive Data Types and Data Models
• Hive Tables (Managed Tables & External Tables)
• Hive Bucketing
• Importing Data
• Hive Script & Hive UDF
• Querying Data & Managing Outputs
• Retail use case in Hive
• Hive Demo on Healthcare Dataset

Module 7: Advanced Apache Hive and HBase


Topics:

• Hive QL: Joining Tables, Dynamic Partitioning


• Custom MapReduce Scripts
• Hive Indexes and views
• Hive Query Optimizers
• Hive Thrift Server
• Hive UDF
• HBase v/s RDBMS
• HBase Components
• HBase Architecture
• HBase Run Modes
• HBase Configuration
• HBase Cluster Deployment
• Apache HBase: Introduction to NoSQL Databases and HBase


Module 8: Advanced Apache HBase

Topics:

• HBase Data Model
• HBase Shell


• HBase Client API
• Hive Data Loading Techniques
• Apache ZooKeeper
• Introduction to ZooKeeper
• Data Model
• ZooKeeper Service
• HBase Bulk Loading
• Getting and Inserting Data
• HBase Filters

Module 9: Processing Distributed Data with Apache Spark

Topics:

• What is Spark
• Spark Ecosystem
• Spark Components
• What is Scala
• Why Scala
• SparkContext
• Spark RDD

Module 10: Oozie and Hadoop Project

Topics:

• Oozie
• Oozie Components
• Oozie Workflow
• Demo of Oozie Workflow

• Scheduling Jobs with Oozie Scheduler


• Oozie Coordinator
• Oozie Commands
• Oozie Web Console
• Oozie for MapReduce
• Hive in Oozie
• Combining flow of MapReduce Jobs
• Hadoop Project Demo
• Hadoop Talend Integration

Module 11: Certification Project

Topics:

• Find the number of books published each year (hint: a sample dataset is provided)
• Find the year in which the maximum number of books was published
• Find out how many books were published based on ranking in the year 2002


PySpark Certification Training Course


Course Curriculum

Course Outline

Module 1: Introduction to Big Data Hadoop and Spark

Topics:

• What is Big Data?


• Big Data Customer Scenarios
• Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
• How Hadoop Solves the Big Data Problem?
• What is Hadoop?
• Hadoop’s Key Characteristics
• Hadoop Ecosystem and HDFS
• Hadoop Core Components
• Rack Awareness and Block Replication
• YARN and its Advantage
• Hadoop Cluster and its Architecture
• Hadoop: Different Cluster Modes
• Big Data Analytics with Batch & Real-Time Processing
• Why is Spark Needed?
• What is Spark?
• How Spark Differs from its Competitors?
• Spark at eBay
• Spark’s Place in Hadoop Ecosystem


Module 2: Introduction to Python for Apache Spark

Topics:

• Overview of Python
• Different Applications where Python is Used
• Values, Types, Variables
• Operands and Expressions
• Conditional Statements
• Loops
• Command Line Arguments
• Writing to the Screen
• Python files I/O Functions
• Numbers
• Strings and related operations
• Tuples and related operations
• Lists and related operations
• Dictionaries and related operations
• Sets and related operations


Module 3: Functions, OOPS, and Modules in Python

Topics:

• Spark Components & its Architecture

• Spark Deployment Modes

• Introduction to PySpark Shell

• Submitting PySpark Job

• Spark Web UI

• Writing your first PySpark Job Using Jupyter Notebook

• Data Ingestion using Sqoop

Module 4: Playing with Spark RDDs

Topics:

• Challenges in Existing Computing Methods

• Probable Solution & How RDD Solves the Problem

• What is RDD, Its Operations, Transformations & Actions

• Data Loading and Saving Through RDDs

• Key-Value Pair RDDs

• Other Pair RDDs, Two Pair RDDs

• RDD Lineage

• RDD Persistence

• WordCount Program Using RDD Concepts

• RDD Partitioning & How it Helps Achieve Parallelization

• Passing Functions to Spark


Module 5: DataFrames and Spark SQL

Topics:

• Need for Spark SQL

• What is Spark SQL

• Spark SQL Architecture

• SQL Context in Spark SQL

• Schema RDDs

• User Defined Functions

• Data Frames & Datasets

• Interoperating with RDDs

• JSON and Parquet File Formats

• Loading Data through Different Sources

• Spark-Hive Integration

Module 6: Machine Learning using Spark MLlib

Topics:

• Why Machine Learning

• What is Machine Learning

• Where Machine Learning is used

• Face Detection: USE CASE

• Different Types of Machine Learning Techniques

• Introduction to MLlib

• Features of MLlib and MLlib Tools

• Various ML algorithms supported by MLlib


Module 7: Deep Dive into Spark MLlib

Topics:
• Supervised Learning: Linear Regression, Logistic Regression, Decision Tree, Random
Forest
• Unsupervised Learning: K-Means Clustering & How It Works with MLlib
• Analysis of US Election Data using MLlib (K-Means)

Module 8: Understanding Apache Kafka and Apache Flume

Topics:
• Need for Kafka
• What is Kafka
• Core Concepts of Kafka
• Kafka Architecture
• Where is Kafka Used
• Understanding the Components of Kafka Cluster
• Configuring Kafka Cluster
• Kafka Producer and Consumer Java API
• Need for Apache Flume
• What is Apache Flume
• Basic Flume Architecture
• Flume Sources
• Flume Sinks
• Flume Channels
• Flume Configuration
• Integrating Apache Flume and Apache Kafka


Module 9: Apache Spark Streaming - Processing Multiple Batches

Topics:
• Drawbacks in Existing Computing Methods
• Why Streaming is Necessary
• What is Spark Streaming
• Spark Streaming Features
• Spark Streaming Workflow
• How Uber Uses Streaming Data
• Streaming Context & DStreams
• Transformations on DStreams
• Windowed Operators and Why They Are Useful
• Important Windowed Operators
• Slice, Window and ReduceByWindow Operators
• Stateful Operators

Module 10: Apache Spark Streaming - Data Sources

Topics:
• Apache Spark Streaming: Data Sources
• Streaming Data Source Overview
• Apache Flume and Apache Kafka Data Sources
• Example: Using a Kafka Direct Data Source

Module 11: Spark GraphX (Self-Paced)

Topics:
• Introduction to Spark GraphX
• Information about a Graph

• GraphX Basic APIs and Operations


• Spark GraphX Algorithm - PageRank, Personalized PageRank, Triangle Count, Shortest
Paths, Connected Components, Strongly Connected Components, Label Propagation


Microsoft SQL Server Certification Course
Course Curriculum

Course Outline

Module 1: Introduction to RDBMS and SQL Server

Topics:

• Database Systems
• RDBMS
• Properties of Databases
• Introduction to SQL
• E-R Model
• Client Server Model
• MS SQL Server
• Microsoft SQL Server Management Studio (SSMS)

Module 2: Database Normalization, DDL, and DML Commands

Topics:

• Database Systems
• RDBMS
• Properties of Databases

• Introduction to SQL
• E-R Model
• Client Server Model
• MS SQL Server
• Microsoft SQL Server Management Studio (SSMS)

Module 3: Querying Data using Built-in Functions and T-SQL

Topics:

• Database Systems

• RDBMS

• Properties of Databases

• Introduction to SQL

• E-R Model

• Client Server Model

• MS SQL Server

• Microsoft SQL Server Management Studio (SSMS)

Module 4: Working with Advanced SQL

Topics:

• Database Systems
• RDBMS
• Properties of Databases
• Introduction to SQL
• E-R Model
• Client Server Model
• MS SQL Server
• Microsoft SQL Server Management Studio (SSMS)


Module 5: UDFs, Backup & Restore in SQL Server

Topics:

• Ranking Functions
• Date and Time Functions
• UDFs (User Defined Functions)
• Backup and Restore Databases
• Triggers
• Index

Module 6: SQL Server Optimization and Performance

Topics:

• Introduction to Optimization
• Understanding Performance
• Optimizing Queries
• Indexing for Performance
• Performance Tuning

Module 7: MS SQL User Administration

Topics:

• Architecture of Security Model


• Server Authentication Modes
• Managing Users, Roles, and Logins
• Permissions (GRANT, DENY, REVOKE)
• Understanding Server Agents
• Server Agent Jobs and Schedules


Module 8: Advanced SQL Server Administration

Topics:

• Database Mails via Server Agents


• Activity Monitor
• Log Shipping
• Configuring Log Shipping
• Transparent Data Encryption

Module 9: Introduction to Azure

Topics:

• Introduction to Azure
• Creating Azure Account
• Creating and configuring Azure VMs
• Azure SQL Database
• Accessing Azure Services
• Query SQL database in Azure

Module 10: Migrating SQL Workloads to Azure

Topics:

• Introduction to Microsoft Data Migration Assistant


• Setting up Migration Assistant
• Migrate Local SQL Server Database to Azure SQL Database
• Migration Data Checks
• Best Practices


DP 203: Data Engineering on Microsoft Azure
Course Curriculum

Course Outline

Module 1: Introduction to Microsoft Azure and its Services

Topics:

• Azure Subscriptions
• Azure Resources
• Azure Free Tier Account
• Azure Resource Manager
• Azure Resource Manager Template
• Azure Storage
• Types of Azure Storage

Module 2: Introduction to Azure Data Engineering

Topics:

• Understand the evolving world of data


• Data abundance
• Understanding the Data Engineering Problem
• Understand job responsibilities
• Understanding Data Engineering Processing - Extract Transform and Load

• Overview of Azure Data Engineering Services


• Understand data storage in Azure Storage
• Understand data storage in Azure Data Lake Storage
• Understand Azure Cosmos DB
• Understand Azure SQL Database
• Understand Azure Synapse Analytics
• Understand Azure Stream Analytics
• Understand Azure HDInsight
• Understand other Azure data services

Module 3: Storing Data in Azure

Topics:

• How to choose an Azure storage service

• Create an Azure Storage Account

• Connect an app to Azure Storage API

• Connect to your Azure storage account

• Explore Azure Storage security features

• Understand storage account keys

• Understand shared access signatures

• Control network access to your storage account

• Understand Advanced Threat Protection for Azure Storage

• Explore Azure Data Lake Storage security features

• Introduction to Blob storage

• What are blobs?

• Design a storage organization strategy


Module 4: Azure Data Factory Part - I

Topics:

• Integrate data with Azure Data Factory or Azure Synapse Pipeline

• Understand Azure Data Factory

• Describe data integration patterns

• Explain the data factory process

• Understand Azure Data Factory components

• Azure Data Factory security

• Set-up Azure Data Factory

• Create linked services

• Create datasets

• Create data factory activities and pipelines

• Manage integration runtimes

• Petabyte-scale ingestion with Azure Data Factory or Azure Synapse Pipeline

• List the data factory ingestion methods

• Describe data factory connectors

• Understand data ingestion security considerations

Module 5: Azure Data Factory Part - II

Topics:

• Explain Data Factory transformation methods


• Describe Data Factory transformation types
• Debug mapping data flow
• Describe slowly changing dimensions
• Choose between slowly changing dimension types
• Understand data factory control flow

• Work with data factory pipelines


• Debug data factory pipelines
• Add parameters to data factory components
• Execute data factory packages
• Describe SQL Server Integration Services
• Understand the Azure-SSIS integration runtime
• Set up the Azure-SSIS integration runtime
• Run SSIS packages in Azure Data Factory
• Migrate SSIS packages to Azure Data Factory
• Configure a git repository with a development factory
• Create and merge a feature branch
• Deploy a release pipeline
• Visually monitor pipeline runs
• Integrate with Azure Monitor
• Set up alerts
• Rerun pipeline runs

Module 6: Azure Synapse Analytics Part - I

Topics:

• What is Azure Synapse Analytics


• How Azure Synapse Analytics works
• When to use Azure Synapse Analytics
• Create Azure Synapse Analytics workspace
• Describe Azure Synapse Analytics SQL
• Explain Apache Spark in Azure Synapse Analytics
• Orchestrate data integration with Azure Synapse pipelines
• Visualize your analytics with Power BI
• Understand hybrid transactional analytical processing with Azure Synapse Link

• Use Azure Synapse Studio

Module 7: Azure Synapse Analytics Part - II


Topics:

• Functions
• Function Parameters
• Global variables
• Variable scope and Returning Values
• Lambda Functions
• Object Oriented Concepts
• Standard Libraries
• Modules Used in Python (OS, Sys, Date and Time etc.)
• The Import statements
• Module search path
• Package installation ways
• Errors and Exception Handling
• Handling multiple exceptions

Module 8: Work with Data Warehouses using Azure Synapse Analytics - Part I

Topics:

• Describe a modern data warehouse


• Define a modern data warehouse architecture
• Exercise - Identify modern data warehouse architecture components
• Design ingestion patterns for a modern data warehouse
• Understand data storage for a modern data warehouse
• Understand file formats and structure for a modern data warehouse
• Prepare and transform data with Azure Synapse Analytics


Module 9: Work with Data Warehouses using Azure Synapse Analytics - Part II

Topics:

• Understand data load design goals


• Explain load methods into Azure Synapse Analytics
• Manage source data files
• Manage singleton updates
• Set up dedicated data load accounts
• Implement workload management
• Simplify ingestion with the Copy Activity
• Understand performance issues related to tables

Module 10: Optimizing Data Queries in Azure

Topics:

• Understand table distribution design


• Use indexes to improve query performance
• Understand query plans
• Create statistics to improve query performance
• Improve query performance with materialized views
• Use read committed snapshot for data consistency
• How do statistics affect a query plan?
• Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
• Understand the use-cases for SQL and Spark pools integration
• Exercise: Integrate SQL and Spark pools in Azure Synapse Analytics
• Externalize the use of Spark pools within Azure Synapse Workspace
• Transfer data outside the Synapse workspace using the PySpark connector
• Explore the development tools for Azure Synapse Analytics
• Understand Transact-SQL language capabilities for Azure Synapse Analytics
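
The table distribution topic in this module comes down to how evenly rows spread across the 60 distributions of a dedicated SQL pool. The sketch below is a rough Python illustration of why a low-cardinality distribution key causes skew; it uses `zlib.crc32` as a stand-in hash (an assumption, not the hash function Synapse uses), and `distribution_of` and `skew_ratio` are hypothetical helper names.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def distribution_of(key: str) -> int:
    # Illustrative hash only; the hash used internally by Synapse is different.
    return zlib.crc32(key.encode("utf-8")) % DISTRIBUTIONS


def skew_ratio(keys):
    """Largest distribution's row count divided by the ideal even share."""
    counts = Counter(distribution_of(k) for k in keys)
    ideal = len(keys) / DISTRIBUTIONS
    return max(counts.values()) / ideal


# High-cardinality key: rows spread fairly evenly, ratio stays close to 1.
order_ids = [f"order-{i}" for i in range(60_000)]
# Low-cardinality key: three distinct values, so at most three distributions hold data.
regions = [("EU", "US", "APAC")[i % 3] for i in range(60_000)]

print(round(skew_ratio(order_ids), 2))
print(round(skew_ratio(regions), 2))
```

A ratio near 1 means the work is shared evenly; a large ratio means a few distributions do most of the work, which is the kind of skew the next module covers.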


Module 11: Managing Workloads in Azure Synapse Analytics

Topics:

• Scale compute resources in Azure Synapse Analytics


• Pause compute in Azure Synapse Analytics
• Manage workloads in Azure Synapse Analytics
• Use Azure Advisor to review recommendations
• Use dynamic management views to identify and troubleshoot query performance
• Understand skewed data and space usage
• Understand network security options for Azure Synapse Analytics
• Configure Conditional Access
• Configure authentication
• Manage authorization through column and row level security
• Manage sensitive data with Dynamic Data Masking
• Implement encryption in Azure Synapse Analytics

Module 12: Deep Dive into Azure Databricks


Topics:

• Get started with Azure Databricks


• Identify Azure Databricks workloads
• Understand key concepts
• Use Apache Spark in Azure Databricks
• Create a Spark cluster
• Use Spark in notebooks
• Use Spark to work with data files
• Visualize data
• Get Started with Delta Lake
• Create Delta Lake tables
• Create and query catalog tables


• Use Delta Lake for streaming data


• Get started with SQL Warehouses
• Create databases and tables
• Create queries and dashboards
• Understand Azure Databricks notebooks and pipelines
• Create a linked service for Azure Databricks
• Use a Notebook activity in a pipeline
• Use parameters in a notebook


Microsoft Power BI Certification Training Course
Course Curriculum

*Depending on industry requirements, Edureka may make changes to the course curriculum

Course Outline

Module 1: Introduction to Power BI


Topics

• Introduction to Business Intelligence


• Self-Service Business Intelligence (SSBI)
• Introduction to Power BI
• Traditional BI vs. Power BI
• Power BI vs. Tableau vs. QlikView
• Uses of Power BI
• The Flow of Work in Power BI
• Working with Power BI
• Basic Components of Power BI
• Comparison of Power BI Versions
• Introduction to Building Blocks of Power BI
• Data model and importance of Data Modeling

Module 2: Power BI Desktop and Data Transformation


Topics

• Data Sources in Power BI Desktop


• Loading Data in Power BI Desktop
• Views in Power BI Desktop
• Query Editor in Power BI
• Transform, Clean, Shape, and Model Data
• Manage Data Relationships
• Editing a Relationship


• Cross Filter Direction


• Saving Workfile
• Measures

Module 3: Data Analysis Expressions (DAX)


Topics:

• Introduction to DAX
• Importance of DAX
• Data Types in DAX
• DAX Operators
• DAX Calculation Types
• Steps to Create Calculated Columns
• Steps to Create Calculated Tables
• Measures in DAX
• DAX Syntax
• DAX Functions
• DAX Tables and Filtering

Module 4: Data Visualization


Topics:

• Introduction to Visuals in Power BI


• Visualization Charts in Power BI
• Matrixes and Tables
• Slicers and Map Visualizations
• Gauges and Single Number Cards
• Modifying Colors in Charts and Visuals
• Shapes, Text Boxes, and Images
• Custom Visuals
• Page Layout and Formatting
• Bookmarks and Selection Pane
• KPI Visuals
• Z-order
• Grouping and Binning


Module 5: Power BI Service


Topics:

• Introduction to Power BI Service


• Creating a Dashboard
• Quick Insights in Power BI
• Configuring a Dashboard
• Power BI Q&A
• Ask Questions about your Data
• Power BI Embedded
• Bookmarks and buttons

Module 6: Connectivity Modes


Topics:

• Data Sources Supported in Power BI


• Exploring Live Connections to Data Sources
• Connecting Directly to SQL Azure
• Connecting Directly to SQL Server Analysis Services/MySQL
• Import Power View and Power Pivot
• Data Gateways
• Direct Query vs. Import Connectivity modes
• Connecting Power BI in Excel

Module 7: Power BI Report Server


Topics:

• What is Power BI Report Server?


• Key Features of Report Server
• The architecture of the Report Server
• Limitations of Report Server
• Power BI Report Server vs. Power BI Service
• Acquiring and Installing Power BI Report Server
• What is a Web Portal?
• Paginated Reports
• Row Level Security


Module 8: R & Python in Power BI


Topics:

• Brief concepts about R


• R Programming Concepts
• Create R Scripts for BI
• Python Programming
• Python Scripts in BI
• Python integration with Power BI

Module 9: Advanced Analytics in Power BI


Topics:

• Use Parameters
• Create a data flow
• Introduction to Anomaly Detection
• Introduction to Smart Narrative
• Introduction to Sensitivity labels in Power BI
• Deployment Pipeline
• Hands-on:
• Connecting with Power BI service
• Creating Data flow
• Creating scorecard

Module 10: In-Class Project


Industry - 1: Retail Sector
Problem Statement:
Global Super Store is a large online store with worldwide operations. It takes orders and
delivers products across the globe, and it deals in all the major product categories, such as
furniture, office supplies, and technology. As a sales manager for this store, you want to analyze
product sales using the provided historical data; this analysis will help you plan your inventory
and business processes accordingly. You also want to understand product and customer
behavior.

Topics:


• Power BI Service
• Row Level Security
• Visuals and Charts
• Power BI Desktop
• Handling Workspaces

Industry - 2: Sales and Finance


Problem Statement:
PEW Retail Inc. Ltd has subsidiaries across the globe that sell products to customers spread
across different geographies. The company is looking to have a consolidated dashboard.
Topics:

• Power BI Gateway
• Power BI Service
• Data Visualization
• Dashboard Management
