Big Data Masters Program
06 Week
»» HBase data model
»» 4-Dimensional data model
09 Week
Apache Spark - General Purpose Cluster Computing Framework
»» Immutability
»» Transformation & Action
»» Lazy Evaluation
»» Word count program in Spark
»» Word count program in PySpark
»» Word count problem real-time example
»» Week9: Quiz
»» Week9: Assignment
»» Week8 Assignment Solution
»» Scala Interview Preparation Series
»» What is App class in Scala
»» Default args, named args & variable args
»» Difference between nil, null, none & nothing
»» What is option in Scala
»» What is unit in Scala
»» Dealing with nulls in Scala
»» What is yield
»» What is vector
»» Scala if guards & pattern guards
»» What is “for comprehensions”
»» Difference between “==” in Java and Scala
»» Difference between strict val vs lazy val
»» What are default packages in Scala
»» What is Scala apply method
10 Week
Apache Spark - In Depth
»» Spark Real-Time Example
»» Broadcast Variable
»» Accumulators
»» How Spark Executes Program on the Cluster
»» Spark Driver and Executors
»» Client Mode, Cluster Mode and Local Mode
»» Spark Ecosystem
»» Map vs Map Partitions
»» Analyzing Log Messages - Hands on
»» Narrow vs Wide Transformations
»» Stages in Spark
»» Difference Between reduceByKey & reduce
»» Difference Between groupByKey & reduceByKey
»» Pair RDD
»» Pair RDD vs Map
»» Understanding Default Parallelism
»» Difference Between repartition & coalesce
»» When to Increase/Decrease Partitions
»» Spark on YARN Architecture
YARN - Yet Another Resource Negotiator
»» Limitations or Drawbacks of MR1
»» Resource Manager
»» Node Manager
»» Application Master
»» Containers
»» Week10: Quiz
»» Week10: Assignment
»» Week9 Assignment Solution
»» Introduction to Spark Structured API
»» Spark DataFrame
»» Understanding SparkSession
»» SparkSession vs SparkContext
»» Dataframe with Various Transformations
»» RDD vs DataFrame vs Datasets
»» Challenges with DataFrame
»» Spark Dataset API
»» Difference Between DataFrame and Dataset
»» Benefits of Dataset
»» Creating Dataframe/Datasets from Various File Formats
»» Read Modes & Schema
»» Ways to Define the Schema
»» Defining an Explicit Schema
»» Week11: Quiz
»» Week11: Assignment
»» Week10 Assignment Solution
11 Week
Apache Spark - Structured API Part-1
»» Cache vs Persist
»» Real-time example - Finding top movies based on ratings
»» Storing Data in Persistent Manner
»» Handling Spark Metadata
»» Low & High level Transformations
»» Referring to a Column in Dataframe/Dataset
»» Column String
»» Column Object
»» Column Expression
»» Spark UDF using Structured API
»» Adding Column in Dataframe
»» Dataframe to Dataset Using Case Class
»» Dataset to DataFrame Conversion
»» Spark Catalog
»» Registering UDF with Driver
»» Transformations Hands on Examples
»» Aggregate Transformations
»» Simple Aggregations
»» Grouping Aggregations
»» Window Aggregations
»» Joins on DataFrame
»» Simple Join (Shuffle Sort Merge Join)
12 Week
Apache Spark - Structured API Part-2
»» Writing Output to Sink (spark.write)
»» Saving file in Various file format
»» Introduction to Spark SQL
13 Week
Apache Spark - Optimization Part-1
»» Level of Optimizations
»» Resource level optimizations
»» Application level optimizations
»» Cluster level optimizations
»» How to calculate no of Executors
»» Thin Executor
»» Fat Executor
»» How to calculate no of Executors
»» How to Calculate Memory Allocation
»» How to Calculate No of Cores
»» Heap Memory
»» Off-Heap Memory
»» Hands on With Real-time cluster
»» Understanding Cluster Configurations
»» Realtime Example: Moving data to HDFS using an Edge node and work around it in a realtime cluster
»» Client Mode vs Cluster Mode When using Spark Submit
»» Spark Join Optimizations
»» Spark Advanced Optimizations: Sort Aggregate vs Hash Aggregate
»» Spark Catalyst Optimizer
»» Week14: Quiz
»» Week14: Assignment
»» Week13 Assignment Solution
15 Week
Apache Spark - Streaming Part-1
»» Kind of Processing
»» What is Real-time Processing
»» The Importance of Real-time Processing
»» Batch processing vs Real-time Stream Processing
»» Spark Streaming Data
»» Spark discretized stream or DStream
»» Batch & Batch Interval
»» Is Spark a real-time streaming engine
»» Stream Processing in Spark
»» Transformed DStream
»» Practical on Stateless Transformation
»» Practical on Stateful Transformation
»» reduceByKey vs updateStateByKey
»» Working With Sliding Window
»» reduceByKeyAndWindow Transformation
»» reduceByWindow Transformation
»» countByWindow Transformation
»» Week15: Quiz
»» Week15: Assignment
»» Week14 Assignment Solution
16 Week
Apache Spark - Streaming Part-2
»» What Is Structured Streaming
»» Requirement Of Structured Streaming
»» Limitations Of Spark Streaming
»» Benefits Of Spark Structured Streaming
»» Practical - Wordcount Example On Structured Streaming
»» Dynamically Setting The Shuffle Partitions
»» Data Stream Writer Output Modes
»» Datastream Output Modes - append, update & complete
»» Spark Streaming Graceful Shutdown
»» How Does Spark Streaming Code Execute Internally
»» How a Job is Converted to Micro batches
»» Trigger Point For Micro Batches
»» Types of Triggers - unspecified, time interval, one time, continuous
»» Types of Data Sources - Socket Source, Rate Source, File Source, Kafka Source
17 Week
Apache Kafka - Distributed Event Streaming Platform
»» Introduction To Kafka
»» Kafka Architecture
»» Kafka Key Concepts/Fundamentals
»» Why Cloud & Big Data on Cloud
»» Major Cloud Providers of Big Data
»» What is EMR
»» HDFS vs S3
»» What Is S3
»» Important Instances in AWS
»» Kinds of Nodes in Cluster
»» Transient vs Long Running Cluster
»» Running Spark Code on EMR
»» How to Track Your Job
»» Copy File From S3 to Local
»» Zeppelin Notebook
»» Types of EC2 Instances
»» How to Create a VM
»» What is a Keypair
»» Elastic IP
»» AWS Storage, Networking & CLI
»» Instance Store
»» S3 & EBS
»» Public IP vs Private IP
»» Network Switches
»» Security Group
»» AWS Command Line Interface
»» Launch an EMR Cluster Using Advanced Options
»» How to create a normal table manually on csv data residing in S3
»» How to minimize data scanning in Athena
»» How to create partition table on Parquet file
»» Inferring Schema automatically using AWS Glue
»» Glue Catalog
»» Week18: Quiz
»» Week18: Assignment
»» Week17 Assignment Solution
19 Week
Big Data on Cloud Part-2
AWS Glue
»» What is AWS Glue?
»» Introduction To Glue
»» Features of Glue
»» AWS Glue Benefits
»» AWS Glue Terminology
»» Pointing to Specific Data Stores and Endpoints
»» Glue Data Catalogue
»» Crawlers
»» Connecting to Your Data Store
»» Using Crawlers for Catalogue Tables
»» Overview and Working of Glue Jobs
»» Adding New Jobs in Glue
»» Triggering Jobs and Their Scheduling
AWS Redshift
»» Database vs Data Warehouse vs Data Lake
»» Introduction to Amazon Redshift
»» Benefits of Amazon Redshift
»» Use Cases of Amazon Redshift
»» Redshift Master Slave Architecture
»» Types of Nodes
»» Redshift Spectrum
»» Redshift Fault Tolerance
»» Redshift Sort Keys
»» Redshift Distribution Styles
»» Practical Demonstration
»» Week19: Quiz
»» Week19: Assignment
»» Week18 Assignment Solution
20 Week
Apache Airflow - Workflow Management Platform
»» Introduction To Airflow And Its Usage
»» What Is Workflow
»» Cron-Job Creation Example
»» Airflow Additional Features
»» Airflow Architecture And Components
»» Airflow Installation Demo
»» DAGs - Creating A Simple Helloworld DAG
»» Introduction To Tasks And Operators
»» Viewing The DAG In UI - Graph View, Tree View, Logs Viewing
»» Example Showcasing Bash Operators Usage
»» Setting Precedence Among Various Tasks
»» Lifecycle Of A Task - Understanding Various Stages
»» About Trigger_rules & Understanding With Example
»» Airflow Artifact - More On Operators
»» Writing Our Own Custom Operators
»» Walkthrough Of Airflow UI
»» Connections To Various Datastores & Variables
»» Working With Connections, Understanding Sensors - Demo
»» Building an end-to-end customer-360 pipeline using Airflow involving data collection from various sources, processing in Spark, loading the processed data in Hive and uploading the same to HBase and generating a notification about success of the pipeline to the downstream applications.
Plus
One end-to-end pipeline PROJECT involving all Major components like Sqoop, Hdfs, Hive, Hbase, Spark... etc.
Interview Preparation Tips:
Sample Resume
15+ Mock Interview Recordings
Mock Interview QA
Interview Questions
How to Handle Managerial Round Qs
5 Star Google Rated
Big Data Course
LEARN FROM THE EXPERT
9108179578