Big Data Masters Program

The document outlines a 20-week curriculum for a Big Data Masters Program. The first 7 weeks cover HDFS, MapReduce, Sqoop, Hive, and HBase: Week 1 introduces HDFS concepts and Linux commands. Week 2 covers MapReduce and distributed computing frameworks. Week 3 focuses on Apache Sqoop for data ingestion. Weeks 4-6 cover Apache Hive for processing structured data in Hadoop. Week 7 introduces NoSQL databases and covers Apache HBase in particular. The remaining weeks cover Scala, Apache Spark, Apache Kafka, big data on AWS, and Apache Airflow.

Uploaded by

Arun Singh

CURRICULUM

Big Data Masters Program

01 Week »» What is SPOF
»» FSimage & Edit Logs
Introduction to Big Data & HDFS Concepts »» Secondary Namenode
along with Linux Commands »» Name Node Recovery
»» Check Pointing
»» Introduction to Big Data »» Understanding Replication Factor
»» What is Big Data And Why Big Data »» What is Rack And Rack Failure
»» Big Data System Requirements »» Rack Awareness Mechanism
»» Monolithic vs Distributed System »» Block Report
»» Distributed System Architecture »» Namenode High Availability
»» What is Hadoop And Evolution of Hadoop »» Quorum Journal Manager & Quorum Journal Node
»» Google File System (GFS) »» Understanding Linux File System
»» Distributed Processing (MapReduce) »» List & Parameters of List Command
»» Hadoop 1.0 vs Hadoop 2.0 »» Touch, Mkdir, Rmdir & Other Linux Commands
»» What is Yarn »» HDFS Commands:
»» Core Components of Hadoop »» List Files & Directories
»» Hadoop Ecosystems Tools »» How HDFS Commands Work
»» Brief Introduction to Spark »» ‘ls’ Command With Various Parameters
»» Hadoop Cluster Vs Spark Cluster »» Create, Remove File/Directory
»» HDFS Architecture: »» Copy & Get Files/Folders From Local to HDFS &
»» What is Node And What is Cluster Vice Versa
»» Data Block & Block Size »» Move Files/Folders From HDFS to HDFS
»» Slave Node, Master Node, Data Node & Name Node »» Change Replication Factor Dynamically
»» Metadata And Replication Factor »» View File Metadata Information
»» Heart Beat & Fault Tolerance »» Week1: Quiz
»» Handling Namenode Failure »» Week1: Assignment
02 Week »» Realtime Use Case: Google Web Search
»» How Google Search Works
MapReduce - Distributed Computing »» MapReduce Programming
Framework »» MR Code Explanation
»» Introduction to MapReduce »» How to Write Map Reduce Code
»» What is MapReduce »» Mapper Code
»» Stages in MapReduce »» Reducer Code
»» What is Key-Value »» Main Code
»» What is Map & What is Reduce »» Finding the Frequency of Each Word in a File
»» Example to Understand Map & Reduce »» MapReduce Jars
»» Word Count Example in MapReduce »» MapReduce Practical Sessions
»» Record Reader »» Word Count Program - Practical Session1
»» Mapper Phase »» Jar Creation & Execution - Practical Session2:
»» Reducer Phase »» How to Create a Jar
»» MapReduce Shuffle & Sort »» How to Execute the Jar
»» Inside Map & Reduce Phase »» How to Track a Job
»» Wordcount Example in MapReduce »» How to Track All Previous Jobs
»» Typical MapReduce Flow »» MR Program Variations - Practical Session3:
»» Blocks in MapReduce »» How to Change Number of Reducers
»» Default Number of Mappers & Reducers »» Writing Custom Partitioner Logic
»» Understanding Number of Mappers/Reducers »» Changing Number of Reducers to Zero
»» MapReduce Framework Behind the Scenes »» Introducing Combiner
»» Role of Hash Function in MapReduce »» Writing Custom Combiner logic
»» Partitioning in MapReduce »» Week2: Quiz
»» How to Choose Number of Reducers »» Week2: Assignment
»» How Hash Function Works »» Week1 Assignment Solution
»» Understanding Shuffle & Sort
»» Example: Calculating Max Temperature in a Day
»» Combiner Function in MapReduce
»» Advantages of Combiners
03 Week
»» When to Use or Not to Use Combiner
Apache Sqoop - Data Ingestion to Hadoop
»» Example1: Filtering Data using MapReduce »» Sqoop Fundamentals
»» Example2: Finding Distinct Values »» Sqoop Basics
»» Example3: Finding Top 3 Most Influential users »» What is sqoop
»» Sqoop Workflow »» What is Hive
»» Key Features of Sqoop »» Hive Query Language (HQL)
»» Sqoop Import »» Understanding Hive Table
»» Sqoop Export »» Introduction to Hive Metadata
»» Connecting to MySQL »» Why Hive over traditional databases
»» Accessing MySQL Databases from Hadoop »» Transactional and Analytical Processing
»» Accessing MySQL Tables from Hadoop »» What is Data Warehouse
»» Sqoop Eval »» Hive Architecture
»» Sqoop Import Practicals »» Hive on top of Hadoop
»» Sqoop Export Practicals »» How Hive Works
»» Sqoop Job »» Transactional vs Analytical Processing
»» Sqoop Password Management »» Data Warehouse Concept
»» Sqoop Incremental Load »» The Hive Metastore
»» Sqoop Default Import »» Hive vs RDBMS
»» Sqoop Free-Form Query Import »» HQL vs SQL
»» Sqoop Direct import »» Hive Subqueries Views & Index
»» Importing Data Into Hive »» Transactional and Analytical Processing
»» Importing Data Into HBase »» What is Data Warehouse
»» Sqoop Validate »» Hive Architecture
»» When a Sqoop Export May Fail »» Hive on Hadoop
»» Week3: Quiz »» Hive Metastore
»» Week3: Assignment »» Hive vs. RDBMS
»» Week2 Assignment Solution »» Hive Complex Data Types
»» Hive Array, Map & Struct

04 Week »» Hive Built-in Functions

»» Hive UDF, UDAF & UDTF
Apache Hive Basics - Process Structured »» Hive Lateral Views
Data in Hadoop »» Hive Subqueries
»» Hive Overview: »» Hive Views
»» Transactional System and Analytical System »» Hive Normalization vs Denormalization
»» Examples of Transactional Systems »» Week4: Quiz
»» Examples of Analytical Systems »» Week3 Assignment Solution
05 Week 07 Week
Apache Hive Advance - Part 1 NoSQL Databases - HBase
»» Hive Structure Level Optimizations: »» Hbase Basics
»» Hive Partitioning
»» Key requirements of database
»» Hive Partitioning With 2 Columns
»» Limitations of Hadoop
»» Hive Bucketing
»» Google Bigtable concept for quick searching
»» Hive Partitioning With Bucketing
»» Hive Query Level Optimizations: »» Implementation of Bigtable as Hbase
»» Hive Join Optimizations »» Properties of Hbase
»» Hive Bucket Map Join Optimizations »» What Hbase can offer
»» Hive Window Functions »» Row based storage vs Columnar storage
»» Hive Ranking »» Advantages of columnar storage
»» Hive Sorting »» Normalization vs Denormalization
»» Week5: Quiz »» CRUD Operation
»» Week5: Assignment
»» RDBMS vs Hbase

06 Week
»» Hbase data model
»» 4-Dimensional data model

Apache Hive Advance - Part 2 »» CAP Theorem

»» Hbase Architecture
»» Hive File Format
»» Row vs Column File Formats »» Hbase Region Server
»» Specialized File Formats »» Region, Memstore, WAL & Block Cache
»» Internals of ORC File Formats »» Hfile
»» Internals of Parquet File Formats »» Zookeeper
»» ORC vs Parquet File Formats »» Hmaster & Meta Table
»» Hive Compression Techniques »» Hbase Architecture components in details
»» Hive Vectorization »» Hbase Read/Write operations
»» Changing the Hive Engine
»» Compaction
»» Hive Thrift Server
»» Hbase Data Update
»» Hive MSCK Repair
»» Hbase Data Deletion
»» Hive SCD
»» Week6: Quiz »» Handling Server Failures
»» Week6: Assignment »» Hbase Practicals
»» Week5: Assignment Solution »» Handling Hbase Failure Services
»» Create & List Table »» Var vs val
»» Insert Records in Table »» Type inference
»» Scan(view) & Get records from table »» Data types in Scala
»» Delete a column »» String Interpolation
»» Describe a table »» String Comparison
»» Check table exists or not »» Flow control: If else
»» Drop table - Understanding how it works »» Match Case
»» Parameters of get command »» For Loop
»» Parameters of scan command »» While loop
»» Hbase files structure in HDFS »» Scala Functional Programming
»» How to disable/enable a table »» How to define a function
»» Various filters in Hbase »» Higher order function
»» Count Records »» Anonymous function
»» Cassandra Overview »» Scala Collections
»» What is Cassandra »» Array
»» What a Cassandra Cluster Looks Like »» List
»» Tunable read/write Consistency »» Tuple
»» Hbase vs Cassandra »» Range
»» Integration with Hadoop (Mini Project) »» Set
»» Hbase-Hive Integration »» Map
»» Week7: Quiz »» Scala Functional Programming:
»» Week7: Assignment »» Why Scala
»» Week6 Assignment Solution »» Modes of writing Scala code
»» What is functional programming

08 week »» What is a function

»» What is a pure function?
Learning Scala - A Guide to Functional »» First class function
Programming »» Higher order function
»» Why Scala »» Anonymous function
»» Where to Run Scala Code »» Immutability
»» Scala Code Using IDE »» Loop
»» Scala Basics »» Recursion
»» Tail recursion »» What is a diamond problem in Scala
»» Statement vs Expression »» What is a trait
»» Closure »» Why Scala is the topmost choice for a big data
»» Scala type system developer over Python and Java

»» Scala operators »» What is Apache Spark

»» Anonymous function »» Understanding Spark cluster

»» Placeholder syntax »» Is Spark a replacement to Hadoop

»» Partially applied functions »» Why Spark is faster than MapReduce

»» Function currying »» How data is stored in Spark

»» Week8: Quiz »» What is RDD

»» Week8: Assignment »» What is DAG

»» Week7 Assignment Solution »» RDD Lineage

»» Resiliency

09 Week
»» Immutability
»» Transformation & Action
Apache Spark - General Purpose Cluster »» Lazy Evaluation
Computing Framework »» Word count program in Spark
»» Word count program in PySpark
»» Scala Interview Preparation Series
»» Word count problem real-time example
»» What is App class in Scala
»» Week9: Quiz
»» Default args, named args & variable args
»» Week9: Assignment
»» Difference between nil, null, none & nothing
»» Week8 Assignment Solution
»» What is option in Scala
»» What is unit in Scala
»» Dealing with nulls in Scala
»» What is yield
10 Week
»» What is vector Apache Spark - In Depth
»» Scala if guards & pattern guards »» Spark Real-Time Example
»» What is “for comprehensions” »» Broadcast Variable
»» Difference between “==” in java and Scala »» Accumulators
»» Difference between strict val vs lazy val »» How Spark Executes Program on the Cluster
»» What are default packages in Scala »» Spark Driver and Executors
»» What is Scala apply method »» Client Mode, Cluster Mode and Local Mode
»» Analyzing Log Messages - Hands on »» Spark Ecosystem
»» Narrow vs Wide Transformations »» Map vs Map Partitions
»» Stages in Spark »» Introduction to Spark Structured API
»» Difference Between reduceByKey & reduce »» Spark DataFrame
»» Difference Between groupByKey & reduceByKey »» Understanding SparkSession
»» Pair RDD »» SparkSession vs SparkContext
»» Pair RDD vs Map »» Dataframe with Various Transformations
»» Understanding Default Parallelism »» RDD vs DataFrame vs Datasets
»» Difference Between repartition & coalesce »» Challenges with DataFrame
»» When to Increase/Decrease Partitions »» Spark Dataset API
»» Spark on YARN Architecture »» Difference Between DataFrame and Dataset
»» Benefits of Dataset
YARN - Yet Another Resource Negotiator »» Creating Dataframe/Datasets from Various
File Formats
»» Limitations or Drawbacks of MR1
»» Read Modes & Schema
»» Resource Manager
»» Ways to Define the Schema
»» Node Manager
»» Defining an Explicit Schema
»» Application Master
»» Week11: Quiz
»» Containers
»» Week11: Assignment
»» Week10: Quiz
»» Week10 Assignment Solution
»» Week10: Assignment
»» Week9 Assignment Solution

11 Week 12 Week
Apache Spark - Structured API Part-1 Apache Spark - Structured API Part-2
»» Cache vs Persist »» Writing Output to Sink (spark.write)

»» Spark Storage Levels »» Spark File Layout

»» Difference Between DAG & Lineage »» Benefits of Repartitions

»» How to Submit a Spark Job »» partitionBy & bucketBy

»» Real-time example - Finding top movies based »» Saving files in various file formats
on ratings »» Introduction to SparkSql
»» Storing Data in Persistent Manner
»» Handling Spark Metadata 13 Week
»» Low & High level Transformations Apache Spark - Optimization Part-1
»» Referring to a Column in Dataframe/Dataset »» Level of Optimizations
»» Column String »» Resource level optimizations
»» Column Object »» Application level optimizations
»» Column Expression »» Cluster level optimizations
»» Spark UDF using Structured API »» How to calculate no of Executors
»» Adding Column in Dataframe »» Thin Executor
»» Dataframe to Dataset Using Case Class. »» Fat Executor
»» Dataset to DataFrame Conversion »» How to calculate no of Executors
»» Spark Catalog »» How to Calculate Memory Allocation
»» Registering UDF with Driver »» How to Calculate No of Cores
»» Transformations Hands on Examples »» Heap Memory
»» Aggregate Transformations »» Off-Heap Memory
»» Simple Aggregations »» Hands on With Real-time cluster
»» Grouping Aggregations »» Understanding Cluster Configurations
»» Window Aggregations »» Realtime Example:
»» Joins on DataFrame Moving Data to HDFS using an Edge node and
»» Simple Join (Shuffle Sort Merge Join) work around it in a realtime cluster

»» Broadcast Join »» Static Resource allocation

»» Dealing With Ambiguous Column Names »» Dynamic Resource allocation

»» Dealing With Nulls »» Understanding Memory Usage in Spark

»» Internals of Join Operations »» Execution Memory

»» When to Use Simple Join & When to Use »» Storage Memory

Broadcast Join »» Practical Demonstration:
»» Grouping Aggregation Real-time Example Cache & Persist

»» Inferring Data in SparkSQL »» Java Serializer vs Kryo Serializer

»» Week12: Quiz »» Week12: Quiz

»» Week12: Assignment »» Week12: Assignment

»» Week11 Assignment Solution »» Week11 Assignment Solution

14 Week »» Understanding Producer & Consumer
»» Practical on Real-time Processing
Apache Spark - Optimization Part-2 »» Stream Transformations

»» Broadcast Join Practical Demonstrations »» Stateless Transformations

»» Broadcast Join Using RDD »» Stateful Transformations

»» When to Use Broadcast Join »» Window Operations

»» Broadcast Join Using Dataframe »» Batch Interval

»» Visualizing Broadcast Join with Structured API »» Window Size

»» Practical Demo on Repartition vs Coalesce »» Sliding Interval

»» Client Mode vs Cluster Mode When using Spark »» Practical on Stateless Transformation
Submit »» Practical on Stateful Transformation
»» Spark Join Optimizations »» reduceByKey vs updateStateByKey
»» Spark Advance Optimizations: Sort Aggregate »» Working With Sliding Window
vs Hash Aggregate »» reduceByKeyAndWindow Transformation
»» Spark Catalyst Optimizer »» reduceByWindow Transformation
»» Week14: Quiz »» countByWindow Transformation
»» Week14: Assignment »» Week15: Quiz
»» Week13 Assignment Solution »» Week15: Assignment
»» Week14 Assignment Solution

15 Week 16 Week
Apache Spark - Streaming Part-1
Apache Spark - Streaming Part-2
»» Kind of Processing
»» What Is Structured Streaming
»» What is Real-time Processing
»» Requirement Of Structured Streaming
»» The Importance of Real-time Processing
»» Limitations Of Spark Streaming
»» Batch processing vs Real-time Stream Processing
»» Benefits Of Spark Structured Streaming
»» Spark Streaming Data
»» Practical - Wordcount Example On Structured
»» Spark discretized stream or DStream Streaming
»» Batch & Batch Interval »» Dynamically Setting The Shuffle Partitions
»» Is Spark a real-time streaming engine »» Data Stream Writer Output Modes
»» Stream Processing in Spark »» Datastream Output Modes - append, update &
»» Transformed DStream complete
»» Spark Streaming Graceful Shutdown
»» How Does Spark Streaming Code Execute Internally 17 Week
»» How a Job Is Converted to Micro batches Apache Kafka - Distributed Event
»» Trigger Point For Micro Batches Streaming Platform
»» Types of Triggers - unspecified, time interval, »» Introduction To Kafka
one time, continuous
»» Kafka Architecture
»» Types of Data Sources - Socket Source, Rate
Source, File Source, Kafka Source »» Kafka Key Concepts/Fundamentals

»» Limitations of socket source »» Overview Of Zookeeper And Its Role In Kafka

Cluster
»» Practical on File Data Source
»» Cluster, Nodes, Brokers, Topics
»» Types of Spark Streaming Output Data Options
»» Consumer, Producers, Logs, Partitions
»» Fault Tolerance and Exactly Once Guarantee
»» Concept Of Consumer Groups
»» Understanding Checkpoint Location
»» Leader & Follower Partition
»» Stateful vs Stateless Transformations
»» Installing One Node Kafka Cluster On Local
»» Managed Stateful Operations vs UnManaged
Stateful Operations »» Installing Multinode Kafka Cluster On Local

»» Types of Aggregations - Continuous »» Command Line Producer And Consumer

Aggregations vs Time Bound Aggregations »» Replication Concept For Fault Tolerance
»» Window Transformations »» How Data Is Stored In Brokers
»» updateStateByKey, reduceByKeyAndWindow, »» Log Segments, Message Offsets, Message
reduceByWindow, countByWindow Index
»» Types of windows - Tumbling Time Window, »» ISR List / Minimum ISR
Sliding Time Window »» Committed Vs Uncommitted Messages
»» Dealing With Late Coming Records Using »» Writing A Kafka Producer In Java
Watermark
»» Writing A Kafka Consumer In Java
»» State Store Cleanup
»» Scaling Up The Kafka Cluster
»» Calculating the Watermark Boundary
»» Achieving Exactly Once Semantics
»» Streaming Joins
»» Integrating Kafka With Spark Structured
»» Streaming Dataframe to static dataframe Streaming.
»» Streaming Dataframe With Another Streaming »» Week16: Quiz
Dataframes
»» Week16: Assignment
»» Week16: Quiz
»» Week15 Assignment Solution
»» Week16: Assignment
»» Week15 Assignment Solution
18 Week AWS Athena:
»» What is Athena
Big Data on Cloud Part-1 »» When do we require Athena
AWS EMR (Elastic MapReduce): »» What problem Athena Solve

»» What is a VM (Virtual Machine) »» How Athena Works

»» On-Premise vs Cloud Setup »» Athena Pricing

»» Major Vendors of Hadoop Distribution »» Athena Practical Demonstration:

»» Why Cloud & Big Data on Cloud »» How to create a normal table manually on csv
data residing in s3
»» Major Cloud Providers of Bigdata
»» How to minimize data scanning in Athena
»» What is EMR
»» How to create partition table on Parquet file
»» HDFS vs S3
»» Inferring Schema automatically using AWS Glue
»» What Is S3
»» Glue Catalog
»» Important Instances in AWS
»» Week18: Quiz
»» Kinds of Nodes in Cluster
»» Week18: Assignment
»» Transient vs Long Running Cluster
»» Week17 Assignment Solution
»» Running Spark Code on EMR
»» How to Track Your Job
»» Copy File From S3 to Local
»» Zeppelin Notebook
19 Week
»» Types of EC2 Instances
Big Data on Cloud Part-2
»» How to Create a VM AWS Glue
»» What is a Keypair »» What is AWS Glue?
»» Elastic IP »» Introduction To Glue
»» AWS Storage, Networking & CLI »» Features of Glue
»» Instance Store »» AWS Glue Benefits
»» S3 & EBS »» AWS Glue Terminology
»» Public Ip Vs Private Ip »» Pointing to Specific Data Stores and Endpoints
»» Network Switches »» Glue Data Catalogue
»» Security Group »» Crawlers
»» AWS Command Line Interface »» Connecting to Your Data Store
»» Launch an EMR Cluster Using Advanced Options »» Using Crawlers for Catalogue Tables
»» Overview and Working of Glue Jobs »» Viewing The DAG In Ui-Graph View, Tree View,
»» Adding New Jobs in Glue Logs Viewing

»» Triggering Jobs and Their Scheduling »» Example Showcasing Bash Operators Usage
»» Setting Precedence Among Various Tasks
AWS Redshift
»» Lifecycle Of A Task-Understanding Various Stages
»» Database vs Data Warehouse vs Data Lake
»» About Trigger_rules & Understanding With Example
»» Introduction to Amazon Redshift
»» Airflow Artifact - More On Operators
»» Benefits of Amazon Redshift
»» Writing Our Own Custom Operators
»» Use Cases of Amazon Redshift
»» Walkthrough Of Airflow UI
»» Redshift Master Slave Architecture
»» Connections To Various Datastores & Variables
»» Types of Nodes
»» Working With Connections, Understanding
»» Redshift Spectrum Sensors – Demo
»» Redshift Fault Tolerance »» Building an end-to-end customer-360 pipeline
»» Redshift Sort Keys using Airflow involving data collection from
various sources, processing in spark, loading
»» Redshift Distribution Styles
the processed data in hive and uploading the
»» Practical Demonstration same to HBase and generating a notification
»» Week19: Quiz about success of the pipeline to the
downstream applications.
»» Week19: Assignment
»» Week18 Assignment Solution

20 Week Plus
One end-to-end pipeline PROJECT
Apache Airflow - Workflow Management involving all Major components like
Platform Sqoop, Hdfs, Hive, Hbase, Spark... etc.
»» Introduction To Airflow And Its Usage Interview Preparation Tips:
»» What Is Workflow
»» Cron-Job Creation Example
Sample Resume
»» Airflow Additional Features 15+ Mock Interview Recordings
»» Airflow Architecture And Components
Mock Interview QA
»» Airflow Installation Demo
»» Dags-Creating A Simple Helloworld Dag
Interview Questions
»» Introduction To Tasks And Operators How to Handle Managerial Round Qs
5 Star Google Rated
Big Data Course
LEARN FROM THE EXPERT

9108179578
