
Big Data, Azure, Hadoop, Python

This big data training covers topics related to big data testing including understanding big data, ETL vs big data testing, and data warehouse testing. It introduces Hadoop Distributed File System concepts and operations. Modules cover Hive, MapReduce, Sqoop, Flume, HBase, and Spark fundamentals as well as using Spark for big data processing. The training also covers big data on Microsoft Azure including Azure Databricks and a comparison to AWS. Hands-on practice is included for HDFS file system operations and Spark programming in Python and Scala. The focus is on understanding these big data technologies from a software testing perspective.

Uploaded by

Deep Narayan
Copyright: © All Rights Reserved

Big Data, Azure, Hadoop, Python

Duration: 60-70 hrs
Module 1

Introduction to Big Data Testing


Understanding what Big Data is
Difference between ETL testing and Big Data testing
Types of testing in Big Data

Module 2

Introduction to BI, ETL and Data warehouse testing


Architecture: OLTP vs. OLAP
What is a Data Warehouse?
What is an Enterprise Data Warehouse?
What are Data Marts?
Source systems and target systems
Staging area
Drill up and drill down
Facts and dimensions
Slowly Changing Dimensions
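
A basic check in ETL and data warehouse testing is reconciling the rows in a target system against the source system. The sketch below illustrates the idea in plain Python. The function name `reconcile`, the `id` key column, and the sample rows are all hypothetical and not part of the course material; in practice the rows would come from queries against the actual source and target.

```python
from collections import Counter


def reconcile(source_rows, target_rows, key):
    """Compare source and target rows (lists of dicts) on a key column."""
    source_keys = Counter(row[key] for row in source_rows)
    target_keys = Counter(row[key] for row in target_rows)
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        # Keys loaded fewer times in the target than they occur in the source
        "missing_in_target": sorted((source_keys - target_keys).elements()),
        # Keys that occur more often in the target (e.g. duplicates)
        "unexpected_in_target": sorted((target_keys - source_keys).elements()),
    }


# Hypothetical sample data: row 2 was dropped and row 3 was loaded twice
source = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
target = [{"id": 1, "name": "Ann"}, {"id": 3, "name": "Cy"}, {"id": 3, "name": "Cy"}]

report = reconcile(source, target, "id")
print(report)
```

Note that the row counts match in this sample even though the load is wrong, which is why a count check alone is usually not considered sufficient.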

Module 3

Introduction to HDFS, File Operations, and the Unix File System


Hadoop Distributed File System (HDFS) Concepts and its Importance
HDFS Deep Dive – Architecture, Data Replication, NameNode, DataNode, Data Flow
Unix basics
HDFS file operations
HDFS file system hands-on
Note: The HDFS file system hands-on is covered from the testing side, with a unit testing angle.
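
To make the data replication topic concrete, the following sketch estimates how many blocks a file occupies in HDFS and how much raw storage its replicas consume. It assumes a 128 MB block size and a replication factor of 3, which are common defaults, but a given cluster may be configured differently. The function name `hdfs_footprint` is hypothetical.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Estimate block count and raw storage for one file (assumed defaults)."""
    # Integer ceiling division: the last block may be only partially filled
    blocks = -(-file_size_bytes // block_size_bytes)
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,
        # HDFS does not pad the final block, so raw usage is size x replication
        "raw_bytes": file_size_bytes * replication,
    }


result = hdfs_footprint(300 * MB)
print(result)
```

A tester can use this kind of arithmetic as an expected value when checking the block report for a file after an upload.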

Module 4

Big Data (Hadoop) Ecosystem Fundamentals


Hive Fundamentals
HiveQL
Hive use cases (data types, joins, databases, tables, etc.)
MapReduce
Sqoop
Flume
ZooKeeper
HBase
Oozie
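
As an illustration of the MapReduce model listed above, the sketch below simulates the map, shuffle, and reduce phases of a word count in plain Python. It runs in a single process and does not use Hadoop; the function names and the sample input are hypothetical and serve only to show the data flow between phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data testing", "Big Data on Azure"])
print(counts)
```

Because each phase is a separate function, each one can be unit tested on its own with small inputs.
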
Module 5

Spark and PySpark Fundamentals (in terms of unit testing concepts)


Introduction to Spark
Spark Architecture
Spark RDDs
Spark programming in Python (PySpark)
Spark SQL
Spark programming in Scala - Basics
Big Data processing with Spark
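
One way to approach Spark code from a unit testing angle is to keep the record-level transformation in a plain Python function and test it without a cluster. The sketch below assumes a hypothetical `id,amount` input format and a hypothetical function `clean_record`; neither comes from the course material. In PySpark, a function like this could then be passed to an RDD transformation such as `rdd.map(clean_record)`.

```python
import unittest


def clean_record(line):
    """Parse an 'id,amount' line into (int, float); return None if malformed."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 2:
        return None
    try:
        return int(parts[0]), float(parts[1])
    except ValueError:
        return None


class CleanRecordTest(unittest.TestCase):
    def test_valid_line(self):
        self.assertEqual(clean_record(" 7, 12.50 "), (7, 12.5))

    def test_wrong_field_count(self):
        self.assertIsNone(clean_record("7"))

    def test_non_numeric_id(self):
        self.assertIsNone(clean_record("x,1"))


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanRecordTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
```

Testing the function in isolation keeps the tests fast and leaves only the Spark wiring itself to be checked against a real or local Spark session.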

Module 6

Big Data on Azure


Azure Fundamentals - Storage, Compute, Networking, Security, Databases etc.
Azure Databricks
Comparative study of Azure and AWS

Additional reference
Big Data processing with Spark

Note: Candidates must be able to write basic scripts for unit testing; there is no need to
go into the level of detail required for development work.
