
Data Bots Training Courses

This document outlines a comprehensive course on Big Data, covering topics such as Big Data technologies, Hadoop, HDFS, MapReduce, and various associated tools like HBase, Hive, and Pig. It includes practical labs and industry use cases, emphasizing hands-on experience with cloud platforms. Prerequisites for the course include knowledge of Linux commands and basic programming skills in SQL, Python, or Java.

This is Draft Version 1.0
Course
Introduction to Big Data
What is Big Data,
Big Deal about Big Data,
Big Data Sources,
Industries using Big Data,
Big Data challenges
Big Data Technologies and Hadoop
Solution to Big Data problems,
Various Big Data Technologies,
Big Data/Hadoop Platforms,
Hadoop Distributions and Vendors,
Big Data Suites.
Introduction to Hadoop
A Brief History of Hadoop,
Evolution of Hadoop,
Comparison with Other Systems,
Hadoop Installation and Cluster Configuration
Setting up a Hadoop Cluster,
Cluster specification,
Cluster Setup and Installation,
Single and Multi Node Cluster Setup on Virtual Machine,
Remote Login using PuTTY/Mac Terminal/Ubuntu Terminal.
Hadoop Configuration, Security in Hadoop, Administering Hadoop,
HDFS - Monitoring & Maintenance, Hadoop benchmarks,
Hadoop in the cloud.
Hadoop Architecture,
Core components of Hadoop,
Common Hadoop Shell commands.
Hadoop Distributed File System (HDFS)
Distributed File System,
What is HDFS,
Where does HDFS fit in,
Core components of HDFS,
HDFS Daemons,
Hadoop Server Roles: NameNode, Secondary NameNode, and DataNode
HDFS Architecture
HDFS Architecture,
Scaling and Rebalancing,
Big Deal about HDFS,
Replication,
Rack Awareness,
Data Pipelining,
Node Failure Management.
HDFS NameNode High Availability (see the Hadoop version)
HDFS Data Storage Process
HDFS Data storage process,
Anatomy of writing and reading file in HDFS,
HDFS user and admin commands,
HDFS Web Interface.
Getting in touch with the MapReduce Framework
Hadoop MapReduce paradigm,
Map and Reduce tasks,
MapReduce Execution Framework,
Anatomy of a MapReduce Job run
More MapReduce Concepts
Partitioners and Combiners,
Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs),
Output Formats (Text Output, Binary Output, Multiple Output).
Basics of MapReduce Programming
Hadoop Data Types,
Java and MapReduce,
MapReduce program structure,
Map-only program, Reduce-only program,
Use of combiner and partitioner,
Counters, Schedulers (Job Scheduling),
Custom Writables, Compression
Complex MapReduce programming,
MapReduce streaming,
Python and MapReduce.
Hadoop Ecosystem
Hadoop YARN
HBase
Pig
Hadoop ETL Development,
ETL Process in Hadoop,
Discussion of ETL functions,
Data Extractions,
Need of ETL tools,
Advantages of ETL tools.
Introduction to HBase
Overview of HBase
HBase architecture
Installation
Java client API for HBase
CRUD operations
HBase Security
The Hive Data Warehouse
Introduction to Hive,
Hive architecture and Installation,
Comparison with Traditional Database,
Basics of Hive Query Language.
Working with Hive QL
Datatypes,
Operators and Functions,
Hive Tables (Managed Tables and External Tables),
Partitions and Buckets,
Storage Formats,
Importing data,
Altering and Dropping Tables.
Querying with Hive QL
Querying Data: Sorting,
Aggregating,
MapReduce Scripts,
Joins and Subqueries,
Views,
Map-side and reduce-side joins to optimize queries.
Data manipulation with Hive,
UDFs,
Appending data into existing Hive table,
Custom map/reduce in Hive
Introduction to PIG and PIG Latin
Introduction to PIG,
PIG vs MapReduce,
PIG Latin Scripting,
Running PIG,
PIG Latin Statements.
Basics of PIG Latin Programming
Conventions, Data Types,
Arithmetic and Relational Operators,
UDF Statements.
PIG Built-In Functions
Eval Functions, Load/Store Functions,
Math Functions,
String Functions,
Date Time Functions,
Tuple,
Bag,
Map Functions.
UDFs (user defined functions), Control Structures, Commands
Writing a PIG UDF
Piggy Bank
DataFu
PIG Macros
Parameter Substitution
Shell and Utility Commands
Combiner
Use cases
Real-Time Data Analytics using PIG
Apache Spark APIs for large-scale data processing
Overview, Linking with Spark, Initializing Spark,
Resilient Distributed Datasets (RDDs), External Datasets, RDD Operations,
Passing Functions to Spark, Working with Key-Value Pairs, Shuffle operations,
RDD Persistence, Removing Data, Shared Variables, Deploying to a Cluster
Apache Phoenix:
Apache Phoenix Overview, Need of Phoenix, Features,
Data Lake Discussion
Industry Use Case 1
Industry Use Case 2
Industry Use Case 3
Interview Q&A
Note:
Prerequisites: Knowledge of Linux commands and basic SQL, Python, or Java
We will use the cloud for the labs (AWS/GCP/Azure)
Course
Introduction to Big Data
What is Big Data,
Big Deal about Big Data,
Big Data Sources,
Industries using Big Data,
Big Data challenges
Big Data Technologies and Hadoop
Solution to Big Data problems,
Various Big Data Technologies,
Big Data/Hadoop Platforms,
Hadoop Distributions and Vendors,
Big Data Suites.
Introduction to Hadoop
A Brief History of Hadoop,
Evolution of Hadoop,
Comparison with Other Systems,
1: Understanding Hadoop
Review
Lab Starting an HDP 2.3 Cluster
2: Introduction to the Hadoop Distributed File System (HDFS)
Review
Demonstration: Understanding Block Storage
Lab Using HDFS Commands
3: Inputting Data Into HDFS
Review
Lab Importing RDBMS Data into HDFS
Lab Exporting HDFS Data to an RDBMS
Lab Importing Log Data into HDFS using Flume
4: The MapReduce Framework
Review
Demonstration: Understanding MapReduce
Lab Running a MapReduce Job
5: Introduction to Pig
Review
Demonstration: Understanding PIG
Lab Getting Started with PIG
Lab Exploring Data with PIG
6: Advanced Pig Programming
Review
Lab Splitting a Dataset
Lab Joining Datasets with PIG
Lab Preparing Data for Hive
Demonstration Guide: Computing PageRank
Lab Analyzing Clickstream Data
Lab Analyzing Stock Market Data using Quantiles
7: Hive Programming
Review
Lab Understanding Hive Tables
Demonstration: Understanding Partitions and Skew
Lab Analyzing Big Data with Hive
Demonstration: Computing ngrams
Lab Joining Datasets in Hive
Lab Computing ngrams of Emails in Avro Format
8: Using HCatalog
Review
Lab Using HCatalog with Pig
9: Advanced Hive Programming
Review
Lab Advanced Hive Programming
10: Hadoop 2 and YARN
Review
Lab Running a YARN Application
11: Introducing Apache Spark
Review
12: Programming with Apache Spark
Review
Lab Getting Started with Apache Spark
13: Spark SQL and DataFrames
Lab Exploring Spark SQL
14: Defining Workflow with Oozie
Review
Lab Defining an Oozie Workflow
Industry Use Case 1
Industry Use Case 2
Industry Use Case 3
Interview Q&A
Note:
Prerequisites: Knowledge of Linux commands and basic SQL, Python, or Java
We will use the cloud for the labs (AWS/GCP/Azure)
Course
What is machine learning?
Algorithm types of Machine learning
Supervised and Unsupervised Learning
Uses of Machine learning
Evaluating ML techniques
Basic statistics
Significance of visual analytics
Information Visualization
Data Representation
Data collection and binding
Structured Data
Unstructured data
Data analytics Life Cycle:
Discovery,
Data preparation
Model planning
Data analytics Life Cycle:
Model building implementation
Quality assurance
Documentation
Management approval
Installation
Acceptance and operation
Intelligent data analysis,
Nature of Data,
Analytic Processes and Tools,
Analysis vs. Reporting
Modern Data Analytic Tools
Visual Encodings
color, size, shape, lines, axes, scaling, annotation
Taxonomy of data visualization (some types of charts, but not limited to):
Comparison charts - Bar chart, Box plots, Histograms, Gantt charts, Glyph chart, Sankey diagram, Word Cloud, etc.
Hierarchies and relationships - Pie chart, stacked bar, Tree map, etc.
Changes over time - Line chart, sparklines, candlestick/OHLC, etc.
Connections and relationships - scatter plots, bubble plots, radial network, heat maps, etc.
Geospatial Data, Geomapping
Choropleth
Cartogram
GeoJSON
Choosing appropriate visuals
Applying calculations, statistics
Data sorting, filters
Interactive visualization
Event listeners/callbacks
Data updates
Clustering
Hierarchical Clustering & K-means
Distance Measure and Data Preparation - Scaling & Weighting
Evaluation and Profiling of Clusters
Hierarchical Clustering
Clustering Case Study
Principal Component analysis
Decision Trees
Classification and Regression Trees
Bayesian analysis and Naïve Bayes classifier
Assigning probabilities and calculating results
Discriminant Analysis (Linear and Quadratic)
K-Nearest Neighbors Algorithm
Concept of Model Ensembling
Random forest, Gradient boosting Machines, Model Stacking
Association rules mining
Apriori and FP-growth algorithms
Support Vector Machines
Basic classification principle of SVM
Linear and Nonlinear classification (Polynomial and Radial)
Moving average, Exponential Smoothing, Holt’s Trend Methods, Holt-Winters’ Methods for seasonality
Auto-correlation (ACF & PACF), Auto-regression, Auto-regressive Models, Moving Average Models
ARMA & ARIMA
Neural Network and its applications
Single layer neural Network
Activation Functions: Sigmoid, Hyperbolic Tangent, ReLU
Overview of Backpropagation of errors
Introduction to Deep Learning
What are Tensors?
Introduction to Convolutional Neural Network & Recurrent Neural Network
Industry Use Case 1
Industry Use Case 2
Industry Use Case 3
Interview Q&A
Note:
Prerequisites: Basic knowledge of Python/R
We will use the cloud for the labs (AWS/GCP/Azure)
Course
Introduction to Big Data
What is Big Data,
Big Deal about Big Data,
Big Data Sources,
Industries using Big Data,
Big Data challenges
Big Data Technologies and Hadoop
Solution to Big Data problems,
Various Big Data Technologies,
Big Data/Hadoop Platforms,
Hadoop Distributions and Vendors,
Big Data Suites.
Introduction to Hadoop
A Brief History of Hadoop,
Evolution of Hadoop,
Comparison with Other Systems,
About Course Labs
Using Hadoop for Data Science
Lesson Review
Lab Setting Up the Development Environment
HDFS
Lesson Review
Demonstration: Understanding Block Storage
Lab Using HDFS Commands
The MapReduce Framework
Lesson Review
Demonstration: Understanding MapReduce
Hadoop 2 and YARN
Lesson Review
Basic statistics
Machine Learning From Data
Lesson Review
Lab Using Apache Mahout for Machine Learning
Introduction to Pig
Demonstration: Understanding Pig
Lab Getting Started with Apache Pig
Python Programming
Lesson Review
Lab Using the IPython Notebook
Analyzing Data with Python
Lesson Review
Demonstration: Understanding the NumPy Package
Demonstration: Pandas Library
Lab Performing Data Analysis with Python
Lab Interpolating Data Points
Running Python on Hadoop
Lesson Review
Lab Defining a Pig User Defined Function in Python
Lab Streaming Python with Pig
Lab Exploring Data with Apache Pig
Machine Learning Algorithms
Lesson Review
Demonstration: Classification with Scikit-Learn
Lab Computing K-Nearest Neighbor
Lab Generating a K-Means Clustering
Natural Language Processing
Lesson Review
Demonstration: POS Tagging Using a Decision Tree
Lab Using the Python Natural Language Toolkit
Lab Classifying Text using Naïve Bayes
Apache Spark MLlib
Lesson Review
Lab Using Spark Transformations and Actions
Lab Using Spark MLlib
Lab Creating a Spam Classifier using Spark MLlib
Taking Data Science to Production
Industry Use Case 1
Industry Use Case 2
Industry Use Case 3
Interview Q&A
Note:
Prerequisites: Knowledge of Linux commands, SQL, and Python/Java
We will use the cloud for the labs (AWS/GCP/Azure)
Course
Introduction to Business Analytics using some case studies
Case studies: Making Right Business Decisions based on data
Exploratory Data Analysis 1- Visualization and Exploring Data, Descriptive Statistical M
Probability Distribution and Data
Exploratory Data Analysis 2: Sampling and Estimation, Statistical Inference,
Predictive modeling and analysis
Regression Analysis
Forecasting Techniques
Simulation and Risk Analysis
Optimization: Linear, Nonlinear, Integer
Decision Analysis
Strategy and Analytics
Overview of Factor Analysis, Directional Data Analytics, Functional Data Analysis
Text analytics, NLP,
Social network analysis, web scraping.
Dimensionality issues, Ridge & lasso regression, bias/variance trade off, density, PCA, F
feature selection, Bagging and boosting
Industry Use Case 1
Industry Use Case 2
Industry Use Case 3
Interview Q&A
Note:
Prerequisites: Basic knowledge of Python/R, statistics, and ML
We will use the cloud for the labs (AWS/GCP/Azure)
Course
Introduction to Big Data
What is Big Data,
Big Deal about Big Data,
Big Data Sources,
Industries using Big Data,
Big Data challenges
Big Data Technologies and Hadoop
Solution to Big Data problems,
Various Big Data Technologies,
Big Data/Hadoop Platforms,
Hadoop Distributions and Vendors,
Big Data Suites.
Introduction to Hadoop
A Brief History of Hadoop,
Evolution of Hadoop,
Comparison with Other Systems,
A Scala Primer
Lesson Review
Important: Reaching the Spark UI
Lab 0.1 - Set up lab environment
Lab 1.1 - Start Interpreter
An Introduction to Spark
Lab 2.1 - First Look at Spark (Optional)
Lab 2.2 - Spark Shell
Lesson Review
RDDs and Spark Architecture
Lesson Review
Lab 3.1 - RDD Basic Operations
Lab 3.2 - Operations On Multiple RDDs
Spark SQL, DataFrames and Datasets
Lesson Review
Lab 4.1 - Data Formats
Lab 4.2 - Spark SQL Basics
Lab 4.3 - DataFrame Transformations
Lab 4.4 - The Dataset Typed API
Lab 4.5 - Splitting Text Data
Shuffling Transformations and Performance
Lesson Review
Lab 5.1 - Exploring Grouping
Lab 5.2 - Seeing Catalyst at Work
Lab 5.3 - Seeing Tungsten at Work
Performance Tuning
Lesson Review
Lab 6.1 - Caching
Lab 6.2 - Joins and Broadcasts
Creating Stand Alone Applications
Lesson Review
Lab 7.1 - Spark Job Submission
Lab 7.2 - More Complex Spark Standalone Application (Optional)
Spark Streaming Overview
Lesson Review
Lab 8.1 - Spark Streaming (1.0+)
Lab 8.2 - Spark Structured Streaming (2.0+)
Lab 8.3 - Spark Structured Streaming with Kafka (2.0+)
Lab - PySpark Structured Streaming
Industry Use Case 1
Industry Use Case 2
Industry Use Case 3
Interview Q&A
Note:
Prerequisites: Knowledge of Linux commands, SQL, and Python/Java
We will use the cloud for the labs (AWS/GCP/Azure)
Course
A Peek into Enterprise Data Flow
Lesson Review
HDF 3.0 - What is New
Lesson Review
NiFi Architecture and Features
System Requirements
Lesson Review
Installing and Configuring HDF
Lesson Review
Lab 1 - Installing and Starting HDF with Ambari
NiFi User Interface
Lesson Review
Summary and History
Anatomy of a Processor
Anatomy of a Connection
Controller Services and Reporting Tasks
Demo 1 - NiFi User Interface
Building a NiFi DataFlow
Lesson Review
Lab 2 - Building A NiFi DataFlow
Anatomy of a Process Group
Lesson Review
Lab 3 - Working With Process Groups
Anatomy of a Remote Process Group
Lesson Review
Lab 4 - Working With Remote Process Group (Site-to-Site)
Working with Attributes
Lesson Review
Demo 2 - Working With Attributes
NiFi Expression Language
Lesson Review
Lab 5 - NiFi Expression Language
Working with Templates
Demo 3 - Working with Templates
HDF DataFlow Optimization
Data Provenance
Lesson Review
Demo 4 - Data Provenance
Working with NiFi Clusters
Lesson Review
Lab 6 - Working With NiFi Cluster
Monitoring NiFi
Demo 5 - NiFi Monitoring
Demo 6 - NiFi Notification Services
Lab 7 - Advanced NiFi Monitoring
HDF with HDP - A Complete Big Data Solution
Lesson Review
Lab 8 - HDF Integration with HDP
HDF Best Practices
Lesson Review
Security: HDF Authentication
Lesson Review
Lab 9 - Securing HDF with 2-way SSL Using Ambari
Lab 10 - NiFi User Authentication with LDAP
Lab 11 - Ranger Install and Configuring NiFi with Kerberos
Security: HDF Authorization and Multi-Tenancy
Managed / File Based Authorizer
Lab 12 - File Based Authorization In NiFi
External / Ranger Based Authorizer
Lab 13 - Ranger Based Authorization In NiFi
Industry Use Case 1
Industry Use Case 2
Industry Use Case 3
Interview Q&A
Note:
Prerequisites: Knowledge of Linux commands and Big Data technologies
We will use the cloud for the labs (AWS/GCP/Azure)
Course
Introduction to Big Data
What is Big Data,
Big Deal about Big Data,
Big Data Sources,
Industries using Big Data,
Big Data challenges
Big Data Technologies and Hadoop
Solution to Big Data problems,
Various Big Data Technologies,
Big Data/Hadoop Platforms,
Hadoop Distributions and Vendors,
Big Data Suites.
Introduction to Hadoop
A Brief History of Hadoop,
Evolution of Hadoop,
Comparison with Other Systems,
Important: Cluster Setup Notes
Cluster Creation Guide
About Course Labs
The Definition of Security
Lesson Review
Securing Sensitive Data
Lesson Review
Integrating HDP Security
Lesson Review
What Security Tool to Use for Each Use Case
Lesson Review
Lab Setting up the Environment
HDP Security Prerequisites
Lab Configure AD Resolution and Certificate
Ambari Security Server
Lesson Review
Lab Security Options for Ambari
Kerberos Deep Dive
Lesson Review
Enabling Kerberos
Lesson Review
Lab Kerberize the Cluster
Installing Apache Ranger
Lesson Review
Lab Ranger Install
Apache Ranger KMS and HDFS Encryption
Lesson Review
Lab Ranger KMS Data Encryption Setup
Secure Access Apache Ranger
Lesson Review
Lab Secured Hadoop Exercises
Knox Overview
Lesson Review
Knox Installation
Lab Knox
Ambari Views for Controlled Access
Lesson Review
Lab Other Security Features for Ambari
Lab Install SolrCloud
Atlas Introduction
Atlas classification
Atlas business taxonomy
Atlas & Ranger integration
Industry Use Case 1
Industry Use Case 2
Industry Use Case 3
Interview Q&A
Note:
Prerequisites: Knowledge of Big Data technologies
We will use the cloud for the labs (AWS/GCP/Azure)
Course
Hadoop Primer
Lesson Review
Lab Running a MapReduce Job
Apache HBase Overview
Lesson Review
Lab Using HBase
Lab Importing Tables From MySQL Into HBase
HBase Architecture
Lesson Review
Lab Zookeeper
HBase Services and Operations
Lesson Review
Lab Examining HBase Configuration Files
HBase Command Line Interface
Lesson Review
Lab Using HBase Shell Commands
HBase Installation and Configuration
Lesson Review
Lab Backup and Snapshot
Lab Exporting With Pig and Import With ImportTsv
Lesson Review
Lab Column Families
Lab Exploring the Case Study
HBase Optimizations
Lesson Review
Lab Setting Blocksize and Enabling Bloomfilters
Demonstration: Using Java Data Access Object
Industry Use Case 1
Industry Use Case 2
Industry Use Case 3
Interview Q&A
Note:
Prerequisites: Knowledge of Big Data technologies
We will use the cloud for the labs (AWS/GCP/Azure)
Course
Introduction to Big Data
What is Big Data,
Big Deal about Big Data,
Big Data Sources,
Industries using Big Data,
Big Data challenges
Big Data Technologies and Hadoop
Solution to Big Data problems,
Various Big Data Technologies,
Big Data/Hadoop Platforms,
Hadoop Distributions and Vendors,
Big Data Suites.
Introduction to Hadoop
A Brief History of Hadoop,
Evolution of Hadoop,
Comparison with Other Systems,
The YARN Architecture
Lesson Review
Lab Running a YARN Application
Overview of a YARN Application
Lesson Review
Lab Set up a YARN Development Environment
Writing a YARN Client
Lesson Review
Lab Writing a YARN Client
Lab Submitting an ApplicationMaster
Writing a YARN Application Master
Lesson Review
Lab Writing an ApplicationMaster
Lab Requesting Containers
Containers
Lesson Review
Lab Writing Custom Containers
Lab Putting It All Together
Job Scheduling
Lesson Review
Industry Use Case 1
Industry Use Case 2
Industry Use Case 3
Interview Q&A
Note:
Prerequisites: Knowledge of Big Data technologies
We will use the cloud for the labs (AWS/GCP/Azure)
