Data Analytics TOC

The Foundation Module for Big Data Analytics is a 240-hour program designed to up-skill individuals with a basic understanding of programming and data sequences. It covers essential topics such as data analytics, the data ecosystem, Hadoop, MapReduce, and various tools like SQL and Apache Spark, aimed at university students and professionals interested in Big Data. Key learning outcomes include evaluating Big Data trends, understanding data management systems, and executing data processing operations.

Uploaded by

soniyk40
Copyright
© All Rights Reserved

Foundation Module – BDA (Indicative duration: 240 hrs.)

Foundation Module – Pre-requisites

Basics of Information Technology

Hardware and software components
Operating system
Computational thinking and problem-solving skills
Basics of programming (Python)
Basics of object-oriented programming concepts
Database concepts

Foundational Curriculum – Big Data Analytics


Foundational Curriculum for Big Data Analytics is aimed at up-skilling those who have a basic
understanding of programming and data sequences, to help them expand their knowledge and learn the
fundamentals of Big Data Analytics technologies at a beginner level. This Curriculum has been divided
into three modules, of which the first is an introductory module.

Curriculum Details

Scope and Objective
Enable students to explore the fundamentals of Big Data Analytics and give them a base from which they can up-skill themselves for specific Big Data Analytics job roles.

Intended Audience
• University students enrolled in streams such as Engineering, Computer Science, Statistics, Sciences or Mathematics
• Employed professionals who wish to explore their career options and interests with regard to Big Data Analytics
• Enthusiasts curious about understanding the hype behind Big Data Analytics

Pre-requisites
Knowledge of the fundamentals of programming, including data sequences such as stacks, queues, strings, arrays, linked lists, trees and maps, and the concepts of Object-Oriented Programming

Key Learning Outcomes
1. Evaluate trends in Big Data and discuss how Big Data is transforming businesses
2. Evaluate the different platforms used for processing Big Data
3. Evaluate the features of databases
4. Write Map and Reduce code for distributed processing of data
5. Understand key concepts behind Big Data modelling and management and gain practical skills needed for modelling Big Data projects
6. Select appropriate data models that suit the requirements of data
7. Differentiate between a traditional Database Management System and a Big Data Management System
8. Retrieve data from Big Data management systems
9. Execute simple Big Data integration and processing operations

List of Tools Suggested (Indicative)
SQL, MongoDB, Hadoop, MapReduce, HDFS, Apache Spark, PySpark, SparkR, Java, Apache Pig, DynamoDB, Spark MLlib, GraphX, Postgres, Pandas

Indicative TOC
Data Analytics
Module 1: Data Analytics: An Overview
• What and why: Data Analytics
• Components of a modern data ecosystem and the role Data Analysts play in this ecosystem
• Types of data analysis and the key steps in a data analysis process
• Roles, responsibilities, and skill sets required to be a Data Analyst
• Data Analytics tools

Module 2: The Data Ecosystem

• Types of data structures, file formats, and sources of data
• Types of data repositories, such as Databases, Data Warehouses, Data Marts, Data Lakes, and Data Pipelines
• The Extract, Transform, and Load (ETL) process, used to move data from its sources into data repositories
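
As an illustration of the ETL process listed above, here is a minimal sketch using only the Python standard library. The sample data, the `orders` table, and the function names are hypothetical and not part of the curriculum; an in-memory SQLite database stands in for a real data repository.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a source file.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert types, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a repository table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three stages in order and return the loaded connection."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a repository
    load(transform(extract(RAW_CSV)), conn)
    return conn
```

Calling `run_etl()` and then querying the `orders` table returns only the two complete records; the row with a missing amount is dropped during the transform stage.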

Chapter 1: Introduction to Big Data – Hadoop Framework

o Big Data Overview, What is Big Data Analytics
o Overview of Hadoop Ecosystem
o What is Big Data & Role of Hadoop in Big Data; Overview of other Big Data Systems
o Hadoop Integrations into Existing Software Products
o Current Scenario in Hadoop Ecosystem
o Installation & Configuration
o Use Cases of Hadoop (Healthcare, Retail, Telecom)

Chapter 2: HDFS
o HDFS Concepts & Design
o Architecture, HDFS Daemons
o Overview of Hadoop Distributed File System
   • NameNodes
   • DataNodes
   • The Command-Line Interface
o Data Flow (File Read, File Write)
o Fault Tolerance, Shell Commands
o Data Flow Archives, Coherency, Data Integrity
o Role of Secondary NameNode

Chapter 3: Hadoop Components – MapReduce

o Anatomy of MapReduce & Theory
o Data Flow (Map – Shuffle – Reduce)
o MapRed vs MapReduce APIs
o Programming [Mapper, Reducer, Combiner, Partitioner]
o Writables
o Input and Output Formats
o Streaming API using Python
o Magic of Shuffle Phase
o File Formats, Sequence Files
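
The Map – Shuffle – Reduce data flow named in this chapter can be sketched in plain Python. This is only an in-process simulation of the idea, assuming a word-count task; it does not use Hadoop or its Streaming API, and the function names are chosen for illustration.

```python
from collections import defaultdict


def mapper(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key and sort the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce: collapse the list of values for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Chain the three phases over an iterable of text lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(mapped))
```

For example, `word_count(["big data big", "data lake"])` groups the two occurrences each of "big" and "data" during the shuffle step before the reducer sums them. On a real cluster the mapper and reducer would run as separate tasks on different nodes, with the framework performing the shuffle between them.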

Chapter 4: Extended Subjects on HBase

o Introduction to NoSQL
o CAP Theorem
o HBase and RDBMS
o HBase and HDFS
o Architecture (Read Path, Write Path, Compactions, Splits)
o Installation & Configuration
o Role of ZooKeeper
o HBase Shell
o Introduction to Filters
o Row Key Design
o What's New in HBase
o Hands-On

Chapter 5: Extended Subjects on Hive

o Architecture
o Installation & Configuration
o Hive vs RDBMS
o Working on Hive Beeline
o Hive: HQL, Tables
o DDL, DML
o UDF
o Partitioning, Bucketing
o Hive Functions, Date Functions, String Functions
o Joins, Subqueries and Other Aggregations

Chapter 6: Apache Spark (5 hrs)

o Introduction to Spark: Getting Started
o Resilient Distributed Datasets and DataFrames
o Spark Application Programming
o Introduction to Spark Libraries
o Spark Configuration, Monitoring and Tuning

Module 3: Gathering, Wrangling & Visualizing Data with Advanced Python Libraries [Pandas, NumPy & Matplotlib]
o Introduction to Pandas
o Data Structures in Pandas (Series, DataFrame)
o DataFrame creation from Series, lists, a dictionary, or a NumPy 2D array
o Identify and Handle Missing Values
o Data Formatting
o Data Normalization Sets
o Binning
o Indicator Variables
o CSV File Handling
o Exporting Data from a DataFrame to a CSV File
o EDA & Data Visualization using the Matplotlib library

Tableau
o What is Tableau?
o Tableau Architecture
o Workspace & Navigation
o Tableau Data Connections
o Filter data in Tableau
o Tableau Sort Data
o Data Visualization with Tableau
o Dynamic Data Manipulation and Presentation in Tableau
Module 4: Mining & Visualizing Data and Communicating Results
Chapter 1: Introduction to Statistical Modelling

o What is a Statistical Model?
o Why do we need Statistical Modelling?
o Estimation
o Confidence Interval
o Hypothesis Testing
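
To make the Estimation and Confidence Interval topics above concrete, the following sketch computes an approximate confidence interval for a sample mean using the Python standard library. It assumes a normal approximation, which is an illustrative simplification; for small samples a t-distribution is the more usual choice. The function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) bounds for the mean, using a normal approximation."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    centre = mean(sample)                    # point estimate of the mean
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, roughly 1.96 when confidence is 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return centre - margin, centre + margin


# Hypothetical measurements used only to exercise the function.
low, high = mean_confidence_interval([4.8, 5.1, 5.0, 4.9, 5.2])
```

Raising the `confidence` argument widens the interval, which reflects the trade-off between certainty and precision that estimation deals with.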

Chapter 2: Introduction to Statistical Modelling

o Linear Regression
   • Simple Linear Regression
   • Multiple Linear Regression
o Classification
   • Logistic Regression
   • Discriminant Analysis
o Resampling Methods
   • Bootstrapping
   • Cross-Validation
o Tree-based Methods
   • Bagging
   • Boosting
o Unsupervised Learning
   • Principal Component Analysis
   • K-Means Clustering
   • Hierarchical Clustering
o Types of Variables
   • Dependent Variable, also known as Response Variable
   • Explanatory Variable, also known as Independent Variable
o Model Parameters and Model Residuals

Chapter 3: Difference between Statistical Modelling and Machine Learning

Chapter 4: The Difference from a Statistical Modelling Perspective

Chapter 5: The Difference from a Machine Learning Perspective

R Programming
o Understanding R as a programming environment
o R basics
   • Math, Variables, and Strings
   • Vectors and Factors
   • Vector operations
o Data structures in R
o Arrays & Matrices
o Lists
o Data frames
o R programming fundamentals
   • Conditions and loops
   • Functions in R
   • Objects and Classes
   • Debugging
o Working with data in R
   • Reading CSV and Excel files
   • Reading text files
   • Writing and saving data objects to file in R
o Strings and Dates in R
   • String operations in R
   • Regular Expressions
   • Dates in R
o Descriptive Statistics using R
o Data Visualization using R
o Exploratory Data Analysis (EDA) using R
o A comprehensive analysis of a sample data set using a machine learning technique

Module 5: Career Opportunities and Data Analysis in Action

o Career opportunities in the field of Data Analysis and the different paths you can take to become skilled as a Data Analyst
o Hands-on project with scenario-based use cases in gathering, wrangling, mining, analyzing, and visualizing data
