Data Analytics TOC

The Foundation Module for Big Data Analytics is a 240-hour program designed to upskill individuals with a basic understanding of programming and data structures. It covers essential topics such as data analytics, the data ecosystem, Hadoop, MapReduce, and tools such as SQL and Apache Spark, and is aimed at university students and professionals interested in Big Data. Key learning outcomes include evaluating Big Data trends, understanding data management systems, and executing data processing operations.

Uploaded by soniyk40

Foundation Module – BDA (Indicative duration: 240 hrs.)

Foundation Module – Prerequisites

Basics of Information Technology

Hardware and software components
Operating systems
Computational thinking and problem-solving skills
Basics of programming (Python)
Basics of object-oriented programming concepts
Database concepts

Foundational Curriculum – Big Data Analytics

The Foundational Curriculum for Big Data Analytics aims to upskill learners who have a basic
understanding of programming and data structures, helping them expand their knowledge and learn the
fundamentals of Big Data Analytics technologies at a beginner level. The curriculum is divided
into three modules, the first of which is introductory.

Curriculum Details

Scope and Objective: Enable students to explore the fundamentals of Big Data Analytics and provide them with a base from which they can upskill themselves for specific Big Data Analytics job roles.

Intended Audience:
• University students enrolled in streams such as Engineering, Computer Science, Statistics, Sciences or Mathematics
• Employed professionals who wish to explore their career options and interests with regard to Big Data Analytics
• Enthusiasts curious about understanding the hype behind Big Data Analytics

Pre-requisites: Knowledge of the fundamentals of programming, including data structures such as stacks, queues, strings, arrays, linked lists, trees and maps, and the concepts of Object-Oriented Programming

Key Learning Outcomes:
1. Evaluate trends in Big Data and discuss how Big Data is transforming businesses
2. Evaluate the different platforms used for processing Big Data
3. Evaluate the features of databases
4. Write Map and Reduce code for distributed processing of data
5. Understand key concepts behind Big Data modelling and management, and gain the practical skills needed for modelling Big Data projects
6. Select appropriate data models that suit the requirements of the data
7. Differentiate between a traditional Database Management System and a Big Data Management System
8. Retrieve data from Big Data management systems
9. Execute simple Big Data integration and processing operations

List of Tools Suggested (Indicative): SQL, MongoDB, Hadoop, MapReduce, HDFS, Apache Spark, PySpark, SparkR, Java, Apache Pig, DynamoDB, Spark MLlib, GraphX, Postgres, Pandas

Indicative TOC
Data Analytics
Module 1: Data Analytics – An Overview
• Data Analytics: What and Why?
• Different components of a modern data ecosystem, and the role Data Analysts play in this ecosystem
• Different types of data analysis and the key steps in a data analysis process
• Roles, responsibilities, and skill sets required of a Data Analyst
• Data Analytics tools

Module 2: The Data Ecosystem

• Different types of data structures, file formats, and sources of data
• Various types of data repositories, such as Databases, Data Warehouses, Data Marts, Data Lakes, and Data Pipelines
• The Extract, Transform, and Load (ETL) process, used to extract data from source systems, transform it, and load it into data repositories
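The ETL process can be illustrated with a minimal sketch using only Python's standard library. It assumes a small, made-up CSV extract (`RAW_CSV`) and a hypothetical `sales` table; an in-memory SQLite database stands in for the target data repository.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV extract with a missing value and untrimmed text.
RAW_CSV = """region,amount
 north ,120.5
south,80
east,
 north,40
"""

def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy region names, drop rows without an amount, cast amounts to float."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((row["region"].strip().lower(), float(amount)))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Query the loaded repository: total amount per region.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
))
print(totals)
```

Keeping the three stages as separate functions mirrors the E, T and L steps and makes each one easy to test or replace (for example, swapping SQLite for a data warehouse loader).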

Chapter 1: Introduction to Big Data – Hadoop Framework

o Big Data Overview; What is Big Data Analytics?
o Overview of the Hadoop Ecosystem
o What is Big Data and the Role of Hadoop in Big Data; Overview of Other Big Data Systems
o Hadoop Integrations into Existing Software Products
o Current Scenario in the Hadoop Ecosystem
o Installation & Configuration
o Use Cases of Hadoop (Healthcare, Retail, Telecom)

Chapter 2: HDFS
o HDFS Concepts & Design
o Architecture, HDFS Daemons
o Overview of the Hadoop Distributed File System
  • NameNodes
  • DataNodes
  • The Command-Line Interface
o Data Flow (File Read, File Write)
o Fault Tolerance, Shell Commands
o Data Flow, Archives, Coherency, Data Integrity
o Role of the Secondary NameNode

Chapter 3: Hadoop Components – MapReduce

o Anatomy of MapReduce & Theory
o Data Flow (Map – Shuffle – Reduce)
o MapRed vs. MapReduce APIs
o Programming [Mapper, Reducer, Combiner, Partitioner]
o Writables
o Input and Output Formats
o Streaming API using Python
o The Magic of the Shuffle Phase
o File Formats, Sequence Files
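The Map – Shuffle – Reduce data flow and the Streaming API can be sketched in plain Python. In Hadoop Streaming, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output, and the framework sorts mapper output by key before it reaches the reducer. The sketch below assumes a word-count job and simulates the shuffle locally with `sorted()`. The function names `mapper`, `reducer` and `run_local` are illustrative only and are not part of any Hadoop API.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Reduce phase: input arrives sorted by key, so equal words are adjacent."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

def run_local(lines):
    """Simulate Map -> Shuffle (sort by key) -> Reduce on a single machine."""
    shuffled = sorted(mapper(lines))  # stands in for Hadoop's shuffle-and-sort
    return list(reducer(shuffled))

if __name__ == "__main__":
    # Hypothetical input. In a real Streaming job the mapper and reducer would be
    # separate scripts, each reading sys.stdin and printing to sys.stdout.
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    for out in run_local(sample):
        print(out)
```

Because the reducer only relies on its input being sorted by key, the same function would work unchanged as the reduce step after a real cluster-wide shuffle. A combiner would apply the same summing logic to each mapper's local output before the shuffle.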

Chapter 4: Extended Subjects on HBase

o Introduction to NoSQL
o CAP Theorem
o HBase and RDBMS
o HBase and HDFS
o Architecture (Read Path, Write Path, Compactions, Splits)
o Installation & Configuration
o Role of ZooKeeper
o HBase Shell; Introduction to Filters
o Row Key Design; What’s New in HBase; Hands-On

Chapter 5: Extended Subjects on Hive

o Architecture
o Installation & Configuration
o Hive vs. RDBMS
o Working with Hive Beeline
o Hive Query Language (HQL), Tables
o DDL, DML
o UDFs
o Partitioning, Bucketing
o Hive Functions, Date Functions, String Functions
o Joins, Subqueries and Other Aggregations

Chapter 6: Apache Spark (5 hrs.)

o Introduction to Spark – Getting Started
o Resilient Distributed Datasets (RDDs) and DataFrames
o Spark Application Programming
o Introduction to Spark Libraries
o Spark Configuration, Monitoring and Tuning

Module 3: Gathering, Wrangling & Visualizing Data with Advanced Python Libraries [Pandas, NumPy & Matplotlib]
o Introduction to Pandas
o Data Structures in Pandas (Series, DataFrame)
o Creating DataFrames from Series, Lists, Dictionaries and NumPy 2D Arrays
o Identify and Handle Missing Values
o Data Formatting
o Data Normalization
o Binning
o Indicator Variables
o CSV File Handling
o Exporting Data from a DataFrame to a CSV File
o EDA & Data Visualization using the Matplotlib Library
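Several of the wrangling steps in this module (identifying and handling missing values, normalization, binning, indicator variables and CSV export) can be shown on a tiny DataFrame. This is a minimal sketch assuming pandas is installed. The column names, the mean-imputation choice, the bin edges and the labels are all made up for illustration.

```python
import pandas as pd

# Hypothetical dataset with one missing value in each numeric column.
df = pd.DataFrame({
    "age": [23, 35, None, 52, 41],
    "income": [28000.0, 52000.0, 61000.0, None, 45000.0],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi"],
})

# Identify missing values: count of NaNs per column.
missing_counts = df.isna().sum()

# Handle missing values: replace them with the column mean (one simple strategy).
df["age"] = df["age"].fillna(df["age"].mean())
df["income"] = df["income"].fillna(df["income"].mean())

# Min-max normalization: rescale income to the range [0, 1].
inc = df["income"]
df["income_norm"] = (inc - inc.min()) / (inc.max() - inc.min())

# Binning: group ages into labelled ranges (0, 30], (30, 45], (45, 100].
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 45, 100],
                         labels=["young", "middle", "senior"])

# Indicator (dummy) variables for the categorical city column.
df = pd.concat([df, pd.get_dummies(df["city"], prefix="city")], axis=1)

# Export to CSV; with no path argument, to_csv returns the CSV text.
csv_text = df.to_csv(index=False)
print(df)
```

Mean imputation is only one option. Dropping rows with `dropna()` or imputing the median may suit skewed data better, and the choice should be made per column during EDA.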

Tableau
o What is Tableau?
o Tableau Architecture
o Workspace & Navigation
o Tableau Data Connections
o Filter Data in Tableau
o Sort Data in Tableau
o Data Visualization with Tableau
o Dynamic Data Manipulation and Presentation in Tableau

Module 4: Mining & Visualizing Data and Communicating Results
Chapter 1 - Introduction to Statistical Modelling

o What is a Statistical Model?
o Why do we need Statistical Modelling?
o Estimation
o Confidence Intervals
o Hypothesis Testing
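Estimation, confidence intervals and hypothesis testing can be tied together in a small worked example. The sketch uses only Python's standard library and assumes a large-sample normal (z) approximation: the interval is x̄ ± z·s/√n and the test statistic is z = (x̄ - μ0)/(s/√n). For small samples a t-distribution would normally be used instead. The data and the hypothesised mean `mu0` are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Point estimate and normal-approximation confidence interval for the mean."""
    n = len(sample)
    x_bar = mean(sample)                              # point estimate
    se = stdev(sample) / sqrt(n)                      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95%
    return x_bar, (x_bar - z * se, x_bar + z * se)

def z_test(sample, mu0):
    """Two-sided z-test of H0: population mean == mu0 (large-sample approximation)."""
    se = stdev(sample) / sqrt(len(sample))
    z = (mean(sample) - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample of 40 measurements spread evenly around 50.
data = [50 + (i % 10) - 4.5 for i in range(40)]
estimate, (low, high) = mean_confidence_interval(data)
z, p = z_test(data, mu0=52)
print(f"mean={estimate:.2f}, 95% CI=({low:.2f}, {high:.2f}), z={z:.2f}, p={p:.4g}")
```

Here the hypothesised mean of 52 falls outside the 95% interval, which agrees with the two-sided test rejecting H0 at the 5% level. This duality between intervals and tests holds when both use the same distribution and standard error.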

Chapter 2 - Introduction to Statistical Modelling

o Linear Regression
  • Simple Linear Regression
  • Multiple Linear Regression
o Classification
  • Logistic Regression
  • Discriminant Analysis
o Resampling Methods
  • Bootstrapping
  • Cross-Validation
o Tree-based Methods
  • Bagging
  • Boosting
o Unsupervised Learning
  • Principal Component Analysis
  • K-Means Clustering
  • Hierarchical Clustering
o Types of Variables
  • Dependent Variable, also known as Response Variable
  • Explanatory Variable, also known as Independent Variable
o Model Parameters and Model Residuals
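For simple linear regression, the least-squares model parameters have a closed form: slope b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and intercept b0 = ȳ - b1·x̄. The residuals are the observed values of the response variable minus the fitted values. The sketch computes both in plain Python on made-up data. In practice a library such as statsmodels or scikit-learn would usually be used.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope: change in response per unit of the explanatory variable
    b0 = y_bar - b1 * x_bar   # intercept: fitted response when x = 0
    return b0, b1

def residuals(x, y, b0, b1):
    """Residual = observed response minus fitted value."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Hypothetical data: x is the explanatory (independent) variable,
# y is the response (dependent) variable.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_linear_regression(x, y)
res = residuals(x, y, b0, b1)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
print("residuals:", [round(r, 3) for r in res])
```

A quick sanity check on any OLS fit with an intercept is that the residuals sum to (numerically) zero and are uncorrelated with x. Clear patterns left in the residuals suggest the linear model is missing structure in the data.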

Chapter 3 - Differences between Statistical Modelling and Machine Learning

Chapter 4 - Differences: Statistical Modelling Perspective

Chapter 5 - Differences: Machine Learning Perspective

R Programming
o Understanding R as a programming environment
o R basics
  • Math, Variables, and Strings
  • Vectors and Factors
  • Vector operations
o Data structures in R
o Arrays & Matrices
o Lists
o Data Frames
o R programming fundamentals
  • Conditions and loops
  • Functions in R
  • Objects and Classes
  • Debugging
o Working with data in R
  • Reading CSV and Excel files
  • Reading text files
  • Writing and saving data objects to files in R
o Strings and Dates in R
  • String operations in R
  • Regular Expressions
  • Dates in R
o Descriptive Statistics using R
o Data Visualization using R
o Exploratory Data Analysis (EDA) using R
o A comprehensive analysis of a sample dataset using Machine Learning techniques

Module 5: Career Opportunities and Data Analysis in Action

o Different career opportunities in the field of Data Analysis, and the different paths you can take to become skilled as a Data Analyst
o Hands-on project with scenario-based use cases in gathering, wrangling, mining, analyzing, and visualizing data
