Data Science and Big Data Analytics: Unit 1
314453 : DATA SCIENCE AND BIG DATA ANALYTICS
Teaching Scheme:
Lectures: 4 Hours/Week
Credits: 04
Examination Scheme:
In-Semester: 30 Marks
End-Semester: 70 Marks
Prerequisites:
Discrete Mathematics
Database Management Systems
Data Mining and Data Warehousing
Course Objectives
UNIT – I: Introduction to Data Science and Big Data
Big Data: Introduction
• Big Data refers to large volumes of data in structured or unstructured form.
• The rate of data generation has increased exponentially with the growing use of data-intensive technologies.
• Processing or analyzing such huge amounts of data is a challenging task.
• It requires new infrastructure and a new way of thinking about how business and the IT industry work.
What is Big Data?
There are many examples of "data", but what makes some of it "big"? The classic definition revolves around the three Vs: volume, velocity, and variety.
Volume: There is just a lot of it being generated, all the time. Things get interesting, and "big", when you can no longer fit it all on one computer. This is where ideas such as MapReduce and Hadoop come in: they all revolve around being able to process data that grows from Terabytes to Petabytes to Exabytes.
Velocity: Data is being generated very quickly. Can you even store it all? If not, then what do you get rid of and what do you keep?
Variety: Data comes in many different shapes. What does it mean to store them all so that you can work with or compare them?
Defining Big Data
6 V’s of Big Data
Case study: 6 V’s in clinical dataset
Data Volume
Data Velocity
Data Variety
Veracity – ambiguity and uncertainty in the data.
Viscosity – the inertia encountered when navigating through a data collection.
Virality – the speed at which data can spread through a network.
Problem of Data Explosion
• An International Data Corporation (IDC) study predicts that overall data will grow by 50 times by 2020.
• The digital universe is 1.8 trillion gigabytes (1 GB = 10^9 bytes) in size and is stored in 500 quadrillion (1 quadrillion = 10^15) files.
• There are nearly as many bits of information in the digital universe as there are stars in our physical universe.
• About 90% of this data is in unstructured form.
Issues in Big Data
• Issues related to the Characteristics
• Storage and Transfer Issues
• Data Management Issues
• Processing Issues
Issues related to the Characteristics
• Data Volume Issues
• Data Velocity Issues
• Data Variety Issues
• Worth of Data Issues
• Data Complexity Issues
Storage and Transfer Issues
• Current storage techniques and storage media are not adequate for handling Big Data effectively.
• Current technology limits disks to about 4 Terabytes (4 × 10^12 bytes), so 1 Exabyte (10^18 bytes) of data would take 250,000 disks.
• Accessing that data will also overwhelm the network.
• On a 1 Gbps network with an 80% effective transfer rate (roughly 100 MB/s sustained), transferring 1 Petabyte takes about 2,800 hours, so a sustained transfer of 1 Exabyte would take on the order of 2.8 million hours.
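A quick back-of-the-envelope check of these figures, sketched in Python (the 4 TB disk size and the 1 Gbps link at 80% efficiency are the assumptions from the bullets above):

EXABYTE = 10**18                   # bytes
DISK = 4 * 10**12                  # bytes per 4 TB disk (assumed above)
print(f"Disks for 1 EB: {EXABYTE // DISK:,}")         # 250,000

sustained = 0.80 * 1e9 / 8         # 1 Gbps at 80%, in bytes/s (~100 MB/s)
hours_per_pb = 10**15 / sustained / 3600
print(f"Hours to move 1 PB: {hours_per_pb:,.0f}")     # ~2,778
print(f"Hours to move 1 EB: {hours_per_pb * 1000:,.0f}")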
Data Management Issues
• Resolving issues of access, utilization, updating, governance, and reference (in publications) has proven to be a major stumbling block.
• At such volumes, it is impractical to validate every data item.
• New approaches and new research into data qualification and validation are needed.
• The richness of digital data representation prohibits a personalized methodology for data collection.
Processing Issues
• The processing issues are critical to handle.
• Example: 1 Exabyte = 1,000 Petabytes (1 PB = 10^15 bytes). Assuming a processor expends 100 instructions on one block at 5 GHz, each block takes 20 nanoseconds end to end; processing the ~10^18 blocks of 1 Exabyte sequentially would therefore take roughly 635 years.
• Effective processing of Exabytes of data will require extensive parallel processing and new analytics algorithms.
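The 635-year figure can be verified with the same kind of arithmetic (a minimal sketch; the 10^18 block count is the assumption implied by the example):

instructions_per_block = 100
clock_hz = 5e9                                  # 5 GHz
seconds_per_block = instructions_per_block / clock_hz   # 2e-8 s = 20 ns
blocks = 1e18                                   # block count implied above
years = seconds_per_block * blocks / (3600 * 24 * 365)
print(f"{seconds_per_block * 1e9:.0f} ns per block, about {years:.0f} years in total")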
Challenges in Big Data
• Privacy and Security
• Data Access and Sharing of Information
• Analytical Challenges
• Human Resources and Manpower
• Technical Challenges
Privacy and Security
• Privacy and security are sensitive issues with conceptual, technical, and legal significance.
• Most people are vulnerable to information theft.
• Privacy can be compromised in large data sets.
• Security is also critical to manage for data at this scale.
• Social stratification could be an important consequence.
Data Access and Sharing of Information
• Data should be available in an accurate, complete, and timely manner.
• Data management and governance become more complex with the need to make data open and available to government agencies.
• Expecting companies to share data with one another is often unrealistic.
Analytical Challenges
• Big Data brings with it some huge analytical challenges.
• Analyzing such huge data sets requires a wide range of advanced skills.
• The type of analysis to be performed on the data depends heavily on the results to be obtained.
Human Resources and Manpower
• Big Data needs to attract organizations and young professionals with diverse new skill sets.
• These skills include technical skills as well as research, analytical, interpretive, and creative ones.
• Organizations need to run training programs to develop these skills.
• Universities need to introduce Big Data into their curricula.
Technical Challenges
• Fault Tolerance: if a failure occurs, the damage done should remain within an acceptable threshold rather than forcing the whole task to restart from scratch.
• Scalability: requires a high degree of resource sharing, which is expensive, along with efficient handling of system failures.
• Quality of Data: Big Data work favors storing high-quality, relevant data rather than very large amounts of irrelevant data.
• Heterogeneous Data: both structured and unstructured data must be handled.
Advantages of Big Data
• Understanding and Targeting Customers
• Understanding and Optimizing Business Process
• Improving Science and Research
• Improving Healthcare and Public Health
• Optimizing Machine and Device Performance
• Financial Trading
• Improving Sports Performance
• Improving Security and Law Enforcement
Some Projects using Big Data
• Amazon.com handles millions of back-end operations and has databases of 7.8 TB, 18.5 TB, and 24.7 TB.
• Walmart is estimated to store more than 2.5 PB of data to handle 1 million transactions per hour.
• The Large Hadron Collider (LHC) generates 25 PB of data before replication and 200 PB after replication.
• The Sloan Digital Sky Survey collects data at a rate of about 200 GB per night and has amassed more than 140 TB of information.
• The Utah Data Center for cyber security is designed to store data on the Yottabyte (10^24 bytes) scale.
Is Big Data the same as Data Science?
Are Big Data and Data Science the same thing? I wouldn't say so...
Data Science can be done on small data sets, and not everything done using Big Data would necessarily be called Data Science. But there certainly is a substantial overlap!
[Venn diagram: the overlap between Big Data and Data Science]
Big Data Infrastructure: Hadoop/MapReduce
• Programming & data processing: Hive/Pig
• HBase and Cassandra
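To make the MapReduce model concrete, here is a minimal sketch of its map / shuffle / reduce phases in plain Python (a local simulation, not actual Hadoop; the word-count task and sample lines are illustrative):

from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the line.
    for word in line.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    # Reduce: aggregate all values shuffled to the same key.
    return word, sum(counts)

lines = ["big data needs new thinking",
         "data science uses big data"]

# Shuffle: group mapped values by key (Hadoop does this across nodes).
groups = defaultdict(list)
for line in lines:
    for word, count in map_phase(line):
        groups[word].append(count)

result = dict(reduce_phase(w, c) for w, c in groups.items())
print(result)   # {'big': 2, 'data': 3, 'needs': 1, ...}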
Big Data Learning Approaches
Classical Approach
Given: example inputs and their corresponding outputs.
Wanted: a model that maps inputs to outputs.
Examples of such data:
• Weather data
• Contract Data
• Financial reporting data
• Clinical trials data
• Social Media posts
• Survey data
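As a minimal sketch of this "given inputs and outputs, learn a model" setup (the numbers below are made up for illustration):

import numpy as np

# Given: example inputs X and outputs y (made-up values).
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# Wanted: a model. Here, fit y ≈ a*x + b by ordinary least squares.
a, b = np.polyfit(X, y, deg=1)
print(f"learned model: y = {a:.2f}*x + {b:.2f}")
print("prediction for x = 5:", a * 5 + b)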
Big Data processing Architectures
• Centralized Processing
• Distributed Processing
• Client Server Architecture
• Cluster Architecture
Advantages of distributed processing include scalability, customization of processing and information management per operation, and parallel processing of data, which reduces latency. A toy illustration of the parallel-processing advantage follows below.
Disadvantages of distributed processing include data redundancy, process redundancy, resource overhead, and sheer data volume.
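The sketch below simulates the parallel-processing advantage locally, with Python worker processes standing in for distributed nodes (the four-way partitioning and the squared-sum workload are illustrative assumptions):

from multiprocessing import Pool

def process_partition(partition):
    # Stand-in for per-node work (e.g., a local aggregation).
    return sum(x * x for x in partition)

if __name__ == "__main__":
    data = list(range(1_000_000))
    partitions = [data[i::4] for i in range(4)]   # split across 4 workers
    with Pool(processes=4) as pool:
        partials = pool.map(process_partition, partitions)
    print(sum(partials))                          # combine partial results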
Big Data Processing Cycle
Big Data Processing Flow
Shared-everything architectures: all processors share a common memory and storage pool.
Shared-nothing architectures: each node has its own processor, memory, and disk, and nodes communicate only over the network.
Requirements for Big Data Infrastructure and Processing Architecture
• Data Processing Requirements:
– Data-model-less architecture
– Micro-batch processing
– Data collection in real time
– Minimal data transformation
– Multi-partition capability
– Efficient data reads
– Sharing data across multiple processing points
– Storing results in a file system or non-relational DBMS
Infrastructure Requirements
• Linear scalability
• High throughput
• Fault tolerance
• Automatic recovery
• Distributed data processing
• High degree of parallelism
• Programming language interfaces