Data Science: October 2021

This document provides an overview of data science and its key components. It discusses how data science evolved from earlier fields like data mining and statistics. The document outlines the different types of data, characteristics of big data, and components of data science like statistics, domain expertise, data engineering, visualization, and advanced computing. It also gives examples of applications of data science in areas like image recognition, gaming, and internet search. The document is intended to provide foundational knowledge about data science.

Uploaded by

Rajachandra Voodiga


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/355170843

Data Science

Presentation · October 2021

DOI: 10.13140/RG.2.2.17701.22240

CITATIONS: 0
READS: 1,001

1 author:

Chitra G Desai
National Defence Academy
59 PUBLICATIONS 110 CITATIONS

All content following this page was uploaded by Chitra G Desai on 11 October 2021.

Foundations of Data Science
Dr Chitra Desai
Professor and Head
Faculty of Computational Science
Introduction
• Data Analysis
• Data Mining
• Statistical Learning
• Knowledge Discovery
• Pattern Discovery
• Big Data

All of these fall under the same umbrella: learning from data.

• Data Analysis is the process of inspecting, cleaning, transforming and modelling
data with the goal of discovering useful information, informing conclusions and
supporting decision-making.
• Statistical Learning theory deals with the problem of finding a predictive function
based on data.
• Data Mining is the process of discovering patterns in large data sets, using
methods at the intersection of machine learning, statistics, and database
systems.
• Knowledge Discovery in Databases (KDD) refers to the broad process
of finding knowledge in data, and emphasizes the "high-level" application of
particular data mining methods. The unifying goal of the KDD process is to extract
knowledge from data in the context of large databases.
• Pattern Discovery is the uncovering of patterns in massive data sets.
• Big Data refers to larger, more complex data sets, especially from new data sources.
These data sets are so voluminous that traditional data processing software
cannot manage them.

Data…
• For example, Google processes 24 petabytes of data per day.
• Facebook processes ten million photos every hour.
• On YouTube, about one hour of video is uploaded every second.
• Twitter sees about 500 million tweets per day.
• In astronomy, for example, satellite data runs to hundreds of
petabytes.
• It is estimated that by 2021 the digital universe will reach 74
zettabytes of data.
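
To get a feel for these volumes, the daily figures above can be converted into per-second rates. The following is a rough sketch only; it assumes decimal (SI) units, that is, 1 petabyte = 10**15 bytes, and the variable names are illustrative.

```python
# Back-of-the-envelope conversion of the daily figures into per-second rates.
# Assumption: decimal (SI) units, i.e. 1 petabyte = 10**15 bytes.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(amount_per_day: float) -> float:
    """Convert a quantity measured per day into a quantity per second."""
    return amount_per_day / SECONDS_PER_DAY


google_bytes_per_day = 24 * 10**15         # 24 petabytes per day
tweets_per_day = 500_000_000               # about 500 million tweets per day
video_hours_per_day = 1 * SECONDS_PER_DAY  # one hour of video uploaded every second

google_gb_per_second = per_second(google_bytes_per_day) / 10**9
tweets_per_second = per_second(tweets_per_day)

print(f"Google: about {google_gb_per_second:.0f} GB per second")
print(f"Twitter: about {tweets_per_second:.0f} tweets per second")
print(f"YouTube: {video_hours_per_day:,} hours of video per day")
```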

Big Data
• Big data is a blanket term for any collection of data sets so large or
complex that it becomes difficult to process them using traditional
data management techniques such as, for example, the RDBMS.
• Data science involves using methods to analyze massive amounts of
data and extract the knowledge it contains.
• The relationship between big data and data science is like the
relationship between crude oil and an oil refinery.
• Data science and big data evolved from statistics and traditional data
management but are now considered to be distinct disciplines.

Vs of Big Data
• The characteristics of big data are often referred to as the five Vs:
• Volume: How much data is there?
• Variety: How diverse are the different types of data?
• Velocity: At what speed is new data generated?
• Veracity: How accurate is the data?
• Value: What is the value of the huge amount of data collected?

• The challenges they bring can be felt in almost every aspect: data capture,
curation, storage, search, sharing, transfer, and visualization. In addition,
big data calls for specialized techniques to extract the insights.

Data Trail

• The fact is that today we are “datafied”.
• Wherever we go, we leave a trail of data.
• Smartphones, for example, track our locations.
• We leave a data trail when we browse the web.
• We also interact a lot with social networks, leaving behind photos, comments, and so on.

Types of Data
• The raw ingredient for data science comes in the form of data.
• It comes in different types and flavours:
• text,
• numbers,
• click streams,
• graphs,
• tables,
• images,
• transactions,
• videos,
• and sometimes all of the above.

Types of Data
• Structured data
Structured data is data that depends on a data model and resides in a fixed field within a record.
Structured data is often stored in tables within databases or Excel files.
• Semi-structured data
Semi-structured data is information that doesn't reside in a relational database but does have
some organizational properties that make it easier to analyze.
Examples: CSV, XML and JSON documents are semi-structured; NoSQL databases are also
considered semi-structured.
• Unstructured data
Unstructured data is data that isn’t easy to fit into a data model because the content is
context-specific or varying.
Examples: a regular email, a paragraph from a book with relevant
information, and social media comments and posts that need to be analyzed.
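
The difference between semi-structured and structured data can be illustrated with a small sketch. It assumes a made-up JSON document whose records do not all share the same keys and maps each record onto a fixed set of fields, as a table would require. The field names (`id`, `name`, `email`) and the helper `to_row` are hypothetical.

```python
import json

# Hypothetical semi-structured input: the records share some keys but not all,
# and one of them nests a value.
raw = """
[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"}},
  {"id": 2, "name": "Ravi", "tags": ["new"]}
]
"""

FIELDS = ["id", "name", "email"]  # the fixed schema of the structured table


def to_row(record: dict) -> dict:
    """Map one semi-structured record onto the fixed fields; missing values become None."""
    return {
        "id": record.get("id"),
        "name": record.get("name"),
        "email": record.get("contact", {}).get("email"),
    }


rows = [to_row(record) for record in json.loads(raw)]

# Print as a simple table: every row has exactly the same columns.
print(" | ".join(FIELDS))
for row in rows:
    print(" | ".join(str(row[field]) for field in FIELDS))
```

Fields that a record does not carry (here the second record's email) have to be filled with an explicit empty value once the data is forced into a fixed schema.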
Types of Data
• Graph-based or network data
Examples of graph-based data can be found on many social media websites. For instance, on
LinkedIn you can see who you know at which company. Your follower list on Twitter is another
example of graph-based data.
• Streaming data
Streaming data is data that is continuously generated by different sources. Such data should be
processed incrementally using stream processing techniques, without having access to all of the
data.
Examples are “What’s trending” on Twitter, live sporting or music events, and the stock market.
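
Incremental processing of a stream can be sketched as follows. This is a minimal illustration, not a stream processing framework: `price_ticks` is a hypothetical stand-in for a live source, and the running mean is updated one value at a time without storing the values already seen.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, keeping only a count and a mean in memory."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count  # incremental update, no history stored
        yield mean


def price_ticks() -> Iterator[float]:
    """Hypothetical source standing in for a live feed such as stock prices."""
    yield from [10.0, 12.0, 11.0, 15.0]


means = list(running_mean(price_ticks()))
print(means)
```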

Data Science
• The term data mining, coined in the mid-1990s, is tending to disappear today.
• Today we talk more about data science.
• All this terminology has been unified under the term data science, which asks:
how do we do science with data?

Data Mining … Data Science
• 1996: data mining, obtaining useful information from data
• 2001: William S. Cleveland took data mining to another level

Computer Science + Data Mining = Data Science

• Solve real company problems using data
• Talk about what industry wants.
• Improve their products based on the data input.

Web 2.0
• Big data (a world of possible insights using data)
• Sophisticated data handling infrastructure
• Parallel computing technology
• MapReduce
• Spark
• Hadoop
• Massive unstructured data sets
• 2010: train machines using a data-driven approach rather than a
knowledge-driven approach
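
The MapReduce idea named in the list above can be sketched in plain Python as a word count. This is a single-process illustration under stated assumptions, not how Hadoop or Spark are actually invoked; the input documents and the function names are made up.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, List, Tuple

documents = ["big data big insight", "data science with data"]  # made-up input


def map_phase(doc: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]


def shuffle(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values: List[int]) -> int:
    """Reduce: combine the values for one key into a single result."""
    return reduce(lambda a, b: a + b, values)


mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = {word: reduce_phase(values) for word, values in shuffle(mapped).items()}
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different machines; only the three-phase structure carries over from this sketch.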

Data Science
• Data Science is the area of study which involves extracting insights
from vast amounts of data by the use of various scientific methods,
algorithms, and processes.
• It helps you to discover hidden patterns from the raw data.
• The term Data Science has emerged because of the evolution of
mathematical statistics, data analysis, and big data.
• Data Science is an interdisciplinary field that allows you to extract
knowledge from structured or unstructured data.
• Data science enables you to translate a business problem into a
research project and then translate it back into a practical solution.

Components of Data Science
• Statistics:
• Statistics is one of the most important components of data science.
• Statistics is a way to collect and analyze large amounts of numerical data
and to find meaningful insights in it.
• Domain Expertise:
• In data science, domain expertise binds data science together.
• Domain expertise means specialized knowledge or skills of a particular area.
• In data science, there are various areas for which we need domain experts.

Components of Data Science
• Data engineering:
• Data engineering is a part of data science, which involves acquiring, storing,
retrieving, and transforming the data.
• Data engineering also includes adding metadata (data about data) to the data.
• Visualization:
• Data visualization means representing data in a visual context so that people
can easily understand the significance of the data.
• Data visualization makes it easy to take in huge amounts of data through visuals.
• Advanced computing:
• Advanced computing does the heavy lifting of data science.
• Advanced computing involves designing, writing, debugging, and maintaining the
source code of computer programs.

Applications of Data Science:

• Image recognition and speech recognition:
• Photo tagging: Facebook
• Device response: Siri, Cortana
• Gaming world:
• EA Sports, Sony and Nintendo widely use data science to enhance the user
experience.
• Internet search:
• Search engines use data science to improve the search
experience, returning results in a fraction of a second.

Applications of Data Science:
• Transport:
• Self-driving cars
• Healthcare:
• Tumor detection, drug discovery, medical image analysis, virtual medical bots,
etc.
• Recommendation systems:
• Amazon, Netflix, Google Play, etc. use data science to improve
the user experience with personalized recommendations.
• Risk detection:
• The finance industry faces fraud and the risk of losses; data science
can help detect and reduce both.

Tools for Data Science

• Following are some tools required for data science:

• Data analysis tools: R, Python, Statistics, SAS, Jupyter, RStudio,
MATLAB, Excel, RapidMiner.
• Data warehousing: ETL, SQL, Hadoop, Informatica/Talend, AWS
Redshift.
• Data visualization tools: R, Jupyter, Tableau, Cognos.
• Machine learning tools: Spark, Mahout, Azure ML Studio.

Challenges of Data science Technology

• A high variety of information and data is required for accurate analysis
• The available data science talent pool is inadequate
• Management does not provide financial support for a data science team
• Data is unavailable or difficult to access
• Data science results are not used effectively by business decision makers
• Explaining data science to others is difficult
• Privacy issues
• Lack of significant domain expertise
• A very small organization cannot have a data science team

About Data
• The data covers nuclear explosions during a specific time period. The file nw was
obtained from the data.world repository https://data.world/datasets/military.
• The data is analyzed here to apply the concepts of Exploratory Data
Analysis and to apply a machine learning algorithm using Python.

Data Information

Data Description

Missing Data

Redundant Data
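
The missing-data and redundant-data checks named in the two headings above can be sketched with the standard library alone (a library such as pandas would be a common alternative). The rows and column names below are made up for illustration; the columns of the actual file are not shown here.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; column names are hypothetical.
sample = io.StringIO(
    "datetime,country,magnitude\n"
    "1945-07-16 05:29:45,AAA,21\n"
    "1949-08-29 00:00:00,BBB,\n"
    "1945-07-16 05:29:45,AAA,21\n"
)
rows = list(csv.DictReader(sample))

# Missing data: count empty cells per column.
missing = Counter()
for row in rows:
    for column, value in row.items():
        if value is None or value.strip() == "":
            missing[column] += 1

# Redundant data: keep the first occurrence of each row, drop exact repeats.
seen = set()
unique_rows = []
for row in rows:
    key = tuple(row.items())
    if key not in seen:
        seen.add(key)
        unique_rows.append(row)

print("missing per column:", dict(missing))
print("rows before/after removing duplicates:", len(rows), len(unique_rows))
```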

Data Format

Data Visualization

EDA : Exploratory Data Analysis
• 1. Our first objective is to identify the range of years for which
data is available.
• For that, let us focus on the field datetime.

We see that the datetime field consists of year, month, date and time.
Our objective for now is to focus only on the year, so let us extract the year
from the field datetime into a new column, year.
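
Extracting the year into its own column might look like the sketch below. It assumes the datetime values are strings of the form "YYYY-MM-DD HH:MM:SS"; the actual layout in the file may differ, in which case the format string needs adjusting. The two records are made up.

```python
from datetime import datetime

# Assumption: datetime values look like "YYYY-MM-DD HH:MM:SS".
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"

records = [  # made-up rows for illustration
    {"datetime": "1945-07-16 05:29:45"},
    {"datetime": "1998-05-30 06:54:57"},
]

for record in records:
    parsed = datetime.strptime(record["datetime"], DATETIME_FORMAT)
    record["year"] = parsed.year  # new column holding only the year

print([record["year"] for record in records])
```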
Observation 1
• It is observed that the data covers the period 1945 to 1998.
• It is also observed that no explosions were carried out during
1947, 1950, 1959 and 1997.
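
An observation of this kind can be derived from the year column as sketched below. The list of years is made up and much shorter than the real data; only the technique (minimum, maximum, and a set difference for the gap years) is the point.

```python
# Made-up list of years standing in for the extracted year column.
years = [1945, 1946, 1946, 1948, 1949, 1951]

first_year, last_year = min(years), max(years)

# Years inside the covered period that never appear in the data.
years_without_records = sorted(set(range(first_year, last_year + 1)) - set(years))

print(f"data covers {first_year} to {last_year}")
print("years with no records:", years_without_records)
```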

2. Year-wise Explosion Data
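
A year-wise count, as named in the heading above, could be computed along these lines. The years are made-up values and the text bar is only a crude stand-in for a chart.

```python
from collections import Counter

years = [1945, 1945, 1946, 1948, 1948, 1948]  # made-up values

explosions_per_year = Counter(years)

# Print in chronological order with a crude text bar per year.
for year in sorted(explosions_per_year):
    count = explosions_per_year[year]
    print(year, count, "#" * count)
```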

Data Visualization

3. Identify the number of countries that carried out explosions from 1945 to 1998.
Also find the country that conducted the most explosions.
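
Both questions can be answered by counting rows per country, as in this sketch. The country labels are placeholders, not values from the actual file.

```python
from collections import Counter

countries = ["A", "B", "A", "C", "A", "B"]  # made-up country labels, one per record

explosions_per_country = Counter(countries)

number_of_countries = len(explosions_per_country)
top_country, top_count = explosions_per_country.most_common(1)[0]

print("countries:", number_of_countries)
print("most explosions:", top_country, top_count)
```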

Data products
• At the H2O World conference in the Bay Area on 11 November 2015,
Hilary Mason emphasized that the creation of “data products”
requires three components:
• data (of course)
• plus technical expertise (machine learning)
• plus people and process (talent).
• Google Maps is a great example of a data product that epitomizes all
three of these qualities.

• Hilary Mason is an American data scientist and the founder of the technology startup Fast Forward Labs, as
well as Data Scientist in Residence at Accel Partners.

Fact
• Data is meaningless without context
• People are natural analysts

SPF 15 Vs SPF 30

Ways to represent…

Learning

Data scientist – Interpret and Guide

Data Science Hierarchy of Needs

From top to bottom:
• AI and Deep Learning
• Learn / Optimize: A/B Testing, Experimentation, Simple ML problems
• Aggregate / Label: Analytics, Metric, Segments, Aggregates, Features, Training Data
• Explore / Transform: Cleaning, Anomaly Detection, Prep
• Move / Store: Reliable data flow, Infrastructure, ETL, Structured and unstructured data storage
• Collect: Instrumentation, Logging, Sensors, External data, User Generated Contents

Questions?
Thank You
