
21CS2213RA

AI for Data Science

Session - 19

Contents: Data Science - An Introduction

Session Objective
• An ability to understand Data Science.

• An ability to understand the real-life applications and uses of Data Science.
Data Science

• Data science combines the scientific method, math and statistics, specialized programming, advanced analytics, AI, and even storytelling to uncover and explain the business insights buried in data.

• Data science is a multidisciplinary approach to extracting actionable insights from the large and ever-increasing volumes of data collected and created by organizations.

• Data science is all about using data to solve problems.


Cont.

Data science is:

 preparing data for analysis and processing.

 performing advanced data analysis.

 presenting the results to reveal patterns and enable stakeholders to draw informed conclusions.
Cont.

Data science enables businesses to process huge amounts of structured and unstructured big data to detect patterns.
Data science lifecycle

• The data science lifecycle is also called the data science pipeline. The following steps are involved in the data science life cycle.

 Step 1: Define Problem Statement: Creating a well-defined problem statement is the first and most critical step in data science.

 Step 2: Data Collection: Collect the data that can help solve the problem through a systematic approach.

 Step 3: Data Quality Check and Remediation: Ensure that the data used for analysis and interpretation is of good quality.
Cont.

 Step 4: Exploratory Data Analysis: Before you model the steps to arrive at a solution, it is important to analyse the data.

 Step 5: Data Modelling: Modelling means formulating every step and gathering the techniques required to achieve the solution.

 Step 6: Data Communication: This is the final step, where you present the results of your analysis to the stakeholders. You explain how you arrived at a specific conclusion and your critical findings (a minimal code sketch of these steps follows below).
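To make the lifecycle concrete, here is a minimal, hedged Python sketch of Steps 1 to 6. The file name sales.csv, its target column, and the choice of a linear regression model are illustrative assumptions only, not part of these slides.

# Minimal sketch of the data science lifecycle (Steps 1-6).
# Assumes a hypothetical "sales.csv" file with a numeric "target" column.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Step 1: Define the problem statement, e.g.
# "Can we predict next month's sales (target) from the other columns?"

# Step 2: Data collection.
df = pd.read_csv("sales.csv")

# Step 3: Data quality check and remediation.
df = df.drop_duplicates()
df = df.dropna()                      # simple remediation: drop rows with missing values

# Step 4: Exploratory data analysis.
print(df.describe())                  # ranges, distributions, obvious outliers
print(df.corr(numeric_only=True))     # correlations between columns

# Step 5: Data modelling.
X = df.drop(columns=["target"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

# Step 6: Data communication.
print(f"R^2 on held-out data: {r2_score(y_test, model.predict(X_test)):.2f}")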
Cont.

Data Science life cycle


Cont. (Given by IBM)

• The data science lifecycle includes anywhere from five to sixteen steps.

• The processes common to just about everyone's definition of the lifecycle include the following:

 Capture: This is the gathering of raw structured and unstructured data from all relevant sources via just about any method.

 Prepare and maintain: This involves putting the raw data into a consistent format for analytics, machine learning, or deep learning models.
Cont.

 Preprocess or process: Examine biases, patterns, ranges, and distributions of values within the data to determine the data's suitability for use with predictive analytics, machine learning, and/or deep learning algorithms.

 Analyze: This is where you perform statistical analysis, predictive analytics, regression, machine learning and deep learning algorithms, and more to extract insights from the prepared data.
Cont.

 Communicate: Finally, the insights are presented as reports, charts, and other data visualizations that make the insights—and their impact on the business—easier for decision-makers to understand.
Types of data

• We always need to look at what types of data are involved:

 Known Data

 Unknown Data

 Others’ decisions

 Your decisions
Data Science Tools

• To build and run code in order to create models, the most popular programming
languages are open-source tools that include or support pre-built statistical, machine
learning and graphics capabilities. These languages include:

 R: An open-source programming language and environment for statistical computing and graphics.

 Python: A general-purpose, object-oriented, high-level programming language that emphasizes code readability through its distinctive, generous use of white space (see the short example below).
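As a quick, hedged illustration of why Python is popular here, the snippet below uses the widely available pandas and matplotlib libraries for pre-built statistics and graphics; the file name data.csv and the age column are hypothetical placeholders, not from these slides.

# Illustrative only: summary statistics and a simple chart in Python.
# "data.csv" and its "age" column are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")      # load a tabular dataset
print(df.describe())              # pre-built statistical summary

df["age"].hist(bins=20)           # pre-built graphics capability
plt.title("Distribution of age")
plt.xlabel("age")
plt.ylabel("count")
plt.show()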
Cont.

 SQL Analysis Services: Used to perform in-database analytics using common data mining functions and basic predictive models.

 SAS/ACCESS: Can be used to access data from Hadoop and is used for creating repeatable and reusable model flow diagrams.
SAS: Statistical Analysis System
Data Science Applications

 Identifying and predicting disease

 Personalized healthcare recommendations

 Optimizing shipping routes in real-time

 Getting the most value out of soccer rosters

 Finding the next slew of world-class athletes

 Stamping out tax fraud

 Automating digital ad placement

 Algorithms that help you find love

 Predicting incarceration rates


Big Data

• Big data is a collection of massive and complex data sets with very large data volumes.

• It includes huge quantities of data, data management capabilities, social media analytics and real-time data.

• Big data is about data volume, with large data sets measured in terms of terabytes or petabytes.

• The practice of examining big data has given rise to big data analytics.

• Big data analytics is the process of examining large amounts of data.


5 Vs in Big Data

• Doug Laney introduced the concept of the 3 Vs of Big Data, viz. Volume, Variety, and Velocity.

Volume: refers to the amount of data that is being collected (the data could be structured or unstructured).

Velocity: refers to the rate at which data is coming in.

Variety: refers to the different kinds of data (data types, formats, etc.) that are coming in for analysis.
Cont.

Over the last few years, 2 additional Vs of data have also emerged, i.e. value and veracity.

Value refers to the usefulness of the collected data.

Veracity refers to the quality of data that is coming in from different sources.
Types of Data Science
Data Analytics

• Data analytics is the science of analyzing raw data to draw conclusions about that information.

• The techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data for human consumption.

• Data analytics helps a business optimize its performance (a small illustrative example follows below).
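As a small, hedged illustration of turning raw data into an actionable conclusion, the snippet below aggregates a hypothetical sales table; the file store_sales.csv and its columns (region, revenue) are placeholders, not from these slides.

# Illustrative only: a tiny analytics step from raw data to a conclusion.
# "store_sales.csv" and its columns ("region", "revenue") are hypothetical.
import pandas as pd

sales = pd.read_csv("store_sales.csv")

# Aggregate raw rows into a per-region summary.
by_region = sales.groupby("region")["revenue"].sum().sort_values(ascending=False)
print(by_region)

# A conclusion a stakeholder can act on: best- and worst-performing regions.
print(f"Top region: {by_region.index[0]}, lowest region: {by_region.index[-1]}")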


Data Science and Data Analytics (Two sides of the same coin)

• Data science is an umbrella term that encompasses data analytics, data mining, machine learning, and several other related disciplines.
 Data Science and Data Analytics utilize data in different ways.

 Data Science and Data Analytics deal with Big Data, each
taking a unique approach.
 Data analytics is mainly concerned with Statistics,
Mathematics, and Statistical Analysis.
Cont.

 Data Science focuses on finding meaningful correlations between large datasets.

 Data Analytics is designed to uncover the specifics of extracted insights.

Note: Data Analytics is a branch of Data Science that focuses on more specific answers to the questions that Data Science brings forth.
Key Points

• Data science and data analytics are both ways of understanding big data, and both often involve analyzing massive databases using R and Python.

• SAS/ACCESS engines are tightly integrated and used by all SAS solutions for third-
party data integration, supported integration standards include ODBC, JDBC, Spark
SQL (on SAS Viya) and OLE DB.

• Internet users generate about 2.5 quintillion bytes of data every day. By 2020, every person on Earth was projected to generate about 146,880 MB of data every day, and by 2025 the total will be about 165 zettabytes every year.
Lab/Skilling

Case Study: Diabetes Prevention

What if we could predict the occurrence of diabetes and take appropriate measures
beforehand to prevent it?
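One possible way to approach this case study is sketched below, under stated assumptions: it uses a local file diabetes.csv (for example, the publicly available Pima Indians diabetes data) with a binary Outcome column and a simple logistic regression model; the file, columns, and model choice are illustrative, not prescribed by the slides.

# Hedged sketch for the diabetes-prevention case study.
# Assumes a local "diabetes.csv" (e.g. the Pima Indians dataset) with a
# binary "Outcome" column (1 = diabetic, 0 = not diabetic).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

df = pd.read_csv("diabetes.csv")

X = df.drop(columns=["Outcome"])
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then fit a simple, interpretable classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Report how well we predict diabetes on unseen patients.
print(classification_report(y_test, model.predict(X_test)))

In practice, patients predicted to be at high risk could then be flagged for preventive screening or lifestyle interventions.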
Conclusion

• We should be careful and not directly link data analytics and data science to artificial
intelligence and machine learning.

• There are different types of data to consider when we face a complex problem with
lots of data.

• Apache Spark, Tableau, Snowflake, Google's machine learning stack (TensorFlow), NLP training, and deep learning experience are also part of the data science toolkit.
Placement Related/Industry Oriented

• Data preparation and analysis are the most important data science skills, but data
preparation alone typically consumes 60 to 70 percent of a data scientist’s time.

• By 2020, there will be around 40 zettabytes of data; that's 40 trillion gigabytes.
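For clarity, the unit conversion behind this figure: 1 ZB = 10^21 bytes = 10^12 GB, so 40 ZB = 40 × 10^12 GB, i.e. 40 trillion gigabytes.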

• The amount of data that exists grows exponentially.

• At any given time, about 90 percent of all existing data was generated within the most recent two years, according to sources like IBM and SINTEF.

• This means there is a huge amount of work in data science.


References

• https://www.ibm.com/cloud/learn/data-science-introduction
• https://www.edureka.co/blog/what-is-data-science/
• https://towardsdatascience.com/intro-to-data-science-531079c38b22
• https://www.omnisci.com/learn/data-science
• https://www.edureka.co/blog/data-science-applications/
Next Class Topic

In the next class, I will cover the following topics:

 Data pre-processing
 Feature extraction techniques
Thank you
