Data Science Introduction

Uploaded by

guruvarshniganesapandi


Data Science is a combination of multiple disciplines that uses statistics,
data analysis, and machine learning to analyze data and to extract
knowledge and insights from it.

What is Data Science?

Data Science is about data gathering, analysis and decision-making.

Data Science is about finding patterns in data through analysis, and using
those patterns to make future predictions.

By using Data Science, companies are able to make:

• Better decisions (should we choose A or B?)
• Predictive analysis (what will happen next?)
• Pattern discoveries (find patterns, or maybe hidden information, in the
  data)

Data Science enables companies to make sense of very large amounts of data from
multiple sources and to derive valuable insights for smarter, data-driven decisions.
Data Science is widely used in various industry domains, including marketing,
healthcare, finance, banking, policy work, and more.

Where is Data Science Needed?

Data Science is used in many industries in the world today, e.g. banking,
consultancy, healthcare, and manufacturing.

Examples of where Data Science is needed:

• For route planning: to discover the best routes to ship
• To foresee delays for flights, ships, trains, etc. (through predictive
  analysis)
• To create promotional offers
• To find the best-suited time to deliver goods
• To forecast the next year's revenue for a company
• To analyze the health benefits of training
• To predict who will win elections

Data Science can be applied in nearly every part of a business where data is
available. Examples are:

• Consumer goods
• Stock markets
• Industry
• Politics
• Logistics companies
• E-commerce

How Does a Data Scientist Work?

A Data Scientist requires expertise in several fields:

• Machine Learning
• Statistics
• Programming (Python or R)
• Mathematics
• Databases

A Data Scientist must find patterns within the data. Before they can find the
patterns, they must organize the data in a standard format.

Here is how a Data Scientist works:

1. Ask the right questions - To understand the business problem.
2. Explore and collect data - From databases, web logs, customer
   feedback, etc.
3. Extract the data - Transform the data to a standardized format.
4. Clean the data - Remove erroneous values from the data.
5. Find and replace missing values - Check for missing values and
   replace them with a suitable value (e.g. an average value).
6. Normalize data - Scale the values to a practical range (e.g. 140 cm is
   smaller than 1.8 m, yet the number 140 is larger than 1.8, so
   scaling is important).
7. Analyze data, find patterns and make future predictions.
8. Represent the result - Present the result with useful insights in a way
   the "company" can understand.
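
Steps 3 to 6 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the height values are hypothetical (built around the 140 cm versus 1.8 m example), the names `raw_heights` and `to_cm` are made up for this sketch, and min-max scaling is just one possible way to normalize.

```python
from statistics import mean

# Hypothetical raw heights: mixed units, one missing value (None),
# and one clearly erroneous value (-5 cm).
raw_heights = [(140, "cm"), (1.8, "m"), (None, "cm"), (-5, "cm"), (165, "cm")]


def to_cm(value, unit):
    """Convert a height to centimetres so all values share one unit."""
    return value * 100 if unit == "m" else value


# Step 3: transform to a standardized format (everything in cm).
heights = [None if v is None else to_cm(v, u) for v, u in raw_heights]

# Step 4: clean the data by dropping erroneous values (heights must be positive).
heights = [h for h in heights if h is None or h > 0]

# Step 5: replace missing values with the average of the known values.
known = [h for h in heights if h is not None]
average = mean(known)
heights = [average if h is None else h for h in heights]

# Step 6: normalize with min-max scaling, mapping the values into 0..1.
low, high = min(heights), max(heights)
scaled = [(h - low) / (high - low) for h in heights]

print(heights)
print(scaled)
```

After the conversion in step 3, 1.8 m becomes 180 cm and correctly compares as larger than 140 cm, which is the point of the example in step 6.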

What is Data?

Data is a collection of information.

One purpose of Data Science is to structure data, making it interpretable and
easy to work with.

Data can be categorized into two groups:

• Structured data
• Unstructured data

Unstructured Data

Unstructured data is not organized. We must organize the data for analysis
purposes.

Structured Data

Structured data is organized and easier to work with.
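
To make the difference concrete, here is a minimal sketch that turns unstructured text into structured records. The free-text notes and the regular expression are hypothetical and written only for this illustration; real unstructured data is rarely this regular.

```python
import re

# Hypothetical free-text notes (unstructured): the facts are present,
# but not in a form that can be analyzed directly.
notes = [
    "Trained for 30 minutes, average pulse 80, max pulse 120.",
    "Trained for 45 minutes, average pulse 90, max pulse 130.",
]

# Assumed pattern that matches the wording of the hypothetical notes above.
pattern = re.compile(
    r"(?P<Duration>\d+) minutes, average pulse (?P<Average_Pulse>\d+), "
    r"max pulse (?P<Max_Pulse>\d+)"
)

# Structured form: one record per note, with named numeric fields.
records = []
for note in notes:
    match = pattern.search(note)
    if match:
        records.append({name: int(value) for name, value in match.groupdict().items()})

print(records)
```

Each record now has the same named fields, so the values can be compared, sorted, or averaged.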

How to Structure Data?

We can use an array or a database table to structure or present data.

Example of an array:
[80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

The following example shows how to create an array in Python:

Example

# Store the values in a Python list and print it
Array = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
print(Array)

It is common to work with very large data sets in Data Science.

In this tutorial we will try to make it as easy as possible to understand the
concepts of Data Science. We will therefore work with a small data set that is
easy to interpret.

Database Table

A database table is a table with structured data.

The following table shows a database table with health data extracted from a
sports watch:

Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work
30        80             120        240              10
30        85             120        250              10
45        90             130        260              8
45        95             130        270              8
45        100            140        280              0
60        105            140        290              7
60        110            145        300              7
60        115            145        310              8
75        120            150        320              0
75        125            150        330              8

This dataset contains information about a typical training session, such as
duration, average pulse, and calorie burnage.
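
The same table can be held in Python. The sketch below assumes a column-oriented layout (one list per variable); the name `table` is only illustrative, and a real project might use a database or a data-frame library instead.

```python
# The sports-watch table, stored as one list per column (variable).
table = {
    "Duration":        [30, 30, 45, 45, 45, 60, 60, 60, 75, 75],
    "Average_Pulse":   [80, 85, 90, 95, 100, 105, 110, 115, 120, 125],
    "Max_Pulse":       [120, 120, 130, 130, 140, 140, 145, 145, 150, 150],
    "Calorie_Burnage": [240, 250, 260, 270, 280, 290, 300, 310, 320, 330],
    "Hours_Work":      [10, 10, 8, 8, 0, 7, 7, 8, 0, 8],
}

# Print the header line, then one line per training session.
print(" ".join(table))
for row in zip(*table.values()):
    print(" ".join(str(value) for value in row))
```

Here `zip(*table.values())` walks the columns in parallel, so each step yields the values of one training session.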

Database Table Structure

A database table consists of column(s) and row(s):

        Column 1  Column 2       Column 3   Column 4         Column 5
        Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work
Row 1   30        80             120        240              10
Row 2   30        85             120        250              10
Row 3   45        90             130        260              8
Row 4   45        95             130        270              8
Row 5   45        100            140        280              0
Row 6   60        105            140        290              7
Row 7   60        110            145        300              7
Row 8   60        115            145        310              8
Row 9   75        120            150        320              0
Row 10  75        125            150        330              8

A row is a horizontal representation of data.

A column is a vertical representation of data.
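
As a small illustration, the sketch below reads one row and one column from the first three rows of the table. The names `header`, `rows`, `row_3`, and `average_pulse` are chosen for this example only.

```python
# First three rows of the table above, stored as a list of rows.
header = ["Duration", "Average_Pulse", "Max_Pulse", "Calorie_Burnage", "Hours_Work"]
rows = [
    [30, 80, 120, 240, 10],
    [30, 85, 120, 250, 10],
    [45, 90, 130, 260, 8],
]

# A row is read horizontally: all values of one training session.
row_3 = rows[2]

# A column is read vertically: the same variable across all sessions.
col = header.index("Average_Pulse")
average_pulse = [row[col] for row in rows]

print(row_3)          # [45, 90, 130, 260, 8]
print(average_pulse)  # [80, 85, 90]
```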

Variables

A variable is defined as something that can be measured or counted.

Examples can be characters, numbers or time.

In the example below, we can observe that each column represents a variable.

Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work
30        80             120        240              10
30        85             120        250              10
45        90             130        260              8
45        95             130        270              8
45        100            140        280              0
60        105            140        290              7
60        110            145        300              7
60        115            145        310              8
75        120            150        320              0
75        125            150        330              8

There are 5 columns, meaning that there are 5 variables (Duration,
Average_Pulse, Max_Pulse, Calorie_Burnage, Hours_Work).

There are 11 rows, meaning that each variable has 10 observations.

But if there are 11 rows, how come there are only 10 observations?

It is because the first row holds the labels, that is, the names of the
variables.
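
The count can be checked with a short sketch. It assumes the table shown above is available as plain text with the label row first; the variable names are illustrative.

```python
# The table as plain text: the first line holds the variable names.
text = """\
Duration Average_Pulse Max_Pulse Calorie_Burnage Hours_Work
30 80 120 240 10
30 85 120 250 10
45 90 130 260 8
45 95 130 270 8
45 100 140 280 0
60 105 140 290 7
60 110 145 300 7
60 115 145 310 8
75 120 150 320 0
75 125 150 330 8
"""

lines = text.splitlines()
labels = lines[0].split()   # label row: the names of the variables
observations = lines[1:]    # remaining rows: the measurements

print(len(lines))         # 11 rows in total
print(len(labels))        # 5 variables in the table as shown
print(len(observations))  # 10 observations per variable
```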

Python is one of the most widely used programming languages for data science today.
It is an open-source, easy-to-use language that was first released in 1991.
