Module 1

The document outlines a Data Science course, detailing course outcomes, syllabus modules, and key concepts in data science, including data collection, management, and analysis. It emphasizes the importance of data science in various industries and its applications, such as personalized healthcare and optimizing food delivery. Additionally, it distinguishes between data science and machine learning, explaining their respective roles and components.

Uploaded by

shindetrupti1507

OE-Data Science

6OE371
Course Outcomes (COs) with Bloom’s Taxonomy Level
• CO1 (Understanding): To acquaint core concepts and technologies in Data Science.
• CO2 (Applying): Demonstrate data collection and management using different technologies.
• CO3 (Applying): Study the key concepts in data science, including their real-world applications and toolkits used by data scientists.
• CO4 (Analysing): Analyse and interpret large data sets in the context of real-world problems.
Syllabus
• Module I: Introduction to core concepts and technologies (4 hours)
Introduction, Terminology, data science process, data science toolkit, Types of data, Example applications.
• Module II: Data Collection and Management (7 hours)
Introduction, Sources of data, Data collection, Exploring and fixing data, Data storage and management, Using multiple data sources.
• Module III: Data Pre-processing (8 hours)
Data Cleaning, Data Integration, Data Reduction, Data Transformation and Data Discretization.
• Module IV: Data Visualization (6 hours)
Introduction, Types of data visualization, Data for visualization: Data types, Data encodings, Retinal variables, Mapping variables to encodings, visual encodings.
• Module V: Data Analysis (8 hours)
Introduction, Terminology and concepts, Introduction to statistics, Central tendencies and distributions, Variance, Distribution properties and arithmetic, Samples/CLT, Correlation, Linear Regression, Least Squares, Residuals, Regression Inference, classification, classifiers.
• Module VI: Recent trends (6 hours)
Recent trends in various data collection and analysis techniques, various visualization techniques, Case Study, application development methods used in data science.
Introduction
• What is Data Science?
• Why Data Science?
• Components of DS
• Difference between ML and Data science
• Applications of Data Science.
What is Data Science?

• Data science is the in-depth study of massive amounts of data. It
involves extracting meaningful insights from raw, structured, and
unstructured data, which is processed using the scientific method,
different technologies, and algorithms.
• Data science combines math and statistics, specialized programming,
advanced analytics, artificial intelligence (AI), and machine learning
with specific subject matter expertise to uncover actionable insights
hidden in an organization’s data.
What is Data Science?

• Data science is the study of data to extract meaningful insights for
business.
• It works with vast volumes of data, using modern tools and techniques
to find unseen patterns, derive meaningful information, and make
business decisions.
Why Data Science?
• The purpose of data science is to find patterns.
• Data science enables businesses to interact with their customers.
• Products and businesses can better connect with their customers
when they use data science.
• Industries can quickly examine their problems and successfully
address them using data science.
• How data is used can determine whether a product succeeds or fails.
• It gives management and officials the ability to foster new ideas.
• It enables an improved user experience.
Components of DS
Difference between DS and ML

Data Science: Data science is a field of computer science that extracts useful information from structured, unstructured, and semi-structured data.
Machine Learning: Machine learning is a subset of artificial intelligence that helps make computers capable of predicting outcomes based on training from old data/experience.

Data Science: It primarily deals with data.
Machine Learning: It uses data to learn from it and predict insights or results.

Data Science: Data in data science may or may not have originated from a machine or mechanical process.
Machine Learning: It includes various techniques such as supervised, unsupervised, semi-supervised and reinforcement learning, regression, clustering, etc.

Data Science: It is broadly used as a multidisciplinary term.
Machine Learning: It is used in data science.

Data Science: It includes various data operations such as cleaning, collection, manipulation, etc.
Machine Learning: It includes operations such as data preparation, data analysis, training the model, etc.

Data Science: It requires knowledge of various analytical functions and a basic understanding of machine learning and artificial intelligence.
Machine Learning: It needs advanced knowledge of data modelling.

Data Science: It requires strong knowledge of Python, R, SAS, Scala, as well as hands-on knowledge of SQL databases.
Machine Learning: It requires knowledge of programming languages like Java, Python, R as well as in-depth knowledge of mathematical concepts such as probability and statistics.
Applications of Data Science
• Image recognition and speech recognition
• Gaming world
• Internet search
• Healthcare
• Recommendation systems
• Risk detection
Data Science Life Cycle / process
Data science toolkit
Types of Data
Qualitative data/Categorical data
• Qualitative or categorical data describes the object under consideration
using a finite set of discrete classes.
• This type of data cannot easily be counted or measured using numbers
and is therefore divided into categories.
• The gender of a person (male, female, or others) is a good example of
this data type.
• Such data is usually extracted from audio, images, or text.
• Another example is a smartphone brand, which provides information
such as the current rating, the color of the phone, the category of the
phone, and so on.
• All this information can be categorized as qualitative data.
Nominal

• These are sets of values that don’t possess a natural ordering.
• For example, the color of a smartphone is nominal because we can’t
rank one color against the others.
• It is not possible to state that ‘Red’ is greater than ‘Blue’.
• The gender of a person is another example: male, female, and others
cannot be ranked against one another.
• Nominal data types in statistics are not quantifiable and cannot be
measured in numerical units.
• Nominal data is valuable in qualitative research because it extends
freedom of opinion to subjects.
Ordinal
• These types of values have a natural ordering while maintaining their class of values.
• For example, the sizes offered by a clothing brand can easily be sorted by their name
tags in the order small < medium < large.
• The grading system used to mark candidates in a test is also an ordinal data type,
where an A+ is definitely better than a B grade.
• These categories help us decide which encoding strategy can be applied to which
type of data.
• Data encoding for qualitative data is important because machine learning models
can’t handle these values directly. The values need to be converted to numerical
types because the models are mathematical in nature.
• For the nominal data type, where there is no comparison among the categories,
one-hot encoding can be applied. It is similar to binary coding and works well when
the number of categories is small. For the ordinal data type, label encoding can be
applied, which is a form of integer encoding.
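The two encoding strategies mentioned above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the sample values and the helper names one_hot_encode and ordinal_encode are assumptions made up for this example.

```python
# Hypothetical sample data, chosen only to illustrate the two encodings.
colors = ["Red", "Blue", "Green", "Blue"]      # nominal: no natural order
sizes = ["medium", "small", "large", "small"]  # ordinal: small < medium < large


def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    # Column order carries no meaning for nominal data; sorting only makes
    # the output reproducible.
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


def ordinal_encode(values, order):
    """Map each value to its position in the given order (an integer code)."""
    rank = {category: index for index, category in enumerate(order)}
    return [rank[value] for value in values]


color_columns, color_rows = one_hot_encode(colors)
size_codes = ordinal_encode(sizes, order=["small", "medium", "large"])

print(color_columns)  # ['Blue', 'Green', 'Red']
print(color_rows)     # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(size_codes)     # [1, 0, 2, 0]
```

The integer codes preserve the ordering small < medium < large, which is why integer encoding suits ordinal data. Applying the same codes to colors would suggest an ordering that does not exist.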
Quantitative Data Type

• This data type quantifies things using numerical values, which makes
it countable or measurable in nature.
• The price of a smartphone, the discount offered, the number of ratings
on a product, the processor frequency of a smartphone, and the RAM of
that particular phone all fall under the category of quantitative
data types.
• The key thing is that there can be an infinite number of values a
feature can take.
• For instance, the price of a smartphone can vary from x amount to any
value and it can be further broken down based on fractional values.
• Interval-scaled attributes
• Type of numerical attribute where the difference between two values
is meaningful.
• The term "interval scale" refers to an ordered series of numbers
where the difference between the values is consistent, but the zero
point is not truly meaningful.
• Interval variables can be added and subtracted, providing meaningful
results.
• For example, temperature, as measured in degrees Celsius or
Fahrenheit, is an interval variable.
• If it is 20 degrees today and 30 degrees tomorrow, it is correct to say
that tomorrow is 10 degrees hotter than today.
• Ratio-scaled attributes
• Ratio variables are a type of numerical attribute where the difference
between two values is meaningful and there is a true "zero" point,
which denotes the absence of the quantity.
• This zero point allows for the comparison of values through
multiplication or division, unlike interval-scaled attributes.
• Examples of ratio variables include age, salary, and height.
• In these examples, a value of 0 signifies the absence of the quantity: 0
years old means no age or not born yet, a salary of $0 means no
income, and a height of 0 cm signifies no height.
• If person A is 20 years old and person B is 40 years old, it's correct to
say that person B is twice as old as person A.
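The difference between interval and ratio scales can be checked with a little arithmetic. The sketch below reuses the temperatures and ages from the examples above. The conversion helpers are written for this illustration and use the standard Celsius to Fahrenheit and Celsius to kelvin formulas.

```python
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius):
    return celsius + 273.15


today_c, tomorrow_c = 20.0, 30.0

# Differences are meaningful on an interval scale.
difference_c = tomorrow_c - today_c
print(difference_c)  # 10.0

# Ratios are not: the "ratio" changes with the unit, because the zero
# points of Celsius and Fahrenheit are arbitrary.
naive_ratio_c = tomorrow_c / today_c
ratio_f = celsius_to_fahrenheit(tomorrow_c) / celsius_to_fahrenheit(today_c)
ratio_k = celsius_to_kelvin(tomorrow_c) / celsius_to_kelvin(today_c)
print(naive_ratio_c)      # 1.5
print(round(ratio_f, 3))  # 1.265
print(round(ratio_k, 3))  # 1.034

# Age has a true zero, so the ratio is the same in any unit.
age_a_years, age_b_years = 20, 40
ratio_years = age_b_years / age_a_years
ratio_months = (age_b_years * 12) / (age_a_years * 12)
print(ratio_years, ratio_months)  # 2.0 2.0
```

Only the kelvin figure is a physically meaningful temperature ratio, since the Kelvin scale has a true zero.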
Discrete
• Discrete data is a type of numerical data that only takes specific or
'discrete' values and cannot be meaningfully subdivided into smaller
increments. This often corresponds to items or events that are countable.
• Examples of discrete data include:
• The number of pets a person has. You can have 2 dogs or 3 dogs, but it
doesn't make sense to have 2.7 dogs.
• The number of cars in a parking lot. You can have 10, 20, or 30 cars, but
not 22.5 cars.
• The number of students in a class. You can't have a fraction of a student.
Continuous
• Continuous data is a type of numerical data that can take on any value within
a certain range.
• Continuous data can be meaningfully subdivided into finer and finer
increments, depending on the precision of the measurement system.
• Examples of continuous data include:
• The height of people. You can be 170.18 cm or 170.19 cm tall, or any height in
between.
• The time it takes to run a marathon. It could be 3 hours, 45 minutes, 30.2
seconds, or 3 hours, 45 minutes, 30.3 seconds, or any time in between.
• The weight of a bag of apples. It could be 1.5 kg, 1.51 kg, 1.515 kg, and so on,
depending on how precise your scale is.
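One practical consequence of the discrete/continuous distinction is how the data is summarized. Discrete values can be counted exactly, while continuous values are usually grouped into intervals first. The sketch below assumes made-up sample values, a hypothetical helper bin_values, and an arbitrarily chosen bin width of 5 cm.

```python
from collections import Counter

# Hypothetical samples for illustration only.
pets_per_person = [0, 2, 1, 2, 3, 0, 2]              # discrete: whole counts
heights_cm = [170.18, 170.19, 165.4, 181.0, 174.95]  # continuous: any value in a range

# Discrete data: count how often each exact value occurs.
pet_counts = Counter(pets_per_person)


def bin_values(values, bin_width):
    """Count values per half-open interval [start, start + bin_width)."""
    bins = Counter()
    for value in values:
        start = int(value // bin_width) * bin_width
        bins[start] += 1
    return dict(sorted(bins.items()))


# Continuous data: exact values rarely repeat, so group them into bins.
height_bins = bin_values(heights_cm, bin_width=5)

print(pet_counts[2])  # 3
print(height_bins)    # {165: 1, 170: 3, 180: 1}
```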
Exercise
Select the measurement scale Nominal, Ordinal, Interval or Ratio for
each scenario.
• A person’s age.
• A person’s race.
• Age groupings (baby, toddler, adolescent, teenager, adult, elderly).
• Clothing brand.
• A person’s IQ score.
• Temperature in degrees Celsius.
• The amount of mercury in a tuna fish.
Exercise
• Select the measurement scale Nominal, Ordinal, Interval or Ratio for each
scenario.
• Temperature in degrees Kelvin.
• Eye color.
• Year in school (freshman, sophomore, junior, senior).
• The weight of a hummingbird.
• The height of a building.
• The amount of iron in a person’s blood.
• A person’s gender.
• A person’s race.
Exercise
• State which type of variable each is, qualitative or quantitative?
• A person’s age.
• A person’s gender.
• The amount of mercury in a tuna fish.
• The weight of an elephant.
• Temperature in degrees Fahrenheit.
• State which type of variable each is, qualitative or quantitative?
• The height of a giraffe.
• A person’s race.
• Hair color.
• A person’s ethnicity.
• Year in school (freshman, sophomore, junior, senior).
Exercise
• State whether the variable is discrete or continuous.
• A person’s weight.
• The height of a building.
• A person’s age.
• The number of floors of a skyscraper.
• The number of clothing items available for purchase.
• State whether the variable is discrete or continuous.
• Temperature in degrees Celsius.
• The number of cars for sale at a car dealership.
• The time it takes to run a marathon.
• The amount of mercury in a tuna fish.
• The weight of a hummingbird.
Real life applications of data science
• PERSONALIZING TREATMENT PLANS
• Oncora’s software uses machine learning to create personalized
recommendations for current cancer patients based on data from
past ones. Healthcare facilities using the company’s platform include
UT Health San Antonio and Scripps Health. Their radiology team
collaborated with Oncora data scientists to mine 15 years’ worth of
data on diagnoses, treatment plans, outcomes and side effects from
more than 50,000 cancer records. Based on this data, Oncora’s
algorithm learned to suggest personalized chemotherapy and
radiation regimens.
Real life applications of data science
• OPTIMIZING FOOD DELIVERY
• The data scientists at UberEats have a fairly simple goal:
getting hot food delivered quickly. Making that happen across the
country, though, takes machine learning, advanced statistical
modeling and staff meteorologists. To optimize the full delivery
process, the team has to predict how every possible variable, from
storms to holiday rushes, will impact traffic and cooking time.
Real life applications of data science
• TRACKING PHYSICAL DATA FOR ATHLETES
• WHOOP makes wearable devices that track athletes’ physical data like
resting heart rate, sleep cycle and respiratory rate. The goal is to help
athletes understand when to push their training and when to rest —
and to make sure they’re taking the necessary steps to get the most
out of their body. Professional athletes like Olympic sprinter Gabby
Thomas, Olympic golfer Nelly Korda and PGA golfer Nick Watney are
among WHOOP’s users, according to the company’s website.
Real life applications of data science
• SUGGESTING FRIENDS ON FACEBOOK
• Meta’s Facebook platform, of course, uses data science in various
ways, but one of its buzzier data-driven features is the “People You
May Know” sidebar, which appears on the social network’s home
screen. Often creepily prescient, it’s based on a user’s friend list, the
people they’ve been tagged with in photos and where they’ve worked
and gone to school. It’s also based on “really good math,” according
to the Washington Post — specifically, a type of data science known
as network science, which essentially forecasts the growth of a user’s
social network based on the growth of similar users’ networks.
