Week 1

The document is an introduction to a Data Science course taught by Dr. Irfan Yousuf at UET, Lahore. It outlines the importance of data science as a career, the necessary skill set including statistics and programming, and provides an overview of key statistical concepts such as descriptive and inferential statistics, probability distributions, and normal distribution. The course aims to equip students with the foundational knowledge required in the field of data science.

Introduction to Data Science

Dr. Irfan Yousuf

Department of Computer Science (New Campus)
UET, Lahore
(Week 1; January 15 - 19, 2024)
Instructor
• Dr. Irfan Yousuf
• [email protected]
Weekly Contents
Why Data Science?
• One of the topmost professions
• Data is the new driving force behind industries.
• Data Science is the Career of Tomorrow.
Skill Set Needed
• Statistics
• Programming skills
• Multivariable Calculus & Linear Algebra
Statistics
• In plural form, it refers to a set of numerical data.
• In singular form, it is an academic discipline.

Data
• Facts and statistics collected for reference or analysis.
• Data are units of information, often numeric, that are collected through observation.
• Data is a collection of facts, such as numbers, words, measurements, observations or just descriptions of things.
What is Statistics
• Statistics is a branch of mathematics that deals with the
scientific collection, organization, presentation, analysis,
and interpretation of numerical data in order to obtain
useful and meaningful information.
Descriptive Statistics
• A statistical method concerned with the collection,
organization, presentation and description of sample data.
Inferential Statistics
• Inferential statistics is concerned with the analysis of sample
data leading to prediction, inference, interpretation,
decision or conclusion about the entire population.
Population vs. Sample
• Population: The totality of all the elements or persons in
which one has an interest at a particular time.
• Example: Students of the 2018 session of CS-KSK
• Sample: A subset of a population.
• Example: Students with CGPA > 3.0
Parameter vs. Statistic
• A parameter is a number describing a whole population.

• A statistic is a number describing a sample.

• With inferential statistics, we use sample statistics to make educated guesses about population parameters.
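The difference between a parameter and a statistic can be sketched in a few lines of Python. The population here is simulated (500 hypothetical CGPA values, not real student data), and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: CGPAs of 500 students (simulated, not real data).
population = [round(rng.uniform(2.0, 4.0), 2) for _ in range(500)]
parameter = statistics.mean(population)   # describes the whole population

# A simple random sample of 50 students, drawn without replacement.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)       # describes only the sample

print(f"population mean (parameter): {parameter:.3f}")
print(f"sample mean (statistic):     {statistic:.3f}")
```

The two printed values are usually close but not identical; that gap is the sampling error that inferential methods try to account for.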
Quantitative vs. Qualitative Data
• Quantitative: Numerical information obtained from counting or
measuring, which can be manipulated by any fundamental
arithmetic operation.
• Examples: Age, Weight, Height
• Qualitative: Descriptive attributes characterized by categorical responses.
• Examples: Gender, Weather, Attitude
Variable
• A variable is any characteristic, number, or quantity that can
be measured or counted.
• Independent variables: Variables you manipulate in order to affect the outcome of an experiment, e.g., Age
• Dependent variables: Variables that represent the outcome of the experiment, e.g., Salary
Descriptive vs. Inferential Statistics
• Descriptive: concerned with the collection, organization,
presentation and description of sample data.
• Inferential: concerned with the analysis of sample data leading to prediction, inference, interpretation, decision or
conclusion about the entire population.
Inferential Statistics
• Inferential statistics takes data from a sample and makes
inferences about the larger population from which the sample
was drawn.
• Because the goal of inferential statistics is to draw
conclusions from a sample and generalize them to a
population, we need to have confidence that our sample
accurately reflects the population.

• Define the population we are studying.

• Draw a representative sample from that population.
• Use analyses that incorporate the sampling error.
Probability Distributions
• A probability distribution is the mathematical function that
gives the probabilities of occurrence of different possible
outcomes of an experiment.

• Example: Tossing a coin
• Example: Throwing a fair die
• Probability distributions are typically defined in terms of the probability distribution functions.
Probability Distribution Functions

• Probability Mass Function (PMF) for Discrete Data
• Probability Density Function (PDF) for Continuous Data
• Cumulative Distribution Function (CDF)
Discrete vs. Continuous Variable
• A discrete variable is a variable that takes on distinct,
countable values. In theory, you should always be able to
count the values of a discrete variable.
• A continuous variable is a variable that can take on any value within a range. Because the possible values for a
continuous variable are infinite, we measure continuous
variables (rather than count them).
Probability Density Functions (PDFs)
• For a discrete random variable X that takes on a finite or countably
infinite number of possible values, we determine P(X=x) for all the
possible values of X, and call it the probability mass function
(pmf).
• For a continuous random variable, the probability that X takes on any particular value x is 0. That is, finding P(X=x) for a continuous
random variable is not going to work. Instead, we need to find the
probability that X falls in some interval (a,b), that is, we need to
find P(a < X < b). We do that using a probability density function
(pdf).
Probability Mass Function
Day   Travel Time (min)   pmf
1     25                  0.1
2     26                  0.2
3     26                  0.2
4     28                  0.2
5     28                  0.2
6     32                  0.1
7     33                  0.1
8     34                  0.2
9     34                  0.2
10    35                  0.1

X     P(X=x)
25    0.1
26    0.2
28    0.2
32    0.1
33    0.1
34    0.2
35    0.1
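The pmf values in the table are relative frequencies: each distinct travel time is counted and divided by the number of days. A minimal sketch using the ten travel times from the table (the variable names are illustrative only):

```python
from collections import Counter

# Travel times in minutes for days 1-10, copied from the table.
travel_times = [25, 26, 26, 28, 28, 32, 33, 34, 34, 35]

counts = Counter(travel_times)   # how often each value occurs
n = len(travel_times)

# pmf: P(X = x) = (number of days with travel time x) / (number of days)
pmf = {x: counts[x] / n for x in sorted(counts)}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p:.1f}")
```

The probabilities sum to 1, as any pmf must.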
Cumulative Distribution Function of PMF
Day   Travel Time (min)   pmf
1     25                  0.1
2     26                  0.2
3     26                  0.2
4     28                  0.2
5     28                  0.2
6     32                  0.1
7     33                  0.1
8     34                  0.2
9     34                  0.2
10    35                  0.1

X     PMF   CDF
25    0.1   0.1
26    0.2   0.3
28    0.2   0.5
32    0.1   0.6
33    0.1   0.7
34    0.2   0.9
35    0.1   1.0
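For a discrete variable, the CDF column is a running total of the PMF column. A short sketch, with the pmf values typed in from the table (rounding to 10 decimals is only there to hide floating-point noise):

```python
from itertools import accumulate

# pmf values copied from the table: P(X = x) for each travel time x.
pmf = {25: 0.1, 26: 0.2, 28: 0.2, 32: 0.1, 33: 0.1, 34: 0.2, 35: 0.1}

values = sorted(pmf)
running_totals = accumulate(pmf[x] for x in values)

# cdf: F(x) = P(X <= x), the sum of the pmf over all values up to x.
cdf = {x: round(total, 10) for x, total in zip(values, running_totals)}

for x in values:
    print(f"F({x}) = {cdf[x]:.1f}")
```

For example, F(28) = 0.1 + 0.2 + 0.2 = 0.5: on half of the days the trip took 28 minutes or less.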
Probability Density Function
Let the random variable X denote the time a person waits for
an elevator to arrive. Suppose the longest one would need to
wait for the elevator is 2 minutes, so that the possible values of
X (in minutes) are given by the interval [0,2] .
A possible pdf for X is given by:
Probability Density Function
The probability that a person waits less than 30 seconds (or 0.5 minutes) is the area under the pdf f between 0 and 0.5:
$P(0 \le X \le 0.5) = \int_{0}^{0.5} f(x)\,dx$
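The area under a pdf can be approximated numerically. The slide's own pdf for the elevator example is not reproduced in this text, so the sketch below assumes a triangular density on [0, 2] purely for illustration; `pdf` and `integrate` are hypothetical helper names.

```python
def pdf(x: float) -> float:
    """ASSUMED triangular pdf on [0, 2]; not taken from the slides."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


p_short_wait = integrate(pdf, 0.0, 0.5)  # P(0 <= X <= 0.5)
total_area = integrate(pdf, 0.0, 2.0)    # a valid pdf has total area 1
p_point = integrate(pdf, 0.5, 0.5)       # zero-width interval

print(f"P(X < 0.5)  = {p_short_wait:.3f}")
print(f"total area  = {total_area:.3f}")
print(f"P(X = 0.5)  = {p_point:.3f}")
```

Under the assumed density, the wait is shorter than 30 seconds with probability 0.125, and a single point carries probability 0 because its interval has no width.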
Probability Density Function
Continuous random variables have zero point probabilities, i.e.,
the probability that a continuous random variable equals a single
value is always 0.

Probability for a continuous random variable is given by areas under pdfs.
Cumulative Distribution Function of PDF

Let X have pdf f. Then the cdf F is given by
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
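Going from a pdf to its cdf means accumulating area from the left. The sketch below assumes a uniform density on [0, 2] as a stand-in (an illustrative assumption, not the slide's pdf) and integrates it numerically; `pdf` and `cdf` are hypothetical helper names.

```python
def pdf(x: float) -> float:
    """ASSUMED uniform density on [0, 2]; chosen only for illustration."""
    return 0.5 if 0.0 <= x <= 2.0 else 0.0


def cdf(x: float, lower: float = 0.0, steps: int = 2_000) -> float:
    """F(x): area under the pdf from the lower end of its support up to x."""
    if x <= lower:
        return 0.0
    width = (x - lower) / steps
    return sum(pdf(lower + (i + 0.5) * width) for i in range(steps)) * width


# Interval probabilities follow from the cdf: P(a < X < b) = F(b) - F(a).
p_interval = cdf(1.5) - cdf(0.5)

print(f"F(0.5) = {cdf(0.5):.3f}")
print(f"F(1.0) = {cdf(1.0):.3f}")
print(f"P(0.5 < X < 1.5) = {p_interval:.3f}")
```

For this assumed density the exact cdf is F(x) = x/2 on [0, 2], so the numerical values can be checked by hand.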
Normal Distribution
• The mean, median and mode are exactly the same.
• The distribution is symmetric about the mean—half the
values fall below the mean and half above the mean.
• The distribution can be described by two values: the mean and
the standard deviation.
Normal Distribution

Day   Time
1     32.14
2     31.30
3     29.17
4     28.15
5     30.30
6     30.41
7     32.37
8     33.19
9     31.19
10    30.37
11    28.24
12    29.10
13    28.34
14    28.50
15    29.26
16    28.29
17    25.36
18    27.18
19    30.29
20    27.15
Normal Distribution
Mean      29.52
St. Dev   1.96

Day   Time    f(x)
1     32.14   0.08
2     31.30   0.13
3     29.17   0.20
4     28.15   0.16
5     30.30   0.19
6     30.41   0.18
7     32.37   0.07
8     33.19   0.04
9     31.19   0.14
10    30.37   0.19
11    28.24   0.16
12    29.10   0.20
13    28.34   0.17
14    28.50   0.18
15    29.26   0.20
16    28.29   0.17
17    25.36   0.02
18    27.18   0.10
19    30.29   0.19
20    27.15   0.10
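The f(x) column can be reproduced by fitting a normal distribution to the 20 times and evaluating its density at each one. The sketch uses Python's standard `statistics` module; it assumes the table's standard deviation is the sample version (dividing by n - 1), which is the choice that matches the 1.96 shown.

```python
import statistics

# The 20 times from the table, in day order.
times = [32.14, 31.30, 29.17, 28.15, 30.30, 30.41, 32.37, 33.19, 31.19, 30.37,
         28.24, 29.10, 28.34, 28.50, 29.26, 28.29, 25.36, 27.18, 30.29, 27.15]

mean = statistics.mean(times)
sd = statistics.stdev(times)   # sample standard deviation (n - 1), an assumption

# Normal distribution with the estimated mean and standard deviation.
dist = statistics.NormalDist(mean, sd)

# Density f(x) at each observed time, rounded as in the table.
density = {t: round(dist.pdf(t), 2) for t in times}

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
for t in sorted(density):
    print(f"{t:.2f}  {density[t]:.2f}")
```

Note that f(x) is a density, not a probability: values close to the mean get the highest density (about 0.20 here), and values in the tails get the lowest.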
Normal Distribution
Time f(x)
25.36 0.02
27.15 0.10
27.18 0.10
28.15 0.16
28.24 0.16
28.29 0.17
28.34 0.17
28.50 0.18
29.10 0.20
29.17 0.20
29.26 0.20
30.29 0.19
30.30 0.19
30.37 0.19
30.41 0.18
31.19 0.14
31.30 0.13
32.14 0.08
32.37 0.07
33.19 0.04
Normal Distribution

Mean      29.52
St. Dev   1.96
M+SD      31.48
M-SD      27.55
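One way to read the M-SD and M+SD bounds is to count how many of the 20 observed times fall between them. A small sketch, again assuming the sample standard deviation (n - 1):

```python
import statistics

# The 20 times from the earlier table, in day order.
times = [32.14, 31.30, 29.17, 28.15, 30.30, 30.41, 32.37, 33.19, 31.19, 30.37,
         28.24, 29.10, 28.34, 28.50, 29.26, 28.29, 25.36, 27.18, 30.29, 27.15]

mean = statistics.mean(times)
sd = statistics.stdev(times)           # sample SD, an assumption

lower, upper = mean - sd, mean + sd    # M-SD and M+SD
within = [t for t in times if lower <= t <= upper]
share = len(within) / len(times)

print(f"{len(within)} of {len(times)} times lie within one SD of the mean ({share:.0%})")
```

Here 14 of the 20 times (70%) fall within one standard deviation of the mean, which is close to the roughly 68% expected for normally distributed data.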
Normal Distribution
68-95-99.7 Rule
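The three percentages in the rule are areas under the normal curve within 1, 2 and 3 standard deviations of the mean. They can be checked with the cdf of the standard normal distribution from Python's `statistics` module; the dictionary name `shares` is illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of the distribution within k standard deviations of the mean:
# P(-k <= Z <= k) = F(k) - F(-k), where F is the cdf.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.2%}")
```

The results, about 68.27%, 95.45% and 99.73%, are the values the rule rounds to 68, 95 and 99.7. The same shares hold for any normal distribution, whatever its mean and standard deviation.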
CDF of Normal Distribution
• The cumulative distribution function (cdf) is the probability that the
variable X takes a value less than or equal to x.
• (Here in the figure below, Mean=0, SD=1)
Z-Distribution
• The standard normal distribution, also called the z-distribution, is a
special normal distribution where the mean is 0 and the standard
deviation is 1.
• Z-scores tell you how many standard deviations away from the mean
each value lies.
Z-Distribution
Z-Score
$z = \frac{x - \mu}{\sigma}$

As the formula shows, the z-score is simply the raw score x minus the population mean μ, divided by the population
standard deviation σ.
Z-Distribution
Mean is 38.8 minutes
Standard Deviation is 11.4 minutes

Day   Time
1     26
2     33
3     65
4     28
5     34
6     55
7     25
8     44
9     50
10    36
11    26
12    37
13    43
14    62
15    35
16    38
17    45
18    32
19    28
20    34
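The z-score of each time in the table can be computed directly from the definition. The sketch treats the 20 days as the whole population, so it uses the population standard deviation (`statistics.pstdev`), which is the version that reproduces the 11.4 minutes stated above.

```python
import statistics

# The 20 times (minutes) from the table, in day order.
times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mu = statistics.mean(times)       # population mean
sigma = statistics.pstdev(times)  # population standard deviation (divides by n)

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(t - mu) / sigma for t in times]

print(f"mean = {mu:.1f}, sd = {sigma:.1f}")
for day, (t, z) in enumerate(zip(times, z_scores), start=1):
    print(f"day {day:2d}: time = {t:2d}, z = {z:+.2f}")
```

Day 3 (65 minutes) has a z-score of about +2.30, meaning it lies roughly 2.3 standard deviations above the mean, while day 16 (38 minutes) is almost exactly at the mean.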
Summary
• Introduction to Data Science
