Understanding Data and Its Types-Lecture 1

The document discusses data and data analysis, noting that data is the outcome of random systems and contains information, while data analysis uses statistical tools and machine learning to understand random systems and make predictions. Data analysis aims to predict outcomes using statistical or machine learning models along with accounting for potential errors, allowing for analysis of larger datasets with greater accuracy. Machine learning techniques can incorporate heuristic understanding of data beyond exact statistical models.


Data and Data Analysis

Ikram E Khuda
Content developed by Ikram E Khuda © 2023, ARCD / Statistics
Recognize the Data
• In mathematics there are two types of experiments:
  • Deterministic and
  • Non-deterministic (also called random)
• By deterministic it is meant that which can be determined. Hence those experiments whose outputs can be found using a derived mathematical formula, with no variation in them, are called deterministic systems.
• By non-deterministic it is meant that which cannot be determined. Hence those experiments whose outputs cannot be found, because there is no available mathematical formula, are called non-deterministic or random systems.
• Outputs of random systems are associated with uncertainties … doubt and mistrust.
• Data is the outcome variable of a random system

• Data determines the facts and figures which contain information in them, i.e. data is raw form of information

• Information is an entity that resolves problems containing uncertainty. Resolve is different from solve. To resolve means to bring the problem to an end or to its conclusion. To solve is the process of finding an answer.
Identify a random system?
Example 1

• Consider the following System A

Input as X = 1.2, 1.2, …, 1.2 → [System A] → Output as Y = 2.1, 2.1, …, 2.1

What is the output of System A for an input of 1.2?

It is 2.1
Example 2
• Consider the following System B

Input as X = 1.2, 1.2, …, 1.2 → [System B] → Output as Y = 2.1, 1.8, …, 2.3, 1.7, 2.2, 2.1, …

What is the output of System B for an input of 1.2?

It is … How do I know?
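
The contrast between System A and System B can be sketched in a few lines of Python. This is only an illustration: the slides do not say how System B works internally, so the Gaussian noise around 2.1 and its spread of 0.25 are assumptions, and `system_a` and `system_b` are hypothetical names.

```python
import random


def system_a(x: float) -> float:
    """Deterministic system: the same input always gives the same output."""
    return 2.1  # the only output the slides report for the input X = 1.2


def system_b(x: float, rng: random.Random) -> float:
    """Random system: the output scatters around 2.1 (noise model is assumed)."""
    return 2.1 + rng.gauss(0.0, 0.25)


rng = random.Random(0)  # fixed seed so that a run can be repeated
outputs_a = [system_a(1.2) for _ in range(6)]
outputs_b = [system_b(1.2, rng) for _ in range(6)]

print("System A:", outputs_a)
print("System B:", [round(y, 1) for y in outputs_b])
```

Every call to `system_a` returns the same value, whereas repeated calls to `system_b` with the same input give different values, which is why its output has to be treated as a random variable.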
Articulate Data Analysis
• How can we find the output variable Y in System B?

Approach 1:
Open up the system and trace the input-to-output flow through it to find the output.
This is an engineering approach.

Approach 2:
Use the data or the random variable (short: rv) to understand System B to an extent that we are able to characterize the system and predict the output of System B.

• Approach 2, in short, we call Data Analysis
• We do Data Analysis by applying statistical tools
• Or by using Machine Learning tools. Machine Learning uses data analysis (i.e. statistics) and computer algorithms to imitate the way humans would understand the working of a random system (e.g. System B)

• In any case the whole purpose of data analysis is to do predictions and hence perform decision making.

• This whole process can be summarized in the following equation:

$$\text{predicted outcome} = \text{statistical model} \pm \text{error of the model}$$

or

$$\text{predicted outcome} = \text{machine learning model} \pm \text{error of the model}$$

statistical models + computer algorithms → Machine Learning
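
As a rough illustration of the prediction equation above, the sketch below applies it to the six System B outputs listed in Example 2. It assumes the simplest possible statistical model, the sample mean, and uses the sample standard deviation as the error of the model. Both choices are assumptions made for illustration, not the lecture's prescribed method.

```python
from statistics import mean, stdev

# The six outputs of System B that Example 2 lists explicitly
# (the elided values "…" are not guessed).
observed_y = [2.1, 1.8, 2.3, 1.7, 2.2, 2.1]

model = mean(observed_y)   # assumed statistical model: the sample mean
error = stdev(observed_y)  # assumed error of the model: sample standard deviation

print(f"predicted outcome = {model:.2f} ± {error:.2f}")
```

A prediction of this form does not claim to know the next output exactly. It gives a central value together with a measure of how far individual outputs typically fall from it.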
Discuss Data Analysis
• With the use of computer algorithms, bigger data can be analysed with better accuracy in less time.
• It also enables the inclusion of a heuristic understanding of data, which is not conceivable using the exact mathematical models used in statistics.
• This is the essence of human intelligence.
For example, while driving a car, a better driver is one who is trained/experienced with the car and the road conditions rather than one who is making calculations with road turns or angles of road deviations!
Compare Data Types
• There is no one definition of data types
• In terms of mathematical representations
  • Integer (no decimal values)
  • Continuous (with decimal values)
• In terms of content
  • Numeric or quantitative (numbers, integer or continuous)
  • String or qualitative (alphabetic or alphanumeric)
• In terms of count
  • Discrete (finite count), can be numeric or string
  • Continuous (infinite count), always numeric
• In terms of levels of measurement
  • Nominal (just labels, can be discrete, integers, quantitative or qualitative)
  • Ordinal (showing order or ranks among data values, can be discrete, integers, quantitative or qualitative)
  • Interval (always continuous but includes those variables where a true zero is not defined)
  • Ratio (always continuous but includes those variables where a true zero is defined)
Interval and Ratio are together also called Scale.
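
Two of the classifications above, content and mathematical representation, can be read off the raw values themselves. The helper below is a minimal sketch of that idea. The function name `describe` and its return labels are hypothetical. The level of measurement (nominal, ordinal, interval, ratio) is deliberately not inferred, because it depends on what the values mean rather than on how they are stored.

```python
def describe(values):
    """Return (content, representation) for a list of raw values."""

    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(isinstance(v, str) for v in values):
        return ("string / qualitative", "not applicable")
    if all(is_number(v) and isinstance(v, int) for v in values):
        return ("numeric / quantitative", "integer")
    if all(is_number(v) for v in values):
        return ("numeric / quantitative", "continuous")
    return ("mixed", "not applicable")


print(describe(["low", "medium", "high"]))
print(describe([3, 7, 12]))
print(describe([2.1, 1.8, 2.3]))
```

For instance, `["low", "medium", "high"]` is reported as string data, but only the analyst knows that it is ordinal rather than nominal.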
Compare Data Types
• Data types can also be described by the way they are analysed
  • Sample Data
  • Population Data
• If the whole data set is used for data analysis, then it is called population data
• If a fraction of the data is used from some data set, then it is called sample data
[Figure: Population Data and Sample Data]
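
The difference between population data and sample data can be illustrated with a small made-up data set. In the sketch below, the population of the integers 1 to 100 and the sample size of 10 are arbitrary assumptions chosen only for the example.

```python
import random

# Hypothetical population: every value in the data set is available.
population = list(range(1, 101))

# Sample: a fraction of the population, drawn without replacement.
rng = random.Random(42)  # fixed seed so that the same sample is drawn each run
sample = rng.sample(population, k=10)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print("population mean:", population_mean)
print("sample mean:    ", sample_mean)
```

The sample mean is only an estimate of the population mean and will usually differ from it, which is one source of the error term that accompanies any model built from sample data.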
Compare Data Types
• Data is also classified in terms of its dimensions
  • Low-dimensional data is data with few features
  • High-dimensional data is data with many features

High-dimensional data are defined as data in which the number of features (variables observed), p, is close to or larger than the number of observations (or data points), n.

The opposite is low-dimensional data, in which the number of observations, n, far outnumbers the number of features, p.

A related concept is wide data, which refers to data with numerous features irrespective of the number of observations (similarly, tall data is often used to denote data with a large number of observations).

Analyses of high-dimensional data require consideration of potential problems that come from having more features than observations.
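
The definition above compares the number of features p with the number of observations n. A minimal sketch of that comparison follows. The text does not quantify "close to", so the cut-off `close_ratio = 0.5` is purely an assumption, as is the function name `dimensionality`.

```python
def dimensionality(n_observations: int, n_features: int,
                   close_ratio: float = 0.5) -> str:
    """Label a data set by comparing p (features) with n (observations).

    close_ratio is an assumed threshold: p counts as "close to" n
    once p >= close_ratio * n.
    """
    if n_features >= close_ratio * n_observations:
        return "high-dimensional"
    return "low-dimensional"


print(dimensionality(n_observations=1000, n_features=5))
print(dimensionality(n_observations=20, n_features=500))
```

With these inputs, the first data set (n far larger than p) is labelled low-dimensional and the second (p larger than n) high-dimensional.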
Example 2 (cont’d)
• In Example 2, the output variable Y is called the data or random variable (rv)
• Every single value of Y is called an event.
• Every event has a chance of occurrence.
• This chance of occurrence is called the Probability of an event, or P(E)
• Mathematically this probability can be calculated as:

$$P(E) = \frac{\text{Total No. of Favourable Cases of an Event}}{\text{Total Sample Space}}$$
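
Applied to Example 2, the formula above amounts to counting how often each value of Y occurs and dividing by the number of observations. The sketch below does this for the six outputs the example lists explicitly. Treating those six values as the whole sample space is an assumption, since the original list is elided.

```python
from collections import Counter
from fractions import Fraction

# The six System B outputs listed in Example 2 (elided values are not guessed).
observed_y = [2.1, 1.8, 2.3, 1.7, 2.2, 2.1]

counts = Counter(observed_y)  # favourable cases for each event
total = len(observed_y)       # size of the (assumed) sample space

probabilities = {value: Fraction(count, total) for value, count in counts.items()}

for value, p in sorted(probabilities.items()):
    print(f"P(Y = {value}) = {p}")
```

Using `Fraction` keeps the results exact, so it is easy to confirm that each probability lies between 0 and 1 and that they add up to 1.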
Classification of Probability
The way a sample space is defined categorizes whether the probability is being calculated theoretically or experimentally.

There are two different types of probability that we often talk about: theoretical probability and experimental probability.

Theoretical probability describes how likely an event is to occur. We know that a coin is equally likely to land heads or tails, so the theoretical probability of getting heads is 1/2.

Experimental probability describes how frequently an event actually occurred in an experiment. So if you tossed a coin 20 times and got heads 8 times, the experimental probability of getting heads would be 8/20, which is the same as 2/5, or 0.4, or 40%.

The theoretical probability of an event will always be the same, but the experimental probability is affected by chance, so it can be different for different experiments.

The more trials you carry out (for example, the more times you toss the coin), the closer the experimental probability is likely to be to the theoretical probability.
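
The coin-toss argument can be tried out with a short simulation. The sketch below assumes a fair coin modelled by a pseudo-random number generator. The seed and the numbers of tosses are arbitrary choices, and the function name is hypothetical.

```python
import random


def experimental_probability_of_heads(n_tosses: int, rng: random.Random) -> float:
    """Toss a simulated fair coin n_tosses times and return heads / tosses."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(2023)  # fixed seed so that the run is repeatable
theoretical = 0.5

for n in (20, 200, 2000, 20000):
    p = experimental_probability_of_heads(n, rng)
    print(f"{n:>6} tosses: experimental = {p:.3f}, theoretical = {theoretical}")
```

Individual runs differ with the seed, but the experimental value tends to settle closer to 0.5 as the number of tosses grows.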
Rules of Probability
• Probability of an event, i.e. P(E), is always a real number between 0 and 1
• Sum of the probabilities of all the mutually exclusive events that together make up the sample space is 1
Example 3
(Extrapolate what we have discussed so far)
Data Types
• string, discrete and nominal
• string and qualitative
• string, discrete and ordinal

Data Types
• string, discrete and ordinal
• numeric, continuous and ratio/scale

Data Types
• integers, discrete and ordinal
• string and qualitative

Data Types
• 5 variables
• string, discrete and ordinal
• No. of variables/features < no. of observations
• No. of variables/features > no. of observations
Recognize Measurement Error and Accuracy
• Measurement error is the difference between the true value of something and the numbers used to represent that value.
• Accuracy is the degree to which the value being measured is close to the object's actual measurement.
• It is the degree to which the measured value is similar to a reference or genuine value.
Recognize Accuracy Formula

• The accuracy formula helps one to understand measurement errors. A measurement is considered to be highly accurate and error-free if the measured value is equal to the real value. Error rate and accuracy are mutually exclusive.
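
The relationship between measurement error and accuracy can be made concrete with a small calculation. The sketch below assumes one commonly used convention, which is not taken from the lecture itself: the error rate is the absolute error as a percentage of the true value, and accuracy is 100% minus that error rate. The function names and the example reading of 9.8 against a true value of 10.0 are hypothetical.

```python
def measurement_error(measured: float, true_value: float) -> float:
    """Signed difference between the measured value and the true value."""
    return measured - true_value


def error_rate_percent(measured: float, true_value: float) -> float:
    """Absolute error as a percentage of the true value (assumed convention)."""
    return abs(measured - true_value) / abs(true_value) * 100


def accuracy_percent(measured: float, true_value: float) -> float:
    """Accuracy taken as 100% minus the error rate (assumed convention)."""
    return 100 - error_rate_percent(measured, true_value)


measured, true_value = 9.8, 10.0  # hypothetical reading and reference value
print(f"error      = {measurement_error(measured, true_value):+.2f}")
print(f"error rate = {error_rate_percent(measured, true_value):.1f}%")
print(f"accuracy   = {accuracy_percent(measured, true_value):.1f}%")
```

Under this convention a measured value equal to the true value gives an error rate of 0% and an accuracy of 100%, which matches the error-free case described above.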
Comparison between accuracy and precision

Discuss The Process of Data Analysis
The process of data analysis, or alternately, data analysis steps, involves gathering all the information, processing it, exploring the data, and using it to find patterns and other insights.
Review/ Assessment
• Kindly go through the Practice Problems 1 available on Blackboard for the questions related to the topic.
Feedback
