Data Science Notes

Subsets refer to smaller portions of a larger dataset, which can be created through row-based, column-based, or data-based subsetting techniques. Two-way frequency tables display the frequency of two variables, while two-way relative frequency tables present this data as percentages. Additionally, measures of central tendency such as mean and median are discussed, along with standard deviation, which quantifies the spread of data around the mean.

Uploaded by

rohansinghnirwan


What are subsets?

When a data set is large, we can analyse a selected part of it instead of
working with the whole data set. A smaller set of data taken from a larger
set of data is known as a subset.
Row-based subsetting:
Row-based subsetting, also known as filtering or selecting rows, is a technique used
to extract specific rows from a dataset based on certain criteria.
Column-based subsetting:
Column-based subsetting is the process of selecting data from specific columns
of the dataset.
Data-based subsetting:
Data-based subsetting is a technique that extracts a smaller, representative portion
of a larger dataset.
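
The three kinds of subsetting can be sketched in plain Python. This is only an illustration: the student records and field names are made up, and random sampling is assumed here as one common way to draw a smaller portion for data-based subsetting.

```python
import random

# Hypothetical records; the names and fields are assumptions for this sketch.
students = [
    {"name": "Asha", "grade": 10, "score": 82},
    {"name": "Ben", "grade": 9, "score": 67},
    {"name": "Chen", "grade": 10, "score": 91},
    {"name": "Dia", "grade": 9, "score": 74},
    {"name": "Eli", "grade": 10, "score": 58},
    {"name": "Fay", "grade": 9, "score": 88},
]

# Row-based subsetting: keep only the rows that meet a criterion.
grade_10_rows = [row for row in students if row["grade"] == 10]

# Column-based subsetting: keep every row, but only the selected columns.
name_and_score = [{"name": r["name"], "score": r["score"]} for r in students]

# Data-based subsetting (assumed here to be a random sample of the rows).
rng = random.Random(0)  # fixed seed so the sample is repeatable
sample_rows = rng.sample(students, k=3)

print(len(grade_10_rows), len(name_and_score), len(sample_rows))
```

Each result is a smaller view of the same data: fewer rows, fewer columns, or a sampled portion.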

Two-Way Frequency Tables

A two-way table is a statistical table that shows the observed counts
(frequencies) for two categorical variables. The rows represent the categories
of one variable and the columns represent the categories of the other.
Interpreting two-way tables
The entries in the table are counts. The table has several features:

• Categories are in the left column and top row.
• The counts are placed in the center of the table.
• The totals are at the end of each row and column.
• A sum of all counts (a grand total) is placed at the bottom right.
Two-way relative frequency table
A two-way relative frequency table is very similar to a two-way frequency
table. The only difference is that each entry is shown as a percentage of the
total instead of as a count.
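
As a rough sketch, both kinds of table can be built from raw observations by counting pairs. The survey responses below are invented for illustration, and the helper `two_way_table` is a hypothetical name, not part of any library. Relative frequencies are computed here as a percentage of the grand total.

```python
from collections import Counter

# Hypothetical survey responses: (grade, favourite sport).
responses = [
    ("Grade 9", "Cricket"), ("Grade 9", "Football"), ("Grade 9", "Cricket"),
    ("Grade 10", "Football"), ("Grade 10", "Football"), ("Grade 10", "Cricket"),
    ("Grade 9", "Cricket"), ("Grade 10", "Football"),
]

counts = Counter(responses)                  # frequency of each (row, column) pair
rows = sorted({r for r, _ in responses})     # categories for the left column
cols = sorted({c for _, c in responses})     # categories for the top row
total = len(responses)                       # grand total (bottom right)


def two_way_table(relative=False):
    """Return table rows; totals sit in the last column and the last row."""
    def cell(n):
        # Percentage of the grand total, or the raw count.
        return round(100 * n / total, 1) if relative else n

    table = []
    for r in rows:
        body = [counts[(r, c)] for c in cols]
        table.append([r] + [cell(n) for n in body] + [cell(sum(body))])
    col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]
    table.append(["Total"] + [cell(n) for n in col_totals] + [cell(total)])
    return table


frequency = two_way_table()
relative_frequency = two_way_table(relative=True)

header = ["", *cols, "Total"]
for line in [header, *frequency, header, *relative_frequency]:
    print(line)
```

The relative table has the same layout as the frequency table, and its bottom-right cell is 100.0 because every entry is divided by the same grand total.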

Mean:
The mean is a measure of central tendency. In data science, the mean, also called the
simple average, is the sum of the values in a data set divided by the number of values.
It is the value around which the data is spread out.
Example

• Consider a data set of the 11 numbers from 10 to 20.
• Array = {10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}
• The mean is calculated by adding up the 11 numbers and dividing the sum by 11.
• Sum of all the numbers = 165
• Mean = 165/11 = 15
Median
To calculate the median, we must first sort the data set in ascending or descending order.
Once the data set is sorted, the exact middle value of the set is the median. If the
number of values is even, the median is the average of the two middle values.
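
A quick check of both definitions on the numbers 10 to 20, using Python's standard `statistics` module. There are eleven values in this range, so the sum is divided by 11.

```python
from statistics import mean, median

values = list(range(10, 21))  # 10, 11, ..., 20: eleven values

total = sum(values)        # 165
count = len(values)        # 11
average = total / count    # sum divided by the number of values
middle = median(values)    # middle value of the sorted data

print(total, count, average, middle)
```

For this evenly spaced data set the mean and the median are the same value, 15.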

Mean vs Median
Mean

1. It is the average value of the whole list or array.
2. Even number of elements: add all the elements and divide the sum by the number
of elements.
3. Odd number of elements: the same rule applies; add all the elements and divide
the sum by the number of elements.
Median

1. The median is the middle element of the sorted list, regardless of how large or
small the other values are.
2. Even number of elements: add the two center elements after sorting the list and
divide by 2.
3. Odd number of elements: the middle element of the sorted list.
4. The median is a more representative measure of central tendency, especially in
scenarios where there are some irregular values, also known as outliers.
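
The effect of an outlier can be seen with a small experiment. The numbers below are made up for illustration; the value 300 plays the role of the outlier.

```python
from statistics import mean, median

typical = [20, 22, 23, 25, 30]
with_outlier = typical + [300]  # hypothetical outlier added to the same data

# Odd number of elements: the median is the middle element (23).
print(mean(typical), median(typical))

# Even number of elements: the median is the average of the two center
# elements, (23 + 25) / 2. The mean is pulled far toward the outlier.
print(mean(with_outlier), median(with_outlier))
```

Adding one extreme value moves the mean from 24 to 70, while the median only moves from 23 to 24.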

Standard Deviation (SD):

Standard deviation represents how much the data is spread out around the mean
(the average).

To find standard deviation:

1. Calculate the mean by adding up all the data values and dividing the sum by the
number of values.
2. Subtract the mean from every value.
3. Square each of the differences.
4. Find the average of the squared differences calculated in step 3. This is the
variance.
5. Lastly, find the square root of the variance. That is the standard deviation.
Example

Values: [1, 2, 3, 5, 8]

1. Calculate the mean
1 + 2 + 3 + 5 + 8 = 19
19/5 = 3.8 (mean)
2. Subtract the mean from every value
1 - 3.8 = -2.8
2 - 3.8 = -1.8
3 - 3.8 = -0.8
5 - 3.8 = 1.2
8 - 3.8 = 4.2
3. Square each difference
(-2.8)*(-2.8) = 7.84
(-1.8)*(-1.8) = 3.24
(-0.8)*(-0.8) = 0.64
(1.2)*(1.2) = 1.44
(4.2)*(4.2) = 17.64
4. Calculate the average of the squared differences to get the variance
7.84 + 3.24 + 0.64 + 1.44 + 17.64 = 30.8
30.8/5 = 6.16 (variance)
5. Find the square root of the variance
The square root of 6.16 = 2.48 (rounded to two decimal places)

Thus, the standard deviation of the values 1, 2, 3, 5 and 8 is 2.48.
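
The five steps can be followed one by one in Python. This sketch divides by the number of values, as the steps describe, which is the population standard deviation; the standard library's `statistics.pstdev` does the same, whereas `statistics.stdev` divides by n - 1 and would give a different result.

```python
from math import sqrt
from statistics import pstdev

values = [1, 2, 3, 5, 8]

# Step 1: calculate the mean.
mean_value = sum(values) / len(values)

# Step 2: subtract the mean from every value.
differences = [v - mean_value for v in values]

# Step 3: square each difference.
squared = [d ** 2 for d in differences]

# Step 4: average the squared differences to get the variance.
variance = sum(squared) / len(squared)

# Step 5: the square root of the variance is the standard deviation.
std_dev = sqrt(variance)

print(round(mean_value, 2), round(variance, 2), round(std_dev, 2))
```

Rounded to two decimal places, this gives a mean of 3.8, a variance of 6.16 and a standard deviation of 2.48, and `pstdev(values)` agrees with the step-by-step result.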
