Lab 3 Report

This report explains essential statistical concepts for data analysis in data science, including measures of central tendency, variability, and visualization techniques. It emphasizes the importance of using robust measures like the median to handle outliers, and z-scores to identify unusual values, to improve data interpretation. The combination of statistical methods and visual tools enhances understanding of complex datasets and supports effective decision-making.

Uploaded by

prashant.080bct026

TITLE: Understanding Statistical Analysis in Data Science

Introduction
This report breaks down basic and advanced statistical concepts to help understand data analysis
in data science. It covers measures of central tendency, variability, and relative standing, as well
as visualization techniques and how to deal with outliers. Using examples and Python tools, it
explains the strengths and weaknesses of different statistical methods when analyzing real-world
datasets.

Theory
• Statistics:

Statistics is about estimating values that represent an entire population by analyzing a
sample. Here are the key parts:

• Measures of Central Tendency: These identify the middle or typical value in a dataset.

a. Mean: The average of all values. It’s easy to calculate but very sensitive to
outliers.
b. Median: The middle value when the data is sorted. It is largely unaffected by outliers,
making it more reliable for skewed data.
c. Mode: The most common value. This is useful for datasets with repeated values.
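
As a minimal sketch of the three measures above, the following uses Python's standard `statistics` module. The list `values` is a hypothetical sample chosen only for illustration; it does not come from the lab data.

```python
import statistics

# Hypothetical sample values, used only to illustrate the three measures.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # sum of the values divided by their count
median_value = statistics.median(values)  # middle of the sorted values (average of the two middle ones here)
mode_value = statistics.mode(values)      # most frequently occurring value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

In this sample the mean (5) sits above the median (4) because the largest value pulls the average upward.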

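• Measures of Variation: These show how spread out the data is.

a. Range: The difference between the largest and smallest values.
b. Interquartile Range (IQR): The difference between the third and first quartiles (Q3 - Q1).
It measures the spread of the middle 50% of the data, so extreme values do not affect it.
c. Standard Deviation & Variance: Show how much data points differ from the
mean. The variance is the average squared deviation from the mean, and the standard
deviation is its square root.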
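
The measures of variation listed above can be sketched with the standard `statistics` module. The sample in `values` is hypothetical, and the sample (n - 1) versions of the variance and standard deviation are an assumption, since the report does not say which version was used. Quartile conventions also differ between tools, so the IQR from `statistics.quantiles` may not match other software exactly.

```python
import statistics

# Hypothetical sample values, used only for illustration.
values = [4, 8, 15, 16, 23, 42]

# Range: largest value minus smallest value.
data_range = max(values) - min(values)

# Quartiles: n=4 returns the three cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1  # spread of the middle 50% of the data

# Sample variance and standard deviation (divide by n - 1).
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

print("range:", data_range)
print("IQR:", iqr)
print("variance:", sample_variance)
print("standard deviation:", round(sample_std, 2))
```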

• Relative Standing: Helps understand where a specific value fits in the dataset.

a. Z-Scores: Measure how far a value is from the mean in terms of standard
deviations.
b. Percentiles and Ranks: Show the position of a value relative to the rest of the data, for
example the percentage of values at or below it.

Why Visualization Matters

Graphs like bar charts, histograms, and line graphs make it easier to spot patterns and outliers in
data. Visuals help present numerical findings in a way that’s easier to interpret.
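
To illustrate how a histogram exposes clusters and outliers, here is a minimal text-only sketch that needs no plotting library. The data in `values` and the bin width of 5 are hypothetical choices, not taken from the lab; in practice a plotting library would draw the same picture graphically.

```python
from collections import Counter

# Hypothetical sample values; 25 sits far from the rest.
values = [3, 5, 6, 7, 7, 8, 9, 11, 25]
bin_width = 5  # assumed bin width for this sketch

# Map each value to the lower edge of its bin and count values per bin.
counts = Counter((v // bin_width) * bin_width for v in values)

# Build one row per bin, including empty bins so that gaps stay visible.
lines = []
for start in range(min(counts), max(counts) + bin_width, bin_width):
    label = f"{start:>3}-{start + bin_width - 1:<3}"
    lines.append(f"{label} {'#' * counts[start]}")

print("\n".join(lines))
```

Most values fall in the 5-9 bin, while the empty bins before 25 make the isolated value easy to spot.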

Applications and Problems

1. How Outliers Affect Central Tendency

• Mean: Adding an extreme value, such as 110, can pull the mean much higher and make it less
useful for understanding the dataset.
• Median: Outliers, such as 31, have little to no effect on the median, so it remains a reliable
measure of central tendency.
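
The effect described above can be checked with a short sketch. The base list is hypothetical, since the report does not reproduce the original dataset; only the extreme value 110 is taken from the text.

```python
import statistics

# Hypothetical base data; the lab's actual dataset is not shown in the report.
base = [22, 24, 25, 27, 29]
with_outlier = base + [110]  # 110 is the extreme value mentioned in the text

mean_before = statistics.mean(base)            # 25.4
mean_after = statistics.mean(with_outlier)     # 39.5
median_before = statistics.median(base)          # 25
median_after = statistics.median(with_outlier)   # 26.0

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

A single extreme value shifts the mean by about 14, while the median moves by only 1.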

2. Understanding Data Spread

• A dataset of Facebook friends showed a mean of 789 and a standard deviation of 425.
The standard deviation is more than half the mean, which indicates wide variation in the
number of friends among users. Visualizing this data helped highlight the differences.
3. Using Z-Scores

• Z-scores standardized the dataset, showing how many standard deviations each value was
from the mean.
• Negative scores represented below-average values, while positive scores highlighted
above-average ones.
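
As a sketch of this standardization, each value x is converted to z = (x - mean) / standard deviation. The data below is hypothetical, and using the population standard deviation (`pstdev`) is an assumption; `stdev` could be swapped in for a sample.

```python
import statistics

# Hypothetical sample values, used only for illustration.
values = [10, 12, 14, 16, 18]

mean_value = statistics.mean(values)
std_value = statistics.pstdev(values)  # population SD (assumed for this sketch)

# z-score: distance from the mean measured in standard deviations.
z_scores = [(x - mean_value) / std_value for x in values]

for x, z in zip(values, z_scores):
    position = "below" if z < 0 else "above" if z > 0 else "at"
    print(f"{x}: z = {z:+.2f} ({position} the mean)")
```

After standardization the z-scores have a mean of 0 and a standard deviation of 1, which makes values from differently scaled datasets comparable.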

Results and Insights

Variation Metrics

1. Range: Shows the total spread but doesn’t provide details about distribution.
2. Standard Deviation: Gives a clearer picture of data spread when combined with the mean.
3. Coefficient of Variation (CV): Allows comparisons between datasets with different units
or scales.
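
The coefficient of variation is the standard deviation divided by the mean, often expressed as a percentage. A minimal sketch, using the Facebook friends figures reported earlier (mean 789, standard deviation 425); the function name `coefficient_of_variation` is a hypothetical helper, not part of any library.

```python
def coefficient_of_variation(mean, std):
    """Return the standard deviation as a fraction of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std / abs(mean)

# Figures reported in the text for the Facebook friends dataset.
friends_cv = coefficient_of_variation(mean=789, std=425)

print(f"CV = {friends_cv:.1%}")  # prints "CV = 53.9%"
```

Because the CV is unitless, it can be compared directly with the CV of a dataset measured in different units or on a different scale.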

What Visuals Reveal

Graphs pinpointed patterns and outliers in the data. For example, bar charts highlighted clusters
of data points within one standard deviation of the mean and made outliers stand out clearly.

Relative Standing

Z-scores helped identify how individual data points compared to the rest of the dataset, making it
easier to detect unusual values.
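
One common way to detect unusual values is to flag any point whose absolute z-score exceeds a cutoff. The sketch below assumes a cutoff of 2 and uses hypothetical data; neither comes from the report, and other cutoffs such as 3 are also widely used.

```python
import statistics

# Hypothetical sample values; 60 is far from the rest.
values = [12, 14, 15, 15, 16, 17, 18, 60]
threshold = 2.0  # assumed cutoff for "unusual"

mean_value = statistics.mean(values)
std_value = statistics.stdev(values)  # sample standard deviation

# Keep the values whose distance from the mean exceeds the cutoff.
flagged = [x for x in values if abs(x - mean_value) / std_value > threshold]

print("flagged values:", flagged)
```

Note that the outlier itself inflates both the mean and the standard deviation, so in small samples a z-score cutoff can miss extreme values that a median-based check would catch.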

Discussion
Combining measures of central tendency and variation with visualization techniques gives a full
understanding of a dataset. While the mean and standard deviation are useful, they’re affected by
outliers. Robust measures like the median and IQR offer more reliable summaries in such cases,
while z-scores help flag unusual values, although they are themselves computed from the mean
and standard deviation. Visual tools add another layer of clarity by showing trends and anomalies
directly.

Conclusion
This report highlights the importance of using a mix of statistical techniques and visualizations
for effective data analysis. By summarizing data with robust measures like the median and
identifying outliers with z-scores, analysts can make more accurate interpretations. Visualization
tools enhance these findings, making complex data easier to understand. Together, these methods
form a strong foundation for more advanced data science tasks and decision-making.
