Stat Python

The document outlines the differences between descriptive and inferential statistics, detailing types of data and methods for data visualization. It explains measures of central tendency, standard deviation, and the importance of sampling, emphasizing that random sampling is crucial for accurate representation of a population. Additionally, it discusses the impact of outliers on statistical measures and the use of interquartile range (IQR) to identify them.

Uploaded by

youssef mahmoud

Descriptive statistics summarize and describe data that has already been collected, focusing on specific
details like averages, percentages, and patterns.

Inferential statistics, on the other hand, use data from a sample to make predictions or generalizations
about a larger population.

Types of Data

 Numeric (Quantitative):

o Continuous: Can take any value (e.g., stock price).

o Discrete: Whole numbers (e.g., cups of coffee per day).

 Categorical (Qualitative):

o Nominal: Unordered categories (e.g., eye color).

o Ordinal: Ordered categories (e.g., survey responses).
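
As a rough illustration of the four data types above, the sketch below builds a small pandas DataFrame. The column names and values are hypothetical, chosen only to mirror the examples in the list; it is one possible way to represent these types, not the only one.

```python
import pandas as pd

# Hypothetical toy data: one column per data type from the list above.
df = pd.DataFrame({
    "stock_price": [101.25, 99.80, 102.10],                    # numeric, continuous
    "cups_of_coffee": [2, 0, 3],                               # numeric, discrete
    "eye_color": pd.Categorical(["brown", "blue", "brown"]),   # categorical, nominal
    "satisfaction": pd.Categorical(
        ["low", "high", "medium"],
        categories=["low", "medium", "high"],
        ordered=True,                                          # categorical, ordinal
    ),
})

print(df.dtypes)

# Ordered categories can be compared; unordered (nominal) ones cannot.
above_low = df["satisfaction"] > "low"
print(above_low)
```

Marking the ordinal column as ordered is what allows comparisons such as "greater than low"; the nominal column has no such ordering.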

Data Visualization

 Numeric Data: Scatter plots, histograms.

 Categorical Data: Bar charts, grouped aggregations.

Measures of Center

 Mean (Average): Sum of values / number of values.

 Median: Middle value when sorted.

 Mode: Most frequent value.

 Choosing the Right Measure:

The shape of the histogram shows how the CO2 emissions are distributed across countries.

 No skew: The data is roughly symmetric around the center.

 Left-skewed: The data has a long tail on the left side (most values sit at the higher end, with a few unusually low values).

 Right-skewed: The data has a long tail on the right side (most values sit at the lower end, with a few unusually high values).
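
To make the three measures and the effect of skew concrete, here is a minimal sketch using Python's standard `statistics` module. The list of values is made up: most values are small and one is very large, so the data is right-skewed.

```python
import statistics

# Hypothetical right-skewed values: most are small, one is very large.
values = [1, 2, 2, 3, 4, 5, 40]

mean = statistics.mean(values)      # sum of values / number of values
median = statistics.median(values)  # middle value when sorted
mode = statistics.mode(values)      # most frequent value

print(mean, median, mode)
```

In this example the single large value pulls the mean well above the median, while the median and mode stay near the bulk of the data.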

To calculate the mean and median of CO2 emissions, use the .agg() method, which applies multiple
aggregation functions (such as mean and median) to a column in one call.

When the data is skewed, the median is usually the better summary because, unlike the mean, it is not
pulled toward extreme values. The median is therefore the best measure of center in this case.
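
A minimal sketch of the .agg() call described above. The DataFrame, its values, and the column name `co2_emission` are assumptions made for illustration; the original dataset is not shown in these notes.

```python
import pandas as pd

# Hypothetical data; the column name "co2_emission" is an assumption.
emissions = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "co2_emission": [10.0, 12.0, 15.0, 18.0, 200.0],
})

# Apply several aggregation functions to one column in a single call.
summary = emissions["co2_emission"].agg(["mean", "median"])
print(summary)
```

The result is a Series indexed by the function names, so `summary["mean"]` and `summary["median"]` can be compared directly. With one extreme value in the column, the mean lands far from most of the data while the median does not.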
Standard deviation measures how much data points deviate from the mean. It tells you whether the values in a
dataset are closely packed around the average or widely spread out.

 A low standard deviation means most values are close to the mean (less variation).

 A high standard deviation means values are spread out (more variation).

Standard deviation is the square root of the variance, which makes it easier to interpret in practice.

Short Difference Between Variance & Standard Deviation

 Variance → Measures the spread of data but in squared units (harder to interpret).

 Standard Deviation → Square root of variance, showing spread in the same units as the data
(easier to understand).

Because it is in the same units as the data, standard deviation is the more practical measure for business decisions.
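
The relationship between the two measures can be checked with a short sketch. The two lists are hypothetical measurements with the same mean but different spread, and the standard library's `statistics` module is used here as one possible tool.

```python
import math
import statistics

# Hypothetical measurements in metres; both lists have a mean of 10.
tight = [9.8, 10.0, 10.1, 10.1]
spread = [2.0, 8.0, 12.0, 18.0]

results = {}
for name, data in [("tight", tight), ("spread", spread)]:
    var = statistics.variance(data)  # sample variance, in squared metres
    std = statistics.stdev(data)     # sample standard deviation, in metres
    results[name] = (var, std)
    print(name, round(var, 3), round(std, 3))
```

Note that `statistics.variance` and pandas' `.var()` both compute the sample variance (dividing by n - 1) by default, whereas NumPy's `np.var` divides by n unless `ddof=1` is passed.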

Finding outliers using IQR

Outliers can have big effects on statistics like the mean, as well as statistics that rely on the mean, such as
variance and standard deviation. Interquartile range, or IQR, is another way of measuring spread that's
less influenced by outliers. IQR is also often used to find outliers. If a value is less than
$Q_1 - 1.5 \times \mathrm{IQR}$ or greater than $Q_3 + 1.5 \times \mathrm{IQR}$, where $Q_1$ and $Q_3$ are the
first and third quartiles and $\mathrm{IQR} = Q_3 - Q_1$, it's considered an outlier. In fact, this is how the
lengths of the whiskers in a matplotlib box plot are calculated.
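
The IQR rule above can be sketched in a few lines with NumPy. The data array is made up for illustration, with one deliberately extreme value.

```python
import numpy as np

# Hypothetical data with one extreme value.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# First and third quartiles, then the interquartile range.
q1, q3 = np.quantile(data, [0.25, 0.75])
iqr = q3 - q1

# Cutoffs for the 1.5 x IQR rule.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the values that fall outside the cutoffs.
outliers = data[(data < lower) | (data > upper)]
print(outliers)
```

Different quantile interpolation methods can give slightly different quartiles on small datasets, so the exact cutoffs may vary between tools.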
Sampling

Simple Definition:

Sampling is the process of selecting a small group (sample) from a larger group (population) to analyze,
instead of looking at every single item in the population. The goal is to use the sample to make
estimates or conclusions about the entire population.

In this exercise, you are:

1. Calculating the average (mean) of song durations for the whole dataset (population).

2. Taking a random sample of songs and calculating the average (mean) of the sample.

3. Comparing the two averages to see how well the sample represents the whole dataset.

# Sample 1000 rows from spotify_population
# (spotify_population is assumed to be a pandas DataFrame that is already loaded)
spotify_sample = spotify_population.sample(n=1000)

# Print the sample
print(spotify_sample)

# Calculate the mean duration in mins from spotify_population
mean_dur_pop = spotify_population["duration_minutes"].mean()

# Calculate the mean duration in mins from spotify_sample
mean_dur_samp = spotify_sample["duration_minutes"].mean()

# Print the means
print(mean_dur_pop)
print(mean_dur_samp)
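
Since the Spotify dataset itself is not included in these notes, the following self-contained sketch repeats the same three steps on synthetic data. The `population` DataFrame and its values are made up; only the column name `duration_minutes` is taken from the exercise.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for spotify_population; the values are made up.
rng = np.random.default_rng(seed=42)
population = pd.DataFrame({
    "duration_minutes": rng.normal(loc=3.8, scale=1.0, size=40_000),
})

# Simple random sample; random_state makes the draw reproducible.
sample = population.sample(n=1000, random_state=42)

# Compare the population mean with the sample mean.
mean_dur_pop = population["duration_minutes"].mean()
mean_dur_samp = sample["duration_minutes"].mean()
print(mean_dur_pop)
print(mean_dur_samp)
```

With a random sample of this size, the two means should be close but not identical; the gap shrinks as the sample size grows.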

 Convenience sampling selects data in the easiest way, often leading to biased samples that don’t
represent the population.

 You compared acousticness distributions of:

1. General population (spotify_population).

2. A sample of 1107 songs (spotify_mysterious_sample).

 Findings: The sample had higher acousticness values than the population, meaning it is not
representative.

 Conclusion: The findings are not generalizable because the sample is biased. Random sampling
would be a better approach for accurate insights.
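
The effect described above can be reproduced on synthetic data. In this sketch the population, its uniform `acousticness` values, and the way the biased sample is built (taking the most acoustic rows) are all assumptions for illustration; how spotify_mysterious_sample was actually collected is not stated in these notes.

```python
import numpy as np
import pandas as pd

# Synthetic population; acousticness values are made up (uniform on [0, 1]).
rng = np.random.default_rng(seed=0)
population = pd.DataFrame({"acousticness": rng.uniform(0, 1, size=10_000)})

# Deliberately biased sample: keep only the 1107 most acoustic rows.
biased_sample = population.sort_values("acousticness", ascending=False).head(1107)

# Simple random sample of the same size for comparison.
random_sample = population.sample(n=1107, random_state=0)

pop_mean = population["acousticness"].mean()
biased_mean = biased_sample["acousticness"].mean()
random_mean = random_sample["acousticness"].mean()
print(pop_mean, biased_mean, random_mean)
```

The biased sample's mean sits far above the population mean, while the random sample's mean stays close to it, which is the pattern that signals a non-representative sample.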
