Lab 2 - Basic Statistical Analysis

Uploaded by

078msdsa001.baikuntha

December 12, 2024

0.1 Imports
0.1.1 Step 1: Import Required Libraries
Import essential libraries for data manipulation, visualization, and statistics.
[1]: import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

0.1.2 Step 2: Load the Dataset

Load the student performance dataset from the specified CSV file.
[2]: file_path = "dataset/Student_performance_10k.csv"
data = pd.read_csv(file_path)

[3]: data.head()

[3]:    roll_no  gender  race_ethnicity  parental_level_of_education  lunch  \
     0   std-01    male         group D                 some college    1.0
     1   std-02    male         group B                  high school    1.0
     2   std-03    male         group C              master's degree    1.0
     3   std-04    male         group D                 some college    1.0
     4   std-05    male         group C                 some college    0.0

        test_preparation_course  math_score  reading_score  writing_score  \
     0                      1.0        89.0           38.0           85.0
     1                      0.0        65.0          100.0           67.0
     2                      0.0        10.0           99.0           97.0
     3                      1.0        22.0           51.0           41.0
     4                      1.0        26.0           58.0           64.0

        science_score  total_score  grade
     0           26.0        238.0      C
     1           96.0        328.0      A
     2           58.0        264.0      B
     3           84.0        198.0      D
     4           65.0        213.0      C

0.2 Exploratory Data Analysis (EDA)
0.2.1 Step 3: Basic Dataset Information
Display basic information about the dataset, such as column names and data types.
[4]: print(data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 12 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   roll_no                      9999 non-null   object
 1   gender                       9982 non-null   object
 2   race_ethnicity               9977 non-null   object
 3   parental_level_of_education  9978 non-null   object
 4   lunch                        9976 non-null   float64
 5   test_preparation_course      9977 non-null   float64
 6   math_score                   9976 non-null   float64
 7   reading_score                9975 non-null   float64
 8   writing_score                9976 non-null   float64
 9   science_score                9977 non-null   float64
 10  total_score                  9981 non-null   float64
 11  grade                        9997 non-null   object
dtypes: float64(7), object(5)
memory usage: 937.6+ KB
None

0.2.2 Step 4: Statistical Summary

Use describe() to compute summary statistics for numerical columns.
[5]: print("\nDescriptive Statistics:")
print(data.describe())

Descriptive Statistics:
             lunch  test_preparation_course   math_score  reading_score  \
count  9976.000000              9977.000000  9976.000000    9975.000000
mean      0.644246                 0.388694    57.177125      70.125915
std       0.478765                 0.487478    21.746777      19.026245
min       0.000000                 0.000000     0.000000      17.000000
25%       0.000000                 0.000000    41.000000      57.000000
50%       1.000000                 0.000000    58.000000      71.000000
75%       1.000000                 1.000000    73.000000      85.000000
max       1.000000                 1.000000   100.000000     100.000000

       writing_score  science_score  total_score
count    9976.000000    9977.000000  9981.000000
mean       71.415798      66.063045   264.740908
std        18.245360      19.324331    42.304858
min        10.000000       9.000000    89.000000
25%        59.000000      53.000000   237.000000
50%        72.500000      67.000000   268.000000
75%        85.000000      81.000000   294.000000
max       100.000000     100.000000   383.000000
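
The std row printed by describe() is the sample standard deviation: pandas divides by n - 1 (ddof=1), whereas np.std divides by n by default (ddof=0). The sketch below illustrates the difference on a small hypothetical series; the values are made up for illustration and are not taken from the lab dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy scores, not taken from the lab dataset.
scores = pd.Series([50.0, 60.0, 70.0, 80.0, 90.0])

sample_std = scores.std()                    # pandas default: ddof=1 (sample)
population_std = np.std(scores.to_numpy())   # NumPy default: ddof=0 (population)
described_std = scores.describe()["std"]     # describe() reports the sample value

print(f"sample std (ddof=1):     {sample_std:.4f}")
print(f"population std (ddof=0): {population_std:.4f}")
print(f"describe() std:          {described_std:.4f}")
```

With roughly 10,000 rows the two definitions differ only marginally, but the distinction matters for small samples.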

0.2.3 Step 5: Check for Missing Values

Identify the total number of missing values in each column.
[6]: missing_values = data.isnull().sum()
print("\nMissing Values in Each Column:")
print(missing_values)

Missing Values in Each Column:

roll_no 1
gender 18
race_ethnicity 23
parental_level_of_education 22
lunch 24
test_preparation_course 23
math_score 24
reading_score 25
writing_score 24
science_score 23
total_score 19
grade 3
dtype: int64
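
Raw counts are easier to judge as a share of all rows. A minimal sketch, assuming a small hypothetical frame (toy) whose column names mirror the lab dataset but whose values are invented:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names mirror the lab dataset, values do not.
toy = pd.DataFrame({
    "gender": ["male", None, "female", "male"],
    "math_score": [89.0, np.nan, np.nan, 26.0],
    "grade": ["C", "A", "B", "D"],
})

# isnull() gives a boolean frame; its column means are the missing fractions.
missing_pct = toy.isnull().mean() * 100
print(missing_pct.round(2))
```

Applied to the lab data, the largest count above (25 missing reading_score values out of 10,000 rows) corresponds to 0.25%.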

0.2.4 Step 6: Handle Missing Values

Simple approach: Drop rows with missing values (not preferred)
[7]: # Uncomment the following block to drop rows with missing values.
# data = data.dropna()

Better Approach: Fill missing numerical values with the mean and categorical values
with the mode.
[8]: numerical_cols = data.select_dtypes(include=[np.number]).columns
categorical_cols = data.select_dtypes(include=["object"]).columns

data[numerical_cols] = data[numerical_cols].fillna(data[numerical_cols].mean())
data[categorical_cols] = data[categorical_cols].fillna(data[categorical_cols].mode().iloc[0])
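
One caveat with filling every numerical column with its mean: lunch and test_preparation_course are 0/1 indicators stored as floats, so a mean such as 0.644 is not a valid category. The sketch below shows one possible alternative, not the lab's method: fill indicator columns with the mode and score columns with the mean. The frame toy and the lists binary_cols and score_cols are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are invented for illustration.
toy = pd.DataFrame({
    "lunch": [1.0, 1.0, 0.0, np.nan],
    "math_score": [80.0, 60.0, np.nan, 40.0],
})

binary_cols = ["lunch"]       # assumed 0/1 indicator columns
score_cols = ["math_score"]   # assumed continuous score columns

# Naive mean fill turns the missing indicator into a fraction (2/3 here).
naive = toy["lunch"].fillna(toy["lunch"].mean())

filled = toy.copy()
# Mode keeps indicators inside {0, 1}; mean is kept for the scores.
filled[binary_cols] = filled[binary_cols].fillna(filled[binary_cols].mode().iloc[0])
filled[score_cols] = filled[score_cols].fillna(filled[score_cols].mean())

print(naive.tolist())
print(filled)
```

Whether mode, median, or row removal is the better choice depends on the analysis; with at most 25 missing entries per column the effect on the summary statistics is small either way.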

Verify that no missing values remain.

[9]: print("\nMissing Values After Handling:")
print(data.isnull().sum())

Missing Values After Handling:
roll_no 0
gender 0
race_ethnicity 0
parental_level_of_education 0
lunch 0
test_preparation_course 0
math_score 0
reading_score 0
writing_score 0
science_score 0
total_score 0
grade 0
dtype: int64

0.3 Visualization: Distributions

0.3.1 Step 7: Distribution of Grades
Plot the distribution of grades to understand grade trends.
[10]: plt.figure(figsize=(4, 2))
sns.countplot(x='grade', data=data, palette='magma', hue="grade")
plt.title('Distribution of Grades')
plt.xlabel('Grade')
plt.ylabel('Count')
plt.show()
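
The counts behind such a bar chart can also be tabulated directly with value_counts. A minimal sketch on a hypothetical grade series (the values are invented, not drawn from the lab dataset):

```python
import pandas as pd

# Hypothetical grades for illustration only.
grades = pd.Series(["C", "A", "B", "D", "C", "B", "C"], name="grade")

counts = grades.value_counts().sort_index()
shares = grades.value_counts(normalize=True).sort_index() * 100

# Combine absolute counts and percentages into one table, ordered by grade.
summary = pd.DataFrame({"count": counts, "percent": shares.round(1)})
print(summary)
```

On the lab data the same calls would be applied to data['grade'].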

0.3.2 Step 8: Individual Subject Score Distributions

Visualize the distribution of scores for each subject.

[11]: subjects = ['math_score', 'reading_score', 'writing_score', 'science_score']
for subject in subjects:
    plt.figure(figsize=(3, 2))
    ax = sns.histplot(data[subject], kde=True, bins=20)
    plt.title(f'Distribution of {subject.capitalize()}')
    plt.xlabel(subject.capitalize())
    plt.ylabel('Frequency')
    plt.show()

0.4 Probability & Statistics Questions
0.4.1 Step 10: Calculate Z-scores
Example: Calculate the probability that a student's total score is above 300.

Note: The expression 0.5 * (1 + math.erf(z / np.sqrt(2))) is the cumulative distribution function of the standard normal distribution, that is, the area under the curve from −∞ to z. The same value can be read from a Z-score table. The calculation assumes that total scores are approximately normally distributed.
[12]: def z_score(value):
    return (value - data['total_score'].mean()) / data['total_score'].std()

z = z_score(300)
probability_above_300 = 1 - (0.5 * (1 + math.erf(z / np.sqrt(2))))
print(f"Probability of scoring above 300: {probability_above_300 * 100:.2f}%")

Probability of scoring above 300: 20.21%
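
The erf expression used above can be wrapped in a small helper and cross-checked against the standard library. This is a sketch rather than part of the original lab; normal_cdf is a hypothetical helper name, and statistics.NormalDist requires Python 3.8 or later.

```python
import math
from statistics import NormalDist


def normal_cdf(z: float) -> float:
    """Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Compare the erf-based value with the standard library's implementation.
for z in (-1.0, 0.0, 1.0, 1.96):
    ours = normal_cdf(z)
    reference = NormalDist().cdf(z)
    print(f"z={z:+.2f}  erf-based={ours:.6f}  NormalDist={reference:.6f}")
```

The upper-tail probability used in the cell above is then simply 1 - normal_cdf(z).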

0.4.2 Step 11: Solve Statistical Problems
Example: What percentage of students score between 250 and 350?
[13]: z1 = z_score(250)
z2 = z_score(350)
probability_between = (0.5 * (1 + math.erf(z2 / np.sqrt(2)))) - (0.5 * (1 + math.erf(z1 / np.sqrt(2))))

print(f"Percentage of students scoring between 250 and 350: {probability_between * 100:.2f}%")

Percentage of students scoring between 250 and 350: 61.45%
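
Because the z-score calculation relies on a normal approximation, it can be worth comparing it with the share observed directly in the data. The sketch below does this on synthetic totals drawn from a normal distribution; the array totals, its parameters (mean 265, standard deviation 42), and the helper normal_cdf are assumptions for illustration, not the lab dataset.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical synthetic totals, not the lab dataset.
totals = rng.normal(loc=265.0, scale=42.0, size=10_000)


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mean, std = totals.mean(), totals.std(ddof=1)

# Normal approximation: area between the two z-scores.
normal_share = normal_cdf((350 - mean) / std) - normal_cdf((250 - mean) / std)
# Empirical share: fraction of observations inside the interval.
empirical_share = np.mean((totals >= 250) & (totals <= 350))

print(f"Normal approximation: {normal_share * 100:.2f}%")
print(f"Empirical share:      {empirical_share * 100:.2f}%")
```

On the lab data, data['total_score'] would take the place of totals. A large gap between the two figures would suggest that the total scores are not well described by a normal curve.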
