0% found this document useful (0 votes)

3 views22 pages

Student Performance Analysis

The document contains a detailed analysis of a dataset named 'StudentsPerformance.csv', which includes information about students' demographics and their scores in math, reading, and writing. It includes data loading, descriptive statistics, and various visualizations to explore relationships between scores and factors like gender, lunch type, and parental education. The dataset consists of 1000 entries with 8 attributes, and the analysis highlights trends and distributions in student performance.

Uploaded by

informativetutor66

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

3 views22 pages

Student Performance Analysis

Uploaded by

informativetutor66

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 22

import pandas as pd

# Load data
df = pd.read_csv("StudentsPerformance.csv")

# Preview
df.head()

gender race/ethnicity parental level of education lunch \

0 female group B bachelor's degree standard
1 female group C some college standard
2 female group B master's degree standard
3 male group A associate's degree free/reduced
4 male group C some college standard

test preparation course math score reading score writing score

0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

df.tail(10)

gender race/ethnicity parental level of education

lunch \
990 male group E high school free/reduced

991 female group B some high school standard

992 female group D associate's degree free/reduced

993 female group D bachelor's degree free/reduced

994 male group A high school standard

995 female group E master's degree standard

996 male group C high school free/reduced

997 female group C high school free/reduced

998 female group D some college standard

999 female group D some college free/reduced

test preparation course math score reading score writing score

990 completed 86 81 75
991 completed 65 82 78

992 none 55 76 76

993 none 62 72 74

994 none 63 63 62

995 completed 88 99 95

996 none 62 55 55

997 completed 59 71 65

998 completed 68 78 77

999 none 77 86 86

df.shape

(1000, 8)

df.dtypes

gender object
race/ethnicity object
parental level of education object
lunch object
test preparation course object
math score int64
reading score int64
writing score int64
dtype: object

df['math score'].describe()

count 1000.00000
mean 66.08900
std 15.16308
min 0.00000
25% 57.00000
50% 66.00000
75% 77.00000
max 100.00000
Name: math score, dtype: float64

df['lunch'].describe()

count 1000
unique 2
top standard
freq 645
Name: lunch, dtype: object

df.describe()

math score reading score writing score

count 1000.00000 1000.000000 1000.000000
mean 66.08900 69.169000 68.054000
std 15.16308 14.600192 15.195657
min 0.00000 17.000000 10.000000
25% 57.00000 59.000000 57.750000
50% 66.00000 70.000000 69.000000
75% 77.00000 79.000000 79.000000
max 100.00000 100.000000 100.000000

df.iloc[100]

gender male
race/ethnicity group B
parental level of education some college
lunch standard
test preparation course none
math score 79
reading score 67
writing score 67
Name: 100, dtype: object

df.loc[:,"lunch"]

0 standard
1 standard
2 standard
3 free/reduced
4 standard
...
995 standard
996 free/reduced
997 free/reduced
998 standard
999 free/reduced
Name: lunch, Length: 1000, dtype: object

df.sort_values(by = "math score", ascending=False).head()

gender race/ethnicity parental level of education

lunch \
962 female group E associate's degree standard

458 female group E bachelor's degree standard

149 male group E associate's degree free/reduced

625 male group D some college standard

916 male group E bachelor's degree standard

test preparation course math score reading score writing score

962 none 100 100 100

458 none 100 100 100

149 completed 100 100 93

625 completed 100 97 99

916 completed 100 100 100

df["lunch"].head(10)

0 standard
1 standard
2 standard
3 free/reduced
4 standard
5 standard
6 standard
7 free/reduced
8 free/reduced
9 free/reduced
Name: lunch, dtype: object

df[df["math score"]==99]

gender race/ethnicity parental level of education lunch \

114 female group E bachelor's degree standard
263 female group E high school standard
306 male group E some college standard

test preparation course math score reading score writing score

114 completed 99 100 100

263 none 99 93 90

306 completed 99 87 81

Q1 = df['math score'].quantile(0.25)
Q3 = df['math score'].quantile(0.75)

IQR = Q3-Q1

print("The interquartile range is: ", IQR)

The interquartile range is: 20.0

df.isnull()

gender race/ethnicity parental level of education lunch \

0 False False False False
1 False False False False
2 False False False False
3 False False False False
4 False False False False
.. ... ... ... ...
995 False False False False
996 False False False False
997 False False False False
998 False False False False
999 False False False False

test preparation course math score reading score writing score

0 False False False False

1 False False False False

2 False False False False

3 False False False False

4 False False False False

.. ... ... ... ...

995 False False False False

996 False False False False

997 False False False False

998 False False False False

999 False False False False

[1000 rows x 8 columns]

df.isnull().sum()
gender 0
race/ethnicity 0
parental level of education 0
test preparation course 0
math score 0
reading score 0
writing score 0
dtype: int64

df.count()

gender 1000
race/ethnicity 1000
parental level of education 1000
test preparation course 1000
math score 1000
reading score 1000
writing score 1000
dtype: int64

# remove all the rows that contain a missing value

df.dropna(inplace=True)

# Students with high math scores (above 90)

df[df["math score"] > 90]

gender race/ethnicity parental level of education \

34 male group E some college
104 male group C some college
114 female group E bachelor's degree
121 male group B associate's degree
149 male group E associate's degree
165 female group C bachelor's degree
171 male group E some high school
179 female group D some high school
233 male group E some high school
263 female group E high school
286 male group E associate's degree
306 male group E some college
451 female group E some college
458 female group E bachelor's degree
469 male group C some college
501 female group B associate's degree
503 female group E associate's degree
521 female group C associate's degree
539 male group A associate's degree
546 female group A some high school
562 male group C bachelor's degree
566 female group E bachelor's degree
571 male group A bachelor's degree
594 female group C bachelor's degree
612 male group C bachelor's degree
618 male group D master's degree
623 male group A some college
625 male group D some college
685 female group E master's degree
689 male group E some college
710 male group C some college
712 female group D some college
717 female group C associate's degree
719 male group E associate's degree
736 male group C associate's degree
779 male group E associate's degree
784 male group C bachelor's degree
815 male group B some high school
846 male group C master's degree
855 female group B bachelor's degree
864 male group C associate's degree
886 female group E associate's degree
903 female group D bachelor's degree
916 male group E bachelor's degree
919 male group B some college
934 male group C associate's degree
950 male group E high school
957 female group D master's degree
962 female group E associate's degree
979 female group C associate's degree

test preparation course math score reading score writing score

34 none 97 87 82

104 completed 98 86 90

114 completed 99 100 100

121 completed 91 89 92

149 completed 100 100 93

165 completed 96 100 100

171 none 94 88 78

179 completed 97 100 100

233 none 92 87 78

263 none 99 93 90
286 completed 97 82 88

306 completed 99 87 81

451 none 100 92 97

458 none 100 100 100

469 none 91 74 76

501 completed 94 87 92

503 completed 95 89 92

521 none 91 86 84

539 completed 97 92 86

546 completed 92 100 97

562 completed 96 90 92

566 completed 92 100 100

571 none 91 96 92

594 completed 92 100 99

612 completed 94 90 91

618 none 95 81 84

623 completed 100 96 86

625 completed 100 97 99

685 completed 94 99 100

689 none 93 90 83

710 completed 93 84 90

712 none 98 100 99

717 completed 96 96 99

719 completed 91 73 80

736 none 92 79 84

779 completed 94 85 82
784 completed 91 81 79

815 completed 94 86 87

846 completed 91 85 85

855 none 97 97 96

864 none 97 93 91

886 completed 93 100 95

903 completed 93 100 100

916 completed 100 100 100

919 completed 91 96 91

934 completed 98 87 90

950 none 94 73 71

957 none 92 100 100

962 none 100 100 100

979 none 91 95 94

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

%matplotlib inline
warnings.filterwarnings('ignore')
plt.rcParams["figure.figsize"] = [10, 5]

# Load data
df = pd.read_csv("StudentsPerformance.csv")

df.head()

gender race/ethnicity parental level of education lunch \

test preparation course math score reading score writing score

0 none 72 72 74
1 completed 69 90 88
2 none 90 95 93
3 none 47 57 44
4 none 76 78 75

# Color palettes
sns.palplot(sns.color_palette("colorblind"))
plt.title("Color Palette: Colorblind")
plt.show()

sns.palplot(sns.color_palette("Reds"))
plt.title("Color Palette: Reds")
plt.show()

sns.histplot(df['math score'], kde=False)

plt.title("Distribution of Math Scores")
plt.show()
sns.distplot(df['reading score'], hist=False)
plt.title("KDE of Reading Scores")
plt.show()

plt.figure(figsize=(8, 8))
sns.distplot(df['writing score'])
plt.title("Distribution of Writing Scores")
plt.show()

plt.figure(figsize=(8, 8))
sns.scatterplot(x="math score", y="writing score", hue="gender",
data=df)
plt.title("Math vs Writing Scores by Gender")
plt.show()
# 5. Bar Plot: Average Writing Score by Gender and Lunch Type
plt.figure(figsize=(8, 8))
sns.barplot(x="gender", y="writing score", hue="lunch", data=df)
plt.title("Average Writing Score by Gender & Lunch")
plt.show()
# 6. Relplot – math vs reading
sns.relplot(x="math score", y="reading score", hue="gender",
style="gender", kind="scatter", data=df)
plt.title("Math vs Reading by Gender")
plt.show()
# 8. Lineplot – reading vs writing by gender
plt.figure(figsize=(7, 7))
sns.lineplot(x="reading score", y="writing score", hue="gender",
data=df)
plt.title("Reading vs Writing by Gender (Lineplot)")
plt.show()
# 10. Barplot – math score by test prep and gender
plt.figure(figsize=(7, 7))
sns.barplot(x="test preparation course", y="math score", hue="gender",
data=df)
plt.title("Math Score by Test Prep and Gender")
plt.show()
# 11. Boxplot – reading score by parental education
plt.figure(figsize=(12, 6))
sns.boxplot(x="parental level of education", y="reading score",
data=df)
plt.title("Reading Score by Parental Education")
plt.xticks(rotation=45)
plt.show()
# 12. Violin plot – writing score by lunch
plt.figure(figsize=(6, 6))
sns.violinplot(x="lunch", y="writing score", data=df)
plt.title("Writing Score by Lunch Type")
plt.show()
# 13. Boxplot – writing score by gender
sns.boxplot(x="gender", y="writing score", data=df)
plt.title("Writing Score by Gender")
plt.show()
plt.figure(figsize=(8, 6))
sns.boxplot(x="race/ethnicity", y="math score", data=df)
plt.title("Math Score by Race/Ethnicity Group")
plt.show()
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

from sklearn.preprocessing import LabelEncoder

# Make a copy to avoid errors if run twice

df = df.copy()

# List of categorical columns

cat_cols = ['gender', 'race/ethnicity', 'parental level of education',
'lunch', 'test preparation course']

# Apply Label Encoding to each categorical column

for col in cat_cols:
le = LabelEncoder()
df[col] = le.fit_transform(df[col])

# 🎯 Create target column

df['pass_math'] = (df['math score'] >= 50).astype(int)
# 🎯 Define features and target
X = df.drop(['math score', 'pass_math'], axis=1)
y = df['pass_math']

# ✂️ Train-test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=0)

# 🧠 Train Logistic Regression

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)

LogisticRegression()

# ✅ Evaluate
from sklearn.metrics import accuracy_score, confusion_matrix
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc * 100:.2f}%")

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Accuracy: 93.50%
Confusion Matrix:
[[ 20 7]
[ 6 167]]

GCSE Mathematics For AQA Teachers Resource Free Online
100% (7)
GCSE Mathematics For AQA Teachers Resource Free Online
391 pages
Quiz Coding Question 1
No ratings yet
Quiz Coding Question 1
9 pages
Codealpha Studentseda
No ratings yet
Codealpha Studentseda
2 pages
SAT Revyan F
No ratings yet
SAT Revyan F
6 pages
Experiment 1
No ratings yet
Experiment 1
5 pages
Student Analysis
No ratings yet
Student Analysis
16 pages
PMA Experiment 1
No ratings yet
PMA Experiment 1
9 pages
Experiment 2
No ratings yet
Experiment 2
5 pages
DSBDA Prac2
No ratings yet
DSBDA Prac2
2 pages
Data Manipulation With Python Pandas 1700003764
No ratings yet
Data Manipulation With Python Pandas 1700003764
10 pages
DW 14
No ratings yet
DW 14
14 pages
Students Performance
No ratings yet
Students Performance
49 pages
Student Dropout
No ratings yet
Student Dropout
38 pages
Students Performance
No ratings yet
Students Performance
21 pages
Students Performance in An Admission Test
No ratings yet
Students Performance in An Admission Test
29 pages
Asatasdfs
No ratings yet
Asatasdfs
6 pages
00 - Lesson - Data Science Workflow - Jupyter Notebook
No ratings yet
00 - Lesson - Data Science Workflow - Jupyter Notebook
6 pages
Prac 1 Feb
No ratings yet
Prac 1 Feb
22 pages
Students Performance
No ratings yet
Students Performance
17 pages
Assignment-Data Preprocessing (All)
No ratings yet
Assignment-Data Preprocessing (All)
1 page
Exams CSV
No ratings yet
Exams CSV
21 pages
Test Preparation C 1000 1642 1.642 0.230066 Math Score 1000 66089 66.089 229.919
No ratings yet
Test Preparation C 1000 1642 1.642 0.230066 Math Score 1000 66089 66.089 229.919
63 pages
Original
No ratings yet
Original
48 pages
Predict Students' Dropout and Academic Success Using Machine Learning Techniques
No ratings yet
Predict Students' Dropout and Academic Success Using Machine Learning Techniques
21 pages
Ai YasmeenAlhajYousef 0197638 Mohammad Almajali 2191370 End
No ratings yet
Ai YasmeenAlhajYousef 0197638 Mohammad Almajali 2191370 End
2 pages
Lambda Functions & Alternative Methods in Python
No ratings yet
Lambda Functions & Alternative Methods in Python
8 pages
Students Performance
No ratings yet
Students Performance
46 pages
Analyzing Student Performance in Exams Using Python
No ratings yet
Analyzing Student Performance in Exams Using Python
11 pages
Study Performance
No ratings yet
Study Performance
21 pages
Data Preprocessing - Ipynb - Colaboratory
No ratings yet
Data Preprocessing - Ipynb - Colaboratory
7 pages
Study Performance
No ratings yet
Study Performance
4 pages
Webinar Practice
No ratings yet
Webinar Practice
323 pages
Webinar Practice
No ratings yet
Webinar Practice
418 pages
Assignment 02
No ratings yet
Assignment 02
4 pages
Assignment 2
No ratings yet
Assignment 2
4 pages
Gen Some - Highsch 0: Worksheet - 2 Name-Sarah Nuzhat Khan ID-20175008 Creating Dummy Variables
No ratings yet
Gen Some - Highsch 0: Worksheet - 2 Name-Sarah Nuzhat Khan ID-20175008 Creating Dummy Variables
6 pages
Students Exam Scores Analysis - Ipynb
No ratings yet
Students Exam Scores Analysis - Ipynb
4 pages
Assignment College
No ratings yet
Assignment College
6 pages
Samarth Raghav
No ratings yet
Samarth Raghav
15 pages
Session 4 - Datasets
No ratings yet
Session 4 - Datasets
11 pages
Regression Intuition I
No ratings yet
Regression Intuition I
8 pages
Section 2 Lesson 1
No ratings yet
Section 2 Lesson 1
32 pages
Student Grade Prediction
No ratings yet
Student Grade Prediction
9 pages
Proccesing
No ratings yet
Proccesing
25 pages
BUS 445 - Tutorial 3
No ratings yet
BUS 445 - Tutorial 3
8 pages
Ex 8
No ratings yet
Ex 8
3 pages
Student Performance in Exams
No ratings yet
Student Performance in Exams
71 pages
Practice Assignment 2
No ratings yet
Practice Assignment 2
1 page
Python Case Study
No ratings yet
Python Case Study
7 pages
Jamboree
No ratings yet
Jamboree
10 pages
Assignment College R
No ratings yet
Assignment College R
6 pages
Ordering Proc Freq Arpind
No ratings yet
Ordering Proc Freq Arpind
9 pages
StudentsPerformance Updated
No ratings yet
StudentsPerformance Updated
70 pages
CS29002 Algorithms Laboratory: Assignment No: 1 Last Date of Submission: 27-July-2016
No ratings yet
CS29002 Algorithms Laboratory: Assignment No: 1 Last Date of Submission: 27-July-2016
2 pages
Student Score - CSV
No ratings yet
Student Score - CSV
1,278 pages
Govt Non Govt
No ratings yet
Govt Non Govt
11 pages
CS Xii PB MS - Set1
No ratings yet
CS Xii PB MS - Set1
6 pages
ZF D5 L Qa BPC Fohk JD
No ratings yet
ZF D5 L Qa BPC Fohk JD
2 pages
Mathematical Thinking for Primary
From Everand
Mathematical Thinking for Primary
Pervaiz Salik
No ratings yet
Essay: Don Honorio Ventura State University
No ratings yet
Essay: Don Honorio Ventura State University
3 pages
Communication Skills and Customer Satisfaction Research Papers
No ratings yet
Communication Skills and Customer Satisfaction Research Papers
49 pages
Copia de Copia de Get Ready
No ratings yet
Copia de Copia de Get Ready
4 pages
(Sample) Contact Form
No ratings yet
(Sample) Contact Form
2 pages
Destination B1-╤Б╤В╤А╨░╨╜╨╕╤Ж╤Л-8
No ratings yet
Destination B1-╤Б╤В╤А╨░╨╜╨╕╤Ж╤Л-8
6 pages
Seminar Final Report
No ratings yet
Seminar Final Report
41 pages
DLL - Science 3 - Q3 - W5
No ratings yet
DLL - Science 3 - Q3 - W5
3 pages
Indian Institute of Technology Kharagpur
No ratings yet
Indian Institute of Technology Kharagpur
18 pages
How To Say I Finished My Homework in Japanese
100% (1)
How To Say I Finished My Homework in Japanese
8 pages
LAW 112 Legal Methods II
No ratings yet
LAW 112 Legal Methods II
97 pages
Architecture Government Policy Crane 2019 Algerian Socialism and The Architec
No ratings yet
Architecture Government Policy Crane 2019 Algerian Socialism and The Architec
19 pages
Emerging Adulthood
No ratings yet
Emerging Adulthood
7 pages
Seminar - Cicl
No ratings yet
Seminar - Cicl
7 pages
Jpia Day 2019 Narrative/Accomplishment Report
No ratings yet
Jpia Day 2019 Narrative/Accomplishment Report
2 pages
Action Research Proposal
No ratings yet
Action Research Proposal
24 pages
The Truth! We Will Know It
No ratings yet
The Truth! We Will Know It
16 pages
Career Handbook
No ratings yet
Career Handbook
88 pages
2Nd Week (Matutinović) Case Study Grade and Signing Date 1St Week (Krkač)
No ratings yet
2Nd Week (Matutinović) Case Study Grade and Signing Date 1St Week (Krkač)
2 pages
Printable Primary Math Worksheet For Math Grades 1 To 6 Based On The Singapore Math Curriculum
No ratings yet
Printable Primary Math Worksheet For Math Grades 1 To 6 Based On The Singapore Math Curriculum
1 page
LESSON PLAN - GRADE1-MATH v.3
100% (1)
LESSON PLAN - GRADE1-MATH v.3
4 pages
Academic Qualifications: Maruti Suzuki India Limited
No ratings yet
Academic Qualifications: Maruti Suzuki India Limited
2 pages
PGD Solid Waste MGMT 1011
No ratings yet
PGD Solid Waste MGMT 1011
16 pages
Scholarship Recommendation Letter From Coach Template-28321
No ratings yet
Scholarship Recommendation Letter From Coach Template-28321
13 pages
DKG Dec13 Newsletter
No ratings yet
DKG Dec13 Newsletter
6 pages
Physics - TRIAL S1, STPM 2022 - Cover
No ratings yet
Physics - TRIAL S1, STPM 2022 - Cover
1 page
1550315853896how To Learn Faster and Remember More Summary
100% (1)
1550315853896how To Learn Faster and Remember More Summary
27 pages
Racial and Ethnic Disparities in Childrens Social Care
No ratings yet
Racial and Ethnic Disparities in Childrens Social Care
23 pages
Spanish: Pearson Edexcel Level 3 GCE
100% (1)
Spanish: Pearson Edexcel Level 3 GCE
16 pages
Stephen P. Timoshenko: Engineering Legends
No ratings yet
Stephen P. Timoshenko: Engineering Legends
6 pages