Jamboree Linear Regression Version 2 - Jupyter Notebook

This notebook analyzes graduate admissions data with linear regression. It covers data preprocessing, visualization of variable distributions, correlation analysis, and model training with several regression algorithms, and evaluates how well each model predicts 'Chance of Admit' from factors such as GRE score and CGPA.


In [ ]: import pandas as pd
import warnings
#warnings.filterwarnings("ignore")
df = pd.read_csv('/Users/suraaj/Desktop/Datasets/Admission_Predict_Ver1.1.csv')
df.head()

Out[1]:    Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
        0           1        337          118                  4  4.5  4.5  9.65         1             0.92
        1           2        324          107                  4  4.0  4.5  8.87         1             0.76
        2           3        316          104                  3  3.0  3.5  8.00         1             0.72
        3           4        322          110                  3  3.5  2.5  8.67         1             0.80
        4           5        314          103                  2  2.0  3.0  8.21         0             0.65

In [ ]: df.shape

Out[2]: (500, 9)

Now, let us drop the irrelevant Serial No. column and check whether there are any null values in the dataset.

In [ ]: df = df.drop(['Serial No.'], axis=1)


df.isnull().sum()

Out[3]: GRE Score            0
        TOEFL Score          0
        University Rating    0
        SOP                  0
        LOR                  0
        CGPA                 0
        Research             0
        Chance of Admit      0
        dtype: int64

Let's look at the distributions of the graduate applicants' variables.


In [ ]: import matplotlib.pyplot as plt
import seaborn as sns


fig = sns.distplot(df['GRE Score'], kde=False)
plt.title("Distribution of GRE Scores")
plt.show()

fig = sns.distplot(df['TOEFL Score'], kde=False)
plt.title("Distribution of TOEFL Scores")
plt.show()

fig = sns.distplot(df['University Rating'], kde=False)
plt.title("Distribution of University Rating")
plt.show()

fig = sns.distplot(df['SOP'], kde=False)
plt.title("Distribution of SOP Ratings")
plt.show()

fig = sns.distplot(df['CGPA'], kde=False)
plt.title("Distribution of CGPA")
plt.show()

It is clear from the distributions that applicants with widely varying academic profiles apply to the university.

Understanding the relation between different factors responsible for graduate admissions

In [ ]: fig = sns.regplot(x="GRE Score", y="TOEFL Score", data=df)


plt.title("GRE Score vs TOEFL Score")
plt.show()


Applicants with higher GRE scores also tend to have higher TOEFL scores, which makes sense because both exams include a verbal section; the sections are not identical, but the skills they test are related.

In [ ]: fig = sns.regplot(x="GRE Score", y="CGPA", data=df)


plt.title("GRE Score vs CGPA")
plt.show()

Although there are exceptions, applicants with a higher CGPA usually also have higher GRE scores, perhaps because both reflect academic ability and consistent effort.

In [ ]: fig = sns.scatterplot(x="CGPA", y="LOR ", data=df, hue="Research")

plt.title("CGPA vs LOR")
plt.show()

LOR ratings are not strongly related to CGPA, so an applicant's LOR does not appear to depend on academic excellence alone. Having research experience, however, is usually associated with a stronger LOR, which may be because supervisors interact personally with students doing research and can therefore write more substantial letters.


In [ ]: fig = sns.scatterplot(x="GRE Score", y="LOR ", data=df, hue="Research")

plt.title("GRE Score vs LOR")
plt.show()

GRE scores and LOR ratings are also only weakly related; applicants with similar LOR ratings span a wide range of GRE scores.

In [ ]: fig = sns.scatterplot(x="CGPA", y="SOP", data=df)

plt.title("CGPA vs SOP")
plt.show()

CGPA and SOP are only loosely related, since the Statement of Purpose is not a direct measure of academic performance. Still, applicants with a good CGPA tend to have stronger achievements to write about, which may explain the slight drift toward better SOP ratings at higher CGPA.

In [ ]: fig = sns.scatterplot(x="GRE Score", y="SOP", data=df)

plt.title("GRE Score vs SOP")
plt.show()


Similarly, GRE Score and SOP are only slightly related.

In [ ]: fig = sns.scatterplot(x="TOE.. y="SOP", data=df)


plt.title("GRE Score vs CGPA")
plt.show()

Applicants with similar SOP ratings have widely varying TOEFL scores, so the quality of the SOP is not always a reflection of the applicant's English skills.

Correlation among variables

In [ ]: import numpy as np
corr = df.corr()
#fig, ax = plt.subplots(figsize=(8, 8))
#colormap = sns.diverging_palette(220, 10, as_cmap=True)
#dropSelf = np.zeros_like(corr)
#dropSelf[np.triu_indices_from(dropSelf)] = True
#colormap = sns.diverging_palette(220, 10, as_cmap=True)
sns.heatmap(corr, linewidths=.5, annot=True)
plt.show()

Let's split the dataset into training and test sets and prepare the inputs and outputs.

In [ ]: from sklearn.model_selection import train_test_split



X = df.drop(['Chance of Admit '], axis=1)
y = df['Chance of Admit ']


In [ ]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, shuffle=True)

In [ ]: X_train

Out[15]:      GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research
         479        325          110                  4  4.5  4.0  8.96         1
         493        300           95                  2  3.0  1.5  8.22         1
         184        316          106                  2  2.5  4.0  8.32         0
         397        330          116                  4  5.0  4.5  9.45         1
         323        305          102                  2  2.0  2.5  8.18         0
         ...        ...          ...                ...  ...  ...   ...       ...
         173        323          113                  4  4.0  4.5  9.23         1
         470        320          110                  5  4.0  4.0  9.27         1
         132        309          105                  5  3.5  3.5  8.56         0
         2          316          104                  3  3.0  3.5  8.00         1
         105        316          110                  3  4.0  4.5  8.78         1

         400 rows × 7 columns

In [ ]: y_train

Out[16]: 479    0.79
         493    0.62
         184    0.72
         397    0.91
         323    0.62
                ...
         173    0.89
         470    0.87
         132    0.71
         2      0.72
         105    0.69
         Name: Chance of Admit , Length: 400, dtype: float64

In [ ]: #Standardization
from sklearn.preprocessing import StandardScaler
X_train_columns=X_train.columns
std=StandardScaler()
X_train_std=std.fit_transform(X_train)

In [ ]: X_train_std

Out[18]: array([[ 0.72625624,  0.42425133,  0.7191839 , ...,  0.54664996,
                  0.5976762 ,  0.86413245],
                [-1.46126256, -2.03043332, -0.99827019, ..., -2.17300157,
                 -0.62064316,  0.86413245],
                [-0.06125053, -0.23033124, -0.99827019, ...,  0.54664996,
                 -0.45600541, -1.15723001],
                ...,
                [-0.67375579, -0.39397689,  1.57791095, ...,  0.00271965,
                 -0.06087481, -1.15723001],
                [-0.06125053, -0.55762253, -0.13954315, ...,  0.00271965,
                 -0.98284622,  0.86413245],
                [-0.06125053,  0.42425133, -0.13954315, ...,  1.09058026,
                  0.30132825,  0.86413245]])


In [ ]: X_train=pd.DataFrame(X_train_std, columns=X_train_columns)

In [ ]: X_train

Out[20]:     GRE Score  TOEFL Score  University Rating       SOP       LOR      CGPA  Research
         0    0.726256     0.424251           0.719184  1.125357  0.546650  0.597676  0.864132
         1   -1.461263    -2.030433          -0.998270 -0.423299 -2.173002 -0.620643  0.864132
         2   -0.061251    -0.230331          -0.998270 -0.939518  0.546650 -0.456005 -1.157230
         3    1.163760     1.406125           0.719184  1.641576  1.090580  1.404401  0.864132
         4   -1.023759    -0.884914          -0.998270 -1.455737 -1.085141 -0.686498 -1.157230
         ..        ...          ...                ...       ...       ...       ...       ...
         395  0.551255     0.915188           0.719184  0.609138  1.090580  1.042198  0.864132
         396  0.288752     0.424251           1.577911  0.609138  0.546650  1.108053  0.864132
         397 -0.673756    -0.393977           1.577911  0.092919  0.002720 -0.060875 -1.157230
         398 -0.061251    -0.557623          -0.139543 -0.423299  0.002720 -0.982846  0.864132
         399 -0.061251     0.424251          -0.139543  0.609138  1.090580  0.301328  0.864132

         400 rows × 7 columns

Let's try a few different regression algorithms to see which model performs best.

In [ ]: from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error

models = [
    ['Linear Regression :', LinearRegression()],
    ['Lasso Regression  :', Lasso(alpha=0.1)],   # try with different alpha values
    ['Ridge Regression  :', Ridge(alpha=1.0)]    # try with different alpha values
]

print("Results without removing features with multicollinearity ...")

for name, model in models:
    # Fit on the standardized training data and report test RMSE, passing the
    # raw test features through the same scaler.
    model.fit(X_train, y_train.values)
    predictions = model.predict(std.transform(X_test))
    print(name, np.sqrt(mean_squared_error(y_test, predictions)))

Results without removing features with multicollinearity ...
Linear Regression : 0.05298305536300251
Lasso Regression : 0.10605223378225762
Ridge Regression : 0.052917718455970944
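
The comments above suggest trying different alpha values; as a minimal sketch (not part of the original notebook), cross-validation can choose alpha automatically. This assumes the standardized X_train and y_train defined earlier, and the alpha grids below are illustrative only.

In [ ]: # Sketch: let cross-validation pick the regularization strength.
from sklearn.linear_model import LassoCV, RidgeCV

lasso_cv = LassoCV(alphas=[0.0001, 0.001, 0.01, 0.1, 1.0], cv=5).fit(X_train, y_train.values)
ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5).fit(X_train, y_train.values)
print("Best Lasso alpha :", lasso_cv.alpha_)
print("Best Ridge alpha :", ridge_cv.alpha_)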

Linear Regression using the statsmodels library

Adjusted R-squared reflects the fit of the model. R-squared values range from 0 to 1, and a higher value
generally indicates a better fit, assuming certain conditions are met.
The const coefficient is the Y-intercept: if every predictor were zero, the expected output (the predicted
'Chance of Admit') would equal the const coefficient.
Each predictor's coefficient (e.g., GRE Score or CGPA) represents the change in the output Y for a one-unit
change in that predictor, everything else held constant. Since the predictors are standardized here, one unit
corresponds to one standard deviation.
std err reflects the precision of the coefficient estimate: the lower it is, the more precise the estimate.
P>|t| is the p-value. A p-value below 0.05 is conventionally considered statistically significant.
The confidence interval represents the range in which the coefficient is likely to fall (with 95% likelihood).

In [ ]: import statsmodels.api as sm
X_train = sm.add_constant(X_train)
model = sm.OLS(y_train.values, X_train).fit()
print(model.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.819
Model:                            OLS   Adj. R-squared:                  0.816
Method:                 Least Squares   F-statistic:                     253.1
Date:                Sun, 03 Jul 2022   Prob (F-statistic):          3.78e-141
Time:                        05:44:36   Log-Likelihood:                 549.48
No. Observations:                 400   AIC:                            -1083.
Df Residuals:                     392   BIC:                            -1051.
Df Model:                           7
Covariance Type:            nonrobust
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const                 0.7241      0.003    234.011      0.000       0.718       0.730
GRE Score             0.0149      0.007      2.234      0.026       0.002       0.028
TOEFL Score           0.0178      0.006      2.899      0.004       0.006       0.030
University Rating     0.0098      0.005      1.944      0.053      -0.000       0.020
SOP                   0.0003      0.005      0.051      0.959      -0.010       0.010
LOR                   0.0150      0.004      3.350      0.001       0.006       0.024
CGPA                  0.0753      0.007     10.765      0.000       0.062       0.089
Research              0.0154      0.004      4.108      0.000       0.008       0.023
==============================================================================
Omnibus:                       94.925   Durbin-Watson:                   2.046
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              199.062
Skew:                          -1.242   Prob(JB):                     5.95e-44
Kurtosis:                       5.402   Cond. No.                         5.91
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

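For reference, the quantities described above can also be read programmatically from the fitted results object; a minimal sketch (not in the original notebook), using the model fitted above:

In [ ]: # Read key statistics directly from the statsmodels results object.
print(model.rsquared_adj)          # Adjusted R-squared
print(model.params)                # coefficients, including the const intercept
print(model.bse)                   # standard errors
print(model.pvalues)               # P>|t| column
print(model.conf_int(alpha=0.05))  # 95% confidence intervals
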
In [ ]: X_train_new=X_train.drop(columns='SOP')


In [ ]: model1 = sm.OLS(y_train.values, X_train_new).fit()


print(model1.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.819
Model:                            OLS   Adj. R-squared:                  0.816
Method:                 Least Squares   F-statistic:                     296.1
Date:                Sun, 03 Jul 2022   Prob (F-statistic):          2.11e-142
Time:                        05:44:36   Log-Likelihood:                 549.48
No. Observations:                 400   AIC:                            -1085.
Df Residuals:                     393   BIC:                            -1057.
Df Model:                           6
Covariance Type:            nonrobust
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const                 0.7241      0.003    234.309      0.000       0.718       0.730
GRE Score             0.0149      0.007      2.240      0.026       0.002       0.028
TOEFL Score           0.0178      0.006      2.918      0.004       0.006       0.030
University Rating     0.0099      0.005      2.130      0.034       0.001       0.019
LOR                   0.0151      0.004      3.498      0.001       0.007       0.024
CGPA                  0.0754      0.007     11.052      0.000       0.062       0.089
Research              0.0154      0.004      4.115      0.000       0.008       0.023
==============================================================================
Omnibus:                       94.784   Durbin-Watson:                   2.046
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              198.587
Skew:                          -1.241   Prob(JB):                     7.54e-44
Kurtosis:                       5.399   Cond. No.                         5.40
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

VIF (Variance Inflation Factor)

The VIF score of an independent variable measures how well that variable is explained by the other
independent variables: VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing feature i on the
remaining features.
So, the closer that R^2 value is to 1, the higher the VIF and the stronger the multicollinearity
involving that particular independent variable.
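
To make the relationship concrete, here is a minimal sketch (not in the original notebook) that computes the VIF of one feature directly from the R² of regressing it on the remaining standardized features; CGPA is used purely as an example.

In [ ]: # Illustration only: VIF_i = 1 / (1 - R_i^2), computed by hand for one feature.
from sklearn.linear_model import LinearRegression

feature = 'CGPA'                                        # example feature
others = X_train_new.drop(columns=['const', feature])   # exclude the intercept column
r2 = LinearRegression().fit(others, X_train_new[feature]).score(others, X_train_new[feature])
print(f"VIF({feature}) = {1 / (1 - r2):.3f}")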

In [ ]: from statsmodels.stats.outliers_influence import variance_inflation_factor

def calculate_vif(dataset, col):
    # Drop the given columns, then compute the VIF of every remaining feature.
    dataset = dataset.drop(columns=col, axis=1)
    vif = pd.DataFrame()
    vif['features'] = dataset.columns
    vif['VIF_Value'] = [variance_inflation_factor(dataset.values, i) for i in range(dataset.shape[1])]
    return vif


In [ ]: calculate_vif(X_train_new,[])

Out[28]:    features           VIF_Value
         0  const               1.000000
         1  GRE Score           4.615374
         2  TOEFL Score         3.902858
         3  University Rating   2.245892
         4  LOR                 1.953575
         5  CGPA                4.871981
         6  Research            1.471018

All VIF values are below 5, so multicollinearity looks acceptable and we can go ahead with the predictions.

In [ ]: X_test_std= std.transform(X_test)

In [ ]: X_test=pd.DataFrame(X_test_std, columns=X_train_columns) # col name same as train data


In [ ]: X_test = sm.add_constant(X_test)

In [ ]: X_test_del=list(set(X_test.columns).difference(set(X_train_new.columns)))

In [ ]: print(f'Dropping {X_test_del} from test set')

Dropping ['SOP'] from test set

In [ ]: X_test_new=X_test.drop(columns=X_test_del)

In [ ]: #Prediction from the clean model


pred = model1.predict(X_test_new)

from sklearn.metrics import mean_squared_error,r2_score,mean_absolute_error

print('Mean Absolute Error ', mean_absolute_error(y_test.values,pred) )
print('Root Mean Square Error ', np.sqrt(mean_squared_error(y_test.values,pred) ))
#MAPE

Mean Absolute Error  0.03889667283539957
Root Mean Square Error  0.05299573386462354
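
The '#MAPE' comment above hints at also reporting the mean absolute percentage error; a minimal sketch:

In [ ]: # Mean Absolute Percentage Error, as suggested by the '#MAPE' comment above.
mape = np.mean(np.abs((y_test.values - pred) / y_test.values)) * 100
print('Mean Absolute Percentage Error ', mape)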

Mean of Residuals
In [ ]: residuals = y_test.values-pred
mean_residuals = np.mean(residuals)
print("Mean of Residuals {}".format(mean_residuals))

Mean of Residuals 0.010929365649686953


Test for Homoscedasticity


In [ ]: p = sns.scatterplot(x=pred, y=residuals)
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.ylim(-0.4, 0.4)
plt.xlim(0, 1)
p = sns.lineplot(x=[0, 1], y=[0, 0], color='blue')  # reference line at zero residual
p = plt.title('Residuals vs fitted values plot for homoscedasticity check')

In [ ]: import statsmodels.stats.api as sms


from statsmodels.compat import lzip
name = ['F statistic', 'p-value']
test = sms.het_goldfeldquandt(residuals, X_test)
lzip(name, test)

Out[42]: [('F statistic', 0.800831247191971), ('p-value', 0.7626036849906835)]

The null hypothesis here is that the error terms are homoscedastic; since the p-value is greater than 0.05, we fail to
reject the null hypothesis, so there is no evidence of heteroscedasticity.
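
For completeness, a minimal sketch (not in the original notebook) running the Breusch-Pagan heteroscedasticity test on the same residuals and the constant-augmented X_test; it returns the LM and F statistics with their p-values for comparison with the Goldfeld-Quandt result.

In [ ]: # Optional cross-check: Breusch-Pagan test for heteroscedasticity.
bp_names = ['LM statistic', 'LM p-value', 'F statistic', 'F p-value']
bp_test = sms.het_breuschpagan(residuals, X_test)
lzip(bp_names, bp_test)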

Normality of residuals
In [ ]: p = sns.distplot(residuals,kde=True)
p = plt.title('Normality of error terms/residuals')


In [ ]: # Plotting y_test and y_pred to understand the spread.


fig = plt.figure()
plt.scatter(y_test.values, pred)
fig.suptitle('y_test vs y_pred', fontsize=20) # Plot heading
plt.xlabel('y_test', fontsize=18) # X-label
plt.ylabel('y_pred', fontsize=16) # Y-label

Out[52]: Text(0, 0.5, 'y_pred')

Is this good? Yes: the points follow a clear linear pattern along the diagonal, which indicates that the predictions track the actual values closely.
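
To quantify the agreement visible in the scatter plot, the test-set R² can be computed with r2_score, which was imported earlier but not used; a small addition:

In [ ]: # Test-set R^2 for the reduced model's predictions.
print('R2 Score ', r2_score(y_test.values, pred))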

Bias-Variance Tradeoff

Bias results from overly simplified model assumptions.

Variance arises when the model is too complex and fits noise in the training data.

The preferred model is one with low bias and low variance.
Dimensionality reduction and feature selection can decrease variance by simplifying models.
Similarly, a larger training set tends to decrease variance.
To reduce bias: change the model, ensure the data is truly representative (the training data should be diverse
and cover all relevant groups or outcomes), and tune the parameters.
The bias-variance decomposition forms the conceptual basis for regression regularization methods such as
Lasso and Ridge regression.
Regularization methods introduce bias into the regression solution, which can reduce variance considerably
relative to the ordinary least squares (OLS) solution.
Although the OLS solution provides unbiased regression estimates, the lower-variance solutions produced
by regularization techniques can provide superior MSE performance.
Linear and generalized linear models can be regularized to decrease their variance at the cost of increased
bias, as the sketch after this list illustrates.
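
As referenced above, here is a minimal sketch (not part of the original notebook) of how regularization strength trades bias for variance: as alpha grows, Ridge shrinks the coefficients, adding bias that can reduce variance. It assumes the standardized X_train_std array and y_train defined earlier.

In [ ]: # Sketch: cross-validated RMSE of Ridge regression across regularization strengths.
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

for alpha in [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]:
    scores = cross_val_score(Ridge(alpha=alpha), X_train_std, y_train.values,
                             scoring='neg_root_mean_squared_error', cv=5)
    print(f"alpha={alpha:>7}: CV RMSE = {-scores.mean():.4f}")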

