15BCE0435 - Lab 3

The document analyzes the linear relationships between runs scored and several baseball statistics from a dataset of MLB teams (mlb11.csv). Scatter plots visualize each relationship, and Pearson correlation coefficients quantify its strength. On-base percentage (new_onbase) has the strongest positive correlation with runs (r ≈ 0.92), followed by hits (r ≈ 0.80) and at bats (r ≈ 0.61). Strikeouts have a moderate negative correlation (r ≈ -0.41), and stolen bases show almost no linear relationship to runs scored (r ≈ 0.05).

Uploaded by

rsp varun

In [2]: # DATA VISUALIZATION LAB: LINEAR REGRESSION USING NUMPY

# TANAYA YADAV - 15BCE0461

# LAB SLOT L53+L54

In [2]: import pandas as pd

import numpy as np
import matplotlib.pyplot as plt

In [3]: mlb11 = pd.read_csv('/Users/tanaya/Semester 7/Data Visualization/mlb11.csv')

In [4]: # QUESTION 1

# What type of plot would you use to display the relationship between runs and one of the other numerical variables?
# Plot this relationship using the variable at bats as the predictor. Does the relationship look linear?
# If you knew a team’s at bats, would you be comfortable using a linear model to predict the number of runs?

In [5]: # A scatter plot presents two numerical variables simultaneously, so the relationship between them can be examined with ease.
# Here we look at the linear relationship between runs scored in a season and a number of other statistics.
# If the relationship looks linear, we can quantify its strength with the correlation coefficient.

In [6]: dataframe1=mlb11[['runs','at_bats']]
dataframe1

Out[6]:
    runs  at_bats
0    855     5659
1    875     5710
2    787     5563
3    730     5672
4    762     5532
5    718     5600
6    867     5518
7    721     5447
8    735     5544
9    615     5598
10   708     5585
11   644     5436
12   654     5549
13   735     5612
14   667     5513
15   713     5579
16   654     5502
17   704     5509
18   731     5421
19   743     5559
20   619     5487
21   625     5508
22   610     5421
23   645     5452
24   707     5436
25   641     5528
26   624     5441
27   570     5486
28   593     5417
29   556     5421

In [7]: dataframe1.columns

Out[7]: Index(['runs', 'at_bats'], dtype='object')

In [8]: # Scatter Plot (X= Runs, Y= At Bats)

# The relationship looks moderately linear but not strong enough to be able to comfortably use a linear model to predict the number of runs.

plot1=dataframe1.plot.scatter(x='runs', y='at_bats', c='pink')

In [9]: # Since the relationship looks roughly linear, we can quantify its strength with the correlation coefficient.

dataframe1.corr(method='pearson', min_periods=1)

Out[9]:
             runs   at_bats
runs     1.000000  0.610627
at_bats  0.610627  1.000000
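
As a rough illustration of what the `corr(method='pearson')` call above computes, the sketch below rebuilds the coefficient by hand from the runs and at_bats values in the table. The helper name `pearson_r` is made up for this example and is not part of pandas or NumPy.

```python
import numpy as np

# Values copied from the runs / at_bats table above.
runs = np.array([855, 875, 787, 730, 762, 718, 867, 721, 735, 615,
                 708, 644, 654, 735, 667, 713, 654, 704, 731, 743,
                 619, 625, 610, 645, 707, 641, 624, 570, 593, 556])
at_bats = np.array([5659, 5710, 5563, 5672, 5532, 5600, 5518, 5447, 5544, 5598,
                    5585, 5436, 5549, 5612, 5513, 5579, 5502, 5509, 5421, 5559,
                    5487, 5508, 5421, 5452, 5436, 5528, 5441, 5486, 5417, 5421])


def pearson_r(x, y):
    """Hypothetical helper: co-variation of x and y divided by their spreads."""
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


r_manual = pearson_r(runs, at_bats)
r_numpy = float(np.corrcoef(runs, at_bats)[0, 1])  # NumPy's built-in equivalent
print(round(r_manual, 6), round(r_numpy, 6))
```

Both values should agree with the 0.610627 reported in Out[9], since the coefficient is symmetric in its two arguments.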

In [46]: at_bats = np.array([5659,5710,5563,5672,5532,5600,5518,5447,5544,5598,
                    5585,5436,5549,5612,5513,5579,5502,5509,5421,5559,
                    5487,5508,5421,5452,5436,5528,5441,5486,5417,5421])

# Linear Model

linear_model1= np.polyfit(runs, at_bats,1)

linear_model1

Out[46]: array([5.91333589e-01, 5.11335102e+03])
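
Question 1 asks for at bats as the predictor, whereas the call above regresses at_bats on runs. A minimal sketch of the other orientation follows, assuming the same 30 values. The function `predict_runs` and the input 5600 are illustrative choices, not part of the original notebook.

```python
import numpy as np

runs = np.array([855, 875, 787, 730, 762, 718, 867, 721, 735, 615,
                 708, 644, 654, 735, 667, 713, 654, 704, 731, 743,
                 619, 625, 610, 645, 707, 641, 624, 570, 593, 556])
at_bats = np.array([5659, 5710, 5563, 5672, 5532, 5600, 5518, 5447, 5544, 5598,
                    5585, 5436, 5549, 5612, 5513, 5579, 5502, 5509, 5421, 5559,
                    5487, 5508, 5421, 5452, 5436, 5528, 5441, 5486, 5417, 5421])

# np.polyfit(x, y, 1) regresses y on x, so the predictor goes first.
# It returns the coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(at_bats, runs, 1)


def predict_runs(ab):
    """Hypothetical helper: predicted runs for a given number of at bats."""
    return slope * ab + intercept


print(f"runs ~ {slope:.4f} * at_bats + {intercept:.2f}")
print(f"predicted runs for 5600 at bats: {predict_runs(5600):.1f}")
```

The two regression slopes are linked through the correlation: their product equals r squared, so swapping the arguments changes the line but not the strength of the relationship.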

In [48]: dataframe_atbats=mlb11[['runs', 'at_bats']]

plot_atbats=dataframe_atbats.plot.scatter(x='runs', y='at_bats', c='grey')

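In [52]: # Overlay the linear fit (solid) and a degree-10 polynomial fit (dashed) on the runs vs. at_bats points
p = np.poly1d(linear_model1)
p10 = np.poly1d(np.polyfit(runs, at_bats, 10))
xp = np.linspace(runs.min(), runs.max(), 100)
plot3= plt.plot(runs, at_bats, '.', xp, p(xp), '-', xp, p10(xp), '--')
plt.ylim(5400,5750)
plt.show()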

In [ ]: # The correlation coefficient of X=RUNS; Y=AT_BATS is 0.610627, so the linear relationship is only moderate.

# If a team’s at bats were known, a linear model would give a rough prediction of the number of runs, not a precise one.

In [10]: # QUESTION 2

# Choose another traditional variable from mlb11.csv that you think might be a good predictor of runs.
# Produce a scatterplot of the two variables and fit a linear model.
# At a glance, does there seem to be a linear relationship?

In [11]: runs= np.array([855,875,787,730,762,718,867,721,735,615,
                 708,644,654,735,667,713,654,704,731,743,
                 619,625,610,645,707,641,624,570,593,556])

wins= np.array([96,90,95,71,90,77,97,96,73,56,
                69,82,71,79,86,102,79,80,94,81,
                63,72,72,74,91,89,80,86,71,67])

# Linear Model

linear_model= np.polyfit(runs, wins,1)

linear_model

Out[11]: array([ 0.08315339, 23.29147734])
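
The two numbers in Out[11] are the slope and the intercept, in that order. The sketch below, using the same runs and wins arrays, shows one way to read them and to check how much of the variation in wins the line accounts for. The names `ss_res`, `ss_tot` and `r_squared` are introduced here for illustration.

```python
import numpy as np

runs = np.array([855, 875, 787, 730, 762, 718, 867, 721, 735, 615,
                 708, 644, 654, 735, 667, 713, 654, 704, 731, 743,
                 619, 625, 610, 645, 707, 641, 624, 570, 593, 556])
wins = np.array([96, 90, 95, 71, 90, 77, 97, 96, 73, 56,
                 69, 82, 71, 79, 86, 102, 79, 80, 94, 81,
                 63, 72, 72, 74, 91, 89, 80, 86, 71, 67])

# polyfit returns coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(runs, wins, 1)

# Fitted values and residuals of the straight line.
fitted = np.polyval([slope, intercept], runs)
residuals = wins - fitted

# R^2: share of the variance in wins that the line explains.
ss_res = float((residuals ** 2).sum())
ss_tot = float(((wins - wins.mean()) ** 2).sum())
r_squared = 1 - ss_res / ss_tot

print(f"wins ~ {slope:.5f} * runs + {intercept:.3f}")
print(f"R^2 = {r_squared:.4f}")
```

For a one-predictor line, R^2 is the square of the Pearson correlation, so it should match the square of the 0.600809 reported for runs and wins.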

In [12]: # Taking 'Wins' as the traditional variable

dataframe2=mlb11[['runs', 'wins']]
plot2=dataframe2.plot.scatter(x='runs', y='wins', c='orange')

In [13]: # Correlation Coefficient between Runs and Wins

dataframe2.corr(method='pearson', min_periods=1)

Out[13]:
          runs      wins
runs  1.000000  0.600809
wins  0.600809  1.000000

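In [14]: # Overlay the linear fit (solid) and a degree-10 polynomial fit (dashed) on the runs vs. wins points
p = np.poly1d(linear_model)
p10 = np.poly1d(np.polyfit(runs, wins, 10))
xp = np.linspace(runs.min(), runs.max(), 100)
plot3= plt.plot(runs, wins, '.', xp, p(xp), '-', xp, p10(xp), '--')
plt.ylim(50,110)
plt.show()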
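
The dashed higher-degree curve in the plot above will always hug the points at least as closely as the straight line, which by itself says nothing about whether the relationship is linear. A small sketch of that effect, assuming the same runs and wins values: it uses `numpy.polynomial.Polynomial.fit` instead of `np.polyfit` (a substitution made here because it rescales x internally), and the degrees 1, 3 and 5 are arbitrary choices.

```python
import numpy as np
from numpy.polynomial import Polynomial

runs = np.array([855, 875, 787, 730, 762, 718, 867, 721, 735, 615,
                 708, 644, 654, 735, 667, 713, 654, 704, 731, 743,
                 619, 625, 610, 645, 707, 641, 624, 570, 593, 556])
wins = np.array([96, 90, 95, 71, 90, 77, 97, 96, 73, 56,
                 69, 82, 71, 79, 86, 102, 79, 80, 94, 81,
                 63, 72, 72, 74, 91, 89, 80, 86, 71, 67])


def sse(degree):
    """Sum of squared residuals of a least-squares polynomial of this degree."""
    model = Polynomial.fit(runs, wins, degree)  # x is mapped to [-1, 1] internally
    return float(((wins - model(runs)) ** 2).sum())


sse_by_degree = {d: sse(d) for d in (1, 3, 5)}
for d, value in sse_by_degree.items():
    print(f"degree {d}: SSE = {value:.1f}")
```

Because each lower-degree model is a special case of the higher one, the training error can only stay the same or shrink as the degree grows, so a visual or numerical check on new data is a better guide than the residual alone.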

In [15]: # Yes, the relationship between X= Runs and Y= Wins seems to be LINEAR.

In [16]: # QUESTION 3

# Now that you can summarize the linear relationship between two variables, investigate the relationships between runs and each of the other five traditional variables.
# Which variable best predicts runs?
# Support your conclusion using the graphical and numerical methods we’ve discussed.

In [17]: # Variable 1 - HITS

hits= np.array([1599,1600,1540,1560,1513,1477,1452,1422,1429,1442,
                1434,1395,1423,1438,1394,1409,1387,1380,1357,1384,
                1357,1358,1325,1330,1324,1345,1319,1327,1284,1263])

# Linear Model

linear_model_2= np.polyfit(runs, hits,1)

linear_model_2

Out[17]: array([ 0.84592348, 822.16747161])

In [18]: dataframe3=mlb11[['runs', 'hits']]

plot3=dataframe3.plot.scatter(x='runs', y='hits', c='blue')

In [19]: # Numerical Prediction

# Correlation Coefficient between Runs and Hits

dataframe3.corr(method='pearson', min_periods=1)

Out[19]:
          runs      hits
runs  1.000000  0.801211
hits  0.801211  1.000000

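In [20]: # Overlay the linear fit (solid) and a degree-10 polynomial fit (dashed) on the runs vs. hits points
p = np.poly1d(linear_model_2)
p10 = np.poly1d(np.polyfit(runs, hits, 10))
xp = np.linspace(runs.min(), runs.max(), 100)
plot4= plt.plot(runs, hits, '.', xp, p(xp), '-', xp, p10(xp), '--')
plt.ylim(1250,1620)
plt.show()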

In [ ]: # Yes, the relationship between X= Runs and Y= HITS seems to be LINEAR.

In [21]: # Variable 2 - BAT_AVG

batting_average = np.array([0.283,0.280,0.277,0.275,0.273,0.264,0.263,0.261,0.258,0.258,
                            0.257,0.257,0.256,0.256,0.253,0.253,0.252,0.25,0.25,0.249,
                            0.247,0.247,0.244,0.244,0.244,0.243,0.242,0.242,0.237,0.233])
runs= np.array([855,875,787,730,762,718,867,721,735,615,
                708,644,654,735,667,713,654,704,731,743,
                619,625,610,645,707,641,624,570,593,556])
# Linear Model

linear_model_3= np.polyfit(runs, batting_average,1)

linear_model_3

Out[21]: array([1.25152321e-04, 1.68127684e-01])

In [22]: # Variable 3 - STRIKEOUTS

strikeouts = np.array([930,1108,1143,1006,978,1085,1138,1083,1201,1164,
                       1120,1087,1202,1250,1086,1024,989,1269,1249,1184,
                       1048,1244,1308,1094,1193,1260,1323,1122,1320,1280])

# Linear Model
# Fit strikeouts against runs. The recorded Out[22] repeated Out[21] because the
# call passed batting_average; re-run this cell for the strikeouts coefficients.

linear_model_4= np.polyfit(runs, strikeouts,1)

linear_model_4

In [24]: dataframe4=mlb11[['runs', 'strikeouts']]

plot4=dataframe4.plot.scatter(x='runs', y='strikeouts', c='orange')

In [25]: # Numerical Prediction

# Correlation Coefficient between Runs and Strikeouts

dataframe4.corr(method='pearson', min_periods=1)

Out[25]:
                runs  strikeouts
runs        1.000000   -0.411531
strikeouts -0.411531    1.000000
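
The negative sign in Out[25] carries over directly to the fitted line, because the least-squares slope equals r multiplied by the ratio of the two standard deviations. A short sketch of that identity, assuming the runs and strikeouts values listed in this notebook (the variable names `slope_from_r` and `slope_fit` are illustrative):

```python
import numpy as np

runs = np.array([855, 875, 787, 730, 762, 718, 867, 721, 735, 615,
                 708, 644, 654, 735, 667, 713, 654, 704, 731, 743,
                 619, 625, 610, 645, 707, 641, 624, 570, 593, 556])
strikeouts = np.array([930, 1108, 1143, 1006, 978, 1085, 1138, 1083, 1201, 1164,
                       1120, 1087, 1202, 1250, 1086, 1024, 989, 1269, 1249, 1184,
                       1048, 1244, 1308, 1094, 1193, 1260, 1323, 1122, 1320, 1280])

# Pearson correlation between the two variables.
r = float(np.corrcoef(runs, strikeouts)[0, 1])

# Slope of strikeouts regressed on runs, rebuilt from r and the spreads.
slope_from_r = r * strikeouts.std(ddof=1) / runs.std(ddof=1)

# The same slope obtained directly from a degree-1 least-squares fit.
slope_fit, intercept_fit = np.polyfit(runs, strikeouts, 1)

print(f"r = {r:.6f}")
print(f"slope from r: {slope_from_r:.6f}, slope from polyfit: {slope_fit:.6f}")
```

A moderate negative r therefore means a downward-sloping line with wide scatter around it, not the absence of any linear trend.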

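In [26]: # Overlay the linear fit (solid) and a degree-10 polynomial fit (dashed) on the runs vs. strikeouts points
p = np.poly1d(np.polyfit(runs, strikeouts, 1))
p10 = np.poly1d(np.polyfit(runs, strikeouts, 10))
xp = np.linspace(runs.min(), runs.max(), 100)
plot4= plt.plot(runs, strikeouts, '.', xp, p(xp), '-', xp, p10(xp), '--')
plt.ylim(900,1350)
plt.show()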

In [ ]: # No, the relationship between X= Runs and Y= Strikeouts does not seem to be LINEAR.

In [27]: # Variable 4 - STOLEN BASES

stolen_bases = np.array([143,102,49,153,57,130,147,94,118,118,
                         81,126,69,97,135,96,81,89,133,131,
                         92,95,108,117,155,77,106,85,170,125])

# Linear Model

linear_model_5= np.polyfit(runs, stolen_bases,1)

linear_model_5

Out[27]: array([1.95487456e-02, 9.57409900e+01])

In [28]: dataframe5=mlb11[['runs', 'stolen_bases']]

plot5=dataframe5.plot.scatter(x='runs', y='stolen_bases', c='magenta')

In [29]: # Numerical Prediction

# Correlation Coefficient between Runs and Stolen Bases

dataframe5.corr(method='pearson', min_periods=1)

Out[29]:
                  runs  stolen_bases
runs          1.000000      0.053981
stolen_bases  0.053981      1.000000

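In [30]: # Overlay the linear fit (solid) and a degree-10 polynomial fit (dashed) on the runs vs. stolen_bases points
p = np.poly1d(linear_model_5)
p10 = np.poly1d(np.polyfit(runs, stolen_bases, 10))
xp = np.linspace(runs.min(), runs.max(), 100)
plot5= plt.plot(runs, stolen_bases, '.', xp, p(xp), '-', xp, p10(xp), '--')
plt.ylim(40,180)
plt.show()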

In [ ]: # No, with a correlation coefficient of 0.053981 there is almost no linear relationship between X= Runs and Y= Stolen Bases.

In [32]: # Variable 5 - NEW ON BASE

new_onbase = np.array([0.34,0.349,0.34,0.329,0.341,0.335,0.343,0.325,0.329,0.311,
                       0.316,0.322,0.314,0.326,0.313,0.323,0.319,0.317,0.322,0.317,
                       0.306,0.318,0.309,0.311,0.322,0.308,0.309,0.303,0.305,0.292])

# Linear Model

linear_model_6= np.polyfit(runs, new_onbase,1)

linear_model_6

Out[32]: array([1.50169403e-04, 2.16309169e-01])

In [33]: dataframe6=mlb11[['runs', 'new_onbase']]

plot6=dataframe6.plot.scatter(x='runs', y='new_onbase', c='purple')

In [34]: # Numerical Prediction

# Correlation Coefficient between Runs and New on Base

dataframe6.corr(method='pearson', min_periods=1)

Out[34]:
                runs  new_onbase
runs        1.000000    0.921469
new_onbase  0.921469    1.000000

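In [53]: # Overlay the linear fit (solid) and a degree-10 polynomial fit (dashed) on the runs vs. new_onbase points
p = np.poly1d(linear_model_6)
p10 = np.poly1d(np.polyfit(runs, new_onbase, 10))
xp = np.linspace(runs.min(), runs.max(), 100)
plot3= plt.plot(runs, new_onbase, '.', xp, p(xp), '-', xp, p10(xp), '--')
plt.ylim(0.28,0.36)
plt.show()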
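
Question 3 asks which variable best predicts runs. One way to answer it numerically is to rank the variables by the absolute value of their correlation with runs. The sketch below does this with the arrays listed in this notebook. The dictionary layout and the names `correlations` and `ranking` are choices made for this example only.

```python
import numpy as np

runs = np.array([855, 875, 787, 730, 762, 718, 867, 721, 735, 615,
                 708, 644, 654, 735, 667, 713, 654, 704, 731, 743,
                 619, 625, 610, 645, 707, 641, 624, 570, 593, 556])

# Candidate variables, copied from the cells above.
predictors = {
    "at_bats": [5659, 5710, 5563, 5672, 5532, 5600, 5518, 5447, 5544, 5598,
                5585, 5436, 5549, 5612, 5513, 5579, 5502, 5509, 5421, 5559,
                5487, 5508, 5421, 5452, 5436, 5528, 5441, 5486, 5417, 5421],
    "hits": [1599, 1600, 1540, 1560, 1513, 1477, 1452, 1422, 1429, 1442,
             1434, 1395, 1423, 1438, 1394, 1409, 1387, 1380, 1357, 1384,
             1357, 1358, 1325, 1330, 1324, 1345, 1319, 1327, 1284, 1263],
    "batting_average": [0.283, 0.280, 0.277, 0.275, 0.273, 0.264, 0.263, 0.261,
                        0.258, 0.258, 0.257, 0.257, 0.256, 0.256, 0.253, 0.253,
                        0.252, 0.25, 0.25, 0.249, 0.247, 0.247, 0.244, 0.244,
                        0.244, 0.243, 0.242, 0.242, 0.237, 0.233],
    "strikeouts": [930, 1108, 1143, 1006, 978, 1085, 1138, 1083, 1201, 1164,
                   1120, 1087, 1202, 1250, 1086, 1024, 989, 1269, 1249, 1184,
                   1048, 1244, 1308, 1094, 1193, 1260, 1323, 1122, 1320, 1280],
    "stolen_bases": [143, 102, 49, 153, 57, 130, 147, 94, 118, 118,
                     81, 126, 69, 97, 135, 96, 81, 89, 133, 131,
                     92, 95, 108, 117, 155, 77, 106, 85, 170, 125],
    "new_onbase": [0.34, 0.349, 0.34, 0.329, 0.341, 0.335, 0.343, 0.325, 0.329,
                   0.311, 0.316, 0.322, 0.314, 0.326, 0.313, 0.323, 0.319, 0.317,
                   0.322, 0.317, 0.306, 0.318, 0.309, 0.311, 0.322, 0.308, 0.309,
                   0.303, 0.305, 0.292],
}

# Pearson correlation of each variable with runs.
correlations = {
    name: float(np.corrcoef(np.array(values), runs)[0, 1])
    for name, values in predictors.items()
}

# Strongest linear relationship first, regardless of sign.
ranking = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)

for name in ranking:
    r = correlations[name]
    print(f"{name:16s} r = {r:+.6f}   R^2 = {r ** 2:.4f}")
```

On these values new_onbase comes out on top and stolen_bases last, which matches the correlation tables in Out[34] and Out[29].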
