Assignment CSE-520

This document contains an assignment submission for a statistics for data science course. The assignment addresses identifying qualitative and quantitative variables, appropriate graphical representations for qualitative variables, creating a histogram and commenting on it for a quantitative variable (mpg), and creating and commenting on stem-leaf plots for all numerical variables. Pie charts are used to represent engine type and transmission type variables. A histogram is used to represent the mpg variable, showing it is right-skewed and multi-modal. Stem-leaf plots are created and commented on for mpg and cylinders variables to examine spread, skewness, outliers, and modality.

Uploaded by

Shafat91
Statistics for Data Science

Course Code: CSE-520


Assignment - 01

Submitted to: Dr. Md. Sohel Rana


Associate Professor
Department of Mathematical and Physical Sciences
East West University

Submitted by: Tarik Adnan
ID no: 2019-03-96-003
Department of CSE

Language used: Python 3.8
Date of submission: 20 August, 2020

Q1. Identify the qualitative and quantitative variables.

Answer:

Code Snippet:
import pandas as pd

# Read the CSV file


data = pd.read_csv('F:/File/Car_data.csv')
print(data)

Output:
C:\Python38\python.exe "C:/Users/Tarik Adnan/PycharmProjects/CSE520/read_csv.py"

Car Model mpg cyl disp hp ... qsec vs am gear carb


0 Mazda RX4 21.0 6 160.0 110 ... 16.46 0 1 4 4
1 Mazda RX4 Wag 21.0 6 160.0 110 ... 17.02 0 1 4 4
2 Datsun 710 22.8 4 108.0 93 ... 18.61 1 1 4 1
3 Hornet 4 Drive 21.4 6 258.0 110 ... 19.44 1 0 3 1
4 Hornet Sportabout 18.7 8 360.0 175 ... 17.02 0 0 3 2
5 Valiant 18.1 6 225.0 105 ... 20.22 1 0 3 1
6 Duster 360 14.3 8 360.0 245 ... 15.84 0 0 3 4
7 Merc 240D 24.4 4 146.7 62 ... 20.00 1 0 4 2
8 Merc 230 22.8 4 140.8 95 ... 22.90 1 0 4 2
9 Merc 280 19.2 6 167.6 123 ... 18.30 1 0 4 4
10 Merc 280C 17.8 6 167.6 123 ... 18.90 1 0 4 4
11 Merc 450SE 16.4 8 275.8 180 ... 17.40 0 0 3 3
12 Merc 450SL 17.3 8 275.8 180 ... 17.60 0 0 3 3
13 Merc 450SLC 15.2 8 275.8 180 ... 18.00 0 0 3 3
14 Cadillac Fleetwood 10.4 8 472.0 205 ... 17.98 0 0 3 4
15 Lincoln Continental 10.4 8 460.0 215 ... 17.82 0 0 3 4
16 Chrysler Imperial 14.7 8 440.0 230 ... 17.42 0 0 3 4
17 Fiat 128 32.4 4 78.7 66 ... 19.47 1 1 4 1
18 Honda Civic 30.4 4 75.7 52 ... 18.52 1 1 4 2
19 Toyota Corolla 33.9 4 71.1 65 ... 19.90 1 1 4 1
20 Toyota Corona 21.5 4 120.1 97 ... 20.01 1 0 3 1
21 Dodge Challenger 15.5 8 318.0 150 ... 16.87 0 0 3 2
22 AMC Javelin 15.2 8 304.0 150 ... 17.30 0 0 3 2
23 Camaro Z28 13.3 8 350.0 245 ... 15.41 0 0 3 4
24 Pontiac Firebird 19.2 8 400.0 175 ... 17.05 0 0 3 2
25 Fiat X1-9 27.3 4 79.0 66 ... 18.90 1 1 4 1
26 Porsche 914-2 26.0 4 120.3 91 ... 16.70 0 1 5 2
27 Lotus Europa 30.4 4 95.1 113 ... 16.90 1 1 5 2
28 Ford Pantera L 15.8 8 351.0 264 ... 14.50 0 1 5 4
29 Ferrari Dino 19.7 6 145.0 175 ... 15.50 0 1 5 6
30 Maserati Bora 15.0 8 301.0 335 ... 14.60 0 1 5 8
31 Volvo 142E 21.4 4 121.0 109 ... 18.60 1 1 4 2

[32 rows x 12 columns]


From the data, we can find out the Qualitative and Quantitative variables.
Quantitative variables:
1. mpg (Miles/(US) gallon)
2. cyl (Number of cylinders)
3. disp (Displacement (cu.in.))
4. hp (Gross horsepower)
5. drat (Rear axle ratio)
6. wt (Weight (1000 lbs))
7. qsec (1/4 mile time)
8. gear (Number of forward gears)
9. carb (Number of carburetors)

Qualitative variables:
1. vs (Engine (0 = V-shaped, 1 = straight))
2. am (Transmission (0 = automatic, 1 = manual))
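
The split above can be cross-checked in code. The sketch below is a heuristic and not part of the original submission: it assumes that a column holding only the codes 0 and 1 is a coded category, and it uses four rows copied from the printed table. The helper name `split_by_coding` is hypothetical.

```python
import pandas as pd

# Four rows copied from the printed table (only columns that the output shows).
sample = pd.DataFrame({
    "mpg": [21.0, 22.8, 21.4, 18.7],
    "cyl": [6, 4, 6, 8],
    "vs": [0, 1, 1, 0],
    "am": [1, 1, 0, 0],
    "gear": [4, 4, 3, 3],
})


def split_by_coding(df):
    """Assumption of this sketch: a column with only 0/1 values is a coded category."""
    qualitative = [c for c in df.columns if set(df[c].unique()) <= {0, 1}]
    quantitative = [c for c in df.columns if c not in qualitative]
    return qualitative, quantitative


qualitative, quantitative = split_by_coding(sample)
print("Qualitative:", qualitative)
print("Quantitative:", quantitative)
```

A rule like this only flags candidates; the final decision still depends on what each column means (for example, `cyl` is numeric but takes very few distinct values).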

Q2. Give the appropriate graphical representation for qualitative variables. Comment the
graphs.
Answer:

a. Graphical representation for the variable: “vs (Engine (0 = V-shaped, 1 = straight))”


Code Snippet:
import matplotlib.pyplot as plt
import pandas as pd

fig, ax = plt.subplots(figsize=(10, 6), subplot_kw=dict(aspect="equal"))

# Read the CSV and keep only the 'vs' column
data = pd.read_csv('F:/File/Car_data.csv').vs
# Legend labels, in the order value_counts() returns them (most frequent first)
ingredients = ['V-Shaped', 'Straight']
data.value_counts().plot.pie(autopct='%1.2f%%', startangle=90,
                             counterclock=True)

ax.set_title("A Pie Representation of Engine Type (V-Shaped/Straight) Distribution")
ax.legend(ingredients,
          title="Engine Type: ",
          loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))

plt.xlabel('Feature Identifier: ‘vs’')
plt.ylabel('')
plt.show()
Output:

Comment on the plot:


The pie chart shows that about 56% of the cars in the data set have a V-shaped engine
and about 44% have a straight engine.
b. Graphical representation for the variable: “am (Transmission (0 = automatic, 1 =
manual))”
Code Snippet:
import matplotlib.pyplot as plt
import pandas as pd

fig, ax = plt.subplots(figsize=(7, 4), subplot_kw=dict(aspect="equal"))

# Read the CSV and keep only the 'am' column
data = pd.read_csv('F:/File/Car_data.csv').am
# Legend labels, in the order value_counts() returns them (most frequent first)
ingredients = ['Automatic', 'Manual']
data.value_counts().plot.pie(autopct='%1.1f%%', startangle=90,
                             counterclock=True)

ax.set_title("A Pie Representation of Transmission Type (Automatic/Manual) Distribution")
ax.legend(ingredients,
          title="Transmission Type: ",
          loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))

plt.ylabel('')
plt.xlabel('Feature Identifier: ‘am’')
plt.show()


Output:
Comment on the plot:
The pie chart shows that about 60% of the cars in the data set have an automatic
transmission and about 40% have a manual transmission.
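
The percentages quoted for both pie charts can be checked without plotting. The sketch below is not part of the original submission: it hard-codes the `vs` and `am` columns from the table printed in Q1 and maps each code to its label before counting, so the labels do not depend on the order in which `value_counts()` returns the categories. The helper name `percentages` is hypothetical.

```python
import pandas as pd

# 'vs' and 'am' columns copied row by row from the table printed in Q1.
vs = pd.Series([0, 0, 1, 1, 0, 1, 0, 1,
                1, 1, 1, 0, 0, 0, 0, 0,
                0, 1, 1, 1, 1, 0, 0, 0,
                0, 1, 0, 1, 0, 0, 0, 1])
am = pd.Series([1, 1, 1, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0,
                0, 1, 1, 1, 0, 0, 0, 0,
                0, 1, 1, 1, 1, 1, 1, 1])


def percentages(codes, labels):
    """Return the share of each category in percent, keyed by its label."""
    shares = codes.map(labels).value_counts(normalize=True) * 100
    return shares.to_dict()


engine = percentages(vs, {0: "V-Shaped", 1: "Straight"})
transmission = percentages(am, {0: "Automatic", 1: "Manual"})
print(engine)
print(transmission)
```

With the 32 rows shown in Q1 this corresponds to 18 V-shaped and 14 straight engines (56.25% and 43.75%) and to 19 automatic and 13 manual transmissions (about 59.4% and 40.6%), consistent with the comments above.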
Q3. Give a histogram for the variable ‘mpg (Miles/(US) gallon)’ and comment the plot.
Answer:

Code Snippet:

import pandas as pd
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 4))
# Read the CSV and keep only the 'mpg' column
data = pd.read_csv('F:/File/Car_data.csv', quoting=3)['mpg']
histogram = plt.hist(data, width=2.2, color='#0504aa', alpha=0.7)
plt.xlim([8, 35])
plt.grid(axis='y', alpha=0.75)
plt.ylabel('Frequency')
plt.xlabel('‘mpg (Miles/(US) gallon)’')
plt.title('Normal Distribution Histogram of Feature: ‘mpg (Miles/(US) gallon)’')
plt.show()

Output:
Comment on the plot:
• The histogram shows that the majority of the cars have a mileage between 12 and 24
miles per gallon.
• The histogram is unimodal, with a hump between 15 and 20.
• The histogram is not symmetrical.
• The upper tail is longer than the lower tail, so the distribution is positively skewed.
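
The positive skew read off the histogram can be backed by two quick numeric checks: the mean lying above the median, and a positive sample skewness. The sketch below is an illustration rather than part of the original submission; it hard-codes the `mpg` column from the table printed in Q1 and uses pandas' `Series.skew()`.

```python
import pandas as pd

# 'mpg' column copied row by row from the table printed in Q1.
mpg = pd.Series([21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4,
                 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4,
                 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3,
                 19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4])

mean = mpg.mean()
median = mpg.median()
skewness = mpg.skew()  # sample skewness; a value above 0 suggests a longer upper tail

print(f"mean={mean:.3f}  median={median:.3f}  skewness={skewness:.3f}")
print("right skewed" if mean > median and skewness > 0 else "not clearly right skewed")
```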

Q4. Give stem-leaf-plot for all numerical variables and comment the plots.
Answer:
1. Stem –leaf plot for feature ‘mpg (Miles/(US) gallon)’
Code Snippet:
import pandas as pd
import stemgraphic
import matplotlib.pyplot as plt
col_name = input("Type the feature name to plot stem-leaf: ")
input_scale = float(input("Input scale to plot stem-leaf: "))

data = pd.read_csv('F:/File/Car_data.csv')[col_name]
y = pd.Series(data)
fig, ax = stemgraphic.stem_graphic(data, scale=input_scale)
fig.set_figheight(8)
fig.set_figwidth(6)

plt.title('Stem-Leaf plot of feature:' + " " + col_name)


plt.show()

Input:

Type the feature name to plot stem-leaf: mpg


Input scale to plot stem-leaf: 1
Output:
Comment on the plot:
• Spread
The spread shows how much the data vary. The stem-and-leaf plot above shows
‘mpg (Miles/(US) gallon)’ for the cars. The values range from 10.4 to 33.9.

• Skewed data

When data are skewed, the majority of the data are located on the high or low side
of the graph. Skewness indicates that the data may not be normally
distributed. Often, skewness is easiest to detect with a histogram or a boxplot.

The stem-and-leaf plot above is right skewed.

• Outliers

Outliers, which are data values that are far away from other data values, can strongly
affect the results. Often, outliers are easiest to identify on a boxplot. On a
stem-and-leaf plot, isolated values at the ends identify possible outliers.
The stem-and-leaf plot of feature ‘mpg’ does not indicate any outliers.

• Multi-modal data

Multi-modal data have multiple peaks, also called modes. Multi-modal data often
indicate that important variables are not yet accounted for.
The stem-and-leaf plot above shows 2 peaks.
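
For readers without the `stemgraphic` package, the idea behind the plot can be sketched in plain text. The function below is a simplified illustration, not a reimplementation of `stemgraphic`: its `stem_unit` parameter is hypothetical and does not necessarily match the package's `scale` argument. Leaves are the digit after the stem, truncated, and the `mpg` values are copied from the table in Q1.

```python
import math
from collections import defaultdict

# 'mpg' column copied row by row from the table printed in Q1.
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4,
       22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4,
       14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3,
       19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]


def stem_and_leaf(values, stem_unit):
    """Group values by stem (multiples of stem_unit); each leaf is the next digit."""
    leaf_unit = stem_unit / 10
    rows = defaultdict(list)
    for value in sorted(values):
        # round() removes floating-point noise before the value is truncated
        scaled = math.floor(round(value / leaf_unit, 6))
        rows[scaled // 10].append(scaled % 10)
    return dict(rows)


rows = stem_and_leaf(mpg, stem_unit=10)
for stem in sorted(rows):
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in rows[stem])}")
```

With a stem unit of 10, most leaves fall on the lowest stem and only a few on the highest, which is the pattern a right-skewed variable produces.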

2. Stem –leaf plot for feature ‘cyl (Number of cylinders)’


Code Snippet:
import pandas as pd
import stemgraphic
import matplotlib.pyplot as plt
col_name = input("Type the feature name to plot stem-leaf: ")
input_scale = float(input("Input scale to plot stem-leaf: "))

data = pd.read_csv('F:/File/Car_data.csv')[col_name]
y = pd.Series(data)
fig, ax = stemgraphic.stem_graphic(data, scale=input_scale)
fig.set_figheight(8)
fig.set_figwidth(6)

plt.title('Stem-Leaf plot of feature:' + " " + col_name)


plt.show()
Input:

Type the feature name to plot stem-leaf: cyl


Input scale to plot stem-leaf: 1

Output:
Comment on the plot:
• Spread
The spread shows how much the data vary. The stem-and-leaf plot above shows
‘cyl (Number of cylinders)’ for the cars. The values range from 4 to 8.

• Skewed data

When data are skewed, the majority of the data are located on the high or low side
of the graph. Skewness indicates that the data may not be normally
distributed. Often, skewness is easiest to detect with a histogram or a boxplot.

The stem-and-leaf plot above is not skewed.

• Outliers

Outliers, which are data values that are far away from other data values, can strongly
affect the results. Often, outliers are easiest to identify on a boxplot. On a
stem-and-leaf plot, isolated values at the ends identify possible outliers.
The stem-and-leaf plot of feature ‘cyl’ does not indicate any outliers.

• Multi-modal data

Multi-modal data have multiple peaks, also called modes. Multi-modal data often
indicate that important variables are not yet accounted for.
The stem-and-leaf plot above shows 3 peaks.
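
Because ‘cyl’ takes only three distinct values, a frequency table carries the same information as the stem-and-leaf plot. The sketch below is an addition for illustration; it hard-codes the `cyl` column from the table printed in Q1 and prints one row of stars per value.

```python
import pandas as pd

# 'cyl' column copied row by row from the table printed in Q1.
cyl = pd.Series([6, 6, 4, 6, 8, 6, 8, 4,
                 4, 6, 6, 8, 8, 8, 8, 8,
                 8, 4, 4, 4, 4, 8, 8, 8,
                 8, 4, 4, 4, 8, 6, 8, 4])

# Count how often each cylinder number occurs, ordered by value.
counts = cyl.value_counts().sort_index()
for value, count in counts.items():
    print(f"{value} cylinders | {'*' * count} ({count})")
```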

3. Stem – leaf plot for feature ‘disp (Displacement (cu.in.))’


Code Snippet:
import pandas as pd
import stemgraphic
import matplotlib.pyplot as plt
col_name = input("Type the feature name to plot stem-leaf: ")
input_scale = float(input("Input scale to plot stem-leaf: "))

data = pd.read_csv('F:/File/Car_data.csv')[col_name]
y = pd.Series(data)
fig, ax = stemgraphic.stem_graphic(data, scale=input_scale)
fig.set_figheight(8)
fig.set_figwidth(6)

plt.title('Stem-Leaf plot of feature:' + " " + col_name)


plt.show()
Input:

Type the feature name to plot stem-leaf: disp


Input scale to plot stem-leaf: 10

Output:
Comment on the plot:
• Spread
The spread shows how much the data vary. The stem-and-leaf plot above shows
‘disp (Displacement (cu.in.))’ for the cars. The values range from 71.1 to 472.0.

• Skewed data

When data are skewed, the majority of the data are located on the high or low side
of the graph. Skewness indicates that the data may not be normally
distributed. Often, skewness is easiest to detect with a histogram or a boxplot.

The stem-and-leaf plot above is right skewed.

• Outliers

Outliers, which are data values that are far away from other data values, can strongly
affect the results. Often, outliers are easiest to identify on a boxplot. On a
stem-and-leaf plot, isolated values at the ends identify possible outliers.
The stem-and-leaf plot of feature ‘disp’ does not indicate any outliers.

• Multi-modal data

Multi-modal data have multiple peaks, also called modes. Multi-modal data often
indicate that important variables are not yet accounted for.
The stem-and-leaf plot above shows 2 peaks.

4. Stem – leaf plot for feature ‘hp (Gross horsepower)’


Code Snippet:
import pandas as pd
import stemgraphic
import matplotlib.pyplot as plt
col_name = input("Type the feature name to plot stem-leaf: ")
input_scale = float(input("Input scale to plot stem-leaf: "))

data = pd.read_csv('F:/File/Car_data.csv')[col_name]
y = pd.Series(data)
fig, ax = stemgraphic.stem_graphic(data, scale=input_scale)
fig.set_figheight(8)
fig.set_figwidth(6)

plt.title('Stem-Leaf plot of feature:' + " " + col_name)


plt.show()
Input:

Type the feature name to plot stem-leaf: hp


Input scale to plot stem-leaf: 10

Output:
Comment on the plot:
• Spread
The spread shows how much the data vary. The stem-and-leaf plot above shows
‘hp (Gross horsepower)’ for the cars. The values range from 52 to 335.

• Skewed data

When data are skewed, the majority of the data are located on the high or low side
of the graph. Skewness indicates that the data may not be normally
distributed. Often, skewness is easiest to detect with a histogram or a boxplot.

The stem-and-leaf plot above is right skewed.

• Outliers

Outliers, which are data values that are far away from other data values, can strongly
affect the results. Often, outliers are easiest to identify on a boxplot. On a
stem-and-leaf plot, isolated values at the ends identify possible outliers.
The stem-and-leaf plot of feature ‘hp’ indicates one outlier, 335.

• Multi-modal data

Multi-modal data have multiple peaks, also called modes. Multi-modal data often
indicate that important variables are not yet accounted for.
The stem-and-leaf plot above shows 4 peaks.
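
An outlier spotted by eye on a stem-and-leaf plot can be confirmed with the 1.5 × IQR rule that boxplots use by default. The sketch below is an illustration, not the author's method: the helper `iqr_outliers` is hypothetical, the `hp` values are copied from the table in Q1, and quartiles come from pandas' default (linear) `quantile`.

```python
import pandas as pd

# 'hp' column copied row by row from the table printed in Q1.
hp = pd.Series([110, 110, 93, 110, 175, 105, 245, 62,
                95, 123, 123, 180, 180, 180, 205, 215,
                230, 66, 52, 65, 97, 150, 150, 245,
                175, 66, 91, 113, 264, 175, 335, 109])


def iqr_outliers(series, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] together with the two fences."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = series[(series < lower) | (series > upper)].tolist()
    return flagged, (lower, upper)


outliers, (lower, upper) = iqr_outliers(hp)
print(f"fences: [{lower}, {upper}]  outliers: {outliers}")
```

For ‘hp’ the rule flags only the value 335, in line with the comment above. The same helper can be applied to any other column whose values are available.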

5. Stem – leaf plot for feature ‘drat (Rear axle ratio)’


Code Snippet:
import pandas as pd
import stemgraphic
import matplotlib.pyplot as plt
col_name = input("Type the feature name to plot stem-leaf: ")
input_scale = float(input("Input scale to plot stem-leaf: "))

data = pd.read_csv('F:/File/Car_data.csv')[col_name]
y = pd.Series(data)
fig, ax = stemgraphic.stem_graphic(data, scale=input_scale)
fig.set_figheight(8)
fig.set_figwidth(6)

plt.title('Stem-Leaf plot of feature:' + " " + col_name)


plt.show()
Input:

Type the feature name to plot stem-leaf: drat


Input scale to plot stem-leaf: .1

Output:
Comment on the plot:
• Spread
The spread shows how much the data vary. The stem-and-leaf plot above shows
‘drat (Rear axle ratio)’ for the cars. The values range from 2.76 to 4.93.

• Skewed data

When data are skewed, the majority of the data are located on the high or low side
of the graph. Skewness indicates that the data may not be normally
distributed. Often, skewness is easiest to detect with a histogram or a boxplot.

The stem-and-leaf plot above is right skewed.

• Outliers

Outliers, which are data values that are far away from other data values, can strongly
affect the results. Often, outliers are easiest to identify on a boxplot. On a
stem-and-leaf plot, isolated values at the ends identify possible outliers.
The stem-and-leaf plot of feature ‘drat’ indicates one outlier, 4.93.

• Multi-modal data

Multi-modal data have multiple peaks, also called modes. Multi-modal data often
indicate that important variables are not yet accounted for.
The stem-and-leaf plot above shows 2 peaks.

6. Stem – leaf plot for feature ‘wt (Weight (1000 lbs))’


Code Snippet:
import pandas as pd
import stemgraphic
import matplotlib.pyplot as plt
col_name = input("Type the feature name to plot stem-leaf: ")
input_scale = float(input("Input scale to plot stem-leaf: "))

data = pd.read_csv('F:/File/Car_data.csv')[col_name]
y = pd.Series(data)
fig, ax = stemgraphic.stem_graphic(data, scale=input_scale)
fig.set_figheight(8)
fig.set_figwidth(6)

plt.title('Stem-Leaf plot of feature:' + " " + col_name)


plt.show()
Input:

Type the feature name to plot stem-leaf: wt


Input scale to plot stem-leaf: 1

Output:

Comment on the plot:
• Spread
The spread shows how much the data vary. The stem-and-leaf plot above shows
‘wt (Weight (1000 lbs))’ for the cars. The values range from 1.513 to 5.424.

• Skewed data

When data are skewed, the majority of the data are located on the high or low side
of the graph. Skewness indicates that the data may not be normally
distributed. Often, skewness is easiest to detect with a histogram or a boxplot.

The stem-and-leaf plot above is right skewed.

• Outliers

Outliers, which are data values that are far away from other data values, can strongly
affect the results. Often, outliers are easiest to identify on a boxplot. On a
stem-and-leaf plot, isolated values at the ends identify possible outliers.
The stem-and-leaf plot of feature ‘wt’ does not indicate any outliers.

• Multi-modal data

Multi-modal data have multiple peaks, also called modes. Multi-modal data often
indicate that important variables are not yet accounted for.
The stem-and-leaf plot above shows 1 peak.

7. Stem – leaf plot for feature ‘qsec (1/4 mile time)’


Code Snippet:
import pandas as pd
import stemgraphic
import matplotlib.pyplot as plt
col_name = input("Type the feature name to plot stem-leaf: ")
input_scale = float(input("Input scale to plot stem-leaf: "))

data = pd.read_csv('F:/File/Car_data.csv')[col_name]
y = pd.Series(data)
fig, ax = stemgraphic.stem_graphic(data, scale=input_scale)
fig.set_figheight(8)
fig.set_figwidth(6)

plt.title('Stem-Leaf plot of feature:' + " " + col_name)


plt.show()

Input:

Type the feature name to plot stem-leaf: qsec


Input scale to plot stem-leaf: 1
Output:

Comment on the plot:
• Spread
The spread shows how much the data vary. The stem-and-leaf plot above shows
‘qsec (1/4 mile time)’ for the cars. The values range from 14.5 to 22.9.

• Skewed data

When data are skewed, the majority of the data are located on the high or low side
of the graph. Skewness indicates that the data may not be normally
distributed. Often, skewness is easiest to detect with a histogram or a boxplot.

The stem-and-leaf plot above is symmetrical.

• Outliers

Outliers, which are data values that are far away from other data values, can strongly
affect the results. Often, outliers are easiest to identify on a boxplot. On a
stem-and-leaf plot, isolated values at the ends identify possible outliers.
The stem-and-leaf plot of feature ‘qsec’ indicates one outlier, 22.9.

• Multi-modal data

Multi-modal data have multiple peaks, also called modes. Multi-modal data often
indicate that important variables are not yet accounted for.
The stem-and-leaf plot above shows 1 peak.

8. Stem – leaf plot for feature ‘gear (Number of forward gears) ’


Code Snippet:
import pandas as pd
import stemgraphic
import matplotlib.pyplot as plt
col_name = input("Type the feature name to plot stem-leaf: ")
input_scale = float(input("Input scale to plot stem-leaf: "))

data = pd.read_csv('F:/File/Car_data.csv')[col_name]
y = pd.Series(data)
fig, ax = stemgraphic.stem_graphic(data, scale=input_scale)
fig.set_figheight(8)
fig.set_figwidth(6)

plt.title('Stem-Leaf plot of feature:' + " " + col_name)


plt.show()

Input:

Type the feature name to plot stem-leaf: gear


Input scale to plot stem-leaf: 1
Output:

Comment on the plot:
• Spread
The spread shows how much the data vary. The stem-and-leaf plot above shows
‘gear (Number of forward gears)’ for the cars. The values range from 3 to 5.

• Skewed data

When data are skewed, the majority of the data are located on the high or low side
of the graph. Skewness indicates that the data may not be normally
distributed. Often, skewness is easiest to detect with a histogram or a boxplot.

The stem-and-leaf plot above is symmetrical.

• Outliers

Outliers, which are data values that are far away from other data values, can strongly
affect the results. Often, outliers are easiest to identify on a boxplot. On a
stem-and-leaf plot, isolated values at the ends identify possible outliers.
The stem-and-leaf plot of feature ‘gear’ does not indicate any outliers.

• Multi-modal data

Multi-modal data have multiple peaks, also called modes. Multi-modal data often
indicate that important variables are not yet accounted for.
The stem-and-leaf plot above shows 1 peak.

9. Stem – leaf plot for feature ‘carb (Number of carburetors)’

Code Snippet:
import pandas as pd
import stemgraphic
import matplotlib.pyplot as plt
col_name = input("Type the feature name to plot stem-leaf: ")
input_scale = float(input("Input scale to plot stem-leaf: "))

data = pd.read_csv('F:/File/Car_data.csv')[col_name]
y = pd.Series(data)
fig, ax = stemgraphic.stem_graphic(data, scale=input_scale)
fig.set_figheight(8)
fig.set_figwidth(6)

plt.title('Stem-Leaf plot of feature:' + " " + col_name)


plt.show()

Input:

Type the feature name to plot stem-leaf: carb


Input scale to plot stem-leaf: 1
Output:

Comment on the plot:
• Spread
The spread shows how much the data vary. The stem-and-leaf plot above shows
‘carb (Number of carburetors)’ for the cars. The values range from 1 to 8.

• Skewed data

When data are skewed, the majority of the data are located on the high or low side
of the graph. Skewness indicates that the data may not be normally
distributed. Often, skewness is easiest to detect with a histogram or a boxplot.

The stem-and-leaf plot above is right skewed.

• Outliers

Outliers, which are data values that are far away from other data values, can strongly
affect the results. Often, outliers are easiest to identify on a boxplot. On a
stem-and-leaf plot, isolated values at the ends identify possible outliers.
The stem-and-leaf plot of feature ‘carb’ indicates one outlier, 8.

• Multi-modal data

Multi-modal data have multiple peaks, also called modes. Multi-modal data often
indicate that important variables are not yet accounted for.
The stem-and-leaf plot above shows 2 peaks.
Q5. Find the boxplot for the variable ‘wt(Weight (1000 lbs))’. Do you identify outliers from
this data? If so, how will you interpret those car models?
Answer:
Code Snippet:
import pandas
import matplotlib.pyplot as plt
import glob

# Collect every CSV file in the data folder
files = glob.glob('F:/File/*.csv')
fig = plt.figure(figsize=(10, 4))

for file in files:
    df = pandas.read_csv(file)
    plt.title('Box-Whiskers Plot For The Feature: ‘wt(Weight (1000 lbs))’')
    plt.xlim([1, 6])
    plt.xlabel('Weight (1000 lbs)')
    plt.ylabel('Feature Identifier')
    # Horizontal boxplot of the 'wt' column
    box = df.boxplot(column=['wt'], vert=False, color='b')
    # Summary statistics (count, mean, std, quartiles) for 'wt'
    output = df.wt.describe()
    print(output)
plt.show()

Output:
Output of describe() function:
C:\Python38\python.exe "C:/Users/Tarik Adnan/PycharmProjects/CSE520/Box_Whiskers_Plot_WT.py"

count 32.000000
mean 3.217250
std 0.978457
min 1.513000
25% 2.581250
50% 3.325000
75% 3.610000
max 5.424000
Name: wt, dtype: float64

Comment on the plot:

• The box-and-whisker plot shows 3 outliers. The car models that fall
into the outlier range are listed below:
a. Cadillac Fleetwood (Weight (1000 lbs): 5.250)
b. Lincoln Continental (Weight (1000 lbs): 5.424)
c. Chrysler Imperial (Weight (1000 lbs): 5.345)
• 75% of the cars weigh 3.610 (1000 lbs) or less; this is the third quartile.
• The mean of the variable ‘wt’ is 3.217250.
• The cars labeled as outliers are much heavier than the rest. Each lies above the
upper fence, Q3 + 1.5 × IQR = 3.610 + 1.5 × (3.610 − 2.58125) ≈ 5.153. The value of
about 4.08 read from the plot is where the upper whisker ends, not the fence itself.
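
The fences behind the boxplot can be recomputed directly from the quartiles in the `describe()` output. The sketch below assumes the conventional 1.5 × IQR rule (matplotlib's default whisker setting) and uses only numbers that appear above: the two quartiles and the three weights listed as outliers.

```python
# Quartiles of 'wt' taken from the describe() output above.
q1, q3 = 2.58125, 3.61

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Weights (1000 lbs) of the three cars listed as outliers above.
heavy_cars = {
    "Cadillac Fleetwood": 5.250,
    "Lincoln Continental": 5.424,
    "Chrysler Imperial": 5.345,
}
flagged = [name for name, weight in heavy_cars.items() if weight > upper_fence]

print(f"IQR = {iqr:.5f}")
print(f"fences = [{lower_fence:.4f}, {upper_fence:.4f}]")
print("above the upper fence:", flagged)
```

All three listed cars exceed the upper fence of about 5.153, and the minimum weight 1.513 stays above the lower fence, so the rule flags no cars at the light end.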
Q6. Which car models have less ‘mpg’ than the first quarter in terms of less ‘mpg’? Also, find
car models that have more ‘mpg’ than the third quarter.
Answer:
Code Snippet:
import pandas
import matplotlib.pyplot as plt
import glob

# Collect every CSV file in the data folder
files = glob.glob('F:/File/*.csv')
fig = plt.figure(figsize=(6, 4))

for file in files:
    df = pandas.read_csv(file)
    plt.title('Box-Whiskers Plot For The Feature: ‘mpg (Miles/(US) gallon)’')
    plt.xlim([0, 40])
    plt.xlabel('Miles/(US) gallon')
    plt.ylabel('Feature Identifier')
    # Horizontal boxplot of the 'mpg' column
    box = df.boxplot(column=['mpg'], vert=False, color='b')
    # Summary statistics (count, mean, std, quartiles) for 'mpg'
    output = df.mpg.describe()
    print(output)
plt.show()

Output:
Output of describe() function:
C:\Python38\python.exe "C:\Users\Tarik Adnan\PycharmProjects\CSE520\Box_Whiskers_Plot_MPG.py"

count 32.000000
mean 20.090625
std 6.026948
min 10.400000
25% 15.425000
50% 19.200000
75% 22.800000
max 33.900000
Name: mpg, dtype: float64

Comment on the plot:

• The box-and-whisker plot shows 1 outlier. The car model that falls
into the outlier range is listed below:
a. Toyota Corolla (Miles/(US) gallon: 33.9)
• The first quartile (25%) of ‘mpg’ is 15.425. A total of 8 car models have
less ‘mpg’ than the first quartile. Those are:
a. Duster 360 (Miles/(US) gallon: 14.3)
b. Merc 450SLC (Miles/(US) gallon: 15.2)
c. Cadillac Fleetwood (Miles/(US) gallon: 10.4)
d. Lincoln Continental (Miles/(US) gallon: 10.4)
e. Chrysler Imperial (Miles/(US) gallon: 14.7)
f. AMC Javelin (Miles/(US) gallon: 15.2)
g. Camaro Z28 (Miles/(US) gallon: 13.3)
h. Maserati Bora (Miles/(US) gallon: 15.0)
• The third quartile (75%) of ‘mpg’ is 22.80. A total of 7 car models have
more ‘mpg’ than the third quartile. Those are:
a. Merc 240D (Miles/(US) gallon: 24.4)
b. Fiat 128 (Miles/(US) gallon: 32.4)
c. Honda Civic (Miles/(US) gallon: 30.4)
d. Toyota Corolla (Miles/(US) gallon: 33.9)
e. Fiat X1-9 (Miles/(US) gallon: 27.3)
f. Porsche 914-2 (Miles/(US) gallon: 26.0)
g. Lotus Europa (Miles/(US) gallon: 30.4)
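
The two lists above can be produced programmatically instead of by inspecting the table. The sketch below is an illustration, not the code used in the submission: it hard-codes the car names and `mpg` values from the table printed in Q1 and filters with pandas' default `quantile`. Strict comparisons are used, so cars that sit exactly on a quartile (Datsun 710 and Merc 230 at 22.8) are excluded.

```python
import pandas as pd

# Car model -> mpg, copied from the table printed in Q1.
mpg = pd.Series({
    "Mazda RX4": 21.0, "Mazda RX4 Wag": 21.0, "Datsun 710": 22.8,
    "Hornet 4 Drive": 21.4, "Hornet Sportabout": 18.7, "Valiant": 18.1,
    "Duster 360": 14.3, "Merc 240D": 24.4, "Merc 230": 22.8,
    "Merc 280": 19.2, "Merc 280C": 17.8, "Merc 450SE": 16.4,
    "Merc 450SL": 17.3, "Merc 450SLC": 15.2, "Cadillac Fleetwood": 10.4,
    "Lincoln Continental": 10.4, "Chrysler Imperial": 14.7, "Fiat 128": 32.4,
    "Honda Civic": 30.4, "Toyota Corolla": 33.9, "Toyota Corona": 21.5,
    "Dodge Challenger": 15.5, "AMC Javelin": 15.2, "Camaro Z28": 13.3,
    "Pontiac Firebird": 19.2, "Fiat X1-9": 27.3, "Porsche 914-2": 26.0,
    "Lotus Europa": 30.4, "Ford Pantera L": 15.8, "Ferrari Dino": 19.7,
    "Maserati Bora": 15.0, "Volvo 142E": 21.4,
})

q1, q3 = mpg.quantile(0.25), mpg.quantile(0.75)

# Strictly below the first quartile / strictly above the third quartile.
below_q1 = mpg[mpg < q1].index.tolist()
above_q3 = mpg[mpg > q3].index.tolist()

print(f"Q1 = {q1:.3f}: {len(below_q1)} models below ->", below_q1)
print(f"Q3 = {q3:.3f}: {len(above_q3)} models above ->", above_q3)
```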
