Assignment 1

The document summarizes the steps taken to clean, analyze, and visualize a vehicle dataset. It merges three datasets, identifies and handles missing or spurious values, draws pairwise plots to examine relationships between variables, computes correlations, identifies outliers in the Eligible attribute, and reports summary statistics for the dataset.

14/09/2023, 19:38 Assignment_1 (1) - Jupyter Notebook

In [2]:

In [3]:

In [4]:

Out[4]:

   Vehicle      RAM   storage  Trustfactor  Distance  TransmissionRate   Eligible
0        1  51.8125  34.83125           91      88.0         77.777778  68.684306
1        2  91.0625  18.09375           87      24.9         70.370370  58.285324
2        3  84.2875  92.60000           15      30.4         40.740741  52.605648
3        4  45.1750  76.64375           25      84.9         66.666667  59.677083
4        5  63.8625  19.33125           66      78.0         85.185185  62.475787

In [5]:

Out[5]:

   Vehicle      RAM   storage  Trustfactor  Distance  TransmissionRate   Eligible
0        1  46.2125  37.38125           23      94.0         48.148148  49.748380
1        2  59.2125  44.24375           81      32.7         70.370370  57.505324
2        3  87.1750  83.56250           94      42.4         48.148148  71.057130
3        4  19.9000  17.38125           77      17.2         44.444444  35.185139
4        5  77.6500  45.91250           64      92.8         14.814815  59.035463



In [6]:

Out[6]:

   Vehicle      RAM   storage  Trustfactor  Distance  TransmissionRate   Eligible
0        1  59.4875  45.61875           12      25.8         77.777778  44.136806
1        2  77.6625   7.56875           58      14.8         48.148148  41.235880
2        3  50.0875  38.43750           18      14.9         55.555556  35.396111
3        4  32.9250  42.66875           29       9.7         18.518519  26.562454
4        5  72.2625  48.36875           54      12.8         33.333333  44.152917

##Q1 Merge 3 datasets

In [8]:

Out[8]:

     Vehicle      RAM   storage  Trustfactor  Distance  TransmissionRate   Eligible
0          1  51.8125  34.83125           91      88.0         77.777778  68.684306
1          2  91.0625  18.09375           87      24.9         70.370370  58.285324
2          3  84.2875  92.60000           15      30.4         40.740741  52.605648
3          4  45.1750  76.64375           25      84.9         66.666667  59.677083
4          5  63.8625  19.33125           66      78.0         85.185185  62.475787
...      ...      ...       ...          ...       ...               ...        ...
245      246  19.3875  46.46250           15      90.7         81.481481  50.606296
246      247  96.5375  19.13750           28      57.7         18.518519  43.978704
247      248  18.9375   9.33750           93      89.8         33.333333  48.881667
248      249  88.7750  95.56250           90      85.2         22.222222  76.351944
249      250  69.8250  51.82500           24      69.6         85.185185  60.087037

23550 rows × 7 columns
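The code for the merge cell did not survive in this export, so the following is only a minimal sketch of one common way to stack three frames with the same columns, using `pd.concat`. The frames `df1`, `df2`, `df3` and all values in them are made up for illustration; only the column names come from the output above.

```python
import pandas as pd

# Tiny made-up frames standing in for the three vehicle datasets;
# the column names follow the notebook output, the values are invented.
columns = ["Vehicle", "RAM", "storage", "Trustfactor",
           "Distance", "TransmissionRate", "Eligible"]
df1 = pd.DataFrame([[1, 51.8, 34.8, 91, 88.0, 77.8, 68.7],
                    [2, 91.1, 18.1, 87, 24.9, 70.4, 58.3]], columns=columns)
df2 = pd.DataFrame([[1, 46.2, 37.4, 23, 94.0, 48.1, 49.7]], columns=columns)
df3 = pd.DataFrame([[1, 59.5, 45.6, 12, 25.8, 77.8, 44.1]], columns=columns)

# Stack the frames row-wise. ignore_index=True renumbers the rows 0..n-1;
# without it each frame keeps its own index, so index labels can repeat.
merged = pd.concat([df1, df2, df3], ignore_index=True)
print(merged.shape)
```

Whether to pass `ignore_index=True` is a design choice: a fresh index makes later label-based selection with `.loc` unambiguous, while keeping the original indices preserves each row's position in its source file.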

##Q2 Find missing and spurious values and impute them using proper methods

In [9]:

Out[9]:

(23550, 7)

In [11]:

Out[11]:

Vehicle 0
RAM 0
storage 0
Trustfactor 0
Distance 0
TransmissionRate 0
Eligible 0
dtype: int64

In [25]:

Out[25]:

['RAM', 'storage', 'Trustfactor', 'Distance', 'TransmissionRate', 'Eligible']
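The cells that produced these outputs are not shown, so here is a hedged sketch of how missing and spurious values might be checked and imputed. The per-column count mirrors the kind of output in Out[11]. The valid range of 0 to 100 and the choice of median imputation are assumptions made for this example, not something the notebook confirms, and the sample values are invented.

```python
import numpy as np
import pandas as pd

# Invented sample: one missing RAM value and one out-of-range Distance.
df = pd.DataFrame({
    "RAM": [51.8, np.nan, 84.3, 45.2],
    "Distance": [88.0, 24.9, -5.0, 84.9],
})
numeric_cols = ["RAM", "Distance"]

# Count missing values per column.
print(df[numeric_cols].isna().sum())

# Assumption: valid values lie in [0, 100]; anything else is treated as spurious.
in_range = df[numeric_cols].apply(lambda s: s.between(0, 100))
cleaned = df[numeric_cols].where(in_range)  # NaN where out of range or missing

# Impute every NaN with the median of its column.
imputed = cleaned.fillna(cleaned.median())
print(imputed)
```

The median is used here because it is less sensitive to extreme values than the mean; with no missing values reported in Out[11], the imputation step would leave the real data unchanged unless spurious entries are found.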

In [41]:

##Q3 Pairwise plot



In [42]:

Out[42]:

<seaborn.axisgrid.PairGrid at 0x793d025ec250>

<Figure size 1200x1200 with 0 Axes>

##Q4 Correlation

In [45]:

df.corr()

Out[45]:

                    Vehicle        RAM    storage  Trustfactor   Distance  TransmissionRate  El
Vehicle            1.000000  -0.004802  -0.008946    -0.003112   0.000961          0.001025  -0.00
RAM               -0.004802   1.000000   0.003719     0.005694  -0.008105          0.002096  0.42
storage           -0.008946   0.003719   1.000000     0.002591   0.001866          0.006379  0.45
Trustfactor       -0.003112   0.005694   0.002591     1.000000  -0.008402          0.012408  0.44
Distance           0.000961  -0.008105   0.001866    -0.008402   1.000000          0.004374  0.47
TransmissionRate   0.001025   0.002096   0.006379     0.012408   0.004374          1.000000  0.45
Eligible          -0.006483   0.422447   0.454903     0.441740   0.470783          0.454520  1.00
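In the Eligible row, every feature except Vehicle correlates with Eligible at roughly 0.42 to 0.47, while the features are almost uncorrelated with one another. One way to read that pattern off a correlation matrix is to pull out the target column and sort it. The sketch below does this on synthetic data; defining the target as the mean of the features is an assumption made only so the example has something to rank.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
features = pd.DataFrame(rng.uniform(0, 100, size=(200, 3)),
                        columns=["RAM", "storage", "Distance"])
# Assumption for illustration only: the target is the mean of the features.
df = features.assign(Eligible=features.mean(axis=1))

# Pearson correlation between every pair of columns.
corr = df.corr()

# Correlation of each feature with the target, strongest first.
ranked = corr["Eligible"].drop("Eligible").sort_values(ascending=False)
print(ranked)
```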



In [47]:

Out[47]:

<Axes: >
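The `<Axes: >` output is consistent with a heatmap of the correlation matrix, although the cell's code is not shown. A minimal sketch, assuming `seaborn.heatmap` was used and running on random stand-in data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in data; only the column names come from the notebook.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.uniform(0, 100, size=(100, 3)),
                  columns=["RAM", "storage", "Eligible"])

# Annotate each cell with its coefficient and fix the color scale to [-1, 1]
# so that zero correlation always maps to the same color.
ax = sns.heatmap(df.corr(), annot=True, fmt=".2f",
                 cmap="coolwarm", vmin=-1, vmax=1)
```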

##Q5 Find outliers



In [49]:

#Eligible attribute has outliers
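The method used to find these outliers is not visible in the export. A common choice is the 1.5 × IQR rule, the same rule a box plot uses for its whiskers. The sketch below applies it to an invented `Eligible` series; both the rule and the values are assumptions for illustration.

```python
import pandas as pd

# Invented values with one obvious extreme point.
eligible = pd.Series([50.0, 52.0, 55.0, 58.0, 60.0, 61.0, 63.0, 120.0],
                     name="Eligible")

# Interquartile range and the usual 1.5 * IQR fences.
q1 = eligible.quantile(0.25)
q3 = eligible.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Anything outside the fences is flagged as an outlier.
outliers = eligible[(eligible < lower) | (eligible > upper)]
print(outliers)
```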

##Q6 Summary of statistics



In [50]:

Out[50]:

            Vehicle           RAM       storage   Trustfactor      Distance  TransmissionRate
count  23550.000000  23550.000000  23550.000000  23550.000000  23550.000000      23550.000000
mean    7875.500000     56.263255     53.227972     55.233291     50.323745         55.462924
std     5649.902112     25.294350     26.932319     26.222575     28.529604         26.619421
min        1.000000     12.500000      6.250000     10.000000      1.500000         11.111111
25%     2819.250000     34.465625     30.100000     33.000000     25.600000         33.333333
50%     6925.500000     56.243750     53.121875     55.000000     50.100000         55.555556
75%    12812.750000     78.246875     76.585938     78.000000     75.100000         77.777778
max    18700.000000    100.000000    100.000000    100.000000    100.000000        100.000000
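A table of this shape is what `DataFrame.describe()` returns. The sketch below reproduces the idea on a tiny invented frame. Dropping `Vehicle` first is a suggestion rather than something the notebook does: the column looks like an identifier, so its mean and standard deviation carry little meaning.

```python
import pandas as pd

# Invented values; only the column names come from the notebook.
df = pd.DataFrame({
    "Vehicle": [1, 2, 3],
    "RAM": [12.5, 56.0, 100.0],
    "storage": [6.25, 53.0, 100.0],
})

# count, mean, std, min, quartiles and max for each remaining numeric column.
summary = df.drop(columns="Vehicle").describe()
print(summary)
```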
