Lab Assignment 6

This notebook loads an automobile dataset from a CSV file with pandas and explores it. It previews the first five rows, checks the column data types, summarizes the numeric columns, inspects highway fuel efficiency and price, and lists the unique fuel types and engine locations. It then splits each car name into a brand and a model name, corrects misspelled brands, renames two columns, drops two others, and checks the final column list.

Uploaded by

Evan Reifsnyder
Copyright
© All Rights Reserved

In [1]:

import pandas as pd

In [2]:
cars = pd.read_csv('cars.csv')

In [3]:
cars.head(5)

Out[3]:
   car_ID  symboling                   CarName fueltype aspiration doornumber      carbody drivewheel  engin
0       1          3        alfa-romero giulia      gas        std        two  convertible        rwd
1       2          3       alfa-romero stelvio      gas        std        two  convertible        rwd
2       3          1  alfa-romero Quadrifoglio      gas        std        two    hatchback        rwd
3       4          2               audi 100 ls      gas        std       four        sedan        fwd
4       5          2                audi 100ls      gas        std       four        sedan        4wd
5 rows × 26 columns

In [4]:
cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 car_ID 205 non-null int64
1 symboling 205 non-null int64
2 CarName 205 non-null object
3 fueltype 205 non-null object
4 aspiration 205 non-null object
5 doornumber 205 non-null object
6 carbody 205 non-null object
7 drivewheel 205 non-null object
8 enginelocation 205 non-null object
9 wheelbase 205 non-null float64
10 carlength 205 non-null float64
11 carwidth 205 non-null float64
12 carheight 205 non-null float64
13 curbweight 205 non-null int64
14 enginetype 205 non-null object
15 cylindernumber 205 non-null object
16 enginesize 205 non-null int64
17 fuelsystem 205 non-null object
18 boreratio 205 non-null float64
19 stroke 205 non-null float64
20 compressionratio 205 non-null float64
21 horsepower 205 non-null int64
22 peakrpm 205 non-null int64
23 citympg 205 non-null int64
24 highwaympg 205 non-null int64
25 price 205 non-null float64
dtypes: float64(8), int64(8), object(10)
memory usage: 41.8+ KB
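
The info() listing mixes numeric and text columns. A minimal sketch of separating the two groups with select_dtypes follows; the small `sample` frame is a hand-built stand-in for cars.csv that reuses a few values shown in Out[3] and Out[6], not the real file.

```python
import pandas as pd

# Hand-built stand-in for cars.csv (values copied from Out[3] and Out[6])
sample = pd.DataFrame({
    "car_ID": [1, 2, 3],
    "CarName": ["alfa-romero giulia", "alfa-romero stelvio", "alfa-romero Quadrifoglio"],
    "fueltype": ["gas", "gas", "gas"],
    "highwaympg": [27, 27, 26],
})

# Columns with a numeric dtype (int64, float64, ...)
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
# Everything else, here the text columns
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(text_cols)
```

On the full dataset the same two calls would presumably return the int64/float64 columns and the object columns listed above.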

In [5]:
cars.describe()

Out[5]:
           car_ID   symboling   wheelbase   carlength    carwidth   carheight   curbweight
count  205.000000  205.000000  205.000000  205.000000  205.000000  205.000000   205.000000
mean   103.000000    0.834146   98.756585  174.049268   65.907805   53.724878  2555.565854
std     59.322565    1.245307    6.021776   12.337289    2.145204    2.443522   520.680204
min      1.000000   -2.000000   86.600000  141.100000   60.300000   47.800000  1488.000000
25%     52.000000    0.000000   94.500000  166.300000   64.100000   52.000000  2145.000000
50%    103.000000    1.000000   97.000000  173.200000   65.500000   54.100000  2414.000000
75%    154.000000    2.000000  102.400000  183.100000   66.900000   55.500000  2935.000000
max    205.000000    3.000000  120.900000  208.100000   72.300000   59.800000  4066.000000

In [6]:
cars['highwaympg'].head(5)

Out[6]: 0 27
1 27
2 26
3 30
4 22
Name: highwaympg, dtype: int64

In [7]:
cars['highwaympg'].nlargest(5)

Out[7]: 30 54
18 53
90 50
159 47
160 47
Name: highwaympg, dtype: int64
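
The cell above lists the largest values with nlargest. As a hedged sketch, the smallest and largest value and the rows where they occur can also be read off directly; the series below holds only the five values shown in Out[6], so the results apply to that excerpt, not to the full column.

```python
import pandas as pd

# First five highwaympg values, copied from Out[6]
highway = pd.Series([27, 27, 26, 30, 22], name="highwaympg")

lowest = highway.min()      # smallest value in the excerpt
highest = highway.max()     # largest value in the excerpt
row_low = highway.idxmin()  # index label of the smallest value
row_high = highway.idxmax() # index label of the largest value

print(lowest, highest, row_low, row_high)
print(highway.nsmallest(2))  # counterpart of nlargest
```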

In [8]:
cars['highwaympg'].value_counts().head(5)

Out[8]: 25 19
38 17
24 17
30 16
32 16
Name: highwaympg, dtype: int64

In [10]:
cars['price'].tail(5)

Out[10]: 200 16845.0
201 19045.0
202 21485.0
203 22470.0
204 22625.0
Name: price, dtype: float64
In [11]:
cars['price'].value_counts().head(5)

Out[11]: 8916.5 2
16500.0 2
7609.0 2
7898.0 2
6692.0 2
Name: price, dtype: int64

In [12]:
cars['fueltype'].unique()

Out[12]: array(['gas', 'diesel'], dtype=object)

In [13]:
cars['enginelocation'].unique()

Out[13]: array(['front', 'rear'], dtype=object)
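
unique() returns the distinct labels but not how often each occurs. A small sketch of counting them follows; the labels come from Out[12], while the four rows and therefore the counts are made up for illustration.

```python
import pandas as pd

# Labels from Out[12]; the number of rows per label is illustrative only
fuel = pd.Series(["gas", "gas", "diesel", "gas"], name="fueltype")

n_labels = fuel.nunique()                    # number of distinct labels
counts = fuel.value_counts()                 # rows per label
shares = fuel.value_counts(normalize=True)   # fraction of rows per label

print(n_labels)
print(counts)
print(shares)
```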

In [14]:
cars['brand'] = cars.apply(lambda x: x.CarName.split(' ')[0], axis=1)
cars['name'] = cars.apply(lambda x: ' '.join(x.CarName.split(' ')[1:]), axis=1)
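
The row-wise apply above can probably be replaced by vectorised string methods. The sketch below uses str.partition, which splits at the first space and yields an empty string when there is nothing after the make, the same result the join-based lambda produces. The three names are a hand-built sample: two come from Out[3], and the bare 'subaru' is an assumed example of a row without a model name.

```python
import pandas as pd

# Two names from Out[3] plus an assumed make-only entry
names = pd.Series(["alfa-romero giulia", "audi 100 ls", "subaru"], name="CarName")

# partition returns three columns: text before the first space, the space, text after it
parts = names.str.partition(" ")
brand = parts[0]
model = parts[2]

print(brand.tolist())
print(model.tolist())
```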

In [15]:
cars['brand'].unique()

Out[15]: array(['alfa-romero', 'audi', 'bmw', 'chevrolet', 'dodge', 'honda',
'isuzu', 'jaguar', 'maxda', 'mazda', 'buick', 'mercury',
'mitsubishi', 'Nissan', 'nissan', 'peugeot', 'plymouth', 'porsche',
'porcshce', 'renault', 'saab', 'subaru', 'toyota', 'toyouta',
'vokswagen', 'volkswagen', 'vw', 'volvo'], dtype=object)

In [16]:
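# The scraped page cut this line off after 'porsche'. The last three entries are inferred
# from the spellings that appear in Out[15] but no longer appear in Out[17].
brand_respellings = {'maxda': 'mazda', 'Nissan': 'nissan', 'porcshce': 'porsche',
                     'toyouta': 'toyota', 'vokswagen': 'volkswagen', 'vw': 'volkswagen'}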
cars['brand'] = cars['brand'].replace(brand_respellings)

In [17]:
cars['brand'].unique()

Out[17]: array(['alfa-romero', 'audi', 'bmw', 'chevrolet', 'dodge', 'honda',
'isuzu', 'jaguar', 'mazda', 'buick', 'mercury', 'mitsubishi',
'nissan', 'peugeot', 'plymouth', 'porsche', 'renault', 'saab',
'subaru', 'toyota', 'volkswagen', 'volvo'], dtype=object)
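
A quick way to confirm that a respelling dictionary did its job is to apply it to a small series and check which keys were actually used. This is only a sketch: it uses the three mapping entries that are legible in cell 16 and a hand-built series of brand spellings taken from Out[15].

```python
import pandas as pd

# The three entries that are visible in cell 16
respellings = {"maxda": "mazda", "Nissan": "nissan", "porcshce": "porsche"}

# Brand spellings copied from Out[15]
brands = pd.Series(["maxda", "mazda", "Nissan", "nissan", "porcshce", "porsche"])

cleaned = brands.replace(respellings)
remaining = sorted(cleaned.unique())

# A key that never occurs in the data usually means a typo in the mapping itself
unused_keys = [key for key in respellings if key not in set(brands)]

print(remaining)
print(unused_keys)
```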

In [18]:
# regex=True is passed explicitly because a callable replacement only works in regex mode.
# The original run omitted it and printed a FutureWarning that the default value of regex
# will change from True to False in a future version.
cars['name'] = cars['name'].str.replace('|'.join(brand_respellings.keys()), lambda x: brand_respellings[x.group()], regex=True)
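
The replacement in cell 18 joins the dictionary keys into one regular-expression alternation and looks up each match in the dictionary. A hedged sketch of the same idea on a tiny series follows. It adds re.escape, which is not in the original, so that keys containing regex metacharacters would be matched literally. The three input strings are hypothetical and the mapping holds only the entries legible in cell 16.

```python
import re
import pandas as pd

respellings = {"maxda": "mazda", "Nissan": "nissan", "porcshce": "porsche"}

# Escape each key, then join them into an alternation such as 'maxda|Nissan|porcshce'
pattern = "|".join(re.escape(key) for key in respellings)

# Hypothetical strings that contain a misspelling, plus one that does not
names = pd.Series(["maxda rx3", "Nissan versa", "civic"])

# The callable receives each regex match and returns its corrected spelling
fixed = names.str.replace(pattern, lambda m: respellings[m.group()], regex=True)

print(fixed.tolist())
```

Passing regex=True explicitly keeps the call working regardless of the library default, since a callable replacement is only accepted in regex mode.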

In [19]:
cars['name'].unique()

Out[19]: array(['giulia', 'stelvio', 'Quadrifoglio', '100 ls', '100ls', 'fox',
'5000', '4000', '5000s (diesel)', '320i', 'x1', 'x3', 'z4', 'x4',
'x5', 'impala', 'monte carlo', 'vega 2300', 'rampage',
'challenger se', 'd200', 'monaco (sw)', 'colt hardtop',
'colt (sw)', 'coronet custom', 'dart custom',
'coronet custom (sw)', 'civic', 'civic cvcc', 'accord cvcc',
'accord lx', 'civic 1500 gl', 'accord', 'civic 1300', 'prelude',
'civic (auto)', 'MU-X', 'D-Max ', 'D-Max V-Cross', 'xj', 'xf',
'xk', 'rx3', 'glc deluxe', 'rx2 coupe', 'rx-4', '626', 'glc',
'rx-7 gs', 'glc 4', 'glc custom l', 'glc custom',
'electra 225 custom', 'century luxus (sw)', 'century', 'skyhawk',
'opel isuzu deluxe', 'skylark', 'century special',
'regal sport coupe (turbo)', 'cougar', 'mirage', 'lancer',
'outlander', 'g4', 'mirage g4', 'montero', 'pajero', 'versa',
'gt-r', 'rogue', 'latio', 'titan', 'leaf', 'juke', 'note',
'clipper', 'nv200', 'dayz', 'fuga', 'otti', 'teana', 'kicks',
'504', '304', '504 (sw)', '604sl', '505s turbo diesel', 'fury iii',
'cricket', 'satellite custom (sw)', 'fury gran sedan', 'valiant',
'duster', 'macan', 'panamera', 'cayenne', 'boxter', '12tl',
'5 gtl', '99e', '99le', '99gle', '', 'dl', 'brz', 'baja', 'r1',
'r2', 'trezia', 'tribeca', 'corona mark ii', 'corona',
'corolla 1200', 'corona hardtop', 'corolla 1600 (sw)', 'carina',
'mark ii', 'corolla', 'corolla liftback', 'celica gt liftback',
'corolla tercel', 'corona liftback', 'starlet', 'tercel',
'cressida', 'celica gt', 'rabbit', '1131 deluxe sedan',
'model 111', 'type 3', '411 (sw)', 'super beetle', 'dasher',
'rabbit custom', '145e (sw)', '144ea', '244dl', '245', '264gl',
'diesel', '246'], dtype=object)
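
The list above contains an entry with a trailing space ('D-Max ') and an empty string. A minimal sketch of tidying such values follows, using three strings copied from Out[19]; whether empty names should be kept, filled, or dropped is a decision the notebook does not make, so the code only flags them.

```python
import pandas as pd

# Copied from Out[19]: one clean value, one with a trailing space, one empty
models = pd.Series(["giulia", "D-Max ", ""], name="name")

stripped = models.str.strip()  # remove leading and trailing whitespace
is_empty = stripped.eq("")     # True where no model name is present
n_empty = int(is_empty.sum())

print(stripped.tolist())
print(n_empty)
```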

In [20]:
cars.rename(columns={'name':'brandname', 'car_ID':'carid'}, inplace=True)

In [21]:
cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 28 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 carid 205 non-null int64
1 symboling 205 non-null int64
2 CarName 205 non-null object
3 fueltype 205 non-null object
4 aspiration 205 non-null object
5 doornumber 205 non-null object
6 carbody 205 non-null object
7 drivewheel 205 non-null object
8 enginelocation 205 non-null object
9 wheelbase 205 non-null float64
10 carlength 205 non-null float64
11 carwidth 205 non-null float64
12 carheight 205 non-null float64
13 curbweight 205 non-null int64
14 enginetype 205 non-null object
15 cylindernumber 205 non-null object
16 enginesize 205 non-null int64
17 fuelsystem 205 non-null object
18 boreratio 205 non-null float64
19 stroke 205 non-null float64
20 compressionratio 205 non-null float64
21 horsepower 205 non-null int64
22 peakrpm 205 non-null int64
23 citympg 205 non-null int64
24 highwaympg 205 non-null int64
25 price 205 non-null float64
26 brand 205 non-null object
27 brandname 205 non-null object
dtypes: float64(8), int64(8), object(12)
memory usage: 45.0+ KB

In [23]:
dropped_columns = ['carid', 'symboling']
cars.drop(columns=dropped_columns, inplace=True)
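
Because the drop above modifies the frame in place, running the cell a second time raises a KeyError: the columns are already gone. One possible alternative, sketched on a hand-built two-row frame with values from Out[3] and Out[6], is to assign the result to a new name and pass errors="ignore" so the call can be repeated safely.

```python
import pandas as pd

# Hand-built stand-in with the two columns to drop and one to keep
sample = pd.DataFrame({"carid": [1, 2], "symboling": [3, 3], "highwaympg": [27, 27]})
dropped_columns = ["carid", "symboling"]

# Without inplace=True, drop returns a new frame and leaves `sample` untouched
trimmed = sample.drop(columns=dropped_columns)

# errors="ignore" skips labels that are no longer present instead of raising
trimmed_again = trimmed.drop(columns=dropped_columns, errors="ignore")

print(trimmed.columns.tolist())
print(trimmed_again.columns.tolist())
```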

In [24]:
cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 205 entries, 0 to 204
Data columns (total 26 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CarName 205 non-null object
1 fueltype 205 non-null object
2 aspiration 205 non-null object
3 doornumber 205 non-null object
4 carbody 205 non-null object
5 drivewheel 205 non-null object
6 enginelocation 205 non-null object
7 wheelbase 205 non-null float64
8 carlength 205 non-null float64
9 carwidth 205 non-null float64
10 carheight 205 non-null float64
11 curbweight 205 non-null int64
12 enginetype 205 non-null object
13 cylindernumber 205 non-null object
14 enginesize 205 non-null int64
15 fuelsystem 205 non-null object
16 boreratio 205 non-null float64
17 stroke 205 non-null float64
18 compressionratio 205 non-null float64
19 horsepower 205 non-null int64
20 peakrpm 205 non-null int64
21 citympg 205 non-null int64
22 highwaympg 205 non-null int64
23 price 205 non-null float64
24 brand 205 non-null object
25 brandname 205 non-null object
dtypes: float64(8), int64(6), object(12)
memory usage: 41.8+ KB
