
Pandas 2

The document provides a Python script using the pandas library to read and analyze an automobile dataset from a CSV file. It includes commands to display the first few records, check for missing values, and summarize the dataset's statistics. The dataset consists of 397 entries with various attributes related to automobiles, such as miles per gallon, weight, and horsepower.

Uploaded by

praveen838307

pandas-2

April 21, 2025

[1]: import pandas as pd

[2]: pwd # current working directory (the folder where this notebook is located)

[2]: 'C:\\Users\\admin\\2802'

[ ]: # make sure the dataset file is stored in the folder shown above

[3]: df1 = pd.read_csv('Auto.csv')


df1.head()

[3]: mpg cylinders displacement Horse Power weight acceleration year \


0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70
2 NaN 8.0 318.0 150 3436 11.0 70
3 NaN 8.0 NaN 150 3433 12.0 70
4 NaN 8.0 NaN 140 3449 10.5 70

origin name
0 1 chevrolet chevelle malibu
1 1 buick skylark 320
2 1 plymouth satellite
3 1 amc rebel sst
4 1 ford torino

[ ]: # these steps work only on a local machine (not on Google Colab)

[10]: r"C\Users" # the r prefix marks a raw string (backslashes are kept as-is)

[10]: 'C\\Users'
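The raw-string prefix matters because, in an ordinary string, a backslash starts an escape sequence. A minimal sketch with made-up paths (these are hypothetical, not the files used in this notebook):

```python
# Hypothetical Windows-style paths, used only to illustrate escaping.
plain = 'C:\\temp\\new.csv'   # ordinary string: each backslash is doubled
raw = r'C:\temp\new.csv'      # raw string: backslashes are kept literally
broken = 'C:\temp\new.csv'    # ordinary string: \t becomes a tab, \n a newline

print(plain == raw)            # True
print(len(raw), len(broken))   # 15 13
```

The `broken` string is shorter because `\t` and `\n` each collapse into a single character, which is why such a path would not point at the intended file.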

[4]: # r indicates raw string


df1 = pd.read_csv(r'C:\Users\admin\Downloads\Dataset\Auto.csv')
df1.head() # top 5 records in the data set

[4]: mpg cylinders displacement Horse Power weight acceleration year \


0 18.0 8.0 307.0 130 3504 12.0 70

1 15.0 8.0 350.0 165 3693 11.5 70
2 NaN 8.0 318.0 150 3436 11.0 70
3 NaN 8.0 NaN 150 3433 12.0 70
4 NaN 8.0 NaN 140 3449 10.5 70

origin name
0 1 chevrolet chevelle malibu
1 1 buick skylark 320
2 1 plymouth satellite
3 1 amc rebel sst
4 1 ford torino

[5]: df1.head(7) # top 7 records

[5]: mpg cylinders displacement Horse Power weight acceleration year \


0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70
2 NaN 8.0 318.0 150 3436 11.0 70
3 NaN 8.0 NaN 150 3433 12.0 70
4 NaN 8.0 NaN 140 3449 10.5 70
5 NaN 8.0 NaN 198 4341 10.0 70
6 NaN 8.0 NaN 220 4354 9.0 70

origin name
0 1 chevrolet chevelle malibu
1 1 buick skylark 320
2 1 plymouth satellite
3 1 amc rebel sst
4 1 ford torino
5 1 ford galaxie 500
6 1 chevrolet impala

[6]: df1.tail() # bottom 5 records

[6]: mpg cylinders displacement Horse Power weight acceleration year \


392 27.0 4.0 140.0 86 2790 15.6 82
393 44.0 4.0 97.0 52 2130 24.6 82
394 32.0 4.0 135.0 84 2295 11.6 82
395 28.0 4.0 120.0 79 2625 18.6 82
396 31.0 4.0 119.0 82 2720 19.4 82

origin name
392 1 ford mustang gl
393 2 vw pickup
394 1 dodge rampage
395 1 ford ranger
396 1 chevy s-10

[7]: df1.tail(2)

[7]: mpg cylinders displacement Horse Power weight acceleration year \


395 28.0 4.0 120.0 79 2625 18.6 82
396 31.0 4.0 119.0 82 2720 19.4 82

origin name
395 1 ford ranger
396 1 chevy s-10

[11]: df1.shape # (number of rows, number of columns)

[11]: (397, 9)

[12]: df1.head(2)

[12]: mpg cylinders displacement Horse Power weight acceleration year \


0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70

origin name
0 1 chevrolet chevelle malibu
1 1 buick skylark 320

[ ]: # AttributeError: common causes
# names are case-sensitive (wrong upper/lower case)
# spelling mistake
# the attribute or method doesn't exist
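A small sketch of how such an error shows up, using a tiny made-up DataFrame (the name `demo` and its values are assumptions for illustration only):

```python
import pandas as pd

# Tiny made-up frame, only for demonstration.
demo = pd.DataFrame({'mpg': [18.0, 15.0]})

try:
    demo.Head()            # wrong case: the method is head(), not Head()
except AttributeError as err:
    message = str(err)
    print(message)

print(demo.head(1))        # correct spelling and case
```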

[13]: df1.dtypes # returns the data type of each column; numbers: int / float, character/string: object

[13]: mpg float64


cylinders float64
displacement float64
Horse Power object
weight int64
acceleration float64
year int64
origin int64
name object
dtype: object

[14]: df1.head()

[14]: mpg cylinders displacement Horse Power weight acceleration year \


0 18.0 8.0 307.0 130 3504 12.0 70

1 15.0 8.0 350.0 165 3693 11.5 70
2 NaN 8.0 318.0 150 3436 11.0 70
3 NaN 8.0 NaN 150 3433 12.0 70
4 NaN 8.0 NaN 140 3449 10.5 70

origin name
0 1 chevrolet chevelle malibu
1 1 buick skylark 320
2 1 plymouth satellite
3 1 amc rebel sst
4 1 ford torino

[16]: df1.isnull().sum()
# isnull() marks missing values as True; sum() counts the True values in each column

[16]: mpg 5
cylinders 3
displacement 5
Horse Power 0
weight 0
acceleration 0
year 0
origin 0
name 0
dtype: int64
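The count works because `True` is treated as 1 when summed. A minimal sketch on a made-up frame (the values below are illustrative, not taken from Auto.csv):

```python
import numpy as np
import pandas as pd

# Made-up data with two missing mpg values.
demo = pd.DataFrame({'mpg': [18.0, np.nan, np.nan],
                     'weight': [3504, 3436, 3433]})

mask = demo.isnull()        # same shape as demo, True where a value is missing
counts = mask.sum()         # True counts as 1, summed per column
total = int(counts.sum())   # missing values in the whole frame

print(counts)
print(total)
```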

[17]: df1.columns

[17]: Index(['mpg', 'cylinders', 'displacement', 'Horse Power', 'weight',


'acceleration', 'year', 'origin', 'name'],
dtype='object')

[18]: df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 397 entries, 0 to 396
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 mpg 392 non-null float64
1 cylinders 394 non-null float64
2 displacement 392 non-null float64
3 Horse Power 397 non-null object
4 weight 397 non-null int64
5 acceleration 397 non-null float64
6 year 397 non-null int64
7 origin 397 non-null int64

8 name 397 non-null object
dtypes: float64(4), int64(3), object(2)
memory usage: 28.0+ KB

[19]: df1.describe() # by default, describe() summarizes only the numeric (int / float) columns

[19]: mpg cylinders displacement weight acceleration \


count 392.000000 394.000000 392.000000 397.000000 397.000000
mean 23.611735 5.439086 191.080357 2970.261965 15.555668
std 7.827466 1.693452 102.452019 847.904119 2.749995
min 9.000000 3.000000 68.000000 1613.000000 8.000000
25% 17.500000 4.000000 100.750000 2223.000000 13.800000
50% 23.000000 4.000000 145.500000 2800.000000 15.500000
75% 29.000000 8.000000 260.000000 3609.000000 17.100000
max 46.600000 8.000000 455.000000 5140.000000 24.800000

year origin
count 397.000000 397.000000
mean 75.994962 1.574307
std 3.690005 0.802549
min 70.000000 1.000000
25% 73.000000 1.000000
50% 76.000000 1.000000
75% 79.000000 2.000000
max 82.000000 3.000000

[20]: df1['mpg']

[20]: 0 18.0
1 15.0
2 NaN
3 NaN
4 NaN

392 27.0
393 44.0
394 32.0
395 28.0
396 31.0
Name: mpg, Length: 397, dtype: float64

[21]: df1[['mpg','weight','acceleration']] # pass the column names in list

[21]: mpg weight acceleration


0 18.0 3504 12.0
1 15.0 3693 11.5
2 NaN 3436 11.0

3 NaN 3433 12.0
4 NaN 3449 10.5
.. … … …
392 27.0 2790 15.6
393 44.0 2130 24.6
394 32.0 2295 11.6
395 28.0 2625 18.6
396 31.0 2720 19.4

[397 rows x 3 columns]

[ ]: # min,max

[ ]: # outer square bracket --> taking subset


# inner square bracket --> multiple values

[22]: df1[['mpg','weight','acceleration']].mean() # mean = average

[22]: mpg 23.611735


weight 2970.261965
acceleration 15.555668
dtype: float64
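The difference between single and double square brackets can be checked by looking at the type of the result. A short sketch on a made-up frame (the variable names here are assumptions for illustration):

```python
import pandas as pd

demo = pd.DataFrame({'mpg': [18.0, 15.0], 'weight': [3504, 3693]})

one_col = demo['mpg']                 # single brackets: a Series
one_col_frame = demo[['mpg']]         # list with one name: a one-column DataFrame
two_cols = demo[['mpg', 'weight']]    # list of names: a DataFrame

print(type(one_col).__name__, type(one_col_frame).__name__)
print(two_cols.shape)
```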

[24]: df1.min()

[24]: mpg 9.0


cylinders 3.0
displacement 68.0
Horse Power 100
weight 1613
acceleration 8.0
year 70
origin 1
name amc ambassador brougham
dtype: object

[25]: df1['mpg'].min()

[25]: 9.0

[23]: df1.describe()

[23]: mpg cylinders displacement weight acceleration \


count 392.000000 394.000000 392.000000 397.000000 397.000000
mean 23.611735 5.439086 191.080357 2970.261965 15.555668
std 7.827466 1.693452 102.452019 847.904119 2.749995
min 9.000000 3.000000 68.000000 1613.000000 8.000000

25% 17.500000 4.000000 100.750000 2223.000000 13.800000
50% 23.000000 4.000000 145.500000 2800.000000 15.500000
75% 29.000000 8.000000 260.000000 3609.000000 17.100000
max 46.600000 8.000000 455.000000 5140.000000 24.800000

year origin
count 397.000000 397.000000
mean 75.994962 1.574307
std 3.690005 0.802549
min 70.000000 1.000000
25% 73.000000 1.000000
50% 76.000000 1.000000
75% 79.000000 2.000000
max 82.000000 3.000000

[ ]:

Create a new column in the dataset


[26]: df1.head(2)

[26]: mpg cylinders displacement Horse Power weight acceleration year \


0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70

origin name
0 1 chevrolet chevelle malibu
1 1 buick skylark 320

[ ]: # to create a new column in which every row holds the value 2

[27]: df1['col1'] = 2

[30]: df1.head(2)

[30]: mpg cylinders displacement Horse Power weight acceleration year \


0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70

origin name col1


0 1 chevrolet chevelle malibu 2
1 1 buick skylark 320 2

[31]: # all arithmetic operations can be performed on columns


df1['col2'] = df1['acceleration'] + df1['origin']

[32]: df1.head(2)

[32]: mpg cylinders displacement Horse Power weight acceleration year \
0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70

origin name col1 col2


0 1 chevrolet chevelle malibu 2 13.0
1 1 buick skylark 320 2 12.5

[ ]: # describe() results change whenever the data in a column changes (for example, after adding or deleting rows)
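A minimal sketch of this effect, assuming a made-up three-row column: dropping one row changes both the count and the mean reported by `describe()`.

```python
import pandas as pd

demo = pd.DataFrame({'mpg': [10.0, 20.0, 30.0]})

before = demo['mpg'].describe()
after = demo.drop(index=0)['mpg'].describe()   # same column with row 0 removed

print(before['count'], before['mean'])   # 3.0 20.0
print(after['count'], after['mean'])     # 2.0 25.0
```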

[36]: df1['col3'] = df1['mpg'].isnull()

[37]: df1.head(5)

[37]: mpg cylinders displacement Horse Power weight acceleration year \


0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70
2 NaN 8.0 318.0 150 3436 11.0 70
3 NaN 8.0 NaN 150 3433 12.0 70
4 NaN 8.0 NaN 140 3449 10.5 70

origin name col1 col2 col3


0 1 chevrolet chevelle malibu 2 13.0 False
1 1 buick skylark 320 2 12.5 False
2 1 plymouth satellite 2 12.0 True
3 1 amc rebel sst 2 13.0 True
4 1 ford torino 2 11.5 True

[33]: 1+2+3+4+5/5

[33]: 11.0

[34]: 1+2+3+4/4

[34]: 7.0

[ ]:

Data Cleaning
[ ]: # soft conversion
# int --> float , float --> int

[38]: df1.head(2)

[38]: mpg cylinders displacement Horse Power weight acceleration year \
0 18.0 8.0 307.0 130 3504 12.0 70
1 15.0 8.0 350.0 165 3693 11.5 70

origin name col1 col2 col3


0 1 chevrolet chevelle malibu 2 13.0 False
1 1 buick skylark 320 2 12.5 False

[40]: df1['year'] = df1['year'].astype(float)

[41]: df1.head()

[41]: mpg cylinders displacement Horse Power weight acceleration year \


0 18.0 8.0 307.0 130 3504 12.0 70.0
1 15.0 8.0 350.0 165 3693 11.5 70.0
2 NaN 8.0 318.0 150 3436 11.0 70.0
3 NaN 8.0 NaN 150 3433 12.0 70.0
4 NaN 8.0 NaN 140 3449 10.5 70.0

origin name col1 col2 col3


0 1 chevrolet chevelle malibu 2 13.0 False
1 1 buick skylark 320 2 12.5 False
2 1 plymouth satellite 2 12.0 True
3 1 amc rebel sst 2 13.0 True
4 1 ford torino 2 11.5 True
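The int/float conversions can be sketched on small made-up Series. Two points worth noting, shown under the assumption of a recent pandas version: converting float to int drops the fractional part, and a column that still contains NaN cannot be converted to int.

```python
import numpy as np
import pandas as pd

years = pd.Series([70, 71, 72])
as_float = years.astype(float)        # 70 -> 70.0
back_to_int = as_float.astype(int)    # 70.0 -> 70

# float -> int drops the fractional part (no rounding)
truncated = pd.Series([11.5, 9.9]).astype(int)
print(list(truncated))                # [11, 9]

# NaN has no integer representation, so this conversion fails
with_nan = pd.Series([8.0, np.nan])
try:
    with_nan.astype(int)
except ValueError as err:             # pandas raises a ValueError (sub)class here
    cast_error = str(err)
    print(cast_error)
```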

[42]: df1['Horse Power'].astype(int)

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[42], line 1
----> 1 df1['Horse Power'].astype(int)

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:6643, in NDFrame.astype(self, dtype, copy, errors)
6637 results = [
6638 ser.astype(dtype, copy=copy, errors=errors) for _, ser in self.items()
6639 ]
6641 else:
6642 # else, only a single dtype is given
-> 6643 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6644 res = self._constructor_from_mgr(new_data, axes=new_data.axes)
6645 return res.__finalize__(self, method="astype")

File ~\anaconda3\Lib\site-packages\pandas\core\internals\managers.py:430, in BaseBlockManager.astype(self, dtype, copy, errors)
427 elif using_copy_on_write():
428 copy = False
--> 430 return self.apply(
431 "astype",
432 dtype=dtype,
433 copy=copy,
434 errors=errors,
435 using_cow=using_copy_on_write(),
436 )

File ~\anaconda3\Lib\site-packages\pandas\core\internals\managers.py:363, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
361 applied = b.apply(f, **kwargs)
362 else:
--> 363 applied = getattr(b, f)(**kwargs)
364 result_blocks = extend_blocks(applied, result_blocks)
366 out = type(self).from_blocks(result_blocks, self.axes)

File ~\anaconda3\Lib\site-packages\pandas\core\internals\blocks.py:758, in Block.astype(self, dtype, copy, errors, using_cow, squeeze)
755 raise ValueError("Can not squeeze with more than one column.")
756 values = values[0, :] # type: ignore[call-overload]
--> 758 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
760 new_values = maybe_coerce_values(new_values)
762 refs = None

File ~\anaconda3\Lib\site-packages\pandas\core\dtypes\astype.py:237, in astype_array_safe(values, dtype, copy, errors)
234 dtype = dtype.numpy_dtype
236 try:
--> 237 new_values = astype_array(values, dtype, copy=copy)
238 except (ValueError, TypeError):
239 # e.g. _astype_nansafe can fail on object-dtype of strings
240 # trying to convert to float
241 if errors == "ignore":

File ~\anaconda3\Lib\site-packages\pandas\core\dtypes\astype.py:182, in astype_array(values, dtype, copy)
179 values = values.astype(dtype, copy=copy)
181 else:
--> 182 values = _astype_nansafe(values, dtype, copy=copy)
184 # in pandas we don't store numpy str dtypes, so convert to object
185 if isinstance(dtype, np.dtype) and issubclass(values.dtype.type, str):

File ~\anaconda3\Lib\site-packages\pandas\core\dtypes\astype.py:133, in _astype_nansafe(arr, dtype, copy, skipna)
129 raise ValueError(msg)
131 if copy or arr.dtype == object or dtype == object:
132 # Explicit copy, or required since NumPy can't view from / to object.
--> 133 return arr.astype(dtype, copy=True)
135 return arr.astype(dtype, copy=copy)

ValueError: invalid literal for int() with base 10: '?'

[ ]: # the Horse Power column contains question marks ('?')

[43]: int('?')

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[43], line 1
----> 1 int('?')

ValueError: invalid literal for int() with base 10: '?'

[44]: int('a')

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[44], line 1
----> 1 int('a')

ValueError: invalid literal for int() with base 10: 'a'

[ ]: # to convert Horse Power to a numeric type, the non-numeric strings must be dealt with first

[46]: df1['Horse Power'].isnull().sum()

[46]: 0

[51]: df1.head(2)

[51]: mpg cylinders displacement Horse Power weight acceleration year \


0 18.0 8.0 307.0 130.0 3504 12.0 70.0
1 15.0 8.0 350.0 165.0 3693 11.5 70.0

origin name col1 col2 col3


0 1 chevrolet chevelle malibu 2 13.0 False
1 1 buick skylark 320 2 12.5 False

[52]: int('130')

[52]: 130

[ ]: # int('?') fails --> convert such strings into null (NaN) instead

[48]: df1['Horse Power'] = pd.to_numeric(df1['Horse Power'],errors = 'coerce')

# errors = 'coerce' --> try to convert each value to a number; if the conversion is not possible, the value becomes null (NaN)

[49]: df1['Horse Power'].isnull().sum() # the 5 non-numeric strings have been replaced with NaN

[49]: 5

[50]: df1['Horse Power'].dtype

[50]: dtype('float64')
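The same conversion can be sketched on a small made-up Series. The result is float64 rather than int because NaN is itself a floating-point value.

```python
import pandas as pd

# Made-up values; '?' stands in for the non-numeric entries.
raw = pd.Series(['130', '165', '?', '150'])

clean = pd.to_numeric(raw, errors='coerce')   # '?' cannot be parsed -> NaN

print(clean.isnull().sum())   # 1
print(clean.dtype)            # float64
```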

[ ]:

[53]: df1.isnull().sum()

[53]: mpg 5
cylinders 3
displacement 5
Horse Power 5
weight 0
acceleration 0
year 0
origin 0
name 0
col1 0
col2 0
col3 0
dtype: int64

[ ]: # missing values --> either drop the records with missing values
# or replace the missing values

[ ]: # 397 rows

[54]: df1.dropna()

[54]: mpg cylinders displacement Horse Power weight acceleration year \


0 18.0 8.0 307.0 130.0 3504 12.0 70.0
1 15.0 8.0 350.0 165.0 3693 11.5 70.0
8 14.0 8.0 455.0 225.0 4425 10.0 70.0
9 15.0 8.0 390.0 190.0 3850 8.5 70.0
10 15.0 8.0 383.0 170.0 3563 10.0 70.0
.. … … … … … … …
392 27.0 4.0 140.0 86.0 2790 15.6 82.0

393 44.0 4.0 97.0 52.0 2130 24.6 82.0
394 32.0 4.0 135.0 84.0 2295 11.6 82.0
395 28.0 4.0 120.0 79.0 2625 18.6 82.0
396 31.0 4.0 119.0 82.0 2720 19.4 82.0

origin name col1 col2 col3


0 1 chevrolet chevelle malibu 2 13.0 False
1 1 buick skylark 320 2 12.5 False
8 1 pontiac catalina 2 11.0 False
9 1 amc ambassador dpl 2 9.5 False
10 1 dodge challenger se 2 11.0 False
.. … … … … …
392 1 ford mustang gl 2 16.6 False
393 2 vw pickup 2 26.6 False
394 1 dodge rampage 2 12.6 False
395 1 ford ranger 2 19.6 False
396 1 chevy s-10 2 20.4 False

[383 rows x 12 columns]
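Note that `dropna()` returns a new DataFrame and leaves the original untouched unless the result is assigned back. A minimal sketch on made-up data; the `subset` argument limits the check to the listed columns:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'mpg': [18.0, np.nan, 14.0],
                     'cylinders': [8.0, 8.0, np.nan]})

dropped = demo.dropna()                     # drop rows with a NaN in any column
dropped_mpg = demo.dropna(subset=['mpg'])   # only look at the mpg column

print(len(demo), len(dropped), len(dropped_mpg))   # 3 1 2
```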

[55]: df1['mpg'].mean() # a common way of dealing with null values is to replace them with the column average

[55]: 23.611734693877548

[56]: df1['mpg'] = df1['mpg'].fillna(23.61) # inside the brackets, give the value that should be used in place of NaN

[57]: df1.isnull().sum()

[57]: mpg 0
cylinders 3
displacement 5
Horse Power 5
weight 0
acceleration 0
year 0
origin 0
name 0
col1 0
col2 0
col3 0
dtype: int64
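Instead of typing the rounded average by hand, the computed mean can be passed to `fillna` directly. A sketch on made-up values (`mean_mpg` is a name chosen here for illustration):

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'mpg': [18.0, np.nan, 15.0, np.nan]})

mean_mpg = demo['mpg'].mean()                # NaN is skipped: (18 + 15) / 2 = 16.5
demo['mpg'] = demo['mpg'].fillna(mean_mpg)   # replace NaN with the computed mean

print(mean_mpg)
print(demo['mpg'].isnull().sum())
```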

[ ]:

[58]: df1.rename({'Horse Power': 'horse_power'}, axis = 1,inplace = True)
# axis = 1 --> columns --> look for 'Horse Power' among the column names
# inplace = True --> modify df1 itself instead of returning a changed copy
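An equivalent form that avoids `inplace` is to pass the mapping through the `columns` argument and keep the returned frame. A sketch on a made-up frame:

```python
import pandas as pd

demo = pd.DataFrame({'Horse Power': [130, 165], 'mpg': [18.0, 15.0]})

# rename returns a new DataFrame; demo itself is not modified
renamed = demo.rename(columns={'Horse Power': 'horse_power'})

print(list(demo.columns))
print(list(renamed.columns))
```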

[59]: df1.head()

[59]: mpg cylinders displacement horse_power weight acceleration year \


0 18.00 8.0 307.0 130.0 3504 12.0 70.0
1 15.00 8.0 350.0 165.0 3693 11.5 70.0
2 23.61 8.0 318.0 150.0 3436 11.0 70.0
3 23.61 8.0 NaN 150.0 3433 12.0 70.0
4 23.61 8.0 NaN 140.0 3449 10.5 70.0

origin name col1 col2 col3


0 1 chevrolet chevelle malibu 2 13.0 False
1 1 buick skylark 320 2 12.5 False
2 1 plymouth satellite 2 12.0 True
3 1 amc rebel sst 2 13.0 True
4 1 ford torino 2 11.5 True

[60]: df1.drop(['col1','col2','col3'],axis = 1,inplace = True)

[61]: df1.head()

[61]: mpg cylinders displacement horse_power weight acceleration year \


0 18.00 8.0 307.0 130.0 3504 12.0 70.0
1 15.00 8.0 350.0 165.0 3693 11.5 70.0
2 23.61 8.0 318.0 150.0 3436 11.0 70.0
3 23.61 8.0 NaN 150.0 3433 12.0 70.0
4 23.61 8.0 NaN 140.0 3449 10.5 70.0

origin name
0 1 chevrolet chevelle malibu
1 1 buick skylark 320
2 1 plymouth satellite
3 1 amc rebel sst
4 1 ford torino

[ ]:

[ ]: # take a subset of the dataset

[62]: df1.head(10)

[62]: mpg cylinders displacement horse_power weight acceleration year \


0 18.00 8.0 307.0 130.0 3504 12.0 70.0
1 15.00 8.0 350.0 165.0 3693 11.5 70.0

2 23.61 8.0 318.0 150.0 3436 11.0 70.0
3 23.61 8.0 NaN 150.0 3433 12.0 70.0
4 23.61 8.0 NaN 140.0 3449 10.5 70.0
5 23.61 8.0 NaN 198.0 4341 10.0 70.0
6 23.61 8.0 NaN 220.0 4354 9.0 70.0
7 14.00 8.0 NaN 215.0 4312 8.5 70.0
8 14.00 8.0 455.0 225.0 4425 10.0 70.0
9 15.00 8.0 390.0 190.0 3850 8.5 70.0

origin name
0 1 chevrolet chevelle malibu
1 1 buick skylark 320
2 1 plymouth satellite
3 1 amc rebel sst
4 1 ford torino
5 1 ford galaxie 500
6 1 chevrolet impala
7 1 plymouth fury iii
8 1 pontiac catalina
9 1 amc ambassador dpl

[65]: df1.iloc[1:9,2:7] # iloc --> locate data by integer position; the end of each slice is excluded

[65]: displacement horse_power weight acceleration year


1 350.0 165.0 3693 11.5 70.0
2 318.0 150.0 3436 11.0 70.0
3 NaN 150.0 3433 12.0 70.0
4 NaN 140.0 3449 10.5 70.0
5 NaN 198.0 4341 10.0 70.0
6 NaN 220.0 4354 9.0 70.0
7 NaN 215.0 4312 8.5 70.0
8 455.0 225.0 4425 10.0 70.0

[ ]: # loc : locate data using labels (row and column names)
# the end of a slice is included

[66]: df1.head(5)

[66]: mpg cylinders displacement horse_power weight acceleration year \


0 18.00 8.0 307.0 130.0 3504 12.0 70.0
1 15.00 8.0 350.0 165.0 3693 11.5 70.0
2 23.61 8.0 318.0 150.0 3436 11.0 70.0
3 23.61 8.0 NaN 150.0 3433 12.0 70.0
4 23.61 8.0 NaN 140.0 3449 10.5 70.0

origin name
0 1 chevrolet chevelle malibu

1 1 buick skylark 320
2 1 plymouth satellite
3 1 amc rebel sst
4 1 ford torino

[ ]: # rows : numbers will be treated as names


# columns : mpg, cylinder.. name

[67]: df1.loc[1:4,'mpg': 'weight']

[67]: mpg cylinders displacement horse_power weight


1 15.00 8.0 350.0 165.0 3693
2 23.61 8.0 318.0 150.0 3436
3 23.61 8.0 NaN 150.0 3433
4 23.61 8.0 NaN 140.0 3449

[68]: df1.loc[1:4,'mpg']

[68]: 1 15.00
2 23.61
3 23.61
4 23.61
Name: mpg, dtype: float64

[69]: df1.loc[1:4,['mpg','weight']]

[69]: mpg weight


1 15.00 3693
2 23.61 3436
3 23.61 3433
4 23.61 3449
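The difference in slice endpoints between `iloc` and `loc` can be checked side by side. A sketch on a small made-up frame with the default 0..4 row labels:

```python
import pandas as pd

demo = pd.DataFrame({'mpg': [18.0, 15.0, 23.61, 23.61, 23.61],
                     'weight': [3504, 3693, 3436, 3433, 3449]})

by_position = demo.iloc[1:4]   # positions 1, 2, 3 (end excluded)
by_label = demo.loc[1:4]       # labels 1, 2, 3, 4 (end included)

print(len(by_position), len(by_label))   # 3 4
```

With the default integer index, positions and labels coincide, which is why the only visible difference here is the extra row at the end.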

[ ]:

[70]: def sqr_num(n):
          return n**2

[71]: sqr_num(4)

[71]: 16

[72]: df1.head(2)

[72]: mpg cylinders displacement horse_power weight acceleration year \


0 18.0 8.0 307.0 130.0 3504 12.0 70.0
1 15.0 8.0 350.0 165.0 3693 11.5 70.0

origin name
0 1 chevrolet chevelle malibu
1 1 buick skylark 320

[73]: df1['col1'] = df1['acceleration'].apply(sqr_num)


df1.head(2)

[73]: mpg cylinders displacement horse_power weight acceleration year \


0 18.0 8.0 307.0 130.0 3504 12.0 70.0
1 15.0 8.0 350.0 165.0 3693 11.5 70.0

origin name col1


0 1 chevrolet chevelle malibu 144.00
1 1 buick skylark 320 132.25

[74]: df1['col2'] = df1['acceleration'].apply(lambda n: n**2)


df1.head(2)

[74]: mpg cylinders displacement horse_power weight acceleration year \


0 18.0 8.0 307.0 130.0 3504 12.0 70.0
1 15.0 8.0 350.0 165.0 3693 11.5 70.0

origin name col1 col2


0 1 chevrolet chevelle malibu 144.00 144.00
1 1 buick skylark 320 132.25 132.25
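For simple arithmetic such as squaring, the operation can also be applied to the whole column at once, without `apply`. A sketch on made-up acceleration values, comparing both routes:

```python
import pandas as pd

acc = pd.Series([12.0, 11.5, 11.0])

via_apply = acc.apply(lambda n: n**2)   # calls the function once per value
vectorized = acc ** 2                   # operates on the whole Series at once

print(list(vectorized))                 # [144.0, 132.25, 121.0]
print(via_apply.equals(vectorized))     # True
```

`apply` remains the more flexible choice when the per-row logic cannot be written as a column expression.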

[76]: 10%2 #--> remainder is zero

[76]: 0

[75]: # take each value from acceleration in turn and store it in n
# check the condition n%2 == 0 --> if true return 'Even' else 'Odd'
# store the result in col3; apply repeats this for every row
df1['col3'] = df1['acceleration'].apply(lambda n: 'Even' if n%2 == 0 else 'Odd')

df1.head(4)

[75]: mpg cylinders displacement horse_power weight acceleration year \


0 18.00 8.0 307.0 130.0 3504 12.0 70.0
1 15.00 8.0 350.0 165.0 3693 11.5 70.0
2 23.61 8.0 318.0 150.0 3436 11.0 70.0
3 23.61 8.0 NaN 150.0 3433 12.0 70.0

origin name col1 col2 col3


0 1 chevrolet chevelle malibu 144.00 144.00 Even
1 1 buick skylark 320 132.25 132.25 Odd
2 1 plymouth satellite 121.00 121.00 Odd

3 1 amc rebel sst 144.00 144.00 Even
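Because acceleration holds decimals, a value such as 11.5 gives 11.5 % 2 = 1.5 and is therefore labelled 'Odd' even though it is not a whole number. One possible variant, sketched with a hypothetical helper named `parity` on made-up values, labels such values separately:

```python
import pandas as pd

def parity(n):
    """Hypothetical helper: label whole numbers as Even/Odd, others separately."""
    if n % 1 != 0:                 # has a fractional part
        return 'Not an integer'
    return 'Even' if n % 2 == 0 else 'Odd'

acc = pd.Series([12.0, 11.5, 11.0])
labels = acc.apply(parity)

print(list(labels))   # ['Even', 'Not an integer', 'Odd']
```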

[81]: df1.head(5)

[81]: mpg cylinders displacement horse_power weight acceleration year \


0 18.00 8.0 307.0 130.0 3504 12.0 70.0
1 15.00 8.0 350.0 165.0 3693 11.5 70.0
2 23.61 8.0 318.0 150.0 3436 11.0 70.0
3 23.61 8.0 NaN 150.0 3433 12.0 70.0
4 23.61 8.0 NaN 140.0 3449 10.5 70.0

origin name col1 col2 col3 col4


0 1 chevrolet chevelle malibu 144.00 144.00 Even NaN
1 1 buick skylark 320 132.25 132.25 Odd NaN
2 1 plymouth satellite 121.00 121.00 Odd NaN
3 1 amc rebel sst 144.00 144.00 Even NaN
4 1 ford torino 110.25 110.25 Odd NaN
