pandas.py

The document demonstrates the use of pandas and numpy libraries in Python for data manipulation and analysis. It includes creating DataFrames, reading from and writing to CSV files, and performing basic operations like describing data, indexing, and modifying values. Additionally, it showcases handling of large datasets and provides examples of generating random data.

Uploaded by

vinaysikarwar199

In [1]: import numpy as np

import pandas as pd

In [2]: dict1 = {
"name":['harry', 'rohan','skillf','shubh'],
"marks":[92,34,24,17],
"city":['rampur','kolkata','barelly','antarctica']
}

In [3]: df = pd.DataFrame(dict1)

In [4]: df

Out[4]: name marks city

0 harry 92 rampur

1 rohan 34 kolkata

2 skillf 24 barelly

3 shubh 17 antarctica

In [5]: df.to_csv('friends.csv')

In [6]: df.to_csv('friends_index_false.csv', index=False)

In [7]: # head() and tail() preview a few rows, which helps when the data has millions of lines

In [8]: df.head(2)

Out[8]: name marks city

0 harry 92 rampur

1 rohan 34 kolkata

In [9]: df.tail(2)

Out[9]: name marks city

2 skillf 24 barelly

3 shubh 17 antarctica

In [10]: df.describe()

Out[10]: marks

count 4.00000

mean 41.75000

std 34.21866

min 17.00000

25% 22.25000

50% 29.00000

75% 48.50000

max 92.00000
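
The `std` row reported by `describe()` is the sample standard deviation (divisor n - 1), so it differs from NumPy's default. A minimal sketch that recomputes two of the figures above from the same `marks` values; the variable names are illustrative and not part of the original notebook:

```python
import numpy as np
import pandas as pd

marks = pd.Series([92, 34, 24, 17], name="marks")

# pandas uses ddof=1 (sample standard deviation) by default.
sample_std = marks.std()
# np.std uses ddof=0 (population standard deviation) by default.
population_std = np.std(marks.to_numpy())

# The 25% row of describe() is a linearly interpolated quantile.
first_quartile = marks.quantile(0.25)

print(round(sample_std, 5), round(population_std, 5), first_quartile)
```

Passing `ddof=1` to `np.std` gives the same value as pandas.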

In [11]: vinay = pd.read_csv('vinay.csv') # to read data

In [12]: vinay

Out[12]: Unnamed: 0.1 Unnamed: 0 train no. speed city

0 0 0 1521644 50 rampur

1 1 1 24165 34 kolkata

2 2 2 54876 24 barelly

3 3 3 5157 17 antarctica
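
The `Unnamed: 0` columns above most likely come from saving the frame with its index and reading it back. A small sketch of that round trip, using an in-memory buffer instead of a real file; the sample frame and variable names are assumptions made for illustration:

```python
import io
import pandas as pd

df = pd.DataFrame({"speed": [50, 34], "city": ["rampur", "kolkata"]})

# With the default index=True, the row labels are written as an unnamed first column.
with_index = df.to_csv()
# Reading that text back turns the stored labels into a column called "Unnamed: 0".
round_trip = pd.read_csv(io.StringIO(with_index))
print(list(round_trip.columns))

# Option 1: skip the index when writing.
clean = pd.read_csv(io.StringIO(df.to_csv(index=False)))
# Option 2: tell read_csv that the first column holds the index.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(clean.columns), list(restored.columns))
```

Each extra save and load without one of these options adds another `Unnamed` column.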

In [13]: vinay['speed'][0] = 50

C:\Users\vinay\AppData\Local\Temp\ipykernel_12824\473427975.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  vinay['speed'][0] = 50
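
The warning is triggered by chained indexing: `vinay['speed']` is evaluated first, and the assignment then targets that intermediate result. A sketch of the single-step alternative with `.loc`, on a small stand-in frame (the name `trains` and its values are hypothetical):

```python
import pandas as pd

trains = pd.DataFrame({"speed": [40, 34, 24], "city": ["rampur", "kolkata", "barelly"]})

# One indexing call with (row label, column label) writes into the frame itself.
trains.loc[0, "speed"] = 50

# .at is the scalar-only counterpart of .loc.
trains.at[1, "speed"] = 60

print(trains)
```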

In [14]: vinay

Out[14]: Unnamed: 0.1 Unnamed: 0 train no. speed city

0 0 0 1521644 50 rampur

1 1 1 24165 34 kolkata

2 2 2 54876 24 barelly

3 3 3 5157 17 antarctica

In [15]: vinay.to_csv('vinay.csv', index=False) # index=False keeps the row labels from being saved as another "Unnamed: 0" column

In [16]: vinay.index = ['first','second','third','fourth']

In [17]: vinay

Out[17]: Unnamed: 0.1 Unnamed: 0 train no. speed city

first 0 0 1521644 50 rampur

second 1 1 24165 34 kolkata

third 2 2 54876 24 barelly

fourth 3 3 5157 17 antarctica

In [18]: ser = pd.Series(np.random.rand(34))

In [19]: type(ser)

Out[19]: pandas.core.series.Series

In [20]: newdf = pd.DataFrame(np.random.rand(334,5), index=np.arange(334))

In [21]: newdf.head()

Out[21]: 0 1 2 3 4

0 0.192439 0.483302 0.182232 0.109495 0.346556

1 0.072344 0.358511 0.836136 0.389201 0.662256

2 0.351126 0.453518 0.532963 0.806051 0.880142

3 0.808912 0.194086 0.244100 0.224745 0.603455

4 0.121119 0.840377 0.933503 0.332410 0.579510
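
Because `np.random.rand` draws new values on every run, the numbers shown here will differ from one execution to the next. A hedged sketch of a reproducible variant using NumPy's `default_rng`; the seed value and the names `sample` and `repeat` are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# A seeded generator yields the same values on every run.
rng = np.random.default_rng(seed=0)
sample = pd.DataFrame(rng.random((334, 5)), columns=list("ABCDE"))

# A second generator with the same seed reproduces the data exactly.
repeat = pd.DataFrame(np.random.default_rng(seed=0).random((334, 5)), columns=list("ABCDE"))

print(sample.shape)
print(sample.equals(repeat))
```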

In [22]: type(newdf)

Out[22]: pandas.core.frame.DataFrame

In [23]: newdf.describe()

Out[23]: 0 1 2 3 4

count 334.000000 334.000000 334.000000 334.000000 334.000000

mean 0.511220 0.502170 0.515231 0.514727 0.501599

std 0.289863 0.280761 0.293557 0.272481 0.291661

min 0.009534 0.000230 0.009467 0.004386 0.002222

25% 0.275510 0.284415 0.254842 0.301877 0.229439

50% 0.526640 0.513685 0.526909 0.508590 0.516764

75% 0.767359 0.739335 0.781234 0.742986 0.753503

max 0.997394 0.996811 0.999285 0.998439 0.999803

In [24]: newdf.dtypes

Out[24]: 0    float64
1    float64
2    float64
3    float64
4    float64
dtype: object

In [25]: newdf[0][1] = 'vinay'


C:\Users\vinay\AppData\Local\Temp\ipykernel_12824\4287450646.py:1: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'vinay' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  newdf[0][1] = 'vinay'
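
The FutureWarning asks for an explicit cast before a string is stored in a float column. One way to follow that advice, sketched on a small stand-in frame (the name `frame` is hypothetical), is to convert the column to `object` first and then assign with `.loc`:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.zeros((3, 2)))

# Cast column 0 to object so it can hold mixed values.
frame[0] = frame[0].astype(object)

# Row label 1, column label 0, assigned in a single .loc call.
frame.loc[1, 0] = "vinay"

print(frame.dtypes)
```

Column 1 is untouched and keeps its `float64` dtype.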

In [26]: newdf.head()

Out[26]: 0 1 2 3 4

0 0.192439 0.483302 0.182232 0.109495 0.346556

1 vinay 0.358511 0.836136 0.389201 0.662256

2 0.351126 0.453518 0.532963 0.806051 0.880142

3 0.808912 0.194086 0.244100 0.224745 0.603455

4 0.121119 0.840377 0.933503 0.332410 0.579510

In [27]: newdf.index

Out[27]: Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
       ...
       324, 325, 326, 327, 328, 329, 330, 331, 332, 333],
      dtype='int32', length=334)

In [28]: newdf.columns

Out[28]: RangeIndex(start=0, stop=5, step=1)

In [29]: newdf.to_numpy()

Out[29]: array([[0.19243897629678863, 0.4833016054951558, 0.18223248119149482,
0.10949487441382522, 0.346555717762674],
['vinay', 0.358511057712223, 0.8361359540599419,
0.38920071958360003, 0.6622558371512339],
[0.3511260190014004, 0.4535179465768121, 0.5329625751629071,
0.8060513324243946, 0.8801421656725747],
...,
[0.2833121022519195, 0.8041833005905062, 0.30184328883816447,
0.33450997341497823, 0.09415712001759435],
[0.6543592257723887, 0.5571194761629852, 0.24589863402724477,
0.9873811670345046, 0.7192368401412679],
[0.6643166221995344, 0.725229517706132, 0.19252707794502544,
0.38162343584405134, 0.4854364965153011]], dtype=object)

In [30]: newdf[0][0]= 0.3

In [31]: newdf.head()

Out[31]: 0 1 2 3 4

0 0.3 0.483302 0.182232 0.109495 0.346556

1 vinay 0.358511 0.836136 0.389201 0.662256

2 0.351126 0.453518 0.532963 0.806051 0.880142

3 0.808912 0.194086 0.244100 0.224745 0.603455

4 0.121119 0.840377 0.933503 0.332410 0.579510

In [32]: newdf.T

Out[32]: 0 1 2 3 4 5 6 7 8 9 ...

0 0.3 vinay 0.351126 0.808912 0.121119 0.541671 0.810778 0.013301 0.970215 0.834933 ... 0.443

1 0.483302 0.358511 0.453518 0.194086 0.840377 0.332581 0.49378 0.546343 0.357016 0.844727 ... 0.215

2 0.182232 0.836136 0.532963 0.2441 0.933503 0.743576 0.173255 0.78586 0.456049 0.842426 ... 0.821

3 0.109495 0.389201 0.806051 0.224745 0.33241 0.498823 0.027296 0.580119 0.22295 0.937127 ... 0.761

4 0.346556 0.662256 0.880142 0.603455 0.57951 0.498658 0.963489 0.033478 0.524955 0.784691 ... 0.611

5 rows × 334 columns

In [33]: newdf.head()

Out[33]: 0 1 2 3 4

0 0.3 0.483302 0.182232 0.109495 0.346556

1 vinay 0.358511 0.836136 0.389201 0.662256

2 0.351126 0.453518 0.532963 0.806051 0.880142

3 0.808912 0.194086 0.244100 0.224745 0.603455

4 0.121119 0.840377 0.933503 0.332410 0.579510

In [34]: newdf.sort_index(axis=0, ascending=False)

Out[34]: 0 1 2 3 4

333 0.664317 0.725230 0.192527 0.381623 0.485436

332 0.654359 0.557119 0.245899 0.987381 0.719237

331 0.283312 0.804183 0.301843 0.334510 0.094157

330 0.168163 0.853079 0.751411 0.833227 0.176438

329 0.759106 0.047294 0.450999 0.568085 0.224133

... ... ... ... ... ...

4 0.121119 0.840377 0.933503 0.332410 0.579510

3 0.808912 0.194086 0.244100 0.224745 0.603455

2 0.351126 0.453518 0.532963 0.806051 0.880142

1 vinay 0.358511 0.836136 0.389201 0.662256

0 0.3 0.483302 0.182232 0.109495 0.346556

334 rows × 5 columns

In [35]: newdf.head()

Out[35]: 0 1 2 3 4

0 0.3 0.483302 0.182232 0.109495 0.346556

1 vinay 0.358511 0.836136 0.389201 0.662256

2 0.351126 0.453518 0.532963 0.806051 0.880142

3 0.808912 0.194086 0.244100 0.224745 0.603455

4 0.121119 0.840377 0.933503 0.332410 0.579510

In [36]: type(newdf[0])

Out[36]: pandas.core.series.Series

In [37]: newdf2 = newdf # newdf2 is just a second name for the same object; nothing is copied

In [38]: newdf2[0][0]= 5498

In [39]: newdf

Out[39]: 0 1 2 3 4

0 5498 0.483302 0.182232 0.109495 0.346556

1 vinay 0.358511 0.836136 0.389201 0.662256

2 0.351126 0.453518 0.532963 0.806051 0.880142

3 0.808912 0.194086 0.244100 0.224745 0.603455

4 0.121119 0.840377 0.933503 0.332410 0.579510

... ... ... ... ... ...

329 0.759106 0.047294 0.450999 0.568085 0.224133

330 0.168163 0.853079 0.751411 0.833227 0.176438

331 0.283312 0.804183 0.301843 0.334510 0.094157

332 0.654359 0.557119 0.245899 0.987381 0.719237

333 0.664317 0.725230 0.192527 0.381623 0.485436

334 rows × 5 columns
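
Plain assignment binds a second name to the same DataFrame, which is why the change made through `newdf2` shows up in `newdf`. A minimal sketch contrasting that with `.copy()`; the frame and names below are illustrative only:

```python
import pandas as pd

original = pd.DataFrame({"a": [1, 2, 3]})

alias = original               # second name for the same object
independent = original.copy()  # separate object with its own data

alias.loc[0, "a"] = 100        # visible through `original`
independent.loc[1, "a"] = -1   # not visible through `original`

print(alias is original)
print(original)
```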

In [40]: # use .copy() to get an independent copy

In [41]: newdf2 = newdf.copy()

In [42]: newdf2[0][0] = 2

C:\Users\vinay\AppData\Local\Temp\ipykernel_12824\2252306501.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  newdf2[0][0] = 2

In [43]: newdf

Out[43]: 0 1 2 3 4

0 5498 0.483302 0.182232 0.109495 0.346556

1 vinay 0.358511 0.836136 0.389201 0.662256

2 0.351126 0.453518 0.532963 0.806051 0.880142

3 0.808912 0.194086 0.244100 0.224745 0.603455

4 0.121119 0.840377 0.933503 0.332410 0.579510

... ... ... ... ... ...

329 0.759106 0.047294 0.450999 0.568085 0.224133

330 0.168163 0.853079 0.751411 0.833227 0.176438

331 0.283312 0.804183 0.301843 0.334510 0.094157

332 0.654359 0.557119 0.245899 0.987381 0.719237

333 0.664317 0.725230 0.192527 0.381623 0.485436

334 rows × 5 columns

In [44]: newdf.loc[0,2] = 654

In [45]: newdf.head(3)

Out[45]: 0 1 2 3 4

0 5498 0.483302 654.000000 0.109495 0.346556

1 vinay 0.358511 0.836136 0.389201 0.662256

2 0.351126 0.453518 0.532963 0.806051 0.880142

In [46]: newdf.columns = list('ABCDE')

In [47]: newdf.head()

Out[47]: A B C D E

0 5498 0.483302 654.000000 0.109495 0.346556

1 vinay 0.358511 0.836136 0.389201 0.662256

2 0.351126 0.453518 0.532963 0.806051 0.880142

3 0.808912 0.194086 0.244100 0.224745 0.603455

4 0.121119 0.840377 0.933503 0.332410 0.579510

In [48]: newdf.loc[0,0] = 654 # after the rename there is no column labelled 0, so this adds a new column named 0

newdf

Out[48]: A B C D E 0

0 5498 0.483302 654.000000 0.109495 0.346556 654.0

1 vinay 0.358511 0.836136 0.389201 0.662256 NaN

2 0.351126 0.453518 0.532963 0.806051 0.880142 NaN

3 0.808912 0.194086 0.244100 0.224745 0.603455 NaN

4 0.121119 0.840377 0.933503 0.332410 0.579510 NaN

... ... ... ... ... ... ...

329 0.759106 0.047294 0.450999 0.568085 0.224133 NaN

330 0.168163 0.853079 0.751411 0.833227 0.176438 NaN

331 0.283312 0.804183 0.301843 0.334510 0.094157 NaN

332 0.654359 0.557119 0.245899 0.987381 0.719237 NaN

333 0.664317 0.725230 0.192527 0.381623 0.485436 NaN

334 rows × 6 columns

In [49]: newdf.loc[1,'A'] = 654541


newdf

Out[49]: A B C D E 0

0 5498 0.483302 654.000000 0.109495 0.346556 654.0

1 654541 0.358511 0.836136 0.389201 0.662256 NaN

2 0.351126 0.453518 0.532963 0.806051 0.880142 NaN

3 0.808912 0.194086 0.244100 0.224745 0.603455 NaN

4 0.121119 0.840377 0.933503 0.332410 0.579510 NaN

... ... ... ... ... ... ...

329 0.759106 0.047294 0.450999 0.568085 0.224133 NaN

330 0.168163 0.853079 0.751411 0.833227 0.176438 NaN

331 0.283312 0.804183 0.301843 0.334510 0.094157 NaN

332 0.654359 0.557119 0.245899 0.987381 0.719237 NaN

333 0.664317 0.725230 0.192527 0.381623 0.485436 NaN

334 rows × 6 columns

In [50]: newdf = newdf.drop(1, axis=1) # raises KeyError: the columns are now 'A' to 'E' and 0, so no column is labelled 1

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[50], line 1
----> 1 newdf = newdf.drop(1, axis=1)

File ~\anaconda3\Lib\site-packages\pandas\core\frame.py:5347, in DataFrame.drop(self, labels, axis, index, columns, level, inplace, errors)
5199 def drop(
5200 self,
5201 labels: IndexLabel | None = None,
(...)
5208 errors: IgnoreRaise = "raise",
5209 ) -> DataFrame | None:
5210 """
5211 Drop specified labels from rows or columns.
5212
(...)
5345 weight 1.0 0.8
5346 """
-> 5347 return super().drop(
5348 labels=labels,
5349 axis=axis,
5350 index=index,
5351 columns=columns,
5352 level=level,
5353 inplace=inplace,
5354 errors=errors,
5355 )

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:4711, in NDFrame.drop(self, labels, axis, index, columns, level, inplace, errors)
4709 for axis, labels in axes.items():
4710 if labels is not None:
-> 4711 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
4713 if inplace:
4714 self._update_inplace(obj)

File ~\anaconda3\Lib\site-packages\pandas\core\generic.py:4753, in NDFrame._drop_axis(self, labels, axis, level, errors, only_slice)
4751 new_axis = axis.drop(labels, level=level, errors=errors)
4752 else:
-> 4753 new_axis = axis.drop(labels, errors=errors)
4754 indexer = axis.get_indexer(new_axis)
4756 # Case for non-unique axis
4757 else:

File ~\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:6992, in Index.drop(self, labels, errors)
6990 if mask.any():
6991 if errors != "ignore":
-> 6992 raise KeyError(f"{labels[mask].tolist()} not found in axis")
6993 indexer = indexer[~mask]
6994 return self.delete(indexer)

KeyError: '[1] not found in axis'
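
The KeyError arises because `drop` only accepts labels that exist on the chosen axis. A short sketch on a stand-in frame with the same kind of mixed column labels (the frame itself is made up for illustration) that inspects the labels first and shows the `errors="ignore"` option:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [3, 4], 0: [5, 6]})

# Check which labels actually exist before dropping.
print(list(frame.columns))

# Dropping a label that exists works as expected.
without_zero = frame.drop(columns=[0])

# A missing label raises KeyError unless errors="ignore" is passed.
try:
    frame.drop(columns=[1])
    raised = False
except KeyError:
    raised = True
unchanged = frame.drop(columns=[1], errors="ignore")

print(list(without_zero.columns), raised, list(unchanged.columns))
```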

In [51]: newdf.head()

Out[51]: A B C D E 0

0 5498 0.483302 654.000000 0.109495 0.346556 654.0

1 654541 0.358511 0.836136 0.389201 0.662256 NaN

2 0.351126 0.453518 0.532963 0.806051 0.880142 NaN

3 0.808912 0.194086 0.244100 0.224745 0.603455 NaN

4 0.121119 0.840377 0.933503 0.332410 0.579510 NaN

In [52]: newdf.loc[[1,2],['C','D']]

Out[52]: C D

1 0.836136 0.389201

2 0.532963 0.806051

In [53]: newdf.head()

Out[53]: A B C D E 0

0 5498 0.483302 654.000000 0.109495 0.346556 654.0

1 654541 0.358511 0.836136 0.389201 0.662256 NaN

2 0.351126 0.453518 0.532963 0.806051 0.880142 NaN

3 0.808912 0.194086 0.244100 0.224745 0.603455 NaN

4 0.121119 0.840377 0.933503 0.332410 0.579510 NaN

In [54]: newdf.loc[[1,2],:]

Out[54]: A B C D E 0

1 654541 0.358511 0.836136 0.389201 0.662256 NaN

2 0.351126 0.453518 0.532963 0.806051 0.880142 NaN

In [55]: newdf.loc[:,['C','D']]

Out[55]: C D

0 654.000000 0.109495

1 0.836136 0.389201

2 0.532963 0.806051

3 0.244100 0.224745

4 0.933503 0.332410

... ... ...

329 0.450999 0.568085

330 0.751411 0.833227

331 0.301843 0.334510

332 0.245899 0.987381

333 0.192527 0.381623

334 rows × 2 columns

In [56]: newdf.loc[(newdf['A']<0.3)]

Out[56]: A B C D E 0

4 0.121119 0.840377 0.933503 0.332410 0.579510 NaN

7 0.013301 0.546343 0.785860 0.580119 0.033478 NaN

10 0.135702 0.660754 0.382900 0.996195 0.144280 NaN

11 0.119561 0.370444 0.343563 0.792946 0.889031 NaN

17 0.264975 0.796818 0.150061 0.508361 0.895146 NaN

... ... ... ... ... ... ...

322 0.158642 0.768554 0.455983 0.236494 0.321771 NaN

323 0.024951 0.461243 0.380886 0.816249 0.067329 NaN

327 0.2804 0.002557 0.094892 0.759649 0.311843 NaN

330 0.168163 0.853079 0.751411 0.833227 0.176438 NaN

331 0.283312 0.804183 0.301843 0.334510 0.094157 NaN

92 rows × 6 columns

In [57]: newdf.loc[(newdf['A']<0.3) & newdf['C']>0.1] # & binds tighter than >, so C is not filtered on its own (row 327 below has C < 0.1); write (newdf['C']>0.1) to test both columns

Out[57]: A B C D E 0

4 0.121119 0.840377 0.933503 0.332410 0.579510 NaN

7 0.013301 0.546343 0.785860 0.580119 0.033478 NaN

10 0.135702 0.660754 0.382900 0.996195 0.144280 NaN

11 0.119561 0.370444 0.343563 0.792946 0.889031 NaN

17 0.264975 0.796818 0.150061 0.508361 0.895146 NaN

... ... ... ... ... ... ...

322 0.158642 0.768554 0.455983 0.236494 0.321771 NaN

323 0.024951 0.461243 0.380886 0.816249 0.067329 NaN

327 0.2804 0.002557 0.094892 0.759649 0.311843 NaN

330 0.168163 0.853079 0.751411 0.833227 0.176438 NaN

331 0.283312 0.804183 0.301843 0.334510 0.094157 NaN

92 rows × 6 columns
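
Since `&` has higher precedence than comparison operators in Python, each condition in a combined boolean mask needs its own parentheses. A minimal sketch of the intended filter on a three-row stand-in frame (the values are invented for illustration):

```python
import pandas as pd

frame = pd.DataFrame({"A": [0.1, 0.2, 0.9], "C": [0.05, 0.5, 0.7]})

# Parenthesise every comparison before combining with & (and), | (or) or ~ (not).
mask = (frame["A"] < 0.3) & (frame["C"] > 0.1)
selected = frame.loc[mask]

print(selected)
```

Only the middle row satisfies both conditions: the first fails on `C` and the last fails on `A`.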

In [58]: newdf.head(2)

Out[58]: A B C D E 0

0 5498 0.483302 654.000000 0.109495 0.346556 654.0

1 654541 0.358511 0.836136 0.389201 0.662256 NaN

In [59]: newdf.iloc[0,4]

Out[59]: 0.346555717762674

In [60]: newdf.iloc[[0,5],[1,2]]

Out[60]: B C

0 0.483302 654.000000

5 0.332581 0.743576

In [61]: newdf.head(3)

Out[61]: A B C D E 0

0 5498 0.483302 654.000000 0.109495 0.346556 654.0

1 654541 0.358511 0.836136 0.389201 0.662256 NaN

2 0.351126 0.453518 0.532963 0.806051 0.880142 NaN

In [62]: newdf.drop([0])

Out[62]: A B C D E 0

1 654541 0.358511 0.836136 0.389201 0.662256 NaN

2 0.351126 0.453518 0.532963 0.806051 0.880142 NaN

3 0.808912 0.194086 0.244100 0.224745 0.603455 NaN

4 0.121119 0.840377 0.933503 0.332410 0.579510 NaN

5 0.541671 0.332581 0.743576 0.498823 0.498658 NaN

... ... ... ... ... ... ...

329 0.759106 0.047294 0.450999 0.568085 0.224133 NaN

330 0.168163 0.853079 0.751411 0.833227 0.176438 NaN

331 0.283312 0.804183 0.301843 0.334510 0.094157 NaN

332 0.654359 0.557119 0.245899 0.987381 0.719237 NaN

333 0.664317 0.725230 0.192527 0.381623 0.485436 NaN

333 rows × 6 columns

In [63]: newdf.head(2)

Out[63]: A B C D E 0

0 5498 0.483302 654.000000 0.109495 0.346556 654.0

1 654541 0.358511 0.836136 0.389201 0.662256 NaN

In [64]: newdf.iloc[0,4]

Out[64]: 0.346555717762674

In [65]: newdf.iloc[[0,1],[1,2]]

Out[65]: B C

0 0.483302 654.000000

1 0.358511 0.836136

In [66]: newdf.head(3)

Out[66]: A B C D E 0

0 5498 0.483302 654.000000 0.109495 0.346556 654.0

1 654541 0.358511 0.836136 0.389201 0.662256 NaN

2 0.351126 0.453518 0.532963 0.806051 0.880142 NaN

In [67]: newdf.drop([0])

Out[67]: A B C D E 0

1 654541 0.358511 0.836136 0.389201 0.662256 NaN

2 0.351126 0.453518 0.532963 0.806051 0.880142 NaN

3 0.808912 0.194086 0.244100 0.224745 0.603455 NaN

4 0.121119 0.840377 0.933503 0.332410 0.579510 NaN

5 0.541671 0.332581 0.743576 0.498823 0.498658 NaN

... ... ... ... ... ... ...

329 0.759106 0.047294 0.450999 0.568085 0.224133 NaN

330 0.168163 0.853079 0.751411 0.833227 0.176438 NaN

331 0.283312 0.804183 0.301843 0.334510 0.094157 NaN

332 0.654359 0.557119 0.245899 0.987381 0.719237 NaN

333 0.664317 0.725230 0.192527 0.381623 0.485436 NaN

333 rows × 6 columns

In [69]: newdf.drop(['A','C'],axis=1) # returns a new DataFrame; newdf is not affected

Out[69]: B D E 0

0 0.483302 0.109495 0.346556 654.0

1 0.358511 0.389201 0.662256 NaN

2 0.453518 0.806051 0.880142 NaN

3 0.194086 0.224745 0.603455 NaN

4 0.840377 0.332410 0.579510 NaN

... ... ... ... ...

329 0.047294 0.568085 0.224133 NaN

330 0.853079 0.833227 0.176438 NaN

331 0.804183 0.334510 0.094157 NaN

332 0.557119 0.987381 0.719237 NaN

333 0.725230 0.381623 0.485436 NaN

334 rows × 4 columns

In [74]: newdf.drop([1,5], axis=0, inplace=True) # inplace=True removes rows 1 and 5 from newdf itself; the call returns None

In [75]: newdf.head(3)

Out[75]: A B C D E 0

0 5498 0.483302 654.000000 0.109495 0.346556 654.0

2 0.808912 0.194086 0.244100 0.224745 0.603455 NaN

3 0.121119 0.840377 0.933503 0.332410 0.579510 NaN

In [76]: newdf.reset_index(drop=True, inplace=True)

In [77]: newdf.head(3)

Out[77]: A B C D E 0

0 5498 0.483302 654.000000 0.109495 0.346556 654.0

1 0.808912 0.194086 0.244100 0.224745 0.603455 NaN

2 0.121119 0.840377 0.933503 0.332410 0.579510 NaN
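
Once rows have been dropped, row labels and row positions no longer coincide, and that is where `.loc` (labels) and `.iloc` (positions) diverge. A small sketch, with a made-up frame, of that difference and of what `reset_index(drop=True)` restores:

```python
import pandas as pd

frame = pd.DataFrame({"B": [10, 20, 30, 40]})

trimmed = frame.drop([1])                # remaining labels: 0, 2, 3
by_position = trimmed.iloc[1, 0]         # second remaining row
by_label = trimmed.loc[2, "B"]           # row whose label is 2

# reset_index(drop=True) renumbers the rows 0..n-1 and discards the old labels.
renumbered = trimmed.reset_index(drop=True)

print(by_position, by_label, list(renumbered.index))
```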

In [78]: newdf.loc[:, ['B']]= 5

In [80]: newdf.head()

Out[80]: A B C D E 0

0 5498 5.0 654.000000 0.109495 0.346556 654.0

1 0.808912 5.0 0.244100 0.224745 0.603455 NaN

2 0.121119 5.0 0.933503 0.332410 0.579510 NaN

3 0.810778 5.0 0.173255 0.027296 0.963489 NaN

4 0.970215 5.0 0.456049 0.222950 0.524955 NaN


NUMPY
In [81]: import numpy as np

In [91]: myarr = np.array([[14,6,32,7]], np.int8)

myarr

# The second argument sets the dtype: np.int8 stores each element as an 8-bit integer; wider types such as 32-bit are also available

Out[91]: array([[14, 6, 32, 7]], dtype=int8)

In [92]: myarr.shape

Out[92]: (1, 4)

In [93]: myarr.dtype

Out[93]: dtype('int8')

In [94]: myarr[0,1]

Out[94]: 6

In [95]: myarr[0,1] =45

myarr

Out[95]: array([[14, 45, 32, 7]], dtype=int8)
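
The dtype determines both the range of values an element can hold and the memory each element takes. A brief sketch using `np.iinfo` to inspect the `int8` limits and `astype` to widen the array; the variable names are illustrative:

```python
import numpy as np

# Smallest and largest values an 8-bit signed integer can represent.
limits = np.iinfo(np.int8)
print(limits.min, limits.max)

small = np.array([14, 45, 32, 7], dtype=np.int8)
wide = small.astype(np.int32)  # same values, 4 bytes per element

print(small.itemsize, small.nbytes)
print(wide.itemsize, wide.nbytes)
```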

Array creation: conversion from other Python structures


In [96]: listarry = np.array([[1,2,3],[8,6,4],[2,6,7]])

In [97]: listarry

Out[97]: array([[1, 2, 3],
       [8, 6, 4],
       [2, 6, 7]])

In [99]: listarry.shape

Out[99]: (3, 3)

In [100]: listarry.size

Out[100]: 9

In [102]: zeros = np.zeros((2,5))

In [103]: zeros

Out[103]: array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [105]: rng = np.arange(15)

rng

Out[105]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [109]: ispace = np.linspace(1,5,9)

ispace

Out[109]: array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
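
`np.arange` is driven by a step size and excludes the stop value, whereas `np.linspace` is driven by the number of points and includes the stop value. A short sketch of both; `retstep=True` makes `linspace` also return the spacing it used:

```python
import numpy as np

# arange(start, stop, step): stop is excluded.
by_step = np.arange(0, 15, 5)

# linspace(start, stop, num): stop is included; retstep=True also returns the spacing.
values, step = np.linspace(1, 5, 9, retstep=True)

print(by_step)
print(values, step)
```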

In [112]: emp = np.empty((4,6))

emp

Out[112]: array([[6.23042070e-307, 1.86918699e-306, 1.69121096e-306,
        1.33511562e-306, 7.56587585e-307, 1.12503450e-311],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000, 0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
                    nan, 0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000, 8.34451715e-308, 2.22507386e-306]])

In [114]: emp_like = np.empty_like(ispace)

emp_like

Out[114]: array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
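
`np.empty` and `np.empty_like` allocate memory without initialising it, so the values printed above are whatever happened to be in memory and should not be relied on. A hedged sketch of the usual pattern, which is to fill the array before reading it, or to use `np.zeros_like` when defined values are needed:

```python
import numpy as np

scratch = np.empty((4, 6))      # contents are arbitrary until written
scratch[:] = np.arange(6)       # broadcast 0..5 into every row before reading

blank = np.zeros_like(scratch)  # same shape and dtype, guaranteed zeros

print(scratch[0], blank.sum())
```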

In [116]: ide = np.identity(45)

ide

Out[116]: array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])

In [117]: ide.shape

Out[117]: (45, 45)

In [119]: arr = np.arange(99)

arr

Out[119]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98])

In [120]: arr.reshape(3,33)

Out[120]: array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65],
[66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97,
98]])

In [121]: arr.reshape(3,31) # 3 * 31 = 93 elements, but arr holds 99, so this raises ValueError

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[121], line 1
----> 1 arr.reshape(3,31)

ValueError: cannot reshape array of size 99 into shape (3,31)
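
A reshape succeeds only when the product of the new dimensions equals the number of elements. Passing `-1` for one dimension asks NumPy to work that dimension out, as this short sketch shows:

```python
import numpy as np

arr = np.arange(99)

# -1 lets NumPy infer the missing dimension from the array size.
grid = arr.reshape(3, -1)
tall = arr.reshape(-1, 11)
print(grid.shape, tall.shape)

# A shape whose product is not 99 still fails.
try:
    arr.reshape(3, 31)
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```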

In [122]: arr.ravel()

Out[122]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98])

In [123]: x = [[1,2,3],[4,5,6],[7,1,0]]

In [126]: ar = np.array(x)
ar

Out[126]: array([[1, 2, 3],
       [4, 5, 6],
       [7, 1, 0]])

In [127]: ar.sum(axis=0)

Out[127]: array([12,  8,  9])

In [128]: ar.sum(axis=1)

Out[128]: array([ 6, 15,  8])
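
The `axis` argument names the dimension that is collapsed: `axis=0` sums down the rows and leaves one total per column, while `axis=1` sums across the columns and leaves one total per row. A small sketch on the same array; `keepdims=True` is shown as an optional extra that keeps the result two-dimensional:

```python
import numpy as np

ar = np.array([[1, 2, 3], [4, 5, 6], [7, 1, 0]])

column_totals = ar.sum(axis=0)                 # one value per column
row_totals = ar.sum(axis=1)                    # one value per row
row_totals_2d = ar.sum(axis=1, keepdims=True)  # shape (3, 1) instead of (3,)

print(column_totals, row_totals, row_totals_2d.shape)
```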

In [130]: ar.T

Out[130]: array([[1, 4, 7],
       [2, 5, 1],
       [3, 6, 0]])

In [131]: ar.flat

Out[131]: <numpy.flatiter at 0x21230427be0>

In [132]: for item in ar.flat:
    print(item)

1
2
3
4
5
6
7
1
0

In [134]: ar.ndim # No. of dimensions

Out[134]: 2

In [135]: ar.size

Out[135]: 9

In [136]: ar.nbytes

Out[136]: 36

In [137]: one = np.array([1,3,4,634,2])

In [140]: one.argmax() # Returns index

Out[140]: 3

In [142]: one.argmin()

Out[142]: 0

In [143]: one.argsort()

Out[143]: array([0, 4, 1, 2, 3], dtype=int64)

In [144]: ar

Out[144]: array([[1, 2, 3],
       [4, 5, 6],
       [7, 1, 0]])

In [146]: ar.argmin()

Out[146]: 8

In [147]: ar.argmax(axis=0)

Out[147]: array([2, 1, 1], dtype=int64)

In [148]: ar.argmax(axis=1)

Out[148]: array([2, 2, 0], dtype=int64)

In [149]: ar.argsort(axis=0)

Out[149]: array([[0, 2, 2],
       [1, 0, 0],
       [2, 1, 1]], dtype=int64)
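
The `arg*` methods return positions, not values. Two common follow-ups are sketched below on the same arrays: indexing with the `argsort` result to get the sorted values, and converting the flat index from `argmin` on a 2-D array into a (row, column) pair with `np.unravel_index`:

```python
import numpy as np

one = np.array([1, 3, 4, 634, 2])
order = one.argsort()          # positions that would sort the array
sorted_values = one[order]     # apply those positions to get the sorted values

ar = np.array([[1, 2, 3], [4, 5, 6], [7, 1, 0]])
flat_position = ar.argmin()    # index into the flattened array
row, col = np.unravel_index(flat_position, ar.shape)

print(sorted_values, flat_position, (row, col))
```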

In [150]: ar.ravel()

Out[150]: array([1, 2, 3, 4, 5, 6, 7, 1, 0])

In [151]: ar.reshape((9,1))

Out[151]: array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [1],
       [0]])

In [152]: ar

Out[152]: array([[1, 2, 3],
       [4, 5, 6],
       [7, 1, 0]])

In [157]: ar2 = np.array([[1,2,1],[8,5,12],[4,0,6]])

ar2

Out[157]: array([[ 1,  2,  1],
       [ 8,  5, 12],
       [ 4,  0,  6]])

In [156]: ar + ar2

Out[156]: array([[ 2,  4,  4],
       [12, 10, 18],
       [11,  1,  6]])

In [158]: ar * ar2

Out[158]: array([[ 1,  4,  3],
       [32, 25, 72],
       [28,  0,  0]])
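
Note that `*` multiplies element by element; it is not matrix multiplication. For the matrix product NumPy provides the `@` operator, as in this short comparison on the same two arrays:

```python
import numpy as np

ar = np.array([[1, 2, 3], [4, 5, 6], [7, 1, 0]])
ar2 = np.array([[1, 2, 1], [8, 5, 12], [4, 0, 6]])

elementwise = ar * ar2   # multiplies matching positions
product = ar @ ar2       # rows of ar times columns of ar2

print(elementwise)
print(product)
```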

In [159]: np.sqrt(ar)

Out[159]: array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974],
       [2.64575131, 1.        , 0.        ]])

In [160]: ar.sum()

Out[160]: 29

In [161]: ar.max()

Out[161]: 7

In [162]: ar.min()

Out[162]: 0

In [163]: ar

Out[163]: array([[1, 2, 3],
       [4, 5, 6],
       [7, 1, 0]])

In [164]: np.where(ar>5)

Out[164]: (array([1, 2], dtype=int64), array([2, 0], dtype=int64))
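
With a single argument, `np.where` returns one index array per dimension; here the matches are at (row 1, column 2) and (row 2, column 0). A sketch of how those indices can be used, plus the three-argument form that selects values instead of positions:

```python
import numpy as np

ar = np.array([[1, 2, 3], [4, 5, 6], [7, 1, 0]])

rows, cols = np.where(ar > 5)
values = ar[rows, cols]              # the matching elements themselves
pairs = np.argwhere(ar > 5)          # the same positions as (row, col) pairs

# Three-argument form: take 5 where the condition holds, else the original value.
capped = np.where(ar > 5, 5, ar)

print(values, pairs.tolist())
print(capped)
```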

In [165]: np.count_nonzero(ar)

Out[165]: 8

In [166]: np.nonzero(ar)

Out[166]: (array([0, 0, 0, 1, 1, 1, 2, 2], dtype=int64),
 array([0, 1, 2, 0, 1, 2, 0, 1], dtype=int64))

In [167]: ar[1,2] = 0

In [168]: np.nonzero(ar)

Out[168]: (array([0, 0, 0, 1, 1, 2, 2], dtype=int64),
 array([0, 1, 2, 0, 1, 0, 1], dtype=int64))

In [169]: import sys

In [170]: py_ar = [0,4,55,2]

In [171]: np_ar = np.array(py_ar)

In [172]: sys.getsizeof(1)*len(py_ar)

Out[172]: 112

In [174]: np_ar.itemsize * np_ar.size

Out[174]: 16

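The two results above show that NumPy saves space: the four integers take 16 bytes as array data, against 112 bytes as four separate Python int objects.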
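
A slightly fuller comparison, sketched under the assumption of 4-byte integers (`dtype=np.int32` is set explicitly here so the figures do not depend on the platform). It also counts the list container itself and repeats the measurement on a larger sequence, where the per-array overhead matters less:

```python
import sys
import numpy as np

py_ar = [0, 4, 55, 2]
np_ar = np.array(py_ar, dtype=np.int32)

# A list stores references; each int is a separate Python object.
list_bytes = sys.getsizeof(py_ar) + sum(sys.getsizeof(item) for item in py_ar)
array_data_bytes = np_ar.nbytes  # raw element storage only

# The same comparison on 1000 elements, including the array object's own overhead.
big_list = list(range(1000))
big_array = np.arange(1000, dtype=np.int32)
big_list_bytes = sys.getsizeof(big_list) + sum(sys.getsizeof(item) for item in big_list)
big_array_bytes = sys.getsizeof(big_array)

print(list_bytes, array_data_bytes)
print(big_list_bytes, big_array_bytes)
```

The exact list figures vary between Python versions, so only the array sizes are fixed by the dtype.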

In [ ]:
