AD3301 - Numpy - and - Pandas - Ipynb - Colaboratory

The document discusses NumPy arrays and broadcasting in Python. It introduces how to create 1D, 2D and 3D NumPy arrays. It shows various ways of initializing arrays using NumPy functions like ones, zeros, empty, full, arange and linspace. It demonstrates inspecting array attributes like shape, size, itemsize and nbytes. It explains broadcasting rules that allow arithmetic operations between arrays of different dimensions. In particular, it shows how broadcasting pads arrays with fewer dimensions and stretches arrays of shape 1 to match other array shapes.

Uploaded by

palaniappan.cse

9/7/22, 10:21 AM AD3301 - Numpy_and_Pandas.ipynb - Colaboratory

Python Basic operations

from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"

Numpy Array Basics

Importing NumPy and creating different types of NumPy arrays

# importing numpy
import numpy as np

# Defining 1D array
my1DArray = np.array([1, 8, 27, 64])
print(my1DArray)

"""
# Defining and printing 2D array
my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
print(my2DArray)

#Defining and printing 3D array

my3Darray = np.array([[[ 1, 2 , 3 , 4],[ 5 , 6 , 7 ,8]], [[ 1, 2, 3, 4],[ 9, 10, 11, 1
print(my3Darray)
"""

[ 1 8 27 64]
'\n# Defining and printing 2D array\nmy2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16
and printing 3D array\nmy3Darray = np.array([[[ 1, 2 , 3 , 4],[ 5 , 6 , 7 ,8]], [[ 1
y)\n'

# Defining and printing 2D array

my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
print(my2DArray)

[[ 1 2 3 4]
[ 2 4 9 16]
[ 4 8 18 32]]

Array using numpy built-in functions

To create an array using built-in NumPy functions, we can use the following code:

# Array of ones
print ("Arrays containing Ones\n")
ones = np.ones((2,2),int)
print(ones)
print ("\n\nArrays containing Zeros\n")

# Array of zeros
zeros = np.zeros((2,10),int)
print(zeros)

Arrays containing Ones

[[1 1]
[1 1]]

Arrays containing Zeros

[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]

# Array with random values

print(np.random.randint(1,10,(3,5)))
print(np.random.random((3,5)))

[[5 9 1 9 4]
[8 6 5 3 8]
[6 7 3 8 4]]
[[0.6253672 0.36754516 0.51695366 0.4776632 0.98696659]
[0.32946382 0.10104098 0.95875064 0.63990203 0.50221926]
[0.39447164 0.78698187 0.41467121 0.18306899 0.59477965]]

# Empty array
emptyArray = np.empty((3,2))
print(emptyArray)

[[2.1018035e-316 0.0000000e+000]
[0.0000000e+000 0.0000000e+000]
[0.0000000e+000 0.0000000e+000]]

# Full array
fullArray = np.full((2,2),np.pi)
print(fullArray)

# Array of evenly-spaced values

evenSpacedArray = np.arange(10,25,4)
#evenSpacedArray = np.arange(12).reshape(3,4)
print(evenSpacedArray)

# Array of evenly-spaced values

evenSpacedArray2 = np.linspace(0,2,10)
print(evenSpacedArray2)

[[3.14159265 3.14159265]
[3.14159265 3.14159265]]
[10 14 18 22]
[0. 0.22222222 0.44444444 0.66666667 0.88888889 1.11111111
1.33333333 1.55555556 1.77777778 2. ]

arange allows you to define the size of the step; linspace allows you to define the number of evenly spaced samples.
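The difference between the two functions can be sketched side by side. This is only an illustrative example: the variable names `by_step`, `by_count` and `spacing` are assumptions and do not appear in the notebook.

```python
import numpy as np

# arange: we fix the step; the stop value (25) itself is excluded
by_step = np.arange(10, 25, 4)

# linspace: we fix the number of samples; both endpoints are included by default.
# retstep=True also returns the spacing that NumPy computed for us.
by_count, spacing = np.linspace(0, 2, 10, retstep=True)

print(by_step)         # [10 14 18 22]
print(by_count.size)   # 10
print(spacing)         # the computed spacing, (2 - 0) / 9
```

With arange the number of elements follows from the step, whereas with linspace the step follows from the number of elements.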

Inspecting Numpy Arrays

NumPy Array Attributes

ndarray.flags: information about the memory layout of the array.

'''
# Print the number of `my2DArray`'s dimensions
print(my2DArray.ndim)

# Print the number of `my2DArray`'s elements

print(my2DArray.size)

# Print information about `my2DArray`'s memory layout

print(my2DArray.flags)
'''

# Print the length of one array element in bytes

print(zeros.itemsize)

#itemsize returns the size (in bytes) of each element of a NumPy array

ch = np.array([['a','b','c'],['d','e','f']])
print(ch)
print(ch.itemsize)

# Print the total bytes consumed by the elements of `ch`

print(ch.nbytes)

8
[['a' 'b' 'c']
['d' 'e' 'f']]
4
24
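The three attributes printed above are related: the total byte count should equal the number of elements times the size of one element. A minimal sketch that rebuilds the same two arrays and checks this relation (the loop and the name `consistent` are illustrative, not part of the notebook):

```python
import numpy as np

zeros = np.zeros((2, 10), int)
ch = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])

# nbytes should equal size * itemsize for any array
consistent = [arr.nbytes == arr.size * arr.itemsize for arr in (zeros, ch)]
print(consistent)

# One-character Unicode strings take 4 bytes each, so 6 elements take 24 bytes
print(ch.dtype, ch.itemsize, ch.size, ch.nbytes)
```

The integer item size depends on the platform's default integer type, which is why the sketch checks the relation rather than a fixed number for `zeros`.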

Broadcasting in NumPy
Broadcasting is a mechanism that permits NumPy to operate with arrays of different shapes
when performing arithmetic operations.

If the shapes of two arrays differ, a direct element-by-element operation is not possible.
However, operations on arrays of different shapes are still possible in NumPy because of the
broadcasting capability, provided the shapes are compatible.

a = np.array([1,2,3,4])
print(a.shape)
b = np.array([10,20,30,40,50])  # five elements, so a + b cannot be broadcast
print(b.shape)
c = a + b
print(c.shape)
print (c)

(4,)
(5,)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-161c6451a913> in <module>
3 b = np.array([10,20,30,40,50])
4 print(b.shape)
----> 5 c = a + b
6 print(c.shape)
7 print (c)

ValueError: operands could not be broadcast together with shapes (4,) (5,)

a = np.array([1,2,3,4])
print(a.shape)
b = 5

#b = np.array([5])

c = a + b

print(c.shape)
print (c)

(4,)
(4,)
[6 7 8 9]

We can think of this as an operation that stretches or duplicates the value 5 into the array [5, 5,
5, 5], and adds the results. The advantage of NumPy's broadcasting is that this duplication of
values does not actually take place, but it is a useful mental model as we think about
broadcasting.
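One way to make this mental model visible is `np.broadcast_to`, which returns a read-only view with the stretched shape. The sketch below is only an illustration; the name `stretched` is an assumption. A stride of 0 means every position reads the same stored value, so nothing is duplicated in memory.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# A read-only view of the scalar 5 that looks like [5, 5, 5, 5]
stretched = np.broadcast_to(5, a.shape)

print(stretched)           # [5 5 5 5]
print(stretched.strides)   # (0,) : every element refers to the same memory
print(a + stretched)       # [6 7 8 9], the same result as a + 5
```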

We can similarly extend this to arrays of higher dimension. Observe the result when we add a
one-dimensional array to a two-dimensional array:
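A small sketch of adding a one-dimensional array to a two-dimensional one. The arrays `M` and `a` are illustrative choices, not taken from the notebook.

```python
import numpy as np

M = np.ones((3, 3))   # two-dimensional, shape (3, 3)
a = np.arange(3)      # one-dimensional, shape (3,)

# `a` is treated as a single row and applied to every row of `M`
result = M + a
print(result.shape)   # (3, 3)
print(result)
```

Each row of the result is `[1., 2., 3.]`, because the one-dimensional array is broadcast across the first dimension of `M`.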

Broadcasting is an operation of matching the dimensions of differently shaped arrays in order to
be able to perform further operations on those arrays (e.g., per-element arithmetic).

Rule 1: Two dimensions are compatible if they are equal

Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer
dimensions is padded with ones on its leading (left) side.
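The padding step can be written out explicitly. In this sketch the names `a`, `M` and `padded` are illustrative, and `np.broadcast_shapes` is assumed to be available (it was added in NumPy 1.20).

```python
import numpy as np

a = np.arange(3)        # shape (3,)
M = np.ones((2, 3))     # shape (2, 3)

# Padding on the leading (left) side by hand: (3,) becomes (1, 3)
padded = a[np.newaxis, :]
print(padded.shape)     # (1, 3)

# NumPy reports the shape both operands are broadcast to
common = np.broadcast_shapes(a.shape, M.shape)
print(common)           # (2, 3)

# The explicit padding and the automatic rule give the same result
same = np.array_equal(M + a, M + padded)
print(same)             # True
```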

# Create an array of two dimension

A =np.ones((6, 8))

# Shape of A
print(A.shape)

# Create another array

B = np.random.random((6,8))

# Shape of B
print(B.shape)

# Sum of A and B; here both matrices have the same shape.

print(A + B)

(6, 8)
(6, 8)
[[1.02235338 1.07426222 1.07796564 1.92165778 1.69896669 1.27116382
1.87724144 1.3361346 ]
[1.44088172 1.67502615 1.07147457 1.79199182 1.14112224 1.36919479
1.80727734 1.01625392]
[1.65668092 1.84113539 1.54138284 1.23198665 1.31228451 1.59306397
1.46382149 1.87437678]
[1.5552956 1.65484125 1.39042551 1.7380376 1.95208506 1.40201893
1.8552564 1.90180634]
[1.97609831 1.77141474 1.76127984 1.07468606 1.93739374 1.03612855
1.48747198 1.78455217]
[1.37723426 1.57121336 1.99178788 1.77909114 1.16735127 1.80668618
1.80688359 1.18839159]]

Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape
equal to 1 in that dimension is stretched to match the other shape.

Rule 2: Two dimensions are also compatible when one of them is 1
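A short sketch of the stretching rule with a dimension of size 1 on both operands. The names `col`, `row` and `table` are assumptions made for the illustration.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4).reshape(1, 4)   # shape (1, 4)

# The size-1 dimension of each operand is stretched to match the other one
table = col + row
print(table.shape)   # (3, 4)
print(table)
```

Here `table[i, j]` equals `i + j`, because the column is repeated across four columns and the row across three rows.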

# Initialize `x`
x = np.ones((2, 5))
print(x)

# Check shape of `x`

print(x.shape)

# Initialize `y`
y = np.arange(5)
print(y)

# Check shape of `y`

print(y.shape)

# Subtract `y` from `x`

x-y

[[1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1.]]
(2, 5)
[0 1 2 3 4]
(5,)
array([[ 1., 0., -1., -2., -3.],
[ 1., 0., -1., -2., -3.]])

# Rule 3: Arrays can be broadcast together if they are compatible in all dimensions

x = np.ones((1,2,8))
print("x = "+"\n", x)
print("shape of x = ", x.shape)
y = np.random.random((2, 1, 1))
print("y = "+"\n", y)

print("\nthe output = ",x + y)

# Analytical question

# The shapes x(1,1,4) and y(3,2,4) are different. However, it is possible
# to add them. Why is that? Also, changing x to shape (10,2,8) or y to shape (10,1,4)
# will give a ValueError. Can you find out why?

x =
[[[1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]]]
shape of x =  (1, 2, 8)
y =
[[[0.8065366 ]]

[[0.00414051]]]

the output = [[[1.8065366 1.8065366 1.8065366 1.8065366 1.8065366 1.8065366
1.8065366 1.8065366 ]
[1.8065366 1.8065366 1.8065366 1.8065366 1.8065366 1.8065366
1.8065366 1.8065366 ]]

[[1.00414051 1.00414051 1.00414051 1.00414051 1.00414051 1.00414051
1.00414051 1.00414051]
[1.00414051 1.00414051 1.00414051 1.00414051 1.00414051 1.00414051
1.00414051 1.00414051]]]
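One way to approach the analytical question is to compare the two shapes dimension by dimension, starting from the right. The helper `can_broadcast` below is a hypothetical function written for this illustration; it is not part of NumPy.

```python
from itertools import zip_longest

def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: True if the two shapes are broadcast-compatible."""
    # Walk both shapes from the trailing side; a missing leading dimension counts as 1
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        # A pair of dimensions is compatible if they are equal or one of them is 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True

print(can_broadcast((1, 2, 8), (2, 1, 1)))    # True: every pair is equal or contains a 1
print(can_broadcast((1, 1, 4), (3, 2, 4)))    # True
print(can_broadcast((10, 2, 8), (2, 1, 1)))   # False: 10 and 2 clash in the first dimension
print(can_broadcast((1, 2, 8), (10, 1, 4)))   # False: 8 and 4 clash in the last dimension
```

The two failing cases correspond to the shape changes that the question says produce a ValueError.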

Numpy and mathematics at work

x = np.array([[1, 2, 3], [2, 3, 4]])

y = np.array([[1, 4, 9], [2, 3, -2]])
print(x+y)

[[ 2 6 12]
[ 4 6 2]]

NumPy's ufuncs feel very natural to use because they make use of Python's native arithmetic
operators. The standard addition, subtraction, multiplication, and division can all be used:

The following table lists the arithmetic operators implemented in NumPy:

Operator  Equivalent ufunc  Description
+         np.add            Addition (e.g., 1 + 1 = 2)
-         np.subtract       Subtraction (e.g., 3 - 2 = 1)
-         np.negative       Unary negation (e.g., -2)
*         np.multiply       Multiplication (e.g., 2 * 3 = 6)
/         np.divide         Division (e.g., 3 / 2 = 1.5)
//        np.floor_divide   Floor division (e.g., 3 // 2 = 1)
**        np.power          Exponentiation (e.g., 2 ** 3 = 8)
%         np.mod            Modulus/remainder (e.g., 9 % 4 = 1)
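The operators that are not exercised elsewhere in the notebook (unary negation, floor division, exponentiation) can be checked against their ufuncs in the same way. This is a minimal sketch; the names `checks` and `halves` are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Each operator should give the same result as its equivalent ufunc
checks = {
    "negative": np.array_equal(-x, np.negative(x)),
    "floor_divide": np.array_equal(x // 2, np.floor_divide(x, 2)),
    "power": np.array_equal(x ** 2, np.power(x, 2)),
    "mod": np.array_equal(x % 3, np.mod(x, 3)),
}
print(checks)

# Unlike the operator, the ufunc form accepts extra arguments such as `out`,
# which writes the result into an existing array
halves = np.empty(4)
np.divide(x, 2, out=halves)
print(halves)
```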

# Basic operations (+, -, *, /, %)

x = np.array([[1, 2, 3], [2, 3, 4]])
y = np.array([[1, 4, 9], [2, 3, -2]])

# Add two arrays
add = np.add(x, y)
print(add)

# Subtract two arrays
sub = np.subtract(x, y)
print(sub)

# Multiply two arrays
mul = np.multiply(x, y)
print(mul)

# Divide x by y
div = np.divide(x,y)
print(div)

# Calculate the remainder of x divided by y
rem = np.mod(x, y)
print(rem)

[[ 2 6 12]
[ 4 6 2]]
[[ 0 -2 -6]
[ 0 0 6]]
[[ 1 8 27]
[ 4 9 -8]]
[[ 1. 0.5 0.33333333]
[ 1. 1. -2. ]]
[[0 2 3]
[0 0 0]]

Subset, Slice, And Index Arrays

x = np.array([10, 20, 30, 40, 50])

# Select items at index 0 and 1

print("# Select items at index 0 and 1")
print(x[0:2])

# Output column 1 of `y` (still the 2x3 array defined above)

print("#Output the Columns")
print(y[:,1])

# Output row 1 of `y`

print("#Output the Rows")
print(y[1,:])

# Select rows 0 to 2 of column 1 from a 2D array (the slice 0:3 covers three rows)

print('#Select item at row 0 and 1 and column 1 from 2D array')
y = np.array([[ 1, 2, 3, 4], [ 9, 10, 11 ,12],[13,14,15,16]])
print(y)
print(y[0:3, 1])

# Specifying conditions
print("# Specifying conditions")

biggerThan2 = (y >= 2)
print(y[biggerThan2])

# Select items at index 0 and 1

[10 20]
#Output the Columns
[4 3]
#Output the Rows
[ 2 3 -2]
#Select item at row 0 and 1 and column 1 from 2D array
[[ 1 2 3 4]
[ 9 10 11 12]
[13 14 15 16]]
[ 2 10 14]
# Specifying conditions
[ 2 3 4 9 10 11 12 13 14 15 16]

Pandas

# Importing pandas
import pandas as pd

Can you set default parameters in Pandas?

print("Pandas Version:", pd.__version__)

pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)

Pandas Version: 1.3.5
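Options set with `pd.set_option` stay in effect for the rest of the session. As a small sketch, they can be read back with `pd.get_option`, changed temporarily with `pd.option_context`, and restored with `pd.reset_option`. The variable names `inside` and `after` are illustrative.

```python
import pandas as pd

pd.set_option('display.max_rows', 100)
print(pd.get_option('display.max_rows'))   # 100

# Temporarily override the option inside a with-block only
with pd.option_context('display.max_rows', 5):
    inside = pd.get_option('display.max_rows')

after = pd.get_option('display.max_rows')
print(inside, after)   # 5 100

# Go back to the library default
pd.reset_option('display.max_rows')
```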

Data structure of pandas

Series
DataFrames

The Pandas Series Object

A Pandas Series is a one-dimensional array of indexed data. It can be created from a list or
array as follows:

series = pd.Series([2, 3, 7, 11, 13, 17, 19, 23])

print(series)

series1 = pd.Series([2, 3, 7, 11, 13, 17, 19, 23],index = [0,4,6,1,2,9,8,7])

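print(series1)
# Note: the slice 2:4 is positional (it returns the third and fourth rows), not a lookup of label 2
print("series1[2] = ",series1[2:4])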

series2 = pd.Series([2, 3, 7, 11, 13, 17, 19, 23],index = list('abcdefgh'))

print(series2)
print("series2['g'] = ",series2['g'])

0 2
1 3
2 7
3 11
4 13
5 17
6 19
7 23
dtype: int64
0 2
4 3
6 7
1 11
2 13
9 17
8 19
7 23
dtype: int64
series1[2] = 6 7
1 11
dtype: int64
a 2
b 3
c 7
d 11
e 13
f 17
g 19
h 23
dtype: int64
series2['g'] = 19
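Because `series1` uses integers as labels, it helps to be explicit about whether a lookup is by label or by position. A minimal sketch using `.loc` (label-based) and `.iloc` (position-based); the names `by_label`, `by_position` and `positional_slice` are illustrative.

```python
import pandas as pd

series1 = pd.Series([2, 3, 7, 11, 13, 17, 19, 23], index=[0, 4, 6, 1, 2, 9, 8, 7])

by_label = series1.loc[2]              # the value stored under label 2
by_position = series1.iloc[2]          # the value in the third row
positional_slice = series1.iloc[2:4]   # the third and fourth rows

print(by_label)      # 13
print(by_position)   # 7
print(positional_slice)
```

The positional slice contains the rows labelled 6 and 1, which matches the output printed for `series1[2:4]` above.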

series1.values
series1.keys()  # keys() is a method and must be called; it returns the same labels as .index
series1.index

array([ 2, 3, 7, 11, 13, 17, 19, 23])
Int64Index([0, 4, 6, 1, 2, 9, 8, 7], dtype='int64')
Int64Index([0, 4, 6, 1, 2, 9, 8, 7], dtype='int64')

The Pandas DataFrame Object

The next fundamental structure in Pandas is the DataFrame. Like the Series object, the
DataFrame can be thought of either as a generalization of a NumPy array, or as a specialization
of a Python dictionary.
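Both views can be sketched with a tiny DataFrame built from two Series that share an index. The names `area`, `code` and `frame` are illustrative; the values are two rows borrowed from the county table used later in this notebook.

```python
import pandas as pd

area = pd.Series({'Oslo': 454.07, 'Hedmark': 27397.76})
code = pd.Series({'Oslo': 3, 'Hedmark': 4})

# Dictionary view: each column name maps to a Series
frame = pd.DataFrame({'Area': area, 'ISO-Code': code})
print(frame['Area'])

# Array view: the same data as a two-dimensional NumPy array with row and column labels
print(frame.values)
print(frame.shape)   # (2, 2)
```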

# Creating a dataframe from a list of dictionaries

dict_df = [{'A': 'Apple', 'B': 'Ball'},{'A': 'Aeroplane', 'B': 'Bat', 'C': 'Cat'}]
dict_df = pd.DataFrame(dict_df)
print(dict_df)

A B C
0 Apple Ball NaN
1 Aeroplane Bat Cat

# Creating dataframe from Series

series_df = pd.DataFrame({
'A': range(1, 5),
'B': pd.Timestamp('20190526'),
'C': pd.Series(5, index=list(range(4)), dtype='float64'),
'D': np.array([3] * 4, dtype='int64'),
'E': pd.Categorical(["Depression", "Social Anxiety", "Bipolar Disorder", "Eating Disor
'F': 'Mental health',
'G': 'is challenging'
})
print(series_df)

A B C D E F G
0 1 2019-05-26 5.0 3 Depression Mental health is challenging
1 2 2019-05-26 5.0 3 Social Anxiety Mental health is challenging
2 3 2019-05-26 5.0 3 Bipolar Disorder Mental health is challenging
3 4 2019-05-26 5.0 3 Eating Disorder Mental health is challenging

# Creating a dataframe from a dictionary of lists

sdf = {
'County':['Østfold', 'Hordaland', 'Oslo', 'Hedmark', 'Oppland', 'Buskerud'],
'ISO-Code':[1,2,3,4,5,6],
'Area': [4180.69, 4917.94, 454.07, 27397.76, 25192.10, 14910.94],
'Administrative centre': ["Sarpsborg", "Oslo", "City of Oslo", "Hamar", "Lillehammer",
}
sdf = pd.DataFrame(sdf)
print(sdf)

      County  ISO-Code      Area Administrative centre
0    Østfold         1   4180.69             Sarpsborg
1  Hordaland         2   4917.94                  Oslo
2       Oslo         3    454.07          City of Oslo
3    Hedmark         4  27397.76                 Hamar
4    Oppland         5  25192.10           Lillehammer
5   Buskerud         6  14910.94               Drammen

Loading a dataset into Pandas DataFrame

import pandas as pd
columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
'marital_status', 'occupation', 'relationship', 'ethnicity', 'gender','capital_gain','
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.dat
df.head(10)

   age  workclass         fnlwgt  education  education_num  marital_status         occupation         rela
0  39   State-gov         77516   Bachelors  13             Never-married          Adm-clerical       N
1  50   Self-emp-not-inc  83311   Bachelors  13             Married-civ-spouse     Exec-managerial
2  38   Private           215646  HS-grad    9              Divorced               Handlers-cleaners  N
3  53   Private           234721  11th       7              Married-civ-spouse     Handlers-cleaners
4  28   Private           338409  Bachelors  13             Married-civ-spouse     Prof-specialty
5  37   Private           284582  Masters    14             Married-civ-spouse     Exec-managerial
6  49   Private           160187  9th        5              Married-spouse-absent  Other-service      N
7  52   Self-emp-not-inc  209642  HS-grad    9              Married-civ-spouse     Exec-managerial
8  31   Private           45781   Masters    14             Never-married          Prof-specialty     N
9  42   Private           159449  Bachelors  13             Married-civ-spouse     Exec-managerial

# Displays the rows, columns, data types and memory used by the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 32561 non-null int64

1 workclass 32561 non-null object
2 fnlwgt 32561 non-null int64
3 education 32561 non-null object
4 education_num 32561 non-null int64
5 marital_status 32561 non-null object
6 occupation 32561 non-null object
7 relationship 32561 non-null object
8 ethnicity 32561 non-null object
9 gender 32561 non-null object
10 capital_gain 32561 non-null int64
11 capital_loss 32561 non-null int64
12 hours_per_week 32561 non-null int64
13 country_of_origin 32561 non-null object
14 income 32561 non-null object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB

# Displays the number of rows (data points) and columns in the dataframe

df.shape

(32561, 15)

# Display all columns of the dataframe

df.columns

Index(['age', 'workclass', 'fnlwgt', 'education', 'education_num',

'marital_status', 'occupation', 'relationship', 'ethnicity', 'gender',
'capital_gain', 'capital_loss', 'hours_per_week', 'country_of_origin',
'income'],
dtype='object')

# Displays summary statistics for each numerical column in the dataframe

df.describe()

age fnlwgt education_num capital_gain capital_loss hours_p

count 32561.000000 3.256100e+04 32561.000000 32561.000000 32561.000000 32561

mean 38.581647 1.897784e+05 10.080679 1077.648844 87.303830 40

std 13.640433 1.055500e+05 2.572720 7385.292085 402.960219 12

min 17.000000 1.228500e+04 1.000000 0.000000 0.000000 1

25% 28.000000 1.178270e+05 9.000000 0.000000 0.000000 40

50% 37.000000 1.783560e+05 10.000000 0.000000 0.000000 40

75% 48.000000 2.370510e+05 12.000000 0.000000 0.000000 45

max 90.000000 1.484705e+06 16.000000 99999.000000 4356.000000 99

Selecting rows and columns in the dataframe

# Selects a row
df.iloc[10]

age 37
workclass Private
fnlwgt 280464
education Some-college
education_num 10
marital_status Married-civ-spouse
occupation Exec-managerial
relationship Husband
ethnicity Black
gender Male
capital_gain 0
capital_loss 0
hours_per_week 80
country_of_origin United-States
income >50K
Name: 10, dtype: object

# Selects 10 rows
df.iloc[0:10]

   age  workclass         fnlwgt  education  education_num  marital_status         occupation         rela
0  39   State-gov         77516   Bachelors  13             Never-married          Adm-clerical       N
1  50   Self-emp-not-inc  83311   Bachelors  13             Married-civ-spouse     Exec-managerial
2  38   Private           215646  HS-grad    9              Divorced               Handlers-cleaners  N
3  53   Private           234721  11th       7              Married-civ-spouse     Handlers-cleaners
4  28   Private           338409  Bachelors  13             Married-civ-spouse     Prof-specialty
5  37   Private           284582  Masters    14             Married-civ-spouse     Exec-managerial
6  49   Private           160187  9th        5              Married-spouse-absent  Other-service      N
7  52   Self-emp-not-inc  209642  HS-grad    9              Married-civ-spouse     Exec-managerial
8  31   Private           45781   Masters    14             Never-married          Prof-specialty     N
9  42   Private           159449  Bachelors  13             Married-civ-spouse     Exec-managerial

# Selects a range of rows
df.iloc[10:15]

    age  workclass  fnlwgt  education     education_num  marital_status      occupation       re
10  37   Private    280464  Some-college  10             Married-civ-spouse  Exec-managerial
11  30   State-gov  141297  Bachelors     13             Married-civ-spouse  Prof-specialty
12  23   Private    122272  Bachelors     13             Never-married       Adm-clerical     N
13  32   Private    205019  Assoc-acdm    12             Never-married       Sales            N
14  40   Private    121772  Assoc-voc     11             Married-civ-spouse  Craft-repair

# Selects the last 2 rows

df.iloc[-2:]

       age  workclass     fnlwgt  education  education_num  marital_status      occupation
32559  22   Private       201490  HS-grad    9              Never-married       Adm-clerical
32560  52   Self-emp-inc  287927  HS-grad    9              Married-civ-spouse  Exec-managerial

# Selects every other row, restricted to columns 3 and 4 (the slice 3:5 excludes column 5)

df.iloc[::2, 3:5].head()

   education  education_num
0  Bachelors  13
2  HS-grad    9
4  Bachelors  13
6  9th        5
8  Masters    14

Combine Pandas and Numpy

import pandas as pd
import numpy as np

np.random.seed(24)
df = pd.DataFrame({'F': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 5), columns=list('EDCBA'))],
axis=1)
df.iloc[::2, 3:5] = np.nan
df

F E D C B A

0 1.0 1.329212 -0.770033 NaN NaN -1.070816

1 2.0 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.0 0.678805 1.889273 NaN NaN -0.481165

3 4.0 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.0 -1.336936 0.562861 NaN NaN 0.121668

5 6.0 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.0 -0.385684 0.519818 NaN NaN 1.428984

7 8.0 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.0 1.264103 0.290035 NaN NaN 1.030550

9 10.0 0.118098 -0.021853 0.046841 -1.628753 -0.392361

# Define a function that should color the values that are less than 0
def colorNegativeValueToRed(value):
    if value < 0:
        color = 'red'
    elif value > 0:
        color = 'black'
    else:
        color = 'green'

    return 'color: %s' % color

s = df.style.applymap(colorNegativeValueToRed, subset=['A','B','C','D','E'])
s
F E D C B A

0 1.000000 1.329212 -0.770033 nan nan -1.070816

1 2.000000 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.000000 0.678805 1.889273 nan nan -0.481165

3 4.000000 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.000000 -1.336936 0.562861 nan nan 0.121668

5 6.000000 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.000000 -0.385684 0.519818 nan nan 1.428984

7 8.000000 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.000000 1.264103 0.290035 nan nan 1.030550

9 10.000000 0.118098 -0.021853 0.046841 -1.628753 -0.392361

# Highlight the max value in each column with an orange background and the min value with a green background
def highlightMax(s):
    isMax = s == s.max()
    return ['background-color: orange' if v else '' for v in isMax]

def highlightMin(s):
    isMin = s == s.min()
    return ['background-color: green' if v else '' for v in isMin]

# Note: `null_color` works in the pandas 1.3.5 used here; newer pandas releases use `color` instead
df.style.apply(highlightMax).apply(highlightMin).highlight_null(null_color='red')

F E D C B A

0 1.000000 1.329212 -0.770033 nan nan -1.070816

1 2.000000 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.000000 0.678805 1.889273 nan nan -0.481165

3 4.000000 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.000000 -1.336936 0.562861 nan nan 0.121668

5 6.000000 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.000000 -0.385684 0.519818 nan nan 1.428984

7 8.000000 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.000000 1.264103 0.290035 nan nan 1.030550

9 10.000000 0.118098 -0.021853 0.046841 -1.628753 -0.392361

import seaborn as sns

cm = sns.light_palette("pink", as_cmap=True)

s = df.style.background_gradient(cmap=cm)
s

F E D C B A

0 1.000000 1.329212 -0.770033 nan nan -1.070816

1 2.000000 -1.438713 0.564417 0.295722 -1.626404 0.219565

2 3.000000 0.678805 1.889273 nan nan -0.481165

3 4.000000 0.850229 1.453425 1.057737 0.165562 0.515018

4 5.000000 -1.336936 0.562861 nan nan 0.121668

5 6.000000 1.207603 -0.002040 1.627796 0.354493 1.037528

6 7.000000 -0.385684 0.519818 nan nan 1.428984

7 8.000000 -2.089354 -0.129820 0.631523 -0.586538 0.290720

8 9.000000 1.264103 0.290035 nan nan 1.030550

9 10.000000 0.118098 -0.021853 0.046841 -1.628753 -0.392361
