Python 101 - Python Libraries for Data Analysis - Numpy and Pandas

The document provides an introduction to Python libraries for data analysis, specifically focusing on NumPy and its capabilities for handling single and multi-dimensional arrays. It covers various tasks such as defining arrays, leveraging built-in methods, performing mathematical operations, and array slicing and indexing, along with mini challenges for practical application. Additionally, it briefly introduces Pandas as a data manipulation tool built on top of NumPy.

Uploaded by

ndiayemalickn638


Python 101 - Python Libraries for Data Analysis - Numpy and Pandas

May 29, 2025

1 TASK #1: DEFINE SINGLE AND MULTI-DIMENSIONAL NUMPY ARRAYS
[20]: # NumPy is a numerical computing library built around multidimensional arrays
# NumPy brings the best of two worlds: (1) C/Fortran computational efficiency,
# (2) the easy syntax of the Python language

import numpy as np

# Let's define a one-dimensional array

list_1 = [50, 60, 80, 100, 200, 300, 500, 600]
list_1

[20]: [50, 60, 80, 100, 200, 300, 500, 600]

[21]: # Let's create a NumPy array from the list "list_1"

my_numpy_array = np.array(list_1)
my_numpy_array

[21]: array([ 50, 60, 80, 100, 200, 300, 500, 600])

[5]: type(my_numpy_array)

[5]: numpy.ndarray

Multi-dimensional (Matrix definition)

[6]: my_matrix = np.array([[2, 5, 8], [7, 3, 6]])
my_matrix

[6]: array([[2, 5, 8],
            [7, 3, 6]])
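A quick way to confirm what was built is to inspect the array's attributes. The sketch below reuses the values from the cells above; `ndim`, `shape`, and `dtype` are standard ndarray attributes.

```python
import numpy as np

my_numpy_array = np.array([50, 60, 80, 100, 200, 300, 500, 600])
my_matrix = np.array([[2, 5, 8], [7, 3, 6]])

# ndim is the number of axes, shape is the size along each axis
print(my_numpy_array.ndim, my_numpy_array.shape)  # 1 (8,)
print(my_matrix.ndim, my_matrix.shape)            # 2 (2, 3)

# dtype is the element type shared by every entry (a platform-dependent integer here)
print(my_matrix.dtype)
```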

MINI CHALLENGE #1: - Write code that creates the following 2x4 NumPy array
[[3 7 9 3]
[4 3 2 2]]

[3]: x = np.array([[3, 7, 9, 3],
              [4, 3, 2, 2]])
x

[3]: array([[3, 7, 9, 3],
            [4, 3, 2, 2]])

[ ]:

2 TASK #2: LEVERAGE NUMPY BUILT-IN METHODS AND

FUNCTIONS
[8]: # "rand()" draws from a uniform distribution between 0 and 1; here it generates 20 random values
x = np.random.rand(20)
x

[8]: array([0.14056323, 0.53908128, 0.29549647, 0.03517011, 0.89102171,

0.05271959, 0.3741947 , 0.2051953 , 0.16712427, 0.65044685,
0.68705185, 0.26958268, 0.13184144, 0.36498677, 0.67224159,
0.42635753, 0.75119414, 0.82521819, 0.09219216, 0.85630017])

[9]: # You can create a matrix of random numbers as well

x = np.random.rand(3, 3)
x

[9]: array([[0.75189542, 0.8534315 , 0.58733699],

[0.10275586, 0.36892311, 0.54795311],
[0.55516788, 0.91208212, 0.45541749]])

[10]: # "randint" generates random integers from the lower bound (included) up to the upper bound (excluded)

x = np.random.randint(1,50)
x

[10]: 22

[11]: # "randint" can be used to generate a certain number of random integers as follows

x = np.random.randint(1, 100, 15)
x

[11]: array([77, 80, 61, 59, 73, 97, 19, 22, 82, 78, 49, 97, 75, 69, 84])
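The random cells above give different values on every run. When reproducible numbers are needed, one option is NumPy's `Generator` API, sketched below; the seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# Seeded generator: the same seed reproduces the same draws (seed 42 is arbitrary)
rng = np.random.default_rng(42)

uniform_draws = rng.random(5)                  # floats in [0, 1), like np.random.rand(5)
integer_draws = rng.integers(1, 100, size=15)  # ints in [1, 100), upper bound excluded

# Re-creating the generator with the same seed repeats the sequence
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform_draws, rng_again.random(5)))  # True
print(integer_draws)
```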

[12]: # np.arange creates evenly spaced values within a given interval (the end value is excluded)

x = np.arange(1, 50)
x

[12]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
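`np.arange` also accepts a step, and `np.linspace` is the usual alternative when the number of points matters more than the spacing. A minimal sketch:

```python
import numpy as np

every_fifth = np.arange(1, 50, 5)   # start, stop (excluded), step
print(every_fifth)                  # [ 1  6 11 16 21 26 31 36 41 46]

five_points = np.linspace(0, 1, 5)  # 5 evenly spaced points, both ends included
print(five_points)                  # [0.   0.25 0.5  0.75 1.  ]
```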

[13]: # np.eye creates an identity matrix: ones on the diagonal and zeros everywhere else

x = np.eye(7)
x

[13]: array([[1., 0., 0., 0., 0., 0., 0.],

[0., 1., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 1.]])

[14]: # Matrix of ones

x = np.ones((7, 7))
x

[14]: array([[1., 1., 1., 1., 1., 1., 1.],

[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1.]])

[15]: # Array of zeros

x = np.zeros(8)
x

[15]: array([0., 0., 0., 0., 0., 0., 0., 0.])

MINI CHALLENGE #2: - Write code that takes in a positive integer “x” from the user and
creates a 1x10 array with random numbers ranging from 0 to “x”
[22]: # Ask the user to enter a positive integer
x = int(input("Please enter a positive integer value: "))
# Validate the input
if x <= 0:
    print("Please enter a positive integer")
else:
    # Create a 1x10 array; the upper bound x is excluded, use x + 1 to include it
    array = np.random.randint(0, x, size=(1, 10))
    print("Generated array:")
    print(array)

Please enter a positive integer value: 6
Generated array:
[[5 3 2 4 4 1 4 4 3 2]]

[ ]:

3 TASK #3: PERFORM MATHEMATICAL OPERATIONS IN NUMPY
[17]: # np.arange() returns evenly spaced values within a given interval
x = np.arange(1, 10)
x

[17]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

[18]: y = np.arange(1, 10)
y

[18]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

[19]: # Add 2 NumPy arrays together (element-wise)
# The result is named "total" to avoid shadowing Python's built-in sum()

total = x + y
total

[19]: array([ 2, 4, 6, 8, 10, 12, 14, 16, 18])

[20]: squared = x**2

squared

[20]: array([ 1, 4, 9, 16, 25, 36, 49, 64, 81])

[21]: sqrt = np.sqrt(squared)

sqrt

[21]: array([1., 2., 3., 4., 5., 6., 7., 8., 9.])

[22]: z = np.exp(y)
z

[22]: array([2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01,

1.48413159e+02, 4.03428793e+02, 1.09663316e+03, 2.98095799e+03,
8.10308393e+03])

MINI CHALLENGE #3: - Given the X and Y values below, obtain the distance between them
X = [5, 7, 20]
Y = [9, 15, 4]

[23]: x = np.array([5, 7, 20])
y = np.array([9, 15, 4])

d = np.sqrt(x**2 + y**2)
d

[23]: array([10.29563014, 16.55294536, 20.39607805])
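The cell above evaluates sqrt(x² + y²) element by element, which treats each pair (x[i], y[i]) as a 2-D point and returns its distance from the origin. The challenge wording could also be read as asking for one distance between two 3-D points; the sketch below shows both readings side by side. The second reading is an assumption, not the notebook's stated intent.

```python
import numpy as np

x = np.array([5, 7, 20])
y = np.array([9, 15, 4])

# Reading 1 (used above): each (x[i], y[i]) pair is a 2-D point,
# and the result is its distance from the origin
from_origin = np.sqrt(x**2 + y**2)

# Reading 2 (assumption): x and y are two points in 3-D space,
# and the result is the single Euclidean distance between them
between_points = np.linalg.norm(x - y)

print(from_origin)
print(between_points)  # sqrt(336), about 18.33
```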

4 TASK #4: PERFORM ARRAYS SLICING AND INDEXING

[5]: my_numpy_array = np.array([3, 5, 6, 2, 8, 10, 20, 50])
my_numpy_array

[5]: array([ 3, 5, 6, 2, 8, 10, 20, 50])

[6]: # Access specific index from the numpy array

my_numpy_array[-1]

[6]: 50

[7]: # Slice from index 0 up to, but NOT including, index 3
my_numpy_array[0:3]

[7]: array([3, 5, 6])

[8]: # Broadcasting, altering several values in a numpy array at once

my_numpy_array[0:4] = 7
my_numpy_array

[8]: array([ 7, 7, 7, 7, 8, 10, 20, 50])
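Slice assignment changes the original array because a NumPy slice is a view on the same memory, not a copy. The sketch below shows the difference and how `.copy()` avoids it; the variable names are illustrative.

```python
import numpy as np

original = np.array([3, 5, 6, 2, 8, 10, 20, 50])

view = original[0:3]         # a view: shares memory with `original`
view[:] = 0                  # also changes original[0:3]
print(original)              # [ 0  0  0  2  8 10 20 50]

safe = original[3:6].copy()  # an independent copy
safe[:] = -1                 # `original` is unaffected
print(original)              # [ 0  0  0  2  8 10 20 50]
```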

[9]: # Let's define a two dimensional numpy array

matrix = np.random.randint(1, 10,(4,4))
matrix

[9]: array([[8, 6, 8, 4],

[5, 4, 9, 9],
[7, 6, 4, 1],
[1, 4, 2, 1]])

[10]: # Get a row from a matrix (here the last one)

matrix[-1]

[10]: array([1, 4, 2, 1])

[11]: # Get one element
matrix[0][0]

[11]: 8
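Two-dimensional arrays also accept a single `[row, column]` index, which extends naturally to column and sub-matrix slices. The sketch below hard-codes the random matrix shown above so that the results are fixed.

```python
import numpy as np

matrix = np.array([[8, 6, 8, 4],
                   [5, 4, 9, 9],
                   [7, 6, 4, 1],
                   [1, 4, 2, 1]])

print(matrix[0, 0])      # 8, same element as matrix[0][0]
print(matrix[:, 1])      # second column: [6 4 6 4]
print(matrix[1:3, 2:4])  # rows 1-2 and columns 2-3: [[9 9], [4 1]]
```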

MINI CHALLENGE #4: - In the following matrix, replace the last row with 0
X = [2 30 20 -2 -4]
[3 4 40 -3 -2]
[-3 4 -6 90 10]
[25 45 34 22 12]
[13 24 22 32 37]

[13]: X = np.array([[2, 30, 20,-2 ,-4],

[3, 4, 40 ,-3 ,-2],
[-3, 4,-6, 90, 10],
[25, 45,34, 22, 12],
[13, 24,22, 32, 37]])
X

[13]: array([[ 2, 30, 20, -2, -4],

[ 3, 4, 40, -3, -2],
[-3, 4, -6, 90, 10],
[25, 45, 34, 22, 12],
[13, 24, 22, 32, 37]])

[30]: X[4] = 0
X

[30]: array([[ 2, 30, 20, -2, -4],
             [ 3,  4, 40, -3, -2],
             [-3,  4, -6, 90, 10],
             [25, 45, 34, 22, 12],
             [ 0,  0,  0,  0,  0]])

[ ]:

5 TASK #5: PERFORM ELEMENTS SELECTION (CONDITIONAL)
[33]: matrix = np.random.randint(1, 10, (5, 5))
matrix

[33]: array([[8, 1, 5, 1, 6],

[2, 9, 8, 5, 9],
[4, 8, 4, 9, 2],
[2, 8, 8, 3, 6],
[4, 6, 9, 5, 8]])

[46]: # Obtain elements greater than 7 only
new_matrix = matrix[ matrix > 7 ]
new_matrix

[46]: array([8, 9, 8, 9, 8, 9, 8, 8, 9, 8])

[47]: # Obtain odd elements only

new_matrix = matrix[ matrix % 2 == 1]
new_matrix

[47]: array([1, 5, 1, 9, 5, 9, 9, 3, 9, 5])
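Boolean masks can be combined, and `np.where` is an option when the result should keep the matrix shape instead of being flattened. The sketch below hard-codes the random matrix shown above.

```python
import numpy as np

matrix = np.array([[8, 1, 5, 1, 6],
                   [2, 9, 8, 5, 9],
                   [4, 8, 4, 9, 2],
                   [2, 8, 8, 3, 6],
                   [4, 6, 9, 5, 8]])

# Combine conditions with & (and) or | (or); each side needs parentheses
large_and_odd = matrix[(matrix > 7) & (matrix % 2 == 1)]
print(large_and_odd)   # [9 9 9 9]

# np.where keeps the 2-D shape: entries above 7 become 0, the rest are unchanged
capped = np.where(matrix > 7, 0, matrix)
print(capped.shape)    # (5, 5)
```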

MINI CHALLENGE #5: - In the following matrix, replace negative elements by 0 and replace odd
elements with -2
X = [2 30 20 -2 -4]
[3 4 40 -3 -2]
[-3 4 -6 90 10]
[25 45 34 22 12]
[13 24 22 32 37]
[4]: X = np.array([[2, 30, 20, -2, -4],
[3, 4, 40, -3, -2],
[-3, 4, -6, 90, 10],
[25, 45, 34, 22, 12],
[13, 24, 22, 32, 37]])
X

[4]: array([[ 2, 30, 20, -2, -4],

[ 3, 4, 40, -3, -2],
[-3, 4, -6, 90, 10],
[25, 45, 34, 22, 12],
[13, 24, 22, 32, 37]])

[23]: X[ X < 0 ] = 0
X[ X % 2 == 1 ] = -2
X

[23]: array([[ 2, 30, 20,  0,  0],
             [-2,  4, 40,  0,  0],
             [ 0,  4,  0, 90, 10],
             [-2, -2, 34, 22, 12],
             [-2, 24, 22, 32, -2]])
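The order of the two assignments matters here. In NumPy, as in Python, `%` takes the sign of the divisor, so a negative odd number such as -3 also satisfies `X % 2 == 1`. If the odd replacement ran first, every -2 it wrote would then be caught by `X < 0` and turned into 0. One way to make the intent explicit, sketched below, is to compute both masks on the original values before changing anything.

```python
import numpy as np

X = np.array([[ 2, 30, 20, -2, -4],
              [ 3,  4, 40, -3, -2],
              [-3,  4, -6, 90, 10],
              [25, 45, 34, 22, 12],
              [13, 24, 22, 32, 37]])

# Negative odd numbers also give remainder 1
print(np.array([-3, -1, 3]) % 2)   # [1 1 1]

# Computing both masks on the original values removes the order dependence
negative = X < 0
odd_non_negative = (X % 2 == 1) & ~negative

result = X.copy()
result[negative] = 0
result[odd_non_negative] = -2
print(result)
```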

6 TASK #6: UNDERSTAND PANDAS FUNDAMENTALS
[35]: # Pandas is a data manipulation and analysis tool that is built on NumPy.
# Pandas uses a data structure known as a DataFrame (think of it as Microsoft Excel in Python).
# DataFrames empower programmers to store and manipulate data in a tabular fashion (rows and columns).
# Series vs. DataFrame? A Series can be thought of as a single column of a DataFrame.

[1]: import pandas as pd

[25]: # Let's define a two-dimensional Pandas DataFrame

# Note that you can create a pandas dataframe from a python dictionary
bank_client_df = pd.DataFrame({'Bank Client ID':[111, 222, 333, 444],
                               'Bank Client Name':['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth [$]':[3500, 29000, 10000, 2000],
                               'Years with bank':[3, 4, 9, 5]})
bank_client_df

[25]: Bank Client ID Bank Client Name Net Worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
2 333 Mitch 10000 9
3 444 Ryan 2000 5

[26]: # Let's obtain the data type

type(bank_client_df)

[26]: pandas.core.frame.DataFrame

[28]: # You can view only the first few rows using .head()
bank_client_df.head(2)

[28]: Bank Client ID Bank Client Name Net Worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4

[29]: # You can view only the last few rows using .tail()
bank_client_df.tail(2)

[29]: Bank Client ID Bank Client Name Net Worth [$] Years with bank
2 333 Mitch 10000 9
3 444 Ryan 2000 5
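The Series versus DataFrame distinction mentioned earlier shows up directly in how a column is selected. A short sketch using the same DataFrame:

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank Client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})

net_worth = bank_client_df['Net Worth [$]']          # single brackets: a Series
net_worth_frame = bank_client_df[['Net Worth [$]']]  # double brackets: a DataFrame

print(type(net_worth).__name__)        # Series
print(type(net_worth_frame).__name__)  # DataFrame
print(bank_client_df.shape)            # (4, 4): rows, columns
```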

MINI CHALLENGE #6: - A portfolio contains a collection of securities such as stocks, bonds and
ETFs. Define a dataframe named ‘portfolio_df’ that holds 3 different stock ticker symbols, number
of shares, and price per share (feel free to choose any stocks) - Calculate the total value of the
portfolio including all stocks
[44]: portfolio_df = pd.DataFrame({'stock ticker symbol':['AAPL', 'AMZN', 'T'],
'price per share [$]':[3500, 200, 40],
'Number of stocks': [3, 4, 9]})
portfolio_df

[44]: stock ticker symbol price per share [$] Number of stocks
0 AAPL 3500 3
1 AMZN 200 4
2 T 40 9

[46]: stocks_dollar_value = portfolio_df['price per share [$]'] * portfolio_df['Number of stocks']

stocks_dollar_value.sum()

[46]: 11660

[ ]:

7 TASK #7: PANDAS WITH CSV AND HTML DATA

[47]: # pd.read_html reads every HTML table on a web page and returns a list of DataFrames
house_price_df = pd.read_html('https://www.livingin-canada.com/house-prices-canada.html')

house_price_df[0]

[47]:               City  Average House Price  12 Month Change
0        Vancouver, BC           $1,036,000         + 2.63 %
1         Toronto, Ont             $870,000          +10.2 %
2          Ottawa, Ont             $479,000         + 15.4 %
3         Calgary, Alb             $410,000          – 1.5 %
4        Montreal, Que             $435,000          + 9.3 %
5          Halifax, NS             $331,000          + 3.6 %
6         Regina, Sask             $254,000          – 3.9 %
7      Fredericton, NB             $198,000          – 4.3 %
8  (adsbygoogle = window.adsbygoogle || []).push(…  (adsbygoogle = window.adsbygoogle || []).push(…  (adsbygoogle = window.adsbygoogle || []).push(…

[48]: house_price_df[1]

[48]:                    Province  Average House Price  12 Month Change
0             British Columbia             $736,000          + 7.6 %
1                      Ontario             $594,000          – 3.2 %
2                      Alberta             $353,000          – 7.5 %
3                       Quebec             $340,000          + 7.6 %
4                     Manitoba             $295,000          – 1.4 %
5                 Saskatchewan             $271,000          – 3.8 %
6                  Nova Scotia             $266,000          + 3.5 %
7         Prince Edward Island             $243,000          + 3.0 %
8      Newfoundland / Labrador             $236,000          – 1.6 %
9                New Brunswick             $183,000          – 2.2 %
10            Canadian Average             $488,000          – 1.3 %
11  (adsbygoogle = window.adsbygoogle || []).push(…  (adsbygoogle = window.adsbygoogle || []).push(…  (adsbygoogle = window.adsbygoogle || []).push(…
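The task title mentions CSV data, but the cells in this task only call `read_html`. As a minimal sketch of the CSV side, `pd.read_csv` can parse the same kind of table. An in-memory string stands in for a file here, with three rows copied from the city table above, and the price strings are converted to integers so they can be used in arithmetic. The column name `Price [$]` is introduced only for this illustration.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file; values copied from the city table above
csv_text = '''City,Average House Price
"Vancouver, BC","$1,036,000"
"Toronto, Ont","$870,000"
"Ottawa, Ont","$479,000"
'''

house_price_df = pd.read_csv(io.StringIO(csv_text))

# Strip the currency symbol and thousands separators, then convert to integers
house_price_df['Price [$]'] = (house_price_df['Average House Price']
                               .str.replace('$', '', regex=False)
                               .str.replace(',', '', regex=False)
                               .astype(int))
print(house_price_df)
```

With a real file, the call would take a path instead, for example `pd.read_csv('some_file.csv')`, where the file name is a placeholder.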

[41]: # Read tabular data using read_html

[ ]:

MINI CHALLENGE #7: - Write code that uses Pandas to read tabular US retirement data -
You can use data from here: https://www.ssa.gov/oact/progdata/nra.html

[ ]: retirement_df = pd.read_html('https://www.ssa.gov/oact/progdata/nra.html')
retirement_df[0]

8 TASK #8: PANDAS OPERATIONS

[58]: # Let's define a dataframe as follows:
bank_client_df = pd.DataFrame({'Bank Client ID':[111, 222, 333, 444],
                               'Bank Client Name':['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth [$]':[3500, 29000, 10000, 2000],
                               'Years with bank':[3, 4, 9, 5]})
bank_client_df

[58]: Bank Client ID Bank Client Name Net Worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
2 333 Mitch 10000 9
3 444 Ryan 2000 5

[59]: # Pick the rows that satisfy a given criterion

df_loyal = bank_client_df[ bank_client_df['Years with bank'] >=5]
df_loyal

[59]: Bank Client ID Bank Client Name Net Worth [$] Years with bank
2 333 Mitch 10000 9
3 444 Ryan 2000 5

[60]: # Delete a column from a DataFrame

del bank_client_df['Bank Client ID']

bank_client_df

[60]: Bank Client Name Net Worth [$] Years with bank
0 Chanel 3500 3
1 Steve 29000 4
2 Mitch 10000 9
3 Ryan 2000 5
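Row filters can combine several conditions, and `drop` is a non-destructive alternative to `del`. A short sketch using the same data:

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank Client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})

# Combine conditions with & and wrap each one in parentheses
loyal_and_wealthy = bank_client_df[(bank_client_df['Years with bank'] >= 5) &
                                   (bank_client_df['Net Worth [$]'] >= 5000)]
print(loyal_and_wealthy)   # only Mitch qualifies

# drop() returns a new DataFrame and leaves the original untouched, unlike del
without_id = bank_client_df.drop(columns=['Bank Client ID'])
print(without_id.columns.tolist())
```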

MINI CHALLENGE #8: - Using the “bank_client_df” DataFrame, leverage pandas operations to
select only high net worth individuals with a minimum of $5000 - What is the combined net worth for
all customers with a net worth of $5000 or more?
[62]: df_high_networth = bank_client_df[ bank_client_df['Net Worth [$]'] >=5000]
df_high_networth

[62]: Bank Client Name Net Worth [$] Years with bank
1 Steve 29000 4
2 Mitch 10000 9

9 TASK #9: PANDAS WITH FUNCTIONS

[4]: # Let's define a dataframe as follows:
bank_client_df = pd.DataFrame({'Bank client ID':[111, 222, 333, 444],
                               'Bank Client Name':['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net worth [$]':[3500, 29000, 10000, 2000],
                               'Years with bank':[3, 4, 9, 5]})
bank_client_df

[4]: Bank client ID Bank Client Name Net worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
2 333 Mitch 10000 9
3 444 Ryan 2000 5

[2]: # Define a function that increases every client's net worth (stocks) by a fixed
# value of 20% (for simplicity's sake)

def networth_update(balance):
    return balance * 1.2

[5]: # You can apply a function to a DataFrame column (a Series)
bank_client_df['Net worth [$]'].apply(networth_update)

[5]: 0 4200.0
1 34800.0
2 12000.0
3 2400.0
Name: Net worth [$], dtype: float64
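For simple arithmetic like this, operating on the whole column at once gives the same result as `apply` and is usually faster on large data. A minimal comparison:

```python
import numpy as np
import pandas as pd

net_worth = pd.Series([3500, 29000, 10000, 2000], name='Net worth [$]')

def networth_update(balance):
    return balance * 1.2

applied = net_worth.apply(networth_update)   # calls the function once per element
vectorized = net_worth * 1.2                 # one arithmetic operation on the whole column

print(np.allclose(applied, vectorized))      # True
```

`apply` remains the natural choice when the per-row logic cannot be written as column arithmetic.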

[ ]:

MINI CHALLENGE #9: - Define a function that triples the stock prices and adds $200 - Apply
the function to the DataFrame - Calculate the updated total net worth of all clients combined
[8]: def networth_update(balance):
    return balance * 3 + 200

[11]: results= bank_client_df['Net worth [$]'].apply(networth_update)

results

[11]: 0 10700
1 87200
2 30200
3 6200
Name: Net worth [$], dtype: int64

[ ]:

10 TASK #10: PERFORM SORTING AND ORDERING IN

PANDAS
[12]: # Let's define a dataframe as follows:
bank_client_df = pd.DataFrame({'Bank client ID':[111, 222, 333, 444],
                               'Bank Client Name':['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net worth [$]':[3500, 29000, 10000, 2000],
                               'Years with bank':[3, 4, 9, 5]})
bank_client_df

[12]: Bank client ID Bank Client Name Net worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
2 333 Mitch 10000 9
3 444 Ryan 2000 5

[14]: # You can sort the values in the dataframe according to number of years with bank

bank_client_df.sort_values(by = 'Years with bank')

[14]: Bank client ID Bank Client Name Net worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
3 444 Ryan 2000 5
2 333 Mitch 10000 9

[15]: # Note that bank_client_df itself did not change: sort_values returns a sorted copy
# unless inplace is set to True

bank_client_df

[15]: Bank client ID Bank Client Name Net worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
2 333 Mitch 10000 9
3 444 Ryan 2000 5

[ ]: # Set inplace = True so that the sorted order is stored in bank_client_df itself
bank_client_df.sort_values(by = 'Years with bank', inplace = True)

[16]: # Note that now the change (ordering) took place

bank_client_df

[16]: Bank client ID Bank Client Name Net worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
3 444 Ryan 2000 5
2 333 Mitch 10000 9
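Reassigning the result of `sort_values` is an alternative to `inplace = True`. The sketch below also sorts in descending order and renumbers the rows afterwards; the name `sorted_df` is illustrative.

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})

# ascending=False puts the longest-standing clients first
sorted_df = bank_client_df.sort_values(by='Years with bank', ascending=False)
print(sorted_df['Bank Client Name'].tolist())   # ['Mitch', 'Ryan', 'Steve', 'Chanel']

# reset_index(drop=True) renumbers the rows 0..n-1 after sorting
sorted_df = sorted_df.reset_index(drop=True)
print(sorted_df.index.tolist())                 # [0, 1, 2, 3]
```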

11 TASK #11: PERFORM CONCATENATING AND MERGING WITH PANDAS
[ ]: # Check this out: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

[24]: df1 = pd.DataFrame({'A':['A0', 'A1', 'A2', 'A3'],

'B':['B0', 'B1', 'B2', 'B3'],
'C':['C0', 'C1', 'C2', 'C3'],
'D':['D0', 'D1', 'D2', 'D3']},
index = [0, 1, 2, 3])
df1

[24]: A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3

[27]: df2 = pd.DataFrame({'A':['A4', 'A5', 'A6', 'A7'],

'B':['B4', 'B5', 'B6', 'B7'],
'C':['C4', 'C5', 'C6', 'C7'],
'D':['D4', 'D5', 'D6', 'D7']},
index = [4, 5, 6, 7])

[28]: df2

[28]: A B C D
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7

[29]: df3 = pd.DataFrame({'A':['A8', 'A9', 'A10', 'A11'],

'B':['B8', 'B9', 'B10', 'B11'],
'C':['C8', 'C9', 'C10', 'C11'],
'D':['D8', 'D9', 'D10', 'D11']},
index = [8, 9, 10, 11])

[30]: df3

[30]: A B C D
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11

[31]: pd.concat([df1, df2, df3])

[31]: A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
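The three frames above were given non-overlapping index labels by hand. When frames carry the default index, `pd.concat` repeats the labels unless `ignore_index=True` is passed. A small sketch with two shortened frames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Both frames carry the default index 0, 1, so a plain concat repeats the labels
stacked = pd.concat([df1, df2])
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```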

12 TASK #12: PROJECT AND CONCLUDING REMARKS

• Define a dataframe named ‘Bank_df_1’ that contains the first and last names for 5 bank
clients with IDs = 1, 2, 3, 4, 5
• Assume that the bank got 5 new clients; define another dataframe named ‘Bank_df_2’ that
contains the new clients with IDs = 6, 7, 8, 9, 10
• Let’s assume we obtained additional information (Annual Salary) about all our bank customers
(10 customers)
• Concatenate both ‘Bank_df_1’ and ‘Bank_df_2’ dataframes into ‘bank_df_all’
• Merge client names and their newly added salary information using the ‘Bank Client ID’
• Let’s assume that you became a new client of the bank
• Define a new DataFrame that contains your information such as client ID (choose 11), first
name, last name, and annual salary.
• Add this new dataframe to the original dataframe ‘bank_df_all’.
[41]: raw_data ={ 'Bank Client ID':['1', '2', '3', '4', '5'],
'First Name':['ElMalick', 'Ibra', 'Fallou', 'Idiatou', 'Rassoul'],
'Last Name':['Ndiaye', 'Dione', 'Diop', 'Bah', 'Fall']}

Bank_df_1 = pd.DataFrame(raw_data, columns = ['Bank Client ID', 'First Name', 'Last Name'])

Bank_df_1

[41]: Bank Client ID First Name Last Name

0 1 ElMalick Ndiaye
1 2 Ibra Dione
2 3 Fallou Diop
3 4 Idiatou Bah
4 5 Rassoul Fall

[42]: raw_data = { 'Bank Client ID': ['6', '7', '8', '9', '10'],
                   'First Name':['Babacar', 'Ibrahima', 'Assane', 'Youssoufa', 'alphonse'],
                   'Last Name':['Kane', 'Ndior', 'Diakhoumpa', 'Sy', 'Mbengue']}

Bank_df_2 = pd.DataFrame(raw_data, columns = ['Bank Client ID', 'First Name', 'Last Name'])

Bank_df_2

[42]: Bank Client ID First Name Last Name

0 6 Babacar Kane
1 7 Ibrahima Ndior
2 8 Assane Diakhoumpa
3 9 Youssoufa Sy
4 10 alphonse Mbengue

[51]: raw_data ={'Bank Client ID':[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                 'Annual Salary[$/year]':[25000, 35000, 45000, 48000, 49000, 32000, 33000, 34000, 23000, 22000]}

bank_df_salary = pd.DataFrame(raw_data, columns = [ 'Bank Client ID', 'Annual Salary[$/year]']).astype(int)

bank_df_salary

[51]: Bank Client ID Annual Salary[$/year]

0 1 25000
1 2 35000
2 3 45000
3 4 48000
4 5 49000
5 6 32000
6 7 33000
7 8 34000
8 9 23000
9 10 22000

[55]: bank_df_all = pd.concat([Bank_df_1, Bank_df_2])

bank_df_all

[55]: Bank Client ID First Name Last Name

0 1 ElMalick Ndiaye
1 2 Ibra Dione
2 3 Fallou Diop
3 4 Idiatou Bah
4 5 Rassoul Fall
0 6 Babacar Kane
1 7 Ibrahima Ndior
2 8 Assane Diakhoumpa
3 9 Youssoufa Sy
4 10 alphonse Mbengue

[58]: bank_df_all['Bank Client ID'] = bank_df_all['Bank Client ID'].astype(int)
bank_df_salary['Bank Client ID'] = bank_df_salary['Bank Client ID'].astype(int)

[59]: bank_df_all = pd.merge(bank_df_all, bank_df_salary, on='Bank Client ID')

bank_df_all

[59]: Bank Client ID First Name Last Name Annual Salary[$/year]

0 1 ElMalick Ndiaye 25000
1 2 Ibra Dione 35000
2 3 Fallou Diop 45000
3 4 Idiatou Bah 48000
4 5 Rassoul Fall 49000
5 6 Babacar Kane 32000
6 7 Ibrahima Ndior 33000
7 8 Assane Diakhoumpa 34000
8 9 Youssoufa Sy 23000
9 10 alphonse Mbengue 22000
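The cast to `int` before the merge matters because the name frames store the IDs as strings while the salary frame stores them as integers, and a merge key should have the same type on both sides. The sketch below repeats that step on a shortened version of the data and adds three optional `pd.merge` arguments that help check the result; the frame names are illustrative.

```python
import pandas as pd

names_df = pd.DataFrame({'Bank Client ID': ['1', '2', '3'],   # IDs stored as strings
                         'First Name': ['ElMalick', 'Ibra', 'Fallou']})
salary_df = pd.DataFrame({'Bank Client ID': [1, 2, 3],        # IDs stored as integers
                          'Annual Salary[$/year]': [25000, 35000, 45000]})

# Bring the key columns to a common dtype before merging
names_df['Bank Client ID'] = names_df['Bank Client ID'].astype(int)

merged = pd.merge(names_df, salary_df, on='Bank Client ID',
                  how='left',              # keep every client even without a salary row
                  validate='one_to_one',   # raise if an ID is duplicated on either side
                  indicator=True)          # adds a _merge column showing the match source
print(merged)
```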

[84]: # The keys must match the column names of bank_df_all exactly
# (note: no space before "[$/year]"), and the ID is an integer like the others
new_client = {'Bank Client ID':[11],
              'First Name':['Cheikh'],
              'Last Name':['Thiame'],
              'Annual Salary[$/year]':[5000]}
new_client_df = pd.DataFrame(new_client, columns = ['Bank Client ID', 'First Name', 'Last Name', 'Annual Salary[$/year]'])

new_client_df

[84]: Bank Client ID First Name Last Name Annual Salary[$/year]
0 11 Cheikh Thiame 5000

[ ]:

13 EXCELLENT JOB!

14 MINI CHALLENGES SOLUTIONS

MINI CHALLENGE #1 SOLUTION: - Write a code that creates the following 2x4 numpy array
[[3 7 9 3]
[4 3 2 2]]
[ ]: x = np.array([[3, 7, 9, 3], [4, 3, 2, 2]])
x

MINI CHALLENGE #2 SOLUTION: - Write a code that takes in a positive integer “x” from the
user and creates a 1x10 array with random numbers ranging from 0 to “x”
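[ ]: x = int(input("Please enter a positive integer value: "))
# size=(1, 10) gives the requested 1x10 shape; the upper bound is exclusive, so x + 1 includes x
random_array = np.random.randint(0, x + 1, size=(1, 10))
random_array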

[ ]:

MINI CHALLENGE #3 SOLUTION: - Given the X and Y values below, obtain the distance
between them
X = [5, 7, 20]
Y = [9, 15, 4]
[ ]: X = np.array([5, 7, 20])
Y = np.array([9, 15, 4])
Z = np.sqrt(X**2 + Y**2)
Z

MINI CHALLENGE #4 SOLUTION: - In the following matrix, replace the last row with 0
X = [2 30 20 -2 -4]
[3 4 40 -3 -2]
[-3 4 -6 90 10]
[25 45 34 22 12]
[13 24 22 32 37]
[ ]: X = np.array([[2, 30, 20, -2, -4],
[3, 4, 40, -3, -2],
[-3, 4, -6, 90, 10],
[25, 45, 34, 22, 12],
[13, 24, 22, 32, 37]])

[ ]: X[4] = 0
X

MINI CHALLENGE #5 SOLUTION: - In the following matrix, replace negative elements by 0 and
replace odd elements with -2
X = [2 30 20 -2 -4]
[3 4 40 -3 -2]
[-3 4 -6 90 10]
[25 45 34 22 12]
[13 24 22 32 37]
[ ]: X = np.array([[2, 30, 20, -2, -4],
              [3, 4, 40, -3, -2],
              [-3, 4, -6, 90, 10],
              [25, 45, 34, 22, 12],
              [13, 24, 22, 32, 37]])

X[X<0] = 0
X[X%2==1] = -2
X

MINI CHALLENGE #6 SOLUTION: - A portfolio contains a collection of securities such as stocks,
bonds and ETFs. Define a dataframe named ‘portfolio_df’ that holds 3 different stock ticker
symbols, number of shares, and price per share (feel free to choose any stocks) - Calculate the total
value of the portfolio including all stocks
[ ]: portfolio_df = pd.DataFrame({'stock ticker symbols':['AAPL', 'AMZN', 'T'],
'price per share [$]':[3500, 200, 40],
'Number of stocks':[3, 4, 9]})
portfolio_df

[ ]: stocks_dollar_value = portfolio_df['price per share [$]'] * portfolio_df['Number of stocks']

print(stocks_dollar_value)
print('Total portfolio value = {}'.format(stocks_dollar_value.sum()))

MINI CHALLENGE #7 SOLUTION: - Write code that uses Pandas to read tabular US retirement
data - You can use data from here: https://www.ssa.gov/oact/progdata/nra.html

[ ]: # Read tabular data using read_html

retirement_age_df = pd.read_html('https://www.ssa.gov/oact/progdata/nra.html')
retirement_age_df

MINI CHALLENGE #8 SOLUTION: - Using the “bank_client_df” DataFrame, leverage pandas operations
to select only high net worth individuals with a minimum of $5000 - What is the combined
net worth for all customers with a net worth of $5000 or more?
[ ]: df_high_networth = bank_client_df[ (bank_client_df['Net worth [$]'] >= 5000) ]
df_high_networth

[ ]: df_high_networth['Net worth [$]'].sum()

MINI CHALLENGE #9 SOLUTION: - Define a function that triples the stock prices and adds
$200 - Apply the function to the DataFrame - Calculate the updated total networth of all clients
combined
[ ]: def networth_update(balance):
    return balance * 3 + 200

[ ]: # You can apply a function to the DataFrame

results = bank_client_df['Net worth [$]'].apply(networth_update)
results

[ ]: results.sum()

PROJECT SOLUTION:
[ ]: # Creating a dataframe from a dictionary
# Let's define a dataframe with a list of bank clients with IDs = 1, 2, 3, 4, 5

raw_data = {'Bank Client ID': ['1', '2', '3', '4', '5'],
            'First Name': ['Nancy', 'Alex', 'Shep', 'Max', 'Allen'],
            'Last Name': ['Rob', 'Ali', 'George', 'Mitch', 'Steve']}

Bank_df_1 = pd.DataFrame(raw_data, columns = ['Bank Client ID', 'First Name', 'Last Name'])
Bank_df_1

# Let's define another dataframe for a separate list of clients (IDs = 6, 7, 8, 9, 10)

raw_data = {'Bank Client ID': ['6', '7', '8', '9', '10'],
            'First Name': ['Bill', 'Dina', 'Sarah', 'Heather', 'Holly'],
            'Last Name': ['Christian', 'Mo', 'Steve', 'Bob', 'Michelle']}

Bank_df_2 = pd.DataFrame(raw_data, columns = ['Bank Client ID', 'First Name', 'Last Name'])
Bank_df_2

# Let's assume we obtained additional information (Annual Salary) about our bank customers
# Note that data obtained is for all clients with IDs 1 to 10

raw_data = {'Bank Client ID': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
            'Annual Salary [$/year]': [25000, 35000, 45000, 48000, 49000, 32000, 33000, 34000, 23000, 22000]}

bank_df_salary = pd.DataFrame(raw_data, columns = ['Bank Client ID', 'Annual Salary [$/year]'])
bank_df_salary

# Let's concatenate both dataframes #1 and #2
# Note that we now have client IDs from 1 to 10
bank_df_all = pd.concat([Bank_df_1, Bank_df_2])
bank_df_all

# Let's merge all data on 'Bank Client ID'
bank_df_all = pd.merge(bank_df_all, bank_df_salary, on = 'Bank Client ID')
bank_df_all

[ ]: new_client = {
'Bank Client ID': ['11'],
'First Name': ['Ry'],
'Last Name': ['Aly'],
'Annual Salary [$/year]' : [1000]}
new_client_df = pd.DataFrame(new_client, columns = ['Bank Client ID', 'First Name', 'Last Name', 'Annual Salary [$/year]'])

new_client_df

[70]: # concat aligns on column names, so both frames must use the same
# 'Annual Salary[$/year]' label; otherwise a second, NaN-filled column appears
new_df = pd.concat([bank_df_all, new_client_df], axis = 0)

new_df

[70]: Bank Client ID First Name Last Name Annual Salary[$/year]
0 1 ElMalick Ndiaye 25000
1 2 Ibra Dione 35000
2 3 Fallou Diop 45000
3 4 Idiatou Bah 48000
4 5 Rassoul Fall 49000
5 6 Babacar Kane 32000
6 7 Ibrahima Ndior 33000
7 8 Assane Diakhoumpa 34000
8 9 Youssoufa Sy 23000
9 10 alphonse Mbengue 22000
0 11 Cheikh Thiame 5000
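Because `pd.concat` aligns frames by column label, two labels that differ by a single character are treated as different columns, and the gaps are filled with NaN. The sketch below reproduces that effect on a tiny example and shows one way to repair it with `rename`; the two-row `clients` frame is a shortened stand-in for the full client table.

```python
import pandas as pd

clients = pd.DataFrame({'Bank Client ID': [1, 2],
                        'Annual Salary[$/year]': [25000, 35000]})
# New row whose salary column has an extra space in its name
new_client = pd.DataFrame({'Bank Client ID': [11],
                           'Annual Salary [$/year]': [5000]})

mismatched = pd.concat([clients, new_client], ignore_index=True)
print(mismatched.columns.tolist())   # three columns: the two spellings are kept apart

# Renaming to the exact same label lets concat align the values in one column
new_client = new_client.rename(columns={'Annual Salary [$/year]': 'Annual Salary[$/year]'})
aligned = pd.concat([clients, new_client], ignore_index=True)
print(aligned)
```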

[ ]:
