
Python 101 - Python Libraries for Data Analysis - Numpy and Pandas

The document provides an introduction to Python libraries for data analysis, specifically focusing on NumPy and its capabilities for handling single and multi-dimensional arrays. It covers various tasks such as defining arrays, leveraging built-in methods, performing mathematical operations, and array slicing and indexing, along with mini challenges for practical application. Additionally, it briefly introduces Pandas as a data manipulation tool built on top of NumPy.

Uploaded by ndiayemalickn638
Copyright © All Rights Reserved
Python 101 - Python Libraries for Data Analysis - Numpy and Pandas

May 29, 2025

1 TASK #1: DEFINE SINGLE AND MULTI-DIMENSIONAL NUMPY ARRAYS
[20]: # NumPy is a linear algebra library used for multidimensional arrays
# NumPy brings the best of two worlds: (1) C/Fortran computational efficiency and
# (2) Python's easy-to-read syntax

import numpy as np

# Let's define a one-dimensional array
list_1 = [50, 60, 80, 100, 200, 300, 500, 600]
list_1

[20]: [50, 60, 80, 100, 200, 300, 500, 600]

[21]: # Let's create a numpy array from the list "list_1"
my_numpy_array = np.array(list_1)
my_numpy_array

[21]: array([ 50, 60, 80, 100, 200, 300, 500, 600])

[5]: type(my_numpy_array)

[5]: numpy.ndarray

Multi-dimensional (Matrix definition)


[6]: my_matrix = np.array([[2, 5, 8], [7, 3, 6]])
my_matrix

[6]: array([[2, 5, 8],
            [7, 3, 6]])
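You can confirm what was built by inspecting a few standard array attributes. The sketch below reuses the two arrays defined above. The exact integer `dtype` (for example `int64`) depends on the platform.

```python
import numpy as np

# The 1-D array and the 2-D array (matrix) defined above
my_numpy_array = np.array([50, 60, 80, 100, 200, 300, 500, 600])
my_matrix = np.array([[2, 5, 8], [7, 3, 6]])

# .ndim is the number of dimensions; .shape is the size along each dimension
print(my_numpy_array.ndim, my_numpy_array.shape)  # 1 (8,)
print(my_matrix.ndim, my_matrix.shape)            # 2 (2, 3)

# .size is the total number of elements; .dtype is the element type
print(my_matrix.size)   # 6
print(my_matrix.dtype)  # a platform-dependent integer type
```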

MINI CHALLENGE #1: - Write code that creates the following 2x4 numpy array
[[3 7 9 3]
[4 3 2 2]]

[3]: x = np.array([[3, 7, 9, 3],
                   [4, 3, 2, 2]])
x

[3]: array([[3, 7, 9, 3],
            [4, 3, 2, 2]])

2 TASK #2: LEVERAGE NUMPY BUILT-IN METHODS AND FUNCTIONS
[8]: # "rand()" generates random values from a uniform distribution between 0 and 1 (1 excluded)
x = np.random.rand(20)
x

[8]: array([0.14056323, 0.53908128, 0.29549647, 0.03517011, 0.89102171,
            0.05271959, 0.3741947 , 0.2051953 , 0.16712427, 0.65044685,
            0.68705185, 0.26958268, 0.13184144, 0.36498677, 0.67224159,
            0.42635753, 0.75119414, 0.82521819, 0.09219216, 0.85630017])

[9]: # You can create a matrix of random numbers as well
x = np.random.rand(3, 3)
x

[9]: array([[0.75189542, 0.8534315 , 0.58733699],
            [0.10275586, 0.36892311, 0.54795311],
            [0.55516788, 0.91208212, 0.45541749]])

[10]: # "randint" generates a random integer from a lower bound (included) up to an upper bound (excluded)

x = np.random.randint(1,50)
x

[10]: 22

[11]: # "randint" can also generate a given number of random integers, as follows
x = np.random.randint(1, 100, 15)
x

[11]: array([77, 80, 61, 59, 73, 97, 19, 22, 82, 78, 49, 97, 75, 69, 84])
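The legacy functions above (`np.random.rand`, `np.random.randint`) return different numbers on every run. Newer NumPy code (NumPy 1.17 or later) usually creates a seeded `Generator` with `np.random.default_rng` so results can be reproduced. The seed value 42 in this sketch is arbitrary. Like `randint`, `Generator.integers` excludes the upper bound unless `endpoint=True` is passed.

```python
import numpy as np

# A seeded Generator makes the "random" draws reproducible between runs
rng = np.random.default_rng(seed=42)  # 42 is an arbitrary seed

uniform = rng.random((3, 3))              # floats in [0.0, 1.0)
integers = rng.integers(1, 100, size=15)  # ints in [1, 100): 100 is excluded

# Re-creating a generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
same_uniform = rng_again.random((3, 3))
print(np.array_equal(uniform, same_uniform))  # True
```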

[12]: # np.arange returns evenly spaced values within a given interval (the stop value, 50, is excluded)
x = np.arange(1, 50)
x

[12]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
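`np.arange` also accepts a step. `np.linspace` is the usual alternative when you want a fixed number of points rather than a fixed spacing, and it includes its stop value by default. A short sketch:

```python
import numpy as np

# np.arange(start, stop, step): stop is excluded, step defaults to 1
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# np.linspace(start, stop, num): num evenly spaced points, stop included
points = np.linspace(0, 1, 5)  # values 0.0, 0.25, 0.5, 0.75, 1.0
```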

[13]: # np.eye creates an identity matrix: ones on the diagonal and zeros everywhere else
x = np.eye(7)
x

[13]: array([[1., 0., 0., 0., 0., 0., 0.],
             [0., 1., 0., 0., 0., 0., 0.],
             [0., 0., 1., 0., 0., 0., 0.],
             [0., 0., 0., 1., 0., 0., 0.],
             [0., 0., 0., 0., 1., 0., 0.],
             [0., 0., 0., 0., 0., 1., 0.],
             [0., 0., 0., 0., 0., 0., 1.]])

[14]: # Matrix of ones
x = np.ones((7, 7))
x

[14]: array([[1., 1., 1., 1., 1., 1., 1.],
             [1., 1., 1., 1., 1., 1., 1.],
             [1., 1., 1., 1., 1., 1., 1.],
             [1., 1., 1., 1., 1., 1., 1.],
             [1., 1., 1., 1., 1., 1., 1.],
             [1., 1., 1., 1., 1., 1., 1.],
             [1., 1., 1., 1., 1., 1., 1.]])

[15]: # Array of zeros
x = np.zeros(8)
x

[15]: array([0., 0., 0., 0., 0., 0., 0., 0.])

MINI CHALLENGE #2: - Write code that takes in a positive integer “x” from the user and
creates a 1x10 array with random numbers ranging from 0 to “x”
[22]: # Ask the user to enter a positive integer
x = int(input("Please enter a positive integer value: "))
# Validate the input
if x <= 0:
    print("Please enter a positive integer")
else:
    # Create a 1x10 array; randint excludes the upper bound, so x + 1 makes x itself possible
    array = np.random.randint(0, x + 1, size=(1, 10))
    print("Generated array:")
    print(array)

Please enter a positive integer value: 6
Generated array:
[[5 3 2 4 4 1 4 4 3 2]]
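The challenge logic is easier to test if `input()` stays outside and the array creation goes in a function. In this sketch, `random_row` is a hypothetical helper name. It also assumes "from 0 to x" means x is included.

```python
import numpy as np

def random_row(x, rng=None):
    """Return a 1x10 array of random integers from 0 to x (inclusive)."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    rng = np.random.default_rng() if rng is None else rng
    # integers() excludes the upper bound by default, so pass x + 1 to include x
    return rng.integers(0, x + 1, size=(1, 10))

row = random_row(6, rng=np.random.default_rng(0))
print(row.shape)  # (1, 10)
```

In the notebook, the call would become `random_row(int(input(...)))`. Any invalid value then surfaces as a `ValueError` instead of a printed message.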

3 TASK #3: PERFORM MATHEMATICAL OPERATIONS IN NUMPY
[17]: # np.arange() returns evenly spaced values within a given interval
x = np.arange(1, 10)
x

[17]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

[18]: y = np.arange(1, 10)
y

[18]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

[19]: # Add 2 numpy arrays together, element by element
# (named "total" to avoid shadowing Python's built-in sum())
total = x + y
total

[19]: array([ 2, 4, 6, 8, 10, 12, 14, 16, 18])

[20]: # Square each element
squared = x**2
squared

[20]: array([ 1, 4, 9, 16, 25, 36, 49, 64, 81])

[21]: # Square root of each element
sqrt = np.sqrt(squared)
sqrt

[21]: array([1., 2., 3., 4., 5., 6., 7., 8., 9.])

[22]: # e raised to the power of each element
z = np.exp(y)
z

[22]: array([2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01,
             1.48413159e+02, 4.03428793e+02, 1.09663316e+03, 2.98095799e+03,
             8.10308393e+03])

MINI CHALLENGE #3: - Given the X and Y values below, obtain the distance between them
X = [5, 7, 20]
Y = [9, 15, 4]

[23]: x = np.array([5, 7, 20])
y = np.array([9, 15, 4])

d = np.sqrt(x**2 + y**2)
d

[23]: array([10.29563014, 16.55294536, 20.39607805])
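The formula above treats each pair (X[i], Y[i]) as a 2-D point and returns that point's distance from the origin. The challenge could also mean the distance between X and Y seen as two points in 3-D space; the wording allows both readings, so this is an assumption. Under that reading, the Euclidean distance is the norm of the difference:

```python
import numpy as np

X = np.array([5, 7, 20])
Y = np.array([9, 15, 4])

# Euclidean distance between the 3-D points X and Y:
# sqrt((5-9)**2 + (7-15)**2 + (20-4)**2) = sqrt(16 + 64 + 256) = sqrt(336)
distance = np.linalg.norm(X - Y)

# The same value written out explicitly
distance_explicit = np.sqrt(np.sum((X - Y) ** 2))
print(round(float(distance), 4))  # 18.3303
```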

4 TASK #4: PERFORM ARRAY SLICING AND INDEXING

[5]: my_numpy_array = np.array([3, 5, 6, 2, 8, 10, 20, 50])
my_numpy_array

[5]: array([ 3, 5, 6, 2, 8, 10, 20, 50])

[6]: # Access a specific element by its index (-1 refers to the last element)
my_numpy_array[-1]

[6]: 50

[7]: # Elements from index 0 up to, but NOT including, index 3
my_numpy_array[0:3]

[7]: array([3, 5, 6])

[8]: # Broadcasting: assign a single value to several elements of the array at once
my_numpy_array[0:4] = 7
my_numpy_array

[8]: array([ 7, 7, 7, 7, 8, 10, 20, 50])
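A basic slice such as `my_numpy_array[0:4]` is a view: it shares memory with the original array, so writing through the slice changes the original. This sketch contrasts a view with an explicit `.copy()`:

```python
import numpy as np

original = np.array([3, 5, 6, 2, 8, 10, 20, 50])

# Basic slicing returns a view that shares memory with the original array
view = original[0:4]
view[:] = 7
print(original)  # [ 7  7  7  7  8 10 20 50]

# Use .copy() when you need an independent array
original = np.array([3, 5, 6, 2, 8, 10, 20, 50])
independent = original[0:4].copy()
independent[:] = 7
print(original)  # [ 3  5  6  2  8 10 20 50]
```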

[9]: # Let's define a two-dimensional numpy array
matrix = np.random.randint(1, 10, (4, 4))
matrix

[9]: array([[8, 6, 8, 4],
            [5, 4, 9, 9],
            [7, 6, 4, 1],
            [1, 4, 2, 1]])

[10]: # Get a row from a matrix (-1 refers to the last row)
matrix[-1]

[10]: array([1, 4, 2, 1])

[11]: # Get one element (row 0, column 0)
matrix[0][0]

[11]: 8
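NumPy also accepts a single comma-separated index, `matrix[row, col]`. This avoids the intermediate row that `matrix[0][0]` creates, and it extends naturally to whole columns and sub-blocks. The sketch hard-codes the matrix printed above so the results are reproducible:

```python
import numpy as np

matrix = np.array([[8, 6, 8, 4],
                   [5, 4, 9, 9],
                   [7, 6, 4, 1],
                   [1, 4, 2, 1]])

print(matrix[0, 0])      # 8: row 0, column 0 (same as matrix[0][0])
print(matrix[:, 1])      # [6 4 6 4]: every row, column 1
print(matrix[-1, :])     # [1 4 2 1]: the last row
block = matrix[1:3, 2:4] # rows 1-2, columns 2-3
```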

MINI CHALLENGE #4: - In the following matrix, replace the last row with 0
X = [2 30 20 -2 -4]
[3 4 40 -3 -2]
[-3 4 -6 90 10]
[25 45 34 22 12]
[13 24 22 32 37]

[13]: X = np.array([[2, 30, 20, -2, -4],
                   [3, 4, 40, -3, -2],
                   [-3, 4, -6, 90, 10],
                   [25, 45, 34, 22, 12],
                   [13, 24, 22, 32, 37]])
X

[13]: array([[ 2, 30, 20, -2, -4],
             [ 3,  4, 40, -3, -2],
             [-3,  4, -6, 90, 10],
             [25, 45, 34, 22, 12],
             [13, 24, 22, 32, 37]])

[30]: X[4] = 0  # row 4 is the last row; X[-1] = 0 works for any number of rows
X

[30]: array([[ 2, 30, 20, -2, -4],
             [ 3,  4, 40, -3, -2],
             [-3,  4, -6, 90, 10],
             [25, 45, 34, 22, 12],
             [ 0,  0,  0,  0,  0]])

5 TASK #5: PERFORM ELEMENT SELECTION (CONDITIONAL)
[33]: matrix = np.random.randint(1, 10, (5, 5))
matrix

[33]: array([[8, 1, 5, 1, 6],
             [2, 9, 8, 5, 9],
             [4, 8, 4, 9, 2],
             [2, 8, 8, 3, 6],
             [4, 6, 9, 5, 8]])

[46]: # Obtain the elements greater than 7 (the result is a 1-D array)
new_matrix = matrix[ matrix > 7 ]
new_matrix

[46]: array([8, 9, 8, 9, 8, 9, 8, 8, 9, 8])

[47]: # Obtain odd elements only
new_matrix = matrix[ matrix % 2 == 1]
new_matrix

[47]: array([1, 5, 1, 9, 5, 9, 9, 3, 9, 5])
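Conditions can be combined with `&` (and) and `|` (or). Each comparison needs its own parentheses, because in Python these operators bind more tightly than `>` or `==`. The sketch hard-codes the matrix printed above for reproducibility:

```python
import numpy as np

matrix = np.array([[8, 1, 5, 1, 6],
                   [2, 9, 8, 5, 9],
                   [4, 8, 4, 9, 2],
                   [2, 8, 8, 3, 6],
                   [4, 6, 9, 5, 8]])

# Odd elements that are also greater than 4
odd_and_large = matrix[(matrix % 2 == 1) & (matrix > 4)]
print(odd_and_large)  # [5 9 5 9 9 9 5]

# The boolean mask itself has the same shape as the matrix
mask = matrix > 7
print(mask.sum())  # 10
```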

MINI CHALLENGE #5: - In the following matrix, replace negative elements by 0 and replace odd
elements with -2
X = [2 30 20 -2 -4]
[3 4 40 -3 -2]
[-3 4 -6 90 10]
[25 45 34 22 12]
[13 24 22 32 37]
[4]: X = np.array([[2, 30, 20, -2, -4],
[3, 4, 40, -3, -2],
[-3, 4, -6, 90, 10],
[25, 45, 34, 22, 12],
[13, 24, 22, 32, 37]])
X

[4]: array([[ 2, 30, 20, -2, -4],
            [ 3,  4, 40, -3, -2],
            [-3,  4, -6, 90, 10],
            [25, 45, 34, 22, 12],
            [13, 24, 22, 32, 37]])

[23]: X[ X < 0 ] = 0       # replace negative elements with 0 first
X[ X % 2 == 1 ] = -2  # then replace odd elements with -2
X

[23]: array([[ 2, 30, 20,  0,  0],
             [-2,  4, 40,  0,  0],
             [ 0,  4,  0, 90, 10],
             [-2, -2, 34, 22, 12],
             [-2, 24, 22, 32, -2]])
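The in-place assignments above modify X, so re-running the cell changes the result again. `np.where` builds a new array and leaves X untouched. In NumPy, as in Python, `-3 % 2` is `1`, so odd negative numbers also pass the odd test. The negative rule must therefore take priority, which the outer `np.where` ensures. A sketch of the same two rules:

```python
import numpy as np

X = np.array([[2, 30, 20, -2, -4],
              [3, 4, 40, -3, -2],
              [-3, 4, -6, 90, 10],
              [25, 45, 34, 22, 12],
              [13, 24, 22, 32, 37]])

# np.where(condition, a, b) takes a where condition is True, b elsewhere.
# Outer rule: negatives -> 0. Inner rule (for everything else): odd -> -2.
result = np.where(X < 0, 0, np.where(X % 2 == 1, -2, X))
print(result)
```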

6 TASK #6: UNDERSTAND PANDAS FUNDAMENTALS
[35]: # Pandas is a data manipulation and analysis tool built on top of NumPy.
# Pandas uses a data structure known as a DataFrame (think of it as Microsoft
# Excel in Python).

# DataFrames let programmers store and manipulate data in a tabular
# fashion (rows and columns).

# Series vs. DataFrame? A Series is a single column of a DataFrame.

[1]: import pandas as pd

[25]: # Let's define a two-dimensional Pandas DataFrame
# Note that you can create a pandas DataFrame from a Python dictionary
bank_client_df = pd.DataFrame({'Bank Client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})
bank_client_df

[25]: Bank Client ID Bank Client Name Net Worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
2 333 Mitch 10000 9
3 444 Ryan 2000 5

[26]: # Let's obtain the data type
type(bank_client_df)

[26]: pandas.core.frame.DataFrame

[28]: # Use .head(n) to view only the first n rows
bank_client_df.head(2)

[28]: Bank Client ID Bank Client Name Net Worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4

[29]: # Use .tail(n) to view only the last n rows
bank_client_df.tail(2)

[29]: Bank Client ID Bank Client Name Net Worth [$] Years with bank
2 333 Mitch 10000 9
3 444 Ryan 2000 5
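Besides `head()` and `tail()`, a few attributes and methods give a quick overview of a DataFrame. This sketch uses the same data. Selecting one column returns a `pd.Series`, which matches the Series/DataFrame distinction described above.

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank Client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})

print(bank_client_df.shape)          # (4, 4): 4 rows, 4 columns
print(list(bank_client_df.columns))  # the column labels
print(bank_client_df.dtypes)         # the data type of each column

# Selecting a single column returns a Series
net_worth = bank_client_df['Net Worth [$]']
print(net_worth.mean())              # 11125.0

# describe() summarises the numeric columns (count, mean, std, min, quartiles, max)
summary = bank_client_df.describe()
print(summary)
```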

MINI CHALLENGE #6: - A portfolio contains a collection of securities such as stocks, bonds and
ETFs. Define a dataframe named ‘portfolio_df’ that holds 3 different stock ticker symbols, number
of shares, and price per share (feel free to choose any stocks) - Calculate the total value of the
portfolio including all stocks
[44]: portfolio_df = pd.DataFrame({'stock ticker symbol':['AAPL', 'AMZN', 'T'],
'price per share [$]':[3500, 200, 40],
'Number of stocks': [3, 4, 9]})
portfolio_df

[44]: stock ticker symbol price per share [$] Number of stocks
0 AAPL 3500 3
1 AMZN 200 4
2 T 40 9

[46]: # Value of each position = price per share x number of shares
stocks_dollar_value = portfolio_df['price per share [$]'] * portfolio_df['Number of stocks']

stocks_dollar_value.sum()

[46]: 11660
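You can also store each position's value as a new column, so it stays visible next to its inputs. This sketch uses the same illustrative prices. The column name `position value [$]` is an arbitrary choice.

```python
import pandas as pd

portfolio_df = pd.DataFrame({'stock ticker symbol': ['AAPL', 'AMZN', 'T'],
                             'price per share [$]': [3500, 200, 40],
                             'Number of stocks': [3, 4, 9]})

# Store each position's value as a new column, then total it
portfolio_df['position value [$]'] = (portfolio_df['price per share [$]']
                                      * portfolio_df['Number of stocks'])
total_value = portfolio_df['position value [$]'].sum()
print(total_value)  # 11660
```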

7 TASK #7: PANDAS WITH CSV AND HTML DATA


[47]: # Pandas can read tabular data from a CSV file (pd.read_csv) or from the HTML
# tables of a web page (pd.read_html, which returns a list of DataFrames)
house_price_df = pd.read_html('https://www.livingin-canada.com/house-prices-canada.html')

house_price_df[0]

[47]:  City             Average House Price  12 Month Change
0      Vancouver, BC    $1,036,000           + 2.63 %
1      Toronto, Ont     $870,000             +10.2 %
2      Ottawa, Ont      $479,000             + 15.4 %
3      Calgary, Alb     $410,000             – 1.5 %
4      Montreal, Que    $435,000             + 9.3 %
5      Halifax, NS      $331,000             + 3.6 %
6      Regina, Sask     $254,000             – 3.9 %
7      Fredericton, NB  $198,000             – 4.3 %
8      (adsbygoogle = window.adsbygoogle || []).push(…  (adsbygoogle = window.adsbygoogle || []).push(…  (adsbygoogle = window.adsbygoogle || []).push(…

[48]: house_price_df[1]

[48]:  Province                 Average House Price  12 Month Change
0      British Columbia         $736,000             + 7.6 %
1      Ontario                  $594,000             – 3.2 %
2      Alberta                  $353,000             – 7.5 %
3      Quebec                   $340,000             + 7.6 %
4      Manitoba                 $295,000             – 1.4 %
5      Saskatchewan             $271,000             – 3.8 %
6      Nova Scotia              $266,000             + 3.5 %
7      Prince Edward Island     $243,000             + 3.0 %
8      Newfoundland / Labrador  $236,000             – 1.6 %
9      New Brunswick            $183,000             – 2.2 %
10     Canadian Average         $488,000             – 1.3 %
11     (adsbygoogle = window.adsbygoogle || []).push(…  (adsbygoogle = window.adsbygoogle || []).push(…  (adsbygoogle = window.adsbygoogle || []).push(…
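`pd.read_html` needs network access and an HTML parser library (lxml, or BeautifulSoup4 with html5lib). For CSV data, the equivalent reader is `pd.read_csv`. This self-contained sketch reads three rows copied from the first table above out of an in-memory CSV string. It then converts the price text to integers so the prices can be used in calculations.

```python
import io
import pandas as pd

# Three rows copied from the first table above, as CSV text
csv_text = """City,Average House Price,12 Month Change
"Vancouver, BC","$1,036,000",+ 2.63 %
"Toronto, Ont","$870,000",+10.2 %
"Ottawa, Ont","$479,000",+ 15.4 %
"""

# read_csv accepts a file path, a URL or any file-like object such as StringIO
house_df = pd.read_csv(io.StringIO(csv_text))

# Strip "$" and "," so the prices can be used in arithmetic
house_df['Average House Price'] = (house_df['Average House Price']
                                   .str.replace(r'[$,]', '', regex=True)
                                   .astype(int))
print(house_df['Average House Price'].tolist())  # [1036000, 870000, 479000]
```

The scraped tables above would need similar cleaning. They also contain an advertising-script row, and some changes use "–" (an en dash) as the minus sign.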

[41]: # Read tabular data using read_html

MINI CHALLENGE #7: - Write code that uses Pandas to read tabular US retirement data -
You can use data from here: https://www.ssa.gov/oact/progdata/nra.html

[ ]: retirement_df = pd.read_html('https://www.ssa.gov/oact/progdata/nra.html')
retirement_df[0]

8 TASK #8: PANDAS OPERATIONS


[58]: # Let's define a dataframe as follows:
bank_client_df = pd.DataFrame({'Bank Client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})
bank_client_df

[58]: Bank Client ID Bank Client Name Net Worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
2 333 Mitch 10000 9
3 444 Ryan 2000 5

[59]: # Pick the rows that satisfy a given criterion
df_loyal = bank_client_df[ bank_client_df['Years with bank'] >= 5]
df_loyal

[59]: Bank Client ID Bank Client Name Net Worth [$] Years with bank
2 333 Mitch 10000 9
3 444 Ryan 2000 5
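You can combine several conditions in one filter, and `.loc` selects rows and a column at the same time. A sketch on the same four clients:

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank Client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})

# Combine conditions with & (and) / | (or), wrapping each one in parentheses
loyal_and_wealthy = bank_client_df[(bank_client_df['Years with bank'] >= 5)
                                   & (bank_client_df['Net Worth [$]'] >= 5000)]
print(loyal_and_wealthy['Bank Client Name'].tolist())  # ['Mitch']

# .loc takes a boolean mask for the rows and a label for the column
names = bank_client_df.loc[bank_client_df['Years with bank'] >= 5, 'Bank Client Name']
print(names.tolist())  # ['Mitch', 'Ryan']
```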

[60]: # Delete a column from a DataFrame
del bank_client_df['Bank Client ID']

bank_client_df

[60]: Bank Client Name Net Worth [$] Years with bank
0 Chanel 3500 3
1 Steve 29000 4
2 Mitch 10000 9
3 Ryan 2000 5

MINI CHALLENGE #8: - Using “bank_client_df” DataFrame, leverage pandas operations to
only select high net worth individuals with a minimum of $5000 - What is the combined net worth for
all customers with a net worth of $5000+?
[62]: df_high_networth = bank_client_df[ bank_client_df['Net Worth [$]'] >=5000]
df_high_networth

[62]: Bank Client Name Net Worth [$] Years with bank
1 Steve 29000 4
2 Mitch 10000 9
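The cell above answers the first part of the challenge. The combined net worth is the sum of the filtered column. This sketch rebuilds the DataFrame as it stands after the ID column was deleted:

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net Worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})

df_high_networth = bank_client_df[bank_client_df['Net Worth [$]'] >= 5000]

# Second part of the challenge: total net worth of the selected clients
combined_networth = df_high_networth['Net Worth [$]'].sum()
print(combined_networth)  # 39000
```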

9 TASK #9: PANDAS WITH FUNCTIONS


[4]: # Let's define a dataframe as follows:
bank_client_df = pd.DataFrame({'Bank client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})
bank_client_df

[4]: Bank client ID Bank Client Name Net worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
2 333 Mitch 10000 9
3 444 Ryan 2000 5

[2]: # Define a function that increases each client's net worth (stocks) by a fixed
# 20% (for simplicity's sake)

def networth_update(balance):
    return balance * 1.2

[5]: # You can apply a function to the DataFrame
bank_client_df['Net worth [$]'].apply(networth_update)

[5]: 0 4200.0
1 34800.0
2 12000.0
3 2400.0
Name: Net worth [$], dtype: float64
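`apply()` returns a new Series and does not change `bank_client_df`. For simple arithmetic, operating on the whole column gives the same values and is usually faster on large data. This sketch compares the two approaches and then assigns the result back:

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net worth [$]': [3500, 29000, 10000, 2000]})

def networth_update(balance):
    # Increase the balance by a fixed 20%
    return balance * 1.2

# apply() calls the function once per element and returns a new Series
via_apply = bank_client_df['Net worth [$]'].apply(networth_update)

# Column arithmetic computes the same values without a Python-level loop
vectorized = bank_client_df['Net worth [$]'] * 1.2

# Neither result changes bank_client_df until it is assigned back
bank_client_df['Net worth [$]'] = vectorized
print(bank_client_df['Net worth [$]'].tolist())
```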

MINI CHALLENGE #9: - Define a function that triples the stock prices and adds $200 - Apply
the function to the DataFrame - Calculate the updated total net worth of all clients combined
[8]: def networth_update(balance):
    return balance * 3 + 200

[11]: results = bank_client_df['Net worth [$]'].apply(networth_update)
results

[11]: 0 10700
1 87200
2 30200
3 6200
Name: Net worth [$], dtype: int64
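The last part of the challenge, the updated combined net worth, is the sum of `results`. A sketch using the same four balances:

```python
import pandas as pd

net_worth = pd.Series([3500, 29000, 10000, 2000], name='Net worth [$]')

def networth_update(balance):
    # Triple the balance, then add $200
    return balance * 3 + 200

results = net_worth.apply(networth_update)

# Final part of the challenge: the updated combined net worth
total_networth = results.sum()
print(total_networth)  # 134300
```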

10 TASK #10: PERFORM SORTING AND ORDERING IN PANDAS
[12]: # Let's define a dataframe as follows:
bank_client_df = pd.DataFrame({'Bank client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})
bank_client_df

[12]: Bank client ID Bank Client Name Net worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
2 333 Mitch 10000 9
3 444 Ryan 2000 5

[14]: # You can sort the rows of the DataFrame by the number of years with the bank

bank_client_df.sort_values(by = 'Years with bank')

[14]: Bank client ID Bank Client Name Net worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
3 444 Ryan 2000 5
2 333 Mitch 10000 9

[15]: # Note that nothing changed in memory! sort_values returns a new, sorted DataFrame;
# set inplace=True (or reassign the result) to change bank_client_df itself

bank_client_df

[15]: Bank client ID Bank Client Name Net worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
2 333 Mitch 10000 9
3 444 Ryan 2000 5

[ ]: # Set inplace = True to ensure that change has taken place in memory
bank_client_df.sort_values(by = 'Years with bank', inplace = True)

[16]: # Note that now the change (ordering) took place
bank_client_df

[16]: Bank client ID Bank Client Name Net worth [$] Years with bank
0 111 Chanel 3500 3
1 222 Steve 29000 4
3 444 Ryan 2000 5
2 333 Mitch 10000 9
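Reassigning the result is an alternative to `inplace=True`, and `ascending=False` reverses the order. Sorting moves the original row labels along with the rows; `reset_index(drop=True)` renumbers them. A sketch:

```python
import pandas as pd

bank_client_df = pd.DataFrame({'Bank client ID': [111, 222, 333, 444],
                               'Bank Client Name': ['Chanel', 'Steve', 'Mitch', 'Ryan'],
                               'Net worth [$]': [3500, 29000, 10000, 2000],
                               'Years with bank': [3, 4, 9, 5]})

# Reassign the sorted result instead of using inplace=True
by_worth = bank_client_df.sort_values(by='Net worth [$]', ascending=False)
print(by_worth['Bank Client Name'].tolist())  # ['Steve', 'Mitch', 'Chanel', 'Ryan']
print(by_worth.index.tolist())                # [1, 2, 0, 3]

# reset_index(drop=True) renumbers the rows 0..n-1 after sorting
by_worth = by_worth.reset_index(drop=True)
print(by_worth.index.tolist())                # [0, 1, 2, 3]
```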

11 TASK #11: PERFORM CONCATENATING AND MERGING WITH PANDAS
[ ]: # Check this out: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

[24]: df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index = [0, 1, 2, 3])
df1

[24]: A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3

[25]: df1

[25]: A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3

[27]: df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                   index = [4, 5, 6, 7])

[28]: df2

[28]: A B C D
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7

[29]: df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D8', 'D9', 'D10', 'D11']},
                   index = [8, 9, 10, 11])

[30]: df3

[30]: A B C D
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11

[31]: pd.concat([df1, df2, df3])

[31]: A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1

2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
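`pd.concat` keeps each frame's index labels. Frames that both start at 0 therefore produce repeated labels, as happens with `bank_df_all` in the project below. This sketch uses two small hypothetical frames to show `ignore_index=True` and `axis=1`:

```python
import pandas as pd

df_a = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
df_b = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[0, 1])

# Both frames use index labels 0 and 1, so a plain concat repeats them
stacked = pd.concat([df_a, df_b])
print(stacked.index.tolist())     # [0, 1, 0, 1]

# ignore_index=True builds a fresh 0..n-1 index instead
renumbered = pd.concat([df_a, df_b], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# axis=1 places frames side by side, aligning rows on the index
side_by_side = pd.concat([df_a, df_b], axis=1)
print(side_by_side.shape)         # (2, 4)
```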

12 TASK #12: PROJECT AND CONCLUDING REMARKS


• Define a dataframe named ‘Bank_df_1’ that contains the first and last names for 5 bank
clients with IDs = 1, 2, 3, 4, 5
• Assume that the bank got 5 new clients, define another dataframe named ‘Bank_df_2’ that
contains the new clients with IDs = 6, 7, 8, 9, 10
• Let’s assume we obtained additional information (Annual Salary) about all our bank customers
(10 customers)
• Concatenate both ‘Bank_df_1’ and ‘Bank_df_2’ dataframes
• Merge client names and their newly added salary information using the ‘Bank Client ID’
• Let’s assume that you became a new client to the bank
• Define a new DataFrame that contains your information such as client ID (choose 11), first
name, last name, and annual salary.
• Add this new dataframe to the combined dataframe ‘bank_df_all’.
[41]: raw_data = {'Bank Client ID': ['1', '2', '3', '4', '5'],
            'First Name': ['ElMalick', 'Ibra', 'Fallou', 'Idiatou', 'Rassoul'],
            'Last Name': ['Ndiaye', 'Dione', 'Diop', 'Bah', 'Fall']}

Bank_df_1 = pd.DataFrame(raw_data, columns = ['Bank Client ID', 'First Name', 'Last Name'])

Bank_df_1

[41]: Bank Client ID First Name Last Name
0 1 ElMalick Ndiaye
1 2 Ibra Dione
2 3 Fallou Diop
3 4 Idiatou Bah
4 5 Rassoul Fall

[42]: raw_data = {'Bank Client ID': ['6', '7', '8', '9', '10'],
            'First Name': ['Babacar', 'Ibrahima', 'Assane', 'Youssoufa', 'alphonse'],
            'Last Name': ['Kane', 'Ndior', 'Diakhoumpa', 'Sy', 'Mbengue']}

Bank_df_2 = pd.DataFrame(raw_data, columns = ['Bank Client ID', 'First Name', 'Last Name'])

Bank_df_2

[42]: Bank Client ID First Name Last Name
0 6 Babacar Kane
1 7 Ibrahima Ndior
2 8 Assane Diakhoumpa
3 9 Youssoufa Sy
4 10 alphonse Mbengue

[51]: raw_data = {'Bank Client ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            'Annual Salary[$/year]': [25000, 35000, 45000, 48000, 49000, 32000,
                                      33000, 34000, 23000, 22000]}

bank_df_salary = pd.DataFrame(raw_data, columns = ['Bank Client ID', 'Annual Salary[$/year]']).astype(int)

bank_df_salary

[51]: Bank Client ID Annual Salary[$/year]
0 1 25000
1 2 35000
2 3 45000
3 4 48000
4 5 49000
5 6 32000
6 7 33000
7 8 34000
8 9 23000
9 10 22000

[55]: bank_df_all = pd.concat([Bank_df_1, Bank_df_2])
bank_df_all

[55]: Bank Client ID First Name Last Name
0 1 ElMalick Ndiaye
1 2 Ibra Dione
2 3 Fallou Diop
3 4 Idiatou Bah
4 5 Rassoul Fall
0 6 Babacar Kane
1 7 Ibrahima Ndior
2 8 Assane Diakhoumpa
3 9 Youssoufa Sy
4 10 alphonse Mbengue

[58]: # Convert the IDs to integers so both frames use the same key type for the merge
bank_df_all['Bank Client ID'] = bank_df_all['Bank Client ID'].astype(int)
bank_df_salary['Bank Client ID'] = bank_df_salary['Bank Client ID'].astype(int)

[59]: bank_df_all = pd.merge(bank_df_all, bank_df_salary, on='Bank Client ID')
bank_df_all

[59]: Bank Client ID First Name Last Name Annual Salary[$/year]
0 1 ElMalick Ndiaye 25000
1 2 Ibra Dione 35000
2 3 Fallou Diop 45000
3 4 Idiatou Bah 48000
4 5 Rassoul Fall 49000
5 6 Babacar Kane 32000
6 7 Ibrahima Ndior 33000
7 8 Assane Diakhoumpa 34000
8 9 Youssoufa Sy 23000
9 10 alphonse Mbengue 22000

[84]: new_client = {'Bank Client ID': [11],
              'First Name': ['Cheikh'],
              'Last Name': ['Thiame'],
              'Annual Salary[$/year]': [5000]}
# Use exactly the same column names as bank_df_all (no space before "[$/year]"),
# otherwise the salary column is not matched and shows up as NaN
new_client_df = pd.DataFrame(new_client, columns = ['Bank Client ID', 'First Name', 'Last Name', 'Annual Salary[$/year]'])

new_client_df

[84]: Bank Client ID First Name Last Name Annual Salary[$/year]
0 11 Cheikh Thiame 5000
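The project's last step is to add the new client to `bank_df_all`. `pd.concat` aligns rows by column label. A label spelled differently in one frame (for example `'Annual Salary [$/year]'` versus `'Annual Salary[$/year]'`) therefore becomes a separate column filled with NaN. To stay short, this sketch uses a three-client subset of the data above and stores the IDs as integers throughout.

```python
import pandas as pd

# Client names and salaries (subset), with IDs stored as integers from the start
bank_df_all = pd.DataFrame({'Bank Client ID': [1, 2, 3],
                            'First Name': ['ElMalick', 'Ibra', 'Fallou'],
                            'Last Name': ['Ndiaye', 'Dione', 'Diop']})
bank_df_salary = pd.DataFrame({'Bank Client ID': [1, 2, 3],
                               'Annual Salary[$/year]': [25000, 35000, 45000]})
bank_df_all = pd.merge(bank_df_all, bank_df_salary, on='Bank Client ID')

# The new client's column labels match bank_df_all exactly
new_client_df = pd.DataFrame({'Bank Client ID': [11],
                              'First Name': ['Cheikh'],
                              'Last Name': ['Thiame'],
                              'Annual Salary[$/year]': [5000]})

# Final project step: append the new client to bank_df_all
new_df = pd.concat([bank_df_all, new_client_df], ignore_index=True)
print(new_df.shape)  # (4, 4)
```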

13 EXCELLENT JOB!

14 MINI CHALLENGES SOLUTIONS


MINI CHALLENGE #1 SOLUTION: - Write code that creates the following 2x4 numpy array
[[3 7 9 3]
[4 3 2 2]]
[ ]: x = np.array([[3, 7, 9, 3], [4, 3, 2, 2]])
x

MINI CHALLENGE #2 SOLUTION: - Write code that takes in a positive integer “x” from the
user and creates a 1x10 array with random numbers ranging from 0 to “x”
[ ]: x = int(input("Please enter a positive integer value: "))
# randint excludes the upper bound, so x + 1 makes x itself a possible value
x = np.random.randint(0, x + 1, size=(1, 10))
x

MINI CHALLENGE #3 SOLUTION: - Given the X and Y values below, obtain the distance
between them
X = [5, 7, 20]
Y = [9, 15, 4]
[ ]: X = np.array([5, 7, 20])
Y = np.array([9, 15, 4])
Z = np.sqrt(X**2 + Y**2)
Z

MINI CHALLENGE #4 SOLUTION: - In the following matrix, replace the last row with 0
X = [2 30 20 -2 -4]
[3 4 40 -3 -2]
[-3 4 -6 90 10]
[25 45 34 22 12]
[13 24 22 32 37]
[ ]: X = np.array([[2, 30, 20, -2, -4],
[3, 4, 40, -3, -2],
[-3, 4, -6, 90, 10],
[25, 45, 34, 22, 12],
[13, 24, 22, 32, 37]])

[ ]: X[4] = 0
X

MINI CHALLENGE #5 SOLUTION: - In the following matrix, replace negative elements by 0 and
replace odd elements with -2
X = [2 30 20 -2 -4]
[3 4 40 -3 -2]
[-3 4 -6 90 10]
[25 45 34 22 12]
[13 24 22 32 37]
[ ]: X = np.array([[2, 30, 20, -2, -4],
[3, 4, 40, -3, -2],
[-3, 4, -6, 90, 10],
[25, 45, 34, 22, 12],
[13, 24, 22, 32, 37]])

X[X<0] = 0
X[X%2==1] = -2
X

MINI CHALLENGE #6 SOLUTION: - A portfolio contains a collection of securities such as stocks,
bonds and ETFs. Define a dataframe named ‘portfolio_df’ that holds 3 different stock ticker
symbols, number of shares, and price per share (feel free to choose any stocks) - Calculate the total
value of the portfolio including all stocks
[ ]: portfolio_df = pd.DataFrame({'stock ticker symbols':['AAPL', 'AMZN', 'T'],
'price per share [$]':[3500, 200, 40],
'Number of stocks':[3, 4, 9]})
portfolio_df

[ ]: stocks_dollar_value = portfolio_df['price per share [$]'] * portfolio_df['Number of stocks']

print(stocks_dollar_value)
print('Total portfolio value = {}'.format(stocks_dollar_value.sum()))

MINI CHALLENGE #7 SOLUTION: - Write code that uses Pandas to read tabular US retirement
data - You can use data from here: https://www.ssa.gov/oact/progdata/nra.html

[ ]: # Read tabular data using read_html
retirement_age_df = pd.read_html('https://www.ssa.gov/oact/progdata/nra.html')
retirement_age_df

MINI CHALLENGE #8 SOLUTION: - Using “bank_client_df” DataFrame, leverage pandas operations
to only select high net worth individuals with a minimum of $5000 - What is the combined
net worth for all customers with a net worth of $5000+?
[ ]: df_high_networth = bank_client_df[ (bank_client_df['Net worth [$]'] >= 5000) ]
df_high_networth

[ ]: df_high_networth['Net worth [$]'].sum()

MINI CHALLENGE #9 SOLUTION: - Define a function that triples the stock prices and adds
$200 - Apply the function to the DataFrame - Calculate the updated total networth of all clients
combined
[ ]: def networth_update(balance):
    return balance * 3 + 200

[ ]: # You can apply a function to the DataFrame
results = bank_client_df['Net worth [$]'].apply(networth_update)
results

[ ]: results.sum()

PROJECT SOLUTION:
[ ]: # Creating a dataframe from a dictionary
# Let's define a dataframe with a list of bank clients with IDs = 1, 2, 3, 4, 5

raw_data = {'Bank Client ID': ['1', '2', '3', '4', '5'],
            'First Name': ['Nancy', 'Alex', 'Shep', 'Max', 'Allen'],
            'Last Name': ['Rob', 'Ali', 'George', 'Mitch', 'Steve']}

Bank_df_1 = pd.DataFrame(raw_data, columns = ['Bank Client ID', 'First Name', 'Last Name'])

Bank_df_1

# Let's define another dataframe for a separate list of clients (IDs = 6, 7, 8, 9, 10)

raw_data = {
    'Bank Client ID': ['6', '7', '8', '9', '10'],
    'First Name': ['Bill', 'Dina', 'Sarah', 'Heather', 'Holly'],
    'Last Name': ['Christian', 'Mo', 'Steve', 'Bob', 'Michelle']}
Bank_df_2 = pd.DataFrame(raw_data, columns = ['Bank Client ID', 'First Name', 'Last Name'])

Bank_df_2

# Let's assume we obtained additional information (Annual Salary) about our bank customers
# Note that data obtained is for all clients with IDs 1 to 10

raw_data = {
    'Bank Client ID': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
    'Annual Salary [$/year]': [25000, 35000, 45000, 48000, 49000, 32000,
                               33000, 34000, 23000, 22000]}

bank_df_salary = pd.DataFrame(raw_data, columns = ['Bank Client ID', 'Annual Salary [$/year]'])

bank_df_salary

# Let's concatenate both dataframes #1 and #2
# Note that we now have client IDs from 1 to 10
bank_df_all = pd.concat([Bank_df_1, Bank_df_2])
bank_df_all

# Let's merge all data on 'Bank Client ID'
bank_df_all = pd.merge(bank_df_all, bank_df_salary, on = 'Bank Client ID')

bank_df_all

[ ]: new_client = {
    'Bank Client ID': ['11'],
    'First Name': ['Ry'],
    'Last Name': ['Aly'],
    'Annual Salary [$/year]': [1000]}
new_client_df = pd.DataFrame(new_client, columns = ['Bank Client ID', 'First Name', 'Last Name', 'Annual Salary [$/year]'])

new_client_df

[70]: new_df = pd.concat([bank_df_all, new_client_df], axis = 0)
new_df

[70]:  Bank Client ID  First Name  Last Name   Annual Salary[$/year]  Annual Salary [$/year]
0      1               ElMalick    Ndiaye      25000.0                NaN
1      2               Ibra        Dione       35000.0                NaN
2      3               Fallou      Diop        45000.0                NaN
3      4               Idiatou     Bah         48000.0                NaN
4      5               Rassoul     Fall        49000.0                NaN
5      6               Babacar     Kane        32000.0                NaN
6      7               Ibrahima    Ndior       33000.0                NaN
7      8               Assane      Diakhoumpa  34000.0                NaN
8      9               Youssoufa   Sy          23000.0                NaN
9      10              alphonse    Mbengue     22000.0                NaN
0      11              Cheikh      Thiame      NaN                    5000.0
