Data Toolkit Assignment
In [1]: ''' 1. Demonstrate three different methods for creating identical 2D arrays in N
method and the final output after each method '''
# 1st Method:
import numpy as np
arr= np.array([[1,2,3] , [4,5,6]])
print(f" 'Method 1st output':\n{arr} ")
# 2nd Method:
arr2 = np.ones((2,3), dtype=int) * np.array([1,2,3])  # integer dtype so the result matches Methods 1 and 3
arr2[1] = arr2[1] * 2 + np.array([2,1,0])
print(f" 'Method 2nd output':\n{arr2} ")
# 3rd Method:
arr3=np.empty((2,3), dtype=int)
arr3[0]= [1,2,3]
arr3[1]= [4,5,6]
print(f" 'Method 3rd output':\n{arr3} ")
In [35]: ''' 2. Using the Numpy function, generate an array of 100 evenly spaced numbers
Reshape that 1D array into a 2D array '''
arr1D=np.linspace(1,10,100)
arr2D=arr1D.reshape((10,10))
arr2D
https://gray-doctor-onblz.pwskills.app/lab/tree/work/25-5-24_Data_ToolKit_Assignment.ipynb 1/11
26/05/2024, 20:44 25-5-24_Data_ToolKit_Assignment
np.array() is used to convert a list, tuple, etc. into a NumPy array, and it cre
np.asarray() is also used to convert a list, tuple, etc. into a NumPy array,
shows the changes in the original array.
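The difference described above can be checked with a short sketch. It assumes the input is already a NumPy array, which is the case where the two functions behave differently: np.array() copies by default, while np.asarray() returns the existing array without copying. The variable names are illustrative only.

```python
import numpy as np

original = np.array([1, 2, 3])

copied = np.array(original)     # np.array() makes a copy by default
aliased = np.asarray(original)  # np.asarray() reuses an existing ndarray

original[0] = 99  # modify the source array in place

print(copied)   # the copy keeps the old value
print(aliased)  # the alias reflects the change
```

When the input is a plain list or tuple, both functions build a new array, so the distinction only matters for inputs that are already arrays.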
In [10]: ''' 4.Generate a 3x3 array with random floating-point numbers between 5 and 20 t
the array to 2 decimal places '''
arr=np.random.uniform(5,20,(3,3))
rounded_arr=np.round(arr,2)
rounded_arr
In [11]: ''' 5. Create a NumPy array with random integers between 1 and 10 of shape (5,6)
perform the following operations:
arr=np.random.randint(1,11,size=(5,6))  # upper bound is exclusive, so 11 includes 10
even=[]
odd=[]
for row in arr:
    for x in row:
        if x%2 ==0:
            even.append(x)
        else:
            odd.append(x)
print(arr)
print(f" Even integers from Array: {even}")
print(f" Odd integers from Array: {odd}")
[[7 4 9 4 1 7]
[8 9 5 2 8 7]
[3 2 2 8 9 3]
[6 2 6 5 1 2]
[3 5 2 2 8 4]]
Even integers from Array: [4, 4, 8, 2, 8, 2, 2, 8, 6, 2, 6, 2, 2, 2, 8, 4]
Odd integers from Array: [7, 9, 1, 7, 9, 5, 7, 3, 9, 3, 5, 1, 3, 5]
In [6]: ''' 6. Create a 3-D NumPy array of shape (3, 3, 3) containing random integers Be
following operations:
a) Find the indices of the maximum values along each depth level (third axis).
b) Perform element-wise multiplication between both arrays '''
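arr1=np.random.uniform(1,10,(3,3,3)).astype(int)
# argmax along the third axis must use arr1, the 3-D array created above
max_value_indices = np.argmax(arr1, axis=2)
arr2=np.random.uniform(1,10,(3,3,3)).astype(int)
arr_multiplication=np.multiply(arr1,arr2)
print(arr1)
print(f"Indices of the maximum values along each depth level:\n{max_value_indices}")
print(f"element-wise multiplication:\n{arr_multiplication}")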
[[[3 1 3]
  [2 9 6]
  [5 1 5]]

 [[7 5 8]
  [1 6 1]
  [8 6 1]]

 [[9 2 7]
  [9 9 3]
  [4 8 5]]]
Indices of the maximum values along each depth level:
[[0 1 0]
 [2 1 0]
 [0 0 1]]
element-wise multiplication:
[[[18  8 24]
  [ 6 36 36]
  [ 5  7 10]]

 [[49 45 40]
  [ 2 42  3]
  [32 30  5]]

 [[ 9 12 28]
  [45 54  9]
  [24 40 35]]]
In [2]: ''' 7. Clean and transform the 'Phone' column in the sample dataset to remove no
convert it to a numeric data type. Also display the table attributes and data ty
import pandas as pd
df=pd.read_csv("People Data.csv")
df.head()
Out[2]:
   Index          User Id First Name Last Name  Gender             Email  P
2      3  810Ce0F276Badec     Sheryl    Lowery  Female  [email protected]  (599
4      5  9afFEafAe1CBBB9    Lindsey      Rice  Female  [email protected]  (390 1635
In [7]: df["Phone"].values.tolist()[:20]
Out[7]: [8571398239.0,
nan,
5997820605.0,
nan,
39041716353010.0,
8537800927.0,
9365574807895.0,
4709522945.0,
138204758.0,
56090350684985.0,
8629884096.0,
10418593844272.0,
801809918137308.0,
15111276660230.0,
9035458947.0,
4169790633058.0,
24405485211913.0,
92936685493587.0,
9732439193.0,
60611937790160.0]
In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Index 1000 non-null int64
1 User Id 1000 non-null object
2 First Name 1000 non-null object
3 Last Name 1000 non-null object
4 Gender 1000 non-null object
5 Email 1000 non-null object
6 Phone 979 non-null float64
7 Date of birth 1000 non-null object
8 Job Title 1000 non-null object
9 Salary 1000 non-null int64
dtypes: float64(1), int64(2), object(7)
memory usage: 78.2+ KB
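The cleaning step that question 7 asks for can be sketched as follows. This is a minimal example under stated assumptions: the sample phone strings are hypothetical stand-ins for the raw 'Phone' column of "People Data.csv", and stripping every non-digit character is one reasonable reading of the task, not necessarily the exact approach used to produce the values listed above.

```python
import pandas as pd

# Hypothetical sample standing in for the raw 'Phone' column.
sample = pd.DataFrame(
    {"Phone": ["857-139-8239", None, "(599)782-0605", "001-390-417-1635x3010"]}
)

# Drop every character that is not a digit, then convert to a numeric dtype.
digits_only = sample["Phone"].str.replace(r"\D", "", regex=True)
sample["Phone"] = pd.to_numeric(digits_only, errors="coerce")

print(sample["Phone"].tolist())
print(sample.dtypes)
```

Missing entries stay as NaN, which is why the column ends up as a float type rather than an integer type.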
a) Read the 'data.csv' file using pandas, skipping the first 50 rows.
b) Only read the columns: 'Last Name', 'Gender', 'Email', 'Phone' and 'Salary' fr
c) Display the first 10 rows of the filtered dataset.
d) Extract the 'Salary' column as a Series and display its last 5 values '''
In [13]: df1.head()
Out[13]:
    Index          User Id First Name Last Name  Gender             Email
50     51  CccE5DAb6E288e5         Jo    Zavala    Male  [email protected]  001-85 9935
51     52  DfBDc3621D4bcec     Joshua     Carey  Female  [email protected]  001-27 84
54     55  FDaFD0c3f5387EC  Christina    Conrad    Male  [email protected]  001-59 74
In [22]: df2.head(10)
In [31]: df2["Salary"].tail(5)
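Steps a) to d) of this question can be sketched with pd.read_csv alone. The example below is a sketch under assumptions: it builds a small synthetic CSV in memory (hypothetical data, not the real file) so that it runs on its own, and it uses skiprows=range(1, 51) so that the header line is kept while the first 50 data rows are skipped.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file: a header plus 60 synthetic rows.
rows = ["Index,Last Name,Gender,Email,Phone,Salary"]
rows += [
    f"{i},Name{i},Female,user{i}@example.com,{5550000 + i},{50000 + i}"
    for i in range(1, 61)
]
csv_file = io.StringIO("\n".join(rows))

wanted = ["Last Name", "Gender", "Email", "Phone", "Salary"]
# Line 0 is the header, so skipping lines 1..50 drops the first 50 data rows.
df2 = pd.read_csv(csv_file, skiprows=range(1, 51), usecols=wanted)

print(df2.head(10))
salary = df2["Salary"]  # selecting a single column returns a Series
print(salary.tail(5))
```

With the real file, the io.StringIO object would be replaced by the file path.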
In [12]: ''' 9. Filter and select rows from the People_Dataset, where the 'Last Name' col
'Gender' column contains the word Female and 'Salary' should be less than 85000
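A filter with these three conditions can be sketched with boolean masks. The sketch makes two assumptions: the small DataFrame is hypothetical sample data, and last_name_query is a placeholder, because the name the task asks for is cut off in the question text.

```python
import pandas as pd

# Hypothetical sample rows with the three columns the filter needs.
people = pd.DataFrame(
    {
        "Last Name": ["Rice", "Lowery", "Rice", "Tran"],
        "Gender": ["Female", "Female", "Male", "Female"],
        "Salary": [60000, 90000, 50000, 70000],
    }
)

last_name_query = "Rice"  # placeholder: the name in the original task is truncated

# Combine the three conditions element-wise; each comparison needs its own parentheses.
mask = (
    people["Last Name"].str.contains(last_name_query)
    & people["Gender"].str.contains("Female")
    & (people["Salary"] < 85000)
)
filtered = people[mask]
print(filtered)
```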
In [23]: ''' 10. Create a 7*5 Dataframe in Pandas using a series generated from 35 random
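# 35 random integers; the 1 to 6 range follows the bounds of the original linspace call
series=pd.Series(np.random.randint(1,7,size=35))
df=pd.DataFrame(series.values.reshape(7,5),columns=['A','B','C','D','E'])
df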
Out[23]: A B C D E
In [50]: ''' 11. Create two different Series, each of length 50, with the following crite
a) The first Series should contain random numbers ranging from 10 to 50.
b) The second Series should contain random numbers ranging from 100 to 1000.
c) Create a DataFrame by joining these Series by column, and change the names
etc'''
s1=pd.Series(np.random.randint(10,51,size=50))     # random integers from 10 to 50
s2=pd.Series(np.random.randint(100,1001,size=50))  # random integers from 100 to 1000
final_df=pd.concat([s1,s2],axis=1)                 # axis=1 joins the Series as columns
final_df.columns=['col1','col2']
final_df.head()
0 39 858
1 45 526
2 38 614
3 34 920
4 21 614
In [61]: ''' 12. Perform the following operations using people data set:
a) Delete the 'Email', 'Phone', and 'Date of birth' columns from the dataset.
b) Delete the rows containing any missing values.
c) Print the final output also '''
df=pd.read_csv("People Data.csv")
df=df.drop(columns=['Email', 'Phone', 'Date of birth'])
final_df=df.dropna()
final_df
     Index          User Id First Name Last Name  Gender                        Job Title  Salary
3        4  BF2a889C00f0cE1    Whitney    Hooper    Male         Counselling psychologist   65000
996    997  ECddaFEDdEc4FAB      Donna     Barry  Female          Education administrator   50000
997    998  2adde51d8B8979E      Cathy  Mckinney  Female  Commercial/residential surveyor   60000
999   1000  8b756f6231DDC6e        Lee      Tran  Female       Nurse, learning disability   90000
In [52]: ''' 13. Create two NumPy arrays, x and y, each containing 100 random float value
following tasks using Matplotlib and NumPy:
a) Create a scatter plot using x and y, setting the color of the points to red a
b) Add a horizontal line at y = 0.5 using a dashed line style and label it as 'y
c) Add a vertical line at x = 0.5 using a dotted line style and label it as 'x =
d) Label the x-axis as 'X-axis' and the y-axis as 'Y-axis'.
e) Set the title of the plot as 'Advanced Scatter Plot of Random Values'.
f) Display a legend for the scatter plot, the horizontal line, and the vertical
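import matplotlib.pyplot as plt
# Assumes values in [0, 1); the range in the task text is cut off
x=np.random.rand(100)
y=np.random.rand(100)
plt.scatter(x=x,y=y,color='r',marker='o',linewidth=.02,label="Random points")
plt.axhline(y=0.5,linestyle="--",label="y=0.5")  # dashed horizontal line
plt.axvline(x=0.5,linestyle=":",label="x=0.5")   # dotted vertical line, as the task asks
plt.legend()
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Advanced Scatter Plot of Random Values")
plt.show()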
In [29]: ''' 14. Create a time-series dataset in a Pandas DataFrame with columns: 'Date',
Perform the following tasks using Matplotlib:
a) Plot the 'Temperature' and 'Humidity' on the same plot with different y-axes
right y-axis for 'Humidity').
b) Label the x-axis as 'Date'.
c) Set the title of the plot as 'Temperature and Humidity Over Time'''
date=pd.date_range(start='2024-05-26',end='2024-06-27', freq='1D')
temp=np.random.uniform(low=10,high=30,size=len(date))
humid=np.random.uniform(low=30,high=50,size=len(date))
Data={"Date":date,"Temperature":temp,"Humidity":humid}  # column names as given in the task
df=pd.DataFrame(Data)
df.head(5)
p2.set_ylabel('Humidity')
plt.show()
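Only the last lines of the plotting cell survive above. A minimal sketch of a twin-axis plot for this task follows. It assumes that p1 and p2 are the left and right axes and that the column names follow the task wording ('Date', 'Temperature', 'Humidity'); the colors and figure size are illustrative choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dates = pd.date_range(start="2024-05-26", end="2024-06-27", freq="D")
df = pd.DataFrame(
    {
        "Date": dates,
        "Temperature": np.random.uniform(10, 30, size=len(dates)),
        "Humidity": np.random.uniform(30, 50, size=len(dates)),
    }
)

fig, p1 = plt.subplots(figsize=(10, 5))
p1.plot(df["Date"], df["Temperature"], color="tab:red")
p1.set_xlabel("Date")
p1.set_ylabel("Temperature")  # left y-axis

p2 = p1.twinx()  # second y-axis that shares the same x-axis
p2.plot(df["Date"], df["Humidity"], color="tab:blue")
p2.set_ylabel("Humidity")  # right y-axis

p1.set_title("Temperature and Humidity Over Time")
```

In a notebook the figure is displayed at the end of the cell; in a script, a final plt.show() call would be needed.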
In [59]: ''' 15. Create a NumPy array data containing 1000 samples from a normal distribu
tasks using Matplotlib:
a) Plot a histogram of the data with 30 bins.
b) Overlay a line plot representing the normal distribution's probability densit
c) Label the x-axis as 'Value' and the y-axis as 'Frequency/Probability'.
d) Set the title of the plot as 'Histogram with PDF Overlay'. '''
arr=np.random.normal(loc=0 ,scale=1,size=1000)
plt.figure(figsize=(10,5))
plt.hist(arr,bins=30,density=True)
plt.show()
## took 60% help from chatgpt in this question
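Parts b) to d) of this question can be sketched as follows. The sketch assumes a mean of 0 and a standard deviation of 1, matching the call above, since the parameters in the task text are cut off. The density curve is computed directly from the normal PDF formula with NumPy, so no extra library is required.

```python
import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 0.0, 1.0  # assumed parameters, same as the np.random.normal call above
data = np.random.normal(loc=mu, scale=sigma, size=1000)

fig, ax = plt.subplots(figsize=(10, 5))
# density=True rescales the bars so that the histogram is comparable to a PDF.
ax.hist(data, bins=30, density=True, alpha=0.6, label="Histogram")

# Normal PDF: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
xs = np.linspace(data.min(), data.max(), 200)
pdf = np.exp(-0.5 * ((xs - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
ax.plot(xs, pdf, color="red", label="Normal PDF")

ax.set_xlabel("Value")
ax.set_ylabel("Frequency/Probability")
ax.set_title("Histogram with PDF Overlay")
ax.legend()
```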
In [ ]: ''' 16.Set the title of the plot as 'Histogram with PDF Overlay'.'''
plt.title('Histogram with PDF Overlay')
In [60]: ''' 17. Create a Seaborn scatter plot of two random arrays, color points based o
origin (quadrants), add a legend, label the axes, and set the title as 'Quadrant
arr1=np.random.uniform(-5,5,10)  # span negative and positive values so all four quadrants can occur
arr2=np.random.uniform(-5,5,10)
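The remaining steps (coloring by quadrant, legend, axis labels, title) can be sketched as below. Several parts are assumptions: the helper names label_quadrants and quadrant_scatter are hypothetical, the axis label texts are illustrative, and the title is passed in as an argument because the title in the task text is cut off. Seaborn and Matplotlib are imported inside the plotting function so that the labeling helper runs without them.

```python
import numpy as np
import pandas as pd


def label_quadrants(x, y):
    """Return a quadrant label (Q1..Q4) for each point relative to the origin."""
    conditions = [(x >= 0) & (y >= 0), (x < 0) & (y >= 0), (x < 0) & (y < 0)]
    return np.select(conditions, ["Q1", "Q2", "Q3"], default="Q4")


def quadrant_scatter(x, y, title):
    """Draw a Seaborn scatter plot colored by quadrant and return the axes."""
    import matplotlib.pyplot as plt  # noqa: F401  (Seaborn draws on Matplotlib axes)
    import seaborn as sns

    points = pd.DataFrame({"x": x, "y": y, "Quadrant": label_quadrants(x, y)})
    # Setting hue makes Seaborn add a legend keyed by the 'Quadrant' column.
    ax = sns.scatterplot(data=points, x="x", y="y", hue="Quadrant")
    ax.axhline(0, color="grey", linewidth=0.8)
    ax.axvline(0, color="grey", linewidth=0.8)
    ax.set_xlabel("X-axis")
    ax.set_ylabel("Y-axis")
    ax.set_title(title)
    return ax


# Random points on both sides of the origin.
x = np.random.uniform(-5, 5, 100)
y = np.random.uniform(-5, 5, 100)
```

Calling quadrant_scatter(x, y, title=...) with the title from the task statement draws the plot.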
In [66]: ''' 18. With Bokeh, plot a line chart of a sine wave function, add grid lines, l
Wave Function '''
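A possible answer is sketched below. It is a sketch under assumptions: the function names are hypothetical, the axis labels and grid styling are illustrative, and the default title 'Sine Wave Function' is inferred from the partly truncated task text. Bokeh is imported inside the plotting function so that the data step runs without it.

```python
import numpy as np


def sine_wave_data(n=200):
    """Return n points of sin(x) over one full period."""
    x = np.linspace(0, 2 * np.pi, n)
    return x, np.sin(x)


def sine_wave_figure(title="Sine Wave Function"):
    """Build a Bokeh line chart of the sine wave and return the figure."""
    from bokeh.plotting import figure

    x, y = sine_wave_data()
    p = figure(title=title, x_axis_label="x", y_axis_label="sin(x)")
    p.line(x, y, line_width=2, legend_label="sin(x)")
    # Grid lines are on by default; these lines only adjust their color.
    p.xgrid.grid_line_color = "lightgray"
    p.ygrid.grid_line_color = "lightgray"
    return p
```

To display the chart, pass the figure to show from bokeh.plotting; inside a notebook, output_notebook from bokeh.io is usually called first.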
In [71]: ''' 19. Using Plotly, create a basic line plot of a randomly generated dataset,
'Simple Line Plot.'''
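A possible answer is sketched below. The function names, the axis label texts, and the choice of uniform random values are assumptions made for illustration; only the title 'Simple Line Plot' comes from the task. Plotly is imported inside the plotting function so that the data step runs without it.

```python
import numpy as np


def random_dataset(n=100, seed=0):
    """Return x positions 0..n-1 and n uniform random values in [0, 1)."""
    rng = np.random.default_rng(seed)
    return np.arange(n), rng.random(n)


def simple_line_plot(x, y):
    """Build a basic Plotly line plot and return the figure."""
    import plotly.graph_objects as go

    fig = go.Figure(go.Scatter(x=x, y=y, mode="lines", name="Random data"))
    fig.update_layout(
        title="Simple Line Plot",
        xaxis_title="X-axis",
        yaxis_title="Y-axis",
    )
    return fig
```

Calling simple_line_plot(*random_dataset()).show() renders the figure.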