Data Toolkit Assignment

Uploaded by ayush.bruno21
Copyright © All Rights Reserved

26/05/2024, 20:44 25-5-24_Data_ToolKit_Assignment

In [1]: ''' 1. Demonstrate three different methods for creating identical 2D arrays in N
method and the final output after each method '''

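# 1st Method: build the array directly from a nested list
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Method 1 output:\n{arr}")

# 2nd Method: start from an integer array of ones, scale it, then adjust the second row
# (dtype=int keeps the result integer, so it is identical to the other two arrays)
arr2 = np.ones((2, 3), dtype=int) * np.array([1, 2, 3])
arr2[1] = arr2[1] * 2 + np.array([2, 1, 0])
print(f"Method 2 output:\n{arr2}")

# 3rd Method: allocate an uninitialized integer array, then fill it row by row
arr3 = np.empty((2, 3), dtype=int)
arr3[0] = [1, 2, 3]
arr3[1] = [4, 5, 6]
print(f"Method 3 output:\n{arr3}")

Method 1 output:
[[1 2 3]
 [4 5 6]]
Method 2 output:
[[1 2 3]
 [4 5 6]]
Method 3 output:
[[1 2 3]
 [4 5 6]]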
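Printing only shows that the values look the same. A quick check makes the comparison explicit, assuming "identical" means same shape, same dtype and same values (`identical` is a hypothetical helper name). It also shows why `dtype=int` matters in the second method:

```python
import numpy as np

def identical(a, b):
    # same shape, same element type and same values
    return a.shape == b.shape and a.dtype == b.dtype and np.array_equal(a, b)

arr = np.array([[1, 2, 3], [4, 5, 6]])

arr2 = np.ones((2, 3), dtype=int) * np.array([1, 2, 3])
arr2[1] = arr2[1] * 2 + np.array([2, 1, 0])

arr3 = np.empty((2, 3), dtype=int)
arr3[0] = [1, 2, 3]
arr3[1] = [4, 5, 6]

# Without dtype=int, np.ones gives float64: the values match but the arrays are not identical
float_version = np.ones((2, 3)) * np.array([1, 2, 3])
float_version[1] = float_version[1] * 2 + np.array([2, 1, 0])

print(identical(arr, arr2), identical(arr, arr3))   # True True
print(identical(arr, float_version))                # False
```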

In [35]: ''' 2. Using the Numpy function, generate an array of 100 evenly spaced numbers
Reshape that 1D array into a 2D array '''

arr1D=np.linspace(1,10,100)
arr2D=arr1D.reshape((10,10))
arr2D

Out[35]: array([[ 1. , 1.09090909, 1.18181818, 1.27272727, 1.36363636,
1.45454545, 1.54545455, 1.63636364, 1.72727273, 1.81818182],
[ 1.90909091, 2. , 2.09090909, 2.18181818, 2.27272727,
2.36363636, 2.45454545, 2.54545455, 2.63636364, 2.72727273],
[ 2.81818182, 2.90909091, 3. , 3.09090909, 3.18181818,
3.27272727, 3.36363636, 3.45454545, 3.54545455, 3.63636364],
[ 3.72727273, 3.81818182, 3.90909091, 4. , 4.09090909,
4.18181818, 4.27272727, 4.36363636, 4.45454545, 4.54545455],
[ 4.63636364, 4.72727273, 4.81818182, 4.90909091, 5. ,
5.09090909, 5.18181818, 5.27272727, 5.36363636, 5.45454545],
[ 5.54545455, 5.63636364, 5.72727273, 5.81818182, 5.90909091,
6. , 6.09090909, 6.18181818, 6.27272727, 6.36363636],
[ 6.45454545, 6.54545455, 6.63636364, 6.72727273, 6.81818182,
6.90909091, 7. , 7.09090909, 7.18181818, 7.27272727],
[ 7.36363636, 7.45454545, 7.54545455, 7.63636364, 7.72727273,
7.81818182, 7.90909091, 8. , 8.09090909, 8.18181818],
[ 8.27272727, 8.36363636, 8.45454545, 8.54545455, 8.63636364,
8.72727273, 8.81818182, 8.90909091, 9. , 9.09090909],
[ 9.18181818, 9.27272727, 9.36363636, 9.45454545, 9.54545455,
9.63636364, 9.72727273, 9.81818182, 9.90909091, 10. ]])
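A small sketch of the same step with two optional conveniences: `retstep=True` makes `np.linspace` also return the spacing, and `-1` in `reshape` lets NumPy infer one dimension from the total size:

```python
import numpy as np

# retstep=True also returns the spacing: (stop - start) / (num - 1) = 9 / 99
arr1D, step = np.linspace(1, 10, 100, retstep=True)

# -1 asks NumPy to infer that dimension: 100 elements / 10 rows = 10 columns.
# For a contiguous array like this one, reshape returns a view (no data is copied).
arr2D = arr1D.reshape(10, -1)
print(step, arr2D.shape)
```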

In [58]: ''' 3. Explain the following terms:

- The difference between np.array, np.asarray and np.asanyarray

https://gray-doctor-onblz.pwskills.app/lab/tree/work/25-5-24_Data_ToolKit_Assignment.ipynb 1/11
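
- The difference between deep copy and shallow copy'''

# np.array() converts a list, tuple, etc. into a NumPy array and, with its default copy=True,
# always creates a new copy, so later changes to the original are not reflected in the result.
# np.asarray() also converts a list, tuple, etc. into a NumPy array, but when the input is already
# an ndarray of a matching dtype it returns that same array without copying, so the result
# shows the changes made to the original array.
# np.asanyarray() works like np.asarray() but passes ndarray subclasses (e.g. masked arrays)
# through unchanged instead of converting them to a plain ndarray.
# A shallow copy is a new outer object that still shares its inner data with the original
# (for NumPy arrays, a view such as a slice or arr.view() shares the same data buffer), so
# changes to the shared data appear in both. A deep copy duplicates everything recursively
# (copy.deepcopy() for nested Python objects; arr.copy() for the data of a numeric array),
# so the two are fully independent.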
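A short sketch illustrating these differences. The variable names are illustrative, and the masked array stands in for any ndarray subclass:

```python
import copy
import numpy as np

original = np.array([1, 2, 3])
a = np.array(original)      # always a new copy (copy=True by default)
b = np.asarray(original)    # already an ndarray with the same dtype: no copy, b is original
original[0] = 99
print(a[0], b[0])           # 1 99

# np.asarray converts a subclass to a plain ndarray; np.asanyarray keeps the subclass
masked = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
print(type(np.asarray(masked)).__name__, type(np.asanyarray(masked)).__name__)  # ndarray MaskedArray

# Shallow vs deep copy of a nested Python list
nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)      # new outer list, same inner lists
deep = copy.deepcopy(nested)     # inner lists copied too
nested[0][0] = 100
print(shallow[0][0], deep[0][0])   # 100 1

# NumPy view (shares the data buffer) vs copy (independent data)
base = np.arange(4)
view = base[1:3]                 # basic slicing returns a view
independent = base[1:3].copy()
base[1] = -1
print(view[0], independent[0])   # -1 1
```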

In [10]: ''' 4.Generate a 3x3 array with random floating-point numbers between 5 and 20 t
the array to 2 decimal places '''

arr=np.random.uniform(5,20,(3,3))
rounded_arr=np.round(arr,2)
rounded_arr

Out[10]: array([[ 7.9 , 15.52, 17.  ],
                [ 7.16, 19.54, 12.12],
                [12.78, 11.58, 19.21]])
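Because `np.random.uniform` draws new values on every run, the output above changes each time. A minimal sketch using NumPy's `Generator` API with a fixed seed (42 is an arbitrary choice) makes the rounded array reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=42)               # fixed seed -> same numbers every run
arr = rng.uniform(low=5, high=20, size=(3, 3))      # floats in [5, 20)
rounded_arr = np.round(arr, 2)                      # round to 2 decimal places
print(rounded_arr)
```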

In [11]: ''' 5. Create a NumPy array with random integers between 1 and 10 of shape (5,6)
perform the following operations:

a)Extract all even integers from array.


b)Extract all odd integers from array'''

# randint's upper bound is exclusive, so 11 gives integers 1..10 inclusive
arr = np.random.randint(1, 11, size=(5, 6))
even = []
odd = []
for row in arr:
    for x in row:
        if x % 2 == 0:
            even.append(int(x))   # int() keeps the printed list as plain numbers
        else:
            odd.append(int(x))
print(arr)
print(f"Even integers from Array: {even}")
print(f"Odd integers from Array: {odd}")

[[7 4 9 4 1 7]
 [8 9 5 2 8 7]
 [3 2 2 8 9 3]
 [6 2 6 5 1 2]
 [3 5 2 2 8 4]]
Even integers from Array: [4, 4, 8, 2, 8, 2, 2, 8, 6, 2, 6, 2, 2, 2, 8, 4]
Odd integers from Array: [7, 9, 1, 7, 9, 5, 7, 3, 9, 3, 5, 1, 3, 5]
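The nested loops can be replaced by boolean masking, the usual NumPy idiom: `arr % 2 == 0` builds a True/False array of the same shape, and indexing with it returns the matching elements as a 1-D array in row order. A minimal sketch with an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(1, 11, size=(5, 6))   # integers 1..10 (upper bound exclusive)

even = arr[arr % 2 == 0]   # elements where the mask is True, read row by row
odd = arr[arr % 2 != 0]
print(even)
print(odd)
```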

In [6]: ''' 6. Create a 3-D NumPy array of shape (3, 3, 3) containing random integers Be
following operations:

a) Find the indices of the maximum values along each depth level (third axis).
b) Perform element-wise multiplication between both arrays '''

arr1 = np.random.randint(1, 10, size=(3, 3, 3))  # integers 1..9, the same range as uniform(1, 10).astype(int)
arr2 = np.random.randint(1, 10, size=(3, 3, 3))
# a) argmax along axis=2 (the third axis) of arr1: one index per (depth, row) pair
max_value_indices = np.argmax(arr1, axis=2)
# b) element-wise product, equivalent to arr1 * arr2
arr_multiplication = np.multiply(arr1, arr2)

print(arr1)
print(f"Indices of the maximum values along each depth level:\n{max_value_indices}")
print(f"element-wise multiplication:\n{arr_multiplication}")

[[[3 1 3]
  [2 9 6]
  [5 1 5]]

 [[7 5 8]
  [1 6 1]
  [8 6 1]]

 [[9 2 7]
  [9 9 3]
  [4 8 5]]]
Indices of the maximum values along each depth level:
[[0 1 0]
 [2 1 0]
 [0 0 1]]
element-wise multiplication:
[[[18  8 24]
  [ 6 36 36]
  [ 5  7 10]]

 [[49 45 40]
  [ 2 42  3]
  [32 30  5]]

 [[ 9 12 28]
  [45 54  9]
  [24 40 35]]]
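To check the argmax result by hand, this sketch re-enters the `arr1` values printed above. `np.argmax(..., axis=2)` returns the position of the largest value in each innermost row (the first one on ties), and `np.take_along_axis` uses those positions to pick out the maxima themselves:

```python
import numpy as np

# arr1 as printed in the output above
arr1 = np.array([[[3, 1, 3], [2, 9, 6], [5, 1, 5]],
                 [[7, 5, 8], [1, 6, 1], [8, 6, 1]],
                 [[9, 2, 7], [9, 9, 3], [4, 8, 5]]])

idx = np.argmax(arr1, axis=2)        # shape (3, 3); ties such as [9, 9, 3] give the first index
maxima = np.take_along_axis(arr1, idx[..., None], axis=2)[..., 0]
print(idx)
print(maxima)                        # same as arr1.max(axis=2)
```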

In [2]: ''' 7. Clean and transform the 'Phone' column in the sample dataset to remove no
convert it to a numeric data type. Also display the table attributes and data ty

import pandas as pd
df=pd.read_csv("People Data.csv")
df.head()

Out[2]:
   Index          User Id First Name Last Name  Gender               Email               Phone
0      1  8717bbf45cCDbEe     Shelia   Mahoney    Male  [email protected]        857.139.8239
1      2  3d5AD30A4cD38ed         Jo    Rivers  Female  [email protected]                 NaN
2      3  810Ce0F276Badec     Sheryl    Lowery  Female  [email protected]       (599)782-0605
3      4  BF2a889C00f0cE1    Whitney    Hooper    Male  [email protected]                 NaN
4      5  9afFEafAe1CBBB9    Lindsey      Rice  Female  [email protected]  (390)417-1635x3010

In [7]: df["Phone"].values.tolist()[:20]

Out[7]: [8571398239.0,
nan,
5997820605.0,
nan,
39041716353010.0,
8537800927.0,
9365574807895.0,
4709522945.0,
138204758.0,
56090350684985.0,
8629884096.0,
10418593844272.0,
801809918137308.0,
15111276660230.0,
9035458947.0,
4169790633058.0,
24405485211913.0,
92936685493587.0,
9732439193.0,
60611937790160.0]

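In [5]: df['Phone'] = df['Phone'].str.replace(r'\D', '', regex=True)  # keep digits only
# note: extension digits are kept too, e.g. '(390)417-1635x3010' becomes 39041716353010
df['Phone'] = pd.to_numeric(df['Phone'])  # NaN stays NaN, so the column becomes float64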

In [6]: df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Index 1000 non-null int64
1 User Id 1000 non-null object
2 First Name 1000 non-null object
3 Last Name 1000 non-null object
4 Gender 1000 non-null object
5 Email 1000 non-null object
6 Phone 979 non-null float64
7 Date of birth 1000 non-null object
8 Job Title 1000 non-null object
9 Salary 1000 non-null int64
dtypes: float64(1), int64(2), object(7)
memory usage: 78.2+ KB
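Removing every non-digit merges extension digits into the number, which is why the recorded values contain entries such as 39041716353010 (from "(390)417-1635x3010"). A minimal sketch, assuming an extension always appears as a trailing "x" followed by digits, strips the extension first and stores the result in pandas' nullable "Int64" dtype so missing phones stay missing without forcing float64 (`clean_phone` is a hypothetical helper name). Leading zeros, as in "093.655.7480", are still lost, as with any numeric phone column:

```python
import numpy as np
import pandas as pd

def clean_phone(s: pd.Series) -> pd.Series:
    # 1) drop an extension such as "x3010" at the end of the string
    no_ext = s.str.replace(r"x\d+$", "", regex=True)
    # 2) drop every remaining non-digit character: ".", "-", "(", ")", "+", spaces
    digits = no_ext.str.replace(r"\D", "", regex=True)
    # 3) convert; anything unparsable becomes NaN, then "Int64" keeps integers alongside <NA>
    return pd.to_numeric(digits, errors="coerce").astype("Int64")

sample = pd.Series(["857.139.8239", np.nan, "(599)782-0605",
                    "(390)417-1635x3010", "093.655.7480x7895"])
print(clean_phone(sample))
```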

In [12]: ''' 8. Perform the following tasks using people dataset:

a) Read the 'data.csv' file using pandas, skipping the first 50 rows.
b) Only read the columns: 'Last Name', 'Gender', 'Email', 'Phone' and 'Salary' fr
c) Display the first 10 rows of the filtered dataset.
d) Extract the 'Salary' column as a Series and display its last 5 values '''

df1 = pd.read_csv("People Data.csv")[50:]  # drops the first 50 data rows; read_csv(..., skiprows=range(1, 51)) does the same while keeping the header

In [13]: df1.head()

Out[13]:
    Index          User Id First Name Last Name  Gender               Email
50     51  CccE5DAb6E288e5         Jo    Zavala    Male  [email protected]
51     52  DfBDc3621D4bcec     Joshua     Carey  Female  [email protected]
52     53  f55b0A249f5E44D     Rickey     Hobbs  Female  [email protected]
53     54  Ed71DcfaBFd0beE      Robyn    Reilly    Male  [email protected]
54     55  FDaFD0c3f5387EC  Christina    Conrad    Male  [email protected]

In [5]: df2 = pd.read_csv("People Data.csv", usecols=['Last Name', 'Gender', 'Email', 'Phone', 'Salary'])
df2.head()

Out[5]:
  Last Name  Gender               Email               Phone  Salary
0   Mahoney    Male  [email protected]        857.139.8239   90000
1    Rivers  Female  [email protected]                 NaN   80000
2    Lowery  Female  [email protected]       (599)782-0605   50000
3    Hooper    Male  [email protected]                 NaN   65000
4      Rice  Female  [email protected]  (390)417-1635x3010  100000

In [22]: df2.head(10)

Out[22]:
  Last Name  Gender               Email               Phone  Salary
0   Mahoney    Male  [email protected]        857.139.8239   90000
1    Rivers  Female  [email protected]                 NaN   80000
2    Lowery  Female  [email protected]       (599)782-0605   50000
3    Hooper    Male  [email protected]                 NaN   65000
4      Rice  Female  [email protected]  (390)417-1635x3010  100000
5  Caldwell    Male  [email protected]          8537800927   50000
6   Hoffman    Male  [email protected]   093.655.7480x7895   60000
7  Andersen    Male  [email protected]          4709522945   65000
8      Mays    Male  [email protected]        013.820.4758   50000
9  Mitchell    Male  [email protected]  (560)903-5068x4985   50000

In [31]: df2["Salary"].tail(5)

Out[31]:
995     90000
996     50000
997     60000
998    100000
999     90000
Name: Salary, dtype: int64
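The cells above read the file twice without combining the steps, so `df2.head(10)` still starts at row 0. A minimal sketch doing all four parts with one `read_csv` call is shown below, exercised on a hypothetical in-memory CSV instead of "People Data.csv" (`load_people` is an illustrative name). Note that `skiprows=50` would also skip the header line, which is why a range starting at 1 is used:

```python
import io
import pandas as pd

COLUMNS = ["Last Name", "Gender", "Email", "Phone", "Salary"]

def load_people(path_or_buffer, skip=50):
    # skiprows=range(1, skip + 1) skips data lines 1..skip; line 0 (the header) is kept.
    # usecols keeps only the listed columns (they come back in file order).
    return pd.read_csv(path_or_buffer, skiprows=range(1, skip + 1), usecols=COLUMNS)

# Hypothetical stand-in for "People Data.csv": 100 rows plus extra columns that usecols drops
lines = ["Index,Last Name,Gender,Email,Phone,Job Title,Salary"]
lines += [f"{i},Name{i},Female,user{i}@example.com,555-{i:04d},Clerk,{1000 * i}"
          for i in range(1, 101)]
people = load_people(io.StringIO("\n".join(lines)))

print(people.head(10))        # c) first 10 rows of the filtered data
salary = people["Salary"]     # d) a single column is a Series
print(salary.tail(5))
```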

In [12]: ''' 9. Filter and select rows from the People_Dataset, where the 'Last Name' col
'Gender' column contains the word Female and 'Salary' should be less than 85000

filtered_rows = df2[(df2['Last Name'] == 'Duke') & (df2['Gender'] == 'Female') & (df2['Salary'] < 85000)]
filtered_rows

Out[12]:
    Last Name  Gender               Email                   Phone  Salary
45       Duke  Female  [email protected]  001-366-475-8607x04350   60000
210      Duke  Female  [email protected]            740.434.0212   50000
457      Duke  Female  [email protected]     +1-903-596-0995x489   50000
729      Duke  Female  [email protected]            982.692.6257   70000
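The same filter can also be written with `DataFrame.query`, where backticks quote a column name that contains a space. A minimal sketch on a small hypothetical stand-in for `df2`:

```python
import pandas as pd

# Hypothetical rows standing in for df2; the real data comes from "People Data.csv"
df2 = pd.DataFrame({
    "Last Name": ["Duke", "Duke", "Smith", "Duke"],
    "Gender": ["Female", "Male", "Female", "Female"],
    "Salary": [60000, 50000, 40000, 90000],
})

# `Last Name` needs backticks because of the space; the conditions are combined with "and"
filtered_rows = df2.query("`Last Name` == 'Duke' and Gender == 'Female' and Salary < 85000")
print(filtered_rows)
```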

In [23]: ''' 10. Create a 7*5 Dataframe in Pandas using a series generated from 35 random

arr = np.linspace(1, 6, 35)  # note: linspace gives 35 evenly spaced values between 1 and 6, not random ones
df = pd.DataFrame(arr.reshape(7, 5), columns=['A', 'B', 'C', 'D', 'E'])
df

Out[23]:
          A         B         C         D         E
0  1.000000  1.147059  1.294118  1.441176  1.588235
1  1.735294  1.882353  2.029412  2.176471  2.323529
2  2.470588  2.617647  2.764706  2.911765  3.058824
3  3.205882  3.352941  3.500000  3.647059  3.794118
4  3.941176  4.088235  4.235294  4.382353  4.529412
5  4.676471  4.823529  4.970588  5.117647  5.264706
6  5.411765  5.558824  5.705882  5.852941  6.000000
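The cell above uses `np.linspace`, which produces evenly spaced rather than random values. A minimal sketch that follows the task wording more closely builds a Series of 35 random integers and reshapes it to 7x5. The 1 to 6 range is an assumption borrowed from the linspace bounds, because the range in the task text is cut off:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng()
# 35 random integers; the range 1..6 (upper bound 7 is exclusive) is an assumption
series = pd.Series(rng.integers(1, 7, size=35))

# reshape the Series' values row by row into 7 rows x 5 columns
df = pd.DataFrame(series.to_numpy().reshape(7, 5), columns=list("ABCDE"))
print(df)
```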

In [50]: ''' 11. Create two different Series, each of length 50, with the following crite
a) The first Series should contain random numbers ranging from 10 to 50.
b) The second Series should contain random numbers ranging from 100 to 1000.
c) Create a DataFrame by joining these Series by column, and change the names
etc'''

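# randint's upper bound is exclusive, so 51 and 1001 include 50 and 1000
s1 = pd.Series(np.random.randint(10, 51, size=50), name='col1')
s2 = pd.Series(np.random.randint(100, 1001, size=50), name='col2')
# axis=1 joins the Series side by side as columns; each Series name becomes a column name
final_df = pd.concat([s1, s2], axis=1)
final_df.head()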

Out[50]:
   col1  col2
0    39   858
1    45   526
2    38   614
3    34   920
4    21   614

In [61]: ''' 12. Perform the following operations using people data set:
a) Delete the 'Email', 'Phone', and 'Date of birth' columns from the dataset.
b) Delete the rows containing any missing values.
c) Print the final output also '''

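df = pd.read_csv("People Data.csv")
df = df.drop(columns=['Email', 'Phone', 'Date of birth'])
# Phone was the only column with missing values (979 of 1000 non-null in df.info()),
# so once it is dropped, dropna() removes no rows and all 1000 remain
final_df = df.dropna()
final_df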

Out[61]:
     Index          User Id First Name Last Name  Gender                        Job Title  Salary
0        1  8717bbf45cCDbEe     Shelia   Mahoney    Male                Probation officer   90000
1        2  3d5AD30A4cD38ed         Jo    Rivers  Female                           Dancer   80000
2        3  810Ce0F276Badec     Sheryl    Lowery  Female                             Copy   50000
3        4  BF2a889C00f0cE1    Whitney    Hooper    Male         Counselling psychologist   65000
4        5  9afFEafAe1CBBB9    Lindsey      Rice  Female              Biomedical engineer  100000
...    ...              ...        ...       ...     ...                              ...     ...
995    996  fedF4c7Fd9e7cFa       Kurt    Bryant  Female                Personnel officer   90000
996    997  ECddaFEDdEc4FAB      Donna     Barry  Female          Education administrator   50000
997    998  2adde51d8B8979E      Cathy  Mckinney  Female  Commercial/residential surveyor   60000
998    999  Fb2FE369D1E171A   Jermaine    Phelps    Male                 Ambulance person  100000
999   1000  8b756f6231DDC6e        Lee      Tran  Female       Nurse, learning disability   90000

1000 rows × 7 columns
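The order of the two steps matters. A minimal sketch on three rows taken from the outputs above (Rivers has no phone number) shows that dropping the column first keeps every row, while calling `dropna()` first would also remove the row with the missing phone:

```python
import numpy as np
import pandas as pd

# Three rows from the People dataset output above; only "Phone" has a missing value
df = pd.DataFrame({
    "Last Name": ["Mahoney", "Rivers", "Lowery"],
    "Phone": ["857.139.8239", np.nan, "(599)782-0605"],
    "Salary": [90000, 80000, 50000],
})

drop_then_clean = df.drop(columns=["Phone"]).dropna()   # the NaN column is gone first
clean_then_drop = df.dropna().drop(columns=["Phone"])   # the row with NaN is removed first
print(len(drop_then_clean), len(clean_then_drop))       # 3 2
```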

In [52]: ''' 13. Create two NumPy arrays, x and y, each containing 100 random float value
following tasks using Matplotlib and NumPy:
a) Create a scatter plot using x and y, setting the color of the points to red a
b) Add a horizontal line at y = 0.5 using a dashed line style and label it as 'y
c) Add a vertical line at x = 0.5 using a dotted line style and label it as 'x =
d) Label the x-axis as 'X-axis' and the y-axis as 'Y-axis'.
e) Set the title of the plot as 'Advanced Scatter Plot of Random Values'.
f) Display a legend for the scatter plot, the horizontal line, and the vertical

import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use("ggplot")
x = np.random.uniform(0, 1, 100)
y = np.random.uniform(0, 1, 100)

# a) red circular markers; the label gives the points their own legend entry (task f)
plt.scatter(x, y, color='r', marker='o', label='Random points')
# b) dashed horizontal line at y = 0.5
plt.axhline(y=0.5, linestyle='--', label='y=0.5')
# c) dotted vertical line at x = 0.5
plt.axvline(x=0.5, linestyle=':', label='x=0.5')
# d) and e) axis labels and title
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Advanced Scatter Plot of Random Values")
# f) legend for the points and both lines
plt.legend()
plt.show()

In [29]: ''' 14. Create a time-series dataset in a Pandas DataFrame with columns: 'Date',
Perform the following tasks using Matplotlib:

a) Plot the 'Temperature' and 'Humidity' on the same plot with different y-axes
right y-axis for 'Humidity').
b) Label the x-axis as 'Date'.
c) Set the title of the plot as 'Temperature and Humidity Over Time'''

date = pd.date_range(start='2024-05-26', end='2024-06-27', freq='D')
temp = np.random.uniform(low=10, high=30, size=len(date))
humid = np.random.uniform(low=30, high=50, size=len(date))
# column names follow the task: 'Date', 'Temperature', 'Humidity'
df = pd.DataFrame({"Date": date, "Temperature": temp, "Humidity": humid})
df.head(5)

Out[29]:
        Date  Temperature   Humidity
0 2024-05-26    13.984043  49.916805
1 2024-05-27    13.577177  32.017619
2 2024-05-28    29.097362  35.426117
3 2024-05-29    24.003562  49.937397
4 2024-05-30    25.989365  37.828209

In [32]: fig, p1 = plt.subplots(figsize=(14, 7))
# left y-axis: Temperature
p1.plot(df['Date'], df['Temperature'], color='red')
p1.set_xlabel('Date')
p1.set_ylabel('Temperature')
# twinx() adds a second y-axis on the right that shares the same x-axis
p2 = p1.twinx()
p2.plot(df['Date'], df['Humidity'], color='blue')
p2.set_ylabel('Humidity')
p1.set_title("Temperature and Humidity Over Time")
plt.show()

In [59]: ''' 15. Create a NumPy array data containing 1000 samples from a normal distribu
tasks using Matplotlib:
a) Plot a histogram of the data with 30 bins.
b) Overlay a line plot representing the normal distribution's probability densit
c) Label the x-axis as 'Value' and the y-axis as 'Frequency/Probability'.
d) Set the title of the plot as 'Histogram with PDF Overlay'. '''

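from scipy.stats import norm

arr = np.random.normal(loc=0, scale=1, size=1000)
plt.figure(figsize=(10, 5))
# density=True scales the bars so their total area is 1, matching the scale of the PDF
plt.hist(arr, bins=30, density=True)

# PDF of the standard normal distribution (mean 0, standard deviation 1) over the data range
x = np.linspace(arr.min(), arr.max(), 1000)
pdf = norm.pdf(x, loc=0, scale=1)
plt.plot(x, pdf, color='black')
plt.title("Histogram with PDF Overlay")
plt.xlabel('Value')
plt.ylabel('Frequency/Probability')
plt.show()
## took 60% help from chatgpt in this question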
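`scipy.stats.norm.pdf` evaluates the normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). A short NumPy-only sketch writes that formula out and checks that `density=True` scales the histogram so the bar areas sum to 1, which is what makes the curve and the bars comparable (`normal_pdf` is a hypothetical helper name):

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
data = rng.normal(loc=0, scale=1, size=1000)

# With density=True, each bar height is count / (N * bin width), so the total area is 1
heights, edges = np.histogram(data, bins=30, density=True)
area = np.sum(heights * np.diff(edges))
print(round(area, 6), round(normal_pdf(0.0), 6))   # 1.0 0.398942
```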


In [ ]: ''' 16.Set the title of the plot as 'Histogram with PDF Overlay'.'''
plt.title('Histogram with PDF Overlay')

In [60]: ''' 17. Create a Seaborn scatter plot of two random arrays, color points based o
origin (quadrants), add a legend, label the axes, and set the title as 'Quadrant

arr1=np.random.uniform(1,5,10)
arr2=np.random.uniform(1,5,10)

Out[60]: array([1.1768345 , 3.13803238, 3.35703567, 1.06723204, 1.70530818,
                3.21356747, 4.32449402, 2.88595569, 4.3218438 , 4.98469253])
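The cell above only generates the two arrays, and `uniform(1, 5)` values are always positive, so every point would fall in the first quadrant. A minimal sketch of the rest of the task, assuming values drawn on both sides of 0 and the hypothetical labels Q1 to Q4, computes each point's quadrant with NumPy and lets Seaborn color the points through `hue`. The exact title is cut off in the task text, so it is a parameter here, and Seaborn is imported inside the plotting function so the labeling helper works on its own:

```python
import numpy as np

def quadrant_labels(x, y):
    # Quadrant relative to the origin; points lying on an axis get "On axis"
    labels = np.full(x.shape, "On axis", dtype=object)
    labels[(x > 0) & (y > 0)] = "Q1"
    labels[(x < 0) & (y > 0)] = "Q2"
    labels[(x < 0) & (y < 0)] = "Q3"
    labels[(x > 0) & (y < 0)] = "Q4"
    return labels

def quadrant_scatter(x, y, title):
    import matplotlib.pyplot as plt
    import seaborn as sns
    # hue colors each point by its quadrant label; Seaborn adds the legend automatically
    ax = sns.scatterplot(x=x, y=y, hue=quadrant_labels(x, y))
    ax.axhline(0, color="gray", linewidth=0.8)   # draw the axes through the origin
    ax.axvline(0, color="gray", linewidth=0.8)
    ax.set_xlabel("X-axis")
    ax.set_ylabel("Y-axis")
    ax.set_title(title)
    legend = ax.get_legend()
    if legend is not None:
        legend.set_title("Quadrant")
    plt.show()
```

For example, with `rng = np.random.default_rng()`, calling `quadrant_scatter(rng.uniform(-5, 5, 50), rng.uniform(-5, 5, 50), title=...)` with the title from the assignment draws 50 points spread over all four quadrants.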

In [66]: ''' 18. With Bokeh, plot a line chart of a sine wave function, add grid lines, l
Wave Function '''
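Question 18 has no code yet. A minimal sketch, assuming Bokeh 3.x and taking "Sine Wave Function" as the title (the start of the title is cut off in the task text), is below. Bokeh draws grid lines by default, so the sketch only styles them; in Jupyter, call `bokeh.io.output_notebook()` before `show`. `sine_wave` and `plot_sine` are hypothetical names, and Bokeh is imported inside `plot_sine` so the data helper works without it:

```python
import numpy as np

def sine_wave(n=200, periods=2):
    # n evenly spaced x values covering `periods` full cycles of sin(x)
    x = np.linspace(0, 2 * np.pi * periods, n)
    return x, np.sin(x)

def plot_sine(title="Sine Wave Function"):
    from bokeh.plotting import figure, show
    x, y = sine_wave()
    p = figure(title=title, x_axis_label="x", y_axis_label="sin(x)", width=700, height=350)
    p.line(x, y, line_width=2)
    # style both the x and y grids (present by default) as light dashed lines
    p.grid.grid_line_color = "gray"
    p.grid.grid_line_alpha = 0.5
    p.grid.grid_line_dash = [4, 4]
    show(p)
```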

In [71]: ''' 19. Using Plotly, create a basic line plot of a randomly generated dataset,
'Simple Line Plot.'''
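Question 19 has no code yet. A minimal sketch, assuming Plotly Express is installed: the random walk (a cumulative sum of normal steps) is just one hypothetical way to generate the data, `random_walk_data` and `plot_simple_line` are illustrative names, and only the plotting function imports Plotly:

```python
import numpy as np
import pandas as pd

def random_walk_data(n=100, seed=None):
    # hypothetical dataset: x = 0..n-1, y = running sum of standard-normal steps
    rng = np.random.default_rng(seed)
    return pd.DataFrame({"x": np.arange(n), "y": rng.standard_normal(n).cumsum()})

def plot_simple_line(df):
    import plotly.express as px
    fig = px.line(df, x="x", y="y", title="Simple Line Plot")
    fig.show()
```

For example, `plot_simple_line(random_walk_data(seed=1))` draws one reproducible line.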
