Data Toolkit Assignment
Questions
1. Demonstrate three different methods for creating identical 2D
arrays in NumPy. Provide the code for each method and the final
output after each method.
9. Filter and select rows from the People_Dataset where the 'Last
Name' column contains the name 'Duke', the 'Gender' column contains
the word 'Female', and the 'Salary' is less than 85000.
11. Create two different Series, each of length 50, with the following
criteria:
a) Delete the 'Email', 'Phone', and 'Date of birth' columns from the
dataset.
13. Create two NumPy arrays, x and y, each containing 100 random
float values between 0 and 1. Perform the
following tasks using Matplotlib and NumPy:
a) Create a scatter plot using x and y, setting the color of the points
to red and the marker style to 'o'.
c) Add a vertical line at x = 0.5 using a dotted line style and label it
as 'x = 0.5'.
f) Display a legend for the scatter plot, the horizontal line, and the
vertical line.
c) Set the title of the plot as 'Temperature and Humidity Over Time'.
16. Set the title of the plot as 'Histogram with PDF Overlay'.
18. With Bokeh, plot a line chart of a sine wave function, add grid
lines, label the axes, and set the title as 'Sine
Wave Function'.
19. Using Bokeh, generate a bar chart of randomly generated
categorical data, color bars based on their
values, add hover tooltips to display exact values, label the axes,
and set the title as 'Random Categorical
Bar Chart'.
20. Using Plotly, create a basic line plot of a randomly generated
dataset, label the axes, and set the title as
'Simple Line Plot'.
21. Using Plotly, create an interactive pie chart of randomly
generated data, add labels and percentages, set
the title as 'Interactive Pie Chart'.
ANSWERS
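1. Here are three different methods for creating identical 2D
arrays in NumPy:
● Using np.array() to create the array manually.
import numpy as np
array_1 = np.array([[1, 2, 3], [4, 5, 6]])
print("Method 1 Output:")
print(array_1)
● Using np.ones() to create an array of ones and then multiply
or modify values.
import numpy as np
# Multiplying an array of ones by the target values yields a float array
array_2 = np.ones((2, 3)) * np.array([[1, 2, 3], [4, 5, 6]])
print("\nMethod 2 Output:")
print(array_2)
● Using np.zeros() and assigning values manually.
import numpy as np
array_3 = np.zeros((2, 3))
array_3[0] = [1, 2, 3]
array_3[1] = [4, 5, 6]
print("\nMethod 3 Output:")
print(array_3)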
Output for all methods:
Method 1 Output:
[[1 2 3]
 [4 5 6]]
Method 2 Output:
[[1. 2. 3.]
 [4. 5. 6.]]
Method 3 Output:
[[1. 2. 3.]
 [4. 5. 6.]]
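The outputs above show that Method 1 prints integers while Methods 2 and 3 print floats, so the arrays hold equal values but differ in dtype. The following is a minimal sketch, assuming the same 2x3 values as in those outputs, of one way to check this and to align the dtypes. The variable names values_match and array_1_float are illustrative only.

```python
import numpy as np

# Rebuild the three arrays with the values shown in the outputs
array_1 = np.array([[1, 2, 3], [4, 5, 6]])
array_2 = np.ones((2, 3)) * np.array([[1, 2, 3], [4, 5, 6]])
array_3 = np.zeros((2, 3))
array_3[0] = [1, 2, 3]
array_3[1] = [4, 5, 6]

# np.array_equal compares shape and values, not dtype
values_match = np.array_equal(array_1, array_2) and np.array_equal(array_2, array_3)
print(values_match)
print(array_1.dtype, array_2.dtype, array_3.dtype)

# Casting Method 1's result to float makes the dtypes match as well
array_1_float = array_1.astype(float)
print(array_1_float.dtype == array_2.dtype)
```

If the arrays must be identical in dtype too, passing dtype=float to np.array() in Method 1 would be an alternative to casting afterwards.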
2.
import numpy as np
print(array_2d)
3.
Difference between np.array, np.asarray, and np.asanyarray:
● np.array():
○ This function creates an array. By default (copy=True) it
creates a new array, even if the input is already an array.
○ It accepts additional options, such as the data type
(dtype) and whether to copy the input.
● np.asarray():
○ Similar to np.array(), but if the input is already a NumPy array
with a matching dtype, it does not make a copy; it returns the
original array.
○ It is faster and more memory-efficient when converting data
that might already be an array.
● np.asanyarray():
○ Like np.asarray(), but it passes ndarray subclasses (for
example, masked arrays) through unchanged instead of
converting them to a base ndarray.
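The differences between the three functions can be observed directly with identity and type checks. This is a minimal sketch, not part of the original answer: the variable names are illustrative, and a masked array is used only as a convenient example of an ndarray subclass.

```python
import numpy as np

base = np.arange(6)

copied = np.array(base)     # default copy=True: a new array is allocated
same = np.asarray(base)     # input is already an ndarray: returned as-is
print(copied is base)       # False
print(same is base)         # True

# np.ma.MaskedArray is a subclass of np.ndarray
masked = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
as_base = np.asarray(masked)      # converted to a plain ndarray
as_any = np.asanyarray(masked)    # subclass instance passed through
print(type(as_base).__name__)     # ndarray
print(type(as_any).__name__)      # MaskedArray
```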
● Shallow copy:
○ A shallow copy creates a new outer object but does not
recursively copy the objects contained within it.
○ Modifications to mutable objects inside the copy therefore
affect the original object, because both share the same
references.
● Deep copy:
○ A deep copy recursively copies the contained objects as well,
so changes made through the copy do not affect the original.
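The contrast between the two kinds of copy can be illustrated with Python's copy module and, for NumPy arrays, with view() and copy(). This is a minimal sketch with illustrative variable names: a deep copy duplicates nested objects, whereas a shallow copy (or a NumPy view) keeps sharing them with the original.

```python
import copy
import numpy as np

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, shared inner lists
deep = copy.deepcopy(original)     # inner lists are duplicated too

original[0][0] = 99
print(shallow[0][0])   # 99, the inner list is shared
print(deep[0][0])      # 1, the deep copy is independent

arr = np.array([1, 2, 3])
view = arr.view()      # shares the same data buffer
dup = arr.copy()       # owns its own data buffer

arr[0] = 100
print(view[0])         # 100
print(dup[0])          # 1
```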
4.
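import numpy as np
# Assumed input: the definition of random_array is missing from this
# answer, so a 3x3 array of random floats is used here
random_array = np.random.rand(3, 3)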
rounded_array = np.round(random_array, 2)
print(rounded_array)
5.
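import numpy as np
# Assumed input: the definition of array is missing from this answer,
# so random integers from 1 to 10 with shape (3, 6) are used here
array = np.random.randint(1, 11, size=(3, 6))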
even_integers = array[array % 2 == 0]
odd_integers = array[array % 2 != 0]
print("Original Array:")
print(array)
print("\nEven Integers:")
print(even_integers)
print("\nOdd Integers:")
print(odd_integers)
6.
import numpy as np
# Display results
print("Original 3D Array:")
print(array_3d)
print(max_indices)
print("\nElement-wise Multiplication:")
print(elementwise_multiplication)
Explanation:
7.
import pandas as pd
data = {
# Create a DataFrame
df = pd.DataFrame(data)
print("Cleaned DataFrame:")
print(df)
Expected output:
Cleaned DataFrame:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
8.
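import pandas as pd
columns_to_read = ['Last Name', 'Gender', 'Email', 'Phone', 'Salary']
# a) Read the 'data.csv' file using pandas, skipping the first 50 data rows.
# skiprows=range(1, 51) keeps line 0 (the header); a plain skiprows=50
# would skip the header line as well.
df = pd.read_csv('data.csv', skiprows=range(1, 51))
# b) Only read the columns: 'Last Name', 'Gender', 'Email', 'Phone', and 'Salary'
df_filtered = df[columns_to_read]
# c) Display the first 10 rows of the filtered dataset
print(df_filtered.head(10))
# d) Extract the 'Salary' column as a Series and display its last 5 values
salary_series = df_filtered['Salary']
print(salary_series.tail(5))
Explanation:
An integer skiprows counts lines from the top of the file, header
included, so skiprows=50 would turn the 51st line into the header and
the column selection would fail with a KeyError. Passing range(1, 51)
skips only data rows and keeps the original column names.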
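How skiprows treats the header line can be checked on a small in-memory file. The sketch below uses a hypothetical five-row CSV as a stand-in for 'data.csv' (its contents are invented for illustration) and compares an integer skiprows with a range that leaves line 0 in place.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'data.csv': one header line plus five data rows
csv_text = "Last Name,Salary\nA,10\nB,20\nC,30\nD,40\nE,50\n"

# An integer skips lines from the top, header included,
# so the line "B,20" is read as the header
naive = pd.read_csv(io.StringIO(csv_text), skiprows=2)
print(list(naive.columns))        # ['B', '20']

# A range starting at 1 keeps the header and skips two data rows;
# usecols limits the columns at read time
kept = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=range(1, 3),
    usecols=["Last Name", "Salary"],
)
print(list(kept.columns))         # ['Last Name', 'Salary']
print(kept["Salary"].tolist())    # [30, 40, 50]
```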
9.
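import pandas as pd
df = pd.read_csv('data.csv')
# Keep rows where 'Last Name' contains 'Duke', 'Gender' contains
# 'Female', and 'Salary' is below 85000
filtered_df = df[
    df['Last Name'].str.contains('Duke', na=False)
    & df['Gender'].str.contains('Female', na=False)
    & (df['Salary'] < 85000)
]
print("Filtered Rows:")
print(filtered_df)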
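The filter for this question combines three boolean conditions with the & operator. The sketch below runs that logic on a small hypothetical DataFrame (the four rows are invented stand-ins for the People_Dataset) so that the result can be checked by eye. The names people, mask, and filtered are illustrative.

```python
import pandas as pd

# Hypothetical sample rows standing in for the People_Dataset
people = pd.DataFrame({
    "Last Name": ["Duke", "Duke", "Mahoney", "Duke"],
    "Gender": ["Female", "Male", "Female", "Female"],
    "Salary": [70000, 60000, 50000, 90000],
})

# Each condition yields a boolean Series; & combines them row by row.
# na=False treats missing strings as non-matching.
mask = (
    people["Last Name"].str.contains("Duke", na=False)
    & people["Gender"].str.contains("Female", na=False)
    & (people["Salary"] < 85000)
)
filtered = people[mask]
print(filtered)
```

Only the first row satisfies all three conditions. The parentheses around the salary comparison matter, because & binds more tightly than < in Python.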
10.
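import pandas as pd
import numpy as np
# 7 rows x 5 columns of random integers; the value range (1 to 6) is an
# assumption, since the original definition of df is missing
df = pd.DataFrame(np.random.randint(1, 7, size=(7, 5)))
print("7x5 DataFrame:")
print(df)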
11.
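import pandas as pd
import numpy as np
# a) Create the first Series with random numbers; the range 10 to 50 is
# assumed from the sample output below
series1 = pd.Series(np.random.randint(10, 51, size=50))
# b) Create the second Series with random numbers ranging from 100 to 1000
series2 = pd.Series(np.random.randint(100, 1001, size=50))
# Combine both Series into a DataFrame with columns 'col1' and 'col2'
df = pd.DataFrame({'col1': series1, 'col2': series2})
print("Combined DataFrame:")
print(df)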
Expected output:
Combined DataFrame:
   col1  col2
0    32   384
1    14   534
2    23   251
3    41   728
4    36   910
5    15   436
6    47   677
7    35   123
8    21   580
9    50   761
...
12.
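import pandas as pd
df = pd.read_csv('data.csv')
# a) Delete the 'Email', 'Phone', and 'Date of birth' columns from the dataset
df = df.drop(columns=['Email', 'Phone', 'Date of birth'])
# Delete the rows that contain any missing values
df.dropna(inplace=True)
print(df)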
Expected output:
...
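The two cleaning steps in this answer, dropping columns and dropping rows with missing values, can be tried on a small hypothetical DataFrame. All values below are invented for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question
sample = pd.DataFrame({
    "Last Name": ["Duke", "Rivers", None],
    "Email": ["a@example.com", "b@example.com", "c@example.com"],
    "Phone": ["111", "222", "333"],
    "Date of birth": ["1990-01-01", "1985-05-05", "1970-07-07"],
    "Salary": [70000, np.nan, 50000],
})

# drop(columns=...) returns a new DataFrame without the listed columns
trimmed = sample.drop(columns=["Email", "Phone", "Date of birth"])

# dropna() removes every row that has at least one missing value
cleaned = trimmed.dropna()
print(list(cleaned.columns))
print(len(cleaned))
```

Dropping the columns first means that missing values in 'Email', 'Phone', or 'Date of birth' no longer cause rows to be removed by dropna().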
13.
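import numpy as np
import matplotlib.pyplot as plt
# Create two NumPy arrays, x and y, each containing 100 random float
# values between 0 and 1
x = np.random.rand(100)
y = np.random.rand(100)
# a) Scatter plot with red points and the 'o' marker
plt.scatter(x, y, color='red', marker='o', label='Scatter points')
# Horizontal line: its position and style are not given in the question
# as listed here, so y = 0.5 with a dashed style is assumed
plt.axhline(y=0.5, linestyle='--', label='y = 0.5')
# c) Vertical line at x = 0.5 with a dotted style
plt.axvline(x=0.5, linestyle=':', label='x = 0.5')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
# f) Display a legend
plt.legend()
plt.show()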
14.
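import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Assumed sample data: the original definitions of date_range,
# temperature, and humidity are missing from this answer
date_range = pd.date_range(start='2024-01-01', periods=30, freq='D')
temperature = np.random.uniform(15, 30, size=30)
humidity = np.random.uniform(40, 80, size=30)
data = {
    'Date': date_range,
    'Temperature': temperature,
    'Humidity': humidity
}
df = pd.DataFrame(data)
# Plotting
fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.plot(df['Date'], df['Temperature'], color='tab:red')
ax1.set_xlabel('Date')
ax1.set_ylabel('Temperature', color='tab:red')
ax1.tick_params(axis='y', labelcolor='tab:red')
# Create a second y-axis for Humidity
ax2 = ax1.twinx()
ax2.plot(df['Date'], df['Humidity'], color='tab:blue')
ax2.set_ylabel('Humidity', color='tab:blue')
ax2.tick_params(axis='y', labelcolor='tab:blue')
plt.title('Temperature and Humidity Over Time')
plt.show()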
15.
import numpy as np
import matplotlib.pyplot as plt
plt.xlabel('Value')
plt.ylabel('Frequency/Probability')
# Display legend
plt.legend()
plt.show()
16.
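import numpy as np
import matplotlib.pyplot as plt
data = np.random.normal(size=1000)  # assumed sample data; the original definition is missing
plt.hist(data, bins=30, density=True, alpha=0.6, label='Histogram')
# Normal PDF evaluated with the sample mean and standard deviation
mu, sigma = data.mean(), data.std()
xs = np.linspace(data.min(), data.max(), 200)
plt.plot(xs, np.exp(-0.5 * ((xs - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi)), label='PDF')
plt.title('Histogram with PDF Overlay')
plt.xlabel('Value')
plt.ylabel('Frequency/Probability')
plt.legend()  # Display legend
plt.show()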
17.
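import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Assumed sample data: the original DataFrame definition is missing
df = pd.DataFrame({'X': np.random.uniform(-1, 1, 100),
                   'Y': np.random.uniform(-1, 1, 100)})
def get_quadrant(row):
    # The quadrant labels are assumed; only the first condition survives
    # in the original answer
    if row['X'] > 0 and row['Y'] > 0:
        return 'Quadrant 1'
    elif row['X'] < 0 and row['Y'] > 0:
        return 'Quadrant 2'
    elif row['X'] < 0 and row['Y'] < 0:
        return 'Quadrant 3'
    else:
        return 'Quadrant 4'
df['Quadrant'] = df.apply(get_quadrant, axis=1)
plt.figure(figsize=(10, 6))
# Plot each quadrant as its own series so it gets a legend entry
for name, group in df.groupby('Quadrant'):
    plt.scatter(group['X'], group['Y'], label=name)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
# The plot title would be set here with plt.title(); its text is
# missing from this answer
plt.legend(title='Quadrants')
plt.grid()
plt.show()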
18.
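import numpy as np
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
output_notebook()
x = np.linspace(0, 2 * np.pi, 100)  # x range assumed; the original definition is missing
y = np.sin(x)
# Create a new plot with title and axis labels
p = figure(title='Sine Wave Function', x_axis_label='x', y_axis_label='sin(x)')
p.line(x, y, line_width=2)
p.grid.grid_line_color = 'lightgray'  # grid lines are on by default; set their color explicitly
show(p)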
19.
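import numpy as np
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.transform import linear_cmap
output_notebook()
# Create a DataFrame (category names and value range are assumed)
categories = [f'Category {i}' for i in range(1, 6)]
data = pd.DataFrame({'category': categories,
                     'value': np.random.randint(1, 100, size=5)})
# Create a ColumnDataSource
source = ColumnDataSource(data=data)
p = figure(x_range=categories, title='Random Categorical Bar Chart',
           toolbar_location=None, tools="")
# Color each bar according to its value
p.vbar(x='category', top='value', width=0.8, source=source,
       color=linear_cmap('value', 'Viridis256', 0, 100))
# Hover tooltips showing the exact values
hover = HoverTool(tooltips=[('Category', '@category'), ('Value', '@value')])
p.add_tools(hover)
p.xaxis.axis_label = "Categories"
p.yaxis.axis_label = "Values"
p.xaxis.major_label_orientation = "vertical"
show(p)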
20.
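import numpy as np
import plotly.graph_objects as go
# Randomly generated dataset (size and distribution assumed)
x = np.arange(50)
y = np.random.rand(50)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode='lines'))
fig.update_layout(
    title='Simple Line Plot',
    xaxis_title='X Axis',
    yaxis_title='Y Axis'
)
fig.show()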
21.
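import numpy as np
import plotly.graph_objects as go
labels = [f'Category {i}' for i in range(1, 6)]  # Labels for the pie chart
values = np.random.randint(10, 100, size=5)  # Random values (range assumed)
fig = go.Figure(data=[go.Pie(labels=labels, values=values, textinfo='label+percent')])
fig.update_layout(title='Interactive Pie Chart')
fig.show()
Explanation:
Setting textinfo='label+percent' prints each slice's label and its
percentage share on the chart. Plotly figures are interactive by
default, so hovering over a slice also shows its underlying value.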