Manual
Ans:
import numpy as np
vector = np.zeros(10)
vector[5] = 11
print(vector)
output:
[ 0. 0. 0. 0. 0. 11. 0. 0. 0. 0.]
Ans:
import numpy as np
array = np.array([1, 2, 3, 4, 5])  # integer array to convert
float_array = array.astype(float)
print(float_array)
output:
[1. 2. 3. 4. 5.]
Ans:
import numpy as np
# 3x3 matrix with values ranging from 2 to 10
matrix = np.arange(2, 11).reshape(3, 3)
print(matrix)
output:
[[ 2  3  4]
 [ 5  6  7]
 [ 8  9 10]]
Write a NumPy program to convert a list and tuple into arrays
Ans:
import numpy as np
lst = [1, 2, 3, 4]
tpl = (5, 6, 7, 8)
array_from_list = np.array(lst)
array_from_tuple = np.array(tpl)
print(array_from_list)
print(array_from_tuple)
output:
[1 2 3 4]
[5 6 7 8]
Write a NumPy program to convert the values of Centigrade degrees into Fahrenheit degrees and
vice versa. Values have to be stored into a NumPy array.
Ans:
import numpy as np
output:
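The answer for this exercise is blank in this copy of the manual. A minimal sketch follows, assuming sample temperatures chosen purely for illustration. The names `celsius`, `fahrenheit`, and `back_to_celsius` are not from the manual.

```python
import numpy as np

# Illustrative values only; the manual does not specify the inputs.
celsius = np.array([0.0, 12.0, 45.21, 34.0, 99.91])

# Centigrade -> Fahrenheit: F = C * 9/5 + 32
fahrenheit = celsius * 9 / 5 + 32
print("Values in Fahrenheit degrees:", fahrenheit)

# Fahrenheit -> Centigrade: C = (F - 32) * 5/9
back_to_celsius = (fahrenheit - 32) * 5 / 9
print("Values in Centigrade degrees:", back_to_celsius)
```

Both conversions are element-wise, so no explicit loop is needed.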
Ans:
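import numpy as np
array1 = np.array([10, 20, 30, 40])
array2 = np.array([1, 2, 3, 4])
# Element-wise arithmetic on two arrays of the same shape
addition = array1 + array2
subtraction = array1 - array2
multiplication = array1 * array2
division = array1 / array2
print("Addition:", addition)
print("Subtraction:", subtraction)
print("Multiplication:", multiplication)
print("Division:", division)
Output:
Addition: [11 22 33 44]
Subtraction: [ 9 18 27 36]
Multiplication: [ 10  40  90 160]
Division: [10. 10. 10. 10.]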
Ans:
import numpy as np
array = np.array([[1, 2, 3], [4, 5, 6]])  # 2x3 input array
transpose_array = np.transpose(array)
print("Original array:")
print(array)
print("Transposed array:")
print(transpose_array)
Output:
Original array:
[[1 2 3]
 [4 5 6]]
Transposed array:
[[1 4]
 [2 5]
 [3 6]]
Using NumPy, create an array with 5 dimensions and verify that it has 5 dimensions
Ans:
import numpy as np
Output:
Number of dimensions: 5
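The code for this answer is missing apart from the import. One way to get the output shown, sketched here with an arbitrary four-element input (the values and the name `arr` are assumptions), is the `ndmin` argument of `np.array`:

```python
import numpy as np

# ndmin=5 prepends axes of length 1 until the array has 5 dimensions.
arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print("Shape:", arr.shape)
print("Number of dimensions:", arr.ndim)
```

The `ndim` attribute is what verifies the dimensionality; the shape here is `(1, 1, 1, 1, 4)`.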
Write a NumPy program to merge three given NumPy arrays of same shape
Ans:
import numpy as np
output:
Merged array: [1 2 3 4 5 6 7 8 9]
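The body of this answer is missing. A short sketch that reproduces the merged output above, assuming three one-dimensional inputs named `arr1`, `arr2`, and `arr3` (names not from the manual), could use `np.concatenate`:

```python
import numpy as np

# Three arrays of the same shape (assumed inputs).
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr3 = np.array([7, 8, 9])

# Join them end to end along the only axis.
merged = np.concatenate((arr1, arr2, arr3))
print("Merged array:", merged)
```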
Create two arrays of six elements, write a NumPy program to count the number of instances of a
value occurring in one array on the condition of another array.
Ans:
import numpy as np
value_to_count = 2
condition_value = 6
output:
Number of instances: 3
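Only the two parameters of this answer survive. The sketch below keeps `value_to_count` and `condition_value` from the manual. The two six-element arrays are made up, chosen so that the count is 3 as in the output above.

```python
import numpy as np

# Made-up six-element arrays; the manual's arrays are not available.
array1 = np.array([2, 2, 2, 3, 2, 5])
array2 = np.array([6, 6, 6, 6, 1, 6])

value_to_count = 2
condition_value = 6

# Count positions where array1 holds the value AND array2 meets the condition.
mask = (array1 == value_to_count) & (array2 == condition_value)
count = int(np.sum(mask))
print("Number of instances:", count)
```

Combining the two boolean masks with `&` keeps the comparison element-wise; the parentheses are required because `&` binds more tightly than `==`.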
Sample output:
Original dictionary:
Type:
ndarray:
[[1. 0. 0. 2.]
[3. 1. 0. -1.]
[4. 1. 5. -1.]
Type: <class 'numpy.ndarray'>
Ans:
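import numpy as np
# Original dictionary
data_dict = {
    'column0': {'a': 1, 'b': 0.0, 'c': 0.0, 'd': 2.0},
    'column1': {'a': 3.0, 'b': 1, 'c': 0.0, 'd': -1.0},
    'column2': {'a': 4, 'b': 1, 'c': 5.0, 'd': -1.0},
    'column3': {'a': 3.0, 'b': -1.0, 'c': -1.0, 'd': -1.0},
}
print("Original dictionary:")
print(data_dict)
print("Type:")
print(type(data_dict))
# Rows follow the inner keys 'a'..'d'; columns follow the outer keys
rows = ['a', 'b', 'c', 'd']
ndarray = np.array([[data_dict[col][r] for col in data_dict] for r in rows], dtype=float)
print("ndarray:")
print(ndarray)
print("Type:", type(ndarray))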
output:
Original dictionary:
{'column0': {'a': 1, 'b': 0.0, 'c': 0.0, 'd': 2.0}, 'column1': {'a': 3.0, 'b': 1, 'c': 0.0, 'd': -1.0}, 'column2': {'a':
4, 'b': 1, 'c': 5.0, 'd': -1.0}, 'column3': {'a': 3.0, 'b': -1.0, 'c': -1.0, 'd': -1.0}}
Type:
ndarray:
[[ 1.  3.  4.  3.]
 [ 0.  1.  1. -1.]
 [ 0.  0.  5. -1.]
 [ 2. -1. -1. -1.]]
Type: <class 'numpy.ndarray'>
Ans:
import pandas as pd
data = {
df = pd.DataFrame(data)
print("DataFrame values:")
print(df.values)
output:
DataFrame values:
['Charlie' 22 'Chicago']
['David' 32 'Houston']]
Perform appending, slicing, addition and deletion of rows with a pandas dataframe.
Ans:
import pandas as pd
# Initial DataFrame
data = {
}
df = pd.DataFrame(data)
sliced_df = df.iloc[1:4]
print(sliced_df)
additional_data = pd.DataFrame({
})
df = df.drop(index=2)
print(df)
Output:
2 Charlie 22 Chicago
3 David 32 Houston
DataFrame after appending, adding, and deleting rows:
3 David 32 Houston
5 Frank 30 Seattle
6 Grace 25 Austin
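The data and the append step are missing from the listing above. The sketch below walks through the four operations on a small made-up frame (the rows are not the manual's data). It assumes a pandas version where `DataFrame.append` is no longer available (it was removed in pandas 2.0), so `pd.concat` is used instead.

```python
import pandas as pd

# Made-up rows for illustration; not the manual's data.
df = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Age": [21, 22, 23, 24]})

# Slicing: rows at positions 1 and 2 (the end of an iloc slice is exclusive).
sliced_df = df.iloc[1:3]
print(sliced_df)

# Appending: concatenate a second frame and renumber the index.
extra = pd.DataFrame({"Name": ["E", "F"], "Age": [25, 26]})
df = pd.concat([df, extra], ignore_index=True)

# Addition: assign a single new row at the next free index label.
df.loc[len(df)] = ["G", 27]

# Deletion: drop the row whose index label is 2.
df = df.drop(index=2)
print(df)
```

After the drop, the index keeps its gap (0, 1, 3, 4, 5, 6), which matches the pattern in the manual's output; call `reset_index(drop=True)` if a contiguous index is wanted.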
Using Pandas, Create a DataFrame with a list of dictionaries, row indices, and column indices
Ans:
import pandas as pd
# List of dictionaries
data = [
print(df)
Output:
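The list of dictionaries and the constructor call are cut off in the answer above. A minimal sketch with made-up records and labels (all names and values here are assumptions) shows how row indices and column indices can be passed together:

```python
import pandas as pd

# Made-up records; keys become columns, missing keys become NaN.
data = [{"x": 1, "y": 2}, {"x": 5, "y": 10, "z": 20}]

# index labels the rows; columns selects and orders the columns.
df = pd.DataFrame(data, index=["first", "second"], columns=["x", "y", "z"])
print(df)
```

Because the first record has no `z` key, that cell is filled with `NaN` and the `z` column becomes floating point.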
Sample data:
{'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]}
Expected Output:
    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83
Ans:
import pandas as pd
import numpy as np
data = {'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]}
df = pd.DataFrame(data)
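print("Original DataFrame:")
print(df)
# Element-wise X ** Y; the columns are int64, so large results overflow silently
df['Power_X_Y'] = np.power(df['X'], df['Y'])
print(df)
Output:
Original DataFrame: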
X Y Z
0 78 84 86
1 85 94 97
2 96 89 96
3 80 83 72
4 86 86 83
X Y Z Power_X_Y
0 78 84 86 0
1 85 94 97 4551265826121030281
2 96 89 96 0
3 80 83 72 0
4 86 86 83 0
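The Power_X_Y values in the output above are not real powers. Numbers such as 78 raised to 84 are far beyond the 64-bit integer range, and the zeros are what wraparound produces for even bases, since those powers are multiples of 2**64. A hedged sketch of one workaround, assuming approximate floating-point results are acceptable, is to cast the columns to float before taking the power:

```python
import numpy as np
import pandas as pd

data = {'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]}
df = pd.DataFrame(data)

# float64 can represent magnitudes up to about 1.8e308, so these powers fit,
# at the cost of keeping only about 15 to 17 significant digits.
df['Power_X_Y'] = np.power(df['X'].astype(float), df['Y'].astype(float))
print(df)
```

If exact values are required, Python's arbitrary-precision integers (for example `[x ** y for x, y in zip(df['X'], df['Y'])]` stored in an object column) avoid the overflow entirely.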
Write a Pandas Program to get the numeric representation of an array by identifying distinct values
of a given column of a DataFrame
Sample output:
Original DataFrame:
[0 1 2 3 1]
Ans:
import pandas as pd
# Sample DataFrame
data = {
    'Name': ['Alberto Franco', 'Gino Mcnell', 'Ryan Parkes', 'Eesha Hinton', 'Gino Mcnell'],
}
df = pd.DataFrame(data)
df['Name_numeric'] = pd.factorize(df['Name'])[0]
print("Original DataFrame:")
print(df['Name_numeric'].values)
print(pd.Index(df['Name'].unique()))
Output:
Original DataFrame:
[0 1 2 3 1]
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Number of Rows: 10
Number of Columns: 4
Ans:
import pandas as pd
import numpy as np
exam_data = {
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)
# shape is a (rows, columns) tuple
num_rows = df.shape[0]
num_columns = df.shape[1]
print("Number of Rows:", num_rows)
print("Number of Columns:", num_columns)
Output:
Number of Rows: 10
Number of Columns: 4
Write a Pandas program to check a given column is present in a DataFrame or not
Sample data:
Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11
Ans:
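import pandas as pd
data = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data)
# Report whether column_name is one of the DataFrame's column labels
def check_column_presence(df, column_name):
    if column_name in df.columns:
        print(f"{column_name} is present in the DataFrame.")
    else:
        print(f"{column_name} is not present in the DataFrame.")
check_column_presence(df, 'col4')
check_column_presence(df, 'col1')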
Output:
Ans:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
file_path = '/content/concrete_strength_parabolic.csv'
df.to_csv(file_path, index=False)
plt.figure(figsize=(8, 6))
plt.xlabel('Cement (kg/m³)')
plt.show()
plt.figure(figsize=(8, 6))
plt.xlabel('Water (kg/m³)')
plt.show()
Output:
Draw a scatter plot for the following Pandas DataFrame, with team name on the x axis and rank
points on the y axis:
['Australia', 2500], ['Bangladesh', 1000], ['England', 2000], ['India', 3000], ['Srilanka', 1500]
Ans:
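import pandas as pd
import matplotlib.pyplot as plt
data = {
    'Team': ['Australia', 'Bangladesh', 'England', 'India', 'Srilanka'],
    'Rank Points': [2500, 1000, 2000, 3000, 1500],
}
df_teams = pd.DataFrame(data)
print("DataFrame:")
print(df_teams)
plt.figure(figsize=(8, 6))
# Team names on the x axis, rank points on the y axis
plt.scatter(df_teams['Team'], df_teams['Rank Points'])
plt.xlabel('Team')
plt.ylabel('Rank Points')
plt.show()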
Output:
24. Read data from text files, Excel, and the web, and explore various commands for
descriptive analytics on the Iris data set (this program requires the iris.csv file)
Ans:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Iris dataset from a text file, Excel file, or from the web
# df_text = pd.read_csv('path_to_your_file/iris.csv')
df_web = pd.read_csv('/content/iris.csv')
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
print(df_web.head())
print("\nDataset Information:")
print(df_web.info())
# 2. Summary statistics
print("\nSummary Statistics:")
print(df_web.describe())
print(df_web['species'].unique())
print(df_web['species'].value_counts())
print("\nCorrelation Matrix:")
print(df_web.corr())
print(df_web.groupby('species').mean())
sns.pairplot(df_web, hue="species")
plt.show()
Output:
Dataset Information:
<class 'pandas.core.frame.DataFrame'>
None
Summary Statistics:
[0 1 2]
species
0 50
1 50
2 50
Correlation Matrix:
species
Ans:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
x = np.random.rand(50)
y = np.random.rand(50)
z = np.random.rand(50)
# Creating a 3D plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_zlabel('Z Axis')
plt.show()
Output:
27. Use the diabetes data set from Pima Indians Diabetes data set for performing the following:
a. Frequency
b. Mean
c. Median
d. Mode
e. Variance
f. Standard Deviation
g. Skewness and Kurtosis
Ans:
import pandas as pd
import numpy as np
data = pd.read_csv(file_path)
data_0 = data[data['Outcome'] == 0]
data_1 = data[data['Outcome'] == 1]
analysis_results = {
"Outcome = 0": {
},
"Outcome = 1": {
print(f"{stat_name}: {value}")
output:
1 106
2 84
0 73
3 48
4 45
5 36
6 34
7 20
8 16
10 14
9 10
13 5
12 5
11 4
SkinThickness Mode: 0
0 38
1 29
3 27
7 25
4 23
8 22
5 21
2 19
9 18
6 16
10 10
11 7
13 5
12 4
14 2
15 1
17 1
Name: count, dtype: int64
SkinThickness Mode: 0
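The `analysis_results` dictionary in the listing above is cut off. The sketch below shows one way the listed statistics (a to g) could be computed per outcome group. It uses a tiny made-up frame in place of the CSV so that it runs on its own; the values, the single `SkinThickness` column, and the name `results` are assumptions.

```python
import pandas as pd

# Tiny made-up sample standing in for the diabetes CSV.
data = pd.DataFrame({
    "SkinThickness": [0, 0, 23, 35, 29, 0, 32, 45],
    "Outcome":       [0, 1, 0, 1, 0, 1, 0, 1],
})

results = {}
for outcome, group in data.groupby("Outcome"):
    col = group["SkinThickness"]
    results[outcome] = {
        "Frequency": col.value_counts().to_dict(),
        "Mean": col.mean(),
        "Median": col.median(),
        "Mode": col.mode().iloc[0],      # first mode if there are ties
        "Variance": col.var(),           # sample variance (ddof=1)
        "Standard Deviation": col.std(),
        "Skewness": col.skew(),
        "Kurtosis": col.kurt(),          # excess kurtosis
    }

for outcome, stats in results.items():
    print(f"Outcome = {outcome}")
    for stat_name, value in stats.items():
        print(f"{stat_name}: {value}")
```

With the real data set, the same loop can be repeated over every numeric column rather than `SkinThickness` alone.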
28. Use the diabetes data set from Pima Indians Diabetes data set for performing the following:
Ans:
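import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.metrics import r2_score
file_path = "/content/diabetes.csv"
data = pd.read_csv(file_path)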
# Bivariate Analysis
plt.show()
plt.figure(figsize=(10, 8))
plt.show()
y = data["Outcome"]
X = sm.add_constant(X)
print(model.summary())
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
Output:
Multiple Regression Analysis Summary:
==============================================================================
Df Model: 8
==================================================================================
==========
--------------------------------------------------------------------------------------------
==============================================================================
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.14e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
R-squared: 0.222
30. Use the diabetes data set from the UCI data sets for performing the following:
Ans:
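import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import r2_score, accuracy_score, confusion_matrix, classification_report
file_path = "/content/diabetes.csv"
data = pd.read_csv(file_path)
# Bivariate Analysis
# Correlation Heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(data.corr(), annot=True)
plt.show()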
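linear_model = LinearRegression()
linear_model.fit(X_train_linear, y_train_linear)
y_pred_linear = linear_model.predict(X_test_linear)
r2_linear = r2_score(y_test_linear, y_pred_linear)
print("Linear Regression Model:")
print(f"R-squared: {r2_linear:.3f}\n")
logistic_model = LogisticRegression(max_iter=200)
logistic_model.fit(X_train_logistic, y_train_logistic)
y_pred_logistic = logistic_model.predict(X_test_logistic)
# Evaluate the classifier on the held-out test split
accuracy_logistic = accuracy_score(y_test_logistic, y_pred_logistic)
conf_matrix_logistic = confusion_matrix(y_test_logistic, y_pred_logistic)
class_report_logistic = classification_report(y_test_logistic, y_pred_logistic)
print(f"Accuracy: {accuracy_logistic:.3f}")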
print("Confusion Matrix:")
print(conf_matrix_logistic)
print("\nClassification Report:")
print(class_report_logistic)
output:
Linear Regression Model:
R-squared: 0.061
Accuracy: 0.736
Confusion Matrix:
[[120 31]
[ 30 50]]
Classification Report:
31. Use the diabetes data set from the UCI data sets for performing the following:
Ans:
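import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
file_path = "/content/diabetes.csv"
data = pd.read_csv(file_path)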
print("Dataset Info:")
print(data.info())
print("\nDataset Head:")
print(data.head())
# Multiple Regression Analysis - Logistic Regression for 'Outcome' Prediction
logistic_model = LogisticRegression(max_iter=200)
logistic_model.fit(X_train, y_train)
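y_pred = logistic_model.predict(X_test)
# Model Evaluation
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")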
print("Confusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)
print(result.summary())
Output:
Dataset Info:
<class 'pandas.core.frame.DataFrame'>
None
Dataset Head:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6
1            1       85             66             29        0  26.6
2            8      183             64              0        0  23.3
3            1       89             66             23       94  28.1

   DiabetesPedigreeFunction  Age  Outcome
0                     0.627   50        1
1                     0.351   31        0
2                     0.672   32        1
3                     0.167   21        0
4                     2.288   33        1
Accuracy: 0.736
Confusion Matrix:
[[120 31]
[ 30 50]]
Classification Report:
Iterations 6
Logistic Regression Analysis Summary (StatsModels):
==============================================================================
==================================================================================
==========
--------------------------------------------------------------------------------------------
i. Normal Value
ii. Density and contour plots
iii. Three-dimensional plotting
Ans:
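import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
file_path = "/content/diabetes.csv"
data = pd.read_csv(file_path)
# 1. Normal Value Distribution Plot
plt.figure(figsize=(14, 6))
plt.subplot(1, 2, 1)
sns.histplot(data['Glucose'], kde=True, stat="density", line_kws={'linestyle':'--'})
plt.xlabel("Glucose")
plt.ylabel("Density")
plt.subplot(1, 2, 2)
sns.histplot(data['BMI'], kde=True, stat="density", line_kws={'linestyle':'--'}, color="orange")
plt.xlabel("BMI")
plt.ylabel("Density")
plt.show()
# 2. Density and Contour Plot
plt.figure(figsize=(8, 6))
sns.kdeplot(x=data['Glucose'], y=data['Insulin'], fill=True)
plt.xlabel("Glucose")
plt.ylabel("Insulin")
plt.show()
# 3. Three-Dimensional Plotting
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Scatter plot of Age, BMI, and Glucose, colored by Outcome
sc = ax.scatter(data['Age'], data['BMI'], data['Glucose'], c=data['Outcome'])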
ax.set_xlabel("Age")
ax.set_ylabel("BMI")
ax.set_zlabel("Glucose")
plt.colorbar(sc, label="Outcome")
plt.show()
Output:
33. Apply and explore various plotting functions on UCI data set for performing the following:
Ans:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
file_path = "/content/diabetes.csv"
data = pd.read_csv(file_path)
# Correlation Heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(data.corr(), annot=True)
plt.title("Correlation Matrix")
plt.show()
plt.show()
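# ii. Histograms
data.hist()
plt.show()
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Scatter plot of Age, BMI, and Glucose, colored by Outcome
sc = ax.scatter(data['Age'], data['BMI'], data['Glucose'], c=data['Outcome'])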
ax.set_xlabel("Age")
ax.set_ylabel("BMI")
ax.set_zlabel("Glucose")
ax.set_title("3D Plot of Age, BMI, and Glucose")
plt.colorbar(sc, label="Outcome")
plt.show()
output:
34. Apply and explore various plotting functions on Pima Indians Diabetes data set for performing
the following:
i. Normal Value
ii. Density and contour plots
iii. Three-dimensional plotting
Ans:
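import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
file_path = "/content/diabetes.csv"
data = pd.read_csv(file_path)
# i. Normal Value Distribution Plot
plt.figure(figsize=(14, 6))
plt.subplot(1, 2, 1)
sns.histplot(data['Glucose'], kde=True, stat="density")
plt.xlabel("Glucose")
plt.ylabel("Density")
plt.subplot(1, 2, 2)
sns.histplot(data['BMI'], kde=True, stat="density")
plt.xlabel("BMI")
plt.ylabel("Density")
plt.show()
# ii. Density and Contour Plot
plt.figure(figsize=(8, 6))
sns.kdeplot(x=data['Glucose'], y=data['Insulin'], fill=True)
plt.xlabel("Glucose")
plt.ylabel("Insulin")
plt.show()
# iii. Three-Dimensional Plotting
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Scatter plot of Age, BMI, and Glucose, colored by Outcome
sc = ax.scatter(data['Age'], data['BMI'], data['Glucose'], c=data['Outcome'])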
ax.set_xlabel("Age")
ax.set_ylabel("BMI")
ax.set_zlabel("Glucose")
plt.colorbar(sc, label="Outcome")
plt.show()
output:
35. Apply and explore various plotting functions on Pima Indians Diabetes data set for performing
the following:
36. Apply and explore various plotting functions on UCI data sets
37. Compare the results of the Univariate and Bivariate analysis for the UCI diabetes data set
38. Use the diabetes data set from UCI, perform Univariate analysis
39. Use the diabetes data set from Pima Indians Diabetes, Perform Bivariate analysis
40. Perform multiple regression analysis on your own dataset (for example, a car dataset with
company name, model, volume, weight, and CO2) with more than one independent variable,
to predict a value based on two or more variables.
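The manual gives no answer for this exercise. The sketch below fits CO2 from volume and weight on synthetic data generated from a known linear rule, so the recovered coefficients can be checked. It uses NumPy least squares rather than the scikit-learn `LinearRegression` seen in earlier answers, to stay dependency-light. All values and names here are assumptions, not a real car data set.

```python
import numpy as np

# Synthetic data: CO2 = 80 + 0.005 * volume + 0.01 * weight (no noise).
volume = np.array([1000.0, 1200.0, 1500.0, 1600.0, 2000.0])
weight = np.array([790.0, 1160.0, 1100.0, 1300.0, 1500.0])
co2 = 80 + 0.005 * volume + 0.01 * weight

# Design matrix with an intercept column followed by the two predictors.
X = np.column_stack([np.ones_like(volume), volume, weight])

# Ordinary least squares: minimise ||X @ coef - co2||^2.
coef, residuals, rank, singular_values = np.linalg.lstsq(X, co2, rcond=None)
intercept, b_volume, b_weight = coef
print("Intercept:", intercept)
print("Coefficients (volume, weight):", b_volume, b_weight)

# Predict CO2 for a hypothetical car with volume 1300 and weight 2300.
predicted = intercept + b_volume * 1300 + b_weight * 2300
print("Predicted CO2:", predicted)
```

With a real data set, the two arrays would come from the corresponding DataFrame columns, and the fit would no longer be exact, so a goodness-of-fit measure such as R-squared becomes worth reporting.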