Data Science Practical Problems

The document contains a series of exercises involving NumPy and Pandas programming tasks, such as creating null vectors, converting arrays to float types, and performing data analysis on the Pima Indians Diabetes dataset. Each exercise includes a program, expected output, and explanations for operations like reshaping arrays, selecting specific rows and columns, and performing statistical analyses. The final exercises focus on univariate and bivariate analyses using linear and logistic regression modeling.

Uploaded by

soundaravalli

Ex no: 1 a

Write a NumPy program to create a null vector of size 10 and update sixth value to 11

Program

import numpy as np

# Create a null vector of size 10

null_vector = np.zeros(10)

# Update the sixth value to 11 (indexing starts from 0)

null_vector[5] = 11

print("Original null vector:", null_vector)

Output:

Original null vector: [ 0. 0. 0. 0. 0. 11. 0. 0. 0. 0.]

Ex no : 1 b
Write a NumPy program to convert an array to a float type

Program :

import numpy as np

# Create an example array (you can replace this with your own array)

integer_array = np.array([1, 2, 3, 4, 5])

# Convert the array to float type

float_array = integer_array.astype(float)

print("Original array (integer):", integer_array)

print("Converted array (float):", float_array)

Output:

Original array (integer): [1 2 3 4 5]

Converted array (float): [1. 2. 3. 4. 5.]

Ex no : 1 c
Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10

Program :

import numpy as np

# Create a 1D array with values ranging from 2 to 10

values_array = np.arange(2, 11)

# Reshape the 1D array into a 3x3 matrix

matrix_3x3 = values_array.reshape(3, 3)

print("3x3 Matrix with values ranging from 2 to 10:")

print(matrix_3x3)

Output :

3x3 Matrix with values ranging from 2 to 10:

[[ 2 3 4]

[ 5 6 7]

[ 8 9 10]]

Ex no : 1 d
Write a NumPy program to convert a list of numeric values into a one-dimensional NumPy array
Program :

import numpy as np

# Create a list of numeric values

numeric_list = [1, 2, 3, 4, 5]

# Convert the list to a one-dimensional NumPy array

numpy_array = np.array(numeric_list)

print("List of numeric values:", numeric_list)

print("One-dimensional NumPy array:", numpy_array)

Output :

List of numeric values: [1, 2, 3, 4, 5]

One-dimensional NumPy array: [1 2 3 4 5]

Ex no : 2 a
Write a NumPy program to convert an array to a float type

Program :

import numpy as np

# Create an example array (you can replace this with your own array)

original_array = np.array([1, 2, 3, 4, 5])

# Convert the array to float type

float_array = original_array.astype(float)

print("Original array:", original_array)

print("Converted array (float):", float_array)

Output :

Original array: [1 2 3 4 5]

Converted array (float): [1. 2. 3. 4. 5.]

Ex no : 2 b
Write a NumPy program to create an empty and a full array

Program :

import numpy as np

# Create an empty array (np.empty leaves the values uninitialized, so the printed contents are arbitrary)

empty_array = np.empty((3, 3)) # Specify the shape of the empty array (3x3 in this case)

# Create a full array with a specified value

full_array = np.full((2, 4), 7) # Specify the shape and the value (2x4 array with value 7)

print("Empty Array:")

print(empty_array)

print("\nFull Array with Value 7:")

print(full_array)

Output :

Empty Array:

[[0. 0. 0.]

[0. 0. 0.]

[0. 0. 0.]]

Full Array with Value 7:

[[7 7 7 7]

[7 7 7 7]]
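
The zeros printed for the empty array above are not guaranteed, because np.empty only allocates memory. A minimal sketch, with hypothetical variable names, contrasting it with np.zeros and np.full:

```python
import numpy as np

# np.empty allocates memory without initializing it: the contents are arbitrary.
uninitialized = np.empty((3, 3))

# np.zeros guarantees that every element is 0.0.
zeros = np.zeros((3, 3))

# np.full takes its dtype from the fill value unless dtype is given explicitly.
sevens_int = np.full((2, 4), 7)
sevens_float = np.full((2, 4), 7, dtype=float)

print(uninitialized.shape, zeros.sum())
print(sevens_int.dtype.kind, sevens_float.dtype)
```

When the initial values matter, np.zeros or np.full is the safer choice; np.empty is useful only when every element will be overwritten before it is read.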

Ex no : 2 c

Write a NumPy program to convert a list and tuple into arrays

Program :

import numpy as np

# Convert a list to a NumPy array

list_values = [1, 2, 3, 4, 5]

array_from_list = np.array(list_values)

# Convert a tuple to a NumPy array

tuple_values = (6, 7, 8, 9, 10)

array_from_tuple = np.array(tuple_values)

print("List to Array:")

print(array_from_list)

print("\nTuple to Array:")

print(array_from_tuple)

Output :

List to Array:

[1 2 3 4 5]

Tuple to Array:

[ 6 7 8 9 10]
Ex no : 2 d
Write a NumPy program to find the real and imaginary parts of an array of complex numbers

Program :

import numpy as np

# Create an array of complex numbers

complex_array = np.array([1 + 2j, 3 - 4j, 5 + 6j])

# Find the real and imaginary parts

real_parts = np.real(complex_array)

imaginary_parts = np.imag(complex_array)

print("Array of Complex Numbers:")

print(complex_array)

print("\nReal Parts:")

print(real_parts)

print("\nImaginary Parts:")

print(imaginary_parts)

Output :

Array of Complex Numbers:

[1.+2.j 3.-4.j 5.+6.j]

Real Parts:

[1. 3. 5.]

Imaginary Parts:

[ 2. -4. 6.]
Ex no : 3
Write a Pandas program to get the powers of an array values element-wise.
Note: First array elements raised to powers from second array
Sample data: {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]}
Expected Output:
    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83

Program :

import pandas as pd

# Sample data

data = {'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]}

# Create a DataFrame from the sample data

df = pd.DataFrame(data)

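# Raise every value to the power given by its row position
# (row 0 to the power 1, row 1 to the power 2, and so on)

result_df = df.pow(df.index + 1, axis=0)

# Display the original data and the result

print(df)

print(result_df)

Output :

    X   Y   Z

0  78  84  86

1  85  94  97

2  96  89  96

3  80  83  72

4  86  86  83

            X           Y           Z

0          78          84          86

1        7225        8836        9409

2      884736      704969      884736

3    40960000    47458321    26873856

4  4704270176  4704270176  3939040643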
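
The note in the exercise speaks of raising the elements of a first array to powers taken from a second array. A minimal sketch of that reading, using small hypothetical values (not the sample data) so the result is easy to check by hand:

```python
import pandas as pd

# Hypothetical small values, chosen so the results can be verified mentally.
bases = pd.Series([2, 3, 4])
exponents = pd.Series([3, 2, 1])

# Series.pow raises each base to the exponent stored at the same index label.
powers = bases.pow(exponents)

print(powers.tolist())  # [8, 9, 4]
```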
Ex no : 4
Write a Pandas program to select the specified columns and rows from a given data frame.
Sample Python dictionary data and list labels:
Select 'score' and 'qualify' columns in rows 1, 3, 5, 6 (labels 'b', 'd', 'f', 'g') from the following data frame.
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Select specific columns and rows:
score qualify
b 9.0 no
d NaN no
f 20.0 yes
g 14.5 yes

Program :

import numpy as np

import pandas as pd

# Sample data

exam_data = {

'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
}

# Create a DataFrame from the sample data

df = pd.DataFrame(exam_data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])

# Select the 'score' and 'qualify' columns in rows 1, 3, 5, 6 (labels 'b', 'd', 'f', 'g')

selected_data = df.loc[['b', 'd', 'f', 'g'], ['score', 'qualify']]

# Display the result

print("Select specific columns and rows:")

print(selected_data)

Output :

Select specific columns and rows:

score qualify

b 9.0 no

d NaN no

f 20.0 yes

g 14.5 yes
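
The exercise names the rows by position (1, 3, 5, 6), while the program selects them by label. A short sketch, assuming the same sample data, showing that position-based selection with iloc returns the same result as label-based selection with loc:

```python
import numpy as np
import pandas as pd

exam_data = {
    'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
             'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
    'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
    'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

# Label-based selection: row labels and column names.
by_label = df.loc[['b', 'd', 'f', 'g'], ['score', 'qualify']]

# Position-based selection: rows 1, 3, 5, 6 and columns 1 ('score') and 3 ('qualify').
by_position = df.iloc[[1, 3, 5, 6], [1, 3]]

print(by_label.equals(by_position))
```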

Ex no : 5
Write a Pandas program to count the number of rows and columns of a DataFrame. Sample
Python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Number of Rows: 10
Number of Columns: 4

Program :

import numpy as np

import pandas as pd

# Sample data

exam_data = {

'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
}

# Create a DataFrame from the sample data

df = pd.DataFrame(exam_data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])

# Count the number of rows and columns

num_rows, num_columns = df.shape

# Display the result

print("Number of Rows:", num_rows)

print("Number of Columns:", num_columns)

Output :

Number of Rows: 10

Number of Columns: 4

Ex no : 6
Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set
(In Record )
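
No program is listed for this exercise. As a minimal sketch of the reading and descriptive-analytics steps, the block below assumes a few hand-typed Iris rows in place of a real file; the variable names and the in-memory text are illustrative only.

```python
import io

import pandas as pd

# A few Iris rows typed in by hand; a real run would read a file or URL instead.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# pd.read_csv accepts a file path, a URL, or any file-like object.
iris_sample = pd.read_csv(io.StringIO(csv_text))

print(iris_sample.shape)                      # rows and columns
print(iris_sample.dtypes)                     # column types
print(iris_sample.describe())                 # count, mean, std, quartiles
print(iris_sample["species"].value_counts())  # class frequencies
print(iris_sample.groupby("species")["petal_length"].mean())
```

For an Excel workbook, pd.read_excel plays the same role as pd.read_csv, but it needs an installed engine such as openpyxl.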

Ex no : 7

Use the diabetes data set from Pima Indians Diabetes data set for performing the following:

Apply Univariate analysis:

• Frequency
• Mean
• Median
• Mode
• Variance
• Standard Deviation
• Skewness and Kurtosis
Program :

import pandas as pd

import numpy as np

from scipy.stats import skew, kurtosis

# Load the Pima Indians Diabetes dataset

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"

column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI",

"DiabetesPedigreeFunction", "Age", "Outcome"]

diabetes_data = pd.read_csv(url, names=column_names)

# Display the first few rows of the dataset

print("Dataset Head:")

print(diabetes_data.head())

# Univariate Analysis

for column in diabetes_data.columns:

    print("\nColumn:", column)

    print("Frequency:\n", diabetes_data[column].value_counts())

    print("Mean:", diabetes_data[column].mean())

    print("Median:", diabetes_data[column].median())

    print("Mode:", diabetes_data[column].mode().values)

    print("Variance:", diabetes_data[column].var())

    print("Standard Deviation:", diabetes_data[column].std())

    print("Skewness:", skew(diabetes_data[column]))

    print("Kurtosis:", kurtosis(diabetes_data[column]))

Output:

Dataset Head:

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
0 6 148 72 35 0 33.6 0.627 50 1

1 1 85 66 29 0 26.6 0.351 31 0

2 8 183 64 0 0 23.3 0.672 32 1

3 1 89 66 23 94 28.1 0.167 21 0

4 0 137 40 35 168 43.1 2.288 33 1

Column: Pregnancies

Frequency:

1 135

0 111

2 103

3 75

4 68

5 57

6 50

7 45

8 38

9 28

10 24

11 11

13 10

12 9

14 2

15 1

17 1

Name: Pregnancies, dtype: int64

Mean: 3.8450520833333335

Median: 3.0

Mode: [1]

Variance: 11.35405632062147

Standard Deviation: 3.3695780626988623

Skewness: 0.9016739791518586

Kurtosis: 0.1592197711542494

...

Column: Outcome

Frequency:

0 500

1 268

Name: Outcome, dtype: int64

Mean: 0.3489583333333333

Median: 0.0

Mode: [0]

Variance: 0.22850161570824634

Standard Deviation: 0.4780286376712976

Skewness: 0.6350166433325007

Kurtosis: -1.601715582922407
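
The program mixes pandas statistics (sample variance, with n - 1 in the denominator) and SciPy statistics (moment-based skewness and excess kurtosis, uncorrected by default). A minimal sketch of the underlying formulas, using only the five Glucose values from the dataset head shown above:

```python
import numpy as np
import pandas as pd

# Glucose values from the first five rows of the dataset head.
glucose = pd.Series([148, 85, 183, 89, 137])

n = len(glucose)
deviations = glucose - glucose.mean()

# Central moments (population form, dividing by n).
m2 = (deviations ** 2).mean()
m3 = (deviations ** 3).mean()
m4 = (deviations ** 4).mean()

# Moment-based skewness and excess kurtosis, the default in scipy.stats.
biased_skew = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

# pandas applies a small-sample correction to skewness.
adjusted_skew = biased_skew * np.sqrt(n * (n - 1)) / (n - 2)

print("Mean:", glucose.mean(), "Median:", glucose.median())
print("Sample variance:", glucose.var(), "Population variance:", glucose.var(ddof=0))
print("Skewness (moment-based):", biased_skew)
print("Skewness (pandas):", glucose.skew())
print("Excess kurtosis (moment-based):", excess_kurtosis)
```

Because of the correction, Series.skew() and scipy.stats.skew() give slightly different numbers on the same column; the gap shrinks as the sample grows.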

Ex no : 8

Use the diabetes data set from Pima Indians Diabetes data set for performing the
following:

Apply Bivariate analysis:

• Linear and logistic regression modeling

Program :

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression, LogisticRegression

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Pima Indians Diabetes dataset

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"

column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI",

"DiabetesPedigreeFunction", "Age", "Outcome"]

diabetes_data = pd.read_csv(url, names=column_names)

# Separate features (X) and target variable (y)

X = diabetes_data.drop("Outcome", axis=1)

y = diabetes_data["Outcome"]

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Regression

linear_model = LinearRegression()

linear_model.fit(X_train, y_train)

# Print Linear Regression results

print("\nLinear Regression Coefficients:")

for feature, coef in zip(X.columns, linear_model.coef_):

    print(f"{feature}: {coef}")

print("Intercept:", linear_model.intercept_)

linear_predictions = linear_model.predict(X_test)

print("\nLinear Regression Predictions (first 10):", linear_predictions[:10])

# Logistic Regression

logistic_model = LogisticRegression()

logistic_model.fit(X_train, y_train)

# Print Logistic Regression results

logistic_predictions = logistic_model.predict(X_test)

accuracy = accuracy_score(y_test, logistic_predictions)

conf_matrix = confusion_matrix(y_test, logistic_predictions)

classification_rep = classification_report(y_test, logistic_predictions)

print("\nLogistic Regression Accuracy:", accuracy)

print("\nConfusion Matrix:")

print(conf_matrix)

print("\nClassification Report:")

print(classification_rep)

Output:

Linear Regression Coefficients:

Pregnancies: 0.0208

Glucose: 0.0056

BloodPressure: -0.0032

SkinThickness: 0.0001

Insulin: -0.0002

BMI: 0.0124

DiabetesPedigreeFunction: 0.1472

Age: 0.0051

Intercept: -0.8254

Linear Regression Predictions (first 10):

[ 0.3216 0.2154 0.7811 0.1891 0.4727 0.2375 0.6484 0.4686 0.6511 0.5670]

Logistic Regression Accuracy: 0.7597

Confusion Matrix:

[[89 14]

[24 27]]

Classification Report:

              precision    recall  f1-score   support

           0       0.79      0.86      0.82       103

           1       0.66      0.53      0.59        51

    accuracy                           0.76       154

   macro avg       0.73      0.70      0.71       154

weighted avg       0.75      0.76      0.75       154
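
The per-class figures in a classification report can be derived from the confusion matrix alone. A minimal sketch, assuming the matrix printed above and the scikit-learn convention that rows are actual classes and columns are predicted classes:

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted.
conf_matrix = np.array([[89, 14],
                        [24, 27]])

actual_per_class = conf_matrix.sum(axis=1)     # support of each class
predicted_per_class = conf_matrix.sum(axis=0)  # how often each class was predicted
correct = np.diag(conf_matrix)                 # correctly classified samples

precision = correct / predicted_per_class
recall = correct / actual_per_class
f1 = 2 * precision * recall / (precision + recall)

# Accuracy is the diagonal sum over the total number of samples.
accuracy = correct.sum() / conf_matrix.sum()

for label in range(2):
    print(label, round(precision[label], 2), round(recall[label], 2), round(f1[label], 2))
print("accuracy:", accuracy)
```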

Ex no : 9

Use the diabetes data set from Pima Indians Diabetes data set for performing the
following:

Apply Bivariate analysis:

• Multiple Regression analysis

Program :

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset (replace 'diabetes.csv' with the actual file name)

data = pd.read_csv('C:/Users/Student/Downloads/diabetes.csv')

# Select relevant features (e.g., Glucose, BMI, BloodPressure, Insulin, Age)

X = data[['Glucose', 'BMI', 'BloodPressure', 'Insulin', 'Age']]

y = data['Outcome'] # Outcome: 1 for diabetes, 0 for non-diabetes

# Split data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the linear regression model

model = LinearRegression()

model.fit(X_train, y_train)

# Make predictions on the testing data

y_pred = model.predict(X_test)

# Evaluate the model

mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")

print(f"R-squared: {r2:.2f}")

Output :

Mean Squared Error: 0.18

R-squared: 0.20
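
Both evaluation metrics follow directly from the residuals. A minimal sketch of the formulas behind mean_squared_error and r2_score, using hypothetical labels and predictions rather than values from the dataset:

```python
import numpy as np

# Hypothetical true labels and model predictions, for illustration only.
y_true = np.array([1, 0, 0, 1, 0])
y_pred = np.array([0.8, 0.3, 0.1, 0.6, 0.4])

residuals = y_true - y_pred

# Mean squared error: average of the squared residuals.
mse = np.mean(residuals ** 2)

# R-squared: 1 minus residual sum of squares over total sum of squares.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"Mean Squared Error: {mse:.3f}")
print(f"R-squared: {r2:.3f}")
```

An R-squared near 0.20, as in the output above, means the linear model explains roughly a fifth of the variance of the 0/1 outcome.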

Ex no : 10

Apply and explore various plotting functions on UCI data set for performing the following:

a) Normal values
b) Density and contour plots
c) Three-dimensional plotting

Program :

import seaborn as sns

import matplotlib.pyplot as plt

import numpy as np

# Load a sample dataset (e.g., Iris dataset)

iris = sns.load_dataset("iris")

# a) Normal values plot

# Set the style

sns.set(style="whitegrid")

# Create subplots for each variable

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))

# Plot kernel density estimate for each variable

sns.kdeplot(data=iris, x="sepal_length", fill=True, ax=axes[0, 0], color="skyblue")

axes[0, 0].set_title("Kernel Density Plot - Sepal Length")

sns.kdeplot(data=iris, x="sepal_width", fill=True, ax=axes[0, 1], color="salmon")

axes[0, 1].set_title("Kernel Density Plot - Sepal Width")

sns.kdeplot(data=iris, x="petal_length", fill=True, ax=axes[1, 0], color="green")

axes[1, 0].set_title("Kernel Density Plot - Petal Length")

sns.kdeplot(data=iris, x="petal_width", fill=True, ax=axes[1, 1], color="orange")

axes[1, 1].set_title("Kernel Density Plot - Petal Width")

plt.suptitle("Normal Values Plot for Iris Dataset")

plt.tight_layout()

plt.show()

# b) Density and Contour Plots

plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)

sns.kdeplot(data=iris, x="sepal_length", y="sepal_width", fill=True, cmap="viridis", thresh=0.15)

plt.subplot(1, 2, 2)

sns.kdeplot(data=iris, x="petal_length", y="petal_width", fill=True, cmap="viridis", thresh=0.15)

plt.suptitle("Density and Contour Plots")

plt.show()

# c) Three-dimensional plotting

from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(10, 8))

ax = fig.add_subplot(111, projection='3d')

colors = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}

ax.scatter(iris['sepal_length'], iris['petal_length'], iris['petal_width'], c=iris['species'].map(colors))

ax.set_xlabel('Sepal Length')

ax.set_ylabel('Petal Length')

ax.set_zlabel('Petal Width')

ax.set_title('Three-dimensional Plot')

plt.show()

Output :
Ex no : 11

Apply and explore various plotting functions on UCI data set for performing the following:

a) Correlation and scatter plots

b) Histograms
c) Three-dimensional plotting

Program :

import seaborn as sns

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

# Load a sample dataset (e.g., Iris dataset)

iris = sns.load_dataset("iris")

# a) Correlation and Scatter Plots

sns.set(style="ticks")

sns.pairplot(iris, hue="species", markers=["o", "s", "D"], palette="Set2")

plt.suptitle("Correlation and Scatter Plots")

plt.show()

# b) Histograms

plt.figure(figsize=(12, 6))

plt.subplot(1, 3, 1)

sns.histplot(iris['sepal_length'], kde=True, color="skyblue")

plt.title("Sepal Length Histogram")

plt.subplot(1, 3, 2)

sns.histplot(iris['sepal_width'], kde=True, color="salmon")

plt.title("Sepal Width Histogram")

plt.subplot(1, 3, 3)

sns.histplot(iris['petal_length'], kde=True, color="green")

plt.title("Petal Length Histogram")

plt.suptitle("Histograms")

plt.show()

# c) Three-dimensional plotting

fig = plt.figure(figsize=(10, 8))

ax = fig.add_subplot(111, projection='3d')

colors = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}

ax.scatter(iris['sepal_length'], iris['petal_length'], iris['petal_width'], c=iris['species'].map(colors))

ax.set_xlabel('Sepal Length')

ax.set_ylabel('Petal Length')

ax.set_zlabel('Petal Width')
ax.set_title('Three-dimensional Plot')

plt.show()

Output :
Ex no : 12

Apply and explore various plotting functions on Pima Indians Diabetes data set for
performing the following:

a) Normal values
b) Density and contour plots
c) Three-dimensional plotting
Program :

import seaborn as sns

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

import pandas as pd

# Load the Pima Indians Diabetes dataset (replace 'path/to/diabetes.csv' with the actual path)

diabetes_path = "C:/Users/Student/Downloads/diabetes.csv" # Replace with the actual path

diabetes_df = pd.read_csv(diabetes_path)

# a) Normal Values Plot (univariate kernel density estimates)

plt.figure(figsize=(12, 6))

sns.set(style="whitegrid")

plt.subplot(1, 2, 1)

sns.kdeplot(data=diabetes_df, x="Glucose", fill=True, color="skyblue")

plt.title("Kernel Density Plot - Glucose")

plt.subplot(1, 2, 2)

sns.kdeplot(data=diabetes_df, x="BMI", fill=True, color="salmon")

plt.title("Kernel Density Plot - BMI")

plt.suptitle("Normal Values Plot for Pima Indians Diabetes Dataset")

plt.show()

# b) Density and Contour Plots

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)

sns.kdeplot(data=diabetes_df, x="Glucose", y="BMI", fill=True, cmap="viridis", thresh=0.15)

plt.title("Density Plot for Glucose and BMI")

plt.subplot(1, 2, 2)

sns.kdeplot(data=diabetes_df, x="Insulin", y="BloodPressure", fill=True, cmap="viridis", thresh=0.15)

plt.title("Density Plot for Insulin and BloodPressure")

plt.suptitle("Density and Contour Plots")

plt.show()

# c) Three-dimensional Plotting

fig = plt.figure(figsize=(10, 8))

ax = fig.add_subplot(111, projection='3d')

colors = {0: 'red', 1: 'green'} # Assuming Outcome 0 as red and Outcome 1 as green

ax.scatter(diabetes_df['Glucose'], diabetes_df['BMI'], diabetes_df['Age'],

c=diabetes_df['Outcome'].map(colors))

ax.set_xlabel('Glucose')

ax.set_ylabel('BMI')

ax.set_zlabel('Age')

ax.set_title('Three-dimensional Plot')

plt.show()

Output :
Ex no : 13

Apply and explore various plotting functions on Pima Indians Diabetes data set for
performing the following:

a) Correlation and scatter plots

b) Histograms
c) Three-dimensional plotting

Program :

import seaborn as sns

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

import pandas as pd

# Load the Pima Indians Diabetes dataset (replace 'path/to/diabetes.csv' with the actual path)
diabetes_path = "C:/Users/Student/Downloads/diabetes.csv" # Replace with the actual path

diabetes_df = pd.read_csv(diabetes_path)

# a) Correlation and Scatter Plots

plt.figure(figsize=(12, 8))

correlation_matrix = diabetes_df.corr()

# Plotting the correlation matrix heatmap

sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)

plt.title("Correlation Matrix Heatmap")

plt.show()

# Scatter plots for selected variables

sns.pairplot(diabetes_df, vars=['Glucose', 'BMI', 'Age', 'Insulin'], hue='Outcome', markers=["o", "s"],

palette="Set1")

plt.suptitle("Scatter Plots")

plt.show()

# b) Histograms

plt.figure(figsize=(12, 6))

plt.subplot(2, 2, 1)

sns.histplot(diabetes_df['Glucose'], kde=True, color="skyblue")

plt.title("Glucose Histogram")

plt.subplot(2, 2, 2)

sns.histplot(diabetes_df['BMI'], kde=True, color="salmon")

plt.title("BMI Histogram")

plt.subplot(2, 2, 3)

sns.histplot(diabetes_df['Age'], kde=True, color="green")

plt.title("Age Histogram")

plt.subplot(2, 2, 4)

sns.histplot(diabetes_df['Insulin'], kde=True, color="orange")

plt.title("Insulin Histogram")

plt.suptitle("Histograms")

plt.tight_layout()

plt.show()

# c) Three-dimensional Plotting

fig = plt.figure(figsize=(10, 8))

ax = fig.add_subplot(111, projection='3d')

colors = {0: 'red', 1: 'green'} # Assuming Outcome 0 as red and Outcome 1 as green

ax.scatter(diabetes_df['Glucose'], diabetes_df['BMI'], diabetes_df['Age'],

c=diabetes_df['Outcome'].map(colors))

ax.set_xlabel('Glucose')

ax.set_ylabel('BMI')

ax.set_zlabel('Age')

ax.set_title('Three-dimensional Plot')

plt.show()

Output :
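
Each cell of the correlation heatmap in part a) is a Pearson correlation coefficient. A minimal sketch of that calculation, using only the Glucose and BMI values from the first five rows of the dataset head shown in Ex no 7; the helper name pearson is illustrative:

```python
import numpy as np

# Glucose and BMI from the first five rows of the dataset head.
glucose = np.array([148, 85, 183, 89, 137], dtype=float)
bmi = np.array([33.6, 26.6, 23.3, 28.1, 43.1])


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


r = pearson(glucose, bmi)

print("Pearson r (by hand):", r)
print("Pearson r (np.corrcoef):", np.corrcoef(glucose, bmi)[0, 1])
```

DataFrame.corr() computes this same coefficient by default for every pair of numeric columns.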
Ex no : 14
Write a Pandas program to count number of columns of a DataFrame.
Sample Output:
Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11
Number of columns:
3
Program :

import pandas as pd

# Create the original DataFrame

data = {

'col1': [1, 2, 3, 4, 7],

'col2': [4, 5, 6, 9, 5],

'col3': [7, 8, 12, 1, 11]
}

df = pd.DataFrame(data)

# Display the original DataFrame

print("Original DataFrame:")

print(df)

# Count the number of columns

num_columns = df.shape[1]

print("Number of columns:")

print(num_columns)

Output :

Original DataFrame:

col1 col2 col3

0 1 4 7

1 2 5 8

2 3 6 12

3 4 9 1
4 7 5 11

Number of columns:

3

Ex no : 15

Write a Pandas program to group by the first column and get second column as lists in rows

Sample data:
Original DataFrame
col1 col2
0 C1 1
1 C1 2
2 C2 3
3 C2 3
4 C2 4
5 C3 6
6 C2 5
Group on the col1:
col1
C1 [1, 2]
C2 [3, 3, 4, 5]
C3 [6]
Name: col2, dtype: object

Program :

import pandas as pd

# Create the original DataFrame

data = {

'col1': ['C1', 'C1', 'C2', 'C2', 'C2', 'C3', 'C2'],

'col2': [1, 2, 3, 3, 4, 6, 5]
}

df = pd.DataFrame(data)

# Group by the first column and aggregate the values of the second column as lists

result = df.groupby('col1')['col2'].apply(list)
print("Group on the col1:")

print(result)

Output :

Group on the col1:

col1

C1 [1, 2]

C2 [3, 3, 4, 5]

C3 [6]

Name: col2, dtype: object
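
Collecting values into lists is one of several aggregations a groupby can perform. A short sketch on the same sample data, showing agg(list) next to ordinary summary aggregations:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['C1', 'C1', 'C2', 'C2', 'C2', 'C3', 'C2'],
    'col2': [1, 2, 3, 3, 4, 6, 5],
})

# agg(list) gathers each group's values into a Python list.
grouped = df.groupby('col1')['col2'].agg(list)

# Several named aggregations can be computed in one call.
summary = df.groupby('col1')['col2'].agg(['count', 'sum'])

print(grouped.to_dict())
print(summary)
```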

Ex no : 16

Write a Pandas program to check whether a given column is present in a DataFrame or not.
Sample data:
Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11
Col4 is not present in DataFrame.
Col1 is present in DataFrame.

Program :

import pandas as pd

# Create the original DataFrame

data = {

'col1': [1, 2, 3, 4, 7],

'col2': [4, 5, 6, 9, 5],

'col3': [7, 8, 12, 1, 11]
}

df = pd.DataFrame(data)
# List of columns to check

columns_to_check = ['Col4', 'col1']

# Check each name against the DataFrame's column labels

for col in columns_to_check:

    if col in df.columns:

        print(f"{col} is present in DataFrame.")

    else:

        print(f"{col} is not present in DataFrame.")

Output :

Col4 is not present in DataFrame.

col1 is present in DataFrame.

Ex no : 17
Create two arrays of six elements. Write a NumPy program to count the number of instances
of a value occurring in one array on the condition of another array.
Sample Output:
Original arrays:
[ 10 -10 10 -10 -10 10]
[0.85 0.45 0.9 0.8 0.12 0.6 ]
Number of instances of a value occurring in one array on the condition of another array:
3
Program :

import numpy as np

# Create two arrays

array1 = np.array([10, -10, 10, -10, -10, 10])

array2 = np.array([0.85, 0.45, 0.9, 0.8, 0.12, 0.6])

print("Original arrays:")
print(array1)

print(array2)

# Define the condition

condition = array2 > 0.5 # Condition: values in array2 greater than 0.5

# Count the positions where array1 equals 10 and the condition on array2 holds

num_instances = np.sum((array1 == 10) & condition)

print("Number of instances of a value occurring in one array on the condition of another array:")

print(num_instances)

Output :

Original arrays:

[ 10 -10 10 -10 -10 10]

[0.85 0.45 0.9 0.8 0.12 0.6 ]

Number of instances of a value occurring in one array on the condition of another array:

3

Ex no : 18
Create a 2-dimensional array of size 2 x 3, composed of 4-byte integer elements. Write a
NumPy program to find the number of occurrences of a sequence in the said array.
Sample Output:
Original NumPy array:
[[1 2 3]
[2 1 2]]
Type: <class 'numpy.ndarray'>
Sequence: 2,3
Number of occurrences of the said sequence: 2
Program :

import numpy as np

# Create the 2D array

array = np.array([[1, 2, 3],

[2, 1, 2]], dtype=np.int32)

# Define the sequence to find

sequence = np.array([2, 3], dtype=np.int32)

# Count occurrences of the sequence

count = 0

for row in array:

    for i in range(len(row) - len(sequence) + 1):

        if np.array_equal(row[i:i+len(sequence)], sequence):

            count += 1

# Print the original array and its type

print("Original NumPy array:")

print(array)

print("Type:", type(array))

# Print the sequence and its number of occurrences

print("Sequence:", ", ".join(map(str, sequence)))

print("Number of occurrences of the said sequence:", count)

Output :

Original NumPy array:

[[1 2 3]

[2 1 2]]

Type: <class 'numpy.ndarray'>

Sequence: 2, 3

Number of occurrences of the said sequence: 1
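
The nested loops can also be written without explicit iteration. A minimal sketch, assuming NumPy 1.20 or later for sliding_window_view, that counts row-wise matches of the sequence in the same array:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

array = np.array([[1, 2, 3],
                  [2, 1, 2]], dtype=np.int32)
sequence = np.array([2, 3], dtype=np.int32)

# Every window of len(sequence) consecutive elements along each row.
# For a 2 x 3 array and a window of 2, the result has shape (2, 2, 2).
windows = sliding_window_view(array, len(sequence), axis=1)

# A window matches when all of its elements equal the sequence.
matches = (windows == sequence).all(axis=-1)
count = int(matches.sum())

print("Number of occurrences of the said sequence:", count)
```

Like the loop version, this only finds the sequence inside a single row; it does not match across the end of one row and the start of the next.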

Ex no : 19
Write a NumPy program to merge three given NumPy arrays of same shape
Program :
import numpy as np

# Three NumPy arrays of the same shape

array1 = np.array([[1, 2, 3], [4, 5, 6]])

array2 = np.array([[7, 8, 9], [10, 11, 12]])

array3 = np.array([[13, 14, 15], [16, 17, 18]])

# Merge the arrays

merged_array = np.stack((array1, array2, array3))

print("Merged array:")

print(merged_array)

Output :

Merged array:

[[[ 1 2 3]

[ 4 5 6]]

[[ 7 8 9]

[10 11 12]]

[[13 14 15]

[16 17 18]]]
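
"Merging" can mean different things depending on the shape wanted. A short sketch on the same three arrays, comparing np.stack (which adds a new axis) with np.concatenate (which joins along an existing axis):

```python
import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])
array3 = np.array([[13, 14, 15], [16, 17, 18]])

# np.stack adds a new leading axis: three 2 x 3 arrays become one 3 x 2 x 3 array.
stacked = np.stack((array1, array2, array3))

# np.concatenate joins along an existing axis (rows by default).
row_wise = np.concatenate((array1, array2, array3))
column_wise = np.concatenate((array1, array2, array3), axis=1)

print(stacked.shape, row_wise.shape, column_wise.shape)
```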

Ex no : 20

Write a NumPy program to combine last element with first element of two given ndarray with
different shapes.

Sample Output:
Original arrays:
['PHP', 'JS', 'C++']
['Python', 'C#', 'NumPy']
After Combining:
['PHP' 'JS' 'C++Python' 'C#' 'NumPy']
Program :

import numpy as np

# Original arrays

array1 = np.array(['PHP', 'JS', 'C++'])

array2 = np.array(['Python', 'C#', 'NumPy'])

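# Join the last element of array1 with the first element of array2,
# then combine the result with the remaining elements

joined = array1[-1] + array2[0]

combined_array = np.concatenate((array1[:-1], [joined], array2[1:]))

print("Original arrays:")

print(array1)

print(array2)

print("After Combining:")

print(combined_array)

Output :

Original arrays:

['PHP' 'JS' 'C++']

['Python' 'C#' 'NumPy']

After Combining:

['PHP' 'JS' 'C++Python' 'C#' 'NumPy']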
