Ex no : 1 a
Write a NumPy program to create a null vector of size 10 and update the sixth value to 11
Program :
import numpy as np
# Create a null vector of size 10
null_vector = np.zeros(10)
# Update the sixth value to 11 (indexing starts from 0)
null_vector[5] = 11
print("Original null vector:", null_vector)
Output:
Original null vector: [ 0. 0. 0. 0. 0. 11. 0. 0. 0. 0.]
Ex no : 1 b
Write a NumPy program to convert an array to a float type
Program :
import numpy as np
# Create an example array (you can replace this with your own array)
integer_array = np.array([1, 2, 3, 4, 5])
# Convert the array to float type
float_array = integer_array.astype(float)
print("Original array (integer):", integer_array)
print("Converted array (float):", float_array)
Output:
Original array (integer): [1 2 3 4 5]
Converted array (float): [1. 2. 3. 4. 5.]
Ex no : 1 c
Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10
Program :
import numpy as np
# Create a 1D array with values ranging from 2 to 10
values_array = np.arange(2, 11)
# Reshape the 1D array into a 3x3 matrix
matrix_3x3 = values_array.reshape(3, 3)
print("3x3 Matrix with values ranging from 2 to 10:")
print(matrix_3x3)
Output :
3x3 Matrix with values ranging from 2 to 10:
[[ 2 3 4]
[ 5 6 7]
[ 8 9 10]]
Ex no : 1 d
Write a NumPy program to convert a list of numeric values into a one-dimensional NumPy
array
Program :
import numpy as np
# Create a list of numeric values
numeric_list = [1, 2, 3, 4, 5]
# Convert the list to a one-dimensional NumPy array
numpy_array = np.array(numeric_list)
print("List of numeric values:", numeric_list)
print("One-dimensional NumPy array:", numpy_array)
Output :
List of numeric values: [1, 2, 3, 4, 5]
One-dimensional NumPy array: [1 2 3 4 5]
Ex no : 2 a
Write a NumPy program to convert an array to a float type
Program :
import numpy as np
# Create an example array (you can replace this with your own array)
original_array = np.array([1, 2, 3, 4, 5])
# Convert the array to float type
float_array = original_array.astype(float)
print("Original array:", original_array)
print("Converted array (float):", float_array)
Output :
Original array: [1 2 3 4 5]
Converted array (float): [1. 2. 3. 4. 5.]
Ex no : 2 b
Write a NumPy program to create an empty and a full array
Program :
import numpy as np
# Create an empty array
empty_array = np.empty((3, 3)) # Specify the shape of the empty array (3x3 in this case)
# Create a full array with a specified value
full_array = np.full((2, 4), 7) # Specify the shape and the value (2x4 array with value 7)
print("Empty Array:")
print(empty_array)
print("\nFull Array with Value 7:")
print(full_array)
Output :
Empty Array:
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
Full Array with Value 7:
[[7 7 7 7]
[7 7 7 7]]
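A note on the empty array: np.empty only allocates memory and does not initialise it, so the zeros shown in the output above are incidental and can differ from run to run. The short sketch below contrasts it with np.zeros and np.full; the variable names are illustrative only.

```python
import numpy as np

# np.empty allocates memory without initialising it; only the shape is guaranteed.
uninitialised = np.empty((3, 3))

# np.zeros allocates and fills with 0.0, so the contents are guaranteed.
zeros = np.zeros((3, 3))

# np.full fills with the given value; the dtype follows the fill value
# unless one is passed explicitly.
sevens = np.full((2, 4), 7)
sevens_float = np.full((2, 4), 7, dtype=float)

print("Shape of empty array:", uninitialised.shape)
print("Sum of zeros array:", zeros.sum())
print("dtype of full array:", sevens.dtype)
print("dtype of float full array:", sevens_float.dtype)
```

When guaranteed zeros are needed, np.zeros is the safer choice; np.empty is only useful when every element will be overwritten afterwards.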
Ex no : 2 c
Write a NumPy program to convert a list and tuple into arrays
Program :
import numpy as np
# Convert a list to a NumPy array
list_values = [1, 2, 3, 4, 5]
array_from_list = np.array(list_values)
# Convert a tuple to a NumPy array
tuple_values = (6, 7, 8, 9, 10)
array_from_tuple = np.array(tuple_values)
print("List to Array:")
print(array_from_list)
print("\nTuple to Array:")
print(array_from_tuple)
Output :
List to Array:
[1 2 3 4 5]
Tuple to Array:
[ 6 7 8 9 10]
Ex no : 2 d
Write a NumPy program to find the real and imaginary parts of an array of complex numbers
Program :
import numpy as np
# Create an array of complex numbers
complex_array = np.array([1 + 2j, 3 - 4j, 5 + 6j])
# Find the real and imaginary parts
real_parts = np.real(complex_array)
imaginary_parts = np.imag(complex_array)
print("Array of Complex Numbers:")
print(complex_array)
print("\nReal Parts:")
print(real_parts)
print("\nImaginary Parts:")
print(imaginary_parts)
Output :
Array of Complex Numbers:
[1.+2.j 3.-4.j 5.+6.j]
Real Parts:
[1. 3. 5.]
Imaginary Parts:
[ 2. -4. 6.]
Ex no : 3
Write a Pandas program to get the powers of an array values element-wise.
Note: First array elements raised to powers from second array
Sample data: {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]}
Expected Output:
X Y Z
0 78 84 86
1 85 94 97
2 96 89 96
3 80 83 72
4 86 86 83
Program :
import pandas as pd
# Sample data
data = {'X': [78, 85, 96, 80, 86], 'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]}
# Create a DataFrame from the sample data
df = pd.DataFrame(data)
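# Raise every value to the power of its row position + 1
# (row 0 -> power 1, row 1 -> power 2, ..., row 4 -> power 5)
result_df = df.pow(df.index + 1, axis=0)
# Display the result
print(result_df)
Output :
            X           Y           Z
0          78          84          86
1        7225        8836        9409
2      884736      704969      884736
3    40960000    47458321    26873856
4  4704270176  4704270176  3939040643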
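The note in the task ("first array elements raised to powers from second array") can also be read as raising one array to the powers held in another. A minimal sketch of that reading follows; the two small Series are hypothetical and are not the sample data, whose values would overflow 64-bit integers if one column were raised to the power of another.

```python
import numpy as np
import pandas as pd

# Hypothetical small arrays chosen so the results stay within int64.
bases = pd.Series([2, 3, 4, 5])      # "first array"
exponents = pd.Series([3, 2, 1, 0])  # "second array"

# Element-wise power: bases[i] ** exponents[i]
powers = bases.pow(exponents)
print(powers.tolist())  # [8, 9, 4, 1]

# The same computation on plain NumPy arrays
powers_np = np.power(bases.to_numpy(), exponents.to_numpy())
print(powers_np)
```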
Ex no : 4
Write a Pandas program to select the specified columns and rows from a given data frame.
Sample Python dictionary data and list labels:
Select the 'score' and 'qualify' columns in rows 1, 3, 5, 6 (zero-based positions, i.e. labels 'b', 'd', 'f', 'g') from the following data frame.
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Select specific columns and rows:
score qualify
b 9.0 no
d NaN no
f 20.0 yes
g 14.5 yes
Program :
import numpy as np
import pandas as pd
# Sample data
exam_data = {
    'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
             'Jonas'],
    'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
    'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
}
# Create a DataFrame from the sample data
df = pd.DataFrame(exam_data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
# Select the 'score' and 'qualify' columns for the rows at positions 1, 3, 5, 6 (labels b, d, f, g)
selected_data = df.loc[['b', 'd', 'f', 'g'], ['score', 'qualify']]
# Display the result
print("Select specific columns and rows:")
print(selected_data)
Output :
Select specific columns and rows:
score qualify
b 9.0 no
d NaN no
f 20.0 yes
g 14.5 yes
Ex no : 5
Write a Pandas program to count the number of rows and columns of a DataFrame. Sample
Python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',
'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Number of Rows: 10
Number of Columns: 4
Program :
import numpy as np
import pandas as pd
# Sample data
exam_data = {
'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
}
# Create a DataFrame from the sample data
df = pd.DataFrame(exam_data, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
# Count the number of rows and columns
num_rows, num_columns = df.shape
# Display the result
print("Number of Rows:", num_rows)
print("Number of Columns:", num_columns)
Output :
Number of Rows: 10
Number of Columns: 4
Ex no : 6
Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set
(In Record )
Ex no : 7
Use the diabetes data set from Pima Indians Diabetes data set for performing the following:
Apply Univariate analysis:
Frequency
Mean,
Median,
Mode,
Variance
Standard Deviation
Skewness and Kurtosis
Program :
import pandas as pd
import numpy as np
from scipy.stats import skew, kurtosis
# Load the Pima Indians Diabetes dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI",
"DiabetesPedigreeFunction", "Age", "Outcome"]
diabetes_data = pd.read_csv(url, names=column_names)
# Display the first few rows of the dataset
print("Dataset Head:")
print(diabetes_data.head())
# Univariate Analysis
for column in diabetes_data.columns:
    print("\nColumn:", column)
    print("Frequency:\n", diabetes_data[column].value_counts())
    print("Mean:", diabetes_data[column].mean())
    print("Median:", diabetes_data[column].median())
    print("Mode:", diabetes_data[column].mode().values)
    print("Variance:", diabetes_data[column].var())
    print("Standard Deviation:", diabetes_data[column].std())
    print("Skewness:", skew(diabetes_data[column]))
    print("Kurtosis:", kurtosis(diabetes_data[column]))
Output:
Dataset Head:
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunction Age Outcome
0 6 148 72 35 0 33.6 0.627 50 1
1 1 85 66 29 0 26.6 0.351 31 0
2 8 183 64 0 0 23.3 0.672 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2.288 33 1
Column: Pregnancies
Frequency:
1 135
0 111
2 103
3 75
4 68
5 57
6 50
7 45
8 38
9 28
10 24
11 11
13 10
12 9
14 2
15 1
17 1
Name: Pregnancies, dtype: int64
Mean: 3.8450520833333335
Median: 3.0
Mode: [1]
Variance: 11.35405632062147
Standard Deviation: 3.3695780626988623
Skewness: 0.9016739791518586
Kurtosis: 0.1592197711542494
...
Column: Outcome
Frequency:
0 500
1 268
Name: Outcome, dtype: int64
Mean: 0.3489583333333333
Median: 0.0
Mode: [0]
Variance: 0.22850161570824634
Standard Deviation: 0.4780286376712976
Skewness: 0.6350166433325007
Kurtosis: -1.601715582922407
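The univariate program mixes two conventions: pandas var() and std() use the sample formulas (ddof=1), while scipy.stats.skew and scipy.stats.kurtosis default to the biased (population) estimators. The sketch below, on a small hypothetical series rather than the diabetes data, shows how the two libraries line up once bias=False is passed to SciPy.

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis, skew

# Hypothetical sample used only to compare the estimators.
values = pd.Series([0, 1, 1, 2, 3, 5, 8, 13], dtype=float)

# Skewness: SciPy default (biased) vs. bias-corrected vs. pandas
biased_skew = skew(values)
unbiased_skew = skew(values, bias=False)
pandas_skew = values.skew()

# Excess kurtosis: SciPy default (biased) vs. bias-corrected vs. pandas
biased_kurt = kurtosis(values)
unbiased_kurt = kurtosis(values, bias=False)
pandas_kurt = values.kurt()

# Variance: NumPy default (ddof=0) vs. pandas default (ddof=1)
population_var = np.var(values.to_numpy())
sample_var = values.var()

print("Skewness (SciPy biased, SciPy unbiased, pandas):", biased_skew, unbiased_skew, pandas_skew)
print("Kurtosis (SciPy biased, SciPy unbiased, pandas):", biased_kurt, unbiased_kurt, pandas_kurt)
print("Variance (ddof=0, ddof=1):", population_var, sample_var)
```

With 768 rows the difference between the two conventions is small, but it is worth stating in the record which one was used.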
Ex no : 8
Use the diabetes data set from Pima Indians Diabetes data set for performing the
following:
Apply Bivariate analysis:
Linear and logistic regression modeling
Program :
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Load the Pima Indians Diabetes dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
column_names = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI",
"DiabetesPedigreeFunction", "Age", "Outcome"]
diabetes_data = pd.read_csv(url, names=column_names)
# Separate features (X) and target variable (y)
X = diabetes_data.drop("Outcome", axis=1)
y = diabetes_data["Outcome"]
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Linear Regression
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
# Print Linear Regression results
print("\nLinear Regression Coefficients:")
for feature, coef in zip(X.columns, linear_model.coef_):
    print(f"{feature}: {coef}")
print("Intercept:", linear_model.intercept_)
linear_predictions = linear_model.predict(X_test)
print("\nLinear Regression Predictions (first 10):", linear_predictions[:10])
# Logistic Regression
logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)
# Print Logistic Regression results
logistic_predictions = logistic_model.predict(X_test)
accuracy = accuracy_score(y_test, logistic_predictions)
conf_matrix = confusion_matrix(y_test, logistic_predictions)
classification_rep = classification_report(y_test, logistic_predictions)
print("\nLogistic Regression Accuracy:", accuracy)
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(classification_rep)
Output:
Linear Regression Coefficients:
Pregnancies: 0.0208
Glucose: 0.0056
BloodPressure: -0.0032
SkinThickness: 0.0001
Insulin: -0.0002
BMI: 0.0124
DiabetesPedigreeFunction: 0.1472
Age: 0.0051
Intercept: -0.8254
Linear Regression Predictions (first 10):
[ 0.3216 0.2154 0.7811 0.1891 0.4727 0.2375 0.6484 0.4686 0.6511 0.5670]
Logistic Regression Accuracy: 0.7597
Confusion Matrix:
[[89 14]
[24 27]]
Classification Report:
precision recall f1-score support
0 0.79 0.86 0.82 103
1 0.66 0.53 0.59 51
accuracy 0.76 154
macro avg 0.73 0.70 0.71 154
weighted avg 0.75 0.76 0.75 154
Ex no : 9
Use the diabetes data set from Pima Indians Diabetes data set for performing the
following:
Apply Bivariate analysis:
Multiple Regression analysis
Program :
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the dataset (replace 'diabetes.csv' with the actual file name)
data = pd.read_csv('C:/Users/Student/Downloads/diabetes.csv')
# Select relevant features (e.g., Glucose, BMI, BloodPressure, Insulin, Age)
X = data[['Glucose', 'BMI', 'BloodPressure', 'Insulin', 'Age']]
y = data['Outcome'] # Outcome: 1 for diabetes, 0 for non-diabetes
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on the testing data
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
Output :
Mean Squared Error: 0.18
R-squared: 0.20
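For reference, the two metrics reported above can be written out by hand. The sketch below uses small hypothetical arrays (not the diabetes predictions) and plain NumPy, so the numbers it prints are unrelated to the output above.

```python
import numpy as np

# Hypothetical true labels and model predictions, for illustration only.
y_true = np.array([0, 1, 1, 0, 1], dtype=float)
y_hat = np.array([0.2, 0.7, 0.6, 0.4, 0.9])

# Mean squared error: average of the squared residuals
residuals = y_true - y_hat
mse_manual = np.mean(residuals ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"Mean Squared Error: {mse_manual:.3f}")
print(f"R-squared: {r2_manual:.3f}")
```

An R-squared near 0.20 therefore means the regression explains only about a fifth of the variance of the 0/1 outcome, which is expected when a linear model is fitted to a binary target.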
Ex no : 10
Apply and explore various plotting functions on UCI data set for performing the following:
a) Normal values
b) Density and contour plots
c) Three-dimensional plotting
Program :
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Load a sample dataset (e.g., Iris dataset)
iris = sns.load_dataset("iris")
# a) Normal values plot
# Set the style
sns.set(style="whitegrid")
# Create subplots for each variable
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
# Plot kernel density estimate for each variable
sns.kdeplot(data=iris, x="sepal_length", fill=True, ax=axes[0, 0], color="skyblue")
axes[0, 0].set_title("Kernel Density Plot - Sepal Length")
sns.kdeplot(data=iris, x="sepal_width", fill=True, ax=axes[0, 1], color="salmon")
axes[0, 1].set_title("Kernel Density Plot - Sepal Width")
sns.kdeplot(data=iris, x="petal_length", fill=True, ax=axes[1, 0], color="green")
axes[1, 0].set_title("Kernel Density Plot - Petal Length")
sns.kdeplot(data=iris, x="petal_width", fill=True, ax=axes[1, 1], color="orange")
axes[1, 1].set_title("Kernel Density Plot - Petal Width")
plt.suptitle("Normal Values Plot for Iris Dataset")
plt.tight_layout()
plt.show()
# b) Density and Contour Plots
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.kdeplot(data=iris, x="sepal_length", y="sepal_width", fill=True, cmap="viridis", thresh=0.15)
plt.subplot(1, 2, 2)
sns.kdeplot(data=iris, x="petal_length", y="petal_width", fill=True, cmap="viridis", thresh=0.15)
plt.suptitle("Density and Contour Plots")
plt.show()
# c) Three-dimensional plotting
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
colors = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}
ax.scatter(iris['sepal_length'], iris['petal_length'], iris['petal_width'], c=iris['species'].map(colors))
ax.set_xlabel('Sepal Length')
ax.set_ylabel('Petal Length')
ax.set_zlabel('Petal Width')
ax.set_title('Three-dimensional Plot')
plt.show()
Output :
Ex no : 11
Apply and explore various plotting functions on UCI data set for performing the following:
a) Correlation and scatter plots
b) Histograms
c) Three-dimensional plotting
Program :
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Load a sample dataset (e.g., Iris dataset)
iris = sns.load_dataset("iris")
# a) Correlation and Scatter Plots
sns.set(style="ticks")
sns.pairplot(iris, hue="species", markers=["o", "s", "D"], palette="Set2")
plt.suptitle("Correlation and Scatter Plots")
plt.show()
# b) Histograms
plt.figure(figsize=(12, 6))
plt.subplot(1, 3, 1)
sns.histplot(iris['sepal_length'], kde=True, color="skyblue")
plt.title("Sepal Length Histogram")
plt.subplot(1, 3, 2)
sns.histplot(iris['sepal_width'], kde=True, color="salmon")
plt.title("Sepal Width Histogram")
plt.subplot(1, 3, 3)
sns.histplot(iris['petal_length'], kde=True, color="green")
plt.title("Petal Length Histogram")
plt.suptitle("Histograms")
plt.show()
# c) Three-dimensional plotting
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
colors = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}
ax.scatter(iris['sepal_length'], iris['petal_length'], iris['petal_width'], c=iris['species'].map(colors))
ax.set_xlabel('Sepal Length')
ax.set_ylabel('Petal Length')
ax.set_zlabel('Petal Width')
ax.set_title('Three-dimensional Plot')
plt.show()
Output :
Ex no : 12
Apply and explore various plotting functions on Pima Indians Diabetes data set for
performing the following:
a) Normal values
b) Density and contour plots
c) Three-dimensional plotting
Program :
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
# Load the Pima Indians Diabetes dataset (replace 'path/to/diabetes.csv' with the actual path)
diabetes_path = "C:/Users/Student/Downloads/diabetes.csv" # Replace with the actual path
diabetes_df = pd.read_csv(diabetes_path)
# a) Normal Values Plot (univariate kernel density estimates)
sns.set(style="whitegrid")
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.kdeplot(data=diabetes_df, x="Glucose", fill=True, color="skyblue")
plt.title("Kernel Density Plot - Glucose")
plt.subplot(1, 2, 2)
sns.kdeplot(data=diabetes_df, x="BMI", fill=True, color="salmon")
plt.title("Kernel Density Plot - BMI")
plt.suptitle("Normal Values Plot for Pima Indians Diabetes Dataset")
plt.show()
# b) Density and Contour Plots
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.kdeplot(data=diabetes_df, x="Glucose", y="BMI", fill=True, cmap="viridis", thresh=0.15)
plt.title("Density Plot for Glucose and BMI")
plt.subplot(1, 2, 2)
sns.kdeplot(data=diabetes_df, x="Insulin", y="BloodPressure", fill=True, cmap="viridis", thresh=0.15)
plt.title("Density Plot for Insulin and BloodPressure")
plt.suptitle("Density and Contour Plots")
plt.show()
# c) Three-dimensional Plotting
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
colors = {0: 'red', 1: 'green'} # Assuming Outcome 0 as red and Outcome 1 as green
ax.scatter(diabetes_df['Glucose'], diabetes_df['BMI'], diabetes_df['Age'],
c=diabetes_df['Outcome'].map(colors))
ax.set_xlabel('Glucose')
ax.set_ylabel('BMI')
ax.set_zlabel('Age')
ax.set_title('Three-dimensional Plot')
plt.show()
Output :
Ex no : 13
Apply and explore various plotting functions on Pima Indians Diabetes data set for
performing the following:
a) Correlation and scatter plots
b) Histograms
c) Three-dimensional plotting
Program :
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
# Load the Pima Indians Diabetes dataset (replace 'path/to/diabetes.csv' with the actual path)
diabetes_path = "C:/Users/Student/Downloads/diabetes.csv" # Replace with the actual path
diabetes_df = pd.read_csv(diabetes_path)
# a) Correlation and Scatter Plots
plt.figure(figsize=(12, 8))
correlation_matrix = diabetes_df.corr()
# Plotting the correlation matrix heatmap
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Matrix Heatmap")
plt.show()
# Scatter plots for selected variables
sns.pairplot(diabetes_df, vars=['Glucose', 'BMI', 'Age', 'Insulin'], hue='Outcome', markers=["o", "s"],
palette="Set1")
plt.suptitle("Scatter Plots")
plt.show()
# b) Histograms
plt.figure(figsize=(12, 6))
plt.subplot(2, 2, 1)
sns.histplot(diabetes_df['Glucose'], kde=True, color="skyblue")
plt.title("Glucose Histogram")
plt.subplot(2, 2, 2)
sns.histplot(diabetes_df['BMI'], kde=True, color="salmon")
plt.title("BMI Histogram")
plt.subplot(2, 2, 3)
sns.histplot(diabetes_df['Age'], kde=True, color="green")
plt.title("Age Histogram")
plt.subplot(2, 2, 4)
sns.histplot(diabetes_df['Insulin'], kde=True, color="orange")
plt.title("Insulin Histogram")
plt.suptitle("Histograms")
plt.tight_layout()
plt.show()
# c) Three-dimensional Plotting
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
colors = {0: 'red', 1: 'green'} # Assuming Outcome 0 as red and Outcome 1 as green
ax.scatter(diabetes_df['Glucose'], diabetes_df['BMI'], diabetes_df['Age'],
c=diabetes_df['Outcome'].map(colors))
ax.set_xlabel('Glucose')
ax.set_ylabel('BMI')
ax.set_zlabel('Age')
ax.set_title('Three-dimensional Plot')
plt.show()
Output :
Ex no : 14
Write a Pandas program to count the number of columns of a DataFrame.
Sample Output:
Original DataFrame
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11
Number of columns:
3
Program :
import pandas as pd
# Create the original DataFrame
data = {
    'col1': [1, 2, 3, 4, 7],
    'col2': [4, 5, 6, 9, 5],
    'col3': [7, 8, 12, 1, 11]
}
df = pd.DataFrame(data)
# Display the original DataFrame
print("Original DataFrame:")
print(df)
# Count the number of columns
num_columns = df.shape[1]
print("Number of columns:")
print(num_columns)
Output :
Original DataFrame:
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11
Number of columns:
3
Ex no : 15
Write a Pandas program to group by the first column and get second column as lists in rows
Sample data:
Original DataFrame
col1 col2
0 C1 1
1 C1 2
2 C2 3
3 C2 3
4 C2 4
5 C3 6
6 C2 5
Group on the col1:
col1
C1 [1, 2]
C2 [3, 3, 4, 5]
C3 [6]
Name: col2, dtype: object
Program :
import pandas as pd
# Create the original DataFrame
data = {
    'col1': ['C1', 'C1', 'C2', 'C2', 'C2', 'C3', 'C2'],
    'col2': [1, 2, 3, 3, 4, 6, 5]
}
df = pd.DataFrame(data)
# Group by the first column and aggregate the values of the second column as lists
result = df.groupby('col1')['col2'].apply(list)
print("Group on the col1:")
print(result)
Output :
Group on the col1:
col1
C1 [1, 2]
C2 [3, 3, 4, 5]
C3 [6]
Name: col2, dtype: object
Ex no : 16
Write a Pandas program to check whether a given column is present in a DataFrame or not.
Sample data:
Original DataFrame
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11
Col4 is not present in DataFrame.
Col1 is present in DataFrame.
Program :
import pandas as pd
# Create the original DataFrame
data = {
    'col1': [1, 2, 3, 4, 7],
    'col2': [4, 5, 6, 9, 5],
    'col3': [7, 8, 12, 1, 11]
}
df = pd.DataFrame(data)
# List of columns to check
columns_to_check = ['Col4', 'col1']
# Iterate over the list of columns and check if each column is present in the DataFrame
for col in columns_to_check:
    try:
        # Try to access the column; a missing column raises KeyError
        df[col]
        print(f"{col} is present in DataFrame.")
    except KeyError:
        print(f"{col} is not present in DataFrame.")
Output :
Col4 is not present in DataFrame.
col1 is present in DataFrame.
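The same check can be written without exception handling by testing membership in df.columns. This is a minimal alternative sketch; the helper name has_column is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 7],
    'col2': [4, 5, 6, 9, 5],
    'col3': [7, 8, 12, 1, 11],
})


def has_column(frame, name):
    """Return True if `name` is one of the column labels of `frame`."""
    return name in frame.columns


for col in ['Col4', 'col1']:
    status = "is present" if has_column(df, col) else "is not present"
    print(f"{col} {status} in DataFrame.")
```

Column labels are case-sensitive, which is why 'Col4' and 'col1' are treated as different names.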
Ex no : 17
Create two arrays of six elements. Write a NumPy program to count the number of instances
of a value occurring in one array on the condition of another array.
Sample Output:
Original arrays:
[ 10 -10 10 -10 -10 10]
[0.85 0.45 0.9 0.8 0.12 0.6 ]
Number of instances of a value occurring in one array on the condition of another array:
3
Program :
import numpy as np
# Create two arrays
array1 = np.array([10, -10, 10, -10, -10, 10])
array2 = np.array([0.85, 0.45, 0.9, 0.8, 0.12, 0.6])
print("Original arrays:")
print(array1)
print(array2)
# Define the condition
condition = array2 > 0.5 # Condition: values in array2 greater than 0.5
# Count how many elements of array1 equal 10 where the condition on array2 holds
num_instances = np.sum((array1 == 10) & condition)
print("Number of instances of a value occurring in one array on the condition of another array:")
print(num_instances)
Output :
Original arrays:
[ 10 -10 10 -10 -10 10]
[0.85 0.45 0.9 0.8 0.12 0.6 ]
Number of instances of a value occurring in one array on the condition of another array:
3
Ex no : 18
Create a 2-dimensional array of size 2 x 3, composed of 4-byte integer elements. Write a
NumPy program to find the number of occurrences of a sequence in the said array.
Sample Output:
Original NumPy array:
[[1 2 3]
[2 1 2]]
Type: <class 'numpy.ndarray'>
Sequence: 2,3
Number of occurrences of the said sequence: 2
Program :
import numpy as np
# Create the 2D array
array = np.array([[1, 2, 3],
[2, 1, 2]], dtype=np.int32)
# Define the sequence to find
sequence = np.array([2, 3], dtype=np.int32)
# Count occurrences of the sequence
count = 0
for row in array:
    # Slide a window of len(sequence) across the row and compare
    for i in range(len(row) - len(sequence) + 1):
        if np.array_equal(row[i:i+len(sequence)], sequence):
            count += 1
# Print the original array and its type
print("Original NumPy array:")
print(array)
print("Type:", type(array))
# Print the sequence and its number of occurrences
print("Sequence:", ", ".join(map(str, sequence)))
print("Number of occurrences of the said sequence:", count)
Output :
Original NumPy array:
[[1 2 3]
[2 1 2]]
Type: <class 'numpy.ndarray'>
Sequence: 2, 3
Number of occurrences of the said sequence: 1
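The nested loops can also be replaced by a vectorised comparison. The sketch below assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available, and counts the sequence within each row only, as the program above does.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

array = np.array([[1, 2, 3],
                  [2, 1, 2]], dtype=np.int32)
sequence = np.array([2, 3], dtype=np.int32)

# Every length-2 window along each row:
# shape (rows, cols - len(sequence) + 1, len(sequence))
windows = sliding_window_view(array, window_shape=len(sequence), axis=1)

# A window matches when all of its elements equal the sequence
matches = (windows == sequence).all(axis=-1)
count = int(matches.sum())

print("Number of occurrences of the said sequence:", count)
```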
Ex no : 19
Write a NumPy program to merge three given NumPy arrays of same shape
Program :
import numpy as np
# Three NumPy arrays of the same shape
array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])
array3 = np.array([[13, 14, 15], [16, 17, 18]])
# Merge the arrays
merged_array = np.stack((array1, array2, array3))
print("Merged array:")
print(merged_array)
Output :
Merged array:
[[[ 1 2 3]
[ 4 5 6]]
[[ 7 8 9]
[10 11 12]]
[[13 14 15]
[16 17 18]]]
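"Merge" can mean more than one thing. np.stack adds a new axis, as in the output above, while np.concatenate joins the arrays along an existing axis. A short sketch comparing the resulting shapes:

```python
import numpy as np

array1 = np.array([[1, 2, 3], [4, 5, 6]])
array2 = np.array([[7, 8, 9], [10, 11, 12]])
array3 = np.array([[13, 14, 15], [16, 17, 18]])

# New leading axis: three 2x3 arrays become one 3x2x3 array
stacked = np.stack((array1, array2, array3))

# Existing axis 0: rows are appended, giving a 6x3 array
joined_rows = np.concatenate((array1, array2, array3), axis=0)

# Existing axis 1: columns are appended, giving a 2x9 array
joined_cols = np.concatenate((array1, array2, array3), axis=1)

print(stacked.shape, joined_rows.shape, joined_cols.shape)  # (3, 2, 3) (6, 3) (2, 9)
```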
Ex no : 20
Write a NumPy program to combine the last element with the first element of two given ndarrays with
different shapes.
Sample Output:
Original arrays:
['PHP', 'JS', 'C++']
['Python', 'C#', 'NumPy']
After Combining:
['PHP' 'JS' 'C++Python' 'C#' 'NumPy']
Program :
import numpy as np
# Original arrays
array1 = np.array(['PHP', 'JS', 'C++'])
array2 = np.array(['Python', 'C#', 'NumPy'])
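# Join the last element of array1 with the first element of array2,
# then place the joined string between the remaining elements
joined = str(array1[-1]) + str(array2[0])
combined_array = np.concatenate((array1[:-1], [joined], array2[1:]))
print("Original arrays:")
print(array1)
print(array2)
print("After Combining:")
print(combined_array)
Output :
Original arrays:
['PHP' 'JS' 'C++']
['Python' 'C#' 'NumPy']
After Combining:
['PHP' 'JS' 'C++Python' 'C#' 'NumPy']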