Fdspracticals - Ipynb - Colaboratory

The document discusses NumPy and Pandas programs to perform various tasks like creating arrays and matrices, selecting rows and columns from DataFrames, and loading and exploring the Iris data set from different sources like text files, Excel, and the web. Descriptive analytics like summary statistics, correlations, and value counts are also calculated on the Iris data.

Uploaded by

cc76747321

1. a. Write a NumPy program to create a null vector of size 10 and update the sixth value to 11

b. Write a NumPy program to convert an array to a float type

c. Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10

d. Write a NumPy program to convert a list of numeric values into a one-dimensional NumPy array

#a
import numpy as np

# Create a null vector of size 10


vector = np.zeros(10)

# Update the sixth value to 11


vector[5] = 11

print(vector)

[ 0. 0. 0. 0. 0. 11. 0. 0. 0. 0.]

#b
import numpy as np

# Create a sample array


arr = np.array([1, 2, 3, 4, 5])

# Convert the array to float type (using astype method)


float_arr = arr.astype(np.float32) # You can also use np.float64 for double precision

# Print the original and converted arrays


print("Original array:", arr)
print("Float array:", float_arr)

Original array: [1 2 3 4 5]
Float array: [1. 2. 3. 4. 5.]
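
The conversion can be checked with a minimal sketch. It assumes only that `astype` returns a new array rather than modifying the original; the names `ints` and `floats` are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3, 4, 5])

# astype returns a converted copy; the source array keeps its integer dtype
floats = ints.astype(np.float64)
floats[0] = 99.5  # Changing the copy does not affect the original

print("Original dtype:", ints.dtype)
print("Converted dtype:", floats.dtype)
print("First elements:", ints[0], floats[0])
```

Choosing `np.float64` instead of `np.float32` doubles the memory per element but matches NumPy's default float precision.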

#c
import numpy as np

# Create a 1D array with values from 2 to 10 (np.arange excludes the stop value, 11)


arr = np.arange(2, 11)

# Reshape the array to a 3x3 matrix


matrix = arr.reshape(3, 3)

# Print the matrix


print(matrix)

[[ 2 3 4]
[ 5 6 7]
[ 8 9 10]]

#d
import numpy as np

# Create a list of numeric values


data_list = [1, 2.5, 3, 4, 5]

# Convert the list to a NumPy array


arr = np.array(data_list)

# Print the array


print(arr)

[1. 2.5 3. 4. 5. ]

2. a. Write a NumPy program to convert an array to a float type

b. Write a NumPy program to create an empty and a full array

c. Write a NumPy program to convert a list and tuple into arrays

d. Write a NumPy program to find the real and imaginary parts of an array of complex numbers

#a
import numpy as np

# Create a sample array


arr = np.array([1, 2, 3, 4, 5])

# Convert the array to float type (using astype method)


float_arr = arr.astype(np.float32) # You can also use np.float64 for double precision

# Print the original and converted arrays


print("Original array:", arr)
print("Float array:", float_arr)

Original array: [1 2 3 4 5]
Float array: [1. 2. 3. 4. 5.]

#b
import numpy as np

# Create an empty array of shape (3, 4) with float32 data type


empty_arr = np.empty((3, 4), dtype=np.float32)

# Create a full array of shape (2, 3) in which every element is 5


full_arr = np.full((2, 3), 5)

# Print the empty and full arrays


print("Empty array:", empty_arr)
print("Full array:", full_arr)

Empty array: [[-2.6408402e-04 3.3350903e-41 0.0000000e+00 0.0000000e+00]


[ 1.4433374e-43 1.3592595e-43 1.5134023e-43 4.4841551e-44]
[ 1.6114932e-43 1.4153114e-43 1.4153114e-43 1.4993894e-43]]
Full array: [[5 5 5]
[5 5 5]]
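
The values printed for the empty array are whatever happened to be in memory, because `np.empty` allocates without initializing. A small sketch of the difference, with illustrative variable names, assuming a predictable starting value is wanted:

```python
import numpy as np

# np.empty only reserves memory, so its contents are arbitrary
uninitialized = np.empty((3, 4), dtype=np.float32)

# np.zeros and np.full give every element a defined value
zeros_arr = np.zeros((3, 4), dtype=np.float32)
fives_arr = np.full((2, 3), 5, dtype=np.int32)

print("Shape and dtype of the empty array:", uninitialized.shape, uninitialized.dtype)
print("Zeros array:\n", zeros_arr)
print("Full array:\n", fives_arr)
```
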
#c
import numpy as np

# Create a list and a tuple of numeric values


data_list = [1, 2.5, 3, 4, 5]
data_tuple = (10, 20.5, 30, 40, 50)

# Convert the list and tuple to NumPy arrays


list_arr = np.array(data_list)
tuple_arr = np.array(data_tuple)

# Print the arrays from list and tuple


print("Array from list:", list_arr)
print("Array from tuple:", tuple_arr)

Array from list: [1. 2.5 3. 4. 5. ]


Array from tuple: [10. 20.5 30. 40. 50. ]

#d
import numpy as np

# Create a complex number array


complex_arr = np.array([1 + 2j, 3 + 4j, 5 - 6j])

# Extract the real and imaginary parts using real and imag attributes
real_part = complex_arr.real
imag_part = complex_arr.imag

# Print the real and imaginary parts


print("Real part:", real_part)
print("Imaginary part:", imag_part)

Real part: [1. 3. 5.]


Imaginary part: [ 2. 4. -6.]
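
Beyond the real and imaginary parts, the same array can be described by magnitude, phase, and conjugate. The sketch below is an illustrative extension, not part of the exercise, and uses only standard NumPy functions.

```python
import numpy as np

complex_arr = np.array([1 + 2j, 3 + 4j, 5 - 6j])

magnitude = np.abs(complex_arr)    # sqrt(real**2 + imag**2)
phase = np.angle(complex_arr)      # Angle in radians
conjugate = np.conj(complex_arr)   # Same real part, negated imaginary part

print("Magnitude:", magnitude)
print("Phase (radians):", phase)
print("Conjugate:", conjugate)
```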

3. Write a Pandas program to get the powers of array values element-wise. Note: First array elements raised to
powers from second array

Sample data: {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]}

Expected Output:

    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83

import pandas as pd

# Create a sample DataFrame


data = {'X': [78, 85, 96, 80, 86],
'Y': [84, 94, 89, 83, 86],
'Z': [86, 97, 96, 72, 83]}
df = pd.DataFrame(data)

# Print the result


print(df)

X Y Z
0 78 84 86
1 85 94 97
2 96 89 96
3 80 83 72
4 86 86 83
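
The program above builds and prints the DataFrame. The element-wise power named in the task could be computed with `np.power`, as in this sketch. The small `base` and `exponent` columns are hypothetical values chosen so the result is easy to check. The sample columns are cast to float as an assumption, because values such as 78 raised to 84 do not fit in a 64-bit integer.

```python
import numpy as np
import pandas as pd

# Hypothetical small example: each base is raised to the exponent in the same row
small = pd.DataFrame({'base': [2, 3, 4], 'exponent': [3, 2, 1]})
small['power'] = np.power(small['base'], small['exponent'])
print(small)

# Sample data from the exercise; floats avoid integer overflow for such large powers
df = pd.DataFrame({'X': [78, 85, 96, 80, 86],
                   'Y': [84, 94, 89, 83, 86],
                   'Z': [86, 97, 96, 72, 83]})
x_to_the_y = np.power(df['X'].astype('float64'), df['Y'])
print(x_to_the_y)
```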

4. Write a Pandas program to select the specified columns and rows from a given DataFrame.

Sample Python dictionary data and list labels:

Select the 'score' and 'qualify' columns in rows 1, 3, 5, 6 from the following DataFrame.

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

Expected Output:

Select specific columns and rows:
   score qualify
b    9.0      no
d    NaN      no
f   20.0     yes
g   14.5     yes

import pandas as pd
import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

# Select specific columns


selected_columns = [ 'score', 'qualify']
df_selected_columns = df[selected_columns]

# Select specific rows


selected_rows = [1, 3, 5, 6]
df_selected_rows = df_selected_columns.iloc[selected_rows]

print("Select specific columns and rows:")


print(df_selected_rows)

Select specific columns and rows:


score qualify
b 9.0 no
d NaN no
f 20.0 yes
g 14.5 yes
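
The same selection can be written in one step with label-based indexing. This sketch assumes the row labels 'b', 'd', 'f', 'g' correspond to positions 1, 3, 5, 6, which holds for the labels list given in the exercise.

```python
import numpy as np
import pandas as pd

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

# .loc selects rows and columns by label in a single call
by_label = df.loc[['b', 'd', 'f', 'g'], ['score', 'qualify']]

# .iloc does the same by integer position
by_position = df.iloc[[1, 3, 5, 6]][['score', 'qualify']]

print(by_label)
```

Using `.loc` keeps the code readable when the index carries meaningful labels, while `.iloc` is the safer choice when only positions are known.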

5. Write a Pandas program to count the number of rows and columns of a DataFrame.

Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

Expected Output:

Number of Rows: 10

Number of Columns: 4

import pandas as pd
import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

df = pd.DataFrame(exam_data)

num_rows = len(df.index)
num_columns = len(df.columns)

print("Number of Rows:", num_rows)


print("Number of Columns:", num_columns)

Number of Rows: 10
Number of Columns: 4

6. Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics
on the Iris data set

import pandas as pd
import numpy as np

# Function to load Iris data from different sources


def load_iris_data(source):
    if source == "text":
        return pd.read_csv("Iris.csv")  # Assuming CSV format
    elif source == "excel":
        return pd.read_excel("iris.xlsx")  # Assuming Excel format
    elif source == "web":
        url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
        return pd.read_csv(url, header=None)  # This file has no header row
    else:
        raise ValueError("Invalid data source")

# Load Iris data from one of the sources


source = "text" # Choose "text", "excel", or "web"
df = load_iris_data(source)

# Assign column names if necessary


if source == "web":
    df.columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "Species"]

# Descriptive analytics
print("Data overview:\n", df.head())
print("\nSummary statistics:\n", df.describe())
print("\nData information:")
df.info()  # info() prints its report directly and returns None

# Additional descriptive analytics

print("\nUnique values in Species:\n", df["Species"].value_counts())
print("\nCorrelation matrix:\n", df.corr(numeric_only=True))  # Skip the non-numeric Species column

Data overview:
    Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa

Summary statistics:
                Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  150.000000     150.000000    150.000000     150.000000    150.000000
mean    75.500000       5.843333      3.054000       3.758667      1.198667
std     43.445368       0.828066      0.433594       1.764420      0.763161
min      1.000000       4.300000      2.000000       1.000000      0.100000
25%     38.250000       5.100000      2.800000       1.600000      0.300000
50%     75.500000       5.800000      3.000000       4.350000      1.300000
75%    112.750000       6.400000      3.300000       5.100000      1.800000
max    150.000000       7.900000      4.400000       6.900000      2.500000

Data information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Id             150 non-null    int64
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB

Unique values in Species:
 Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: Species, dtype: int64

Correlation matrix:
                      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  \
Id             1.000000       0.716676     -0.397729       0.882747
SepalLengthCm  0.716676       1.000000     -0.109369       0.871754
SepalWidthCm  -0.397729      -0.109369      1.000000      -0.420516
PetalLengthCm  0.882747       0.871754     -0.420516       1.000000
PetalWidthCm   0.899759       0.817954     -0.356544       0.962757

               PetalWidthCm
Id                 0.899759
SepalLengthCm      0.817954
SepalWidthCm      -0.356544
PetalLengthCm      0.962757
PetalWidthCm       1.000000
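
Reading a header-less text source can be tried without any file on disk. The sketch below feeds `pd.read_csv` an in-memory string built from the first five Iris records shown in the output above, without the Id column; the use of `io.StringIO` as a stand-in for a file is an assumption made for illustration.

```python
import io
import pandas as pd

# First five Iris records in the comma-separated, header-less layout of iris.data
raw_text = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
"""

column_names = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "Species"]

# header=None stops pandas from treating the first record as column names
iris_head = pd.read_csv(io.StringIO(raw_text), header=None, names=column_names)

mean_sepal_length = iris_head["SepalLengthCm"].mean()
species_counts = iris_head["Species"].value_counts()

print(iris_head.dtypes)
print("Mean sepal length:", mean_sepal_length)
print(species_counts)
```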

7. Use the diabetes data set from Pima Indians Diabetes data set for performing the following:

Apply Univariate analysis:

• Frequency
• Mean
• Median
• Mode
• Variance
• Standard Deviation
• Skewness and Kurtosis

import pandas as pd
import matplotlib.pyplot as plt # For visualization (optional)

# Load the dataset


try:
    df = pd.read_csv("diabetes.csv")  # Assuming CSV format
except FileNotFoundError:
    print("Error: Data file not found. Please ensure it's in the correct location.")
    exit()

# Univariate analysis for each numerical column

for col in df.select_dtypes(include=['number']):
    print("Univariate analysis for", col)
    print("-" * 50)

    # Frequency distribution
    print("Frequency distribution:\n", df[col].value_counts())

    # Descriptive statistics
    print("Descriptive statistics:\n", df[col].describe())

    # Skewness and kurtosis
    print("Skewness:", df[col].skew())
    print("Kurtosis:", df[col].kurt())

Univariate analysis for Pregnancies


--------------------------------------------------
Frequency distribution:
1 135
0 111
2 103
3 75
4 68
5 57
6 50
7 45
8 38
9 28
10 24
11 11
13 10
12 9
14 2
15 1
17 1
Name: Pregnancies, dtype: int64
Descriptive statistics:
count 768.000000
mean 3.845052
std 3.369578
min 0.000000
25% 1.000000
50% 3.000000
75% 6.000000
max 17.000000
Name: Pregnancies, dtype: float64
Skewness: 0.9016739791518588
Kurtosis: 0.15921977754746486
Univariate analysis for Glucose
--------------------------------------------------
Frequency distribution:
99 17
100 17
111 14
129 14
125 14
..
191 1
177 1
44 1
62 1
190 1
Name: Glucose, Length: 136, dtype: int64
Descriptive statistics:
count 768.000000
mean 120.894531
std 31.972618
min 0.000000
25% 99.000000
50% 117.000000
75% 140.250000
max 199.000000
Name: Glucose, dtype: float64
[output truncated]
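
`describe()` reports the mean, the median (as the 50% row), and the standard deviation, but not the mode or the variance listed in the task. Each measure can also be requested on its own, as in this sketch on a small hypothetical Series (the values are made up so the results are easy to verify by hand).

```python
import pandas as pd

# Hypothetical values, not taken from the diabetes data
values = pd.Series([1, 2, 2, 3, 4, 7, 9], name="hypothetical")

frequency = values.value_counts().sort_index()
mean = values.mean()
median = values.median()
mode = values.mode().tolist()   # mode() returns a Series because ties are possible
variance = values.var()         # Sample variance (divides by n - 1)
std_dev = values.std()          # Sample standard deviation

print("Frequency:\n", frequency)
print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Variance:", variance)
print("Standard deviation:", std_dev)
print("Skewness:", values.skew())
print("Kurtosis:", values.kurt())
```

The same calls work on any numeric column, for example `df[col].mode()` and `df[col].var()` inside the loop above.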

8. Use the diabetes data set from Pima Indians Diabetes data set for performing the following:

Apply Bivariate analysis:

• Linear and logistic regression modeling


import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Load the dataset


df = pd.read_csv("diabetes.csv")

# Use every column except Outcome as predictors; Outcome is the target variable


X = df.drop('Outcome',axis=1)
y = df['Outcome']

# Split data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.20, random_state=42)

# Create and fit the model


linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Make predictions
y_pred_linear = linear_model.predict(X_test)

# Print the results


print("Linear Regression Results:")
print("Intercept:", linear_model.intercept_)
print("Coefficient:", linear_model.coef_[0])
print("R-squared:", linear_model.score(X_test, y_test))
print('\n')

# 2. Logistic Regression (for categorical target variable)

# Create and fit the model


logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)

# Make predictions
y_pred_logistic = logistic_model.predict(X_test)

# Print the results


print("Logistic Regression Results:")
print("Accuracy:", logistic_model.score(X_test, y_test)*100)

Linear Regression Results:


Intercept: -0.9487546338208503
Coefficient: 0.010468179217423847
R-squared: 0.25500281176741757

Logistic Regression Results:


Accuracy: 74.67532467532467
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py:458: ConvergenceWarning:
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
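
The warning suggests two remedies: raising `max_iter` or scaling the features. The sketch below applies both in a scikit-learn pipeline. The data are synthetic and hypothetical (two features on very different scales and a made-up binary target), so the printed accuracy says nothing about the diabetes data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_samples = 300

# Hypothetical features on very different scales
small_feature = rng.normal(0.0, 1.0, n_samples)
large_feature = rng.normal(100.0, 30.0, n_samples)
X = np.column_stack([small_feature, large_feature])

# Hypothetical binary target that depends on both features plus noise
signal = small_feature + (large_feature - 100.0) / 30.0
y = (signal + rng.normal(0.0, 0.5, n_samples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features, then fit logistic regression with a higher iteration limit
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy * 100)
```

Placing the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the model.
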
9. Use the diabetes data set from Pima Indians Diabetes data set for performing the following:

Apply Bivariate analysis:

• Multiple Regression analysis

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the dataset


df = pd.read_csv("diabetes.csv")

# Use every column except Outcome as predictors; Outcome is the target variable


X = df.drop('Outcome',axis=1)
y = df['Outcome']

# Split data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.10, random_state=42)

# Create and fit the multiple regression model


model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set


y_pred = model.predict(X_test)

# Evaluate model performance


r2 = r2_score(y_test, y_pred)
print("Multiple Regression Model Performance:")
print(f"R-squared: {r2:.3f}")

Multiple Regression Model Performance:


R-squared: 0.214
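
Multiple regression fits an intercept and one coefficient per predictor by ordinary least squares. The sketch below reproduces that calculation with `np.linalg.lstsq` on hypothetical data generated exactly from y = 1 + 2·x1 + 3·x2, so the recovered coefficients can be checked by eye. It illustrates the technique and is not the notebook's model.

```python
import numpy as np

# Hypothetical predictors and a target built from known coefficients
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# A leading column of ones lets the solver estimate the intercept
design = np.column_stack([np.ones(len(X)), X])
coef, _, rank, _ = np.linalg.lstsq(design, y, rcond=None)
intercept, slopes = coef[0], coef[1:]

# R-squared: 1 minus residual sum of squares over total sum of squares
y_pred = design @ coef
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("Intercept:", intercept)
print("Coefficients:", slopes)
print(f"R-squared: {r2:.3f}")
```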

10. Apply and explore various plotting functions on UCI data set for performing the following:

a) Normal values

b) Density and contour plots

c) Three- dimensional plotting


import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D # For 3D plotting

# Load the UCI dataset (replace with your dataset's path and filename)
df = pd.read_csv("glass.csv")

# Choose two numerical columns for plotting


column1 = "column1" # Replace with your column names
column2 = "column2"

# a) Normal values
plt.figure(figsize=(8, 4))
plt.hist(df[column1]) # Histogram for column1
plt.xlabel(column1)
plt.ylabel("Frequency")
plt.title("Distribution of " + column1)
plt.show()

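# b) Density and contour plots
import numpy as np  # Needed for the 2D histogram behind the contour plot

plt.figure(figsize=(8, 4))
plt.subplot(121)
plt.hexbin(df[column1], df[column2], gridsize=20) # Density plot
plt.xlabel(column1)
plt.ylabel(column2)
plt.title("Density Plot")

plt.subplot(122)
# plt.contour needs a 2D grid of heights, so bin the points and contour the bin counts
counts, xedges, yedges = np.histogram2d(df[column1], df[column2], bins=20)
x_centers = (xedges[:-1] + xedges[1:]) / 2
y_centers = (yedges[:-1] + yedges[1:]) / 2
plt.contour(x_centers, y_centers, counts.T) # Contour plot
plt.xlabel(column1)
plt.ylabel(column2)
plt.title("Contour Plot")
plt.show()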

# c) Three-dimensional plotting
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df[column1], df[column2], df["column3"]) # Replace "column3" with your third column
ax.set_xlabel(column1)
ax.set_ylabel(column2)
ax.set_zlabel("column3")
plt.title("3D Scatter Plot")
plt.show()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, metho
3801 try:
-> 3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:

4 frames
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'column1'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)


/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, metho
3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
-> 3804 raise KeyError(key) from err
3805 except TypeError:
3806 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'column1'

<Figure size 800x400 with 0 Axes>

12. Apply and explore various plotting functions on Pima Indians Diabetes data set for performing the following (with
output):

a) Normal values

b) Density and contour plots

c) Three- dimensional plotting


import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Load the dataset


df = pd.read_csv("diabetes.csv")

# Choose numerical columns for plotting


column1 = "Glucose"
column2 = "Insulin"
column3 = "BMI"

# a) Normal values (histograms for visual inspection)


plt.figure(figsize=(12, 4))
plt.subplot(131)
plt.hist(df[column1])
plt.xlabel(column1)
plt.ylabel("Frequency")
plt.title("Distribution of " + column1)

plt.subplot(132)
plt.hist(df[column2])
plt.xlabel(column2)
plt.ylabel("Frequency")
plt.title("Distribution of " + column2)

plt.subplot(133)
plt.hist(df[column3])
plt.xlabel(column3)
plt.ylabel("Frequency")
plt.title("Distribution of " + column3)
plt.show()

# b) Density and contour plots
import numpy as np  # Needed for the 2D histogram behind the contour plot

plt.figure(figsize=(8, 4))
plt.subplot(121)
plt.hexbin(df[column1], df[column2], gridsize=20) # Density plot
plt.xlabel(column1)
plt.ylabel(column2)
plt.title("Density Plot")

plt.subplot(122)
# plt.contour needs a 2D grid of heights, so bin the points and contour the bin counts
counts, xedges, yedges = np.histogram2d(df[column1], df[column2], bins=20)
x_centers = (xedges[:-1] + xedges[1:]) / 2
y_centers = (yedges[:-1] + yedges[1:]) / 2
plt.contour(x_centers, y_centers, counts.T) # Contour plot
plt.xlabel(column1)
plt.ylabel(column2)
plt.title("Contour Plot")
plt.show()

# c) Three-dimensional plotting
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df[column1], df[column2], df[column3])
ax.set_xlabel(column1)
ax.set_ylabel(column2)
ax.set_zlabel(column3)
plt.title("3D Scatter Plot")
plt.show()

13. Apply and explore various plotting functions on Pima Indians Diabetes data set for performing the following (with
output):

a) Correlation and scatter plots


b) Histograms

c) Three- dimensional plotting

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Load the dataset


df = pd.read_csv("diabetes.csv")

# Choose columns for plotting


column1 = "Glucose"
column2 = "Insulin"
column3 = "BMI"

# a) Correlation and scatter plots

fig, axes = plt.subplots(1, 2, figsize=(5, 5))

# Correlation matrix (drawn on the first axes so it shares the figure with the scatter plot)
im = axes[0].matshow(df.corr())
fig.colorbar(im, ax=axes[0])
axes[0].set_title("Correlation Matrix")

# Scatter plot
axes[1].scatter(df[column1], df[column2])
axes[1].set_xlabel(column1)
axes[1].set_ylabel(column2)
axes[1].set_title("Scatter Plot")
plt.show()

# b) Histograms
plt.figure(figsize=(5, 5))
plt.subplot(121)
plt.hist(df[column1])
plt.xlabel(column1)
plt.ylabel("Frequency")
plt.title("Histogram of " + column1)

plt.subplot(122)
plt.hist(df[column2])
plt.xlabel(column2)
plt.ylabel("Frequency")
plt.title("Histogram of " + column2)
plt.show()

# c) Three-dimensional plotting
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df[column1], df[column2], df[column3])
ax.set_xlabel(column1)
ax.set_ylabel(column2)
ax.set_zlabel(column3)
plt.title("3D Scatter Plot")
plt.show()
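
The matrix drawn in part a) holds Pearson correlation coefficients, which is the default method of `DataFrame.corr()`. A minimal sketch on hypothetical columns shows the values such a matrix contains and checks them against `np.corrcoef`; the column names `a`, `b`, `c` are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data: b rises exactly with a, c falls exactly with a
df_demo = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                        "b": [2, 4, 6, 8, 10],
                        "c": [5, 4, 3, 2, 1]})

corr_pandas = df_demo.corr()  # Pearson by default

# rowvar=False tells NumPy that each column is one variable
corr_numpy = np.corrcoef(df_demo.to_numpy(), rowvar=False)

print(corr_pandas)
```

A coefficient of 1 marks a perfect increasing linear relation and -1 a perfect decreasing one, which is what the color scale of the matrix plot encodes.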

14. Write a Pandas program to count the number of columns of a DataFrame.

Sample Output:

Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11

Number of columns: 3

import pandas as pd

# Create a sample DataFrame


data = {'col1': [1, 2, 3, 4, 7],
'col2': [4, 5, 6, 9, 5],
'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data)

# Print the original DataFrame


print("Original DataFrame")
print(df)

# Count the number of columns


num_columns = df.shape[1] # Access the number of columns using shape[1]
print("\nNumber of columns:")
print(num_columns)

Original DataFrame
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11

Number of columns:
3

15. Write a Pandas program to group by the first column and get second column as lists in rows

Sample data:

Original DataFrame

col1 col2

0 C1 1

1 C1 2

2 C2 3

3 C2 3

4 C2 4

5 C3 6

6 C2 5

Expected output:
Group on the col1:

col1

C1 [1, 2]

C2 [3, 3, 4, 5]

C3 [6]

Name: col2, dtype: object

import pandas as pd

# Create sample DataFrame


data = {'col1': ['C1', 'C1', 'C2', 'C2', 'C2', 'C3', 'C2'],
'col2': [1, 2, 3, 3, 4, 6, 5]}
df = pd.DataFrame(data)

# Group by col1 and get lists of col2 values


grouped_data = df.groupby('col1')['col2'].apply(list)

# Print the grouped data


print("Group on the col1:")
print(grouped_data)

Group on the col1:


col1
C1 [1, 2]
C2 [3, 3, 4, 5]
C3 [6]
Name: col2, dtype: object
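
An equivalent formulation uses `agg(list)` and turns the result back into a regular DataFrame with `reset_index()`. This is a sketch of an alternative, assuming a two-column table is more convenient downstream than a Series indexed by col1.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['C1', 'C1', 'C2', 'C2', 'C2', 'C3', 'C2'],
                   'col2': [1, 2, 3, 3, 4, 6, 5]})

# Collect col2 values per group, keeping their original order within each group
grouped = df.groupby('col1')['col2'].agg(list).reset_index()

# Number of rows that fell into each group
group_sizes = df.groupby('col1')['col2'].size()

print(grouped)
print(group_sizes)
```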

16. Write a Pandas program to check whether a given column is present in a DataFrame or not.

Sample data:

Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11

Col4 is not present in DataFrame.
Col1 is present in DataFrame.



import pandas as pd

# Create sample DataFrame


data = {'col1': [1, 2, 3, 4, 7],
'col2': [4, 5, 6, 9, 5],
'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data)

# Function to check if a column exists

def column_exists(df, column_name):
    return column_name in df.columns

# Check for specific columns (labels are case-sensitive, so use the lowercase names defined above)

columns_to_check = ["col4", "col1"]
for column in columns_to_check:
    if column_exists(df, column):
        print(f"{column} is present in DataFrame.")
    else:
        print(f"{column} is not present in DataFrame.")

col4 is not present in DataFrame.
col1 is present in DataFrame.
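
Column lookups in pandas are case-sensitive, so "Col1" and "col1" are different labels. When the capitalization of the requested name cannot be trusted, a case-insensitive lookup is one option. The helper `find_column` below is a hypothetical name introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 7],
                   'col2': [4, 5, 6, 9, 5],
                   'col3': [7, 8, 12, 1, 11]})


def find_column(frame, name):
    """Return the actual column label that matches `name` ignoring case, or None."""
    lookup = {str(col).lower(): col for col in frame.columns}
    return lookup.get(name.lower())


for requested in ["Col4", "Col1"]:
    match = find_column(df, requested)
    if match is None:
        print(f"{requested} is not present in DataFrame.")
    else:
        print(f"{requested} is present in DataFrame (stored as '{match}').")
```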

17. Create two arrays of six elements. Write a NumPy program to count the number of instances of a value occurring in
one array on the condition of another array.

Sample Output:

Original arrays:

[ 10 -10  10 -10 -10  10]

[0.85 0.45 0.9  0.8  0.12 0.6 ]

Number of instances of a value occurring in one array on the condition of another array: 3

import numpy as np

# Define the two arrays


arr1 = np.array([10, -10, 10, -10, -10, 10])
arr2 = np.array([0.85, 0.45, 0.9, 0.8, 0.12, 0.6])

# Value to count instances of

value = 10

# Condition for the other array

condition = arr2 >= 0.5

# Count the instances

count = np.sum((arr1 == value) & condition)

# Print the results

print("Original arrays:")
print(arr1)
print(arr2)
print("\nNumber of instances of a value occurring in one array on the condition of another array:", count)

Original arrays:
[ 10 -10  10 -10 -10  10]
[0.85 0.45 0.9  0.8  0.12 0.6 ]

Number of instances of a value occurring in one array on the condition of another array: 3
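
The count comes from a boolean mask, and the same mask can also report where the matches are. The sketch below assumes the signed values 10 and -10 for the first array and a threshold of 0.5 on the second; `np.count_nonzero` and `np.flatnonzero` are standard NumPy functions.

```python
import numpy as np

arr1 = np.array([10, -10, 10, -10, -10, 10])
arr2 = np.array([0.85, 0.45, 0.9, 0.8, 0.12, 0.6])

# True where arr1 equals 10 and the paired arr2 value is at least 0.5
mask = (arr1 == 10) & (arr2 >= 0.5)

count = np.count_nonzero(mask)     # How many positions satisfy both conditions
positions = np.flatnonzero(mask)   # Which positions they are

print("Count:", count)
print("Positions:", positions)
```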

18. Create a 2-dimensional array of size 2 x 3, composed of 4-byte integer elements. Write a NumPy program to
find the number of occurrences of a sequence in the said array.

Sample Output:

Original NumPy array:

[[1 2 3]

[2 1 2]]

Type: <class 'numpy.ndarray'>

Sequence: 2,3

Number of occurrences of the said sequence: 2

import numpy as np

# Create the 2D array with 4-byte integer elements


arr = np.array([[1, 2, 3], [2, 1, 2]], dtype='int32')

# Sequence to search for


sequence = [2, 3]

# Count occurrences of the sequence along rows and down-right diagonals

count = 0  # Start from zero so that only real matches are counted
n = len(sequence)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1] - n + 1):
        # Row-wise match starting at (i, j)
        if np.all(arr[i, j:j + n] == sequence):
            count += 1

        # Diagonal match (down-right direction) starting at (i, j)
        if i + n <= arr.shape[0]:
            diagonal = np.array([arr[i + d, j + d] for d in range(n)])
            if np.all(diagonal == sequence):
                count += 1

# Print the results

print("Original NumPy array:")
print(arr)
print("\nType:", type(arr))
print("\nSequence:", sequence)
print("\nNumber of occurrences of the said sequence:", count)

Original NumPy array:
[[1 2 3]
 [2 1 2]]

Type: <class 'numpy.ndarray'>

Sequence: [2, 3]

Number of occurrences of the said sequence: 1
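
Row-wise matches can also be counted without explicit loops. The sketch below uses `sliding_window_view` from `numpy.lib.stride_tricks` (available in NumPy 1.20 and later) to compare every horizontal window with the sequence. The helper name `count_in_rows` is hypothetical, and the function looks along rows only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def count_in_rows(array, sequence):
    """Count how often `sequence` appears as consecutive elements within a row."""
    sequence = np.asarray(sequence)
    # Shape (rows, cols - len + 1, len): one window per starting column in each row
    windows = sliding_window_view(array, window_shape=len(sequence), axis=1)
    matches = (windows == sequence).all(axis=-1)
    return int(matches.sum())


arr = np.array([[1, 2, 3], [2, 1, 2]], dtype=np.int32)

print("Occurrences of [2, 3]:", count_in_rows(arr, [2, 3]))
print("Occurrences of [1, 2]:", count_in_rows(arr, [1, 2]))
```

Because the windows are views on the original data, no copy of the array is made until the comparison itself.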
