Fdspracticals - Ipynb - Colaboratory

The document discusses NumPy and Pandas programs to perform various tasks like creating arrays and matrices, selecting rows and columns from DataFrames, and loading and exploring the Iris data set from different sources like text files, Excel, and the web. Descriptive analytics like summary statistics, correlations, and value counts are also calculated on the Iris data.

Uploaded by

cc76747321


1. a. Write a NumPy program to create a null vector of size 10 and update the sixth value to 11

b. Write a NumPy program to convert an array to a float type

c. Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10

d. Write a NumPy program to convert a list of numeric values into a one-dimensional NumPy array

#a
import numpy as np

# Create a null vector of size 10

vector = np.zeros(10)

# Update the sixth value to 11

vector[5] = 11

print(vector)

[ 0. 0. 0. 0. 0. 11. 0. 0. 0. 0.]

#b
import numpy as np

# Create a sample array

arr = np.array([1, 2, 3, 4, 5])

# Convert the array to float type (using astype method)

float_arr = arr.astype(np.float32) # You can also use np.float64 for double precision

# Print the original and converted arrays

print("Original array:", arr)
print("Float array:", float_arr)

Original array: [1 2 3 4 5]
Float array: [1. 2. 3. 4. 5.]
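
As a side note on the precision comment in the code above, here is a minimal sketch comparing the two float types. The variable names single and double are illustrative and not part of the original notebook.

```python
import numpy as np

# Same integer values as in the exercise
arr = np.array([1, 2, 3, 4, 5])

# astype returns a new array; the original keeps its integer dtype
single = arr.astype(np.float32)   # single precision, 4 bytes per element
double = arr.astype(np.float64)   # double precision, 8 bytes per element

print(arr.dtype.kind)                  # 'i' (integer)
print(single.dtype, single.itemsize)   # float32 4
print(double.dtype, double.itemsize)   # float64 8
```

The itemsize attribute makes the memory cost of each choice visible, which is usually the deciding factor between the two types.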

#c
import numpy as np

# Create a 1D array with values from 2 to 10 (np.arange excludes the stop value, 11)

arr = np.arange(2, 11)

# Reshape the array to a 3x3 matrix

matrix = arr.reshape(3, 3)

# Print the matrix

print(matrix)

[[ 2  3  4]
 [ 5  6  7]
 [ 8  9 10]]

#d
import numpy as np

# Create a list of numeric values

data_list = [1, 2.5, 3, 4, 5]

# Convert the list to a NumPy array

arr = np.array(data_list)

# Print the array

print(arr)

[1. 2.5 3. 4. 5. ]

2. a. Write a NumPy program to convert an array to a float type

b. Write a NumPy program to create an empty and a full array

c. Write a NumPy program to convert a list and tuple into arrays

d. Write a NumPy program to find the real and imaginary parts of an array of complex numbers

#a
import numpy as np

# Create a sample array

arr = np.array([1, 2, 3, 4, 5])

# Convert the array to float type (using astype method)

float_arr = arr.astype(np.float32) # You can also use np.float64 for double precision

# Print the original and converted arrays

print("Original array:", arr)
print("Float array:", float_arr)

Original array: [1 2 3 4 5]
Float array: [1. 2. 3. 4. 5.]

#b
import numpy as np

# Create an empty array of shape (3, 4) with float32 data type

empty_arr = np.empty((3, 4), dtype=np.float32)

# Create a full array of shape (2, 3) in which every element is 5

full_arr = np.full((2, 3), 5)

# Print the empty and full arrays

print("Empty array:", empty_arr)
print("Full array:", full_arr)

Empty array: [[-2.6408402e-04 3.3350903e-41 0.0000000e+00 0.0000000e+00]
 [ 1.4433374e-43 1.3592595e-43 1.5134023e-43 4.4841551e-44]
 [ 1.6114932e-43 1.4153114e-43 1.4153114e-43 1.4993894e-43]]
Full array: [[5 5 5]
 [5 5 5]]

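
The arbitrary-looking numbers in the "Empty array" output are expected: np.empty only allocates memory and does not initialize it. A minimal sketch contrasting it with np.zeros and np.full follows; the name zeros_arr is illustrative and not from the original notebook.

```python
import numpy as np

# np.empty only allocates memory; its contents are whatever was there before
empty_arr = np.empty((3, 4), dtype=np.float32)

# np.zeros and np.full produce well-defined contents
zeros_arr = np.zeros((3, 4), dtype=np.float32)
full_arr = np.full((2, 3), 5)

print(empty_arr.shape)      # (3, 4); the values themselves are unspecified
print(zeros_arr.sum())      # 0.0
print(full_arr.tolist())    # [[5, 5, 5], [5, 5, 5]]
```

np.empty is mainly useful when every element will be overwritten right away; otherwise np.zeros or np.full is the safer choice.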
#c
import numpy as np

# Create a list and a tuple of numeric values

data_list = [1, 2.5, 3, 4, 5]
data_tuple = (10, 20.5, 30, 40, 50)

# Convert the list and tuple to NumPy arrays

list_arr = np.array(data_list)
tuple_arr = np.array(data_tuple)

# Print the arrays from list and tuple

print("Array from list:", list_arr)
print("Array from tuple:", tuple_arr)

Array from list: [1. 2.5 3. 4. 5. ]

Array from tuple: [10. 20.5 30. 40. 50. ]

#d
import numpy as np

# Create a complex number array

complex_arr = np.array([1 + 2j, 3 + 4j, 5 - 6j])

# Extract the real and imaginary parts using real and imag attributes
real_part = complex_arr.real
imag_part = complex_arr.imag

# Print the real and imaginary parts

print("Real part:", real_part)
print("Imaginary part:", imag_part)

Real part: [1. 3. 5.]

Imaginary part: [ 2. 4. -6.]

3. Write a Pandas program to get the powers of array values element-wise. Note: the elements of the first array are raised to the powers given by the second array.

Sample data: {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]}

Expected Output:

    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83

import pandas as pd

# Create a sample DataFrame

data = {'X': [78, 85, 96, 80, 86],
'Y': [84, 94, 89, 83, 86],
'Z': [86, 97, 96, 72, 83]}
df = pd.DataFrame(data)

# Print the result

print(df)

    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83
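
The program above builds and prints the sample DataFrame but does not raise anything to a power. Here is a minimal sketch of the element-wise power step the task describes. The column names base and exponent and the small values are assumptions chosen for illustration, since powers of the exercise data would overflow 64-bit integers.

```python
import numpy as np
import pandas as pd

# Small illustrative values (not the exercise data)
df = pd.DataFrame({'base': [2, 3, 4], 'exponent': [3, 2, 1]})

# Element-wise power: each base is raised to the exponent in the same row
df['power'] = np.power(df['base'], df['exponent'])

print(df)
```

The same result can be written as df['base'] ** df['exponent'], since pandas applies arithmetic operators row by row.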

4. Write a Pandas program to select the specified columns and rows from a given data frame.

Sample Python dictionary data and list labels:

Select the 'score' and 'qualify' columns in rows 1, 3, 5, 6 (0-based positions) from the following data frame.

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

Expected Output:

Select specific columns and rows:

   score qualify
b    9.0      no
d    NaN      no
f   20.0     yes
g   14.5     yes

import pandas as pd
import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

# Select specific columns

selected_columns = [ 'score', 'qualify']
df_selected_columns = df[selected_columns]

# Select specific rows

selected_rows = [1, 3, 5, 6]
df_selected_rows = df_selected_columns.iloc[selected_rows]

print("Select specific columns and rows:")

print(df_selected_rows)

Select specific columns and rows:

   score qualify
b    9.0      no
d    NaN      no
f   20.0     yes
g   14.5     yes
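
The two-step selection above can also be written as a single indexing call. Here is a minimal sketch reusing the exercise data; the variable names by_position and by_label are illustrative and not from the original notebook.

```python
import numpy as np
import pandas as pd

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

# Position-based rows, then label-based columns
by_position = df.iloc[[1, 3, 5, 6]][['score', 'qualify']]

# Label-based rows and columns in one .loc call
by_label = df.loc[['b', 'd', 'f', 'g'], ['score', 'qualify']]

print(by_label)
```

.iloc addresses rows by integer position while .loc addresses them by index label, so positions 1, 3, 5, 6 correspond to labels 'b', 'd', 'f', 'g' here.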

5. Write a Pandas program to count the number of rows and columns of a DataFrame.

Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

Expected Output:

Number of Rows: 10
Number of Columns: 4

import pandas as pd
import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

df = pd.DataFrame(exam_data)

num_rows = len(df.index)
num_columns = len(df.columns)

print("Number of Rows:", num_rows)

print("Number of Columns:", num_columns)

Number of Rows: 10
Number of Columns: 4

6. Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics
on the Iris data set

import pandas as pd
import numpy as np

# Function to load Iris data from different sources

def load_iris_data(source):
    if source == "text":
        return pd.read_csv("Iris.csv")  # Assuming CSV format
    elif source == "excel":
        return pd.read_excel("iris.xlsx")  # Assuming Excel format
    elif source == "web":
        url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
        return pd.read_csv(url, header=None)  # The UCI file has no header row
    else:
        raise ValueError("Invalid data source")

# Load Iris data from one of the sources

source = "text"  # Choose "text", "excel", or "web"
df = load_iris_data(source)

# Assign column names if necessary

if source == "web":
    df.columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "Species"]

# Descriptive analytics
print("Data overview:\n", df.head())
print("\nSummary statistics:\n", df.describe())
print("\nData information:")
df.info()  # info() prints its report directly and returns None

# Additional descriptive analytics

print("\nUnique values in Species:\n", df["Species"].value_counts())
print("\nCorrelation matrix:\n", df.corr(numeric_only=True))  # skip the non-numeric Species column

Data overview:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa

Summary statistics:
               Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  150.000000     150.000000    150.000000     150.000000    150.000000
mean    75.500000       5.843333      3.054000       3.758667      1.198667
std     43.445368       0.828066      0.433594       1.764420      0.763161
min      1.000000       4.300000      2.000000       1.000000      0.100000
25%     38.250000       5.100000      2.800000       1.600000      0.300000
50%     75.500000       5.800000      3.000000       4.350000      1.300000
75%    112.750000       6.400000      3.300000       5.100000      1.800000
max    150.000000       7.900000      4.400000       6.900000      2.500000

Data information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Id             150 non-null    int64
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB

Unique values in Species:
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: Species, dtype: int64

Correlation matrix:
                     Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  \
Id             1.000000       0.716676     -0.397729       0.882747
SepalLengthCm  0.716676       1.000000     -0.109369       0.871754
SepalWidthCm  -0.397729      -0.109369      1.000000      -0.420516
PetalLengthCm  0.882747       0.871754     -0.420516       1.000000
PetalWidthCm   0.899759       0.817954     -0.356544       0.962757

               PetalWidthCm
Id                 0.899759
SepalLengthCm      0.817954
SepalWidthCm      -0.356544
PetalLengthCm      0.962757
PetalWidthCm       1.000000
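
The loading and summary steps above can be tried without any file or network access. Here is a minimal sketch that reads a six-row illustrative subset in the headerless, comma-separated layout of iris.data from an in-memory string; the subset and the names raw, species_means and corr are assumptions for illustration.

```python
import io
import pandas as pd

# A few rows in the headerless, comma-separated layout of iris.data (illustrative subset)
raw = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.4,3.2,4.5,1.5,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "5.8,2.7,5.1,1.9,Iris-virginica\n"
)
columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "Species"]

# header=None tells pandas the first line is data; names supplies the column labels
df = pd.read_csv(raw, header=None, names=columns)

# Per-species mean of every measurement
species_means = df.groupby("Species").mean()
print(species_means)

# Correlations over the numeric columns only
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
```

Grouping by Species before summarizing is often more informative than the overall describe() output, because the three species differ strongly in petal size.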

7. Use the diabetes data set from Pima Indians Diabetes data set for performing the following:

Apply Univariate analysis:

• Frequency
• Mean
• Median
• Mode
• Variance
• Standard Deviation
• Skewness and Kurtosis

import pandas as pd
import matplotlib.pyplot as plt # For visualization (optional)

# Load the dataset

try:
    df = pd.read_csv("diabetes.csv")  # Assuming CSV format
except FileNotFoundError:
    print("Error: Data file not found. Please ensure it's in the correct location.")
    exit()

# Univariate analysis for each numerical column

for col in df.select_dtypes(include=['number']):
    print("Univariate analysis for", col)
    print("-" * 50)

    # Frequency distribution
    print("Frequency distribution:\n", df[col].value_counts())

    # Descriptive statistics
    print("Descriptive statistics:\n", df[col].describe())

    # Skewness and kurtosis
    print("Skewness:", df[col].skew())
    print("Kurtosis:", df[col].kurt())

Univariate analysis for Pregnancies

--------------------------------------------------
Frequency distribution:
1 135
0 111
2 103
3 75
4 68
5 57
6 50
7 45
8 38
9 28
10 24
11 11
13 10
12 9
14 2
15 1
17 1
Name: Pregnancies, dtype: int64
Descriptive statistics:
count 768.000000
mean 3.845052
std 3.369578
min 0.000000
25% 1.000000
50% 3.000000
75% 6.000000
max 17.000000
Name: Pregnancies, dtype: float64
Skewness: 0.9016739791518588
Kurtosis: 0.15921977754746486
Univariate analysis for Glucose
--------------------------------------------------
Frequency distribution:
99 17
100 17
111 14
129 14
125 14
..
191 1
177 1
44 1
62 1
190 1
Name: Glucose, Length: 136, dtype: int64
Descriptive statistics:
count 768.000000
mean 120.894531
std 31.972618
min 0.000000
25% 99.000000
50% 117.000000
75% 140.250000
max 199.000000
Name: Glucose, dtype: float64
[output truncated in source]
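
The loop above reports the mean and standard deviation through describe(), but the median, mode and variance requested in the task are not printed under their own names. Here is a minimal sketch of the individual pandas calls on a small illustrative Series; the values are made up and are not taken from diabetes.csv.

```python
import pandas as pd

# Illustrative values only (not from the diabetes data set)
values = pd.Series([1, 2, 2, 3, 4, 4, 4, 10], name="example")

print("Frequency:\n", values.value_counts())
print("Mean:", values.mean())            # 3.75
print("Median:", values.median())        # 3.5
print("Mode:", values.mode().tolist())   # [4]
print("Variance:", values.var())         # sample variance (ddof=1)
print("Std dev:", values.std())          # square root of the sample variance
print("Skewness:", values.skew())        # positive here: long right tail
print("Kurtosis:", values.kurt())        # excess kurtosis (0 for a normal distribution)
```

mode() returns a Series because a column can have several equally frequent values, hence the tolist() call.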

8. Use the diabetes data set from Pima Indians Diabetes data set for performing the following:

Apply Bivariate analysis:

• Linear and logistic regression modeling

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Load the dataset

df = pd.read_csv("diabetes.csv")

# Use all feature columns as predictors and 'Outcome' as the target

X = df.drop('Outcome',axis=1)
y = df['Outcome']

# Split data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.20, random_state=42)

# 1. Linear Regression

# Create and fit the model

linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

# Make predictions
y_pred_linear = linear_model.predict(X_test)

# Print the results

print("Linear Regression Results:")
print("Intercept:", linear_model.intercept_)
print("Coefficient:", linear_model.coef_[0])  # coefficient of the first feature column only
print("R-squared:", linear_model.score(X_test, y_test))
print('\n')

# 2. Logistic Regression (for categorical target variable)

# Create and fit the model

logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)

# Make predictions
y_pred_logistic = logistic_model.predict(X_test)

# Print the results

print("Logistic Regression Results:")
print("Accuracy:", logistic_model.score(X_test, y_test)*100)

Linear Regression Results:

Intercept: -0.9487546338208503
Coefficient: 0.010468179217423847
R-squared: 0.25500281176741757

Logistic Regression Results:

Accuracy: 74.67532467532467
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py:458: ConvergenceWarning:
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

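
The models above use every feature column at once. Bivariate analysis in the strict sense relates one predictor to one response, so here is a minimal NumPy-only sketch of a straight-line fit on synthetic data. The data and the names x, y, slope, intercept and r are illustrative and are not the diabetes columns.

```python
import numpy as np

# Synthetic data: y depends linearly on x plus noise (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Degree-1 least-squares fit; polyfit returns the coefficients as [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# Pearson correlation between the two variables
r = np.corrcoef(x, y)[0, 1]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

For a single predictor, the square of r equals the R-squared of the fitted line, which links the correlation view and the regression view of the same pair of variables.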
9. Use the diabetes data set from Pima Indians Diabetes data set for performing the following:

Apply Bivariate analysis:

• Multiple Regression analysis

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the dataset

df = pd.read_csv("diabetes.csv")

# Use all feature columns as predictors and 'Outcome' as the target

X = df.drop('Outcome',axis=1)
y = df['Outcome']

# Split data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.10, random_state=42)

# Create and fit the multiple regression model

model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set

y_pred = model.predict(X_test)

# Evaluate model performance

r2 = r2_score(y_test, y_pred)
print("Multiple Regression Model Performance:")
print(f"R-squared: {r2:.3f}")

Multiple Regression Model Performance:

R-squared: 0.214
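
To make the reported metric concrete, here is a minimal sketch of the R-squared formula that r2_score evaluates, 1 - SS_res / SS_tot, computed by hand with NumPy. The y_true and y_pred values are made up for illustration and do not come from the diabetes model.

```python
import numpy as np

# Illustrative observed and predicted values (not from the diabetes model)
y_true = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.2, 0.7, 0.1, 0.6, 0.9])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
r2 = 1.0 - ss_res / ss_tot

print(f"R-squared: {r2:.3f}")
```

A value of 1 means the predictions match exactly, 0 means they do no better than always predicting the mean, and negative values are possible on test data.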

10. Apply and explore various plotting functions on UCI data set for performing the following:

a) Normal values

b) Density and contour plots

c) Three-dimensional plotting

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D # For 3D plotting

# Load the UCI dataset (replace with your dataset's path and filename)
df = pd.read_csv("glass.csv")

# Choose two numerical columns for plotting

column1 = "column1" # Replace with your column names
column2 = "column2"

# a) Normal values
plt.figure(figsize=(8, 4))
plt.hist(df[column1]) # Histogram for column1
plt.xlabel(column1)
plt.ylabel("Frequency")
plt.title("Distribution of " + column1)
plt.show()

# b) Density and contour plots

plt.figure(figsize=(8, 4))
plt.subplot(121)
plt.hexbin(df[column1], df[column2], gridsize=20) # Density plot
plt.xlabel(column1)
plt.ylabel(column2)
plt.title("Density Plot")

plt.subplot(122)
# Bin the two columns on a 2D grid and draw contours of the bin counts
counts, xedges, yedges = np.histogram2d(df[column1], df[column2], bins=20)
xcenters = (xedges[:-1] + xedges[1:]) / 2
ycenters = (yedges[:-1] + yedges[1:]) / 2
plt.contour(xcenters, ycenters, counts.T) # contour expects Z with shape (len(y), len(x))
plt.xlabel(column1)
plt.ylabel(column2)
plt.title("Contour Plot")
plt.show()

# c) Three-dimensional plotting
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df[column1], df[column2], df["column3"]) # Replace "column3" with your third column
ax.set_xlabel(column1)
ax.set_ylabel(column2)
ax.set_zlabel("column3")
plt.title("3D Scatter Plot")
plt.show()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, metho
3801 try:
-> 3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:

4 frames
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'column1'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)

/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, metho
3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
-> 3804 raise KeyError(key) from err
3805 except TypeError:
3806 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'column1'

<Figure size 800x400 with 0 Axes>
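
The KeyError above arises because "column1" is a placeholder rather than a column of the loaded file. Here is a minimal sketch of checking the requested names against df.columns before plotting. The frame and its column names "a", "b" and "c" are illustrative assumptions and not the contents of glass.csv.

```python
import pandas as pd

# Illustrative frame standing in for the loaded data set
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [7.0, 8.0, 9.0]})

requested = ["a", "b", "column3"]   # "column3" is deliberately missing

# Report unknown names up front instead of failing inside a plotting call
missing = [name for name in requested if name not in df.columns]
if missing:
    print("Missing columns:", missing)
    print("Available columns:", list(df.columns))
```

Printing list(df.columns) right after loading a file is usually the quickest way to find the real names to substitute for the placeholders.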

12. Apply and explore various plotting functions on Pima Indians Diabetes data set for performing the following (with output):

a) Normal values

b) Density and contour plots

c) Three-dimensional plotting

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Load the dataset

df = pd.read_csv("diabetes.csv")

# Choose numerical columns for plotting

column1 = "Glucose"
column2 = "Insulin"
column3 = "BMI"

# a) Normal values (histograms for visual inspection)

plt.figure(figsize=(12, 4))
plt.subplot(131)
plt.hist(df[column1])
plt.xlabel(column1)
plt.ylabel("Frequency")
plt.title("Distribution of " + column1)

plt.subplot(132)
plt.hist(df[column2])
plt.xlabel(column2)
plt.ylabel("Frequency")
plt.title("Distribution of " + column2)

plt.subplot(133)
plt.hist(df[column3])
plt.xlabel(column3)
plt.ylabel("Frequency")
plt.title("Distribution of " + column3)
plt.show()

# b) Density and contour plots

plt.figure(figsize=(8, 4))
plt.subplot(121)
plt.hexbin(df[column1], df[column2], gridsize=20) # Density plot
plt.xlabel(column1)
plt.ylabel(column2)
plt.title("Density Plot")

plt.subplot(122)
# Bin the two columns on a 2D grid and draw contours of the bin counts.
# Passing two reshaped columns to plt.contour treats the second one as contour levels,
# which raises "ValueError: Contour levels must be increasing".
counts, xedges, yedges = np.histogram2d(df[column1], df[column2], bins=20)
xcenters = (xedges[:-1] + xedges[1:]) / 2
ycenters = (yedges[:-1] + yedges[1:]) / 2
plt.contour(xcenters, ycenters, counts.T) # contour expects Z with shape (len(y), len(x))
plt.xlabel(column1)
plt.ylabel(column2)
plt.title("Contour Plot")
plt.show()

# c) Three-dimensional plotting
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df[column1], df[column2], df[column3])
ax.set_xlabel(column1)
ax.set_ylabel(column2)
ax.set_zlabel(column3)
plt.title("3D Scatter Plot")
plt.show()

13. Apply and explore various plotting functions on Pima Indians Diabetes data set for performing the following (with output):

a) Correlation and scatter plots

b) Histograms

c) Three-dimensional plotting

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Load the dataset

df = pd.read_csv("diabetes.csv")

# Choose columns for plotting

column1 = "Glucose"
column2 = "Insulin"
column3 = "BMI"

# a) Correlation and scatter plots

# Create both axes up front; plt.matshow would otherwise open a separate figure
fig, (ax_corr, ax_scatter) = plt.subplots(1, 2, figsize=(5, 5))

# Correlation matrix
im = ax_corr.matshow(df.corr())
fig.colorbar(im, ax=ax_corr)
ax_corr.set_title("Correlation Matrix")

# Scatter plot
ax_scatter.scatter(df[column1], df[column2])
ax_scatter.set_xlabel(column1)
ax_scatter.set_ylabel(column2)
ax_scatter.set_title("Scatter Plot")
plt.show()

# b) Histograms
plt.figure(figsize=(5, 5))
plt.subplot(121)
plt.hist(df[column1])
plt.xlabel(column1)
plt.ylabel("Frequency")
plt.title("Histogram of " + column1)

plt.subplot(122)
plt.hist(df[column2])
plt.xlabel(column2)
plt.ylabel("Frequency")
plt.title("Histogram of " + column2)
plt.show()

# c) Three-dimensional plotting
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df[column1], df[column2], df[column3])
ax.set_xlabel(column1)
ax.set_ylabel(column2)
ax.set_zlabel(column3)
plt.title("3D Scatter Plot")
plt.show()

14. Write a Pandas program to count the number of columns of a DataFrame.

Sample Output:

Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11

Number of columns: 3

import pandas as pd

# Create a sample DataFrame

data = {'col1': [1, 2, 3, 4, 7],
'col2': [4, 5, 6, 9, 5],
'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data)

# Print the original DataFrame

print("Original DataFrame")
print(df)

# Count the number of columns

num_columns = df.shape[1] # Access the number of columns using shape[1]
print("\nNumber of columns:")
print(num_columns)

Original DataFrame
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11

Number of columns:
3

15. Write a Pandas program to group by the first column and get second column as lists in rows

Sample data:

Original DataFrame

col1 col2

0 C1 1

1 C1 2

2 C2 3

3 C2 3

4 C2 4

5 C3 6

6 C2 5

Expected output:
Group on the col1:

col1

C1 [1, 2]

C2 [3, 3, 4, 5]

C3 [6]

Name: col2, dtype: object

import pandas as pd

# Create sample DataFrame

data = {'col1': ['C1', 'C1', 'C2', 'C2', 'C2', 'C3', 'C2'],
'col2': [1, 2, 3, 3, 4, 6, 5]}
df = pd.DataFrame(data)

# Group by col1 and get lists of col2 values

grouped_data = df.groupby('col1')['col2'].apply(list)

# Print the grouped data

print("Group on the col1:")
print(grouped_data)

Group on the col1:

col1
C1 [1, 2]
C2 [3, 3, 4, 5]
C3 [6]
Name: col2, dtype: object

16. Write a Pandas program to check whether a given column is present in a DataFrame or not.

Sample data:

Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11

Col4 is not present in DataFrame.
Col1 is present in DataFrame.

import pandas as pd

# Create sample DataFrame

data = {'col1': [1, 2, 3, 4, 7],
'col2': [4, 5, 6, 9, 5],
'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data)

# Function to check if a column exists

def column_exists(df, column_name):
    return column_name in df.columns

# Check for specific columns (column names are case-sensitive)

columns_to_check = ["col4", "col1"]
for column in columns_to_check:
    if column_exists(df, column):
        print(f"{column} is present in DataFrame.")
    else:
        print(f"{column} is not present in DataFrame.")

col4 is not present in DataFrame.
col1 is present in DataFrame.

17. Create two arrays of six elements. Write a NumPy program to count the number of instances of a value occurring in one array on the condition of another array.

Sample Output:

Original arrays:

[ 10 -10  10 -10 -10  10]

[0.85 0.45 0.9 0.8 0.12 0.6]

Number of instances of a value occurring in one array on the condition of another array: 3

import numpy as np

# Define the two arrays

arr1 = np.array([10, -10, 10, -10, -10, 10])
arr2 = np.array([0.85, 0.45, 0.9, 0.8, 0.12, 0.6])

# Value to count instances of

value = 10

# Condition for the other array

condition = arr2 >= 0.5

# Count the instances

count = np.sum((arr1 == value) & condition)

# Print the results

print("Original arrays:")
print(arr1)
print(arr2)
print("\nNumber of instances of a value occurring in one array on the condition of another array:", count)

Original arrays:
[ 10 -10  10 -10 -10  10]
[0.85 0.45 0.9  0.8  0.12 0.6 ]

Number of instances of a value occurring in one array on the condition of another array: 3

18. Create a 2-dimensional array of size 2 x 3, composed of 4-byte integer elements. Write a NumPy program to find the number of occurrences of a sequence in the said array.

Sample Output:

Original NumPy array:

[[1 2 3]

[2 1 2]]

Type: <class 'numpy.ndarray'>

Sequence: 2,3

Number of occurrences of the said sequence: 2

import numpy as np

# Create the 2D array with 4-byte integer elements

arr = np.array([[1, 2, 3], [2, 1, 2]], dtype='int32')

# Sequence to search for

sequence = [2, 3]

# Count occurrences of the sequence along rows and down-right diagonals

count = 0  # start from zero so that only actual matches are counted
n = len(sequence)
rows, cols = arr.shape
for i in range(rows):
    for j in range(cols - n + 1):
        # Horizontal run starting at (i, j)
        if np.array_equal(arr[i, j:j + n], sequence):
            count += 1

        # Down-right diagonal run starting at (i, j)
        if i + n <= rows:
            diagonal = arr[np.arange(i, i + n), np.arange(j, j + n)]
            if np.array_equal(diagonal, sequence):
                count += 1

# Print the results

print("Original NumPy array:")
print(arr)
print("\nType:", type(arr))
print("\nSequence:", sequence)
print("\nNumber of occurrences of the said sequence:", count)

Original NumPy array:
[[1 2 3]
 [2 1 2]]

Type: <class 'numpy.ndarray'>

Sequence: [2, 3]

Number of occurrences of the said sequence: 1
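
The row-wise part of this search can also be done without explicit loops. Here is a minimal sketch using numpy.lib.stride_tricks.sliding_window_view, assuming NumPy 1.20 or later; it counts horizontal runs only, and the names windows and matches are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.array([[1, 2, 3], [2, 1, 2]], dtype=np.int32)
sequence = np.array([2, 3])

# All length-2 windows along each row; result shape is (rows, cols - 1, 2)
windows = sliding_window_view(arr, window_shape=len(sequence), axis=1)

# A window matches when every element equals the corresponding sequence element
matches = np.all(windows == sequence, axis=-1)
count = int(matches.sum())

print(count)
```

For this array the only row-wise match is the window [2, 3] at the end of the first row.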
