Fdspracticals - Ipynb - Colaboratory
a. Write a NumPy program to create a null vector of size 10 and update the sixth value to 11
b. Write a NumPy program to convert an array to a float type
c. Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10
d. Write a NumPy program to convert a list of numeric values into a one-dimensional NumPy array
#a
import numpy as np
vector = np.zeros(10)  # null vector of size 10
vector[5] = 11  # the sixth value sits at index 5
print(vector)
[ 0. 0. 0. 0. 0. 11. 0. 0. 0. 0.]
#b
import numpy as np
Original array: [1 2 3 4 5]
Float array: [1. 2. 3. 4. 5.]
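One way to produce the output above is `ndarray.astype`. This is a minimal sketch, assuming the input is the integer array shown in the "Original array" line; the variable names are illustrative.

```python
import numpy as np

# Integer input, taken from the "Original array" output above
arr = np.array([1, 2, 3, 4, 5])
print("Original array:", arr)

# astype returns a converted copy; the original array keeps its integer dtype
float_arr = arr.astype(float)
print("Float array:", float_arr)
```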
#c
import numpy as np
[[ 2  3  4]
 [ 5  6  7]
 [ 8  9 10]]
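The matrix above can be built with `np.arange` followed by `reshape`. A short sketch; the variable name `matrix` is an assumption.

```python
import numpy as np

# arange excludes the stop value, so 11 is needed to include 10
matrix = np.arange(2, 11).reshape(3, 3)
print(matrix)
```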
#d
import numpy as np
[1. 2.5 3. 4. 5. ]
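A possible version of this cell is sketched here. The input list is inferred from the printed array and may differ from the list used in the original run.

```python
import numpy as np

# Input list inferred from the output above (an assumption)
values = [1, 2.5, 3, 4, 5]

# Mixed int/float input is promoted to a single float64 dtype
arr = np.array(values)
print(arr)
```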
d. Write a NumPy program to find the real and imaginary parts of an array of complex numbers
#a
import numpy as np
Original array: [1 2 3 4 5]
Float array: [1. 2. 3. 4. 5.]
#b
import numpy as np
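#d
import numpy as np
# Example array of complex numbers (illustrative values)
complex_arr = np.array([1 + 2j, 3 - 4j, 5 + 0j])
# Extract the real and imaginary parts using the real and imag attributes
real_part = complex_arr.real
imag_part = complex_arr.imag
print("Real part:", real_part)
print("Imaginary part:", imag_part)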
3. Write a Pandas program to get the powers of array values element-wise. Note: the elements of the first array are raised to
the powers given by the second array.
Expected Output:
    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83
import pandas as pd
    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96
3  80  83  72
4  86  86  83
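The recorded output shows only the data frame, not the powers. A sketch of the element-wise step using `DataFrame.pow` follows; the `exponents` series is hypothetical, since the source does not give a second array.

```python
import pandas as pd

# Data frame from the expected output above
df = pd.DataFrame({"X": [78, 85, 96, 80, 86],
                   "Y": [84, 94, 89, 83, 86],
                   "Z": [86, 97, 96, 72, 83]})

# Hypothetical exponents, one per row (not part of the original exercise)
exponents = pd.Series([2, 3, 1, 2, 0])

# axis=0 aligns the Series with the row index, so row i is raised to exponents[i]
powered = df.pow(exponents, axis=0)
print(powered)
```

For two plain arrays of equal shape, `np.power(a, b)` performs the same element-wise operation.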
4. Write a Pandas program to select the specified columns and rows from a given data frame.
labels:
Select 'name' and 'score' columns in rows 1, 3, 5, 6 from the following data frame.
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
   score qualify
b    9.0      no
d    NaN      no
f   20.0     yes
g   14.5     yes
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura'
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels=['a','b','c','d','e','f','g','h','i','j']
df = pd.DataFrame(exam_data, index=labels)  # use the letter labels as the row index
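The selection step itself can be sketched as follows. The task text names 'name' and 'score', while the expected output shows 'score' and 'qualify'; this sketch follows the expected output. The full `score` list is an assumption, since only the values for rows b, d, f and g appear in the source.

```python
import numpy as np
import pandas as pd

exam_data = {
    "name": ["Anastasia", "Dima", "Katherine", "James", "Emily",
             "Michael", "Matthew", "Laura", "Kevin", "Jonas"],
    # Assumed values: only the scores for rows b, d, f and g are shown in the source
    "score": [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
    "qualify": ["yes", "no", "yes", "no", "no", "yes", "yes", "no", "no", "yes"],
}
labels = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
df = pd.DataFrame(exam_data, index=labels)

# Rows 1, 3, 5, 6 by position, then the two columns by name
selected = df.iloc[[1, 3, 5, 6]][["score", "qualify"]]
print(selected)
```

Selecting columns by name rather than by position keeps the result stable if the column order of the frame changes.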
5. Write a Pandas program to count the number of rows and columns of a DataFrame.
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Number of Rows: 10
Number of Columns: 4
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura'
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
df = pd.DataFrame(exam_data)
num_rows = len(df.index)
num_columns = len(df.columns)
print("Number of Rows:", num_rows)
print("Number of Columns:", num_columns)
Number of Rows: 10
Number of Columns: 4
6. Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics
on the Iris data set
import pandas as pd
import numpy as np
# Descriptive analytics
print("Data overview:\n", df.head())
print("\nSummary statistics:\n", df.describe())
print("\nData information:")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()
Summary statistics:
               Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  150.000000     150.000000    150.000000     150.000000    150.000000
mean    75.500000       5.843333      3.054000       3.758667      1.198667
std     43.445368       0.828066      0.433594       1.764420      0.763161
min      1.000000       4.300000      2.000000       1.000000      0.100000
25%     38.250000       5.100000      2.800000       1.600000      0.300000
50%     75.500000       5.800000      3.000000       4.350000      1.300000
75%    112.750000       6.400000      3.300000       5.100000      1.800000
max    150.000000       7.900000      4.400000       6.900000      2.500000
Data information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Id             150 non-null    int64
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
Correlation matrix:
                      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
Id              1.000000       0.716676     -0.397729       0.882747      0.899759
SepalLengthCm   0.716676       1.000000     -0.109369       0.871754      0.817954
SepalWidthCm   -0.397729      -0.109369      1.000000      -0.420516     -0.356544
PetalLengthCm   0.882747       0.871754     -0.420516       1.000000      0.962757
PetalWidthCm    0.899759       0.817954     -0.356544       0.962757      1.000000
<ipython-input-13-5cf404ad1e81>:31: FutureWarning: The default value of numeric_only in DataFrame.c
print("\nCorrelation matrix:\n", df.corr())
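The loading step is not visible in the cell above. The sketch below shows one way to read the data and run the same descriptive commands. It uses an in-memory string with a few sample rows as a stand-in for the real file; the row values and variable names are assumptions for illustration.

```python
import io
import pandas as pd

# Stand-in for a file path or URL: read_csv also accepts file-like objects
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
51,7.0,3.2,4.7,1.4,Iris-versicolor
101,6.3,3.3,6.0,2.5,Iris-virginica
"""
df = pd.read_csv(io.StringIO(csv_text))

print("Data overview:\n", df.head())
print("\nSummary statistics:\n", df.describe())
df.info()  # prints directly; returns None

# numeric_only=True skips the text column and avoids the FutureWarning
corr = df.corr(numeric_only=True)
print("\nCorrelation matrix:\n", corr)
```

With a real file, `pd.read_csv` takes the path or URL directly, and `pd.read_excel` plays the same role for Excel workbooks (it needs an Excel engine such as openpyxl installed).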
7. Use the Pima Indians Diabetes data set to compute the following:
• Frequency
• Mean
• Median
• Mode
• Variance
• Standard Deviation
import pandas as pd
import matplotlib.pyplot as plt # For visualization (optional)
# Frequency distribution
print("Frequency distribution:\n", df[col].value_counts())
# Descriptive statistics
print("Descriptive statistics:\n", df[col].describe())
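`describe()` covers the mean and standard deviation but not the median by name, the mode, or the variance. A sketch that computes each listed statistic for a single column follows; the small `Glucose` column is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for one column of the diabetes data
df = pd.DataFrame({"Glucose": [148, 85, 183, 89, 137, 85, 116]})
col = "Glucose"

freq = df[col].value_counts()   # frequency of each distinct value
mean = df[col].mean()
median = df[col].median()
mode = df[col].mode()           # a Series, because ties are possible
variance = df[col].var()        # sample variance (ddof=1)
std_dev = df[col].std()         # sample standard deviation (ddof=1)

print("Frequency:\n", freq)
print("Mean:", mean, "Median:", median, "Mode:", mode.tolist())
print("Variance:", variance, "Standard deviation:", std_dev)
```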
8. Use the diabetes data set from Pima Indians Diabetes data set for performing the following:
# Make predictions
y_pred_linear = linear_model.predict(X_test)
# Make predictions
y_pred_logistic = logistic_model.predict(X_test)
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
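The warning above suggests two remedies: scaling the features and raising `max_iter`. A minimal sketch applying both is given here. It runs on synthetic data from `make_classification` with eight features, chosen to match the number of Pima predictors; the pipeline layout and parameter values are assumptions, not the original cell.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 8 features (the real cell would load the Pima data)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling is fitted on the training split only, then applied to the test split
logistic_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logistic_model.fit(X_train, y_train)

# Make predictions
y_pred_logistic = logistic_model.predict(X_test)
accuracy = logistic_model.score(X_test, y_test)
print("Accuracy:", accuracy)
```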
9. Use the diabetes data set from Pima Indians Diabetes data set for performing the following:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
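Only the imports of this cell survive. The sketch below shows how those four imports typically fit together for a regression with several predictors. The data is synthetic, and the coefficients used to generate it are arbitrary illustration values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: three predictors and a noisy linear target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# R^2 on held-out data measures how much variance the model explains
r2 = r2_score(y_test, model.predict(X_test))
print("Coefficients:", model.coef_)
print("R^2:", r2)
```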
10. Apply and explore various plotting functions on UCI data set for performing the following:
a) Normal values
# Load the UCI dataset (replace with your dataset's path and filename)
df = pd.read_csv("glass.csv")
# a) Normal values
plt.figure(figsize=(8, 4))
plt.hist(df[column1]) # Histogram for column1
plt.xlabel(column1)
plt.ylabel("Frequency")
plt.title("Distribution of " + column1)
plt.show()
plt.subplot(122)
plt.contour(df[column1].values.reshape(10, 10), df[column2].values.reshape(10, 10)) # With two positional arguments, contour reads them as (Z, levels)
plt.xlabel(column1)
plt.ylabel(column2)
plt.title("Contour Plot")
plt.show()
# c) Three-dimensional plotting
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df[column1], df[column2], df["column3"]) # Replace "column3" with your third column
ax.set_xlabel(column1)
ax.set_ylabel(column2)
ax.set_zlabel("column3")
plt.title("3D Scatter Plot")
plt.show()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, metho
3801 try:
-> 3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
4 frames
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'column1'
The above exception was the direct cause of the following exception:
KeyError: 'column1'
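The `KeyError: 'column1'` means the data frame has no column with that literal name. One way to avoid placeholder names is to take them from the frame itself, as sketched here. The small frame is a hypothetical stand-in for the loaded file, and its column names and values are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset
df = pd.DataFrame({
    "RI": [1.52, 1.51, 1.53],
    "Na": [13.6, 13.9, 13.5],
    "Mg": [4.5, 3.6, 3.5],
    "Type": ["a", "b", "a"],
})

# Keep only numeric columns, since histograms and scatter plots need numbers
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Use the first three numeric columns for the plots
column1, column2, column3 = numeric_cols[:3]
print(column1, column2, column3)
```

Printing `df.columns` right after loading is a quick check that the names used in later plotting calls actually exist.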
12. Apply and explore various plotting functions on the Pima Indians Diabetes data set for performing the following (with
output):
a) Normal values
plt.subplot(132)
plt.hist(df[column2])
plt.xlabel(column2)
plt.ylabel("Frequency")
plt.title("Distribution of " + column2)
plt.subplot(133)
plt.hist(df[column3])
plt.xlabel(column3)
plt.ylabel("Frequency")
plt.title("Distribution of " + column3)
plt.show()
plt.subplot(122)
plt.contour(df[column1].values.reshape(48, 16), df[column2].values.reshape(48, 16)) # With two positional arguments, contour reads them as (Z, levels)
plt.xlabel(column1)
plt.ylabel(column2)
plt.title("Contour Plot")
plt.show()
# c) Three-dimensional plotting
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df[column1], df[column2], df[column3])
ax.set_xlabel(column1)
ax.set_ylabel(column2)
ax.set_zlabel(column3)
plt.title("3D Scatter Plot")
plt.show()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-31-8726aef8b632> in <cell line: 43>()
41
42 plt.subplot(122)
---> 43 plt.contour(df[column1].values.reshape(48, 16), df[column2].values.reshape(48, 16)) #
44 plt.xlabel(column1)
45 plt.ylabel(column2)
6 frames
/usr/local/lib/python3.10/dist-packages/matplotlib/contour.py in _process_contour_level_args(se
1141 raise ValueError("Filled contours require at least 2 levels.")
1142 if len(self.levels) > 1 and np.min(np.diff(self.levels)) <= 0.0:
-> 1143 raise ValueError("Contour levels must be increasing")
1144
1145 def _process_levels(self):
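The ValueError arises because `contour` called with two positional arguments treats the first as the height grid Z and the second as the contour levels, and raw column values are not an increasing sequence. A different approach, sketched here, bins two columns with `np.histogram2d` and contours the resulting counts. The random data stands in for two real columns, and all names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for two numeric columns with 768 rows each
rng = np.random.default_rng(0)
x = rng.normal(100, 20, size=768)
y = rng.normal(30, 5, size=768)

# Count how many points fall into each cell of a 10 x 10 grid
counts, x_edges, y_edges = np.histogram2d(x, y, bins=10)
x_centers = (x_edges[:-1] + x_edges[1:]) / 2
y_centers = (y_edges[:-1] + y_edges[1:]) / 2

fig, ax = plt.subplots(figsize=(6, 4))
# histogram2d indexes counts as [x_bin, y_bin]; contour expects Z[y, x], hence .T
contours = ax.contour(x_centers, y_centers, counts.T)
ax.set_xlabel("column1")
ax.set_ylabel("column2")
ax.set_title("Contour Plot of Point Density")
```

In a notebook, the figure appears at the end of the cell or after `plt.show()`.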
13. Apply and explore various plotting functions on the Pima Indians Diabetes data set for performing the following (with
output):
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Correlation matrix
plt.subplot(121)
plt.imshow(df.corr())  # imshow draws into the current subplot; plt.matshow() would open a separate figure
plt.colorbar()
plt.title("Correlation Matrix")
# Scatter plot
plt.subplot(122)
plt.scatter(df[column1], df[column2])
plt.xlabel(column1)
plt.ylabel(column2)
plt.title("Scatter Plot")
plt.show()
# b) Histograms
plt.figure(figsize=(5, 5))
plt.subplot(121)
plt.hist(df[column1])
plt.xlabel(column1)
plt.ylabel("Frequency")
plt.title("Histogram of " + column1)
plt.subplot(122)
plt.hist(df[column2])
plt.xlabel(column2)
plt.ylabel("Frequency")
plt.title("Histogram of " + column2)
plt.show()
# c) Three-dimensional plotting
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df[column1], df[column2], df[column3])
ax.set_xlabel(column1)
ax.set_ylabel(column2)
ax.set_zlabel(column3)
plt.title("3D Scatter Plot")
plt.show()
<ipython-input-35-eb4625b4dce4>:23: MatplotlibDeprecationWarning: Auto-removal of overlapping a
plt.subplot(122)
14. Write a Pandas program to count the number of columns of a DataFrame. Sample Output:
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11
Number of columns: 3
import pandas as pd
Original DataFrame
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11
Number of columns:
3
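A sketch of code that yields the output above, with the data taken from the printed frame:

```python
import pandas as pd

# Data taken from the "Original DataFrame" output above
df = pd.DataFrame({"col1": [1, 2, 3, 4, 7],
                   "col2": [4, 5, 6, 9, 5],
                   "col3": [7, 8, 12, 1, 11]})
print("Original DataFrame")
print(df)

# df.columns is an Index, so len() gives the number of columns
num_columns = len(df.columns)
print("Number of columns:")
print(num_columns)
```

`df.shape[1]` returns the same number.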
15. Write a Pandas program to group by the first column and get second column as lists in rows
Sample data:
Original DataFrame
col1 col2
0 C1 1
1 C1 2
2 C2 3
3 C2 3
4 C2 4
5 C3 6
6 C2 5
Expected output:
Group on the col1:
col1
C1 [1, 2]
C2 [3, 3, 4, 5]
C3 [6]
import pandas as pd
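The grouping can be done with `groupby` followed by `apply(list)`. A sketch using the sample data from the task; the variable names are assumptions.

```python
import pandas as pd

# Sample data from the task statement
df = pd.DataFrame({"col1": ["C1", "C1", "C2", "C2", "C2", "C3", "C2"],
                   "col2": [1, 2, 3, 3, 4, 6, 5]})
print("Original DataFrame")
print(df)

# Collect the col2 values of each col1 group into a Python list
grouped = df.groupby("col1")["col2"].apply(list)
print("\nGroup on the col1:")
print(grouped)
```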
16. Write a Pandas program to check whether a given column is present in a DataFrame or not. Sample data:
Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11
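No code for this task survives in the source. A minimal sketch, assuming the columns are named col1, col2 and col3 as in the earlier practical with the same data:

```python
import pandas as pd

# Column names col1..col3 are assumed from the earlier practical with the same values
df = pd.DataFrame({"col1": [1, 2, 3, 4, 7],
                   "col2": [4, 5, 6, 9, 5],
                   "col3": [7, 8, 12, 1, 11]})

# The "in" operator on df.columns tests for membership by column name
has_col1 = "col1" in df.columns
has_col4 = "col4" in df.columns

print("col1 is present in DataFrame." if has_col1 else "col1 is not present in DataFrame.")
print("col4 is present in DataFrame." if has_col4 else "col4 is not present in DataFrame.")
```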
17. Create two arrays of six elements. Write a NumPy program to count the number of instances of a value occurring in
one array on the condition of another array.
Sample Output:
Original arrays:
Number of instances of a value occurring in one array on the condition of another array: 3
import numpy as np
Number of instances of a value occurring in one array on the condition of another array: 4
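The arrays used in the recorded run are not shown, and its result of 4 differs from the sample output of 3. The sketch below uses hypothetical arrays chosen to give the sample output; the value 10 and the threshold 0.5 are also assumptions.

```python
import numpy as np

# Hypothetical six-element arrays (the original arrays are not in the source)
x = np.array([10, -10, 10, -10, -10, 10])
y = np.array([0.85, 0.45, 0.9, 0.8, 0.12, 0.6])
print("Original arrays:")
print(x)
print(y)

# Combine two boolean masks: x equals 10 AND the matching y exceeds 0.5
mask = (x == 10) & (y > 0.5)

# True counts as 1, so summing the mask counts the matches
count = int(np.sum(mask))
print("Number of instances of a value occurring in one array on the condition of another array:", count)
```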
18. Create a 2-dimensional array of size 2 x 3, composed of 4-byte integer elements. Write a NumPy program to
find the number of occurrences of a sequence in the said array.
Sample Output:
[[1 2 3]
 [2 1 2]]
Sequence: 2,3
import numpy as np
Sequence: [2, 3]
Number of occurrences of the said sequence: 2