Experiment-2-1-Ml Kritika
Aim : Study of Different Python Libraries
1. Pandas Library:
Load a dataset (Iris dataset: https://www.kaggle.com/datasets/uciml/iris) using pandas.
Display the first few rows to understand its structure.
Calculate basic statistics (mean, median, standard deviation, etc.) for a numerical column in
the dataset.
Perform data filtering to extract rows based on specific conditions (e.g., SepalLengthCm>5.0).
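The filtering and median tasks listed above can be sketched as follows. This is a minimal illustration, assuming a small hand-made frame (`sample_df`, with hypothetical values) in place of the Iris CSV; only the column names `SepalLengthCm` and `Species` come from the dataset.

```python
import pandas as pd

# Hypothetical six-row sample standing in for Iris.csv (values are illustrative only)
sample_df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.0],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-setosa"],
})

# Boolean mask: True for rows whose sepal length is strictly greater than 5.0
mask = sample_df["SepalLengthCm"] > 5.0

# Indexing with the mask keeps only the rows where it is True
filtered_df = sample_df[mask]
print(filtered_df)

# describe() reports the median as the 50% row; median() returns it directly
median_sepal_length = sample_df["SepalLengthCm"].median()
print(median_sepal_length)
```

With the notebook's own frame, the same pattern would presumably read `iris_df[iris_df["SepalLengthCm"] > 5.0]`.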
In [1]:
import numpy as np
import pandas as pd
In [3]:
iris_df = pd.read_csv('C:/Users/kriti/Downloads/Iris.csv')
print(iris_df.head())
In [4]:
new_col_name = ["ID", "SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "Species"]
iris_df.columns = new_col_name
iris_df.head()
In [5]:
x = iris_df[iris_df.columns[1:-1]]
x.head()
In [6]:
y = iris_df[iris_df.columns[-1]]
y.head()
Out[6]: 0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
Name: Species, dtype: object
In [7]:
sepal_length_stats = iris_df["SepalLengthCm"].describe()
print(sepal_length_stats)
count 150.000000
mean 5.843333
std 0.828066
min 4.300000
25% 5.100000
50% 5.800000
75% 6.400000
max 7.900000
Name: SepalLengthCm, dtype: float64
In [8]:
petal_width_stats = iris_df["PetalWidthCm"].describe()
print(petal_width_stats)
count 150.000000
mean 1.198667
std 0.763161
min 0.100000
25% 0.300000
50% 1.300000
75% 1.800000
max 2.500000
Name: PetalWidthCm, dtype: float64
In [9]:
iris_df.head(10)
In [10]:
iris_df.tail(10)
In [11]:
iris_df[15:50]
In [12]:
iris_df.groupby("Species").head(5)
In [14]:
iris_df.shape
Out[14]: (150, 6)
2. Matplotlib Library:
Create a line plot to visualize the trend of a numerical variable over time.
Generate a histogram to understand the distribution of a numerical variable in the dataset.
Create a bar chart to compare the performance of different categories.
Plot a scatter plot to explore the relationship between two numerical variables.
Customize your plots with labels, titles, colors, and styles.
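The line plot, bar chart, and customization tasks above can be sketched with Matplotlib as follows. The data (`months`, `readings`, `categories`, `scores`) is hypothetical and chosen only for illustration; it is not taken from the Iris dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical data for illustration only
months = [1, 2, 3, 4, 5, 6]
readings = [4.8, 5.0, 5.4, 5.9, 6.1, 6.6]
categories = ["A", "B", "C"]
scores = [72, 85, 64]

# One figure with two side-by-side axes
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot: trend of a numerical variable over time, with custom color and style
ax_line.plot(months, readings, color="tab:blue", linestyle="--",
             marker="o", label="reading")
ax_line.set_xlabel("Month")
ax_line.set_ylabel("Reading")
ax_line.set_title("Trend over time")
ax_line.legend()

# Bar chart: one bar per category, each with its own color
bars = ax_bar.bar(categories, scores,
                  color=["tab:green", "tab:orange", "tab:red"])
ax_bar.set_xlabel("Category")
ax_bar.set_ylabel("Score")
ax_bar.set_title("Score by category")

# Avoid overlapping labels between the two axes
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, a final `plt.show()` would be needed to display it.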
In [15]:
import matplotlib.pyplot as plt
In [17]:
import plotly.express as px
species_count = iris_df["Species"].value_counts()
figure = px.pie(iris_df, values=species_count, names=species_count.index)
figure.show()
[Figure: pie chart of species counts; Iris-setosa, Iris-versicolor, and Iris-virginica at 33.3% each]
In [18]:
figure = px.histogram(iris_df , x = "SepalLengthCm")
figure.show()
[Figure: histogram of SepalLengthCm (x-axis, 4 to 8) against count (y-axis)]
In [19]:
plt.scatter(iris_df["SepalLengthCm"], iris_df["PetalLengthCm"])
plt.xlabel("Sepal Length")
plt.ylabel("Petal Length")
plt.show()
3. Seaborn Library:
Create a box plot to visualize the distribution of a numerical variable across different
categories.
Generate a heatmap to explore the correlation between numerical variables.
Customize the appearance of seaborn plots using various parameters.
In [21]:
import seaborn as sns
sns.boxplot(x="Species",y="SepalLengthCm",data = iris_df)
plt.xlabel("Species")
plt.ylabel("Sepal Length (cm)")
plt.title("Distribution of Sepal Length across species")
plt.show()
In [22]:
# Heatmap to explore the correlation between numerical variables
Out[22]: <AxesSubplot:>
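Only the comment and the output of the heatmap cell are shown. A minimal sketch of how a correlation heatmap could be drawn with seaborn, assuming synthetic data and hypothetical column names (`feature_a`, `feature_b`, `feature_c`) rather than the notebook's original code:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data for illustration: feature_b is built to correlate with feature_a
rng = np.random.default_rng(0)
base = rng.normal(size=50)
demo_df = pd.DataFrame({
    "feature_a": base,
    "feature_b": base * 2 + rng.normal(scale=0.1, size=50),
    "feature_c": rng.normal(size=50),
})

# Pairwise Pearson correlation between the numerical columns
corr = demo_df.corr()

# annot=True writes each coefficient into its cell; vmin/vmax pin the color scale to [-1, 1]
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlation between numerical variables")
```

For the Iris frame, the correlation matrix would presumably be computed on the numerical columns only, for example after dropping the ID and Species columns.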
4. NumPy Library:
Create a NumPy array and perform basic operations like addition, subtraction, and
multiplication.
Use NumPy functions to calculate statistical measures like mean, median, and standard
deviation.
Reshape and slice NumPy arrays to extract specific data elements.
Perform element-wise operations and broadcasting with NumPy arrays.
Apply mathematical functions (e.g., exponential, logarithm) to NumPy arrays.
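Two of the tasks above, the median and broadcasting between arrays of different shapes, can be sketched as follows. The `values` array reuses the numbers from the notebook's In [23] cell; the offset arrays are hypothetical and exist only to show how shapes are stretched.

```python
import numpy as np

# Same numbers as the x array in the notebook's In [23] cell
values = np.array([25, 7, 8, 9, 10, 12])

# Median of an even-length array: mean of the two middle values after sorting
median_value = np.median(values)
print("Median : ", median_value)

# 2 x 5 grid holding the integers 1..10
grid = np.arange(1, 11).reshape(2, 5)

# Hypothetical offsets, used only to illustrate broadcasting
row_offsets = np.array([10, 20, 30, 40, 50])   # shape (5,)
col_offsets = np.array([[100], [200]])         # shape (2, 1)

# Shape (5,) is stretched across both rows of the (2, 5) grid
row_result = grid + row_offsets
print(row_result)

# Shape (2, 1) is stretched across all five columns of the grid
col_result = grid + col_offsets
print(col_result)
```

Broadcasting works here because each pair of trailing dimensions is either equal or contains a 1; shapes such as (2, 5) and (3,) would raise a `ValueError`.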
In [23]:
x = np.array([25 , 7 ,8 , 9 , 10 , 12])
y = np.array([10 , 20 , 58 , 100 , 204 , 7])
z = x + y
w = x - y
j = x * y
print("Addition : ", z)
print("Subtraction : ", w)
print("Multiplication : ", j)
In [24]:
#statistics in numpy
print("Mean : ",np.mean(x))
print("Std Deviation : ",np.std(x))
print("Variance : ",np.var(x))
Mean : 11.833333333333334
Std Deviation : 6.094168432927407
Variance : 37.138888888888886
In [25]:
x = np.arange(1,11)
x1 = np.reshape(x , (2,5))
x1
In [26]:
# numpy slicing
x1[0:1 , 2:5]
In [27]:
# scalar broadcasting
x2 = x1 + 5
print(x2)
[[ 6 7 8 9 10]
[11 12 13 14 15]]
In [28]:
# logarithmic function
y = np.log(x)
plt.subplot(1,2,1)
plt.plot(x,y)
plt.title("Logarithmic Function")
# exponential function
plt.subplot(1,2,2)
f = np.exp(x)
plt.plot(x,f)
plt.title("Exponential Function")
5. SciPy Library:
Use SciPy to perform numerical integration for a given mathematical function.
In [29]:
from scipy.integrate import quad
# y = np.sin(x)
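def integrand(m):
    return np.sin(m)
# Integrate sin(m) from 0 to pi; the limits are inferred from the printed result of 2.0
fun_intr, error = quad(integrand, 0, np.pi)
print(fun_intr)
print(error)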
# plt.plot(x , fun_intr)
2.0
2.220446049250313e-14
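One way to check a numerical integral is to compare it with a case whose exact value is known. The sketch below is an illustration added under that assumption, not part of the original notebook; `square` and the limits 0 to 3 are hypothetical choices, picked because the exact integral of m² over that range is 9.

```python
from scipy.integrate import quad

def square(m):
    """Hypothetical integrand with a known antiderivative, m**3 / 3."""
    return m ** 2

# quad returns the integral estimate and an estimate of the absolute error
area, abs_err = quad(square, 0, 3)

# Exact value from the antiderivative: 3**3 / 3 - 0 = 9
exact = 3 ** 3 / 3

print(area)
print(abs(area - exact))
```

The second value returned by `quad` is an error estimate, not the true error, so comparing against a closed-form result is a useful extra check when one exists.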
Name-Kritika Das
Regd no-2101020068
Rollno-CSE21068