PML Ex3
DATE:
Aim:
Description:
MATPLOTLIB:
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in
Python. It makes easy things easy and hard things possible: it can produce publication-quality plots
as well as interactive figures that can zoom, pan, and update.
Pyplot:
Most of the Matplotlib utilities lie under the pyplot submodule, which is usually imported under
the plt alias:
import matplotlib.pyplot as plt
plot():
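As a minimal sketch, plot() draws the given points and, by default, joins them with straight line segments. The x and y values below are made up for illustration and do not come from any dataset in this exercise.

```python
import matplotlib.pyplot as plt

# Illustrative values only (assumed, not taken from the exercise data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# plot() returns a list of Line2D objects, one per line drawn
lines = plt.plot(x, y, marker="o")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```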
scatter():
The scatter() function plots one dot for each observation. It needs two arrays of the same length: one for
the values on the x-axis and one for the values on the y-axis.
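A short sketch of the call described above, assuming two small hand-written lists of equal length (the numbers are illustrative only):

```python
import matplotlib.pyplot as plt

# Illustrative values only: two sequences of the same length
x = [5, 7, 8, 7, 2, 17]
y = [99, 86, 87, 88, 111, 86]

# One dot is drawn for each (x, y) pair
points = plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```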
bar():
plt.bar(x, y)
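As a rough illustration, bar() draws one rectangular bar per x value, with the bar height taken from y. The category names and heights below are assumptions chosen only to make the sketch runnable.

```python
import matplotlib.pyplot as plt

# Illustrative categories and heights (assumed values)
categories = ["A", "B", "C", "D"]
heights = [3, 8, 1, 10]

# bar() returns a container holding one rectangle per category
bars = plt.bar(categories, heights)
plt.xlabel("category")
plt.ylabel("value")
plt.show()
```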
hist():
The hist() function uses an array of numbers to create a histogram. The array is passed to the function
as an argument.
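The following sketch shows one way to call hist(). The array is randomly generated here (an assumption for illustration); the mean, spread, and bin count are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Randomly generated sample (illustrative only), seeded for repeatability
rng = np.random.default_rng(0)
values = rng.normal(loc=170, scale=10, size=250)

# hist() returns the bin counts, the bin edges, and the drawn patches
counts, bin_edges, _ = plt.hist(values, bins=10)
plt.xlabel("value")
plt.ylabel("frequency")
plt.show()
```

With 10 bins there are 11 bin edges, and the counts add up to the number of values passed in.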
Linear Regression:
Linear regression uses the relationship between the data points to fit a straight line that passes as close to them as possible.
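A minimal sketch of a straight-line fit, assuming a handful of made-up points and using np.polyfit with degree 1 (a least-squares fit). This is one of several ways to obtain the line; the implementation section uses scikit-learn instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative points that lie roughly along a straight line (assumed values)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Degree-1 least-squares fit returns the slope first, then the intercept
slope, intercept = np.polyfit(x, y, deg=1)

plt.scatter(x, y)                               # observed points
plt.plot(x, slope * x + intercept, color="k")   # fitted line
plt.show()
```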
Polynomial Regression:
If your data points clearly will not fit a linear regression (a straight line through the data points),
polynomial regression might be a better choice.
Polynomial regression, like linear regression, uses the relationship between the variables x and y to find
the best way to draw a curve through the data points.
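As a sketch of the idea, the example below fits a degree-2 polynomial with np.polyfit. The points are constructed to lie exactly on y = x^2 - 3x + 2 (an assumption made so the result is easy to check); real data would only be approximated by the curve.

```python
import numpy as np
import matplotlib.pyplot as plt

# Points constructed from y = x^2 - 3x + 2 (illustrative, not real data)
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = x**2 - 3 * x + 2

# Degree-2 least-squares fit; coefficients come back highest power first
coeffs = np.polyfit(x, y, deg=2)
model = np.poly1d(coeffs)

# Evaluate the fitted curve on a finer grid so it looks smooth
x_line = np.linspace(x.min(), x.max(), 100)
plt.scatter(x, y)
plt.plot(x_line, model(x_line), color="k")
plt.show()
```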
IMPLEMENTATION:
1. Plot Age against Weight using matplotlib. Assume Age and Weight are 1D
arrays of 10 members each. Plot them on the X and Y axes using the plot() function.
2. Plot a graph of the cars sold by Maruti in each year from 2015 to 2022. Fix the size
of the graph and use a specific line color for visualization.
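One possible sketch for tasks 1 and 2 follows. All numbers are placeholders invented for illustration: the age and weight arrays are arbitrary, and the sales figures are NOT real Maruti sales data; substitute actual values before drawing any conclusions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Task 1: two 1D arrays of 10 members each (placeholder values)
age = np.array([5, 8, 12, 15, 18, 22, 25, 30, 35, 40])
weight = np.array([18, 25, 38, 50, 58, 63, 66, 70, 72, 74])

plt.plot(age, weight)
plt.xlabel("Age")
plt.ylabel("Weight")
plt.title("Age vs Weight")
plt.show()

# Task 2: yearly sales for 2015-2022 (placeholder values in arbitrary units,
# not actual sales figures)
years = np.arange(2015, 2023)
sales = np.array([100, 110, 125, 130, 120, 105, 135, 150])

fig = plt.figure(figsize=(10, 5))                   # fix the size of the graph
plt.plot(years, sales, color="red", marker="o")     # specific line color
plt.xlabel("Year")
plt.ylabel("Cars sold (placeholder units)")
plt.title("Car sales per year")
plt.show()
```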
import numpy as np
from google.colab import files
sp=files.upload()
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("student-mat.csv")
plt.scatter(data['age'], data['traveltime'])
plt.title("Scatter Plot")
plt.xlabel('age')
plt.ylabel('traveltime')
plt.show()
4. Read a real-world dataset in CSV form [Iris, Toy, Car, etc.] and analyze its features.
(i) Finding the median and outliers using a box plot – single feature [continuous].
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
arr = np.random.randint(1, 20, size=30)
arr1 = np.append(arr, [27, 30])
print('Thus the array becomes {}'.format(arr1))
q1 = np.quantile(arr1, 0.25)
q3 = np.quantile(arr1, 0.75)
med = np.median(arr1)
iqr = q3-q1
upper_bound = q3+(1.5*iqr)
lower_bound = q1-(1.5*iqr)
print(iqr, upper_bound, lower_bound)
fig = plt.figure(figsize=(10, 7))  # create the figure before drawing on it
plt.boxplot(arr1)
plt.show()
# Points strictly beyond the 1.5*IQR bounds are treated as outliers
outliers = arr1[(arr1 < lower_bound) | (arr1 > upper_bound)]
print('The following are the outliers in the boxplot: {}'.format(outliers))
arr2 = arr1[(arr1 >= lower_bound) & (arr1 <= upper_bound)]
plt.figure(figsize=(12, 7))
plt.boxplot(arr2)
plt.show()
import numpy as np
from google.colab import files
sp=files.upload()
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
print(sns.get_dataset_names())
[]
tips_df=sns.load_dataset('tips')
print(tips_df)
sns.lineplot(x="sex", y="total_bill", data=tips_df)
plt.title('Title using Matplotlib Function')
plt.show()
BOX PLOT:
sns.boxplot(x='day', y='total_bill', data=tips_df, hue='sex', palette='afmhot')
plt.legend(loc=0)
plt.show()
(ii) Finding distribution using bar plot and histogram – Two features [categorical or
grouped].
BARPLOT:
sns.barplot(x='day',y='tip', data=tips_df,
hue='sex')
plt.show()
HISTOGRAM:
sns.histplot(x='total_bill', data=tips_df,kde=True, hue='sex')
plt.show()
(iii) Finding distribution across feature using scatter plot and Bubble chart – 3 or
more features [continuous/ categorical]
SCATTERPLOT:
sns.scatterplot(x='day', y='tip', data=tips_df)
plt.show()
sns.scatterplot(x='day', y='tip', data=tips_df,
hue='sex')
plt.show()
BUBBLE CHART:
import plotly.graph_objects as go
fig = go.Figure(data=[go.Scatter(
x=[1, 2, 3, 4], y=[10, 11, 12, 13],
mode='markers',
marker=dict(
color=['rgb(93, 164, 214)', 'rgb(255, 144, 14)',
'rgb(44, 160, 101)', 'rgb(255, 65, 54)'],
opacity=[1, 0.8, 0.6, 0.4],
size=[40, 60, 80, 100],
)
)])
fig.show()
5. Plot any two features from the dataset in scatter plot and find linear regression
between the features and plot the linear fit model
import numpy as np
from google.colab import files
sp=files.upload()
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
score_df = pd.read_csv('student_scores.csv')
score_df.head()
score_df.describe()
X = score_df.iloc[:, :-1].values
y = score_df.iloc[:, 1].values
print(y)
[30 90 80 45 67]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
plt.scatter(X_train, y_train,color='g')
plt.plot(X_test, y_pred,color='k')
plt.show()
6. Plot any two features from the dataset in scatter plot and find polynomial
regression between the features and plot the polynomial model
import numpy as np
from google.colab import files
sp=files.upload()
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
def viz_linear():
    plt.scatter(X, y, color='red')
    plt.plot(X, lin_reg.predict(X), color='blue')
    plt.title('Truth or Bluff (Linear Regression)')
    plt.xlabel('Position level')
    plt.ylabel('Salary')
    plt.show()
viz_linear()
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree=4)
X_poly = poly_reg.fit_transform(X)
pol_reg = LinearRegression()
pol_reg.fit(X_poly, y)
def viz_polynomial():
    plt.scatter(X, y, color='red')
    # transform() reuses the feature mapping already fitted above
    plt.plot(X, pol_reg.predict(poly_reg.transform(X)), color='blue')
    plt.title('Truth or Bluff (Polynomial Regression)')
    plt.xlabel('Position level')
    plt.ylabel('Salary')
    plt.show()
viz_polynomial()
lin_reg.predict([[5.5]])
pol_reg.predict(poly_reg.transform([[5.5]]))
array([132148.43750002])
RESULT:
Thus, data visualization using Matplotlib in Python has been understood and executed successfully.