DV Lab FAT

The document is a lab report submitted by S.S. Bhaskar Reddy to Professor Ramani detailing data visualization exercises completed in Python using the airquality dataset (built into R and loaded here from a CSV file). The report answers two questions with code snippets and explanations. Question 1 has five parts that use scatter plots, a histogram, a bar plot and boxplots to analyze relationships between variables in the airquality dataset. Question 2 uses a student performance dataset to show linear regression between score variables and displays the data as a color-styled table.

Lab FAT

Data Visualization - CSE3020


Slot: L27+L28

Name: S.S. Bhaskar Reddy
Reg. No.: 18BCE0808

Submitted to: Prof. Ramani S

School of Computer Science & Engineering


Question 1

1. Using the built-in data set airquality, create a scatter plot comparing the
Temp and Ozone variables. Does there appear to be a relationship?
2. Create a histogram of the Temp variable. Can you adjust the binning so that
there are (approximately) 25 bins? Does this look to be approximately normally
distributed?
3. Plot the frequency of observations in each Month. Are the months equally
represented?
4. Plot a graph between the Ozone and Wind values.
5. Create a boxplot to view the distribution of Ozone for each month. Do the
distributions differ across the months?

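Code:
1.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the airquality data from a local CSV copy
df = pd.read_csv("C:\\Users\\admin\\Downloads\\airquality.csv")
x = df["Temp"]
y = df["Ozone"]
plt.scatter(x, y)  # rows with a missing Ozone value are not drawn
plt.xlabel("Temp")
plt.ylabel("Ozone")
plt.show()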

Explanation:
Yes, there appears to be a positive relationship between Temp and Ozone: Ozone levels tend to rise as the temperature rises.
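The scatter plot shows the trend only visually. As a numeric check (not part of the original answer), a minimal sketch that computes the Pearson correlation between Temp and Ozone with pandas; the function name `temp_ozone_correlation` is hypothetical. A value close to +1 would support a strong positive linear relationship.

```python
import pandas as pd


def temp_ozone_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between the Temp and Ozone columns.

    Series.corr excludes any pair where either value is missing, which
    matters because Ozone has missing entries in airquality.
    """
    return df["Temp"].corr(df["Ozone"])


# With the DataFrame loaded in part 1:
# print(temp_ozone_correlation(df))
```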
2.
x = df["Temp"]
plt.hist(x, bins=25)  # 25 equal-width bins
plt.show()

Explanation:
A histogram is created for the variable Temp with the number of bins set to 25 (matplotlib's default is 10 bins).
Yes, the histogram looks approximately normally distributed: it is roughly bell-shaped and symmetric around its center.
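A histogram gives only a visual impression of normality. As a rough numeric companion (not a formal normality test), a sketch that reproduces the 25 equal-width bins with `np.histogram` and reports the sample skewness with pandas; `temp_histogram_summary` is a hypothetical name. A skewness close to 0 is consistent with the symmetric shape expected of a normal distribution; a formal test such as `scipy.stats.shapiro` would be the next step.

```python
import numpy as np
import pandas as pd


def temp_histogram_summary(temps, bins=25):
    """Return histogram counts, bin edges and sample skewness.

    With an integer ``bins``, np.histogram splits the data range into exactly
    that many equal-width bins, the same binning plt.hist(x, bins=25) uses.
    Missing values are dropped first.
    """
    values = pd.Series(temps).dropna()
    counts, edges = np.histogram(values, bins=bins)
    return counts, edges, values.skew()


# With the DataFrame loaded in part 1:
# counts, edges, skew = temp_histogram_summary(df["Temp"])
```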

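3.
# Count the number of observations in each month
g = {}
for i in df["Month"]:
    if i in g:
        g[i] += 1
    else:
        g[i] = 1
x = list(g.keys())
y = list(g.values())
plt.bar(x, y)
plt.xlabel("Month")
plt.ylabel("Number of observations")
plt.show()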
Explanation:
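Yes, the months are represented (almost) equally: the data has one observation per day from May to September, so each month has 30 or 31 observations.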
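The manual dictionary count can also be done with pandas' `value_counts`. A sketch, with `month_counts` and `roughly_equal` as hypothetical helpers, that counts observations per month in month order and checks whether the counts differ by at most one, which is what 30-day and 31-day months with one reading per day would give.

```python
import pandas as pd


def month_counts(months) -> pd.Series:
    """Number of observations per month, ordered by month number."""
    return pd.Series(months).value_counts().sort_index()


def roughly_equal(counts: pd.Series, tolerance: int = 1) -> bool:
    """True if the largest and smallest counts differ by at most ``tolerance``."""
    return bool(counts.max() - counts.min() <= tolerance)


# With the DataFrame loaded in part 1:
# counts = month_counts(df["Month"])
# counts.plot(kind="bar")
# print(roughly_equal(counts))
```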

4.
x = df["Ozone"]
y = df["Wind"]
plt.scatter(x, y)
plt.xlabel("Ozone")
plt.ylabel("Wind")
plt.show()

Explanation:
The scatter plot shows the relationship between Ozone and Wind: higher wind speeds tend to go with lower Ozone levels, so the relationship is negative.
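A fitted trend line makes the direction of the Ozone and Wind relationship easier to read. A minimal sketch, assuming a straight line is an adequate summary, that fits Ozone as a linear function of Wind with `np.polyfit` after dropping rows with missing values; `ozone_wind_trend` is a hypothetical name. A negative slope would indicate that Ozone falls as wind speed rises.

```python
import numpy as np
import pandas as pd


def ozone_wind_trend(df: pd.DataFrame):
    """Least-squares line Ozone = slope * Wind + intercept.

    Rows with a missing Ozone or Wind value are dropped first, because
    np.polyfit does not skip NaN values.
    """
    clean = df[["Wind", "Ozone"]].dropna()
    slope, intercept = np.polyfit(clean["Wind"], clean["Ozone"], deg=1)
    return slope, intercept


# Overlaying the line on a scatter with Wind on the x-axis:
# slope, intercept = ozone_wind_trend(df)
# xs = np.linspace(df["Wind"].min(), df["Wind"].max(), 100)
# plt.scatter(df["Wind"], df["Ozone"])
# plt.plot(xs, slope * xs + intercept, color="red")
# plt.show()
```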
5.
import seaborn as sns
ax = sns.boxplot(x="Month", y="Ozone", data=df)  # one box of Ozone values per Month
plt.show()

Explanation:
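Yes, the distributions differ across the months: July and August show noticeably higher and more spread-out Ozone values than May, June and September.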
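To back up the visual comparison with numbers, a sketch that tabulates the median and interquartile range (IQR) of Ozone per Month with a pandas groupby; `ozone_by_month` is a hypothetical helper. The median and the Q1 to Q3 range are what each box shows, so months whose medians or IQRs differ clearly will have visibly different boxes.

```python
import pandas as pd


def ozone_by_month(df: pd.DataFrame) -> pd.DataFrame:
    """Median, quartiles and IQR of Ozone for each Month.

    Rows with a missing Ozone value are dropped before grouping.
    Quartiles use pandas' default linear interpolation.
    """
    grouped = df.dropna(subset=["Ozone"]).groupby("Month")["Ozone"]
    summary = pd.DataFrame({
        "median": grouped.median(),
        "q1": grouped.quantile(0.25),
        "q3": grouped.quantile(0.75),
    })
    summary["iqr"] = summary["q3"] - summary["q1"]
    return summary


# With the DataFrame loaded in part 1:
# print(ozone_by_month(df))
```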

Question 2

Create your own dataset (Student Record) with 5 attributes and 20 observations. Show the key and value
attributes in the dataset in different colors by displaying it in table form.

Code:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split as tr

df3 = pd.read_csv("C:\\Users\\admin\\Desktop\\archive\\StudentsPerformance.csv")

# Writing score vs math score: fit on a 70% training split and plot it
X, X_test, y, y_test = tr(df3["math score"].values.reshape(-1, 1), df3["writing score"], test_size=0.3)
model1 = LinearRegression()
model1.fit(X, y)
sns.regplot(x=X.ravel(), y=y)  # regplot needs a 1-D x, so flatten the (n, 1) array
plt.show()

# Writing score vs reading score
X, X_test, y, y_test = tr(df3["reading score"].values.reshape(-1, 1), df3["writing score"], test_size=0.3)
model2 = LinearRegression()
model2.fit(X, y)
sns.regplot(x=X.ravel(), y=y)
plt.show()

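Explanation: The two plots show writing score against math score and against reading score, each with a fitted linear regression line; writing score rises roughly linearly with both. Note that sns.regplot fits and draws its own least-squares line, so the LinearRegression models are fitted but not used in the plots.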
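In the code each `LinearRegression` model is fitted but never inspected, while `sns.regplot` draws its own line. A minimal sketch of reading the fitted slope and intercept and scoring the model on the held-out 30% split with R²; `fit_and_score` is a hypothetical helper, and the fixed `random_state` is an added assumption so that the split is reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_and_score(x, y, test_size=0.3, random_state=0):
    """Fit y ~ x on a training split; return slope, intercept and test R^2."""
    X = np.asarray(x, dtype=float).reshape(-1, 1)  # scikit-learn expects 2-D features
    X_train, X_test, y_train, y_test = train_test_split(
        X, np.asarray(y, dtype=float), test_size=test_size, random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    return model.coef_[0], model.intercept_, model.score(X_test, y_test)


# With the StudentsPerformance data:
# slope, intercept, r2 = fit_and_score(df3["math score"], df3["writing score"])
```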

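# Light red color map; background_gradient shades only the numeric (score) columns by default
cm = sns.light_palette("red", as_cmap=True)
df3.style.background_gradient(cmap=cm)
Explanation: The dataset used is displayed as a table, with each numeric score column shaded by value using a color gradient; the text columns are left unshaded.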
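The question asks for a self-made Student Record with 5 attributes and 20 observations, with the key attribute colored differently from the value attributes, whereas the gradient styling shades the downloaded StudentsPerformance data by value. A minimal sketch of the requested table, assuming `RegNo` as the key attribute and `Name`, `Math`, `Physics` and `Chemistry` as value attributes; these column names, the two colors and the randomly generated marks are hypothetical placeholders, not real student data. `Styler.apply` with `axis=0` calls the coloring function once per column (rendering a Styler requires the jinja2 package).

```python
import numpy as np
import pandas as pd

KEY_COLUMN = "RegNo"  # hypothetical key attribute


def make_student_records(n=20, seed=0) -> pd.DataFrame:
    """Build a placeholder Student Record with 5 attributes and n rows."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "RegNo": [f"S{i:03d}" for i in range(1, n + 1)],
        "Name": [f"Student {i}" for i in range(1, n + 1)],
        "Math": rng.integers(40, 101, size=n),  # marks from 40 to 100 inclusive
        "Physics": rng.integers(40, 101, size=n),
        "Chemistry": rng.integers(40, 101, size=n),
    })


def column_colors(col: pd.Series) -> list:
    """CSS for every cell of a column: one color for the key, another for values."""
    if col.name == KEY_COLUMN:
        return ["background-color: lightgreen"] * len(col)
    return ["background-color: lightblue"] * len(col)


def styled_table(df: pd.DataFrame):
    """Return a Styler that colors key and value columns differently."""
    return df.style.apply(column_colors, axis=0)


# In a notebook, the last expression renders the colored table:
# styled_table(make_student_records())
```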
