
DEV Lab Record - Merged

The programs load the iris and wine review datasets and create several plots, including:
- Color scatter plot of iris dataset attributes
- Line charts of iris attributes over samples
- Histogram and bar chart of wine review scores
- Multiple histograms of iris attributes
- Vertical and horizontal bar charts of wine review scores

They also show the five countries with the highest average wine price and a correlation matrix heatmap for the iris attributes.

Uploaded by

HEMANTH

AD3301 DATA EXPLORATION AND VISUALISATION

Name of the Student : ……………………………….

Register Number : ……………………………….

Year / Semester / Section :………………………………

Branch : ……………………………….

Department of Artificial Intelligence and Data Science


Register No:

BONAFIDE CERTIFICATE

Certified that this is the bonafide record of work done by


Mr./Ms.……………………………………………………… of …… semester
B.Tech…………………………………..….in the ………………………………………
Theory cum Laboratory during the academic year ……............................

Staff in-Charge Head of the Department

Submitted for the University Practical Examination held on …………………………

Internal Examiner External Examiner

Date: …………… Date: ……………...


LIST OF PROGRAMS

S.No  Title of the Experiment

1. Write a Python program for a basic calculator.
2. Write a Python program for data visualizations.
3. Write a Python program for performing time series visualizations.
4. Write a Python program for performing exploratory data analysis using the wine data sets.
5. Write a Python program to visualize a lollipop chart using pandas and the company sales data set.
6. Write a Python program to create a data frame and perform transformations such as duplicate identification, handling missing values and replacing values.
7. Write a Python program to explore and visualize your own dataset/data frame and perform numerical summaries and spread level.
8. Write a Python program using NumPy package functions and a 2D array to perform simple matrix operations.

1. BASIC CALCULATOR

1. Problem Statement: Install Python and write a Python program that implements a simple calculator. Depending on the user's input, it performs one of the basic arithmetic operations: addition, subtraction, multiplication or division.

2. Expected Learning Outcomes: To train the students to understand the basic structure of a Python program and basic mathematical operations.

3. Problem Analysis: Simple calculations like addition, subtraction, multiplication or division can be performed.

4. A. Input: Get the option that selects the operation, then get the two numbers to operate on. For example, the program takes input as follows:

Please select operation -
1. Add
2. Subtract
3. Multiply
4. Divide
Select operations from 1, 2, 3, 4: 1
Enter first number: 20
Enter second number: 13

B. Output: The program prints the selected operation and its result. For the selection above it prints:

20 + 13 = 33

5. Algorithm:

Step 1: Get the option for operation selection.
Step 2: Get the first number.
Step 3: Get the second number.
Step 4: Perform the selected operation and print the result.

6. Coding:
# Python program for simple calculator

# Function to add two numbers
def add(num1, num2):
    return num1 + num2

# Function to subtract two numbers
def subtract(num1, num2):
    return num1 - num2

# Function to multiply two numbers
def multiply(num1, num2):
    return num1 * num2

# Function to divide two numbers
def divide(num1, num2):
    return num1 / num2

# Show the menu of operations
print("Please select operation -\n"
      "1. Add\n"
      "2. Subtract\n"
      "3. Multiply\n"
      "4. Divide\n")

# Take input from the user
select = int(input("Select operations from 1, 2, 3, 4 :"))
number_1 = int(input("Enter first number: "))
number_2 = int(input("Enter second number: "))

# Perform the selected operation and print the result
if select == 1:
    print(number_1, "+", number_2, "=", add(number_1, number_2))
elif select == 2:
    print(number_1, "-", number_2, "=", subtract(number_1, number_2))
elif select == 3:
    print(number_1, "*", number_2, "=", multiply(number_1, number_2))
elif select == 4:
    # Guard against division by zero before calling divide()
    if number_2 == 0:
        print("Division by zero is not allowed")
    else:
        print(number_1, "/", number_2, "=", divide(number_1, number_2))
else:
    print("Invalid input")
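
The same four steps can also be written with a lookup table instead of an if/elif chain. The following is only a sketch of that alternative, not part of the lab program: the names `OPERATIONS` and `calculate` are hypothetical, and the functions come from Python's standard `operator` module.

```python
import operator

# Hypothetical dispatch table: menu option -> (symbol, function)
OPERATIONS = {
    1: ("+", operator.add),
    2: ("-", operator.sub),
    3: ("*", operator.mul),
    4: ("/", operator.truediv),
}

def calculate(option, a, b):
    """Return the formatted result line, or an error message."""
    if option not in OPERATIONS:
        return "Invalid input"
    symbol, func = OPERATIONS[option]
    # Division is the only operation that can fail for numeric input
    if option == 4 and b == 0:
        return "Division by zero is not allowed"
    return f"{a} {symbol} {b} = {func(a, b)}"

print(calculate(1, 20, 13))
print(calculate(3, 20, 13))
```

Keeping the arithmetic in a function that returns a string, rather than printing inside each branch, makes the logic easy to check without typing input by hand.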

7. Test Case:

a. Actual Result:

Please select operation -
1. Add
2. Subtract
3. Multiply
4. Divide

Select operations from 1, 2, 3, 4 : 3
Enter first number: 20
Enter second number: 13
20 * 13 = 260

2. DATA VISUALIZATIONS

1. Problem Statement: Write a Python program for data visualizations.

2. Expected Learning Outcomes: To train the students to understand the different data visualization methods.

3. Problem Analysis: Take two different datasets and plot different charts and graphs from them.

A. Input: The IRIS dataset and the Wine Review dataset.
B. Output: The plotted charts and graphs.

4. Algorithm:

Step 1: Load the IRIS dataset and the Wine Review dataset.
Step 2: Create the color scatter plot of the IRIS dataset.
Step 3: Create the line chart for each attribute of the IRIS dataset.
Step 4: Create the histogram for wine review scores.
Step 5: Create the bar chart for wine review scores.
Step 6: Create the multiple histograms for the attributes of the IRIS dataset.
Step 7: Create the vertical bar chart for wine review scores using plot.bar().
Step 8: Create the horizontal bar chart for wine review scores using plot.barh().
Step 9: Create the bar chart of the five countries with the highest average wine price.
Step 10: Create the correlation matrix heatmap for the attributes of the IRIS dataset.
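
The grouping behind Step 9 can be tried on a small table before running it on the full dataset. This is a minimal sketch with made-up rows; only the column names `country` and `price` follow the lab's dataset, and it keeps the top two groups instead of five.

```python
import pandas as pd

# Made-up reviews; only the column names match the lab's dataset
reviews = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "price": [10.0, 20.0, 40.0, None, 5.0, 7.0],
})

# Mean price per country (missing prices are skipped by mean())
mean_price = reviews.groupby("country")["price"].mean()

# Sort from highest to lowest and keep the first two groups
top = mean_price.sort_values(ascending=False).head(2)
print(top)
```

Calling `top.plot.bar()` on the result would draw the corresponding bar chart.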

5. Coding:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the datasets; names= supplies the column names for the IRIS file
iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width',
                                      'petal_length', 'petal_width', 'class'])
print(iris.head())
wine_reviews = pd.read_csv('winemag-data-130k-v2.csv', index_col=0)
print(wine_reviews.head())

# Create Color Scatter Plotting
colors = {'Iris-setosa': 'r', 'Iris-versicolor': 'g', 'Iris-virginica': 'b'}
# create a figure and axis
fig, ax = plt.subplots()
# plot each data-point in the color of its class
for i in range(len(iris['sepal_length'])):
    ax.scatter(iris['sepal_length'][i], iris['sepal_width'][i],
               color=colors[iris['class'][i]])
# set a title and labels
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')
plt.show()

# Create Line Chart Plotting
columns = iris.columns.drop(['class'])
# create x data
x_data = range(0, iris.shape[0])
# create figure and axis
fig, ax = plt.subplots()
# plot each column
for column in columns:
    ax.plot(x_data, iris[column], label=column)
# set title and legend
ax.set_title('Iris Dataset')
ax.legend()
plt.show()

# Create Histogram of Wine Review Scores
# create figure and axis
fig, ax = plt.subplots()
# plot histogram
ax.hist(wine_reviews['points'])
# set title and labels
ax.set_title('Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
plt.show()

# Create Bar Chart of Wine Review Scores
# create a figure and axis
fig, ax = plt.subplots()
# count the occurrence of each score
data = wine_reviews['points'].value_counts()
# get x and y data
points = data.index
frequency = data.values
# create bar chart
ax.bar(points, frequency)
# set title and labels
ax.set_title('Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
plt.show()

# Multiple histograms, one per IRIS attribute
iris.plot.hist(subplots=True, layout=(2, 2), figsize=(10, 10), bins=20)
plt.show()

# Vertical bar chart of wine review scores
wine_reviews['points'].value_counts().sort_index().plot.bar()
plt.show()

# Horizontal bar chart of wine review scores
wine_reviews['points'].value_counts().sort_index().plot.barh()
plt.show()

# Five countries with the highest mean wine price
wine_reviews.groupby("country").price.mean().sort_values(ascending=False)[:5].plot.bar()
plt.show()

# Correlation Matrix
# 'class' holds strings, so drop it before computing the correlations
corr = iris.drop(columns='class').corr()
fig, ax = plt.subplots()
# create heatmap
im = ax.imshow(corr.values)

# set labels
ax.set_xticks(np.arange(len(corr.columns)))
ax.set_yticks(np.arange(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticklabels(corr.columns)

# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")

# Loop over data dimensions and create text annotations.
for i in range(len(corr.columns)):
    for j in range(len(corr.columns)):
        text = ax.text(j, i, np.around(corr.iloc[i, j], decimals=2),
                       ha="center", va="center", color="black")
plt.show()
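
To see what the numbers in the heatmap mean, the correlation step can be checked on a small table. This is a sketch with made-up values; the `label` column stands in for a string column such as `class`, and the manual formula is the usual Pearson definition.

```python
import numpy as np
import pandas as pd

# Made-up measurements; "label" stands in for a string column such as 'class'
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.1, 5.9, 8.2, 9.8],
    "label": ["a", "a", "b", "b", "b"],
})

# Keep numeric columns only, then compute the Pearson correlation matrix
numeric = df.select_dtypes(include="number")
corr = numeric.corr()

# The same coefficient from the definition:
# sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
dx = df["x"] - df["x"].mean()
dy = df["y"] - df["y"].mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(corr.round(3))
print(round(r_manual, 3))
```

The off-diagonal entry of `corr` and `r_manual` agree, and the diagonal is always 1 because every column is perfectly correlated with itself.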

6. Test case:

a. Actual Result:

[Figure: Scatter plot of the IRIS dataset]
[Figure: Line chart for each attribute of the IRIS dataset]
[Figure: Histogram of wine review scores]
[Figure: Bar chart of wine review scores]
[Figure: Multiple histograms for the attributes of the IRIS dataset]
[Figure: Vertical bar chart of wine review scores]
[Figure: Horizontal bar chart of wine review scores]
[Figure: Bar chart of the five countries with the highest average wine price]
[Figure: Correlation matrix]

3. TIME SERIES VISUALIZATIONS

1. Problem Statement: Write a Python program for performing time series visualizations.

2. Expected Learning Outcomes: To train the students to understand the different time series visualization methods.

3. Problem Analysis: Take a time series stock dataset and plot different charts and graphs from it.

Time series analysis consists of methods for analyzing time series data in order to extract meaningful insights and other useful characteristics of the data.

4. Algorithm:

Step 1: Load the stock price dataset into the data frame.
Step 2: Remove the unwanted columns from the data frame.
Step 3: Create a line plot of the time series data using 'Volume'.
Step 4: Create line sub-plots for showing seasonality.
Step 5: Create the bar chart showing the month-wise average 'Volume'.
Step 6: Plot the difference in 'Low' values over a specified interval.
Step 7: Plot the difference in 'High' values over a specified interval.
Step 8: Plot the changes that occurred in the data over time.
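
The month-wise average in Step 5 can be illustrated on a few made-up daily values. This sketch groups by calendar month with `to_period("M")` rather than resampling; for months that contain data it should give the same averages. The dates and volumes are invented for illustration.

```python
import pandas as pd

# Made-up daily volumes spanning two months
dates = pd.to_datetime(["2016-01-04", "2016-01-05", "2016-02-01", "2016-02-02"])
volume = pd.Series([100.0, 300.0, 50.0, 150.0], index=dates, name="Volume")

# Group the daily values by calendar month and average each group
monthly = volume.groupby(volume.index.to_period("M")).mean()
print(monthly)
```

Each month collapses to a single value, which is what the bar chart then displays.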

5. Coding:
# Time Series Visualisation
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display

# reading the dataset using read_csv
df = pd.read_csv("stock_data.csv", parse_dates=True, index_col="Date")
# displaying the first five rows of dataset
display(df.head())

# deleting column (drop returns a new data frame, so assign it back)
df = df.drop(columns='Unnamed: 0')

# set the title and draw the line plot of 'Volume'
plt.title('Volume Plot')
df['Volume'].plot()
plt.show()

# one line sub-plot per column
df.plot(subplots=True, figsize=(10, 12))
plt.show()

# Resampling the time series data based on monthly 'M' frequency
# (pandas 2.2 and later prefer the alias 'ME' for month end)
df_month = df.resample("M").mean()

# using subplot
fig, ax = plt.subplots(figsize=(10, 6))
# plotting bar graph of the monthly mean 'Volume' from 2016 onwards
ax.bar(df_month.loc['2016':].index,
       df_month.loc['2016':, "Volume"],
       width=25, align='center')
plt.show()

# difference between each 'Low' value and the value two rows earlier
df.Low.diff(2).plot(figsize=(10, 6))
plt.show()

# difference between each 'High' value and the value two rows earlier
df.High.diff(2).plot(figsize=(10, 6))
plt.show()

# ratio of each closing price to the previous closing price
df['Change'] = df.Close.div(df.Close.shift())
df['Change'].plot(figsize=(10, 8), fontsize=16)
plt.show()
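
What `diff(2)` and `div(shift())` compute is easier to see on four numbers. The following sketch uses made-up closing prices, not the lab's stock file.

```python
import pandas as pd

# Made-up closing prices on consecutive days
close = pd.Series([10.0, 11.0, 12.1, 9.68],
                  index=pd.date_range("2016-01-01", periods=4, freq="D"))

# diff(2): each value minus the value two rows earlier (first two are NaN)
diff2 = close.diff(2)

# Ratio of each value to the previous one, as in df.Close.div(df.Close.shift())
change = close.div(close.shift())

print(diff2)
print(change)
```

A ratio above 1 means the price rose from the previous row and a ratio below 1 means it fell; the first row has no predecessor, so it is NaN.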

6. Test case:

a. Actual Result:

[Figure: Line plot of the time series data using 'Volume']
[Figure: Line sub-plots for showing seasonality]
[Figure: Bar chart showing the month-wise average 'Volume']
[Figure: Low difference in values of a specified interval]
[Figure: High difference in values of a specified interval]
[Figure: Plot of the changes that occurred in the data over time]

4. WINE QUALITY ANALYSIS

1. Problem Statement: Write a Python program for performing exploratory data analysis using the wine data sets.

2. Expected Learning Outcomes: To train the students to understand exploratory data analysis using different visualization methods.

3. Problem Analysis: Take the wine dataset and plot different charts and graphs from it.

Exploratory data analysis consists of methods for analyzing data in order to extract meaningful insights and other useful characteristics of the data.

4. Algorithm:

Step 1: Load the dataset from its online source.
Step 2: Examine the wine quality counts using a bar chart.
Step 3: Find the correlated columns.
Step 4: Generate the heat map to understand the correlation.
Step 5: Plot the distribution of alcohol levels.
Step 6: Draw a box plot of wine quality with respect to alcohol concentration.
Step 7: Draw a joint plot illustrating the correlation between alcohol concentration and the pH values.
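
One way to carry out Step 3 without reading the heat map by eye is to list the column pairs ordered by the strength of their correlation. This is only a sketch on a made-up table; the column names are illustrative and the values are not taken from the wine data.

```python
import numpy as np
import pandas as pd

# Made-up numeric table; column names are illustrative only
df = pd.DataFrame({
    "alcohol": [9.0, 10.0, 11.0, 12.0, 13.0],
    "density": [1.00, 0.99, 0.98, 0.97, 0.96],
    "pH":      [3.3, 3.1, 3.4, 3.2, 3.3],
})

corr = df.corr()

# Keep each pair once: mask the diagonal and the lower triangle
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

# Sort pairs by absolute correlation, strongest first
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

In this toy table density falls exactly as alcohol rises, so that pair comes first with a correlation of about -1.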

5. Coding:
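# Red Wine Quality Analysis
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import skew

# Load the red wine quality data (semicolon-separated values)
df_red = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", delimiter=";")
print(df_red.columns)
print(df_red.describe())
print(df_red.shape)
df_red.info()  # info() prints its report itself and returns None

sns.set(rc={'figure.figsize': (14, 8)})
# Bar chart of the number of wines at each quality score
sns.countplot(x='quality', data=df_red)
plt.show()
# Pairwise scatter plots of all columns
sns.pairplot(df_red)
plt.show()
# Heat map of the correlation matrix
sns.heatmap(df_red.corr(), annot=True, fmt='.2f', linewidths=2)
plt.show()
# Distribution of alcohol content (histplot replaces the deprecated distplot)
sns.histplot(df_red['alcohol'], kde=True)
plt.show()
# Skewness of the alcohol distribution
print(skew(df_red['alcohol']))
# Box plot of alcohol content for each quality score, without outliers
sns.boxplot(x='quality', y='alcohol', data=df_red, showfliers=False)
plt.show()
# Joint plot of alcohol against pH with a regression line
sns.jointplot(x='alcohol', y='pH', data=df_red, kind='reg')
plt.show()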
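
The skewness value printed by the program can be reproduced from its definition. This sketch assumes the biased Fisher-Pearson coefficient, which is what `scipy.stats.skew` computes with its default settings; the function name `sample_skewness` and the sample values are made up for illustration.

```python
import numpy as np

def sample_skewness(values):
    """Biased Fisher-Pearson skewness: m3 / m2 ** 1.5."""
    x = np.asarray(values, dtype=float)
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)   # second central moment
    m3 = np.mean(deviations ** 3)   # third central moment
    return m3 / m2 ** 1.5

# Made-up values with a long right tail give a positive result
print(sample_skewness([1, 2, 3, 4, 10]))
# Symmetric values give a result of zero
print(sample_skewness([1, 2, 3, 4, 5]))
```

A positive value indicates a longer tail on the right, a negative value a longer tail on the left.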

6. Test Case:

a. Actual Result:

[Figure: Quality-wise analysis using a bar chart]
[Figure: Correlation among the columns]
[Figure: Heat map of the correlation between the columns]
[Figure: Alcohol distribution level]
[Figure: Box plot of wine quality with respect to alcohol concentration]
[Figure: Joint plot illustrating the correlation between alcohol concentration and the pH values]

5. COMPANY SALES ANALYSIS

1. Problem Statement: Write a Python program to visualize a lollipop chart using pandas and the company sales data set.

2. Expected Learning Outcomes: To train the students to visualize data sets using a lollipop chart.

3. Problem Analysis: Take the company sales data set and plot it as a lollipop chart. It is a method for analyzing data in order to extract meaningful insights and other useful characteristics of the data.

4. Algorithm:

Step 1: Load the company sales dataset from the CSV file.
Step 2: Create an empty chart.
Step 3: Plot using the stem function.
Step 4: Format the chart.

5. Coding:

# Company Sales Data Analysis

# importing modules
import pandas as pd
import matplotlib.pyplot as plt

# reading CSV file
d = pd.read_csv("company_sales_data.csv")

# creating an empty chart
fig, axes = plt.subplots()

# plotting with stem; the blank basefmt hides the baseline
axes.stem(d['month_number'], d['total_profit'], basefmt=' ')

# starting value of y-axis
axes.set_ylim(0)

# details and formatting of chart
plt.title('Total Profit')
plt.xlabel('Month')
plt.ylabel('Profit')
plt.xticks(d['month_number'])
plt.show()
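
A lollipop chart is just a thin vertical line per value with a marker on top, so it can also be assembled from two basic Matplotlib calls. This sketch uses made-up monthly profits rather than company_sales_data.csv, and selects the off-screen "Agg" backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up monthly profits; not taken from company_sales_data.csv
months = [1, 2, 3, 4]
profits = [120, 90, 150, 110]

fig, ax = plt.subplots()
# Stems: one vertical line per month from 0 up to the profit
stems = ax.vlines(months, 0, profits)
# Heads: a round marker at the top of each stem
heads, = ax.plot(months, profits, "o")

ax.set_ylim(0)
ax.set_xticks(months)
ax.set_xlabel("Month")
ax.set_ylabel("Profit")
ax.set_title("Total Profit")
```

Calling `fig.savefig(...)` with a file name, or `plt.show()` with an interactive backend, would display the result. The stem function used in the lab program draws both parts in one call.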

6. Test Case:

Perform company sales profit analysis for different months.

a. Actual Result:

[Figure: Lollipop chart of total profit by month]

6. DATA TRANSFORMATION

1. Problem Statement: Write a Python program to create a data frame and perform transformations such as duplicate identification, handling missing values and replacing values.

2. Expected Learning Outcomes: To train the students to understand data transformations.

3. Problem Analysis: Create a data frame and perform data transformation operations such as de-duplication and handling missing values.

4. Algorithm:

Step 1: Create a data frame with values.
Step 2: Display the data frame.
Step 3: Identify the duplicate rows.
Step 4: Perform de-duplication.
Step 5: Add a column of NaN values.
Step 6: Replace the NaN values with the mean of column b.

5. Coding:
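# Data frame creation and Transformation
import pandas as pd
import numpy as np

# create the data frame
data = pd.DataFrame({"a": ["one", "two"] * 3,
                     "b": [1, 1, 2, 3, 2, 3]})
print(data)
# mark rows that repeat an earlier row
print(data.duplicated())
# show the data frame without the repeated rows
print(data.drop_duplicates())
# add a column of missing values
data["c"] = np.nan
# replace every NaN with the mean of column b
data.replace([np.nan], data.b.mean(), inplace=True)
print(data)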
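
The same pandas methods take a few options that are useful when exploring duplicates and missing values. The sketch below uses a made-up frame and shows `keep=False`, `subset=` and `fillna`, which the lab program does not use.

```python
import numpy as np
import pandas as pd

# Made-up frame with one repeated row and one missing value
df = pd.DataFrame({"a": ["one", "two", "one", "two"],
                   "b": [1.0, 2.0, 1.0, np.nan]})

# Default: mark only rows that repeat an earlier row
repeats = df.duplicated()
# keep=False: mark every member of a duplicate group
all_members = df.duplicated(keep=False)

# Drop rows that repeat an earlier row in both "a" and "b"
deduplicated = df.drop_duplicates(subset=["a", "b"])

# Fill the missing value in "b" with the mean of the known values
filled = df.assign(b=df["b"].fillna(df["b"].mean()))
print(filled)
```

Unlike replacing every NaN in the frame with one number, `fillna` on a single column leaves other columns untouched.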

6. Test Case:
Data frame creation and transformation

a. Actual Result:

     a  b
0  one  1
1  two  1
2  one  2
3  two  3
4  one  2
5  two  3

0    False
1    False
2    False
3    False
4     True
5     True
dtype: bool

     a  b
0  one  1
1  two  1
2  one  2
3  two  3

     a  b    c
0  one  1  2.0
1  two  1  2.0
2  one  2  2.0
3  two  3  2.0
4  one  2  2.0
5  two  3  2.0

7. NUMERICAL SUMMARIES AND PERCENTAGE TABLE CREATION

1. Problem Statement: Write a Python program to explore and visualize your own dataset/data frame and perform numerical summaries and spread level.
A) Create two columns of floating-point values, each from a single variable.
B) Perform descriptive analysis.
C) Produce percentage tables by both row and column.

2. Expected Learning Outcomes: To train the students to understand numerical summaries and spread level.

3. Problem Analysis: Create a data frame, present the data and compute numerical summaries.

4. Algorithm:

Step 1: Create a data frame with values.
Step 2: Plot the values as a line chart.
Step 3: Divide each value by its row total to get row percentages.
Step 4: Print the percentage table.

5. Coding:
# Numerical Summaries and Spread Level
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

H = [155.4, 160.2, 162.5, 150.5, 152.3]
W = [53.2, 62.5, 70.3, 53.2, 57.4]
# sort each list in descending order (the two lists are sorted independently)
H.sort(reverse=True)
W.sort(reverse=True)
data = pd.DataFrame({"Height": H,
                     "Weight": W})
print("Creating dataframe:")
print(data)

# create figure and axis
fig, ax = plt.subplots()
# plot weight against height as a line
ax.plot(data['Height'], data['Weight'])
# set title and labels
ax.set_title('Height Weight Summary')
ax.set_xlabel('Height')
ax.set_ylabel('Weight')
plt.show()

# divide each value by its row total to get row percentages
data[['Height', 'Weight']] = data[['Height', 'Weight']].apply(
    lambda x: x / x.sum(), axis=1)
print("Percentages:")
print(data)
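
Parts B and C of the problem statement ask for a descriptive analysis and for percentage tables by row and by column. A minimal sketch of those steps, using the same height and weight values, could look like this; the variable names are illustrative.

```python
import pandas as pd

# Height and weight values copied from the lab's data frame
data = pd.DataFrame({"Height": [162.5, 160.2, 155.4, 152.3, 150.5],
                     "Weight": [70.3, 62.5, 57.4, 53.2, 53.2]})

# Descriptive analysis: count, mean, std, min, quartiles, max
summary = data.describe()

# Spread: range and interquartile range of each column
value_range = data.max() - data.min()
iqr = data.quantile(0.75) - data.quantile(0.25)

# Row percentages: each value divided by its row total
row_pct = data.div(data.sum(axis=1), axis=0)
# Column percentages: each value divided by its column total
col_pct = data.div(data.sum(axis=0), axis=1)

print(summary)
print(row_pct)
print(col_pct)
```

Each row of `row_pct` sums to 1 and each column of `col_pct` sums to 1, which is a quick way to check that the division ran along the intended axis.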

6. Test Case:
Data frame creation and presentation of the numerical summaries.

a. Actual Result:

Creating dataframe:
   Height  Weight
0   162.5    70.3
1   160.2    62.5
2   155.4    57.4
3   152.3    53.2
4   150.5    53.2

Percentages:
     Height    Weight
0  0.698024  0.301976
1  0.719353  0.280647
2  0.730263  0.269737
3  0.741119  0.258881
4  0.738832  0.261168

8. CREATION, INDEXING AND SLICING OF 1D, 2D AND 3D ARRAYS USING NUMPY

1. Problem Statement: Write a Python program using NumPy package functions and a 2D array to perform simple matrix operations.

2. Expected Learning Outcomes: To train the students to understand the fundamentals of NumPy arrays.

3. Problem Analysis: Create arrays and perform various operations on them.

4. Algorithm:

Step 1: Create a NumPy array.
Step 2: Print the array rank.
Step 3: Create a two-dimensional array.
Step 4: Create an array using a tuple.
Step 5: Perform indexing.
Step 6: Print the array values.
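
The problem statement mentions simple matrix operations and the title mentions 3D arrays. As a hedged illustration of both, here is a short sketch with made-up arrays; it is a supplement to the steps above rather than part of the lab program.

```python
import numpy as np

# Two small made-up 2D arrays (matrices)
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

total = a + b            # element-wise sum
elementwise = a * b      # element-wise product
product = a @ b          # matrix product
transposed = a.T         # transpose

print(product)

# A 3D array: 2 blocks, each with 3 rows and 4 columns
cube = np.arange(24).reshape(2, 3, 4)
print(cube.ndim, cube.shape)
print(cube[1, 2, 3])     # single element: block 1, row 2, column 3
print(cube[:, 0, ::2])   # first row of every block, alternate columns
```

Note that `*` multiplies matching elements, while `@` performs the row-by-column matrix product.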

5. Coding:
# NumPy 1D and 2D array manipulation
# Python program for creation of arrays
import numpy as np

# Creating a rank 1 Array
arr = np.array([1, 2, 3])
print("Array with Rank 1: \n", arr)

# Creating a rank 2 Array
arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print("Array with Rank 2: \n", arr)

# Creating an array from tuple
arr = np.array((1, 3, 2))
print("\nArray created using "
      "passed tuple:\n", arr)

# Two Dimensional Array
arr = np.array([[-1, 2, 0, 4],
                [4, -0.5, 6, 0],
                [2.6, 0, 7, 8],
                [3, -7, 4, 2.0]])
print("Initial Array: ")
print(arr)

# Printing a range of Array with the use of slicing method
sliced_arr = arr[:2, ::2]
print("Array with first 2 rows and"
      " alternate columns(0 and 2):\n", sliced_arr)

# Printing elements at specific Indices
Index_arr = arr[[1, 1, 0, 3],
                [3, 2, 1, 0]]
print("\nElements at indices (1, 3), "
      "(1, 2), (0, 1), (3, 0):\n", Index_arr)

6. Test Case:
Create and perform basic operations in Array

a. Actual Result:

Array with Rank 1:
 [1 2 3]
Array with Rank 2:
 [[1 2 3]
 [4 5 6]]

Array created using passed tuple:
 [1 3 2]
Initial Array:
[[-1.   2.   0.   4. ]
 [ 4.  -0.5  6.   0. ]
 [ 2.6  0.   7.   8. ]
 [ 3.  -7.   4.   2. ]]
Array with first 2 rows and alternate columns(0 and 2):
 [[-1.  0.]
 [ 4.  6.]]

Elements at indices (1, 3), (1, 2), (0, 1), (3, 0):
 [0. 6. 2. 3.]
