CS-605_DataAnalyticsLab_manav
III. Our graduates will exhibit teamwork and leadership qualities to meet stakeholders' business objectives in their careers.
IV. Our graduates will evolve in ethical and professional practices and enhance their socioeconomic contributions to society.
Program Outcomes (POs):
CO-PO/PSO mapping (columns: PO1-PO12, PSO1-PSO3, Attainment):
CO1: Understand and apply the basic concepts of statistics and probability in data analytics. Mapping values: 1 1 1 1 2
CO2: Apply data processing techniques to DataFrames using Python libraries. Mapping values: 1 1 1 1 1 1 1 1 2
CO3: Implement and evaluate data analytics techniques using MATLAB, R and Python tools. Mapping values: 1 1 1 1 1 2 2
CO4: Evaluate or assess models on large volumes of data with the help of modern tools. Mapping values: 1 1 1 1 1 2 1
CO5: Define and explain the use of Python for data cleaning and visualization as a data analytics tool. Mapping values: 1 1 2 1 1 2 2 2
List of Programs (with the Course Outcomes each program addresses)
Introduction to Python
Introduction to Analytics
1. Write a Python program to get the total price for each FuelType in the Toyota.csv file and show it using a line plot. The generated line plot must include the following style properties (CO1, CO5):
• Line style dotted and line color red
• Legend shown at the lower right location
• X label name = Fuel Type
• Y label name = Price
• Circle marker
• Line marker color red
• Line width 3
2. Write a Python program to read 'Petrol' and 'CNG' FuelType sales data from the Toyota.csv file and show it using a bar chart (CO1).
3. Write a Python program to create and display a DataFrame from the specified dictionary data, which has the index labels (CO1, CO2):
exam_data = {'name': ['Dinesh', 'Suresh', 'Rahul', 'Ravi', 'Manoj',
                      'Hari', 'Yatharth', 'Saurabh', 'Kapil', 'Salini'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
4. Write a Python program to display a summary of basic information (index, columns, non-null values of each column, memory usage, etc.) about the DataFrame loaded from Toyota.csv (CO1, CO2).
5. Write a Python program to insert a column named "AGE_IN_MONTH" and then fill the inserted column from the "Age" column using a user-defined function that returns a value (CO3):
AGE_IN_MONTH = AGE * 12
6. Write a program that accepts a sequence of words as input and writes the words as a comma-separated sequence after sorting them alphabetically (CO2).
7. Write a Python program to copy the contents of one file into another file line by line (CO2).
8. Write a program that reads characters one by one and stores lowercase characters in the LOWER file, uppercase characters in the UPPER file, and all other characters in the OTHER file (CO3).
9. Explain the steps to solve the following system of linear equations using MATLAB (CO2, CO3):
3x-3y+2z=6
2x+2y+3z=12
-z+2x+2y=5
10. The dataset Toyota.csv may be found on the web page. This dataset contains car information: Price, Age, KM, FuelType, CC and Doors. Save this file and use read.table to import it into R. What are the means and standard deviations of the data variables (excluding Age)? Apply a data analytics tool (CO3, CO4).
INTRODUCTION TO PYTHON
This section covers the applications of Python prescribed for this lab by the RGPV Bhopal syllabus.
LAB REQUIREMENTS
The Python interpreter has no special hardware requirements. Any system with at least 256 MB of RAM and a standard processor can be used for this lab.
INTRODUCTION TO ANALYTICS
Data Analytics refers to the techniques used to analyze data to improve productivity and business gain. Data is extracted from various sources, then cleaned and categorized so that behavioral patterns can be analyzed. The techniques and tools used vary from one organization or individual to another.
Data Analytics plays a key role in improving a business: it is used to gather hidden insights, generate reports, perform market analysis, and refine business requirements.
• Gather Hidden Insights – Hidden insights from data are gathered and then analyzed with respect to
business requirements.
• Generate Reports – Reports are generated from the data and passed on to the respective teams and individuals, who act on them to grow the business.
• Perform Market Analysis – Market Analysis can be performed to understand the strengths and
weaknesses of competitors.
• Improve Business Requirement – Analyzing data helps a business meet customer requirements and improve customer experience.
Commonly used data analytics tools:
• R programming – A widely used language for statistics and data modeling. R compiles and runs on various platforms such as UNIX, Windows, and macOS. It also provides tools to install packages automatically as required.
• Python – Python is an open-source, object-oriented programming language that is easy to read, write, and maintain. It provides libraries for data analysis, machine learning, and visualization, such as Pandas, Scikit-learn, TensorFlow, Keras, and Matplotlib. It can also work with data from sources such as SQL Server, MongoDB databases, or JSON files.
• Tableau Public – Free software that connects to data sources such as Excel files and corporate data warehouses. It creates visualizations, maps, dashboards, etc., with real-time updates on the web.
• QlikView – This tool offers in-memory data processing with the results delivered to the end-users
quickly. It also offers data association and data visualization with data being compressed to almost 10%
of its original size.
• SAS – A programming language and environment for data manipulation and analytics, this tool is
easily accessible and can analyze data from different sources.
• Microsoft Excel – One of the most widely used tools for data analytics. It is mostly used for clients' internal data and summarizes data with features such as pivot tables.
• RapidMiner – A powerful, integrated platform that can connect to many data source types, such as Access, Excel, Microsoft SQL Server, Teradata, Oracle, and Sybase. It is mostly used for predictive analytics, such as data mining, text analytics, and machine learning.
Program 1
Write a Python program to get the total price for each FuelType in the Toyota.csv file and show it using a line plot. The generated line plot must include the following style properties:
• Line style dotted and line color red
• Legend shown at the lower right location
• X label name = Fuel Type
• Y label name = Price
• Circle marker
• Line marker color red
• Line width 3
Procedure:
import pandas as pd
import matplotlib.pyplot as plt

# Load the data; '####' and '????' are read as missing values
df = pd.read_csv('Toyota.csv', na_values=['####', '????'])
# Merge the lower-case spellings into the capitalised fuel-type names
df['FuelType'] = df['FuelType'].replace({'petrol': 'Petrol', 'diesel': 'Diesel'})
# Total price for each fuel type (one value per category, sorted by name)
totals = df.groupby('FuelType')['Price'].sum()

# Dotted red line, red circle markers, line width 3
plt.plot(totals.index, totals.values, label='Price-wise Fuel data',
         linestyle='dotted', color='red', linewidth=3,
         marker='o', markerfacecolor='red', markeredgecolor='red')
plt.xlabel('Fuel Type')
plt.ylabel('Price')
plt.title('Total Price with Fuel Type')
# Legend at the lower right
plt.legend(loc='lower right')
plt.show()
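The aggregation step in the procedure can be checked on its own before any plotting. The sketch below assumes a small hypothetical DataFrame in place of Toyota.csv (the rows are invented for illustration). Because groupby treats every distinct spelling as a separate category, 'petrol' and 'diesel' are first merged into 'Petrol' and 'Diesel'.

```python
import pandas as pd

# Hypothetical sample rows standing in for Toyota.csv (not real data)
df = pd.DataFrame({
    'FuelType': ['Petrol', 'petrol', 'Diesel', 'CNG', 'diesel', 'Petrol'],
    'Price': [10000, 12000, 15000, 9000, 14000, 11000],
})

# Merge lower-case spellings into the capitalised category names
df['FuelType'] = df['FuelType'].replace({'petrol': 'Petrol', 'diesel': 'Diesel'})

# One total per fuel type; groupby sorts the keys alphabetically
totals = df.groupby('FuelType')['Price'].sum()
print(totals)
```

The resulting Series can be passed straight to plt.plot as totals.index (x values) and totals.values (y values), which is what the procedure does.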
Result:
Program 2
Write a Python program to read 'Petrol' and 'CNG' FuelType sales data from the Toyota.csv file and show it using a bar chart.
Procedure:
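import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
toyota = pd.read_csv('Toyota.csv', na_values=['####', '????'])
toyota_copy = toyota.copy()
# Scale KM and Price by 1/100 so both fit on one axis
toyota_copy['KM/100'] = toyota_copy['KM'] / 100
toyota_copy['Price/100'] = toyota_copy['Price'] / 100
# Keep only the Petrol and CNG rows and the scaled columns, then reshape to long format
subset = toyota_copy[toyota_copy['FuelType'].isin(['Petrol', 'CNG'])]
data = subset[['FuelType', 'KM/100', 'Price/100']].melt(id_vars='FuelType')
sns.barplot(x='FuelType', y='value', hue='variable', data=data)
plt.gcf().set_size_inches(15, 10)
plt.show()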
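The bar chart relies on melt to turn the two scaled columns into (variable, value) pairs, which seaborn then splits by hue. A minimal sketch of that reshaping step, assuming three hypothetical rows (not taken from Toyota.csv):

```python
import pandas as pd

# Hypothetical rows (not taken from Toyota.csv)
wide = pd.DataFrame({
    'FuelType': ['Petrol', 'CNG', 'Diesel'],
    'KM/100': [460.0, 720.0, 300.0],
    'Price/100': [135.0, 95.0, 160.0],
})

# Keep only the two fuel types of interest
subset = wide[wide['FuelType'].isin(['Petrol', 'CNG'])]

# Wide -> long: every non-id column becomes a (variable, value) pair
long = subset.melt(id_vars='FuelType')
print(long)
```

Note that sns.barplot draws the mean of value for each (FuelType, variable) group by default, so each bar shows an average over the cars of that fuel type rather than a total.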
Output:
Program 3
Write a Python program to create and display a DataFrame from the specified dictionary data, which has the index labels.
Procedure:
import numpy as np
import pandas as pd
exam_data = {'name': ['Dinesh', 'Suresh', 'Rahul', 'Ravi', 'Manoj', 'Hari', 'Yatharth', 'Saurabh', 'Kapil',
                      'Salini'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
# Use the labels list as the row index
data = pd.DataFrame(exam_data, index=labels)
print(data)
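Once the labels are the index, rows can be addressed by label, and the np.nan scores behave as missing values. A short sketch using the same exam_data (the variable names row_d, missing, missing_rows and mean_score are illustrative only):

```python
import numpy as np
import pandas as pd

exam_data = {'name': ['Dinesh', 'Suresh', 'Rahul', 'Ravi', 'Manoj', 'Hari', 'Yatharth', 'Saurabh', 'Kapil',
                      'Salini'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

# Rows are addressed by label with .loc
row_d = df.loc['d']
# Count and locate the missing (np.nan) scores
missing = df['score'].isna().sum()
missing_rows = list(df.index[df['score'].isna()])
# mean() skips NaN values by default
mean_score = df['score'].mean()
```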
Output:
Program 4
Write a Python program to display a summary of basic information (index, columns, non-null values of each column, memory usage, etc.) about the DataFrame loaded from Toyota.csv.
Procedure:
import pandas as pd
data=pd.read_csv('Toyota1.csv',na_values=['####','????'])
data.info()
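data.info() prints its summary directly. When the summary is needed as text, or when the same figures are wanted as Series, the sketch below is one way to get them. It assumes a tiny hypothetical frame with one missing Price in place of Toyota1.csv.

```python
import io
import pandas as pd

# Hypothetical frame standing in for Toyota1.csv, with one missing Price
df = pd.DataFrame({'Price': [13500, None, 13950], 'Age': [23, 24, 26]})

# Write the info() summary into a text buffer instead of printing it
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()

# The same information as Series: missing values and bytes used per column
missing_per_column = df.isnull().sum()
memory_bytes = df.memory_usage(deep=True)
```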
Output:
Program 5
Write a Python program to insert a column named "AGE_IN_MONTH" and then fill the inserted column from the "Age" column using a user-defined function that returns a value.
AGE_IN_MONTH = AGE * 12
Procedure:
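import pandas as pd
data = pd.read_csv('Toyota1.csv', na_values=['####', '????'])
# Insert a placeholder AGE_IN_MONTH column at position 2
data.insert(2, 'AGE_IN_MONTH', 0)
# User-defined function that returns the age in months
def age_update(val):
    age_in_month = val * 12
    return age_in_month
# Apply the function to every value of the Age column
data['AGE_IN_MONTH'] = data['Age'].apply(age_update)
print(data[['Age', 'AGE_IN_MONTH']])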
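Because age_update only uses multiplication, it works both on a single number and on a whole Series. The sketch below compares the two ways of calling it, assuming a hypothetical three-row frame instead of Toyota1.csv.

```python
import pandas as pd

df = pd.DataFrame({'Age': [23, 24, 26]})   # hypothetical ages, not from Toyota1.csv

def age_update(val):
    # Convert an age in years to months
    return val * 12

df.insert(1, 'AGE_IN_MONTH', 0)
# Option 1: call the function once on the whole Series (vectorised arithmetic)
vectorised = age_update(df['Age'])
# Option 2: call the function once per element with Series.apply
df['AGE_IN_MONTH'] = df['Age'].apply(age_update)
```

Both give the same values here. apply is the general choice when the function contains per-value logic such as if/else, which a plain Series call cannot handle.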
Output:
Program 6
Write a program that accepts a sequence of words as input and writes the words as a comma-separated sequence after sorting them alphabetically.
Procedure:
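# Read the words as one comma-separated line
wrd = input("Input words: ")
# Split on commas and trim the spaces around each word
wrd_list = [w.strip() for w in wrd.split(",")]
# Sort alphabetically and print as a comma-separated sequence
wrd_list.sort()
print(', '.join(wrd_list))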
Output:
Input words: ITM Gwalior
ITM Gwalior
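The input is split on commas, so "ITM Gwalior" (no comma) is treated as a single word and comes back unchanged. The sketch below writes the sorted sequence to a file as well, as the problem statement suggests. The function name write_sorted_words and the file sorted_words.txt are hypothetical, and it sorts case-insensitively with key=str.lower, whereas plain sort() places all uppercase words before lowercase ones.

```python
import os
import tempfile

def write_sorted_words(text, path):
    # Split on commas, trim spaces, and drop empty entries
    words = [w.strip() for w in text.split(',') if w.strip()]
    # Case-insensitive alphabetical order
    words.sort(key=str.lower)
    line = ', '.join(words)
    with open(path, 'w') as f:
        f.write(line + '\n')
    return line

# Hypothetical output file in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'sorted_words.txt')
result = write_sorted_words('without, hello, bag, world', path)
print(result)
```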
Program 7
Write a Python program to copy the contents of one file into another file line by line.
Procedure:
# Open the source file for reading and the destination file for writing
with open('f1.txt', 'r') as f1file, open('f2.txt', 'w') as f2file:
    # Read the first file one line at a time
    for line in f1file:
        # Write each line to the second file
        f2file.write(line)
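Iterating over a file object yields one line at a time, so the copy never holds the whole file in memory. The sketch below wraps the same loop in a hypothetical helper copy_lines that also counts the lines, and creates its own f1.txt in a temporary folder so it can run anywhere. Opening the destination with 'w' replaces any old contents; 'a' would append to them instead.

```python
import os
import tempfile

def copy_lines(src, dst):
    # Copy src to dst line by line and return the number of lines copied
    count = 0
    with open(src, 'r') as fin, open(dst, 'w') as fout:
        for line in fin:
            fout.write(line)
            count += 1
    return count

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'f1.txt')
dst = os.path.join(tmp, 'f2.txt')
with open(src, 'w') as f:
    f.write('first line\nsecond line\nthird line\n')
copied = copy_lines(src, dst)
```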
Program 8
Write a program that reads characters one by one and stores lowercase characters in the LOWER file, uppercase characters in the UPPER file, and all other characters in the OTHER file.
Procedure:
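# Open one output file per character class
with open('LOWER.txt', 'w') as f1, open('UPPER.txt', 'w') as f2, open('OTHER.txt', 'w') as f3:
    while True:
        # Read one character; an empty entry ends the input
        c = input('Enter a character (press Enter to stop): ')[:1]
        if c == '':
            break
        if c.islower():
            f1.write(c)
        elif c.isupper():
            f2.write(c)
        else:
            f3.write(c)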
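The classification logic can be tested without typing at the keyboard by feeding it a string instead of input(). This is a sketch with a hypothetical helper split_by_case that writes LOWER.txt, UPPER.txt and OTHER.txt into a given folder.

```python
import os
import tempfile

def split_by_case(chars, folder):
    # One output file per character class
    paths = {name: os.path.join(folder, name + '.txt') for name in ('LOWER', 'UPPER', 'OTHER')}
    with open(paths['LOWER'], 'w') as lower, \
         open(paths['UPPER'], 'w') as upper, \
         open(paths['OTHER'], 'w') as other:
        for c in chars:
            if c.islower():
                lower.write(c)
            elif c.isupper():
                upper.write(c)
            else:
                # Digits, spaces and punctuation go to OTHER
                other.write(c)
    return paths

paths = split_by_case('Hello World 42!', tempfile.mkdtemp())
```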
Program 9
Explain the steps to solve the following system of linear equations using MATLAB:
3x-3y+2z=6
2x+2y+3z=12
-z+2x+2y=5
Procedure:
% Coefficient matrix A and right-hand side b from the equations
% 3x - 3y + 2z = 6,  2x + 2y + 3z = 12,  2x + 2y - z = 5
A = [3 -3 2; 2 2 3; 2 2 -1]
b = [6; 12; 5]
% Augmented matrix [A b]
Ab = [A b]
% A unique solution exists only if rank(A) = rank([A b]) = number of unknowns
if rank(A) == rank(Ab) && rank(A) == size(A, 2)
    disp("Unique solution exists")
else
    disp("Unique solution does not exist")
end
% Now we can find the solution to this system of equations using 3 methods:
% Conventional way: inv(A) * b
x_inv = inv(A) * b
% Left division (mldivide): A \ b
x_bslash = A \ b
% linsolve routine: linsolve(A, b)
x_linsolve = linsolve(A, b)
Output:
x_inv =
    2.1042
    1.2708
    1.7500
x_bslash =
    2.1042
    1.2708
    1.7500
x_linsolve =
    2.1042
    1.2708
    1.7500
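The same rank test and solution can be reproduced outside MATLAB as a cross-check. The sketch below is a NumPy analogue, not part of the MATLAB procedure: it builds A and b from the equations stated for Program 9 (3x-3y+2z=6, 2x+2y+3z=12, -z+2x+2y=5), uses np.linalg.matrix_rank for the rank check, and np.linalg.inv and np.linalg.solve in place of inv(A)*b and A\b.

```python
import numpy as np

# Coefficients of x, y, z in each equation, and the right-hand sides
A = np.array([[3, -3, 2],
              [2, 2, 3],
              [2, 2, -1]], dtype=float)
b = np.array([6, 12, 5], dtype=float)
Ab = np.column_stack([A, b])   # augmented matrix [A b]

# Unique solution only if rank(A) == rank([A b]) == number of unknowns
rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(Ab)
unique = bool(rank_A == rank_Ab == A.shape[1])

x_inv = np.linalg.inv(A) @ b      # analogue of inv(A) * b
x_solve = np.linalg.solve(A, b)   # analogue of A \ b
print(unique, x_solve)
```

Substituting the result back, np.allclose(A @ x_solve, b), confirms that it satisfies all three equations. In practice solve (like A\b) is preferred over forming the inverse explicitly.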
Program 10
The dataset Toyota.csv may be found on the web page. This dataset contains car information: Price, Age, KM, FuelType, CC and Doors. Save this file and use read.table to import it into R. What are the means and standard deviations of the data variables (excluding Age)? Apply a data analytics tool.
Procedure:
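# Read the .csv file into a data frame (R is case-sensitive, so one name is used throughout)
data <- read.table("Toyota.csv", header = TRUE, sep = ",", na.strings = c("####", "????"))
# Mean and standard deviation of Price and CC (Age excluded)
cat("Price Mean:", mean(data$Price, na.rm = TRUE), "\n")
cat("Price Standard Deviation:", sd(data$Price, na.rm = TRUE), "\n")
cat("CC Mean:", mean(data$CC, na.rm = TRUE), "\n")
cat("CC Standard Deviation:", sd(data$CC, na.rm = TRUE), "\n")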
Output:
Price Mean:10729.272094641614
Price Standard Deviation:3624.7192359440314
CC Mean:1566.988911988912
CC Standard Deviation:187.74178090826746
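For comparison with the Python programs earlier in the lab, the same statistics can be computed for every numeric column except Age in one pandas call. This sketch assumes a few hypothetical rows in place of Toyota.csv. pandas std() uses the sample (n-1) denominator by default, the same as R's sd().

```python
import pandas as pd

# Hypothetical rows standing in for Toyota.csv (not real data)
df = pd.DataFrame({
    'Price': [13500, 13750, 13950, 14950],
    'Age': [23, 23, 24, 26],
    'KM': [46986, 72937, 41711, 48000],
    'CC': [2000, 2000, 2000, 1800],
})

# Drop Age, keep numeric columns, then mean and sample standard deviation per column
stats = df.drop(columns='Age').select_dtypes(include='number').agg(['mean', 'std'])
print(stats)
```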