DADV - Lab - Subject - 303105315
Laboratory Manual-DADV
Practical-1
Aim: Perform Exploratory Data Analysis on the given dataset using Python.
Procedure:
Program:
import pandas as pd
# Load the Titanic dataset
df = pd.read_csv('titanic.csv')
df.info()
df.describe()
#Find the duplicates
df.duplicated().sum()
#unique values
df['Pclass'].unique()    # array([3, 1, 2], dtype=int64)
df['Survived'].unique()  # array([0, 1], dtype=int64)
df['Sex'].unique()       # array(['male', 'female'], dtype=object)
#Filter data
df[df['Pclass']==1].head()
#Boxplot
df[['Fare']].boxplot()
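The boxplot marks as outliers the points that fall outside the usual 1.5 x IQR limits. The following is a minimal sketch of that rule, not part of the original program. The fare values are hypothetical and chosen only for illustration; they are not read from titanic.csv.

```python
import pandas as pd

# Hypothetical fares for illustration only (not taken from titanic.csv)
fares = pd.Series([7.25, 8.05, 13.0, 26.0, 30.0, 35.5, 512.33], name="Fare")

# Quartiles and interquartile range
q1 = fares.quantile(0.25)
q3 = fares.quantile(0.75)
iqr = q3 - q1

# Limits beyond which a standard boxplot draws a value as an outlier
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the limits
outliers = fares[(fares < lower) | (fares > upper)]
print("Lower limit:", lower)
print("Upper limit:", upper)
print("Outliers:", outliers.tolist())
```

On the real dataset the same steps can be applied to df['Fare'] in place of the hypothetical series.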
Practical-2
Aim: Calculate the mean, median and mode of the first 50 records in the given dataset using Python.
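import pandas as pd
import matplotlib.pyplot as plt
# Only the end of the file path survives in the source; point this at your copy of Marks.xlsx
d = pd.read_excel('Marks.xlsx')
# Keep the first 50 records, as stated in the aim
d = d.head(50)
f = pd.DataFrame(d)
print(f)
# Mean, median and mode of each subject
meanm = d["Maths"].mean()
medianm = d["Maths"].median()
modem = d["Maths"].mode()
meanp = d["Physics"].mean()
medianp = d["Physics"].median()
modep = d["Physics"].mode()
# Pairwise correlation between the numeric columns
cor = d.corr(numeric_only=True)
print("Maths:")
print("Mean: ",meanm)
print("Median: ",medianm)
print("Mode: ",modem)
print("Physics:")
print("Mean: ",meanp)
print("Median: ",medianp)
print("Mode: ",modep)
print("Correlation: ")
print(cor)
# The plotting call is cut off in the source; only its ending ('Physics') survives
plt.show()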
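To check the three statistics without the Excel file, the same calls can be tried on a small in-memory table. This is a minimal sketch. The marks are hypothetical, and head(3) stands in for the 50 records named in the aim.

```python
import pandas as pd

# Hypothetical marks for illustration only; the real values live in Marks.xlsx
marks = pd.DataFrame({
    "Maths":   [70, 85, 85, 90, 60],
    "Physics": [65, 80, 75, 80, 55],
})

# Keep only the first n records (n = 50 in the practical, 3 here)
first = marks.head(3)

mean_m = first["Maths"].mean()
median_m = first["Maths"].median()
# mode() returns a Series because several values can tie for most frequent
mode_m = first["Maths"].mode().tolist()

print("Mean:", mean_m)
print("Median:", median_m)
print("Mode:", mode_m)
```

Because mode() returns a Series rather than a single number, printing it directly (as the program does) shows an index column next to the value.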
OUTPUT:
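import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
# The start of the file path is cut off in the source; replace '...' with your own location
df=pd.read_excel(r'...CODES\PYTHON\JEET.xlsx')
df.head()
# Features: every column except 'V'; target: column 'V'
x=df.drop(['V'],axis=1).values
y=df['V'].values
print(x)
print(y)
# Hold out 30% of the rows for testing
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=0)
ml=LinearRegression()
ml.fit(x_train,y_train)
y_pred=ml.predict(x_test)
print(y_pred)
# Coefficient of determination on the test split
r2_score(y_test,y_pred)
# Scatter of actual against predicted values
plt.figure(figsize=(15,10))
plt.scatter(y_test,y_pred)
plt.xlabel('Original')
plt.ylabel('Predicted')
plt.title('Original vs Predicted')
# Side-by-side comparison of actual and predicted values
pred_y_df=pd.DataFrame({'Original Value':y_test,'Predicted Value':y_pred,'Difference':y_test-y_pred})
pred_y_df[0:20]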
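The same split, fit, predict and score steps can be run without the Excel file. The sketch below uses synthetic data generated from a known linear rule, which is an assumption made for illustration, so the fitted coefficients can be compared with the true ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (illustration only): y = 3*x1 - 2*x2 + 5 plus small noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(200, 2))
y = 3 * x[:, 0] - 2 * x[:, 1] + 5 + rng.normal(0, 0.5, size=200)

# Hold out 30% of the rows for testing, as in the program
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

ml = LinearRegression()
ml.fit(x_train, y_train)
y_pred = ml.predict(x_test)

# R^2 close to 1 means the predictions track the held-out values closely
score = r2_score(y_test, y_pred)
print("Coefficients:", ml.coef_)
print("Intercept:", ml.intercept_)
print("R^2 on the test split:", score)
```

The fitted coefficients should land near 3 and -2, and the intercept near 5, because the noise is small relative to the signal.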
OUTPUT
Practical-4
CODE:
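from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt
import pandas as pd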
digits = load_digits()
dir(digits)
digits.data[4]
plt.gray()
plt.matshow(digits.images[1])
digits.target[0:5]
x_train,x_test,y_train,y_test = train_test_split(digits.data,digits.target,test_size=0.2)
len(x_test)
logistic = LogisticRegression()
logistic.fit(x_train,y_train)
logistic.score(x_test,y_test)
plt.matshow(digits.images[84])
digits.target[84]
logistic.predict([digits.data[84]])
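The score above is a single accuracy figure. A confusion matrix shows which digits are mistaken for which. This is a minimal sketch on the same load_digits data. The random_state and max_iter values are assumptions of this sketch, not settings from the program above.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()
# random_state fixes the split so that the run is repeatable
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# max_iter is raised above the default of 100 so that the solver
# has room to converge on the unscaled pixel values
logistic = LogisticRegression(max_iter=5000)
logistic.fit(x_train, y_train)
y_pred = logistic.predict(x_test)

accuracy = accuracy_score(y_test, y_pred)
# Rows are the true digit, columns the predicted digit
cm = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print(cm)
```

Off-diagonal entries of the matrix count the misclassified test images.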
OUTPUT
Practical-5
Aim: Use a dataset and apply K-means clustering to get insights from the data.
CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder
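As a sketch of the K-means step named in the aim, the example below clusters synthetic two-dimensional points with scikit-learn's KMeans. The data, the choice of three clusters and the variable names are assumptions made for illustration, not the dataset used in the practical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated 2-D groups of 50 points each
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(0, 0.5, size=(50, 2)) for c in centres])

# n_init=10 restarts the algorithm from 10 initialisations and keeps the best
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

# Size of each cluster and the fitted cluster centres
sizes = np.bincount(labels)
print("Cluster sizes:", sizes)
print("Cluster centres:")
print(kmeans.cluster_centers_)
# Inertia: sum of squared distances of points to their nearest centre
print("Inertia:", kmeans.inertia_)
```

On real data the number of clusters is not known in advance. A common approach is to compare the inertia for several values of n_clusters and pick the point where it stops dropping sharply.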
Practical-7
Aim: Study and install tools such as Power BI for data visualization.
Select Download from the Download Center to download the Power BI Desktop
executable from the Download Center page. Then, specify a 32-bit or 64-bit installation
file to download.
Note: Windows users can ignore the instructions beyond this point. The instructions that follow
are specifically for Mac users.
Since Power BI is not directly available for Mac users, we will use AWS WorkSpaces to run
an instance of Windows 10, where you can download and install Power BI.
2. Select Personal when asked how you plan to use this account
3. After completing all the steps, you will be asked to enter billing information, which is a
mandatory step.
4. Log in to your AWS console, search for “workspaces,” and click on the WorkSpaces icon as shown below.
When you click Launch, you will be prompted to view the WorkSpaces console; clicking this button
takes you to the console, which states that setting up the WorkSpace will take 20 minutes. In my
experience, it typically takes 15-30 minutes.
14. During this waiting time, download the appropriate Workspace RD client from this link -
https://clients.amazonworkspaces.com/?refid=ps_a134p000006vmxaaai&trkcampaign=acq_paid_search_brand
15. Install the client and wait for your WorkSpace to finish setting up.
16. Start the Amazon Workspace client on your Mac, and you should see something like the
screenshot below.
17. Remember that you did not create a password in step 11. You should have received
an email asking you to create one. Use that password and the username from step 11 here.
18. After a successful login, you should see something like this.
20. Select Download from the Download Center to download the Power BI Desktop executable from
the Download Center page. Then, specify a 32-bit or 64-bit installation file to download.
21. Follow the same steps as for the Power BI installation on Windows.
Practical-8
Aim: Load a dataset from different sources in PowerBI and apply transformations to it.
Step 1: Open the Power BI report that you want to update in Power BI Desktop.
Step 2: Navigate to the “Data” tab, click the Transform data drop-down, and then select Transform data
to open Power Query.
Step 3: Click on New Source and connect to the new data source. Confirm that the column names in Table 1
(new data source) are the same as those of the Sales table (current data source).
Step 4: Select Table 1, navigate to the “View” tab, and click Advanced Editor.
Step 5: Locate the M code that specifies the new data source and copy it from the beginning up to [data].
Step 6: Select the Sales table and click Advanced Editor. Replace the M code up to [data] with the code you
copied from Table 1 and click Done.
Step 7: After making the necessary changes, review the M code and ensure it accurately reflects
the new data source.
Practical-9
Aim: Study and Plot various graphs for Data Visualization on PowerBI.
Scatter charts
Bubble charts
You can use a bubble chart in many of the same scenarios as a scatter chart. Here are some of the other
ways you can use bubble charts:
Use cases for the dot plot chart are similar to the scenarios described for scatter and bubble charts. The
primary advantage of dot plot charts is the ability to include categorical data along the horizontal axis.
Expand Sales and select the Sales Per Sq Ft and Total Sales Variance % checkboxes.
You can now convert the clustered column chart visual into a scatter chart.
Notice the changes to the Visualizations pane. The District field is now listed
under Values. The chart axes are also different. Make sure that Power BI plots
the Sales Per Sq Ft field along the X Axis and the Total Sales Variance % field
along the Y Axis.
Power BI creates data points where the data values intersect along the x
and y axes. The data point colors represent different districts.
Practical-10
1. HEATHROW
Heathrow Airport is an international airport in London. It is the second-busiest
airport in the world for international passenger traffic, after Dubai International
Airport, and the seventh-busiest in terms of total passenger traffic.
THE CHALLENGE
Since Heathrow is the world’s seventh-busiest airport by overall passenger traffic, one
can only imagine the level of efficiency and effort expected from the airport’s ground
management to keep the airport functioning properly. Managing over 200,000
passengers every day can be quite a challenging task for airport authorities and
ground staff. Every department needs to be in absolute coordination to be
able to manage the passenger traffic and give passengers a smooth experience at the
airport. At such busy airports, every day brings new challenges and uncertainties.
Unexpected disruptions to the workflow of operations disturb the functioning of the
entire airport. Issues can arise due to stormy weather, delayed flights, canceled
flights, shifts in jet streams, and so on. Such problems send passengers as well as
airport employees into turmoil.
The airport needed a central digitalized management system as a solution to this
problem. Such a system would use the large amounts of data being produced by
operational systems at the airport and transform it into useful visual insights. The
interpretations produced by the BI tool can be used by airport staff for better
functioning and passenger management.
THE CHANGE
Heathrow group went with Microsoft Power BI as their business intelligence
software and Microsoft Azure for cloud services. The airport has deployed Microsoft
Azure technology to collect data from back-end operational systems at the airport.
These systems include check-in counters, baggage tracking, flight schedules,
weather tracking, cargo tracking, and many more.
The operational data from these systems is forwarded to business intelligence
platforms like Power BI. In Power BI, users shape this data into useful
information that the airport staff can use.
Services such as Azure Stream Analytics, Azure Data Lake Analytics, and Azure
SQL Database are used to extract, clean and prepare operational data in real-time.
This data is about flight movements, security queues, passenger transfers, and
immigration queues. Ultimately, Power BI uses data from these Azure services for
analysis and interpretation.
Operational data from different data sources comes into Power BI. Power BI tools
then transform that data into meaningful insights with the help of visual
reports, graphics, and dashboards. About 75,000 airport employees have information
at their fingertips by virtue of Power BI.
Let us understand this with the help of a real-world example. If there is a change in
the jet stream, it may delay about 20 flights in a day. This will result in about 6,000
passengers waiting at the airport at a given point in time. It will increase passenger
traffic and density at the airport. Power BI works like a centralized information
system. The airport uses it to notify staff about the sudden passenger influx. This
information goes out to different sections such as food outlets, immigration,
customs, gate attendants, and baggage handlers at the airport. This gives them time
to prepare to attend to the passengers.
With smart BI solutions like Power BI in place, airport staff are notified in
advance about probable delays and a sudden rush of passengers at the airport.
This helps management groups and other employees take suitable actions in advance,
such as increasing the food stock, adding extra passenger buses, increasing the
ground staff, and directing passengers to the waiting area, to avoid any
last-minute rush.
Thus, with the help of a powerful BI tool like Power BI, Heathrow has benefited
in more than one way. The airport is satisfied with the capabilities of
Power BI, which help it give passengers a hassle-free airport experience.
Heathrow is also extending its Power BI applications by trying to anticipate passenger
flow at the airport to avoid any unexpected disruptions for passengers.
Step 2: After that, download the dataset for performing the data analysis. Here I have
downloaded a dataset of chocolate sales to different countries, which also contains
information about the dealer.
Step 3: Then open Power BI. The home page will look like the page below.