
Da Lab Record


B.V. Raju Institute of Technology


(UGC-Autonomous)
Vishnupur, Narsapur, Medak Dist – 502313

Department of Computer Science Engineering

DATA ANALYTICS LAB RECORD

B.TECH
IV YEAR/I SEM
Reg NO:___________________
NAME:____________________


B V RAJU INSTITUTE OF TECHNOLOGY
(UGC-Autonomous)
Vishnupur, Narsapur, Medak Dist – 502313

UNIVERSAL LEARNING

CERTIFICATE

This is to certify that this is the bonafide work of

_____________________________ Reg. No. ___________________

of B.Tech _____ year ____ semester, Branch _____________ in

the _____________________________________ laboratory during

the academic year ____________.

Staff In-Charge Head of Department

Submitted for the University examination held on ____________.

Internal Examiner External Examiner


Index

Ex. No.   Date   Name of the Experiment   Pg. No.

1    Overview of data types and objects, reading and writing data
2    Control structures, functions, scoping rules, dates and times etc.
3    Data Structures (vectors, arrays, matrices, data frames and lists)
4    Working with CSV files, XML files, Web Data, JSON files, Databases, Excel files
5    Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis
6    Bivariate analysis: Linear and logistic regression modeling
7    Multiple Regression analysis
8    Data Visualization: Apply and explore various plotting functions
9    Visualizing Geographic Data with Basemap
10   Clustering in Data Analytics
11   Experiment on Tableau (Sample – superstore.csv)



DATA ANALYTICS LAB MANUAL

Ex No:1 Overview of data types and objects, reading and writing data

Data types are the classification or categorization of data items. A data type represents the kind of value a data item holds and determines which operations can be performed on it. Since everything is an object in Python, data types are actually classes, and variables are instances (objects) of these classes. The following are the standard or built-in data types in Python:

• Numeric
• Sequence Type
• Boolean
• Set
• Dictionary
• Binary Types (memoryview, bytearray, bytes)

Example:

# DataType Output: str
x = "Hello World"

# DataType Output: int
x = 50

# DataType Output: float
x = 60.5

# DataType Output: complex
x = 3j

# DataType Output: list
x = ["geeks", "for", "geeks"]

# DataType Output: tuple
x = ("geeks", "for", "geeks")

# DataType Output: range
x = range(10)

# DataType Output: dict
x = {"name": "Suraj", "age": 24}

# DataType Output: set
x = {"geeks", "for", "geeks"}

# DataType Output: frozenset
x = frozenset({"geeks", "for", "geeks"})

# DataType Output: bool
x = True

# DataType Output: bytes
x = b"Geeks"

# DataType Output: bytearray
x = bytearray(4)

# DataType Output: memoryview
x = memoryview(bytes(6))

# DataType Output: NoneType
x = None

Numeric Data type:

The numeric data types in Python represent data that has a numeric value: an integer, a floating-point number, or a complex number. These values are instances of the Python classes int, float, and complex.
• Integers – Represented by the int class. An integer is a positive or negative whole number (without a fraction or decimal part). In Python, there is no limit on how large an integer value can be.
• Float – Represented by the float class. It is a real number in floating-point representation, written with a decimal point. Optionally, the character e or E followed by a positive or negative integer may be appended to specify scientific notation.
• Complex Numbers – Represented by the complex class. A complex number is written as (real part) + (imaginary part)j, for example 2+3j.

Example:
# Python program to
# demonstrate numeric value

a=5
print("Type of a: ", type(a))

b = 5.0
print("\nType of b: ", type(b))

c = 2 + 4j
print("\nType of c: ", type(c))
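The properties of the three numeric classes described above can be checked directly. The following is only an illustrative sketch; the variable names (big, small, z) are assumptions chosen for this example.

```python
# Integers have arbitrary precision, so large powers do not overflow
big = 2 ** 100
print(type(big), big)

# Floats accept scientific notation with e or E
small = 1.5e-3          # the same value as 0.0015
print(type(small), small)

# Complex numbers expose their real and imaginary parts as floats
z = 2 + 3j
print(type(z), z.real, z.imag)
```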

Sequence Data Type in Python

A sequence data type in Python is an ordered collection of items of similar or different data types. Sequences allow storing multiple values in an organized and efficient fashion. There are several sequence types in Python:
• Python String
• Python List
• Python Tuple

Example:

# Python Program for
# Creation of String

# Creating a String
# with single Quotes
String1 = 'Welcome to the Geeks World'
print("String with the use of Single Quotes: ")
print(String1)

# Creating a String
# with double Quotes
String1 = "I'm a Geek"
print("\nString with the use of Double Quotes: ")
print(String1)
print(type(String1))

# Creating a String
# with triple Quotes
String1 = '''I'm a Geek and I live in a world of "Geeks"'''
print("\nString with the use of Triple Quotes: ")
print(String1)
print(type(String1))

# Creating String with triple
# Quotes allows multiple lines
String1 = '''Geeks
For
Life'''
print("\nCreating a multiline String: ")
print(String1)

Accessing elements of String

In Python, individual characters of a String can be accessed by indexing. Negative indexing allows negative address references to access characters from the back of the String, e.g. -1 refers to the last character, -2 refers to the second last character, and so on.

Example:

# Python Program to Access
# characters of String

String1 = "GeeksForGeeks"
print("Initial String: ")
print(String1)

# Printing First character
print("\nFirst character of String is: ")
print(String1[0])

# Printing Last character
print("\nLast character of String is: ")
print(String1[-1])
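As a further illustration of how positive and negative indices relate, the sketch below shows that index -k refers to the same character as index len(s) - k. The variable names are assumptions made for this example only.

```python
word = "GeeksForGeeks"

first = word[0]                  # index 0 is the first character
last = word[-1]                  # -1 counts from the end
second_last = word[-2]           # -2 is the second last character

# A negative index -k is equivalent to the positive index len(word) - k
same_as_last = word[len(word) - 1]

print(first, last, second_last, same_as_last)
```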

Practice Question

1. Develop Python code to access the characters of a string.
2. Develop Python code to create a string.
3. Develop Python code to demonstrate numeric values.


Ex No:2 Control structures, functions, scoping rules, dates and times

type() Function

To check the data type of a value or variable, we use the type() function.
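A minimal sketch of type() in use is given below. The list of sample values and the name type_names are assumptions for illustration.

```python
# Sample values of several built-in types (illustrative only)
values = [50, 60.5, 3j, "Hello World", [1, 2], (1, 2), {"a": 1}, {1, 2}, True, None]

# type(v) returns the class of v; __name__ gives its short name
for v in values:
    print(repr(v), "->", type(v).__name__)

type_names = [type(v).__name__ for v in values]
```

To test whether a value belongs to a given class, isinstance(value, cls) can be used in place of comparing type() results.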

if, if..else, Nested if, if-elif statements

The following are the conditional statements provided by Python:

• if
• if..else
• Nested if
• if-elif

Example 1
# if statement example
if 10 > 5:
    print("10 greater than 5")

print("Program ended")

if..else Statement
In an if..else statement, an additional block of code is attached as the else branch, which is executed when the if condition is false.

Example 2
# if..else statement example
x = 3
if x == 4:
    print("Yes")
else:
    print("No")

Example 3
You can also chain if..else statements to test more than one condition.
# if..else chain statement
letter = "A"
if letter == "B":
    print("letter is B")
else:
    if letter == "C":
        print("letter is C")
    else:
        if letter == "A":
            print("letter is A")
        else:
            print("letter isn't A, B or C")

Nested if Statement
An if statement can also be placed inside another if statement. This is called a nested if statement. The inner if condition is checked only if the outer if condition is true, which lets us require multiple conditions to be satisfied.

Example 4
# Nested if statement example
num = 10

if num > 5:
    print("Bigger than 5")

    if num <= 15:
        print("Between 5 and 15")

if-elif Statement
The if-elif statement is a shortcut for the if..else chain. An else block can be added at the end, which is executed if none of the preceding if and elif conditions is true.

Example 5

# if-elif statement example
letter = "A"

if letter == "B":
    print("letter is B")
elif letter == "C":
    print("letter is C")
elif letter == "A":
    print("letter is A")
else:
    print("letter isn't A, B or C")


Creating a function in Python


We can create a Python function using the def keyword.

Example 6

# A simple Python function
def fun():
    print("Welcome to GFG")

Calling a Python Function


After creating a function, we can call it by using the name of the function followed by parentheses containing the arguments for that function.

Example 7

# A simple Python function
def fun():
    print("Welcome to GFG")

# Driver code to call a function
fun()

Defining and calling a function with parameters

def function_name(parameter: data_type) -> return_type:
    """Docstring"""
    # body of the function
    return expression

Example 8

def add(num1: int, num2: int) -> int:
    """Add two numbers"""
    num3 = num1 + num2
    return num3

# Driver code
num1, num2 = 5, 15
ans = add(num1, num2)
print(f"The addition of {num1} and {num2} results in {ans}.")


Python Function Arguments


Arguments are the values passed inside the parentheses of the function. A function can have any number of arguments separated by commas.

Example 9

# A simple Python function to check
# whether x is even or odd
def evenOdd(x):
    if x % 2 == 0:
        print("even")
    else:
        print("odd")

# Driver code to call the function
evenOdd(2)
evenOdd(3)
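The title of this experiment also lists scoping rules and dates and times, which the examples above do not reach. The following is a minimal sketch of both topics using only the standard library. All names in it (counter, outer, inner, increment, stamp) are assumptions made for illustration and do not come from the original record.

```python
from datetime import date, datetime, timedelta

counter = 0                      # global (module-level) name

def outer():
    label = "outer"              # local to outer; enclosing scope for inner
    def inner():
        nonlocal label           # rebind the name in the enclosing scope
        label = "changed by inner"
    inner()
    return label

def increment():
    global counter               # rebind the module-level name
    counter += 1

increment()
print(outer(), counter)

# Dates and times
d = date(2024, 1, 31)
next_day = d + timedelta(days=1)             # date arithmetic
stamp = datetime(2024, 1, 31, 14, 30)
formatted = stamp.strftime("%d-%m-%Y %H:%M")  # format as text
print(next_day.isoformat(), formatted)
```

Without the nonlocal and global declarations, the assignments inside inner() and increment() would create new local names instead of changing the outer ones.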

Practice Questions

1. Develop Python code to demonstrate scoping rules, dates and times.


Ex No:3 Data Structures (vectors, arrays, matrices, data frames and lists)

NumPy is an array processing package in Python that provides a high-performance multidimensional array object and tools for working with these arrays. It is the fundamental package for scientific computing with Python.

Arrays in NumPy

A NumPy array is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, the number of dimensions of the array is called the rank of the array (available as the ndim attribute). A tuple of integers giving the size of the array along each dimension is known as the shape of the array.
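The rank and shape described above can be inspected on any array. This is a small sketch; the array arr is an assumption made for illustration.

```python
import numpy as np

# A 2-dimensional array with 2 rows and 3 columns
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print("Rank (ndim):", arr.ndim)      # number of dimensions
print("Shape:", arr.shape)           # size along each dimension
print("Element type:", arr.dtype)    # all elements share one type

# Elements are indexed by a tuple of integers: (row, column)
print("arr[1, 2] =", arr[1, 2])
```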

Creating NumPy Array

NumPy arrays can be created in multiple ways, with various ranks. They can also be created from different Python sequences such as lists and tuples. The type of the resultant array is deduced from the type of the elements in the sequences. NumPy also offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.

Example 1

import numpy as np

# np.empty allocates the array without initializing it,
# so the printed values are arbitrary
b = np.empty(2, dtype = int)
print("Matrix b : \n", b)

a = np.empty([2, 2], dtype = int)
print("\nMatrix a : \n", a)

c = np.empty([3, 3])
print("\nMatrix c : \n", c)

Example 2
import numpy as np
b = np.zeros(2, dtype = int)
print("Matrix b : \n", b)
a = np.zeros([2, 2], dtype = int)
print("\nMatrix a : \n", a)

c = np.zeros([3, 3])
print("\nMatrix c : \n", c)


Operations on Numpy Arrays

Arithmetic Operations

• Addition:

import numpy as np

# Defining both the arrays
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Performing addition using arithmetic operator
add_ans = a + b
print(add_ans)

# Performing addition using numpy function
add_ans = np.add(a, b)
print(add_ans)

# The same operator can be chained for more than two arrays
c = np.array([1, 2, 3, 4])
add_ans = a + b + c
print(add_ans)

# np.add takes only two operands (a third positional argument is
# treated as the output array), so nest the calls to add three arrays
add_ans = np.add(np.add(a, b), c)
print(add_ans)

• Subtraction:
import numpy as np

# Defining both the arrays
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Performing subtraction using arithmetic operator
sub_ans = a - b
print(sub_ans)

# Performing subtraction using numpy function
sub_ans = np.subtract(a, b)
print(sub_ans)


• Multiplication:

import numpy as np

# Defining both the arrays
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Performing multiplication using arithmetic operator
mul_ans = a * b
print(mul_ans)

# Performing multiplication using numpy function
mul_ans = np.multiply(a, b)
print(mul_ans)

NumPy Array Indexing

Indexing can be done in NumPy by using an array as an index. A slice returns a view of the original array, whereas indexing with an index array returns a copy. NumPy arrays can be indexed with other arrays or with a list of indices. A tuple is treated differently: it is interpreted as one index per dimension rather than as a list of positions. Negative indices count from the end: the last element is indexed by -1, the second last by -2, and so on.

Example:
# Python program to demonstrate
# the use of index arrays.
import numpy as np

# Create a sequence of integers from
# 10 to 1 with a step of -2
a = np.arange(10, 1, -2)
print("\n A sequential array with a negative step: \n", a)

# Indexes are specified inside the np.array method.
newarr = a[np.array([3, 1, 2])]
print("\n Elements at these indices are:\n", newarr)


Example

import numpy as np

macros = np.array([
    [0.8, 2.9, 3.9],
    [52.4, 23.6, 36.5],
    [55.2, 31.7, 23.9],
    [14.4, 11, 4.9]
])

# Create a new array filled with zeros,
# of the same shape as macros.
result = np.zeros_like(macros)

cal_per_macro = np.array([3, 3, 8])

# Now multiply each row of macros by
# cal_per_macro. In Numpy, `*` is
# element-wise multiplication between two arrays.
for i in range(macros.shape[0]):
    result[i, :] = macros[i, :] * cal_per_macro

print(result)
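The row-by-row loop above can usually be replaced by a single expression, because NumPy broadcasts the 1-D array across every row of the 2-D array. The sketch below compares the two approaches on the same data; the names looped and broadcast are assumptions for illustration.

```python
import numpy as np

macros = np.array([
    [0.8, 2.9, 3.9],
    [52.4, 23.6, 36.5],
    [55.2, 31.7, 23.9],
    [14.4, 11, 4.9]
])
cal_per_macro = np.array([3, 3, 8])

# Explicit loop, one row at a time
looped = np.zeros_like(macros)
for i in range(macros.shape[0]):
    looped[i, :] = macros[i, :] * cal_per_macro

# Broadcasting: the shape (3,) array is stretched over all 4 rows
broadcast = macros * cal_per_macro

print(np.allclose(looped, broadcast))
```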

Practice Questions

1. Develop Python code for creating data frames and lists.


Ex No:4 Working with CSV files, XML files, Web Data, JSON files, Databases, Excel files

Creating Dataframe from CSV

We can create a dataframe from the CSV files using the read_csv() function.

Example

import pandas as pd

# Reading the CSV file
df = pd.read_csv("Iris.csv")

# Printing top 5 rows
df.head()

Filtering DataFrame

The pandas DataFrame.filter() function is used to subset the rows or columns of a dataframe according to labels in the specified index. Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.

Example

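import pandas as pd

# Reading the CSV file
df = pd.read_csv("Iris.csv")

# applying filter function to keep only the listed columns
# (the original listed SepalLengthCm twice; SepalWidthCm is assumed here)
df.filter(["Species", "SepalLengthCm", "SepalWidthCm"]).head()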

Sorting DataFrame
To sort a data frame in pandas, the function sort_values() is used. Pandas sort_values() can sort the data frame in ascending or descending order.

Example

import pandas as pd

# Reading the CSV file
df = pd.read_csv("Iris.csv")

# sorting the rows by sepal length (ascending by default)
df.sort_values(by=['SepalLengthCm'])
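To illustrate both sort orders without needing the CSV file, here is a small sketch on an inline data frame. The three rows are made-up values, not taken from Iris.csv.

```python
import pandas as pd

# Tiny made-up data frame standing in for the Iris data
df = pd.DataFrame({
    "Species": ["setosa", "versicolor", "virginica"],
    "SepalLengthCm": [5.1, 7.0, 6.3],
})

# Ascending order is the default
asc = df.sort_values(by="SepalLengthCm")

# Pass ascending=False for descending order
desc = df.sort_values(by="SepalLengthCm", ascending=False)

print(asc)
print(desc)
```

sort_values() returns a new data frame; the original df is left unchanged unless the result is assigned back.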


Pandas GroupBy
Groupby is a fairly simple concept. We can create a grouping of categories and apply a function to each category. In real data science projects, you’ll be dealing with large amounts of data and trying things over and over, so for efficiency we use the groupby concept. Groupby refers to a process involving one or more of the following steps:
• Splitting: the data is split into groups by applying some condition on the dataset.
• Applying: a function is applied to each group independently.
• Combining: the results of applying the function are combined into a data structure.

Example
# importing pandas module
import pandas as pd

# Define a dictionary containing employee data
data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi',
                  'Gaurav', 'Anuj', 'Princi', 'Abhi'],
         'Age': [27, 24, 22, 32,
                 33, 36, 27, 32],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad',
                     'Kannuaj',
                     'Jaunpur', 'Kanpur',
                     'Allahabad', 'Aligarh'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd',
                           'B.Tech', 'B.com',
                           'Msc', 'MA']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data1)

print("Original Dataframe")
print(df)

# applying groupby() function to
# group the data on Name value.
gk = df.groupby('Name')

# Let's print the first entries
# in all the groups formed.
print("After Creating Groups")
print(gk.first())
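The split, apply and combine steps can be seen together in one aggregation. The sketch below uses a reduced version of the employee data (only Name and Age, an assumption made to keep it short) and computes the mean age per name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jai", "Anuj", "Jai", "Princi", "Anuj", "Princi"],
    "Age": [27, 24, 22, 32, 36, 27],
})

# Split by Name, apply mean() to each group's Age, combine into a Series
mean_age = df.groupby("Name")["Age"].mean()
print(mean_age)
```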

Practice Question
1. Develop Python code to import and analyze data from XML files, Web Data, JSON files, Databases and Excel files.


Ex No:5 Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.

Univariate analysis is the most basic form of data analysis. It is used when we want to understand the data contained in only one variable and do not want to deal with cause or effect relationships.

Example

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import math

Here, we will be using the Credit Card Approvals dataset available on Kaggle.

card_approval_df = pd.read_csv(<PATH TO CSV FILE>)
print(card_approval_df.head())

Now let's get a summary of the data using the info method of the dataframe.

card_approval_df.info()

Now let’s note which columns hold categorical data and which columns hold continuous data.

Columns holding categorical data: Gender, Married, BankCustomer, Industry, Ethnicity, PriorDefault, Employed, DrivingLicense, Citizen, Approved

Columns holding continuous data: Age, Debt, YearsEmployed, CreditScore, Income

Note: I have dropped the ZipCode column because that column won’t help in analysis.

Univariate Analysis of continuous Variables

We use the describe function to get the descriptive statistics of the continuous variables.

card_approval_df[['Age', 'Debt', 'YearsEmployed', 'CreditScore', 'Income']].describe()
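The individual measures named in the title of this experiment can also be computed one at a time with pandas. The sketch below uses a small made-up series (ages) rather than the Kaggle data, so the values are illustrative only.

```python
import pandas as pd

ages = pd.Series([22, 25, 25, 30, 41, 58])   # made-up sample values

print("Frequency:\n", ages.value_counts())
print("Mean:", ages.mean())
print("Median:", ages.median())
print("Mode:", ages.mode()[0])
print("Variance:", ages.var())          # sample variance (ddof=1)
print("Std deviation:", ages.std())     # sample standard deviation (ddof=1)
print("Skewness:", ages.skew())         # positive means a longer right tail
print("Kurtosis:", ages.kurt())         # excess kurtosis (normal distribution = 0)
```

The same methods can be called on any numeric column of a data frame, for example on a single column selected with df['column_name'].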

Practice Question

1. Compute the skewness and kurtosis of a single variable using Python code.


Ex No:6 Bivariate analysis: Linear and logistic regression modeling

Bivariate descriptive statistics involves simultaneously analyzing (comparing) two variables to determine if there is a relationship between the variables. By convention, the independent variable is represented by the columns and the dependent variable is represented by the rows.

Linear regression is one of the most widely used statistical modeling techniques in Machine Learning. It models the linear relationship between two variables, one being the dependent variable and the other the independent variable.

Linear regression is a type of supervised learning algorithm, commonly used for predictive analysis. As the name suggests, linear regression performs regression tasks, that is, it predicts a continuous value.

It is used whenever there is a linear relation between the dependent and the independent variables:

Y = b0 + b1 * x

where b0 is the intercept and b1 is the slope of the line.
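For a single independent variable, the intercept b0 and slope b1 of the least-squares line can be computed directly from the data. The sketch below uses made-up points that lie exactly on Y = 1 + 2x, so the recovered coefficients are easy to check.

```python
import numpy as np

# Made-up points lying exactly on the line y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Least-squares estimates:
#   b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   b0 = mean(y) - b1 * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print("intercept b0 =", b0)
print("slope b1 =", b1)
```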

Example

Step 1: Load the Boston dataset

Step 2: Have a glance at the shape

Step 3: Have a glance at the dependent and independent variables

Step 4: Visualize the change in the variables

Step 5: Divide the data into independent and dependent variables

Step 6: Split the data into train and test sets

Step 7: Shape of the train and test sets

Step 8: Train the algorithm

Step 9: Retrieve the intercept

Step 10: Retrieve the slope

Step 11: Predicted value

Step 12: Actual value

Step 13: Evaluate the algorithm
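The code for these steps did not survive in this copy of the record. The following is a hedged sketch of the same workflow (split, train, retrieve intercept and slope, predict, evaluate), not the original listing. It makes two substitutions that should be noted: synthetic data generated from y = 2 + 3x plus noise stands in for the Boston dataset (which was removed from scikit-learn in version 1.2), and np.polyfit stands in for whatever regression class the original used.

```python
import numpy as np

# Synthetic stand-in data (assumption): y = 2 + 3x plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=100)

# Split the data into 80% train and 20% test
order = rng.permutation(len(x))
train_idx, test_idx = order[:80], order[80:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]
print("Train shape:", x_train.shape, "Test shape:", x_test.shape)

# Train: fit a degree-1 polynomial (returns slope first, then intercept)
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# Predicted values for the test set, to compare with the actual y_test
y_pred = intercept + slope * x_test

# Evaluate with the mean squared error on the test set
mse = float(np.mean((y_test - y_pred) ** 2))
print(f"intercept={intercept:.2f}, slope={slope:.2f}, test MSE={mse:.2f}")
```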

Practice Question

1. Perform logistic regression analysis on a bivariate model using Python code.


Ex No:7 Multiple Regression analysis

Multiple regression is a statistical technique that can be used to analyze the relationship between a single dependent variable and several independent variables. The objective of multiple regression analysis is to use the independent variables, whose values are known, to predict the value of the single dependent variable.

There are several types of multiple regression analyses (e.g. standard, hierarchical, setwise, stepwise), only two of which will be presented here (standard and stepwise).

Example:

Consider ‘medv’ as the dependent variable and the rest of the attributes as independent variables.

Step 1: Load the Boston dataset

Step 2: Set up the dependent and the independent variables

Step 3: Have a glance at the independent variable

Step 4: Have a glance at the dependent variable

Step 5: Divide the data into train and test sets

Step 6: Have a glance at the shape of the train and test sets

Step 7: Train the algorithm

Step 8: Have a look at the coefficients that the model has chosen

Step 9: Concatenate the DataFrames to compare

Step 10: Compare the predicted value to the actual value

Step 11: Evaluate the algorithm
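The core of the training step, fitting one coefficient per independent variable plus an intercept, can be sketched with NumPy alone. The data below are made up so that y = 1 + 2*x1 + 3*x2 holds exactly; they are not the Boston data, and np.linalg.lstsq is used here as a stand-in for the model class of the original listing, which is not shown in this copy.

```python
import numpy as np

# Made-up data: two independent variables, y = 1 + 2*x1 + 3*x2
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Add a column of ones so the first coefficient acts as the intercept
A = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem A @ coef ≈ y
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, b1, b2 = coef
print("intercept:", intercept, "coefficients:", b1, b2)

# Predicted values to compare with the actual values
y_pred = A @ coef
```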

Practice Questions

1. Implement Python code for multiple regression analysis using various real-world datasets.


Ex No:8 Data Visualization: Apply and explore various plotting functions

Data visualisation is a graphical representation of information and data. By using different visual elements such as charts, graphs, and maps, data visualization tools provide us with an accessible way to find and understand hidden trends and patterns in data.

Univariate Analysis

Univariate analysis is a type of data visualization where we visualize only a single variable at a time. It helps us to analyze the distribution of a variable present in the data so that we can perform further analysis.

Example:

import pandas as pd
import seaborn as sns

data = pd.read_csv('Employee_dataset.csv')
print(data.head())

Histogram

Here we’ll be performing univariate analysis on numerical variables using the histogram function.

Example:

sns.histplot(data['age'])


Bar Chart

For univariate analysis of categorical data, we’ll be using the count plot function from the seaborn library.

Example

sns.countplot(x=data['gender_full'])

Pie Chart

A pie chart helps us to visualize the percentage of the data belonging to each category.

Example:

import matplotlib.pyplot as plt

x = data['STATUS_YEAR'].value_counts()
plt.pie(x.values,
        labels=x.index,
        autopct='%1.1f%%')
plt.show()


Bivariate analysis

Bivariate analysis is the simultaneous analysis of two variables. It explores the relationship between two variables: whether an association exists and how strong it is, or whether there are differences between the two variables and how significant these differences are.

The three main types we will see here are:

1. Categorical vs. Numerical

2. Numerical vs. Numerical

3. Categorical vs. Categorical

Example 1

import matplotlib.pyplot as plt

plt.figure(figsize=(15, 5))
sns.barplot(x=data['department_name'], y=data['length_of_service'])
plt.xticks(rotation=90)
plt.show()


Example 2

sns.scatterplot(x=data['length_of_service'], y=data['age'])

Example 3

sns.countplot(x=data['STATUS_YEAR'], hue=data['STATUS'])


Multivariate Analysis

It is an extension of bivariate analysis, which means it involves multiple variables at the same time to find correlations between them. Multivariate analysis is a set of statistical models that examine patterns in multidimensional data by considering several data variables at once.

PCA

Example:

from sklearn import datasets, decomposition

iris = datasets.load_iris()
X = iris.data
y = iris.target

pca = decomposition.PCA(n_components=2)
X = pca.fit_transform(X)

sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y)

HeatMap

Here we use a heat map to check the correlation between all the columns in the dataset. It is a data visualisation technique that shows the magnitude of a phenomenon as colour in two dimensions. The correlation values range from -1 to 1, where -1 means a perfect negative correlation and +1 means a perfect positive correlation.


Example

sns.heatmap(data.corr(numeric_only=True), annot=True)

Practice Questions

1. Perform bivariate analysis to understand the relationship between two variables using scatterplots, correlation coefficients and simple linear regression.


Ex No:9 Visualizing Geographic Data with Basemap

Basemap works alongside Matplotlib to allow you to plot via latitude and longitude
coordinates.

Once you have basemap installed, you can use the following code to quickly show a simple
map. This will just render and display a map, but soon we'll be plotting, zooming, and more
fun things!

Example 1

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

m = Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines()
m.fillcontinents()
m.drawmapboundary()

plt.title("Quick Basemap Example!")
plt.show()

Practice Questions

1. Try additional resolution settings in Basemap and visualize geographic data with Basemap using Python code.


Ex No:10 Clustering in Data Analytics

Clustering is the process of separating different parts of data based on common characteristics.
Disparate industries including retail, finance and healthcare use clustering techniques for various
analytical tasks. In retail, clustering can help identify distinct consumer populations, which can
then allow a company to create targeted advertising based on consumer demographics that may
be too complicated to inspect manually. In finance, clustering can detect different forms
of illegal market activity like orderbook spoofing in which traders deceitfully place large orders
to pressure other traders into buying or selling an asset. In healthcare, clustering methods have
been used to figure out patient cost patterns, early onset neurological disorders and cancer gene
expression.

DATA CLUSTERING TECHNIQUES IN PYTHON

• K-means clustering
• Gaussian mixture models
• Spectral clustering

Example

Step 1: Read Data

import pandas as pd
df = pd.read_csv("Mall_Customers.csv")
print(df.head())

K-Means Clustering in Python

K-means clustering in Python is a type of unsupervised machine learning, which means that the
algorithm only trains on inputs and no outputs. It works by finding the distinct groups of data
(i.e., clusters) that are closest together. Specifically, it partitions the data into clusters in which
each point falls into a cluster whose mean is closest to that data point.
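To make the assignment and update steps concrete, here is a minimal NumPy sketch of the K-means iteration on six made-up 2-D points. It is an illustration only, not the scikit-learn implementation: the function name kmeans, the fixed iteration count and the naive initialization (the first k points) are all assumptions. It also returns the within-cluster sum of squares (WCSS), the sum of squared distances from each point to its cluster mean.

```python
import numpy as np

def kmeans(points, k, n_iter=10):
    """Very small K-means sketch: returns labels, centroids and WCSS."""
    centroids = points[:k].copy()          # naive init: first k points (assumption)
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    # Within-cluster sum of squares
    wcss = float(((points - centroids[labels]) ** 2).sum())
    return labels, centroids, wcss

# Two visually separate groups of made-up points
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])

labels, centroids, wcss = kmeans(points, k=2)
print("labels:", labels)
print("centroids:\n", centroids)
print("WCSS:", wcss)
```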

Example

from sklearn.cluster import KMeans

X = df[['Age', 'Spending Score (1-100)']].copy()

# Within-cluster sum of squares (WCSS) for 1 to 10 clusters
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

Finally, we can plot the WCSS (within-cluster sum of squares) versus the number of clusters. First, let’s import Matplotlib and Seaborn, which will allow us to create and format data visualizations:

import matplotlib.pyplot as plt
import seaborn as sns

sns.set()
plt.plot(range(1, 11), wcss)
plt.title('Selecting the Number of Clusters using the Elbow Method')
plt.xlabel('Clusters')
plt.ylabel('WCSS')
plt.show()


We can see that K-means found four clusters, which break down as follows:

1. Young customers with a moderate spending score.

2. Young customers with a high spending score.

3. Middle-aged customers with a low spending score.

4. Senior customers with a moderate spending score.

This type of information can be very useful to retail companies looking to target specific
consumer demographics. For example, if most people with high spending scores are younger,
the company can target those populations with advertisements and promotions.

Practice Questions

1. Implement Gaussian mixture models and spectral clustering using Python code.



Ex No:11 Experiment on Tableau (Sample – superstore.csv)

a. The first step is to connect to the data you want to explore. This example shows how to connect to the Sample - Superstore data in Tableau Desktop.
b. Open Tableau. On the start page, under Connect, click Microsoft Excel (for a .csv copy of the data, choose Text file instead). In the Open dialog box, navigate to the Sample - Superstore file on your computer. Select Sample - Superstore, and then click Open.
c. After you connect to the file, the Data Source page shows the sheets or tables in your data. Drag the "Orders" table to the canvas to start exploring that data.
d. Click the sheet tab to go to the new worksheet and begin your analysis.
