
B. V. Raju Institute of Technology

(UGC-Autonomous)
Vishnupur, Narsapur, Medak Dist – 502313

Department of Computer Science Engineering

DATA ANALYTICS LAB RECORD

B.TECH
IV YEAR/I SEM
Reg NO:___________________
NAME:____________________

B V RAJU INSTITUTE OF TECHNOLOGY

(UGC-Autonomous)
Vishnupur, Narsapur, Medak Dist – 502313

UNIVERSAL LEARNING

CERTIFICATE

This is to certify that this is the bonafide work of

___________ Reg. No. _

of B.Tech _ year semester, Branch _________ in

the _____________________________________ laboratory during

the academic year ____________ _.

Staff In-Charge Head of Department

Submitted for the University examination

held on .

Internal Examiner External Examiner


Index

Ex. No.   Date   Name of the Experiment   Pg. No.

1    Overview of data types and objects, reading and writing data
2    Control structures, functions, scoping rules, dates and times etc.
3    Data Structures (vectors, arrays, matrices, data frames and lists)
4    Working with CSV files, XML files, Web Data, JSON files, Databases, Excel files
5    Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.
6    Bivariate analysis: Linear and logistic regression modeling
7    Multiple Regression analysis
8    Data Visualization: Apply and explore various plotting functions
9    Visualizing Geographic Data with Basemap
10   Clustering in Data Analytics
11   Experiment on Tableau (Sample – superstore.csv)

B V RAJU INSTITUTE OF TECHNOLOGY, NARSAPUR, MEDAK.

DATA ANALYTICS LAB MANUAL

Ex No:1 Overview of data types and objects, reading and writing data

Data types classify data items. A value's data type determines which operations can be
performed on it. Since everything in Python is an object, data types are classes and
variables are instances (objects) of these classes. The following are the standard,
built-in data types in Python:

• Numeric
• Sequence Type
• Boolean
• Set
• Dictionary
• Binary Types (memoryview, bytearray, bytes)

Example:

# DataType Output: str

x = "Hello World"

# DataType Output: int

x = 50

# DataType Output: float

x = 60.5

# DataType Output: complex

x = 3j


# DataType Output: list

x = ["geeks", "for", "geeks"]

# DataType Output: tuple

x = ("geeks", "for", "geeks")

# DataType Output: range

x = range(10)

# DataType Output: dict

x = {"name": "Suraj", "age": 24}

# DataType Output: set

x = {"geeks", "for", "geeks"}

# DataType Output: frozenset

x = frozenset({"geeks", "for", "geeks"})

# DataType Output: bool

x = True

# DataType Output: bytes

x = b"Geeks"

# DataType Output: bytearray

x = bytearray(4)

# DataType Output: memoryview

x = memoryview(bytes(6))

# DataType Output: NoneType

x = None
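
The assignments above can be checked with the built-in type() function. The following is a minimal sketch; the list name samples and the helper type_name are illustrative names, not part of the original example.

```python
# Each sample value is paired with the type name Python is expected to report.
samples = [
    ("Hello World", "str"),
    (50, "int"),
    (60.5, "float"),
    (3j, "complex"),
    (["geeks", "for", "geeks"], "list"),
    (("geeks", "for", "geeks"), "tuple"),
    (range(10), "range"),
    ({"name": "Suraj", "age": 24}, "dict"),
    ({"geeks", "for", "geeks"}, "set"),
    (frozenset({"geeks", "for", "geeks"}), "frozenset"),
    (True, "bool"),
    (b"Geeks", "bytes"),
    (bytearray(4), "bytearray"),
    (memoryview(bytes(6)), "memoryview"),
    (None, "NoneType"),
]


def type_name(value):
    """Return the name of the class that value is an instance of."""
    return type(value).__name__


for value, expected in samples:
    print(f"{expected:<12} -> {type_name(value)}")
```

Because a set keeps only unique elements, the set and frozenset samples each hold two items ("geeks" and "for"), not three.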

Numeric Data type:

The numeric data types in Python represent values that are numbers. A numeric value
can be an integer, a floating-point number, or a complex number. These values are
implemented by the int, float, and complex classes.
• Integers – Represented by the int class. An integer is a positive or negative whole
number, or zero, with no fractional part. Python places no fixed limit on the size of
an integer; it is bounded only by available memory.
• Float – Represented by the float class. It is a real number with a floating-point
representation, written with a decimal point. Optionally, the character e or E
followed by a positive or negative integer may be appended to specify scientific notation.
• Complex Numbers – Represented by the complex class. A complex number is written
as (real part) + (imaginary part)j, for example 2+3j.

Example:
# Python program to
# demonstrate numeric value

a=5
print("Type of a: ", type(a))

b = 5.0
print("\nType of b: ", type(b))

c = 2 + 4j
print("\nType of c: ", type(c))
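
The three points made about numeric types (unbounded integers, scientific notation for floats, and the real and imaginary parts of a complex number) can be illustrated with a short sketch. The variable names are illustrative only.

```python
# Integers have arbitrary precision: this value does not fit in 64 bits.
big = 2 ** 100
print(big, type(big).__name__)

# A literal written in scientific notation is always a float.
small = 1.5e-3
print(small, type(small).__name__)

# A complex number exposes its real and imaginary parts as floats.
z = 2 + 3j
print(z.real, z.imag)

# abs() of a complex number gives its magnitude.
print(abs(3 + 4j))
```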

Sequence Data Type in Python

A sequence in Python is an ordered collection of items, which may be of the same or of
different data types. Sequences allow multiple values to be stored in an organized and
efficient fashion. There are several sequence types in Python:
• Python String
• Python List
• Python Tuple

Example:

# Python Program for

# Creation of String

# Creating a String
# with single Quotes
String1 = 'Welcome to the Geeks World'
print("String with the use of Single Quotes: ")
print(String1)

# Creating a String
# with double Quotes
String1 = "I'm a Geek"
print("\nString with the use of Double Quotes: ")
print(String1)
print(type(String1))

# Creating a String
# with triple Quotes
String1 = '''I'm a Geek and I live in a world of "Geeks"'''


print("\nString with the use of Triple Quotes: ")

print(String1)
print(type(String1))

# Creating String with triple

# Quotes allows multiple lines
String1 = '''Geeks
For
Life'''
print("\nCreating a multiline String: ")
print(String1)

Accessing elements of String

In Python, individual characters of a string can be accessed by indexing. Negative
indexing counts from the back of the string: -1 refers to the last character, -2 refers
to the second-last character, and so on.

Example:

# Python Program to Access

# characters of String

String1 = "GeeksForGeeks"
print("Initial String: ")
print(String1)

# Printing First character

print("\nFirst character of String is: ")
print(String1[0])

# Printing Last character

print("\nLast character of String is: ")
print(String1[-1])
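
Positive and negative indices can also be combined into slices, which return a substring instead of a single character. Slicing is not covered in the example above; this is a small sketch using the same string, with s as an illustrative variable name.

```python
s = "GeeksForGeeks"

first = s[0]            # first character
last = s[-1]            # last character
second_last = s[-2]     # second-last character

head = s[0:5]           # characters at positions 0..4
middle = s[5:8]         # characters at positions 5..7
tail = s[-5:]           # last five characters
reversed_s = s[::-1]    # a step of -1 walks the string backwards

print(first, last, second_last)
print(head, middle, tail)
print(reversed_s)
```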

Practice Question

1. Develop a python code to access characters of string.

2. Develop a python code to create a string.
3. Develop a python code to demonstrate numeric values.


Ex No:2 Control structures, functions, scoping rules, dates and times

type() Function

The type() function returns the data type of a value, so it can be used to check the type of
any value or variable.

if , if..else, Nested if, if-elif statements

The following are the conditional statements provided by Python.

• if
• if..else
• Nested if
• if-elif statements

Example 1
# if statement example
if 10 > 5:
    print("10 greater than 5")

print("Program ended")

if..else Statement
An else block can be attached to an if statement. The else block runs when the if
condition is false.

Example 2
# if..else statement example
x = 3
if x == 4:
    print("Yes")
else:
    print("No")

Example 3
You can also chain if..else statements to test more than one condition.
# if..else chain statement
letter = "A"
if letter == "B":
    print("letter is B")
else:
    if letter == "C":
        print("letter is C")
    else:
        if letter == "A":
            print("letter is A")
        else:
            print("letter isn't A, B or C")

Nested if Statement
An if statement can also be placed inside another if statement; this is called a nested
if statement. The inner if condition is checked only if the outer if condition is true,
so the innermost block runs only when all of the conditions are satisfied.

Example 4
# Nested if statement example
num = 10

if num > 5:
    print("Bigger than 5")

    if num <= 15:
        print("Between 5 and 15")

if-elif Statement
The if-elif statement is a shortcut for an if..else chain. An else block can be added at
the end; it runs if none of the preceding if and elif conditions is true.

Example 5

# if-elif statement example
letter = "A"

if letter == "B":
    print("letter is B")
elif letter == "C":
    print("letter is C")
elif letter == "A":
    print("letter is A")
else:
    print("letter isn't A, B or C")


Creating a function in Python

We can create a Python function using the def keyword.

Example 6

# A simple Python function

def fun():
    print("Welcome to GFG")

Calling a Python Function

After creating a function, we can call it by writing the name of the function followed by
parentheses containing the arguments for that function.

Example 7

# A simple Python function

def fun():
    print("Welcome to GFG")

# Driver code to call a function
fun()

Defining and calling a function with parameters

def function_name(parameter: data_type) -> return_type:
    """Docstring"""
    # body of the function
    return expression

Example 8

def add(num1: int, num2: int) -> int:
    """Add two numbers"""
    num3 = num1 + num2
    return num3

# Driver code
num1, num2 = 5, 15
ans = add(num1, num2)
print(f"The addition of {num1} and {num2} results in {ans}.")


Python Function Arguments

Arguments are the values passed inside the parentheses of a function call. A function can
take any number of arguments, separated by commas.

Example 9

# A simple Python function to check
# whether x is even or odd
def evenOdd(x):
    if x % 2 == 0:
        print("even")
    else:
        print("odd")

# Driver code to call the function
evenOdd(2)
evenOdd(3)

Practice Questions

1. Develop python code for scoping rules, dates and times.


Ex No:3 Data Structures (vectors, arrays, matrices, data frames and lists)

NumPy is an array-processing package for Python. It provides a high-performance
multidimensional array object and tools for working with these arrays. It is the fundamental
package for scientific computing with Python.

Arrays in NumPy

A NumPy array is a table of elements (usually numbers), all of the same type, indexed by a
tuple of non-negative integers. In NumPy, the number of dimensions of the array is called the
rank of the array (available as the ndim attribute). A tuple of integers giving the size of the
array along each dimension is known as the shape of the array.

Creating NumPy Array

NumPy arrays can be created in multiple ways, with various ranks. They can be built from
Python sequences such as lists and tuples; the type of the resulting array is deduced from
the type of the elements in the sequences. NumPy also offers several functions that create
arrays with initial placeholder content. These minimize the necessity of growing arrays, an
expensive operation.

Example 1

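import numpy as np

# np.empty allocates an array without initialising it,
# so the values printed below are arbitrary.
b = np.empty(2, dtype = int)
print("Matrix b : \n", b)

a = np.empty([2, 2], dtype = int)
print("\nMatrix a : \n", a)

c = np.empty([3, 3])
print("\nMatrix c : \n", c)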

Example 2
import numpy as np
b = np.zeros(2, dtype = int)
print("Matrix b : \n", b)
a = np.zeros([2, 2], dtype = int)
print("\nMatrix a : \n", a)

c = np.zeros([3, 3])
print("\nMatrix c : \n", c)


Operations on Numpy Arrays

Arithmetic Operations

• Addition:

import numpy as np

# Defining both the matrices

a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Performing addition using arithmetic operator

add_ans = a+b
print(add_ans)

# Performing addition using numpy function

add_ans = np.add(a, b)
print(add_ans)

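# The same operator can be used for more than two arrays
c = np.array([1, 2, 3, 4])
add_ans = a + b + c
print(add_ans)

# np.add takes two inputs (a third positional argument is the
# output array), so chain the calls to add three arrays
add_ans = np.add(np.add(a, b), c)
print(add_ans)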

• Subtraction:
import numpy as np

# Defining both the matrices

a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Performing subtraction using arithmetic operator

sub_ans = a-b
print(sub_ans)

# Performing subtraction using numpy function

sub_ans = np.subtract(a, b)
print(sub_ans)


• Multiplication:

import numpy as np

# Defining both the matrices

a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

# Performing multiplication using arithmetic

# operator
mul_ans = a*b
print(mul_ans)

# Performing multiplication using numpy function

mul_ans = np.multiply(a, b)
print(mul_ans)

NumPy Array Indexing

Indexing can be done in NumPy by using an array as an index. A slice returns a view of the
original array, so changes made through the view also change the original; indexing with an
index array returns a copy. NumPy arrays can be indexed with other arrays or with a list; a
tuple is treated differently, as one index per dimension. The last element is indexed by -1,
the second last by -2, and so on.

Example:
# Python program to demonstrate
# the use of index arrays.
import numpy as np

# Create a sequence of integers from

# 10 to 1 with a step of -2
a = np.arange(10, 1, -2)
print("\n A sequential array with a negative step: \n",a)

# Indexes are specified inside the np.array method.

newarr = a[np.array([3, 1, 2 ])]
print("\n Elements at these indices are:\n",newarr)
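
The difference between a slice (a view) and an index array (a copy) can be checked directly. This is a minimal sketch using the same array as the example above; the names view and picked are illustrative.

```python
import numpy as np

a = np.arange(10, 1, -2)            # the values 10, 8, 6, 4, 2

view = a[1:4]                       # basic slice: shares memory with a
picked = a[np.array([3, 1, 2])]     # index array: independent copy

view[0] = -1                        # writes through to a[1]
picked[0] = 99                      # does not touch a

print(a)
print(np.shares_memory(a, view), np.shares_memory(a, picked))
```

np.shares_memory reports whether two arrays overlap in memory, which is a convenient way to tell a view from a copy.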


Example

import numpy as np
macros = np.array([
[0.8, 2.9, 3.9],
[52.4, 23.6, 36.5],
[55.2, 31.7, 23.9],
[14.4, 11, 4.9]
])

# Create a new array filled with zeros,

# of the same shape as macros.
result = np.zeros_like(macros)

cal_per_macro = np.array([3, 3, 8])

# Now multiply each row of macros by

# cal_per_macro. In Numpy, `*` is
# element-wise multiplication between two arrays.
for i in range(macros.shape[0]):
    result[i, :] = macros[i, :] * cal_per_macro

print(result)
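
The row-by-row loop can usually be replaced by NumPy broadcasting, which stretches the one-dimensional cal_per_macro across every row of macros. The sketch below repeats the arrays from the example so that it runs on its own; looped and broadcast are illustrative names.

```python
import numpy as np

macros = np.array([
    [0.8, 2.9, 3.9],
    [52.4, 23.6, 36.5],
    [55.2, 31.7, 23.9],
    [14.4, 11, 4.9],
])
cal_per_macro = np.array([3, 3, 8])

# Loop version, as in the example: one row at a time.
looped = np.zeros_like(macros)
for i in range(macros.shape[0]):
    looped[i, :] = macros[i, :] * cal_per_macro

# Broadcasting version: shapes (4, 3) and (3,) are compatible,
# so each row is multiplied element-wise in a single expression.
broadcast = macros * cal_per_macro

print(np.allclose(looped, broadcast))
```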

Practice Questions

1. Develop a python code for creating data frames and lists


Ex No:4 Working with CSV files, XML files, Web Data, JSON files, Databases, Excel files

Creating Dataframe from CSV

We can create a dataframe from the CSV files using the read_csv() function.

Example

import pandas as pd

# Reading the CSV file

df = pd.read_csv("Iris.csv")

# Printing top 5 rows

df.head()

Filtering DataFrame

The pandas DataFrame.filter() function is used to subset the rows or columns of a dataframe
according to labels in the specified index. Note that this routine does not filter a dataframe
on its contents. The filter is applied to the labels of the index.

Example

import pandas as pd

# Reading the CSV file

df = pd.read_csv("Iris.csv")

# applying filter function

df.filter(["Species", "SepalLengthCm"]).head()

Sorting DataFrame
In order to sort the data frame in pandas, the function sort_values() is used. Pandas sort_values()
can sort the data frame in Ascending or Descending order.

Example

import pandas as pd

# Reading the CSV file

df = pd.read_csv("Iris.csv")
# sorting by sepal length (ascending by default)
df.sort_values(by=['SepalLengthCm'])


Pandas GroupBy
Groupby is a simple concept: we create groups of categories and apply a function to each
group. Real data science projects involve large amounts of data and repeated experimentation,
and groupby makes such per-category summaries efficient. Groupby refers to a process
involving one or more of the following steps:
• Splitting: we split the data into groups by applying some condition to the dataset.
• Applying: we apply a function to each group independently.
• Combining: we combine the results of applying the function to each group into a single
data structure.

Example
# importing pandas module
import pandas as pd

# Define a dictionary containing employee data

data1 = {'Name': ['Jai', 'Anuj', 'Jai', 'Princi',
'Gaurav', 'Anuj', 'Princi', 'Abhi'],
'Age': [27, 24, 22, 32,
33, 36, 27, 32],
'Address': ['Nagpur', 'Kanpur', 'Allahabad',
'Kannuaj',
'Jaunpur', 'Kanpur',
'Allahabad', 'Aligarh'],
'Qualification': ['Msc', 'MA', 'MCA', 'Phd',
'B.Tech', 'B.com',
'Msc', 'MA']}

# Convert the dictionary into DataFrame

df = pd.DataFrame(data1)

print("Original Dataframe")
print(df)

# applying groupby() function to
# group the data on Name value.
gk = df.groupby('Name')

# Let's print the first entries
# in all the groups formed.
print("After Creating Groups")
print(gk.first())
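
The split, apply and combine steps can be seen in a single aggregation. The sketch below uses a cut-down version of the employee data above; the names people and mean_age are illustrative.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Jai", "Anuj", "Jai", "Princi"],
    "Age": [27, 24, 22, 32],
})

# Split the rows by Name, apply mean() to each group's Age,
# and combine the results into one Series indexed by Name.
mean_age = people.groupby("Name")["Age"].mean()
print(mean_age)

# size() counts how many rows fell into each group.
print(people.groupby("Name").size())
```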

Practice Question
1. Develop a Python code to create and analyze the data by importing XML files, Web
Data, JSON files, Databases, Excel files


Ex No:5 Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.

Univariate analysis is the most basic form of data analysis. It is used when we want to
understand the data in a single variable, without considering causes or relationships with
other variables.

Example

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import math

Here, we will be using the Credit Card Approvals dataset available on Kaggle.

card_approval_df = pd.read_csv("<PATH TO CSV FILE>")

print(card_approval_df.head())

Now let's get a summary of the data using the info() method of the dataframe.

card_approval_df.info()

Now let's note which columns hold categorical data and which columns hold continuous data.

Columns holding categorical data: Gender, Married, BankCustomer, Industry, Ethnicity,
PriorDefault, Employed, DrivingLicense, Citizen, Approved

Columns holding continuous data: Age, Debt, YearsEmployed, CreditScore, Income

Note: The ZipCode column has been dropped because it does not help in the analysis.

Univariate Analysis of continuous Variables

Use the describe() function to get the descriptive statistics of the continuous variables.

card_approval_df[['Age', 'Debt', 'YearsEmployed', 'CreditScore', 'Income']].describe()
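
The individual measures named in the experiment title can also be computed one at a time. The sketch below uses a small made-up series instead of the Kaggle data, so the numbers are illustrative only; values is a hypothetical name.

```python
import pandas as pd

# ASSUMPTION: a small made-up sample, not a column of the Kaggle dataset.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

frequency = values.value_counts()   # how often each value occurs
mean = values.mean()
median = values.median()
mode = values.mode()[0]             # mode() returns a Series; take the first entry

# pandas uses the sample formula (ddof=1) by default;
# ddof=0 gives the population variance and standard deviation.
variance = values.var(ddof=0)
std_dev = values.std(ddof=0)

print(frequency)
print(mean, median, mode, variance, std_dev)
```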

Practice Question

1. Implement Skewness and Kurtosis of univariate analysis by python code.


Ex No:6 Bivariate analysis: Linear and logistic regression modeling

Bivariate descriptive statistics involves simultaneously analyzing (comparing) two variables to
determine whether there is a relationship between them. By convention, when the two variables
are cross-tabulated, the independent variable is represented by the columns and the dependent
variable is represented by the rows.

Linear regression is one of the most widely used statistical modeling techniques in machine
learning. It models the linear relationship between two variables, one being the dependent
variable and the other the independent variable.

Linear regression is a type of supervised learning algorithm, commonly used for predictive
analysis. As the name suggests, it performs regression tasks, that is, it predicts a continuous
value. It is used whenever there is a linear relation between the dependent and the
independent variables.

Y = b0 + b1 * x

Example

Step 1: Load the Boston dataset

Step 2: Have a glance at the shape


Step 3: Have a glance at the dependent and independent variables

Step 4: Visualize the change in the variables

Step 5: Divide the data into independent and dependent variables


Step 6: Split the data into train and test sets

Step 7: Shape of the train and test sets

Step 8: Train the algorithm

Step 9: Retrieve the intercept

Step 10: Retrieve the slope

Step 11: Predicted value

Step 12: Actual value


Step 13: Evaluate the algorithm
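
The steps above can be sketched with scikit-learn. This is only an outline under stated assumptions: it uses synthetic data generated from a known line instead of the Boston dataset (recent scikit-learn releases no longer include load_boston), and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data drawn around y = 2 + 3x,
# not the Boston dataset named in Step 1.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, size=200)

# Independent variable as a 2-D feature matrix, dependent variable as a vector.
X = x.reshape(-1, 1)

# Split into train and test sets and check their shapes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)

# Train the algorithm, then retrieve the intercept (b0) and slope (b1).
model = LinearRegression().fit(X_train, y_train)
print("Intercept b0:", model.intercept_)
print("Slope b1:", model.coef_[0])

# Compare predicted and actual values, then evaluate.
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
```

Because the data were generated from a known line, the fitted intercept and slope should land close to 2 and 3, which gives a quick sanity check on the workflow.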

Practice Question

1. Perform logistic regression analysis in bivariate model using python code.


Ex No:7 Multiple Regression analysis

Multiple regression is a statistical technique for analyzing the relationship between a single
dependent variable and several independent variables. The objective of multiple regression
analysis is to use the independent variables, whose values are known, to predict the value of
the single dependent variable.

There are several types of multiple regression analysis (e.g. standard, hierarchical, setwise,
stepwise), only two of which (standard and stepwise) are considered here.

Example:

Consider 'medv' as the dependent variable and the rest of the attributes as independent variables.

Step 1: Load the Boston dataset


Step 2: Set up the dependent and the independent variables

Step 3: Have a glance at the independent variable

Step 4: Have a glance at the dependent variable


Step 5: Divide the data into train and test sets:

Step 6: Have a glance at the shape of the train and test sets:

Step 7: Train the algorithm:

Step 8: Having a look at the coefficients that the model has chosen:

Step 9: Concatenating the DataFrames to compare:

Step 10: Comparing the predicted value to the actual value:


Step 11: Evaluate the algorithm
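
A possible outline of these steps with scikit-learn and pandas is shown below. It is a sketch under stated assumptions: the data are synthetic (three made-up features x1, x2, x3 and a made-up target), not the Boston dataset with 'medv', and all names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic data with known coefficients (2, -1, 0.5) and intercept 1.
rng = np.random.default_rng(1)
features = pd.DataFrame(rng.normal(size=(300, 3)), columns=["x1", "x2", "x3"])
noise = rng.normal(0, 0.1, size=300)
target = (
    1.0 + 2.0 * features["x1"] - 1.0 * features["x2"] + 0.5 * features["x3"] + noise
).rename("target")

# Divide the data into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)

# Train the algorithm and look at the coefficient chosen for each feature.
model = LinearRegression().fit(X_train, y_train)
coefficients = pd.Series(model.coef_, index=features.columns)
print(coefficients)

# Concatenate actual and predicted values side by side to compare them.
predicted = pd.Series(model.predict(X_test), name="predicted")
comparison = pd.concat([y_test.reset_index(drop=True), predicted], axis=1)
print(comparison.head())

# Evaluate the algorithm.
print("MAE:", mean_absolute_error(y_test, predicted))
```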

Practice Questions

1. Implement python code for Multiple Regression Analysis using various real time
Datasets


Ex No:8 Data Visualization: Apply and explore various plotting functions

Data visualisation is the graphical representation of information and data. By using visual
elements such as charts, graphs, and maps, data visualisation tools provide an accessible
way to find and understand hidden trends and patterns in data.

Univariate Analysis

Univariate Analysis is a type of data visualization where we visualize only a single variable at a
time. Univariate Analysis helps us to analyze the distribution of the variable present in the data
so that we can perform further analysis.

Example:

import pandas as pd

import seaborn as sns

data = pd.read_csv('Employee_dataset.csv')

print(data.head())

Histogram

Here we'll be performing univariate analysis on numerical variables using the histogram
function.

Example:

sns.histplot(data['age'])


Bar Chart

For univariate analysis of categorical data, we'll be using the count plot function from
the seaborn library.

Example

sns.countplot(x=data['gender_full'])

Pie Chart

A pie chart helps us to visualize the percentage of the data belonging to each category.

Example:

import matplotlib.pyplot as plt

x = data['STATUS_YEAR'].value_counts()
plt.pie(x.values,
        labels=x.index,
        autopct='%1.1f%%')
plt.show()


Bivariate analysis

Bivariate analysis is the simultaneous analysis of two variables. It explores the relationship
between two variables: whether an association exists and how strong it is, or whether there
are differences between the two variables and how significant these differences are.

The main three types we will see here are:

1. Categorical vs. Numerical

2. Numerical vs. Numerical

3. Categorical vs. Categorical

Example 1

import matplotlib.pyplot as plt

plt.figure(figsize=(15, 5))

sns.barplot(x=data['department_name'], y=data['length_of_service'])

plt.xticks(rotation=90)


Example 2

sns.scatterplot(x=data['length_of_service'], y=data['age'])

Example 3

sns.countplot(x=data['STATUS_YEAR'], hue=data['STATUS'])


Multivariate Analysis

Multivariate analysis is an extension of bivariate analysis: it involves multiple variables at
the same time in order to find correlations between them. It is a set of statistical models
that examine patterns in multidimensional data by considering several variables at once.

PCA

Example:

from sklearn import datasets, decomposition

iris = datasets.load_iris()

X = iris.data

y = iris.target

pca = decomposition.PCA(n_components=2)

X = pca.fit_transform(X)

sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y)

HeatMap

Here we use a heat map to check the correlation between all the columns in the dataset. A heat
map is a data visualisation technique that shows the magnitude of a phenomenon as colour in
two dimensions. Correlation values range from -1 to 1, where -1 means a strong negative
correlation, +1 means a strong positive correlation, and 0 means no linear correlation.


Example

sns.heatmap(data.corr(numeric_only=True), annot=True)
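
The matrix that the heat map colours can be inspected on its own. The sketch below uses a tiny made-up dataframe rather than the employee dataset; the column names x, double_x, minus_x and label are hypothetical.

```python
import pandas as pd

# ASSUMPTION: made-up columns chosen so the correlations are easy to predict.
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "double_x": [2, 4, 6, 8, 10],      # rises with x: correlation +1
    "minus_x": [-1, -2, -3, -4, -5],   # falls as x rises: correlation -1
    "label": ["a", "b", "a", "b", "a"],
})

# numeric_only=True skips the text column, which has no correlation.
corr = toy.corr(numeric_only=True)
print(corr)
```

Passing this matrix to sns.heatmap with annot=True would draw one coloured cell per pair of numeric columns.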

Practice Questions

1. Perform bivariate analysis to understand the relationship between two variables using
scatterplots, correlation coefficients, and simple linear regression.


Ex No:9 Visualizing Geographic Data with Basemap

Basemap works alongside Matplotlib to allow you to plot via latitude and longitude
coordinates.

Once you have Basemap installed, you can use the following code to quickly show a simple
map. This just renders and displays a map; plotting data, zooming, and other features
build on it.

Example 1

from mpl_toolkits.basemap import Basemap

import matplotlib.pyplot as plt

m = Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')

m.drawcoastlines()

m.fillcontinents()

m.drawmapboundary()

plt.title("Quick Basemap Example!")

plt.show()

Practice Questions

1. Try other values of the resolution parameter in Basemap and visualize geographic data
with Basemap using Python code.


Ex No:10 Clustering in Data Analytics

Clustering is the process of separating different parts of data based on common characteristics.
Disparate industries including retail, finance and healthcare use clustering techniques for various
analytical tasks. In retail, clustering can help identify distinct consumer populations, which can
then allow a company to create targeted advertising based on consumer demographics that may
be too complicated to inspect manually. In finance, clustering can detect different forms
of illegal market activity like orderbook spoofing in which traders deceitfully place large orders
to pressure other traders into buying or selling an asset. In healthcare, clustering methods have
been used to figure out patient cost patterns, early onset neurological disorders and cancer gene
expression.

DATA CLUSTERING TECHNIQUES IN PYTHON

• K-means clustering

• Gaussian mixture models

• Spectral clustering

Example

Step 1: Read Data

import pandas as pd
df = pd.read_csv("Mall_Customers.csv")
print(df.head())

K-Means Clustering in Python

K-means clustering in Python is a type of unsupervised machine learning, which means that the
algorithm only trains on inputs and no outputs. It works by finding the distinct groups of data
(i.e., clusters) that are closest together. Specifically, it partitions the data into clusters in which
each point falls into a cluster whose mean is closest to that data point.

Example

from sklearn.cluster import KMeans

X = df[['Age', 'Spending Score (1-100)']].copy()

# Within-cluster sum of squares (WCSS) for 1 to 10 clusters
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

Finally, we can plot the WCSS versus the number of clusters. First, let’s import Matplotlib and
Seaborn, which will allow us to create and format data visualizations:

import matplotlib.pyplot as plt

import seaborn as sns
sns.set()
plt.plot(range(1, 11), wcss)
plt.title('Selecting the Number of Clusters using the Elbow Method')

plt.xlabel('Clusters')
plt.ylabel('WCSS')
plt.show()
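
Once a number of clusters has been chosen from the elbow plot, the model is fitted again with that value to obtain a cluster label for each point. The sketch below assumes four clusters and uses synthetic (age, spending score) points around four made-up centres instead of Mall_Customers.csv; all names and values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# ASSUMPTION: synthetic points around four made-up (age, spending score) centres.
rng = np.random.default_rng(0)
centres = np.array([[22, 50], [28, 85], [45, 20], [65, 50]])
X = np.vstack([c + rng.normal(0, 2.0, size=(50, 2)) for c in centres])

# Fit K-means with the chosen number of clusters and label every point.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)    # one (age, score) centre per cluster
print(np.bincount(labels))        # number of points in each cluster
```

With real data the labels can be attached to the dataframe as a new column and plotted with a scatter plot coloured by cluster.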


With four clusters, K-means groups the customers as follows:

1. Young customers with a moderate spending score.

2. Young customers with a high spending score.

3. Middle-aged customers with a low spending score.

4. Senior customers with a moderate spending score.

This type of information can be very useful to retail companies looking to target specific
consumer demographics. For example, if most people with high spending scores are younger,
the company can target those populations with advertisements and promotions.

Practice Questions

1. Perform Gaussian mixture models and Spectral clustering using python code.


Ex No:11 Experiment on Tableau (Sample – superstore.csv)

a. The first step is to connect to the data you want to explore. This example
shows how to connect to the Sample - Superstore data in Tableau Desktop.
b. Open Tableau. On the start page, under Connect, click Text file (use Microsoft
Excel instead if your copy of the data is an Excel workbook).
In the Open dialog box, navigate to the Sample - Superstore CSV file on your
computer. Select Sample - Superstore, and then click Open.
c. After you connect to the CSV file, the Data Source page shows the sheets or
tables in your data. Drag the "Orders" table to the canvas to start
exploring that data.
d. Click the sheet tab to go to the new worksheet and begin your analysis.
