Da Lab Record
B.TECH
IV YEAR/I SEM
Reg NO:___________________
NAME:____________________
UNIVERSAL LEARNING
CERTIFICATE
Index
Ex. No.    Date    Name of the Experiment    Pg. No.
4    Working with CSV files, XML files, Web Data, JSON files, Databases, Excel files
Ex No:1 Overview of data types and objects, reading and writing data
Data types classify data items. A data type describes the kind of value an item holds and
determines which operations can be performed on it. Since everything is an object in Python,
data types are classes and variables are instances (objects) of these classes. The following are
the standard, built-in data types in Python:
Numeric
Sequence Type
Boolean
Set
Dictionary
Binary Types (memoryview, bytearray, bytes)
Example:
The numeric data type in Python represents data that has a numeric value. A numeric value
can be an integer, a floating-point number, or a complex number. These values are represented
by the int, float, and complex classes in Python.
Integers – This value is represented by the int class. It contains positive or negative whole
numbers (without fractions or decimals). In Python, there is no limit to how long an
integer value can be.
Float – This value is represented by the float class. It is a real number with a floating-point
representation, written with a decimal point. Optionally, the character e or E
followed by a positive or negative integer may be appended to specify scientific notation.
Example:
# Python program to
# demonstrate numeric value
a=5
print("Type of a: ", type(a))
b = 5.0
print("\nType of b: ", type(b))
c = 2 + 4j
print("\nType of c: ", type(c))
A sequence in Python is an ordered collection of items of the same or different data types.
Sequences allow multiple values to be stored in an organized and efficient fashion. There are
several sequence types in Python –
Python String
Python List
Python Tuple
Example:
# Creating a String
# with single Quotes
String1 = 'Welcome to the Geeks World'
print("String with the use of Single Quotes: ")
print(String1)
# Creating a String
# with double Quotes
String1 = "I'm a Geek"
print("\nString with the use of Double Quotes: ")
print(String1)
print(type(String1))
# Creating a String
# with triple Quotes
String1 = '''I'm a Geek and I live in a world of "Geeks"'''
Example:
String1 = "GeeksForGeeks"
print("Initial String: ")
print(String1)
Practice Question
type() Function
To check the data type of a value or variable, we use the type() function.
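The short sketch below applies type() to one value of each built-in category listed earlier. The sample values are made up for illustration.

```python
# Inspect the class of several built-in values with type().
samples = [5, 5.0, 2 + 4j, "text", [1, 2], (1, 2), {1, 2}, {"k": 1}, True, b"ab"]

# type(value) returns the class; __name__ gives its short name.
type_names = [type(value).__name__ for value in samples]

for value, name in zip(samples, type_names):
    print(f"{value!r:>12} -> {name}")
```

Each line of output pairs a value with the name of its class, for example int for 5 and complex for 2 + 4j.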
Example 1
# if statement example
if 10 > 5:
    print("10 greater than 5")
print("Program ended")
if..else Statement
In an if..else statement, an additional block of code is attached as the else branch, which
runs when the if condition is false.
Example 2
# if..else statement example
x = 3
if x == 4:
    print("Yes")
else:
    print("No")
Example 3
You can also chain if..else statement with more than one condition.
# if..else chain statement
letter = "A"
if letter == "B":
    print("letter is B")
else:
    if letter == "C":
        print("letter is C")
    else:
        if letter == "A":
            print("letter is A")
        else:
            print("letter isn't A, B and C")
Nested if Statement
An if statement can also be placed inside another if statement. This is called a
nested if statement. The inner if condition is checked only if the outer if condition
is true, which lets us require several conditions to be satisfied together.
Example 4
# Nested if statement example
num = 10
if num > 5:
    print("Bigger than 5")
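A minimal sketch of a full nested if, in which a second condition is tested only after the first one passes. The function name describe and the thresholds 5 and 15 are assumptions chosen for illustration.

```python
def describe(num):
    """Label a number using a nested if statement."""
    if num > 5:
        # This inner test runs only when the outer condition is true.
        if num <= 15:
            return "between 5 and 15"
        return "bigger than 15"
    return "5 or smaller"


for value in (3, 10, 20):
    print(value, "->", describe(value))
```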
if-elif Statement
The if-elif statement is a shortcut for the if..else chain. An else block can be added at the
end, which runs if none of the preceding if or elif conditions is true.
Example 5
letter = "A"
if letter == "B":
    print("letter is B")
else:
    print("letter isn't A, B or C")
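The if..else chain from Example 3 can be written flat with elif. This is a sketch only: the function name classify is an assumption, and the messages are taken from the earlier examples.

```python
def classify(letter):
    """Return a message for the letter using an if-elif-else ladder."""
    if letter == "B":
        return "letter is B"
    elif letter == "C":
        return "letter is C"
    elif letter == "A":
        return "letter is A"
    else:
        # Reached only when none of the conditions above is true.
        return "letter isn't A, B or C"


print(classify("A"))
print(classify("Z"))
```

Only the first branch whose condition is true runs, so the order of the elif tests matters when conditions overlap.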
Example 6
def fun():
    print("Welcome to GFG")
Example 7
def function_name(parameters):
    """Docstring"""
    return expression
Example 8
# Add two numbers and return the result
def add(num1, num2):
    num3 = num1 + num2
    return num3

# Driver code
num1, num2 = 5, 15
ans = add(num1, num2)
print(f"The addition of {num1} and {num2} results {ans}.")
Example 9
Practice Questions
Ex No:3 Data Structures (vectors, arrays, matrices, data frames and lists)
Arrays in NumPy
A NumPy array is a table of elements (usually numbers), all of the same type, indexed by a tuple
of non-negative integers. In NumPy, the number of dimensions of the array is called the rank of the
array. A tuple of integers giving the size of the array along each dimension is known as the shape
of the array.
NumPy arrays can be created in multiple ways, with various ranks. They can also be created from
other data structures such as lists and tuples. The type of the resultant array is deduced from
the type of the elements in the sequences. NumPy offers several functions to create arrays with
initial placeholder content. These minimize the necessity of growing arrays, an expensive
operation.
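As a small illustration of rank, shape, and type deduction, the sketch below builds one array from a list and one from nested tuples. The values are arbitrary.

```python
import numpy as np

from_list = np.array([1, 2, 3])             # rank 1, built from a list
from_tuple = np.array(((1, 2.5), (3, 4)))   # rank 2, built from nested tuples

# ndim is the rank, shape is the size along each dimension,
# dtype is the element type NumPy deduced from the input.
print(from_list.ndim, from_list.shape, from_list.dtype)
print(from_tuple.ndim, from_tuple.shape, from_tuple.dtype)
```

Because one element of the second input is a float (2.5), NumPy stores the whole array as floating point.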
Example 1
import numpy as np
c = np.empty([3, 3])
print("\nMatrix c : \n", c)
Example 2
import numpy as np
b = np.zeros(2, dtype = int)
print("Matrix b : \n", b)
a = np.zeros([2, 2], dtype = int)
print("\nMatrix a : \n", a)
c = np.zeros([3, 3])
print("\nMatrix c : \n", c)
Arithmetic Operations
Addition:
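import numpy as np
# Illustrative input arrays of the same shape
a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])
add_ans = np.add(a, b)
print(add_ans)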
Subtraction:
import numpy as np
Multiplication
import numpy as np
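A minimal sketch of element-wise subtraction and multiplication with np.subtract() and np.multiply(). The input arrays are illustrative values, not data from this record.

```python
import numpy as np

a = np.array([5, 72, 13, 100])
b = np.array([2, 5, 10, 30])

sub_ans = np.subtract(a, b)   # same as a - b
mul_ans = np.multiply(a, b)   # same as a * b (element-wise, not a matrix product)

print(sub_ans)
print(mul_ans)
```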
Indexing can be done in NumPy by using an array as an index. A slice returns a view (a
shallow copy) of the array, whereas an index array returns a copy of the selected elements.
NumPy arrays can be indexed with other arrays or any other sequence, with the
exception of tuples. The last element is indexed by -1, the second last by -2, and so on.
Example:
# Python program to demonstrate
# the use of index arrays.
import numpy as np
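The sketch below contrasts an index array, which gives a copy, with a slice, which gives a view. The array contents and variable names are chosen only for illustration.

```python
import numpy as np

arr = np.arange(10, 1, -2)           # [10, 8, 6, 4, 2]
picked = arr[np.array([3, 1, 2])]    # index array: new array holding arr[3], arr[1], arr[2]
sliced = arr[1:4]                    # slice: a view that shares memory with arr

picked[0] = -1   # changes only the copy
sliced[0] = 99   # writes through to arr[1]

print(picked)
print(arr)
print(arr[-1], arr[-2])   # last and second-last elements
```

After the two assignments, arr has changed at position 1 because of the slice, while the change made through the index-array result is not visible in arr.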
Example
import numpy as np
macros = np.array([
[0.8, 2.9, 3.9],
[52.4, 23.6, 36.5],
[55.2, 31.7, 23.9],
[14.4, 11, 4.9]
])
Practice Questions
Ex No:4 Working with CSV files, XML files, Web Data, JSON files, Databases, Excel files
We can create a DataFrame from a CSV file using the pandas read_csv() function.
Example
import pandas as pd
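A minimal sketch of read_csv(). To keep it self-contained, the CSV text is held in memory with io.StringIO; the column names and rows are hypothetical, and in practice a file path would be passed instead.

```python
import io
import pandas as pd

# Hypothetical CSV content; normally this would be a path such as "data.csv".
csv_text = "name,age,city\nAsha,28,Chennai\nRavi,35,Mumbai\nMeena,41,Delhi\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())    # first rows of the DataFrame
print(df.dtypes)    # column types inferred by pandas
```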
Filtering DataFrame
Example
import pandas as pd
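Rows can be filtered with a boolean condition inside the square brackets. The sketch below uses a small made-up DataFrame; the column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Kiran"],
    "age": [28, 35, 41, 23],
    "city": ["Chennai", "Mumbai", "Delhi", "Chennai"],
})

# Single condition: keep rows where age is above 30.
older = df[df["age"] > 30]

# Two conditions combined with & (each wrapped in parentheses).
chennai_young = df[(df["city"] == "Chennai") & (df["age"] < 30)]

print(older)
print(chennai_young)
```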
Sorting DataFrame
In order to sort the data frame in pandas, the function sort_values() is used. Pandas sort_values()
can sort the data frame in Ascending or Descending order.
Example
import pandas as pd
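A short sketch of sort_values() in ascending order, descending order, and on two columns. The DataFrame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Kiran"],
    "age": [28, 35, 41, 23],
    "city": ["Chennai", "Mumbai", "Delhi", "Chennai"],
})

by_age = df.sort_values("age")                         # ascending (default)
by_age_desc = df.sort_values("age", ascending=False)   # descending
# Sort by city A-Z, then by age from high to low within each city.
by_city_then_age = df.sort_values(["city", "age"], ascending=[True, False])

print(by_age)
print(by_age_desc)
print(by_city_then_age)
```

sort_values() returns a new DataFrame and leaves df unchanged unless inplace=True is passed.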
Pandas GroupBy
Groupby is a simple concept: we group rows into categories and apply a function
to each category. Real data science projects involve large amounts of data and
repeated experimentation, so groupby is used for efficiency. Groupby
refers to a process involving one or more of the following steps:
Splitting: the data is split into groups by applying some condition
to the dataset.
Applying: a function is applied to each group independently.
Combining: the results of applying the function to each group are
combined into a single data structure.
Example
# importing pandas module
import pandas as pd
print("Original Dataframe")
display(df)
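The split, apply, and combine steps can be sketched as follows. The DataFrame, its column names, and the chosen aggregations are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "B", "A", "B", "A"],
    "points": [10, 4, 6, 8, 2],
})

# Split rows by team, apply sum and mean to each group's points,
# and combine the results into one DataFrame indexed by team.
summary = df.groupby("team")["points"].agg(["sum", "mean"])

print("Original Dataframe")
print(df)
print("Grouped summary")
print(summary)
```

print() is used here instead of display() so the sketch also runs outside a Jupyter notebook.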
Practice Question
1. Develop a Python code to create and analyze the data by importing XML files, Web
Data, JSON files, Databases, Excel files
Example
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import math
Now let's get a summary of the data using the info() method of the dataframe.
card_approval_df.info()
Now let’s mention which columns hold categorical data and which columns hold continuous data
Note: I have dropped the ZipCode column because that column won’t help in analysis.
Practice Question
Linear regression is one of the most widely used statistical modeling techniques in Machine
Learning. It models the linear relationship between two variables, one being the dependent
variable and the other the independent variable.
Linear regression is a type of supervised learning algorithm, commonly used for predictive
analysis. As the name suggests, it performs regression tasks, and it is used whenever there is
a linear relation between the dependent and the independent variables.
Y = b0 + b1 * x
Example
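A minimal sketch of fitting Y = b0 + b1 * x by ordinary least squares with NumPy, where b0 is the intercept and b1 is the slope. The x and y values are made up for illustration. The closed-form formulas are the standard least-squares estimates and are not necessarily the method used in the original exercise.

```python
import numpy as np

# Made-up sample data with a roughly linear trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

x_mean, y_mean = x.mean(), y.mean()

# Slope: covariance of x and y divided by the variance of x.
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# Intercept: the fitted line passes through the point of means.
b0 = y_mean - b1 * x_mean

y_pred = b0 + b1 * x
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print("Predictions:", np.round(y_pred, 2))
```

np.polyfit(x, y, 1) returns the same slope and intercept and can be used as a cross-check.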
Practice Question
Multiple regression is a statistical technique that can be used to analyze the relationship between
a single dependent variable and several independent variables. The objective of multiple
regression analysis is to use the independent variables, whose values are known, to predict the
value of the single dependent variable.
There are several types of multiple regression analysis (e.g. standard, hierarchical, setwise,
stepwise), only two of which will be presented here (standard and stepwise).
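A minimal sketch of standard multiple regression, in which all independent variables enter the model together, solved by least squares with np.linalg.lstsq. The data is synthetic and generated from known coefficients, so the recovered values can be checked. It is not one of the datasets used in this exercise.

```python
import numpy as np

# Synthetic data: two independent variables x1, x2 and a noise-free target
# generated from y = 1 + 2*x1 - 3*x2.
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the model includes an intercept b0.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A @ coef = y.
coef, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

With real data the target contains noise, so the fitted coefficients only approximate the underlying relationship.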
Example:
Consider 'medv' as the dependent variable and the rest of the attributes as independent variables.
Step 6: Have a glance at the shape of the train and test sets:
Step 8: Having a look at the coefficients that the model has chosen:
Practice Questions
1. Implement python code for Multiple Regression Analysis using various real time
Datasets
Data visualisation is the graphical representation of information and data. By using visual
elements such as charts, graphs, and maps, data visualisation tools give us an accessible
way to find and understand hidden trends and patterns in data.
Univariate Analysis
Univariate Analysis is a type of data visualization where we visualize only a single variable at a
time. Univariate Analysis helps us to analyze the distribution of the variable present in the data
so that we can perform further analysis.
Example:
import pandas as pd
data = pd.read_csv('Employee_dataset.csv')
print(data.head())
Histogram
Example:
sns.histplot(data['age'])
Bar Chart
A bar chart supports univariate analysis of categorical data. We'll be using the countplot
function from the seaborn library.
Example
sns.countplot(x=data['gender_full'])
Pie Chart
A piechart helps us to visualize the percentage of the data belonging to each category.
Example:
x = data['STATUS_YEAR'].value_counts()
plt.pie(x.values,
        labels=x.index,
        autopct='%1.1f%%')
plt.show()
Bivariate analysis
Bivariate analysis is the simultaneous analysis of two variables. It explores the
relationship between two variables: whether an association exists and how strong it is,
or whether the two variables differ and how significant those
differences are.
Example 1
plt.figure(figsize=(15, 5))
sns.barplot(x=data['department_name'], y=data['length_of_service'])
plt.xticks(rotation=90)
Example 2
sns.scatterplot(x=data['length_of_service'], y=data['age'])
Example 3
sns.countplot(x=data['STATUS_YEAR'], hue=data['STATUS'])
Multivariate Analysis
Multivariate analysis is an extension of bivariate analysis: it involves multiple variables at the
same time to find correlations between them. It is a set of statistical models that
examine patterns in multidimensional data by considering several variables at once.
PCA
Example:
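from sklearn import datasets, decomposition

# Load the iris data: 4 numeric features, 3 classes
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Project the 4 features onto the first 2 principal components
pca = decomposition.PCA(n_components=2)
X = pca.fit_transform(X)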
HeatMap
Here we use a heat map to check the correlation between all the numeric columns in the dataset.
A heat map is a data visualisation technique that shows the magnitude of a quantity as colour in
two dimensions. Correlation values range from -1 to +1, where -1 means a perfect negative
correlation, +1 means a perfect positive correlation, and 0 means no linear correlation.
Example
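# numeric_only=True (pandas 1.5+) leaves out text columns before computing correlations
sns.heatmap(data.corr(numeric_only=True), annot=True)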
Practice Questions
1. Perform bivariate analysis to understand the relationship between two variables using
scatterplots, correlation coefficients, and simple linear regression.
Basemap works alongside Matplotlib to allow you to plot via latitude and longitude
coordinates.
Once you have basemap installed, you can use the following code to quickly show a simple
map. This will just render and display a map, but soon we'll be plotting, zooming, and more
fun things!
Example 1
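from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

# Miller projection covering the whole globe at crude ('c') resolution
m = Basemap(projection='mill', llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines()
m.fillcontinents()
m.drawmapboundary()
plt.show()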
Practice Questions
1. Try additional resolution settings in Basemap and visualize geographic data
with Basemap using Python code.
Clustering is the process of separating different parts of data based on common characteristics.
Disparate industries including retail, finance and healthcare use clustering techniques for various
analytical tasks. In retail, clustering can help identify distinct consumer populations, which can
then allow a company to create targeted advertising based on consumer demographics that may
be too complicated to inspect manually. In finance, clustering can detect different forms
of illegal market activity like orderbook spoofing in which traders deceitfully place large orders
to pressure other traders into buying or selling an asset. In healthcare, clustering methods have
been used to figure out patient cost patterns, early onset neurological disorders and cancer gene
expression.
K-means clustering
Spectral clustering
Example
import pandas as pd
df = pd.read_csv("Mall_Customers.csv")
print(df.head())
K-means clustering in Python is a type of unsupervised machine learning, which means that the
algorithm only trains on inputs and no outputs. It works by finding the distinct groups of data
(i.e., clusters) that are closest together. Specifically, it partitions the data into clusters in which
each point falls into a cluster whose mean is closest to that data point.
Example
kmeans.fit(X)
wcss.append(kmeans.inertia_)
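To make the idea concrete, here is a minimal NumPy sketch of K-means (Lloyd's algorithm) that also computes the within-cluster sum of squares (WCSS) for several values of k. It is not the scikit-learn implementation: the helper kmeans_wcss, its parameters, and the synthetic two-blob data are all assumptions for illustration. The inertia_ attribute of scikit-learn's KMeans reports the same kind of quantity.

```python
import numpy as np


def kmeans_wcss(X, k, n_iter=20, seed=0):
    """Run a basic K-means and return (labels, centers, wcss). Hypothetical helper."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    # Final assignment and WCSS: sum of squared distances to the closest center.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    wcss = float(np.sum(dists[np.arange(len(X)), labels] ** 2))
    return labels, centers, wcss


# Synthetic 2-D data: two well-separated blobs (made up for illustration).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(5, 0.5, (30, 2))])

# WCSS for k = 1..5, the values one would plot for the elbow method.
wcss = [kmeans_wcss(X, k)[2] for k in range(1, 6)]
for k, value in enumerate(wcss, start=1):
    print(k, round(value, 2))
```

The value of k at which the WCSS curve stops dropping sharply is usually taken as the number of clusters.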
Finally, we can plot the WCSS versus the number of clusters. First, let’s import Matplotlib and
Seaborn, which will allow us to create and format data visualizations:
plt.xlabel('Clusters')
plt.ylabel('WCSS')
plt.show()
We can see that K-means found four clusters, which break down thusly:
This type of information can be very useful to retail companies looking to target specific
consumer demographics. For example, if most people with high spending scores are younger,
the company can target those populations with advertisements and promotions.
Practice Questions
1. Perform Gaussian mixture models and Spectral clustering using python code.
a. The first step is to connect to the data you want to explore. This example
shows how to connect to Sample - Superstore data in Tableau Desktop.
b. Open Tableau. On the start page, under Connect, click Microsoft Excel.
In the Open dialog box, navigate to the Sample - Superstore Excel file on your
computer. Select Sample - Superstore, and then click Open.
c. After you connect to the Excel file, the Data Source page shows the sheets or
tables in your data. Drag the "Orders" table to the canvas to start
exploring that data.
d. Click the sheet tab to go to the new worksheet and begin your analysis.