Python Record
LAB MANUAL
PYTHON FOR DATA SCIENCE
(22ANAL47)
Submitted by
BONAFIDE CERTIFICATE
This is to certify that this is the bona fide lab record of practical work done in
the Business Analytics Lab of P.B. Siddhartha College of Arts and Science during
the Academic Year 2023-2024.
Signature of HOD
D. Vasu
Department of Business Analytics
DECLARATION
Signature of Student
CONTENTS
Subtraction (-)
The subtraction operator (-) subtracts the second operand from the first.
Syntax:
result = operand1 - operand2
Multiplication (*)
The multiplication operator (*) multiplies two operands.
Syntax:
result = operand1 * operand2
Division (/)
The division operator (/) divides the first operand by the second.
Syntax:
result = operand1 / operand2
Floor Division (//)
The floor division operator (//) divides the first operand by the second and
rounds the result down to the nearest whole number (toward negative infinity).
Syntax:
result = operand1 // operand2
Modulus (%)
The modulus operator (%) returns the remainder when the first operand is
divided by the second.
Syntax:
result = operand1 % operand2
Exponentiation (**)
The exponentiation operator (**) raises the first operand to the power of the
second.
Syntax:
result = operand1 ** operand2
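The arithmetic operators described above can be tried together in a short sketch. The operand values 17 and 5 are illustrative assumptions, not values from the original record.

```python
# Illustrative operands (assumed values, chosen only for demonstration)
operand1 = 17
operand2 = 5

difference = operand1 - operand2        # subtraction
product = operand1 * operand2           # multiplication
quotient = operand1 / operand2          # true division always returns a float
floor_quotient = operand1 // operand2   # floor division
remainder = operand1 % operand2         # modulus
power = operand1 ** 2                   # exponentiation

# Floor division rounds toward negative infinity, not toward zero,
# and the remainder takes the sign of the second operand
negative_floor = -17 // 5
negative_remainder = -17 % 5

print(difference, product, quotient, floor_quotient, remainder, power)
print(negative_floor, negative_remainder)
```

The last two lines show why // is called floor division: -17 // 5 gives -4 rather than -3.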
Assignment operators: - Assignment operators are used to assign values to
variables. Python provides various assignment operators to perform assignments
efficiently and concisely. These operators not only assign values but also
perform operations simultaneously, making code more readable and compact.
Assignment (=)
The assignment operator (=) assigns the value on the right side to the variable
on the left side.
Syntax:
variable = value
Increment (+=)
The increment assignment operator (+=) adds the value on the right side to the
variable on the left side and assigns the result to the variable.
Syntax:
variable += value
Decrement (-=)
The decrement assignment operator (-=) subtracts the value on the right side
from the variable on the left side and assigns the result to the variable.
Syntax:
variable -= value
Equal to (==)
The equal to operator (==) compares two operands and returns True if they are
equal, otherwise False.
Syntax:
result = operand1 == operand2
Logical OR (or)
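The logical OR operator (or) returns True if at least one of its Boolean operands
is True, otherwise it returns False. With non-Boolean operands, it returns the
first truthy operand, or the last operand if none is truthy.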
Syntax:
result = operand1 or operand2
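As a minimal sketch of the equal to and logical OR operators, the following uses illustrative values that are assumptions made for demonstration only.

```python
# Illustrative values (assumptions for demonstration only)
operand1 = 10
operand2 = 10.0

# == compares values, so an int and a float with the same value are equal
values_equal = operand1 == operand2

# With Boolean operands, `or` yields True if at least one operand is True
either_true = False or True
both_false = False or False

# With non-Boolean operands, `or` returns the first truthy operand,
# or the last operand if none is truthy
first_truthy = "" or "default"
none_truthy = 0 or ""

print(values_equal, either_true, both_false, first_truthy, repr(none_truthy))
```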
Syntax:
result = variable1 is variable2
Syntax:
result = variable1 is not variable2
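The two syntax lines above use Python's identity operators, is and is not, which test whether two names refer to the same object rather than to equal values. The following sketch contrasts identity with equality; the list contents and the name variable3 are illustrative assumptions.

```python
# Two separate list objects with equal contents (illustrative values)
variable1 = [1, 2, 3]
variable2 = [1, 2, 3]
variable3 = variable1   # a second name for the same object (hypothetical name)

same_value = variable1 == variable2            # contents are equal
same_object = variable1 is variable2           # but they are distinct objects
alias_is_same = variable1 is variable3         # both names refer to one object
different_object = variable1 is not variable2

print(same_value, same_object, alias_is_same, different_object)
```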
Bitwise OR (|):
Syntax: result = operand1 | operand2
Performs a bitwise OR operation between corresponding bits of two operands.
Sets each bit to 1 if at least one of the bits is 1.
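A small sketch of the bitwise OR operation follows. The operands are written in binary so the bit positions are visible; the values themselves are illustrative assumptions.

```python
# Illustrative operands written as binary literals
operand1 = 0b1010   # decimal 10
operand2 = 0b0110   # decimal 6

# Each result bit is 1 if the corresponding bit of either operand is 1
result = operand1 | operand2

print(result, bin(result))
```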
Introduction:
Lists are an essential data structure in Python, serving as dynamic containers
that store ordered collections of elements. Whether you're managing a sequence
of numbers, strings, or even complex objects, a single list can hold items of
any type and can grow or shrink as needed. Lists are mutable, allowing for easy
modification, iteration, and manipulation.
i. Length of list:
The length of a list refers to the number of elements it contains. In Python, you
can determine the length of a list using the built-in function len().
For example:
my_list = [1, 2, 3, 4, 5]
length = len(my_list)
print("Length of the list:", length)
not in: Negated membership operator, used to check if an element is not present
in a list.
These operators and functions provide powerful tools for working with lists in
Python.
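The membership operators can be sketched as follows. The list contents are illustrative assumptions.

```python
# Illustrative list (assumed values)
my_list = [1, 2, 3, 4, 5]

has_three = 3 in my_list          # membership test
missing_nine = 9 not in my_list   # negated membership test

# Membership tests are often used to guard an operation
if 6 not in my_list:
    my_list.append(6)

print(has_three, missing_nine, my_list)
```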
LAB – 3
Programme for Sets & Dictionaries
1. Sets:
Sets in Python are unordered collections of unique elements. They are defined
by enclosing elements within curly braces {} or by using the set() constructor.
Sets are useful for tasks where you need to eliminate duplicates from a
collection or perform operations like intersection, union, and difference
efficiently.
Here are some of the common set operators and methods in Python:
i. Creating Sets:
Sets can be created using curly braces {} or the set() constructor. Note that
empty braces {} create an empty dictionary, so an empty set must be created
with set().
set1 = {1, 2, 3, 4, 5}
set2 = set([4, 5, 6, 7, 8])
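Using the two sets created above, the operations mentioned earlier (union, intersection, difference, and duplicate removal) can be sketched as follows. The list passed to set() for duplicate removal is an illustrative assumption.

```python
set1 = {1, 2, 3, 4, 5}
set2 = set([4, 5, 6, 7, 8])

union = set1 | set2          # elements in either set
intersection = set1 & set2   # elements in both sets
difference = set1 - set2     # elements in set1 but not in set2

# Converting a list to a set removes duplicates (illustrative list)
unique_values = set([1, 1, 2, 2, 3])

# Empty braces create a dictionary; set() creates an empty set
empty_braces_type = type({})
empty_set_type = type(set())

print(union, intersection, difference, unique_values)
```

The same results are available through the methods set1.union(set2), set1.intersection(set2), and set1.difference(set2).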
2. Dictionary:
A dictionary in Python is a collection of key-value pairs; since Python 3.7 it
preserves the order in which keys were inserted. Each key is unique and
associated with a value, similar to a real-world dictionary where words (keys)
have corresponding definitions (values). Dictionaries are mutable, meaning
their contents can be modified after creation.
i. Creation:
You can create a dictionary by enclosing comma-separated key-value pairs
within curly braces {}.
For example:
my_dict = {'apple': 3, 'banana': 5, 'orange': 2}
ii. Accessing values:
You can access the value associated with a key using square brackets [] and the
key itself.
For example:
print(my_dict['apple'])
vi. Length:
You can determine the number of key-value pairs in a dictionary using the len()
function:
print(len(my_dict))
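Because dictionaries are mutable, entries can be updated, added, and removed after creation. The sketch below reuses the dictionary from the examples above; the key 'mango' and the new values are illustrative assumptions.

```python
my_dict = {'apple': 3, 'banana': 5, 'orange': 2}

my_dict['apple'] = 4      # update the value of an existing key
my_dict['mango'] = 7      # add a new key-value pair (hypothetical key)
del my_dict['banana']     # remove a key-value pair

# get() returns a default instead of raising KeyError for a missing key
banana_count = my_dict.get('banana', 0)

keys_in_order = list(my_dict)   # keys in insertion order
print(my_dict, banana_count, len(my_dict))
```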
LAB – 4
Functions for All Arithmetic Operators
In Python, functions serve as reusable blocks of code that perform specific
tasks. They encapsulate a set of instructions, allowing you to execute them
multiple times without having to rewrite the code. Functions enhance code
readability, maintainability, and reusability, making them a fundamental
concept in Python programming.
The code defines a simple Python program that performs arithmetic operations
based on user input. Let's break down how it works:
Function Definitions:
Four functions are defined: add(), subtract(), multiply(), and divide(). Each
function takes two parameters (P and Q) representing the numbers on which the
respective operation will be performed.
Each function returns the result of the corresponding arithmetic operation:
addition, subtraction, multiplication, or division.
User Interface:
The program prompts the user to select an operation by displaying options for
addition, subtraction, multiplication, and division.
The user is asked to input their choice (a, b, c, or d) corresponding to the desired
operation.
The user is prompted to enter two numbers (num_1 and num_2) on which the
selected operation will be performed.
Conditional Statements:
The program uses conditional statements (if, elif, else) to determine which
operation to perform based on the user's choice.
Depending on the user's choice, the program calls the corresponding function
(add(), subtract(), multiply(), or divide()) with the provided numbers as
arguments.
The result of the operation is printed to the console.
Input Handling:
The program ensures that the user's inputs for the choice and the numbers are
converted to the appropriate data types (‘str’ to ‘int’) before performing any
operations.
Output:
After performing the selected arithmetic operation, the program prints the
expression along with the result to the console.
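The program described above might look like the following minimal sketch. The function names and the parameters P and Q follow the description; the helper calculate(), the message strings, and the division-by-zero check are assumptions added for illustration and are not taken from the original program.

```python
def add(P, Q):
    """Return the sum of P and Q."""
    return P + Q

def subtract(P, Q):
    """Return P minus Q."""
    return P - Q

def multiply(P, Q):
    """Return the product of P and Q."""
    return P * Q

def divide(P, Q):
    """Return P divided by Q."""
    return P / Q

def calculate(choice, num_1, num_2):
    """Hypothetical helper: build the expression text for the chosen operation."""
    if choice == "a":
        return f"{num_1} + {num_2} = {add(num_1, num_2)}"
    elif choice == "b":
        return f"{num_1} - {num_2} = {subtract(num_1, num_2)}"
    elif choice == "c":
        return f"{num_1} * {num_2} = {multiply(num_1, num_2)}"
    elif choice == "d":
        if num_2 == 0:   # assumed safeguard, not part of the description
            return "Cannot divide by zero"
        return f"{num_1} / {num_2} = {divide(num_1, num_2)}"
    else:
        return "Invalid input"

# Interactive use would read the values with input(), for example:
#   choice = input("Enter choice (a, b, c, d): ")
#   num_1 = int(input("Enter first number: "))
#   num_2 = int(input("Enter second number: "))
print(calculate("a", 8, 2))
print(calculate("d", 8, 2))
```

Keeping the menu logic in a function that takes arguments, rather than calling input() directly, makes the program easy to test without typing values each time.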
LAB – 5
Python Programming: Tuples, Assignment Operators, and
Comparison Operators
1. Introduction
This document provides an overview of tuples, assignment operators, and
comparison operators in Python, along with examples to demonstrate their
usage.
2. Tuples
Definition
A tuple is an immutable sequence type in Python. It can hold a collection of
items of any data type.
Example
# Creating a tuple
tuple1 = (1, 2, 3, 4, 5)
tuple2 = ("apple", "banana", "cherry")
print("Tuple1:", tuple1)
print("Tuple2:", tuple2)
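To illustrate what immutability means in practice, the following sketch reads items from a tuple and then attempts to change one. The variable names other than tuple1 are illustrative assumptions.

```python
tuple1 = (1, 2, 3, 4, 5)

# Reading items works just as it does for lists
first_item = tuple1[0]
last_two = tuple1[-2:]

# Assigning to an item raises TypeError because tuples are immutable
try:
    tuple1[0] = 100
    modified = True
except TypeError as error:
    modified = False
    print("Tuples cannot be changed:", error)

# A "changed" tuple is really a new tuple built from the old one
extended = tuple1 + (6,)
print(first_item, last_two, extended)
```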
3. Assignment Operators
Definition
Assignment operators are used to assign values to variables. Python provides
several assignment operators to perform arithmetic and other operations in a
concise manner.
Example
a = 5
b = 10
# Using assignment operators
a += b # equivalent to a = a + b
print("a += b:", a)
a -= b # equivalent to a = a - b
print("a -= b:", a)
a *= b # equivalent to a = a * b
print("a *= b:", a)
a /= b # equivalent to a = a / b
print("a /= b:", a)
a %= b # equivalent to a = a % b
print("a %= b:", a)
a **= 2 # equivalent to a = a ** 2
print("a **= 2:", a)
a //= b # equivalent to a = a // b
print("a //= b:", a)
Explanation
+=: Adds right operand to the left operand and assigns the result to the left
operand.
-=: Subtracts right operand from the left operand and assigns the result to the
left operand.
*=: Multiplies left operand with the right operand and assigns the result to the
left operand.
/=: Divides left operand by the right operand and assigns the result to the left
operand.
%=: Takes the modulus using left and right operands and assigns the result to
the left operand.
**=: Performs exponential (power) calculation on the operands and assigns the
result to the left operand.
//=: Performs floor division on the operands and assigns the result to the left
operand.
4. Comparison Operators
Definition
Comparison operators are used to compare two values. They return a Boolean
value (True or False) based on the comparison.
Example
x = 10
y = 20
# Using comparison operators
print("x == y:", x == y) # Equal to
print("x != y:", x != y) # Not equal to
print("x > y:", x > y) # Greater than
print("x < y:", x < y) # Less than
print("x >= y:", x >= y) # Greater than or equal to
print("x <= y:", x <= y) # Less than or equal to
# Tuple comparison
tuple3 = (1, 2, 3)
tuple4 = (1, 2, 4)
print("tuple3 == tuple4:", tuple3 == tuple4)
print("tuple3 != tuple4:", tuple3 != tuple4)
print("tuple3 < tuple4:", tuple3 < tuple4)
print("tuple3 > tuple4:", tuple3 > tuple4)
Explanation
==: Checks if two operands are equal.
!=: Checks if two operands are not equal.
>: Checks if the left operand is greater than the right operand.
<: Checks if the left operand is less than the right operand.
>=: Checks if the left operand is greater than or equal to the right operand.
<=: Checks if the left operand is less than or equal to the right operand.
Tuples are compared element by element from left to right; the first pair of
elements that differ determines the result.
5. Conclusion
This document has provided a brief overview of tuples, assignment operators,
and comparison operators in Python, including examples to demonstrate their
functionality. Tuples are useful for storing immutable sequences of items,
assignment operators simplify arithmetic operations, and comparison
operators help in making decisions based on comparisons.
LAB – 6
Handling DataFrames, Missing Data, Grouping, and File
Operations
Introduction to DataFrames
DataFrames are a fundamental data structure in Python's pandas library, used
for data manipulation and analysis. They are a two-dimensional labeled data
structure with columns of potentially different types. DataFrames are similar
to SQL tables or spreadsheets, but with more powerful and flexible data
manipulation capabilities.
Step 1: Load Data
To begin, we need to load the data into a DataFrame. This can be done by
reading data from various sources, such as CSV files, Excel files, HTML
tables, or SQL databases. Here's an example of how to read a CSV file into a
DataFrame:
import pandas as pd
df = pd.read_csv(r"C:\Users\hp\OneDrive\Documents\students.csv")
Step 2: Manipulate DataFrame
Filtering Rows:
• Filter rows where the CGPA is greater than 8.0:
high_cgpa_students = df[df['CGPA'] > 8.0]
print(high_cgpa_students)
• Filter rows where the age is 22 and the weight is less than 60:
young_light_students = df[(df['Age'] == 22) & (df['Weight'] < 60)]
print(young_light_students)
Selecting Columns:
• Select the 'Student', 'CGPA', and 'Height' columns:
selected_columns = df[['Student', 'CGPA', 'Height']]
print(selected_columns)
• Select columns by label with loc (here also limiting the rows to index labels
up to and including 10):
selected_by_label = df.loc[:10, ['Student', 'CGPA', 'Height']]
print(selected_by_label)
Handling Duplicates:
• Remove duplicate rows:
df_unique = df.drop_duplicates()
print(df_unique)
• Remove duplicates based on the 'Student' and 'Roll No' columns:
df_unique_students = df.drop_duplicates(subset=['Student', 'Roll No'])
print(df_unique_students)
Step 3: Handle Missing Data:
• Group by the 'Age' and 'Height' columns and calculate the sum of the
'Weight' column:
grouped_age_height = df.groupby(['Age', 'Height'])['Weight'].sum()
print(grouped_age_height)
Combined Code:
import matplotlib.pyplot as plt
# Data for scatter and line plot
x = [1, 2, 3, 4, 5]
y1 = [2, 3, 5, 7, 11] # Data for scatter plot
y2 = [1, 4, 6, 8, 10] # Data for line plot
# Scatter plot
plt.scatter(x, y1, color='blue', label='Scatter Data')
# Line plot
plt.plot(x, y2, color='green', label='Line Data')
plt.title('Scatter and Line Chart')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()
2. Bubble Chart
• Explanation: A bubble chart is a type of scatter plot where a third
dimension is represented by the size of the bubbles. It's useful for
comparing three variables.
• Code Explanation:
✓ s: Defines the size of the bubbles.
✓ plt.scatter(x, y, s=s, alpha=0.5, c='red', label='Bubble Data'): Plots x vs.
y with varying bubble sizes.
Code:
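import matplotlib.pyplot as plt
# Illustrative data (the y values and bubble sizes are assumed for this example)
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
s = [50, 100, 200, 400, 800] # Bubble sizes
# Bubble plot
plt.scatter(x, y, s=s, alpha=0.5, c='red', label='Bubble Data')
plt.title('Bubble Chart')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
# Show plot
plt.show()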
3. Histogram
• Explanation: A histogram displays the distribution of a dataset by grouping
data into bins and counting the number of observations in each bin. It's
useful for understanding the underlying frequency distribution of a dataset.
• Code Explanation:
✓ plt.hist(data, bins=bins, color='blue', edgecolor='black'): Plots a
histogram of data with specified bins.
Code:
import matplotlib.pyplot as plt
# Data for histogram
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7, 8, 9, 10]
bins = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Histogram
plt.hist(data, bins=bins, color='blue', edgecolor='black')
# Adding titles and labels
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
# Show plot
plt.show()
4. Trend Line
• Explanation: A trend line is used to highlight the general direction in
which a dataset is moving. It's often used in time series data to show
trends over time.
• Code Explanation:
✓ plt.plot(x, y, 'o'): Plots the original data points.
✓ plt.plot(x, m*x + b, '-'): Plots the trend line using the linear equation y =
mx + b.
Code:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([1, 3, 2, 5, 7, 8])
m, b = np.polyfit(x, y, 1) # Slope (m) and intercept (b)
# Plotting data points
plt.plot(x, y, 'o')
# Plotting trend line
plt.plot(x, m*x + b, '-')
# Adding titles and labels
plt.title('Trend Line')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
# Show plot
plt.show()
These examples demonstrate how to create different types of charts using Matplotlib. Each
type of chart provides a unique way to visualize and interpret data, making it easier to
understand patterns, relationships, and distributions within a dataset.
LAB – 8
PROGRAM USING CATEGORICAL DATA, SPLITTING
DATA TESTING SET NORMALIZE DATA
1. Importing the necessary libraries:
2. Load Data: Here, we'll create a sample DataFrame for demonstration.
Step 2: Create or Load a Sample Dataset: Create a sample dataset for demonstration purposes. This dataset
includes information about customer demographics and spending behavior.
Step 3: Descriptive Statistics: Compute and display summary statistics for numerical
variables in the dataset using describe() and check data distribution using value_counts().
Step 4: Data Visualization: Visualize the dataset using histograms, boxplots, and scatter
plots to understand the distribution and relationships between variables.
Histograms: A histogram is a graphical representation that organizes a group of data points into specified
ranges or bins. It shows the frequency distribution of a continuous variable.
Components
• Bins: Intervals that divide the entire range of data into smaller segments. The width of each bin
represents a range of values, and the height represents the frequency or count of values within that
range.
• Frequency: The number of data points that fall within each bin.
Purpose
• To visualize the distribution of a continuous variable.
• To identify patterns such as skewness, modality, and the presence of outliers.
• To understand the spread and central tendency of the data.
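The idea of bins and frequencies can be made concrete without any plotting library. The sketch below counts how many values fall into each bin; the function name bin_counts, the sample ages, and the bin edges are hypothetical. Each bin is treated as half-open, except the last, which also includes its right edge; this is the convention numpy.histogram uses and is adopted here as an assumption.

```python
def bin_counts(values, edges):
    """Count the values in each bin defined by consecutive edges.

    Every bin is half-open [left, right) except the last one,
    which also includes its right edge.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            left, right = edges[i], edges[i + 1]
            is_last = i == len(counts) - 1
            if left <= value < right or (is_last and value == right):
                counts[i] += 1
                break
    return counts

# Hypothetical sample data and bin edges
ages = [22, 25, 27, 31, 34, 35, 38, 41, 45, 52]
edges = [20, 30, 40, 50, 60]
counts = bin_counts(ages, edges)

# A text version of the histogram: one '#' per observation
for i, count in enumerate(counts):
    print(f"{edges[i]}-{edges[i + 1]}: {'#' * count}")
```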
Boxplot: A box plot (or box-and-whisker plot) is a standardized way of displaying the distribution of data based on a
five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
Components
• Box: Represents the interquartile range (IQR), which is the distance between the first quartile (Q1) and
the third quartile (Q3).
• Median: The line inside the box, indicating the median or 50th percentile.
• Whiskers: Lines extending from the box to the smallest and largest values that lie within 1.5 times the
IQR below Q1 and above Q3.
• Outliers: Points outside the whiskers, representing unusually high or low values.
Purpose
• To provide a visual summary of data dispersion and skewness.
• To identify outliers.
• To compare distributions across different groups or categories.
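The components listed above can be computed directly, which helps when reading a box plot. The sketch below uses the standard library's statistics.quantiles with method='inclusive'; that choice of quartile method and the sample spending scores are assumptions, and other quartile conventions give slightly different values.

```python
from statistics import quantiles

# Hypothetical spending scores with one unusually low and one unusually high value
scores = [15, 39, 42, 45, 48, 50, 52, 55, 58, 99]

# Quartiles (inclusive method is an assumption of this sketch)
q1, median, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Values beyond 1.5 * IQR from the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [value for value in scores if value < lower_fence or value > upper_fence]

# Whiskers end at the most extreme values that are still inside the fences
inside = [value for value in scores if lower_fence <= value <= upper_fence]
whiskers = (min(inside), max(inside))

print("Q1:", q1, "Median:", median, "Q3:", q3, "IQR:", iqr)
print("Whiskers:", whiskers, "Outliers:", outliers)
```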
Scatter plot: A scatter plot is a type of plot that shows the relationship between two continuous variables by
displaying data points on a two-dimensional plane.
Components
• Data Points: Each point represents an observation in the dataset, with its position determined by its
values on the x and y axes.
• Axes: The horizontal (x) axis and vertical (y) axis represent the two variables being compared.
Purpose
• To visualize the relationship between two continuous variables.
• To identify trends
Step 5: Hypothesis Testing: Perform a T-test to compare means of two variables (Age and
Spending Score).
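The source does not say which form of the t-test is used. As one possibility, the sketch below computes Welch's t statistic (which does not assume equal variances) using only the standard library. The function name welch_t and the sample values are hypothetical, and no p-value is computed because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variance of each group divided by its size
    term_a = variance(sample_a) / n_a
    term_b = variance(sample_b) / n_b
    t_statistic = (mean(sample_a) - mean(sample_b)) / sqrt(term_a + term_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    degrees = (term_a + term_b) ** 2 / (
        term_a ** 2 / (n_a - 1) + term_b ** 2 / (n_b - 1)
    )
    return t_statistic, degrees

# Hypothetical sample values for the two variables
age = [23, 31, 35, 42, 28, 39]
spending_score = [39, 81, 6, 77, 40, 76]

t_statistic, degrees = welch_t(age, spending_score)
print(f"t = {t_statistic:.3f}, df = {degrees:.1f}")
```

In practice, scipy.stats.ttest_ind(age, spending_score, equal_var=False) returns both the statistic and a p-value, provided SciPy is installed.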
Step 6: Correlation Analysis: Compute the correlation matrix and visualize it using a
heatmap to understand the relationships between variables.
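To show what a correlation matrix contains, the sketch below computes Pearson correlation coefficients with plain Python and prints them as a table. The function name pearson and the column values are hypothetical; the heatmap itself is not drawn here.

```python
from math import sqrt

def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)

# Hypothetical sample columns
columns = {
    "Age": [23, 31, 35, 42, 28, 39],
    "Annual Income": [15, 28, 40, 62, 21, 55],
    "Spending Score": [39, 81, 6, 77, 40, 76],
}

# Every column is compared with every other column, including itself
correlation_matrix = {
    row: {col: pearson(columns[row], columns[col]) for col in columns}
    for row in columns
}

for row, values in correlation_matrix.items():
    print(f"{row:>15}", " ".join(f"{value:6.2f}" for value in values.values()))
```

With pandas and seaborn available, the same matrix is typically obtained with DataFrame.corr() and drawn with seaborn.heatmap().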
Step 7: Regression Analysis: Perform linear regression analysis to predict Spending Score
based on Annual Income. Visualize the regression line.
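A simple linear regression fits a line y = slope * x + intercept by least squares. The sketch below does this with plain Python; the function name fit_line and the sample values are hypothetical, and the plot of the regression line is omitted.

```python
def fit_line(x, y):
    """Return the least-squares slope and intercept for y as a function of x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sample data
annual_income = [15, 28, 40, 62, 21, 55]
spending_score = [39, 81, 6, 77, 40, 76]

slope, intercept = fit_line(annual_income, spending_score)

# Predicted spending score for each observed income (points on the regression line)
predicted = [slope * income + intercept for income in annual_income]

print(f"Spending Score = {slope:.3f} * Annual Income + {intercept:.3f}")
```

Plotting annual_income against predicted with a line style, on top of a scatter plot of the observed values, gives the regression line visualization.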
LAB – 10
APPLICATION OF EXPLORATORY DATA ANALYSIS
Exploratory Data Analysis (EDA) is a crucial step in the data analysis process. It involves summarizing the
main characteristics of the data, often using visual methods. Here, we’ll go through a step-by-step guide to
performing EDA in Python. We'll cover:
Step 1: Import Required Libraries: Start by importing the necessary libraries for data manipulation and
visualization.
Step 2: Load the Dataset: Load your dataset. For demonstration, we'll create a sample
dataset.
Step 3: Data Overview: Examine the first few rows, column data types, and summary
statistics.
Step 4: Handle Missing Data: Identify and handle missing values in the dataset.
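Identifying and handling missing values with pandas might look like the following sketch. The column names and values are hypothetical, and the fill strategies (mean, median, most frequent value) are only examples; the right choice depends on the dataset.

```python
import pandas as pd

# Hypothetical sample data; None marks a missing value
df = pd.DataFrame({
    "Age": [22, 25, None, 31],
    "Annual Income": [30.0, None, 52.0, 61.0],
    "Gender": ["F", None, "M", "F"],
})

# Identify: count the missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: drop every row that has at least one missing value
rows_dropped = df.dropna()

# Option 2: fill missing values column by column
filled = df.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())
filled["Annual Income"] = filled["Annual Income"].fillna(filled["Annual Income"].median())
filled["Gender"] = filled["Gender"].fillna(filled["Gender"].mode()[0])
print(filled)
```

Dropping rows is simplest but discards data; filling keeps every row at the cost of introducing estimated values.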
Step 5: Data Visualization: Use various plots to visualize the distribution of variables and
relationships between them.
Histogram: Histograms are powerful tools for visualizing the distribution and variability of data. By
understanding the components and interpretation of histograms, you can effectively analyze and communicate
data trends and patterns.
Box plot: Boxplots are valuable tools for visualizing data distribution, identifying outliers, and comparing
datasets. Their simplicity and effectiveness make them a staple in data analysis and statistics.
Pair plot:
Heatmap of correlation matrix: Heatmaps are a powerful tool for visualizing data distributions, correlations,
and patterns. They offer a compact and intuitive way to present complex data and are widely used in various fields for
data analysis and visualization.
Step 6: Correlation Analysis: Analyze correlations between numerical variables to
understand their relationships.
Step 7: Insights and Conclusion: Summarize key findings and insights from the analysis.