DS

The document outlines steps to install and configure R, including optional installation of RStudio, and how to run R code with necessary packages. It also provides examples of simple R programs demonstrating operations with numbers, vectors, and objects, as well as a calculator application developed in Python both with and without objects. Additionally, it includes a Python script for basic descriptive statistics using the mtcars and cars datasets.

Uploaded by

KEERTHANA K
Copyright
© All Rights Reserved

Exp No: 1 INSTALL, CONFIGURE AND RUN R WITH NECESSARY PACKAGES.

Date:

AIM
To install, configure, and run R with the necessary packages, follow these steps.

Steps
Step 1: Install R
● Go to the official R website at https://www.r-project.org/
● Click on "CRAN" under the "Download" section.
● Choose a mirror site close to your location.
● Download and install R according to your operating system (Windows, macOS, or Linux).

Step 2: Install RStudio (optional but recommended)


● RStudio is an integrated development environment (IDE) for R that provides a user-friendly
interface. It's not required, but highly recommended.
● Go to the RStudio website at https://www.rstudio.com/products/rstudio/download/#download
● Download and install RStudio Desktop according to your operating system.

Step 3: Launch R or RStudio


● If you installed RStudio, open it.
● Otherwise, open the R console directly.

Step 4: Install necessary packages


● To install packages, you can use the install.packages() function.
● For example, to install the "dplyr" package, type the following command and press Enter:

install.packages("dplyr")
● Repeat this step for all the packages you need. Make sure to include all the necessary packages for
your specific analysis or task.

Step 5: Load packages


● Once the packages are installed, you need to load them into the R session using the library()
function.
● For example, to load the "dplyr" package, type the following command and press Enter:

library(dplyr)
● Repeat this step for all the packages you installed.


Step 6: Start coding
● You can now start writing R code using the installed and loaded packages.
● In RStudio, you can create a new script file by clicking on "File" -> "New File" -> "R Script".
● Alternatively, you can directly type your code in the R console.

Step 7: Run R code


● To run your R code, you can either execute the code line by line or run the entire script.
● In RStudio, you can execute a single line of code by placing the cursor on that line and pressing "Ctrl + Enter".
● To run the entire script, click on the "Source" button or press "Ctrl + Shift + S".
That's it! You have now installed, configured, and can run R with the necessary packages.

Result:
Thus R was installed and configured, and the necessary packages were installed successfully.
Exp No:2 IMPLEMENT SIMPLE R PROGRAMS USING NUMBERS, VECTORS AND
Date: OBJECTS

a) SIMPLE R PROGRAMS USING NUMBERS


Aim:
To write an R program to compute the sum of two numbers.
Algorithm:
1. Assign the value 5 to variable a.
2. Assign the value 3 to variable b.
3. Add a and b together and store the result in variable sum.
4. Print the value of sum.

Program
a <- 5
b <- 3
sum <- a + b
print(sum)

Output:
[1] 8

b) COMPUTING THE MEAN OF A VECTOR:


Aim:
To write an R program to compute the mean of a vector.
Algorithm:
1. Create a vector vec with values 2, 4, 6, 8, and 10.
2. Calculate the mean of the elements in vec using the mean() function and store the result in
mean_val.
3. Print the value of mean_val.

Program
vec <- c(2, 4, 6, 8, 10)
mean_val <- mean(vec)
print(mean_val)

Output
[1] 6

c)R PROGRAM THAT DEMONSTRATES THE USE OF OBJECTS.


Aim:
To write an R program that creates an object representing a student and displays their information.
Algorithm:
1. Create an empty object using the `list()` function.
2. Assign values to different attributes of the object.
3. Access the attributes and print the student's information.

Program:
student <- list()
student$name <- "John Doe"
student$age <- 20
student$major <- "Computer Science"
# Access attributes and print student's information
print(paste("Name:", student$name))
print(paste("Age:", student$age))
print(paste("Major:", student$major))

Output:
[1] "Name: John Doe"
[1] "Age: 20"
[1] "Major: Computer Science"

d) R program to create a sequence of numbers

AIM
To write an R program to create a sequence of numbers from 20 to 50, find the mean of the numbers from 20 to 60, and find the sum of the numbers from 51 to 91.
Algorithm:
1. Generate a sequence of numbers from 20 to 50
2.Calculate the mean of numbers from 20 to 60
3.Calculate the sum of numbers from 51 to 91
4.Print the results
Program:
sequence <- seq(20, 50)
mean_20_to_60 <- mean(seq(20, 60))
sum_51_to_91 <- sum(seq(51, 91))
print("Sequence of numbers from 20 to 50:")
print(sequence)
print(paste("Mean of numbers from 20 to 60:", mean_20_to_60))
print(paste("Sum of numbers from 51 to 91:", sum_51_to_91))

Output:
[1] "Sequence of numbers from 20 to 50:"
[1] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[1] "Mean of numbers from 20 to 60: 40"
[1] "Sum of numbers from 51 to 91: 2911"

e) R program to find sum, mean and product of the vector

Aim:
To find the sum, mean and product of the vector in R.
Algorithm:
1. Generate a vector of numbers
2.Calculate the sum of the vector
3.Calculate the mean of the vector
4.Calculate the product of the vector
5.Print the results
Program:
# Create a vector of numbers
vector <- c(2, 4, 6, 8, 10)
# Calculate the sum of the vector
sum_vector <- sum(vector)
# Calculate the mean of the vector
mean_vector <- mean(vector)
# Calculate the product of the vector
product_vector <- prod(vector)
# Print the results
print(paste("Sum of the vector:", sum_vector))
print(paste("Mean of the vector:", mean_vector))
print(paste("Product of the vector:", product_vector))

OUTPUT:
[1] "Sum of the vector: 30"
[1] "Mean of the vector: 6"
[1] "Product of the vector: 3840"

f) R program to check if the number is prime or not


AIM
To write an R program to check whether a given number is prime.
Algorithm:
1: Read num as input.
2: Initialize a variable flag to 0.
3: If num is greater than 1, set flag to 1 and iterate a "for" loop from 2 to num - 1.
4: If num is divisible by the loop iterator, set flag back to 0 and stop the loop.
5: If num equals 2, set flag to 1 (the range 2:(num-1) runs backwards for 2, so it is handled separately).
6: If flag equals 1, print "num is a prime number"; otherwise print "num is not a prime number".
Program:
num = as.integer(readline(prompt="Enter a number: "))
flag = 0
# prime numbers are greater than 1
if(num > 1) {
  # assume prime until a factor is found
  flag = 1
  for(i in 2:(num-1)) {
    if ((num %% i) == 0) {
      flag = 0
      break
    }
  }
}
# 2:(num-1) counts down for num = 2, so handle it separately
if(num == 2) flag = 1
if(flag == 1) {
  print(paste(num, "is a prime number"))
} else {
  print(paste(num, "is not a prime number"))
}
Output:
Enter a number: 25
[1] "25 is not a prime number"
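Since the later experiments in this manual use Python, the same trial-division check can be sketched there as well. This is an illustrative aside, not part of the original exercise; `is_prime` is an assumed helper name, and it stops at the square root of n rather than n - 1, which is sufficient because any factor pair (a, b) with a*b == n has min(a, b) <= sqrt(n).

```python
# Trial-division primality test mirroring the R logic above.
def is_prime(n: int) -> bool:
    if n < 2:
        return False          # 0, 1, and negatives are not prime
    i = 2
    while i * i <= n:         # checking up to sqrt(n) is enough
        if n % i == 0:
            return False      # found a proper divisor
        i += 1
    return True

print(25, "is a prime number" if is_prime(25) else "is not a prime number")
```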

Result:
Thus the above simple R programs using numbers, vectors, and objects were executed successfully and the output was verified.
Exp No: 3a CALCULATOR APPLICATION WITHOUT PYTHON OBJECTS
Date:

AIM
To write a Python script to develop a calculator application without objects on the console.
Algorithm:
1. Prompt the user to enter the first number.
2. Prompt the user to enter the second number.
3. Prompt the user to select an operation (addition, subtraction, multiplication, or division).
4. Perform the selected operation on the numbers.
5. Display the result.
Program:
num1 = float(input("Enter the first number: "))
num2 = float(input("Enter the second number: "))
print("Select operation:")
print("1. Addition")
print("2. Subtraction")
print("3. Multiplication")
print("4. Division")
choice = int(input("Enter your choice (1-4): "))
if choice == 1:
    result = num1 + num2
elif choice == 2:
    result = num1 - num2
elif choice == 3:
    result = num1 * num2
elif choice == 4:
    result = num1 / num2
else:
    print("Invalid choice!")
    exit(1)
# Display the result
print("Result:", result)
OUTPUT:
Enter the first number: 5
Enter the second number: 3
Select operation:
1. Addition
2. Subtraction
3. Multiplication
4. Division
Enter your choice (1-4): 3
Result: 15.0
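As an aside, the if/elif chain above can also be written as a dictionary that maps each menu choice to a function; this is a sketch of an alternative design, not part of the original exercise, and `calculate` is an assumed helper name.

```python
import operator

# Map each menu choice to the corresponding arithmetic function.
OPERATIONS = {
    1: operator.add,
    2: operator.sub,
    3: operator.mul,
    4: operator.truediv,
}

def calculate(choice: int, num1: float, num2: float) -> float:
    if choice not in OPERATIONS:
        raise ValueError("Invalid choice!")
    return OPERATIONS[choice](num1, num2)

# Same inputs as the sample run above: choice 3 multiplies 5 and 3.
print("Result:", calculate(3, 5.0, 3.0))
```

A dictionary dispatch keeps adding new operations to a single table instead of growing the if/elif chain.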
Result:
Thus the above program for the calculator application without Python objects was executed successfully and the output was verified.
Exp No:3b CALCULATOR APPLICATION WITH PYTHON OBJECTS
Date:

AIM
To write a Python script to develop a calculator application with Python objects on the console.
Algorithm:
1. Create a Calculator class with methods for addition, subtraction, multiplication, and division.
2. Initialize the Calculator object.
3. Prompt the user to enter the first number.
4. Prompt the user to enter the second number.
5. Prompt the user to select an operation (addition, subtraction, multiplication, or division).
6. Perform the selected operation using the appropriate method of the Calculator object.
7. Display the result.
Program:
class Calculator:
    def addition(self, num1, num2):
        return num1 + num2
    def subtraction(self, num1, num2):
        return num1 - num2
    def multiplication(self, num1, num2):
        return num1 * num2
    def division(self, num1, num2):
        return num1 / num2
# Initialize the Calculator object
calculator = Calculator()
# Prompt the user to enter the first number
num1 = float(input("Enter the first number: "))
# Prompt the user to enter the second number
num2 = float(input("Enter the second number: "))
# Prompt the user to select an operation
print("Select operation:")
print("1. Addition")
print("2. Subtraction")
print("3. Multiplication")
print("4. Division")
choice = int(input("Enter your choice (1-4): "))
# Perform the selected operation using the appropriate method of the Calculator object
if choice == 1:
    result = calculator.addition(num1, num2)
elif choice == 2:
    result = calculator.subtraction(num1, num2)
elif choice == 3:
    result = calculator.multiplication(num1, num2)
elif choice == 4:
    result = calculator.division(num1, num2)
else:
    print("Invalid choice!")
    exit(1)
# Display the result
print("Result:", result)
OUTPUT:
Enter the first number: 5
Enter the second number: 3
Select operation:
1. Addition
2. Subtraction
3. Multiplication
4. Division
Enter your choice (1-4): 3
Result: 15.0

Result:
Thus the above program for calculator application with python objects is executed successfully and
output is verified.
Exp No: 3c CREATE PYTHON OBJECTS FOR CALCULATOR APPLICATION AND SAVE
Date: IN A SPECIFIED LOCATION IN DISK

Aim:
To create a python object for calculator application and save in a specified location in disk
Algorithm:
1. We start by defining a Calculator class with the basic arithmetic operations: addition, subtraction, multiplication, and division. The result is stored in the result attribute.
2. We create an instance of the Calculator class called calculator.
3. We perform some calculations by calling the appropriate methods on the calculator object.
4. Next, we specify the location where we want to save the calculator object. In this example, we use the filename calculator_object.pkl, but you can change it to any desired location.
5. We use the pickle module to save the calculator object to the specified location on the disk.
6. Finally, we print a message indicating the successful saving of the calculator object.

Program:
import pickle
class Calculator:
    def __init__(self):
        self.result = 0
    def add(self, x):
        self.result += x
    def subtract(self, x):
        self.result -= x
    def multiply(self, x):
        self.result *= x
    def divide(self, x):
        if x != 0:
            self.result /= x
        else:
            print("Error: Division by zero")
calculator_instance = Calculator()
calculator_instance.add(5)
calculator_instance.multiply(3)
calculator_instance.subtract(8)
calculator_instance.divide(2)
print("Final Result:", calculator_instance.result)
save_location = "calculator_object.pkl"
with open(save_location, 'wb') as file:
    pickle.dump(calculator_instance, file)
print(f"Calculator object saved to {save_location}")

Output:
Final Result: 3.5
Calculator object saved to calculator_object.pkl
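A natural follow-up is loading the pickled object back from disk. The round-trip below is a self-contained sketch with a minimal stand-in class; `Counter` and the filename are illustrative, not part of the original program.

```python
import pickle

# Minimal stand-in for the Calculator class above, with mutable state.
class Counter:
    def __init__(self):
        self.result = 0
    def add(self, x):
        self.result += x

c = Counter()
c.add(7)

# Serialize the object to disk, then deserialize it into a new variable.
with open("counter_object.pkl", "wb") as f:
    pickle.dump(c, f)
with open("counter_object.pkl", "rb") as f:
    restored = pickle.load(f)

print("Restored result:", restored.result)
```

Note that pickle stores the object's state, not its class definition: the class must be importable (or defined in the same script) when the file is loaded back.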

Result:
Thus the above program to create a python object for calculator application and save in a specified
location in disk is executed successfully and output is verified.
Exp No: 4a WRITE A PYTHON SCRIPT TO FIND BASIC DESCRIPTIVE STATISTICS
Date: USING SUMMARY, STR, QUARTILE FUNCTION ON MTCARS & CARS
DATASETS

Aim:
To write a Python script to find basic descriptive statistics using summary, structure, and quantile equivalents (describe(), info(), and quantile()) on the mtcars and cars datasets.

Algorithm:
1. Import the pandas library to work with datasets.
2. Load the mtcars dataset using the read_csv() function and assign it to the Variable mtcars.
3. Use the describe() function on the mtcars dataset to compute summary statistics and
assign the result to mtcars_summary.
4. Print the summary statistics for the mtcars dataset using print(mtcars_summary).
5. Use the info() function on the mtcars dataset to display information about the data
types of the columns.
6. Print the data type information for the mtcars dataset using print(mtcars.info()).
7. Use the quantile() function on the mtcars dataset to calculate quartiles at 25%, 50%, and 75%.
8. Assign the calculated quartiles to mtcars_quartiles.
9. Print the quartiles for the mtcars dataset using print(mtcars_quartiles).
10. Repeat steps 2-9 for the cars dataset, replacing mtcars with cars in the variable names and
filenames.
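The describe()/quantile() steps above can be tried on a tiny in-memory frame before fetching the real datasets; the columns below are an illustrative two-column subset of mtcars, not the full data.

```python
import pandas as pd

# A tiny stand-in frame so the same calls run without a network download.
df = pd.DataFrame({"mpg": [21.0, 22.8, 18.7, 32.4],
                   "cyl": [6, 4, 8, 4]})

summary = df.describe()                      # count, mean, std, min, quartiles, max
quartiles = df.quantile([0.25, 0.5, 0.75])   # rows indexed by the requested quantiles

print(summary)
print(quartiles)
```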
program:
import pandas as pd
mtcars=pd.read_csv('https://gist.githubusercontent.com/ZeccaLehn/4e06d2575eb9589dbe8c365d61cb056c/raw/1202b0e7b8c26105fadf1eb5bf92ed2564cf1e30/mtcars.csv')
print("Summary Statistics for mtcars dataset:")
print(mtcars.describe())
print("\nStructure Information for mtcars dataset:")
print(mtcars.info())
print("\nQuartiles for mtcars dataset:")
print(mtcars.quantile([0.25, 0.5, 0.75], numeric_only=True))
cars=pd.read_csv('https://gist.githubusercontent.com/ZeccaLehn/4e06d2575eb9589dbe8c365d61cb056c/raw/1202b0e7b8c26105fadf1eb5bf92ed2564cf1e30/cars.csv')
print("\nSummary Statistics for cars dataset:")
print(cars.describe())
print("\nStructure Information for cars dataset:")
print(cars.info())
print("\nQuartiles for cars dataset:")
print(cars.quantile([0.25, 0.5, 0.75], numeric_only=True))
Output:
Summary Statistics for mtcars dataset:
mpg cyl disp hp drat wt qsec vs am gear carb
count 32.00000 32.000000 32.00000 32.00000 32.000000 32.000000 32.000000 32.000000 32.000000 32.000000 32.00000
mean 20.090625 6.187500 230.72188 127.93750 3.596563 3.217250 17.848750 0.437500 0.406250 3.687500 2.81250
std 6.026948 1.785922 123.93869 84.83616 0.534679 0.978457 1.786943 0.504016 0.498991 0.737804 1.61520
min 10.400000 4.000000 71.10000 52.00000 2.760000 1.513000 14.500000 0.000000 0.000000 3.000000 1.00000
25% 15.425000 4.000000 120.82500 96.50000 3.080000 2.581250 16.892500 0.000000 0.000000 3.000000 2.00000
50% 19.200000 6.000000 196.30000 123.00000 3.695000 3.325000 17.710000 0.000000 0.000000 4.000000 2.00000
75% 22.800000 8.000000 326.00000 180.00000 3.920000 3.610000 18.900000 1.000000 1.000000 4.000000 4.00000
max 33.900000 8.000000 472.00000 335.00000 4.930000 5.424000 22.900000 1.000000 1.000000 5.000000 8.00000

Structure Information for mtcars dataset:


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 11 columns):
...

Quartiles for mtcars dataset:


mpg cyl disp hp drat wt qsec vs am gear carb
0.25 15.4 4.0 120.825 96.5 3.080 2.58125 16.892 0.0 0.0 3.0 2.0
0.50 19.2 6.0 196.300 123.0 3.695 3.32500 17.710 0.0 0.0 4.0 2.0
0.75 22.8 8.0 326.000 180.0 3.920 3.61000 18.900 1.0 1.0 4.0 4.0

...

Summary Statistics for cars dataset:


MPG Cylinders Displacement HP Weight Acceleration Model Year Origin
count 406.0 406.0 406.000000 406.000000 406.000000 406.000000 406.000000 406.000000
mean 23.0 5.0 194.779557 105.082524 2979.413793 15.519704 75.921182 1.568966
std 8.0 1.0 104.922458 38.480533 847.004328 2.803359 3.748737 0.797479
min 10.0 3.0 68.000000 46.000000 1613.000000 8.000000 70.000000 1.000000
25% 17.0 4.0 105.000000 76.000000 2226.500000 13.700000 73.000000 1.000000
50% 22.0 4.0 151.000000 95.000000 2822.500000 15.500000 76.000000 1.000000
75% 29.0 6.0 302.000000 130.000000 3618.250000 17.175000 79.000000 2.000000
max 46.0 8.0 455.000000 230.000000 5140.000000 24.800000 82.000000 3.000000

Structure Information for cars dataset:


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 406 entries, 0 to 405
Data columns (total 8 columns):
...

Quartiles for cars dataset:


MPG Cylinders Displacement HP Weight Acceleration Model Year Origin
0.25 17.0 4.0 105.0 76.0 2226.5 13.7 73.0 1.0
0.50 22.0 4.0 151.0 95.0 2822.5 15.5 76.0 1.0
0.75 29.0 6.0 302.0 130.0 3618.25 17.175 79.0 2.0

Result:
Thus the above program for basic descriptive statistics using summary, structure, and quantile on the mtcars and cars datasets was executed successfully and the output was verified.
Exp No:4b WRITE A PYTHON SCRIPT TO FIND SUBSET OF DATASET BY USING
Date: SUBSET();AGGREGATE () FUNCTIONS ON IRIS DATASET.

Aim:
To write a Python script to find the subset and aggregate functions on the iris dataset.
Algorithm:
1. Import the pandas library to work with the dataset.
2. Load the iris dataset using the read_csv() function and assign it to the variable iris.
3. Print the first few rows of the dataset using iris.head().
4. Select a subset of columns (sepal_length, petal_length, species) and assign it to the variable subset.
5. Print the subset using print(subset.head()).
6. Use the agg() function to compute aggregate statistics: the mean of sepal_width, the median of petal_width, and the maximum of sepal_length.
7. Print the aggregated result using print(aggregate_functions).
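Alongside the column subset used in the program, rows can also be subset by a condition, analogous to R's subset(). The sketch below runs on a tiny illustrative stand-in for the iris data, so it needs no download.

```python
import pandas as pd

# Tiny stand-in frame with the same column names as the iris dataset.
iris = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 4.9, 6.3],
    "species": ["setosa", "versicolor", "setosa", "virginica"],
})

# Boolean mask keeps only the rows where the condition holds.
setosa = iris[iris["species"] == "setosa"]
agg = setosa.agg({"sepal_length": "mean"})

print(setosa)
print(agg)
```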
program:

import pandas as pd
url = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
iris = pd.read_csv(url)
print("First few rows of iris dataset:")
print(iris.head())
print()
subset = iris[['sepal_length', 'petal_length', 'species']]
print("Subset of selected columns (sepal_length, petal_length, species):")
print(subset.head())
print()
aggregate_functions = iris.agg({'sepal_width': 'mean', 'petal_width': 'median', 'sepal_length': 'max'})
print("Aggregate functions on sepal_width (mean), petal_width (median), sepal_length (max):")
print(aggregate_functions)

OUTPUT:
First few rows of iris dataset:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa

Subset of selected columns (sepal_length, petal_length, species):


sepal_length petal_length species
0 5.1 1.4 setosa
1 4.9 1.4 setosa
2 4.7 1.3 setosa
3 4.6 1.5 setosa
4 5.0 1.4 setosa

Aggregate functions on sepal_width (mean), petal_width (median), sepal_length (max):


sepal_width 3.057333
petal_width 1.300000
sepal_length 7.900000
dtype: float64

Result:
Thus the above program to find the subset and aggregate functions on the iris dataset was executed successfully and the output was verified.
Exp No:4c WRITE A PYTHON SCRIPT TO FIND SUBSET OF DATASET BY USING
Date: SUBSET(); AGGREGATE () FUNCTIONS ON MTCARS DATASET.

Aim:
To write a Python script to find the subset and aggregate functions on the mtcars dataset.
Algorithm:
1. Import the pandas library to work with the dataset.
2. Load the mtcars dataset using the read_csv() function and assign it to the variable mtcars.
3. Print the first few rows of the dataset using mtcars.head().
4. Select a subset of columns (mpg, cyl, wt) and assign it to the variable subset.
5. Print the subset using print(subset.head()).
6. Use the agg() function to compute aggregate statistics: the mean of mpg, the median of hp, and the maximum of qsec.
7. Print the aggregated result using print(aggregate_functions).
program:

import pandas as pd
url = 'https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/mtcars.csv'
mtcars = pd.read_csv(url)
print("First few rows of mtcars dataset:")
print(mtcars.head())
print()
subset = mtcars[['mpg', 'cyl', 'wt']]
print("Subset of selected columns (mpg, cyl, wt):")
print(subset.head())
print()
aggregate_functions = mtcars.agg({'mpg': 'mean', 'hp': 'median', 'qsec': 'max'})
print("Aggregate functions on mpg (mean), hp (median), qsec (max):")
print(aggregate_functions)

OUTPUT:
First few rows of mtcars dataset:
model mpg cyl disp hp drat wt qsec vs am gear carb
0 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
1 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
2 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
3 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
4 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2

Subset of selected columns (mpg, cyl, wt):


mpg cyl wt
0 21.0 6 2.620
1 21.0 6 2.875
2 22.8 4 2.320
3 21.4 6 3.215
4 18.7 8 3.440

Aggregate functions on mpg (mean), hp (median), qsec (max):


mpg 20.090625
hp 123.000000
qsec 22.900000
dtype: float64


Result:
Thus the above program to find the subset and aggregate functions on the mtcars dataset was executed successfully and the output was verified.
Exp No:5a WRITE A PYTHON SCRIPT TO READ DIFFERENT TYPES OF DATASETS (.TXT, .CSV) FROM WEB AND DISK AND WRITE TO A FILE IN A SPECIFIC LOCATION.
Date:

Aim:
To read different types of data from text files, Excel and the web using the pandas
package.

ALGORITHM:
STEP 1: Start the program
STEP 2: To read data from csv file using pandas package.
STEP 3: To read data from excel file using pandas package.
STEP 4: To read data from an html file using the pandas package.
STEP 5: Display the output.
STEP 6: Stop the program.

PROGRAM:
DATA INPUT AND OUTPUT
This notebook is the reference code for getting input and output. pandas can read a variety of file types using its pd.read_ methods. Let's take a look at the most common data types:

import numpy as np
import pandas as pd
CSV
CSV INPUT:
df = pd.read_csv('example')
df

    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
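The CSV input above has a matching output side: to_csv() writes a frame back to disk. The round-trip below is a self-contained sketch; 'example.csv' is an illustrative filename, not a file shipped with the manual.

```python
import pandas as pd

# Build the same small frame as the sample output above.
df = pd.DataFrame({"a": [0, 4], "b": [1, 5], "c": [2, 6], "d": [3, 7]})

# index=False: don't write the row labels as an extra column.
df.to_csv("example.csv", index=False)

# Reading the file back reproduces the original frame.
df2 = pd.read_csv("example.csv")
print(df2)
```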
EXCEL
Pandas can read and write Excel files. Keep in mind, this only imports data, not formulas or images; having images or macros may cause the read_excel method to crash.
EXCEL INPUT:
pd.read_excel('Excel_Sample.xlsx', sheet_name='Sheet1')

    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15

EXCEL OUTPUT :
df.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')
HTML
You may need to install htmllib5, lxml, and BeautifulSoup4. In your terminal/command
prompt
run: pip install lxml
pip install html5lib==1.1
pip install BeautifulSoup4
Then restart Jupyter Notebook (or use conda install). Pandas can read tables off of HTML.

For example:
HTML INPUT:
Pandas read_html function will read tables off of a webpage and return a list of
DataFrame
objects:
url = "https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list"
df = pd.read_html(url)
df[0]
match = "Metcalf Bank"
df_list = pd.read_html(url, match=match)
df_list[0]

HTML OUTPUT:

RESULT:
Thus commands to read data from CSV, Excel, and HTML files were successfully executed.
Exp No:5b WRITE A PYTHON SCRIPT TO READ EXCEL DATA SHEETS IN PYTHON.
Date:

AIM:
To write a python script to read Excel data sheets.
ALGORITHM:
STEP 1: Start the program
STEP 2: Install openpyxl library using pip from command line.
STEP 3: Import openpyxl library.
STEP 4: Read data from an existing spreadsheet.
STEP 5: Also users can perform calculations on existing data.
STEP 6: Install xlwings library using pip from command line.
STEP 7: Import xlwings library.
STEP 8: Read data from an existing spreadsheet.
STEP 9: Write data to the existing spreadsheet.
STEP 10:Display the output.
STEP 11: Stop the program.
PROGRAM:
DATA INPUT AND OUTPUT
Install an openpyxl module using pip from the command line.
pip install openpyxl
Read data from an existing spreadsheet.
import openpyxl
# Define variable to load the dataframe
dataframe = openpyxl.load_workbook("Book2.xlsx")
# Define variable to read sheet
dataframe1 = dataframe.active
# Iterate the loop to read the cell values
for row in range(0, dataframe1.max_row):
    for col in dataframe1.iter_cols(1, dataframe1.max_column):
        print(col[row].value)

OUTPUT:

28
Install xlwings library using pip from command line.
pip install xlwings
To start xlwings, open an Excel file, view the sheets available, and then select a sheet.

import xlwings as xw
# Specifying a sheet
ws = xw.Book("Book2.xlsx").sheets['Sheet1']
# Selecting data from a range of cells
v1 = ws.range("A1:A7").value
# Selecting data from a single cell
v2 = ws.range("F5").value
print("Result:", v1, v2)

OUTPUT:
Result: ['Name Age Stream Percentage',
'0 Ankit 18 Math 95',
'1 Rahul 19 Science 90',
'2 Shaurya 20 Commerce 85',
'3 Aishwarya 18 Math 80',
'4 Priyanka 19 Science 75',
None]
RESULT:
Thus Python scripts to read data from and write data to Excel files using Python libraries were successfully executed and verified.

Exp No: 5c WRITE A PYTHON SCRIPT TO READ XML DATASET IN PYTHON.


Date:

AIM:
To write a python script to read XML data sets using Python.
ALGORITHM:
STEP 1: Start the program
STEP 2: Install the BeautifulSoup library using pip from the command line.
STEP 3: Also install the third-party Python parser lxml using pip.
STEP 4: Read data from an XML file.
STEP 5: Find tags and extract from tags.
STEP 6: Import the ElementTree class found inside the XML library.
STEP 7: Read data from an XML file.
STEP 8: Write data to the XML file.
STEP 9: Display the output.
STEP 10: Stop the program.

PROGRAM:
Reading Data from an XML file Using BeautifulSoup
from bs4 import BeautifulSoup
# Reading the data inside the xml
# file to a variable under the name
# data
with open('dict.xml', 'r') as f:
    data = f.read()
# Passing the stored data inside
# the beautifulsoup parser, storing
# the returned object
Bs_data = BeautifulSoup(data, "xml")
# Finding all instances of tag
# `unique`
b_unique = Bs_data.find_all('unique')
print(b_unique)
# Using find() to extract attributes
# of the first instance of the tag
b_name = Bs_data.find('child', {'name':'Frank'})
print(b_name)
# Extracting the data stored in a
# specific attribute of the
# `child` tag
value = b_name.get('test')
print(value)

OUTPUT:
Writing Data to an XML file Using BeautifulSoup
from bs4 import BeautifulSoup
# Reading data from the xml file
with open('dict.xml', 'r') as f:
    data = f.read()
# Passing the data of the xml
# file to the xml parser of
# beautifulsoup
bs_data = BeautifulSoup(data, 'xml')
# A loop for replacing the value
# of attribute `test` with "WHAT !!".
# The tag is found by the clause
# `bs_data.find_all('child', {'name':'Frank'})`
for tag in bs_data.find_all('child', {'name':'Frank'}):
    tag['test'] = "WHAT !!"
# Output the contents of the
# modified xml file
print(bs_data.prettify())
OUTPUT:
Reading Data from an XML file Using Elementtree
# importing element tree
# under the alias of ET
import xml.etree.ElementTree as ET
# Passing the path of the
# xml document to enable the
# parsing process
tree = ET.parse('dict.xml')
# getting the parent tag of
# the xml document
root = tree.getroot()
# printing the root (parent) tag
# of the xml document, along with
# its memory location
print(root)
# printing the attributes of the
# first tag from the parent
print(root[0].attrib)
# printing the text contained within
# first subtag of the 5th tag from
# the parent
print(root[5][0].text)
OUTPUT:

Writing Data to an XML file Using Elementtree


import xml.etree.ElementTree as ET
# This is the parent (root) tag
# onto which other tags would be
# created
data = ET.Element('chess')
# Adding a subtag named `Opening`
# inside our root tag
element1 = ET.SubElement(data, 'Opening')
# Adding subtags under the `Opening`
# subtag
s_elem1 = ET.SubElement(element1, 'E4')
s_elem2 = ET.SubElement(element1, 'D4')
# Adding attributes to the tags under
# `items`
s_elem1.set('type', 'Accepted')
s_elem2.set('type', 'Declined')
# Adding text between the `E4` and `D5`
# subtag
s_elem1.text = "King's Gambit Accepted"
s_elem2.text = "Queen's Gambit Declined"
# Converting the xml data to byte object,
# for allowing flushing data to file
# stream
b_xml = ET.tostring(data)
# Opening a file under the name `items2.xml`,
# with operation mode `wb` (write + binary)
with open("GFG.xml", "wb") as f:
    f.write(b_xml)
OUTPUT:
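Since the output screenshots are not reproduced here, a self-contained round-trip shows what the ElementTree calls produce: build a small tree in memory, serialise it with tostring(), and parse the bytes back with fromstring().

```python
import xml.etree.ElementTree as ET

# Build a tree like the <chess> example above.
data = ET.Element("chess")
opening = ET.SubElement(data, "Opening")
e4 = ET.SubElement(opening, "E4")
e4.set("type", "Accepted")
e4.text = "King's Gambit Accepted"

# Serialise to bytes, then parse the bytes back into a new tree.
xml_bytes = ET.tostring(data)
root = ET.fromstring(xml_bytes)

# The parsed tree preserves tags, attributes, and text.
print(root.tag, root[0][0].get("type"), root[0][0].text)
```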
RESULT:
Thus Python scripts to read data from and write data to an XML file using Python libraries were successfully executed and verified.

Exp No: 6a FIND THE DATA DISTRIBUTIONS USING A BOX AND SCATTER PLOT.
Date:

AIM:
To find the data distributions using box and scatter plots.
ALGORITHM:
STEP 1: Start the Program
STEP 2: Import any dataset.
STEP 3: Pick any two columns.
STEP 4: Plot the boxplot using the boxplot function.
STEP 5: Draw the boxplot with a notch.
STEP 6: Display the result.
STEP 7: Plot the scatter plot using plot function
STEP 8: Print the result
STEP 9: Stop the process
The basic syntax to create a boxplot in R is −
boxplot(x, data, notch, varwidth, names, main)
where ,
● x is a vector or a formula.
● data is the data frame.
● notch is a logical value. Set as TRUE to draw a notch.
● varwidth is a logical value. Set as TRUE to draw the width of the box proportionate to the sample size.
● names are the group labels which will be printed under each boxplot.
● main is used to give a title to the graph.
The basic syntax for creating scatterplot in R is −
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
where,
● x is the data set whose values are the horizontal coordinates.
● y is the data set whose values are the vertical coordinates.
● main is the title of the graph.
● xlab is the label in the horizontal axis.
● ylab is the label in the vertical axis.
● xlim is the limit of the values of x used for plotting.
● ylim is the limits of the values of y used for plotting.
● axes indicate whether both axes should be drawn on the plot.
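For comparison, the five-number summary a boxplot encodes (box edges, median line, whisker fences) can be computed directly. A minimal Python sketch using only the standard library; the mpg values below are hypothetical, not the mtcars column:

```python
from statistics import median, quantiles

# Hypothetical mileage values for illustration
mpg = [21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8]

q1, q2, q3 = quantiles(mpg, n=4)   # quartiles: box edges and the median line
iqr = q3 - q1                      # interquartile range: the height of the box
lower_fence = q1 - 1.5 * iqr      # points beyond the fences are drawn as outliers
upper_fence = q3 + 1.5 * iqr

print(q1, q2, q3)
print(lower_fence, upper_fence)
```

Here q2 is the median, and any observation outside the two fences would appear as an individual point on the plot.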

PROGRAM:
BOX PLOT
For the dataset “mtcars” available in R environment,
input <- mtcars[,c('mpg','cyl')]
print(head(input))

OUTPUT:
Creating the Boxplot
# Give the chart file a name.
png(file = "boxplot.png")
# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
ylab = "Miles Per Gallon", main = "Mileage Data")
# Save the file.
dev.off()

OUTPUT:

Boxplot with Notch
# Give the chart file a name.
png(file = "boxplot_with_notch.png")
# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars,
xlab = "Number of Cylinders",
ylab = "Miles Per Gallon",
main = "Mileage Data",
notch = TRUE,
varwidth = TRUE,
col = c("green","yellow","purple"),
names = c("High","Medium","Low")
)
# Save the file.
dev.off()

OUTPUT:

Creating the Scatterplot


# Get the input values.
input <- mtcars[,c('wt','mpg')]
# Give the chart file a name.
png(file = "scatterplot.png")
# Plot the chart for cars with weight between 2.5 to 5 and mileage between 15 and 30.
plot(x = input$wt, y = input$mpg,
xlab = "Weight",
ylab = "Mileage",
xlim = c(2.5,5),
ylim = c(15,30),
main = "Weight vs Mileage"
)
# Save the file.
dev.off()
OUTPUT:
Scatterplot Matrices
The basic syntax for creating scatterplot matrices in R is −
pairs(formula, data)
where,
● formula represents the series of variables used in pairs.
● data represents the data set from which the variables will be taken.

Program
# Give the chart file a name.
png(file = "scatterplot_matrices.png")
# Plot the matrices between 4 variables giving 12 plots.
# One variable with 3 others and a total of 4 variables.
pairs(~wt+mpg+disp+cyl,data = mtcars,
main = "Scatterplot Matrix")
# Save the file.
dev.off()

OUTPUT:
RESULT:
Thus the program to find the data distributions using box and scatter plots was implemented and executed successfully.
Exp No: 6b PLOT BARCHART, PIE CHART AND HISTOGRAM USING IRIS DATA
Date:

Aim:
To plot barchart, pie chart and histogram in R using iris data.
Algorithm:
1. Load the Iris dataset
2. Create a bar plot for mean sepal length by species
3. Create a pie chart for the proportion of iris species
4. Create a histogram for the distribution of sepal width
Program:
data(iris)
barplot(tapply(iris$Sepal.Length, iris$Species, mean),
main = "Mean Sepal Length by Species",
ylab = "Sepal Length",
xlab = "Species",
col = "skyblue",
ylim = c(0, 8)
)
slices <- table(iris$Species)
lbls <- paste(names(slices), "\n", slices)
pie(slices, labels = lbls, col = rainbow(length(slices)),
main = "Proportion of Iris Species"
)
hist(iris$Sepal.Width, col = "lightgreen", border = "black",
main = "Distribution of Sepal Width",
xlab = "Sepal Width",
ylab = "Frequency"
)
OUTPUT:
RESULT:
Thus the program to plot barchart, pie chart and histogram using iris data was implemented and executed successfully.
Exp No: 6c PLOT BARCHART, PIE CHART AND HISTOGRAM USING MTCARS DATA
Date:

Aim:
To plot barchart, pie chart and histogram in R using mtcars data.
Algorithm:
1. Load the mtcars dataset
2. Create a bar chart for the count of cars by number of gears
3. Create a pie chart for the count of cars by number of cylinders
4. Create a histogram for the distribution of car weights
Program:
# Bar chart - Count of Cars by Number of Gears
bar_data <- table(mtcars$gear)
barplot(bar_data, main = "Number of Gears Distribution",
        xlab = "Number of Gears", ylab = "Count", col = "skyblue")

# Pie chart - Count of Cars by Cylinder
pie_data <- table(mtcars$cyl)
pie(pie_data, main = "Cylinder Distribution", col = rainbow(length(pie_data)),
    labels = c("4", "6", "8"))

# Histogram - Distribution of Car Weights
hist(mtcars$wt, main = "Car Weight Distribution", xlab = "Weight", col = "orange")

OUTPUT:
RESULT:
Thus the program to plot barchart, pie chart and histogram using mtcars data was implemented and executed successfully.

Exp No: 6d FIND THE OUTLIERS USING PLOT.


Date:
Exp No. 7 A FIND THE CORRELATION MATRIX
Date:

AIM:
To write a program to find the correlation matrix of iris dataset.

ALGORITHM:
1. Load the Iris dataset: Use the data() function to load the Iris dataset.
2. Inspect the dataset: Optionally, use the head() function to inspect the first few rows of the dataset.
3. Select numeric columns: Extract the numeric columns from the dataset (columns 1 to 4).
4. Calculate the correlation matrix: Use the cor() function to compute the correlation matrix for the
selected numeric columns.
5. Print the correlation matrix: Display the calculated correlation matrix using the print() function.
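The Pearson coefficient that cor() computes by default can also be written out by hand: the covariance of the two variables divided by the product of their standard deviations. A small Python sketch on toy values (not the actual iris columns):

```python
from math import sqrt

def pearson(x, y):
    # Pearson r: covariance of x and y divided by the
    # product of their standard deviations
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

lengths = [5.1, 4.9, 4.7, 4.6, 5.0]   # toy sepal-like values
widths = [3.5, 3.0, 3.2, 3.1, 3.6]
print(pearson(lengths, widths))
```

The result always lies between -1 (perfect inverse relationship) and +1 (perfect direct relationship), which is why the diagonal of a correlation matrix is all 1s.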

PROGRAM:
data(iris)
head(iris)
plot(iris)
cor_matrix <- cor(iris[1:4])
print(cor_matrix)
cor(iris[1:4], method = "kendall")
cor(iris[1:4], method = "spearman")

OUTPUT:

Pearson (default):
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length 1.0000000 -0.1175698 0.8717538 0.8179411
## Sepal.Width -0.1175698 1.0000000 -0.4284401 -0.3661259
## Petal.Length 0.8717538 -0.4284401 1.0000000 0.9628654
## Petal.Width 0.8179411 -0.3661259 0.9628654 1.0000000

Kendall:
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length 1.00000000 -0.07699679 0.7185159 0.6553086
## Sepal.Width -0.07699679 1.00000000 -0.1859944 -0.1571257
## Petal.Length 0.71851593 -0.18599442 1.0000000 0.8068907
## Petal.Width 0.65530856 -0.15712566 0.8068907 1.0000000

Spearman:
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length 1.0000000 -0.1667777 0.8818981 0.8342888
## Sepal.Width -0.1667777 1.0000000 -0.3096351 -0.2890317
## Petal.Length 0.8818981 -0.3096351 1.0000000 0.9376668
## Petal.Width 0.8342888 -0.2890317 0.9376668 1.0000000


RESULT:

Thus the program to find the correlation matrix of the iris dataset is executed successfully and the
output is verified.

Exp No. 7 B PLOT THE CORRELATION PLOT ON DATASET AND VISUALIZE GIVING
Date: AN OVERVIEW OF RELATIONSHIPS AMONG DATA ON IRIS DATA

AIM:

To write a program to plot the correlation plot on the dataset and visualize giving an overview of
relationships among data on iris data.

ALGORITHM:

1. Load the Iris dataset:


- Use the `data()` function to load the Iris dataset.
2. Calculate the correlation matrix:
- Use the `cor()` function to compute the correlation matrix for the selected numeric columns
(columns 1 to 4).
3. Load necessary libraries:
- Load the required libraries for visualization, such as `corrplot` and `psych`.
4. Visualize the correlation matrix using corrplot:
- Display the correlation matrix using different visualization methods, such as standard corrplot, pie
charts, color representation, and numerical values.
5. Visualize scatterplots:
- Use the `pairs()` function for a scatterplot matrix with default colors.
- Use `pairs.panels()` from the `psych` package for scatterplot matrices with additional customization,
such as custom histogram background colors.

PROGRAM:

cr <- cor(iris[1:4])
library(corrplot)
corrplot(cr)
corrplot(cr,method="pie")
corrplot(cr,method="color")
corrplot(cr,method="number")
pairs(iris[1:4])
library(psych)
pairs.panels(iris[1:4], hist.col="purple")
pairs.panels(iris[1:4], hist.col="#00CED1" )

OUTPUT:
RESULT:
Thus the correlation is plotted on the dataset and visualized giving an overview of relationships among
data on iris data.
Exp No. 7 C ANALYSIS OF VARIANCE (ANOVA), IF DATA HAVE
Date: CATEGORICAL VARIABLES ON IRIS DATA

AIM:
To write a program to find the analysis of variance, if data have categorical variables on iris data.

ALGORITHM:
1. Load the necessary library:
Use the library(stats) function to load the required library for statistical analysis.
2. Load the Iris dataset:
Use the data(iris) function to load the Iris dataset.
3. Specify the ANOVA model:
Use the aov() function to specify the one-way ANOVA model, with the dependent variable
(Sepal.Length) regressed on the categorical variable (Species).
4. Fit the ANOVA model:
Apply the specified ANOVA model to the dataset using the aov() function.
5. Print the ANOVA results:
Use the summary() function to print the ANOVA results, including the F-statistic, p-value, and
other relevant information.
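The F-statistic that aov() reports is the ratio of between-group to within-group mean squares. The computation can be sketched in Python on toy groups (this is an illustration of the formula, not the iris data):

```python
def one_way_anova_f(groups):
    # groups: list of observation lists, one per category level
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    k, n = len(groups), len(all_obs)
    # between-group sum of squares
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    return (ssb / df_between) / (ssw / df_within), df_between, df_within

f, dfb, dfw = one_way_anova_f([[1, 2, 3], [4, 5, 6]])
print(f, dfb, dfw)   # → 13.5 1 4
```

A large F (as in the iris output below, F = 119.26) means the group means differ far more than the scatter within groups would explain by chance.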

PROGRAM:

library(stats)
data(iris)
anova_result <- aov(Sepal.Length ~ Species, data = iris)
print(summary(anova_result))

OUTPUT:

Df Sum Sq Mean Sq F value Pr(>F)


Species 2 63.21 31.605 119.26 < 2e-16 ***
Residuals 147 38.96 0.265

RESULTS:

Thus the program to find the analysis of variance, if data have categorical variables on iris data is
executed successfully and the output is verified.
Exp No: 8 Import data from web storage. Name the dataset and now do Logistic Regression
to find out relation between variables that are affecting the admission of a student
in an institute based on his or her GRE score, GPA obtained and rank of the
student. Also check if the model fits it or not.
Date:

AIM:
To perform Logistic regression find out relation between variables that are affecting the
admission of a student in an institute based on his or her GRE score, GPA obtained and rank of the
student and also to check whether the model fits it or not.

ALGORITHM:
STEP 1: Start the program.
STEP 2: Load the admission dataset into a data frame.
STEP 3: Create a binary target variable by thresholding the chance of admission.
STEP 4: Split the data into training and testing sets.
STEP 5: Fit a logistic regression model on the training set.
STEP 6: Predict admissions on the testing set.
STEP 7: Check whether the model fits by reporting accuracy, the confusion matrix and the classification report.
STEP 8: Stop the program.
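The logistic regression used in the program maps a linear combination of the features through the sigmoid function and thresholds the resulting probability at 0.5. A minimal sketch with hypothetical coefficients (not the fitted values):

```python
from math import exp

def sigmoid(z):
    # logistic function: squashes any real number into (0, 1)
    return 1 / (1 + exp(-z))

def predict_admission(gre, cgpa, w_gre=0.01, w_cgpa=0.5, bias=-8.0):
    # w_gre, w_cgpa, bias are made-up weights for illustration only
    p = sigmoid(w_gre * gre + w_cgpa * cgpa + bias)
    return int(p > 0.5), p

label, prob = predict_admission(330, 9.5)
print(label, prob)
```

The sklearn model learns the weights from the training data; this sketch only shows the shape of the decision rule.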

PROGRAM

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
url = "/content/Admission_Predict.csv"
df = pd.read_csv(url)
print(df.head())
df.columns = df.columns.str.strip()
threshold = 0.7
df['Admission'] = (df['Chance of Admit'] > threshold).astype(int)
X = df[['GRE Score', 'TOEFL Score', 'University Rating', 'SOP', 'LOR', 'CGPA',
'Research']]
y = df['Admission']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print('Confusion Matrix:')
print(conf_matrix)
print('Classification Report:')
print(classification_rep)

OUTPUT:

Serial No. GRE Score TOEFL Score University Rating SOP LOR CGPA \
0 1 337 118 4 4.5 4.5 9.65
1 2 324 107 4 4.0 4.5 8.87
2 3 316 104 3 3.0 3.5 8.00
3 4 322 110 3 3.5 2.5 8.67
4 5 314 103 2 2.0 3.0 8.21

Research Chance of Admit


0 1 0.92
1 1 0.76
2 1 0.72
3 1 0.80
4 0 0.65
Accuracy: 0.825
Confusion Matrix:
[[25 8]
[ 6 41]]
Classification Report:
precision recall f1-score support

0 0.81 0.76 0.78 33


1 0.84 0.87 0.85 47

accuracy 0.82 80
macro avg 0.82 0.81 0.82 80
weighted avg 0.82 0.82 0.82 80

RESULT:
Thus the program to perform Logistic regression finds out relation between variables that are
affecting the admission of a student in an institute based on his or her GRE score, GPA obtained and
rank of the student and also to check whether the model fits it or not is executed successfully and the
output is verified.

Exp No. 9 APPLY MULTIPLE REGRESSIONS, IF DATA HAVE A CONTINUOUS
Date: INDEPENDENT VARIABLE. APPLY ON ABOVE DATASET.

AIM:
To apply multiple regressions on a dataset if data have a continuous independent variable.

ALGORITHM:
STEP1: Load or prepare the dataset for multiple regression.
STEP 2: Split the dataset into training and testing sets (e.g., using train_test_split function).
STEP 3: Choose a multiple regression algorithm (e.g., Multiple Linear Regression, Ridge
Regression).
STEP 4: Train the regression model on the training set with multiple independent variables.
STEP 5: Make predictions on the testing set.
STEP 6: Evaluate the performance of the multiple regression model using appropriate metrics (e.g.,
mean squared error, R-squared score).
Dependent variable (Y): A continuous variable that you want to predict or explain.
Independent variables (X1, X2, X3, ...): One or more continuous variables that you believe
may influence the dependent variable.
STEP 7: Once you have the dataset, you can follow these steps to apply multiple regression
Prepare the data, Explore the data, Build the regression model, Assess the model, Make predictions.
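The R-squared score mentioned in STEP 6 compares the residual sum of squares against the total sum of squares; a minimal Python sketch:

```python
def r_squared(y_true, y_pred):
    # 1 - SS_res / SS_tot: 1.0 means a perfect fit,
    # 0.0 means no better than predicting the mean
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.0]))   # → 0.99
```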

PROGRAM:

import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=42)
columns = ['Independent_Var1', 'Independent_Var2', 'Independent_Var3']
df = pd.DataFrame(data=X, columns=columns)
df['Dependent_Var'] = y
print(df.head())
df.to_csv('multiple_regression_dataset.csv', index=False)
data = pd.read_csv('multiple_regression_dataset.csv')
y = data['Dependent_Var']
X = data[['Independent_Var1', 'Independent_Var2', 'Independent_Var3']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('Coefficients:', model.coef_)
print('Intercept:', model.intercept_)

OUTPUT:

Independent_Var1 Independent_Var2 Independent_Var3 Dependent_Var


0 -0.792521 0.504987 -0.114736 12.781276
1 0.280992 -0.208122 -0.622700 -21.876410
2 0.791032 1.402794 -0.909387 91.098757
3 0.625667 -1.070892 -0.857158 -82.265575
4 -0.342715 -0.161286 -0.802277 -30.416557
Mean Absolute Error: 8.960994532577917
Mean Squared Error: 123.84680824798075
Root Mean Squared Error: 11.128648087165878
Coefficients: [28.49475136 74.39534965 18.78132401]
Intercept: 1.3037594527224998
RESULT:
Thus the program to apply multiple regressions on a dataset is performed successfully and the
output is verified.
Exp No: 10 APPLY REGRESSION MODEL TECHNIQUES TO PREDICT THE DATA
Date: ON THE ABOVE DATASET.

AIM:
To write a program to apply regression model techniques to predict the data on the above dataset.

ALGORITHM:
1. Load or prepare the dataset for regression.
2. Split the dataset into training and testing sets (e.g., using train_test_split function).
3. Choose a regression algorithm (e.g., Linear Regression, Random Forest Regression).
4. Train the regression model on the training set.
5. Make predictions on the testing set.
6. Evaluate the performance of the regression model using appropriate metrics (e.g., mean
squared error, R-squared score).
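The root mean squared error used in the program (mse ** 0.5) can also be written out directly; a small Python sketch:

```python
from math import sqrt

def rmse(y_true, y_pred):
    # square root of the mean of the squared prediction errors
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

print(rmse([0, 0], [3, 4]))   # sqrt((9 + 16) / 2)
```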

PROGRAM:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_iris
iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
data['target'] = iris.target
X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
regression_model = LogisticRegression()
regression_model.fit(X_train, y_train)
y_pred = regression_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = mse ** 0.5
print("Root Mean Squared Error:", rmse)

OUTPUT:
The code generates a logistic regression model on the given dataset, makes predictions on the
testing set, and evaluates the performance of the model using root mean squared error (RMSE)
Root Mean Squared Error: 0.4472135954999579

RESULT:
Thus the program to apply regression model techniques to predict the data on the given dataset is executed successfully and the output is verified.
Exp No: 11 a) INSTALL RELEVANT PACKAGES FOR CLASSIFICATION
Date:

AIM:
To install relevant packages for classification

ALGORITHM:
1. Identify the relevant packages required for classification tasks.
2. Determine the preferred method of package installation.
3. Install the packages using the chosen method.

PROGRAM:
1. scikit-learn: Scikit-learn is a popular machine learning library that provides various classification algorithms and tools. You can install it using pip, the package installer for Python.
pip install scikit-learn
2. numpy: NumPy is a fundamental package for scientific computing in Python. It provides support for efficient numerical operations and multidimensional arrays, which are commonly used in machine learning.
pip install numpy
3. pandas: Pandas is a powerful data manipulation and analysis library. It provides data structures like DataFrame, which is useful for preprocessing and organizing data for classification tasks.
pip install pandas
4. matplotlib: Matplotlib is a plotting library that allows you to create various types of visualizations. It can be useful for analyzing and visualizing the results of your classification models.
pip install matplotlib
5. seaborn: Seaborn is a statistical data visualization library that is built on top of Matplotlib. It provides additional functionality and aesthetically pleasing visualizations.
pip install seaborn
6. tensorflow or pytorch: If you're interested in deep learning-based classification models, you can install either TensorFlow or PyTorch, which are popular deep learning frameworks.
For TensorFlow:
pip install tensorflow
For PyTorch:
pip install torch torchvision
OUTPUT:

The output will display the installation progress and confirmation of successful installations.
For example:
Collecting package_name
  Downloading package_name-1.0.0.tar.gz (1.0 kB)
Building wheels for collected packages: package_name
  Building wheel for package_name (setup.py) ... done
  Created wheel for package_name: filename.whl
  Stored in directory: /path/to/cache
Successfully built package_name
Installing collected packages: package_name
Successfully installed package_name-1.0.0
RESULT:
Thus the commonly used packages for classification in Python were installed successfully. Depending on your specific requirements, you may need additional packages or versions specific to your project.

Exp No: 11 b) CHOOSE A CLASSIFIER FOR A CLASSIFICATION PROBLEM
Date:

AIM:
To choose a classifier algorithm for classification problem which is the Random Forest
algorithm.

ALGORITHM:
1. Load or generate the dataset for classification.
2. Split the dataset into training and testing sets (e.g., using train_test_split function).
3. Select a classifier based on the problem requirements and characteristics of the dataset.
4. Train the classifier on the training set.
5. Make predictions on the testing set.
6. Evaluate the performance of the classifier using appropriate metrics (e.g., accuracy,
precision, recall, F1 score) and cross-validation.

PROGRAM:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

OUTPUT:

Accuracy: 1.0

RESULT :
Thus the accuracy of the classifier is calculated using the accuracy_score function and printed
successfully.

Exp No: 11 C) EVALUATE THE PERFORMANCE OF THE CLASSIFIER
Date:

AIM:
To use the evaluation metrics to evaluate the performance of the classifier.

ALGORITHM:
1. Obtain the true labels and predicted labels from the classifier.
2. Calculate and store the evaluation metrics: accuracy, precision, recall, F1 score, and
confusion matrix.
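These metrics come straight from the confusion-matrix counts; a minimal sketch for the binary case (multi-class macro averaging, as in the program, repeats this per class and averages):

```python
def precision_recall_f1(tp, fp, fn):
    # precision: of the predicted positives, how many were right
    # recall: of the actual positives, how many were found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
print(p, r, f)
```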

PROGRAM:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')
confusion = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:")
print(confusion)

OUTPUT:
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1 Score: 1.0
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
RESULT:

Thus the performance of the classifier is evaluated successfully using the evaluation metrics.
Exp No: 12.a CLUSTERING ALGORITHM FOR UNSUPERVISED CLASSIFICATION.
Date:

AIM:
To perform unsupervised classification using clustering algorithm.

ALGORITHM:
1. Choose the number of clusters K.
2. Initialize K cluster centroids randomly.
3. Repeat until convergence:
4. Assign each data point to the nearest centroid.
5. Recalculate the centroid of each cluster based on the assigned data points.
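The assign-and-update loop above can be sketched in a few lines. A 1-D toy version (not the sklearn implementation) showing one iteration:

```python
def assign(points, centroids):
    # step 4: label each point with its nearest centroid
    return [min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            for p in points]

def update(points, labels, k):
    # step 5: move each centroid to the mean of its assigned points
    return [sum(p for p, l in zip(points, labels) if l == j) /
            max(1, sum(1 for l in labels if l == j))
            for j in range(k)]

points = [0.0, 1.0, 10.0, 11.0]
centroids = [0.0, 10.0]
labels = assign(points, centroids)
centroids = update(points, labels, 2)
print(labels, centroids)   # → [0, 0, 1, 1] [0.5, 10.5]
```

Repeating these two steps until the labels stop changing is the convergence condition of step 3.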

PROGRAM:

from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from matplotlib import pyplot
X, _ = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, random_state=4)
model = KMeans(n_clusters=2)
model.fit(X)
yhat = model.predict(X)
clusters = unique(yhat)
for cluster in clusters:
    row_ix = where(yhat == cluster)
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1])
pyplot.show()

OUTPUT:
RESULT:
Thus the code generates a scatter plot where each data point is colored according to its
assigned cluster label.
Exp No: 12.B PLOT THE CLUSTER DATA USING PYTHON VISUALIZATIONS
Date:

AIM:
To write a program to plot the cluster data using python visualization using iris dataset

ALGORITHM:

1. Obtain the cluster labels and data points.


2. Create a scatter plot where each data point is assigned a different color based on its
cluster label.
3. Customize the plot by adding labels, titles, and other visual elements.
4. Display the plot.
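The standardization step the program performs with StandardScaler is a per-column z-score: subtract the mean and divide by the standard deviation. A minimal Python sketch for one column:

```python
def standardize(values):
    # z-score: (x - mean) / sd, using the population sd (ddof = 0),
    # which matches scikit-learn's StandardScaler
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

z = standardize([5.1, 4.9, 4.7, 4.6, 5.0])
print(z)
```

After standardization each column has mean 0 and unit variance, so no single feature dominates the Euclidean distances k-means uses.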

PROGRAM:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
iris = load_iris()
data = iris.data
feature_names = iris.feature_names
scaler = StandardScaler()
data_standardized = scaler.fit_transform(data)
kmeans = KMeans(n_clusters=3, random_state=42)
clusters = kmeans.fit_predict(data_standardized)
df = pd.DataFrame(data_standardized, columns=feature_names)
df['Cluster'] = clusters
sns.pairplot(df, hue='Cluster', palette='viridis')
plt.suptitle('K-Means Clustering on Iris Dataset', y=1.02)
plt.show()

OUTPUT:
RESULT:
The code generates a pair plot where each data point is colored according to its
assigned cluster label, with a legend identifying the clusters.
