
DR. AKHILESH DAS GUPTA INSTITUTE OF PROFESSIONAL STUDIES

STATISTICS, STATISTICAL MODELLING & DATA ANALYTICS LAB (DA-304P)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Submitted To:                          Submitted By:
Mr. Lokesh Meena                       Aditya Sharma
Associate Professor                    07015602722
Department of CSE                      CSE T-2
Program-1

Aim: Exercises to implement the basic matrix operations in Scilab.

Theory:

Addition of Matrices
If A = [aij]m×n and B = [bij]m×n are two matrices of the same order, then their sum A + B is a matrix of the same order, and each element of that matrix is the sum of the corresponding elements, i.e. A + B = [aij + bij]m×n.
Consider two matrices A = [a11 a12; a21 a22] and B = [b11 b12; b21 b22] of order 2 × 2. Then the sum is given by:
A + B = [a11 + b11  a12 + b12; a21 + b21  a22 + b22]

Subtraction of Matrices
If A and B are two matrices of the same order, then we define A − B = [aij − bij]m×n.
For the same two matrices A and B of order 2 × 2, the difference is given by:
A − B = [a11 − b11  a12 − b12; a21 − b21  a22 − b22]

Scalar Multiplication of Matrices
If A = [aij]m×n is a matrix and k is any number, then the matrix obtained by multiplying every element of A by k is called the scalar multiple of A by k, and it is denoted by kA. Thus, if A = [aij]m×n, then kA = [k·aij]m×n.

Multiplication of Matrices
If A and B are any two matrices, then their product AB is defined only when the number of columns in A is equal to the number of rows in B.
If A = [aij]m×n and B = [bjk]n×p, then AB will be a matrix of order m×p where the (i, k)-th entry is ai1·b1k + ai2·b2k + … + ain·bnk.
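For example, a 2×3 matrix times a 3×1 column vector gives a 2×1 matrix. A quick Scilab check of this dimension rule (the matrix M below is an illustrative value, not part of the original listing):

// Dimension rule: (m x n) * (n x p) gives an (m x p) result
M = [1 2 3; 4 5 6];   // 2x3 matrix
v = [5; 6; 7];        // 3x1 column vector
disp(M * v);          // 2x1 result: [38; 92]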


Code:

1. Defining Matrices:

// 2x2 matrix
A = [1, 2; 3, 4];

// 3x1 column vector
B = [5; 6; 7];

// 2x2 matrix
C = [8, 9; 10, 11];

2. Accessing Elements:

// Second row, first column of A
A(2, 1) // Output: 3

// First element of B
B(1) // Output: 5

Output:

3. Arithmetic Operations:

// Addition
D = A + C
disp(D)
// Subtraction
E = A - C
disp(E)
// Element-wise multiplication (use * for ordinary matrix multiplication)
F = A .* C
disp(F)
// Matrix multiplication
G = A * C
disp(G)
// Scalar multiplication
H = 2 * A
disp(H)

Output:

4. Other Useful Commands:

// Transpose
A_transposed = A'
disp(A_transposed)
// Inverse (if it exists)
A_inverse = inv(A)
disp(A_inverse)
// Determinant
det(A)
disp(det(A))
// Size
disp(size(A))
Output:
Program-2

Aim: Exercises to find the Eigenvalues and eigenvectors in Scilab.

Theory:

Eigenvalues and Eigenvectors:

Consider a square matrix n × n. If X is the non-trivial column vector solution of the matrix
equation AX = λX, where λ is a scalar, then X is the eigenvector of matrix A, and the
corresponding value of λ is the eigenvalue of matrix A.

Suppose the matrix equation is written as AX – λX = 0. Let I be the n × n identity matrix.

If X is replaced by IX in the equation above, we obtain AX – λIX = 0.

The equation is rewritten as (A – λI)X = 0.

The equation above has non-trivial solutions if and only if the determinant of the matrix (A – λI) is 0. The characteristic equation of A is Det(A – λI) = 0. Since A is an n × n matrix, expanding Det(A – λI) gives the characteristic polynomial of A, whose degree is n.
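As a quick check of this definition, Scilab can build the characteristic polynomial and compute its roots directly; a minimal sketch (poly and roots are assumed here to be the standard Scilab functions for this):

// Characteristic polynomial p(x) = det(x*I - A) of a sample matrix
A = [1 2; 3 4];
p = poly(A, 'x');   // characteristic polynomial in the variable x
disp(p);
disp(roots(p));     // its roots are the eigenvalues of A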

Code:

// Define the matrix
A = [1 2; 3 4];

// Find eigenvectors and eigenvalues (with two output arguments, spec returns
// the eigenvectors and a diagonal matrix of the eigenvalues)
[eigenvectors, eigenvalues] = spec(A);

// Print eigenvalues
disp('Eigenvalues:');
disp(diag(eigenvalues));

// Print eigenvectors
disp('Eigenvectors:');
disp(eigenvectors);
Output:
Program-3

Aim: Exercises to solve equations by the Gauss elimination, Gauss-Jordan and Gauss-Seidel methods in Scilab.

Theory:

Gauss Elimination Method

The Gaussian elimination method, also known as the row reduction algorithm, is used for solving systems of linear equations. It consists of a sequence of operations performed on the corresponding matrix of coefficients. We can also use this method to estimate any of the following:

● The rank of the given matrix


● The determinant of a square matrix
● The inverse of an invertible matrix

To perform row reduction on a matrix, we carry out a sequence of elementary row operations that transform the matrix until as many entries as possible in its lower left-hand corner are 0 (i.e., zeros). That means the obtained matrix should be an upper triangular matrix. There are three types of elementary row operations:
● Swapping two rows and this can be expressed using the notation ↔, for
example, R2 ↔ R3
● Multiplying a row by a nonzero number, for example, R1 → kR1 where k is some
nonzero number
● Adding a multiple of one row to another row, for example, R2 → R2 + 3R1
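For instance, the last of these operations can be carried out directly on a matrix in Scilab (a small illustrative sketch, not part of the original listings):

// Row operation R2 -> R2 + 3*R1 on a sample matrix
A = [1 2 3; 4 5 6; 7 8 9];
A(2, :) = A(2, :) + 3 * A(1, :);
disp(A);   // the second row becomes [7 11 15]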

Gauss Jordan Elimination Method

The Gauss-Jordan method is a method for solving systems of linear equations. It is similar to
the Gaussian elimination process, but the entries above and below each pivot are zeroed out.
The result of the Gauss-Jordan method is in reduced row echelon form.

The Gauss-Jordan method uses three elementary row operations on a matrix:


● Swap the positions of two of the rows
● Multiply one of the rows by a nonzero scalar
● Add or subtract the scalar multiple of one row to another row

The Gauss-Jordan method can also be used to find the inverse of any invertible matrix.
The steps for the Gauss-Jordan method are (a small worked Scilab sketch follows this list):
● Write the augmented matrix
● Interchange rows if necessary to obtain a non-zero number in the first row, first
column
● Use a row operation to get a 1 as the entry in the first row and first column
● Use row operations to make all other entries as zeros in column one
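A minimal Scilab sketch of these steps on an illustrative 2 × 2 system (the values of A and b here are assumptions chosen for the example, not data from the programs below):

// Gauss-Jordan on a small augmented system [A | b]
A  = [2 1; 1 3];
b  = [5; 10];
Ab = [A, b];                                  // write the augmented matrix
Ab(1, :) = Ab(1, :) / Ab(1, 1);               // make the entry in row 1, column 1 equal to 1
Ab(2, :) = Ab(2, :) - Ab(2, 1) * Ab(1, :);    // zero out the rest of column one
Ab(2, :) = Ab(2, :) / Ab(2, 2);               // repeat the pattern on column two
Ab(1, :) = Ab(1, :) - Ab(1, 2) * Ab(2, :);
disp(Ab);                                     // result is [1 0 1; 0 1 3], so x = 1, y = 3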

Gauss-Seidel method

The Gauss-Seidel method is an iterative method for solving a system of linear equations. It's
named after the German mathematicians Carl Friedrich Gauss and Philipp Ludwig von Seidel.
The method is also known as the Liebmann method or the method of successive displacement.

The Gauss-Seidel method works by:


● Decomposing the matrix A into a lower triangular component L and a strictly upper
triangular component U
● Solving the resulting lower triangular system by forward substitution
● Using the previous iterate's values of x for the upper triangular part

The Gauss-Seidel method is an improvement on the Jacobi method. In the Jacobi method, updated values are not used until the next iteration, whereas in the Gauss-Seidel method each variable's new value is used as soon as it has been computed.
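In component form, each sweep of the Gauss-Seidel method updates the i-th unknown as

x_i(new) = ( b_i − Σ_{j<i} a_ij·x_j(new) − Σ_{j>i} a_ij·x_j(old) ) / a_ii

where the first sum uses values already updated in the current sweep and the second uses values from the previous sweep; this is the update implemented in the code further below.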

The Gauss-Seidel method has several advantages, including:


● Simple calculations
● Less storage needed in computer memory
● Applicable for smaller systems
● Advantageous for large systems of equations because it is less prone to round-off
errors

Code:

1. Gauss Elimination:

// Function to perform forward elimination (Gauss Elimination)
function [A, b] = forward_elimination(A, b)
    n = size(A, 1);
    for i = 1:n-1
        max_pivot_row = i; // Initialize max_pivot_row
        for j = i+1:n
            if abs(A(j, i)) > abs(A(max_pivot_row, i)) // Find row with largest pivot in column
                max_pivot_row = j;
            end
        end
        // Swap rows if necessary for pivoting
        if max_pivot_row ~= i
            temp = A(i, :);
            A(i, :) = A(max_pivot_row, :);
            A(max_pivot_row, :) = temp;
            temp = b(i);
            b(i) = b(max_pivot_row);
            b(max_pivot_row) = temp;
        end

        pivot = A(i, i);
        if abs(pivot) < %eps * 100 // Check for pivot element close to zero
            error('Pivot element close to zero. Consider reordering equations or using a different method.');
        end
        // Eliminate the entries below the pivot
        for j = i+1:n
            factor = A(j, i) / pivot;
            A(j, :) = A(j, :) - factor * A(i, :);
            b(j) = b(j) - factor * b(i);
        end
    end
endfunction

// Function to perform back substitution
function x = back_substitution(U, b)
    n = size(U, 1);
    x = zeros(n, 1);
    for i = n:-1:1
        s = 0;
        for j = i+1:n
            s = s + U(i, j) * x(j);
        end
        x(i) = (b(i) - s) / U(i, i);
    end
endfunction

// Function to solve using Gauss Elimination
function x = gauss_elimination(A, b)
    [U, b] = forward_elimination(A, b);
    x = back_substitution(U, b);
endfunction

// Example usage
A = [1 2 3; 4 5 6; 7 8 9];
b = [1; 2; 3];

x = gauss_elimination(A, b);

disp('Solution using Gauss Elimination:');
disp(x);

Output:

2. Gauss Jordan Method:

// Function to perform forward elimination (Gauss Elimination)
function [A, b] = forward_elimination(A, b)
    n = size(A, 1);
    for i = 1:n-1
        max_pivot_row = i; // Initialize max_pivot_row
        for j = i+1:n
            if abs(A(j, i)) > abs(A(max_pivot_row, i)) // Find row with largest pivot in column
                max_pivot_row = j;
            end
        end
        // Swap rows if necessary for pivoting
        if max_pivot_row ~= i
            temp = A(i, :);
            A(i, :) = A(max_pivot_row, :);
            A(max_pivot_row, :) = temp;
            temp = b(i);
            b(i) = b(max_pivot_row);
            b(max_pivot_row) = temp;
        end

        pivot = A(i, i);
        if abs(pivot) < %eps * 100 // Check for pivot element close to zero
            error('Pivot element close to zero. Consider reordering equations or using a different method.');
        end
        for j = i+1:n
            factor = A(j, i) / pivot;
            A(j, :) = A(j, :) - factor * A(i, :);
            b(j) = b(j) - factor * b(i);
        end
    end
endfunction

// Function to perform back substitution
function x = back_substitution(U, b)
    n = size(U, 1);
    x = zeros(n, 1);
    for i = n:-1:1
        s = 0;
        for j = i+1:n
            s = s + U(i, j) * x(j);
        end
        x(i) = (b(i) - s) / U(i, i);
    end
endfunction

// Function to solve using Gauss Jordan elimination
function x = gauss_jordan(A, b)
    [U, b] = forward_elimination(A, b);
    n = size(U, 1);
    // Eliminate the entries above each pivot, applying the same operations to b
    for i = n:-1:1
        for j = i-1:-1:1
            factor = U(j, i) / U(i, i);
            U(j, :) = U(j, :) - factor * U(i, :);
            b(j) = b(j) - factor * b(i);
        end
        // Normalize the pivot row so the diagonal element becomes 1
        b(i) = b(i) / U(i, i);
        U(i, :) = U(i, :) / U(i, i);
    end
    // U is now the identity matrix, so the solution is simply b
    x = b;
endfunction

// Example usage
A = [1 2 3; 4 5 6; 7 8 9];
b = [1; 2; 3];

x = gauss_jordan(A, b);

disp('Solution using Gauss Jordan:');
disp(x);

Output:

3. Gauss Seidel Method:

function x = gauss_seidel(A, b, x0, tol, max_iter)
    // A: coefficient matrix
    // b: right-hand side vector
    // x0: initial guess vector
    // tol: tolerance for convergence
    // max_iter: maximum number of iterations

    n = length(b);
    x = x0;
    iter = 0;
    while iter < max_iter
        x_old = x;
        for i = 1:n
            sum1 = 0;
            sum2 = 0;
            for j = 1:i-1
                sum1 = sum1 + A(i,j) * x(j);      // uses values already updated in this sweep
            end
            for j = i+1:n
                sum2 = sum2 + A(i,j) * x_old(j);  // uses values from the previous sweep
            end
            x(i) = (b(i) - sum1 - sum2) / A(i,i);
        end
        iter = iter + 1;
        // Check for convergence
        if norm(x - x_old) < tol
            break;
        end
    end
    if norm(x - x_old) < tol
        disp('Converged in ' + string(iter) + ' iterations');
    else
        disp('Maximum iterations reached without convergence');
    end
endfunction

A = [4, -1, 0; -1, 4, -1; 0, -1, 3];
b = [5; -7; 6];
x0 = [0; 0; 0];
tol = 1e-6;
max_iter = 1000;

x = gauss_seidel(A, b, x0, tol, max_iter);

disp('Solution:');
disp(x);

Output:
Program-4

Aim: Exercises to implement the associative, commutative and distributive properties in a matrix in Scilab.

Theory:

● Commutative property of addition: A + B = B + A. Two matrices of the same order can be added in either order and give the same result.

● Associative property of addition: (A + B) + C = A + (B + C). The grouping in matrix addition can be changed without changing the result.

● Associative property of multiplication: A(BC) = (AB)C.

● Distributive property: A(B + C) = AB + AC. Matrix multiplication distributes over matrix addition. (Matrix multiplication itself is generally not commutative; see the quick check below.)

● For scalars (numbers), the corresponding properties are: commutativity, p × q = q × p; associativity, p × (q × r) = (p × q) × r; distributivity over addition and subtraction, p × (a ± b) = p × a ± p × b; and the multiplicative inverse property, p × (1/p) = 1, provided p ≠ 0.
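A quick Scilab check of the non-commutativity of matrix multiplication, using the same A and B as in the code below:

A = [1 2; 3 4];
B = [5 6; 7 8];
disp(A * B);   // [19 22; 43 50]
disp(B * A);   // [23 34; 31 46] -- different, so in general A*B <> B*A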

Code:

// Define matrices
A = [1 2; 3 4];
B = [5 6; 7 8];
C = [9 10; 11 12];

// Associativity of matrix addition (A + (B + C) = (A + B) + C)
sum1 = A + (B + C);
sum2 = (A + B) + C;

if (sum1 == sum2)
    disp('Associativity of matrix addition holds.');
else
    disp('Associativity of matrix addition may not hold.');
end

// Commutativity of matrix addition (A + B = B + A)
sum3 = A + B;
sum4 = B + A;

if (sum3 == sum4)
    disp('Commutativity of matrix addition holds.');
else
    disp('Commutativity of matrix addition may not hold.');
end

// Distributivity (A * (B + C) = A * B + A * C)
product1 = A * (B + C);
product2 = A * B + A * C;

if (product1 == product2)
    disp('Distributivity of matrix multiplication holds.');
else
    disp('Distributivity may not hold.');
end

Output:
Program-5

Aim: Exercises to find the reduced row echelon form of a matrix in Scilab.

Theory:

A matrix is in reduced row echelon form (RREF) if it meets the following criteria:
● The matrix is a zero matrix, or
● In each nonzero row, the left-most nonzero entry is 1; this entry is called a leading 1 (or pivot)
● The column that contains a leading 1 has all of its other entries equal to 0
● Each leading 1 lies to the right of the leading 1 in the row above it, and any all-zero rows appear at the bottom of the matrix

The reduced row echelon form of a matrix is unique and does not depend on the sequence of
elementary row operations used to obtain it.

The reduced row echelon form of a matrix is used to solve the system of linear equations.

Code:
// Function to perform Gaussian elimination (forward elimination on A and b)
function [U, b] = gaussian_elimination(A, b)
    n = size(A, 1);
    for i = 1:n-1
        for j = i+1:n
            pivot = A(i, i);
            if abs(pivot) < %eps // Check for pivot element close to zero (avoid division by zero)
                error('Pivot element close to zero. Consider reordering equations or using a different method.');
            end
            factor = A(j, i) / pivot;
            A(j, :) = A(j, :) - factor * A(i, :);
            b(j) = b(j) - factor * b(i);
        end
    end
    U = A; // U now holds the row echelon (upper triangular) form
endfunction

// Example usage
A = [1 2 3; 4 5 6; 7 8 9];
b = [1; 2; 3];
[U, b] = gaussian_elimination(A, b);

// Note: forward elimination gives the row echelon (upper triangular) form; the fully
// reduced form additionally requires scaling each pivot to 1 and clearing the entries
// above it, as in the Gauss-Jordan step of Program-3 (see also the check below).
disp('Row echelon form:');
disp(U);

Output:
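Scilab also provides a built-in rref function that returns the reduced row echelon form directly; a minimal sketch using it as a cross-check (assuming rref is available in the installed Scilab version):

// Reduced row echelon form via the built-in rref
A = [1 2 3; 4 5 6; 7 8 9];
R = rref(A);
disp('Reduced row echelon form of A:');
disp(R);   // expected: [1 0 -1; 0 1 2; 0 0 0], since A has rank 2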
Program-6

Aim: Exercises to plot the functions and to find its first and second derivatives in Scilab.

Theory:

Function
A function is a relationship between a set of inputs and their outputs. A function can be
represented as an equation, a set of ordered pairs, as a table, or as a graph in the coordinate
plane.

Derivative
In mathematics, a derivative is the rate of change of a function with respect to an independent
variable. It is used to measure the sensitivity of one variable with respect to another.

First-order derivatives
These derivatives tell about the direction of the function and can be interpreted as an
instantaneous rate of change. For example, the first derivative of a distance versus time graph
gives you velocity.

Second-order derivatives
These derivatives are used to get an idea of the shape of the graph for the given function. For
example, the second derivative gives you the acceleration.

Graphically
The first derivative represents the slope of the function at a point, and the second derivative
describes how the slope changes over the independent variable in the graph.

Image processing
When you apply a Sobel convolution matrix to a given image, you get the first derivative of the input image. When you apply the Laplacian matrix to the initial image, you get the second derivative.
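Besides coding the analytic derivatives by hand, as in the listing below, the first and second derivatives can also be approximated numerically with central differences; a minimal Scilab sketch using the same example function (the step size h is an assumption chosen for illustration):

// Central-difference approximations of f'(x) and f''(x)
function y = f(x)
    y = x.^2 + sin(x);   // same example function as in the listing below
endfunction

h  = 1e-4;
x0 = 2;
d1 = (f(x0 + h) - f(x0 - h)) / (2*h);            // approximates f'(2)  = 2*2 + cos(2)
d2 = (f(x0 + h) - 2*f(x0) + f(x0 - h)) / h^2;    // approximates f''(2) = 2 - sin(2)
disp(d1);
disp(d2);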

Code:
// Replace 'f(x)' with your actual function definition
function y = f(x)
    y = x.^2 + sin(x); // Example function (element-wise so it works on vectors)
endfunction

// Define x-axis range
x = linspace(1, 250, 15);

// Calculate function values at each x point
y = f(x);

// Plot the function
plot(x, y);
xlabel('x');
ylabel('f(x)');
title('Plot of f(x)');

// Define the first derivative function
function dy = df(x)
    dy = 2*x + cos(x); // Example derivative of f(x)
endfunction

// Calculate first derivative values at each x point
dy = df(x);

// Plot the first derivative (optional)
// plot(x, dy); // Un-comment to plot the first derivative

// Print the first derivative at a specific point (optional)
x_point = 2; // Example point
first_derivative = df(x_point);
disp('First derivative at x=2:');
disp(first_derivative);

// Define the second derivative function
function d2y = d2f(x)
    d2y = 2 - sin(x); // Example second derivative of f(x)
endfunction

// Calculate second derivative values at each x point
d2y = d2f(x);

// Plot the second derivative (optional)
// plot(x, d2y); // Un-comment to plot the second derivative

// Print the second derivative at a specific point (optional)
second_derivative = d2f(x_point);
disp('Second derivative at x=2:');
disp(second_derivative);
Output:
Program-7

Aim: Exercises to present the data as a frequency table in SPSS.

Theory:
Frequency tables in SPSS provide a structured way to summarize categorical data, aiding in
data interpretation and analysis. The algorithm to create a frequency table involves several
steps within the SPSS software:

Algorithm:
Step 1: Open Dataset
Open the dataset containing the variable of interest.
Step 2: Access Frequency Analysis
Click on the "Analyze" tab in SPSS.
Step 3: Navigate to Descriptive Statistics
From the Analyze menu, select "Descriptive Statistics."
Step 4: Select Frequencies
In the submenu, choose "Frequencies."
Step 5: Choose Variable
In the Frequencies dialog box, locate the variable for which you want to create a frequency table.
Step 6: Add Variable
Drag the variable from the left panel (Variables) to the right panel (Variable(s)) in the Frequencies dialog box.
Step 7: Optional: Additional Statistics
If desired, click on the "Statistics" button to include additional descriptive statistics such as mean, median, etc.
Step 8: Generate Frequency Table
Click on the "OK" button to generate the frequency table.

Output interpretation:
Validity Information:
The generated output provides information about the number of valid and missing
values for the selected variable.
Frequency Table:
The frequency table displays each unique value of the variable, along with its
frequency count and percentage.
The "Frequency" column indicates how many times each unique value occurs in the
dataset.
The "Percent" column shows the percentage of each value relative to the total
number of valid responses.
The sum of the percentages equals 100%.

Data used:
The frequency table output provides a clear summary of the distribution of values within
the variable, facilitating easy interpretation and analysis.
a) The team name Mavs occurs 4 times, which represents 4/11 = 36.4% of all values in the Team column.
b) The team name Rockets occurs 3 times, which represents 3/11 = 27.3% of all values in the Team column.
c) The team name Spurs occurs 2 times, which represents 2/11 = 18.2% of all values in the Team column.
d) The team name Warriors occurs 2 times, which represents 2/11 = 18.2% of all values in the Team column.
e) Note that the values in the Percent column add up to 100%.
Program-8
Aim: Exercises to find the outliers in a dataset in SPSS.

Theory:
Identifying outliers in a dataset is crucial for data analysis in SPSS. The algorithm
involves the following steps:

Algorithm:
Step 1: Open Dataset
Open the dataset containing the variable of interest, such as annual income
for individuals.
Step 2: Access Descriptive Statistics
Click on the "Analyze" tab in SPSS.
Step 3: Navigate to Descriptive Statistics
From the Analyze menu, select "Descriptive Statistics."
Step 4: Select Explore
In the submenu, choose "Explore."
Step 5: Choose Variable
Drag the variable (e.g., income) into the box labeled "Dependent List."
Step 6: Configure Statistics
Click on the "Statistics" button and ensure that the box next to "Percentiles" is checked. Click "Continue."
Step 7: Generate Box Plot
Click "OK" to generate the box plot.
Step 8: Interpret Box Plot
Examine the box plot to identify any circles or asterisks on either end of the box plot. Circles indicate potential outliers, while asterisks indicate extreme outliers.
Step 9: Calculate Interquartile Range (IQR)
Locate the interquartile range (IQR) from the output, typically labeled as "Tukey's Hinges."
Step 10: Define Outlier Ranges
Calculate outlier ranges using the formula:
Upper bound: 3rd quartile + 1.5 * IQR
Lower bound: 1st quartile - 1.5 * IQR
Step 11: Determine Outliers
Identify any values outside the defined outlier ranges. Any data points beyond these ranges are considered outliers.

Output interpretation:
1. Box Plot: Examining the box plot visually identifies outliers.
2. Tukey's Hinges: Locate the interquartile range (IQR) from the output.
3. Outlier Ranges: Calculate upper and lower bounds based on the IQR.
4. Identification: Values outside the defined outlier ranges are considered outliers or
extreme outliers.

Handling outliers:
1. Verify Data Entry: Ensure outliers are not the result of data entry errors.
2. Remove Outliers: Consider removing outliers if they significantly impact
analysis.

3. Assign New Values: Replace outlier values with appropriate replacements (e.g., mean,
median) if they are data entry errors.

Data used:
The output includes box plots indicating potential outliers and extreme outliers.
Interquartile range and outlier ranges are calculated to identify and handle outliers
appropriately.

INCOME 18 24 36 34 38 45 48 54 60 73 79 85 94 98 108
Program-9
Aim: Exercises to find the most risky project out of two mutually exclusive projects in SPSS.

Theory:

Steps:

1. Open the data in SPSS

● Launch SPSS and open the data file containing the "Project," "Cost Estimate," and
"Time Estimate" variables.

2. Analyse Cost Estimates

● Go to Analyze > Descriptive Statistics > Frequencies.


● Select the "Cost Estimate" variable.
● Click "Statistics" and check "Mean," "Median," and "Standard Deviation."
● Click "OK" to run the analysis.

Examine the output:

● Compare the mean, median, and standard deviation of cost estimates for Project
A and Project B.
● A higher standard deviation indicates greater cost variability and potential risk.

3. Analyse Time Estimates

● Repeat the same steps as in Step 2, but select the "Time Estimate" variable instead.

Examine the output:

● Compare the mean, median, and standard deviation of time estimates for both projects.
● A higher standard deviation suggests greater uncertainty in project completion time,
potentially increasing risk.

4. Calculate Cost-Risk

● Create a new variable named "Cost Risk" by multiplying the "Cost Estimate" by a
risk factor.
● Choose a risk factor based on your project context (e.g., 1.1 for moderate risk, 1.2 for
high risk).
● For example, if using a risk factor of 1.1, create a new computed variable named "Cost Risk" with the following formula: Cost Risk = Cost Estimate * 1.1

5. Calculate Schedule Risk

● Create a new variable named "Schedule Risk" by multiplying the "Time Estimate" by the chosen risk factor.
● For example, using the same risk factor of 1.1, create a new computed variable named "Schedule Risk" with the following formula: Schedule Risk = Time Estimate * 1.1

6. Compare Project Risks

● Calculate the total risk score for each project by summing the "Cost Risk" and
"Schedule Risk" values.
● Compare the total risk scores of Project A and Project B.
● The project with the higher total risk score is considered riskier.

Data:

Project     Cost_Estimate  Time_Estimate  Cost_Risk  Schedule_Risk  Total_Risk_Score
Project A   100000         12             110000     13.2           110013.2
Project A   120000         14             132000     15.4           132015.4
Project A   110000         13             121000     14.3           121014.3
Project B   80000          10             88000      11             88011
Project B   90000          11             99000      12.1           99012.1
Project B   100000         12             110000     13.2           110013.2
Output:

Conclusion:
Project A is the riskier project because of its higher total risk score. The standard deviation of the cost and time estimates for Project A also indicates a higher risk assessment.
Program-10

Aim: Write a program to draw a scatter diagram, residual plots, outliers, leverage and influential data points in R.

Theory:
Data Import: Import the dataset containing variables of interest into R. Ensure that the dataset is
properly formatted and contains the necessary variables for analysis.
Scatter Diagram: Create a scatter plot to visualize the relationship between two variables of interest. Use
the plot() function in R to generate the scatter plot.
# Example scatter plot between variables X and Y:
plot(X, Y, main = "Scatter Plot of X vs. Y", xlab = "X", ylab = "Y")
Linear Regression Model: Fit a linear regression model to the data using the lm() function in R. This will
be used to generate the predicted values and residuals.
# Fit linear regression model:
model <- lm(Y ~ X, data = dataset)
Residual Plots: Create residual plots to assess the goodness-of-fit of the linear regression model and identify patterns in the residuals. Use the plot() function with the argument which = c(1, 2, 3) to generate multiple plots.
# Residual plots:
par(mfrow = c(2, 2))
plot(model, which = c(1, 2, 3))
Identifying Outliers: Identify outliers by examining the residual plot for extreme values that do not
follow the general pattern. Outliers are points with large residuals compared to the majority of the data
points.
Leverage Points: Calculate leverage statistics to identify points with high leverage. Leverage points have extreme predictor values that can significantly influence the regression model.
# Leverage statistics
leverage <- hatvalues(model)
Influential Data Points: Identify influential data points using measures such as Cook's distance or DFFITS. Influential points have a large impact on the regression coefficients or predictions.
# Cook's distance
cooks_distance <- cooks.distance(model)
Visualizing Outliers, Leverage, and Influential Points: Overlay the identified outliers, leverage points,
and influential points on the scatter plot for visualization.

R Script:
# Load Required Libraries
library(ggplot2)

# Generate Sample Data
set.seed(123)
x <- rnorm(100)
y <- 2*x + rnorm(100)
data <- data.frame(x, y)

# Scatter Diagram
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  labs(title = "Scatter Diagram", x = "X", y = "Y")

# Fit linear model
model <- lm(y ~ x, data = data)

# Residual Plot
plot(model, which = 1, main = "Residual Plot")

# Identify Outliers
residuals_sd <- sd(model$residuals)
outliers <- which(abs(model$residuals) > 2 * residuals_sd)

# Leverage Points
leverage <- hatvalues(model)
leverage_points <- which(leverage > (2 * (ncol(data) + 1) / nrow(data)))

# Influence Index
cooksd <- cooks.distance(model)
influential_points <- which(cooksd > 4 / nrow(data)) # Adjust threshold as needed
Output:
Conclusion: The scatter diagram, residual plots, outliers, leverage and influential data points have been plotted and verified in R.
