New Paper

Uploaded by

221980068

# Load necessary libraries

library(ggplot2)
library(dplyr)

# Load the iris dataset


data(iris)

# Calculate z-scores for Sepal.Length and Sepal.Width


iris_z <- iris %>%
  mutate(
    z_Sepal.Length = (Sepal.Length - mean(Sepal.Length)) / sd(Sepal.Length),
    z_Sepal.Width = (Sepal.Width - mean(Sepal.Width)) / sd(Sepal.Width)
  )

# Define a threshold for outliers


threshold <- 2 # |z| > 2 is a common cutoff; raise it (e.g., to 3) to be stricter

# Identify outliers
iris_z <- iris_z %>%
  mutate(
    is_outlier = abs(z_Sepal.Length) > threshold | abs(z_Sepal.Width) > threshold
  )

# Create a combined plot


ggplot(iris_z, aes(x = Sepal.Length, y = Sepal.Width, color = is_outlier)) +
  geom_point(size = 3) +
  # Name the colors so red always maps to TRUE (outlier), black to FALSE
  scale_color_manual(
    values = c("FALSE" = "black", "TRUE" = "red"),
    labels = c("Not Outlier", "Outlier")
  ) +
  labs(
    title = "Sepal Length vs Sepal Width with Outliers Highlighted",
    x = "Sepal Length",
    y = "Sepal Width",
    color = "Outlier Status"
  ) +
  theme_minimal()

Explanation of the Code:


1. Load Libraries: ggplot2 is for plotting, and dplyr is for data manipulation.
2. Calculate Z-Scores: Compute z-scores for Sepal Length and Sepal Width using
the formula z = (x - mean(x)) / sd(x).
3. Define Outliers: Use a z-score threshold (e.g., 2) to flag outliers. A point
is treated as an outlier if the absolute value of its z-score exceeds the
threshold for either feature.
4. Plot Data: Use ggplot2 to create a scatter plot where points are colored
based on whether they are outliers.
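
As a quick sanity check on step 2, the manual z-score formula can be compared against base R's scale(), which centers a column and divides by its sample standard deviation by default. This is only a minimal sketch; the names z_manual and z_scale are illustrative and do not appear in the script above.

```r
# Sketch: confirm the manual z-score matches base R's scale()
data(iris)

# Manual z-score, same formula as in the script
z_manual <- (iris$Sepal.Length - mean(iris$Sepal.Length)) / sd(iris$Sepal.Length)

# scale() returns a one-column matrix; flatten it to a plain numeric vector
z_scale <- as.numeric(scale(iris$Sepal.Length))

# TRUE when the two agree up to floating-point tolerance
print(all.equal(z_manual, z_scale))
```

If the two agree, scale() could be used inside mutate() in place of the hand-written formula, though the explicit version makes the calculation easier to follow.
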
The plot will show Sepal Length vs. Sepal Width, with outliers highlighted in red
and non-outliers in black. You can adjust the threshold if needed to be more or
less strict about what constitutes an outlier.
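
To see how sensitive the result is to the threshold, one option is to count the flagged points at a few cutoffs. The sketch below assumes the same two-feature rule as the script; z_score and count_outliers are hypothetical helper names introduced here for illustration, and the cutoffs 2, 2.5, and 3 are arbitrary examples.

```r
# Sketch: count flagged points at several thresholds (helpers are illustrative)
data(iris)

# Hypothetical helper: z-score of a numeric vector
z_score <- function(x) (x - mean(x)) / sd(x)

z_len <- z_score(iris$Sepal.Length)
z_wid <- z_score(iris$Sepal.Width)

# Hypothetical helper: number of rows whose |z| exceeds the threshold in either feature
count_outliers <- function(threshold) {
  sum(abs(z_len) > threshold | abs(z_wid) > threshold)
}

thresholds <- c(2, 2.5, 3)
counts <- sapply(thresholds, count_outliers)

# One row per threshold with the number of flagged points
print(data.frame(threshold = thresholds, n_outliers = counts))
```

Raising the threshold can only keep or reduce the number of flagged points, so a table like this helps in choosing a cutoff that fits the analysis.
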
