ggplot(iris_z, aes(x = Sepal.Length, y = Sepal.Width, color = is_outlier)) +
  geom_point(size = 3) +
  scale_color_manual(
    values = c("black", "red"),
    labels = c("Not Outlier", "Outlier")
  ) +
  labs(
    title = "Sepal Length vs Sepal Width with Outliers Highlighted",
    x = "Sepal Length",
    y = "Sepal Width",
    color = "Outlier Status"
  ) +
  theme_minimal()
Explanation of the Code:
1. Load Libraries: ggplot2 is for plotting, and dplyr is for data manipulation.
2. Calculate Z-Scores: Compute z-scores for Sepal Length and Sepal Width using the formula $z = (x - \text{mean}) / \text{sd}$.
3. Define Outliers: Use a z-score threshold (e.g., 2) to determine outliers. If the absolute value of the z-score exceeds the threshold in either feature, the point is considered an outlier.
4. Plot Data: Use ggplot2 to create a scatter plot where points are colored based on whether they are outliers.

The plot shows Sepal Length vs. Sepal Width, with outliers highlighted in red and non-outliers in black. You can raise the threshold to flag fewer points or lower it to flag more.
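As a rough illustration of steps 2 and 3, here is a minimal base R sketch of the z-score and outlier-flagging logic. It is not necessarily the exact code used earlier: the helper `z_score`, the columns `z_length` and `z_width`, and the variable `threshold` are names assumed for this example, and base R is used in place of dplyr so the snippet runs without extra packages.

```r
# Hypothetical helper (assumed name): z-score of a numeric vector.
z_score <- function(x) (x - mean(x)) / sd(x)

# Assumed cutoff, matching the example value mentioned in the text.
threshold <- 2

# Work on a copy of the built-in iris data set.
iris_z <- iris
iris_z$z_length <- z_score(iris_z$Sepal.Length)
iris_z$z_width  <- z_score(iris_z$Sepal.Width)

# Flag a row when either feature lies more than `threshold`
# standard deviations from its mean, in either direction.
iris_z$is_outlier <- abs(iris_z$z_length) > threshold |
  abs(iris_z$z_width) > threshold

# Count flagged vs. unflagged rows.
table(iris_z$is_outlier)
```

Changing `threshold` to 3 and rerunning the snippet gives a stricter definition of an outlier. Because `is_outlier` is a logical column, ggplot2 treats it as a two-level discrete variable, which is what lets the two manual colors map to FALSE and TRUE in that order.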