Lab 8
1. The first 6 rows of the diamonds data set and its structure are given below. Using this
data set, do the following tasks with the ggplot2 package:
diamond
Output
To analyze only the diamonds priced above 6000, the subset function was used to
filter the dataset on the price condition. A histogram of carat was then drawn from
the filtered data. It shows how carat weight is distributed among these
higher-priced diamonds.
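The filtering and histogram steps described above can be sketched as follows. This is a minimal sketch, not necessarily the code used in the lab: the names expensive and p_hist and the bin width of 0.1 are assumptions chosen for illustration.

```r
library(ggplot2)

# Keep only the diamonds priced above 6000 (threshold taken from the text)
expensive <- subset(diamonds, price > 6000)

# Histogram of carat for the filtered rows
# binwidth = 0.1 is an arbitrary choice for this sketch
p_hist <- ggplot(expensive, aes(x = carat)) +
  geom_histogram(binwidth = 0.1)
print(p_hist)
```

Changing binwidth changes how fine-grained the histogram looks, so it is worth trying a few values before interpreting the shape.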
d. Study the relationship between the diamond’s weight (carat) and its price (price).
Code
ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
Output
A bar plot was used to show the distribution of diamond cuts broken down by color.
Each cut category is drawn as one bar, and the segments within a bar show how many
diamonds of each color fall in that cut. The plot makes it easy to see which
combinations of cut and color are common in the dataset and which are rare.
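A bar plot of this kind could be produced as sketched below. The code for this plot is not shown in the report, so this is an assumed reconstruction: mapping color to fill gives one bar per cut with one segment per color. The names cut_color_counts and p_bar are illustrative.

```r
library(ggplot2)

# Counts behind the plot: rows are cuts, columns are colors
cut_color_counts <- table(diamonds$cut, diamonds$color)
print(cut_color_counts)

# One bar per cut, stacked segments filled by color
p_bar <- ggplot(diamonds, aes(x = cut, fill = color)) +
  geom_bar()
print(p_bar)
```

The table gives the exact counts that the segment heights represent, which helps when two segments look similar in size.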
f. Study the relationship between the quality of the cut (cut) and the price (price).
Code
ggplot(diamonds, aes(x = cut, y = price)) + geom_boxplot()
Output
To examine how price is distributed across the diamond cuts, a box plot was
created. For each cut category, the box spans the first to the third quartile with a
line at the median. The whiskers extend to the most extreme values within 1.5 times
the interquartile range of the box, and points beyond the whiskers are drawn
individually as outliers. This makes it easy to compare the typical price and its
spread between cuts.
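The quartiles summarized by the box plot can also be computed numerically. The sketch below uses base R split and quantile; the name price_quartiles is illustrative, and the values may differ slightly from the hinges that ggplot2 draws because the two use different quantile conventions.

```r
library(ggplot2)  # loaded only for the diamonds data set

# Split price by cut, then compute the 0%, 25%, 50%, 75%, 100% quantiles
price_by_cut <- split(diamonds$price, diamonds$cut)
price_quartiles <- t(sapply(price_by_cut, quantile))

# One row per cut, one column per quantile
print(price_quartiles)
```

Reading the 50% column next to the plot shows which cuts have the higher median price.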
2. Create a new vector with the following data: 1,2,3,4,NA,6,7,8,NA,NA. NA means ‘Not
Available’ / Missing Values. Use min, max, and mean functions to get the minimum,
maximum, and average, respectively for this vector. Try using the argument
na.rm=TRUE with these three functions and re-print the results.
Code
vec <- c(1, 2, 3, 4, NA, 6, 7, 8, NA, NA)
# Without removing NA: each result is NA
cat("Min:", min(vec), "\n")
cat("Max:", max(vec), "\n")
cat("Mean:", mean(vec), "\n")
# With NA removed
cat("Min:", min(vec, na.rm = TRUE), "\n")
cat("Max:", max(vec, na.rm = TRUE), "\n")
cat("Mean:", mean(vec, na.rm = TRUE), "\n")
Output
Given a vector named vec, which includes numeric values as well as missing values represented
as NA, calculations were performed on this vector as follows:
a. First, the minimum, maximum, and mean were computed with the min, max, and mean functions
without removing the NA values. Because the vector contains missing values, all three functions
return NA: the result is unknown when some of the inputs are unknown.
b. Next, the same calculations were repeated with the na.rm = TRUE argument, which drops the NA
values before computing. Using only the seven available values, the minimum is 1, the maximum
is 8, and the mean is 31/7, approximately 4.43.
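The effect of na.rm = TRUE can be checked by hand. The sketch below counts the missing values and recomputes the mean from the non-missing ones; the names n_missing, observed, mean_manual, and mean_narm are illustrative.

```r
vec <- c(1, 2, 3, 4, NA, 6, 7, 8, NA, NA)

n_missing <- sum(is.na(vec))   # number of NA entries
observed  <- vec[!is.na(vec)]  # the values that na.rm = TRUE keeps

# Mean computed manually from the non-missing values
mean_manual <- sum(observed) / length(observed)
# Mean computed by letting mean() drop the NA values
mean_narm <- mean(vec, na.rm = TRUE)

cat("Missing values:", n_missing, "\n")
cat("Manual mean:", mean_manual, "\n")
cat("mean(na.rm = TRUE):", mean_narm, "\n")
```

Both means agree, which shows that na.rm = TRUE is equivalent to filtering out the NA entries before calling the function.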
Conclusion
In summary, the diamonds dataset was visualized with several ggplot2 plots: a histogram of carat
for higher-priced diamonds, a scatter plot of carat against price, a bar plot of cut by color, and
a box plot of price by cut. In addition, the minimum, maximum, and mean of a vector containing
NA values were computed with and without na.rm = TRUE, showing that missing values must be
removed before these summaries return a number.