
Exp-6 SDMA

The document outlines basic statistical concepts and visualization techniques using RStudio, including mean, median, range, variance, box plots, scatter plots, and histograms. It provides syntax and parameters for creating these visualizations in R, along with example code for analyzing the 'mtcars' dataset. Additionally, it includes a viva-voce section addressing common questions about histograms, scatter plots, and bar charts.

Uploaded by

shivroy282

Experiment-6

Aim: To find basic statistics and visualization of a given data set in R


Software Used: RStudio
Theory:
Mean

The arithmetic mean of a variable, often referred to as the average, is calculated by
summing all the values and then dividing the total by the count of values.
Population Mean ($\mu$): $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
Sample Mean ($\bar{x}$): $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
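As a small illustration of the definition above, the mean can be computed by hand and compared with R's built-in mean(). This is only a sketch; the choice of the mpg column from the built-in mtcars dataset and the variable names are assumptions made for the example.

```r
# Sample values: miles per gallon from the built-in mtcars dataset
x <- mtcars$mpg

# Mean from the definition: sum of the values divided by their count
manual_mean <- sum(x) / length(x)

# Built-in equivalent
builtin_mean <- mean(x)

cat("Manual mean:  ", manual_mean, "\n")
cat("Built-in mean:", builtin_mean, "\n")
```

Both lines should print the same number, since mean() implements the same sum-and-divide rule.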

Median

The median of a variable is determined by identifying the middle value within a dataset
when the data are arranged in ascending order. It effectively divides the data into two
equal halves, with 50% of the data points falling below the median and the remaining
50% above it.
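The "sort, then take the middle" rule can be sketched in R as follows. The variable names are illustrative assumptions; for an even number of values the two middle values are averaged, which is also what median() does.

```r
# Sort the values in ascending order
x <- mtcars$mpg
sorted_x <- sort(x)
n <- length(sorted_x)

# Odd count: take the single middle value.
# Even count: average the two middle values.
if (n %% 2 == 1) {
  manual_median <- sorted_x[(n + 1) / 2]
} else {
  manual_median <- (sorted_x[n / 2] + sorted_x[n / 2 + 1]) / 2
}

cat("Manual median:  ", manual_median, "\n")
cat("Built-in median:", median(x), "\n")
```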

Range

The range of a variable is determined by subtracting the smallest value from the largest
value within a quantitative dataset, making it the most basic measure that relies solely on
these two extreme values.
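One point worth noting when computing this in R: the built-in range() function returns the two extreme values rather than their difference. A minimal sketch, with variable names chosen only for illustration:

```r
x <- mtcars$mpg

# range() returns c(minimum, maximum), not a single number
extremes <- range(x)

# The statistical range is the difference between the two extremes
range_width <- max(x) - min(x)

cat("Extremes:", extremes, "\n")
cat("Range:   ", range_width, "\n")
```

The same value can be obtained with diff(range(x)).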

Variance

Variance measures spread as the average of the squared differences between each value and
the arithmetic mean. Squaring ensures that positive and negative deviations do not cancel out.
The sample variance ($s^2$) divides the sum of squared deviations by (n-1), its degrees of
freedom, which makes it an unbiased estimator of the population variance ($\sigma^2$).
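A short sketch of the sample variance computed from its definition, compared with R's var(), which also uses the (n-1) denominator. The variable names are assumptions made for the example.

```r
x <- mtcars$mpg
n <- length(x)

# Squared deviations from the mean, summed and divided by (n - 1)
manual_var <- sum((x - mean(x))^2) / (n - 1)

# The standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)

cat("Manual variance:  ", manual_var, "\n")
cat("Built-in variance:", var(x), "\n")
cat("Manual std. dev.: ", manual_sd, "\n")
```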

Box Plot

A box plot is a chart that displays the distribution of one or more groups of data by
drawing a box-and-whisker figure for each of them. It summarizes each distribution with five
numbers (minimum, first quartile, median, third quartile, and maximum).
Boxplots in R Programming Language
Boxplots are created in R by using the boxplot() function.
Syntax: boxplot(x, data, notch, varwidth, names, main)
Parameters:
x: This parameter is a vector or a formula.
data: This parameter sets the data frame.
notch: This parameter is a logical value. Set as TRUE to draw a notch on each side of the box.
varwidth: This parameter is a logical value. Set as TRUE to draw the width of the
box proportional to the square root of the sample size.
main: This parameter is the title of the chart.
names: This parameter sets the group labels that will be shown under each
boxplot.
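The parameters above can be tried out with a grouped boxplot. The sketch below is one possible usage, not the experiment's own code: the formula mpg ~ cyl, the group labels, and the axis titles are assumptions chosen for illustration. It also prints the five-number summary with fivenum(), which uses Tukey's hinges and so may differ slightly from the quartiles reported by quantile().

```r
# Five-number summary of mpg (minimum, lower hinge, median, upper hinge, maximum)
five_numbers <- fivenum(mtcars$mpg)
names(five_numbers) <- c("min", "Q1", "median", "Q3", "max")
print(five_numbers)

# One box per cylinder count; box widths reflect the group sizes
boxplot(mpg ~ cyl, data = mtcars,
        varwidth = TRUE,
        notch = FALSE,
        names = c("4 cyl", "6 cyl", "8 cyl"),  # hypothetical labels
        main = "mpg by number of cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")
```

Note that the whiskers drawn by boxplot() stop at the most extreme points within 1.5 times the box length, so they reach the minimum and maximum only when there are no outliers.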

Scatter Plot

A scatter plot is a set of points representing individual data items plotted on the
horizontal and vertical axes. In a graph in which the values of two variables are plotted
along the X-axis and Y-axis, the pattern of the resulting points reveals any correlation
between them.

R- Scatter plots

We can create a scatter plot in R Programming Language using the plot() function.
Syntax: plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Parameters:
x: This parameter sets the horizontal coordinates.
y: This parameter sets the vertical coordinates.
xlab: This parameter is the label for horizontal axis.
ylab: This parameter is the label for vertical axis.
main: This parameter is the title of the chart.
xlim: This parameter sets the limits (minimum and maximum) of the x-axis.
ylim: This parameter sets the limits (minimum and maximum) of the y-axis.
axes: This parameter is a logical value indicating whether both axes should be drawn on the
plot.
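As a sketch of these parameters in use, the example below plots mpg against car weight and reports the Pearson correlation with cor(). The wt column of mtcars is not used elsewhere in this experiment; it and the axis limits are assumptions picked so that the relationship between two continuous variables is visible.

```r
# Scatter plot of fuel economy against weight
plot(mtcars$wt, mtcars$mpg,
     main = "mpg against weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     xlim = c(1, 6), ylim = c(10, 35),
     pch = 19, col = "blue")

# Pearson correlation coefficient between the two variables
r <- cor(mtcars$wt, mtcars$mpg)
cat("Pearson correlation:", round(r, 3), "\n")
```

The points slope downward and the coefficient is negative, meaning heavier cars in this dataset tend to have lower mpg.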

Histogram
A histogram is a graphical representation that groups data points into successive
numerical intervals (bins). Each bin is drawn as a rectangle whose width is the interval and
whose area is proportional to the frequency of the variable in that interval. Unlike a
vertical bar graph, which it otherwise resembles, a histogram shows no gaps between the
bars.

R- Histograms
We can create histograms in R Programming Language using the hist()
function.
Syntax: hist(v, main, xlab, xlim, ylim, breaks, col, border)
Parameters:
v: This parameter contains numerical values used in histogram.
main: This parameter is the title of the chart.
col: This parameter is used to set color of the bars.
xlab: This parameter is the label for horizontal axis.
border: This parameter is used to set border color of each bar.
xlim: This parameter sets the limits (minimum and maximum) of the x-axis.
ylim: This parameter sets the limits (minimum and maximum) of the y-axis.
breaks: This parameter controls the bins: either a suggested number of bins or a vector of
breakpoints between them.
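To see how breaks determines the bins, the sketch below passes explicit breakpoints and then inspects the object that hist() returns. The breakpoints seq(10, 35, by = 5) and the colors are assumptions chosen to cover the mpg values in mtcars.

```r
# Histogram with explicit bin edges at 10, 15, ..., 35
h <- hist(mtcars$mpg,
          breaks = seq(10, 35, by = 5),
          main = "Histogram of mpg",
          xlab = "Miles per gallon",
          col = "lightblue",
          border = "white")

# hist() also returns the bin edges and the count in each bin
print(h$breaks)
print(h$counts)
```

When breaks is a single number, hist() treats it only as a suggestion and may choose a nearby number of bins with rounder edges.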

Code

# Load the dataset (mtcars ships with base R) and keep two columns
data <- mtcars[, c("mpg", "cyl")]

# Display the first few rows of the dataset
print("First Few Rows of the Dataset:")
print(head(data))

# Summary of the dataset
print("Summary Statistics of the Dataset:")
print(summary(data))

# Structure of the dataset
print("Structure of the Dataset:")
str(data)

# Choose a numeric column
column_data <- data$mpg

# Basic statistics (na.rm = TRUE skips missing values)
mean_value <- mean(column_data, na.rm = TRUE)
median_value <- median(column_data, na.rm = TRUE)
variance <- var(column_data, na.rm = TRUE)
std_dev <- sd(column_data, na.rm = TRUE)
min_value <- min(column_data, na.rm = TRUE)
max_value <- max(column_data, na.rm = TRUE)
quantiles <- quantile(column_data, na.rm = TRUE)

# Print statistics
cat("Mean:", mean_value, "\n")
cat("Median:", median_value, "\n")
cat("Variance:", variance, "\n")
cat("Standard Deviation:", std_dev, "\n")
cat("Minimum:", min_value, "\n")
cat("Maximum:", max_value, "\n")
cat("Quantiles:\n")
print(quantiles)

# Histogram
hist(column_data,
     breaks = 10,
     col = "lightblue",
     main = "Histogram",
     xlab = "mpg")

# Boxplot
boxplot(column_data,
        main = "Boxplot",
        col = "orange",
        horizontal = TRUE)

# Scatterplot of the two numeric columns
plot(data$mpg, data$cyl,
     main = "Scatterplot",
     xlab = "mpg",
     ylab = "cyl",
     col = "blue",
     pch = 19)

Output

Figure: Histogram of mpg (x-axis: mpg)
Figure: Boxplot of mpg (horizontal)
Figure: Scatterplot of cyl (y-axis) against mpg (x-axis)

Viva- Voce

Q1. What is a histogram, and how is it different from a bar chart?
A histogram is a graphical representation of the distribution of a continuous variable. It
groups the data into bins (intervals) and shows the frequency of data points in each bin.
A bar chart, on the other hand, represents categorical data and displays frequencies or
values for distinct categories.
Key Difference: Histograms use bins for continuous data, while bar charts use distinct
categories with gaps between bars.
Q2. What can you infer from the pattern of points in a scatter plot?
• Positive Correlation: Points slope upward, indicating that as one variable
increases, the other also increases.
• Negative Correlation: Points slope downward, indicating that as one variable
increases, the other decreases.
• No Correlation: Points are scattered randomly, showing no relationship.
• Clusters or Outliers: Specific groupings or isolated points may indicate data
subgroups or anomalies.
Q3. What is a bar chart, and what type of data does it represent?
A bar chart represents categorical data, where each bar corresponds to a category, and
the bar's height represents the frequency or value for that category.
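For comparison with the histogram in the code section, a bar chart of a categorical variable might be drawn as sketched below. Treating the cyl column of mtcars as the categorical variable, and the titles and color, are assumptions made for this illustration.

```r
# Count how many cars fall into each cylinder category
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# One bar per category, with gaps between the bars
barplot(cyl_counts,
        main = "Cars by number of cylinders",
        xlab = "Cylinders", ylab = "Frequency",
        col = "orange")
```

Here the x-axis holds distinct categories (4, 6 and 8 cylinders) rather than numerical intervals, which is why the bars are drawn apart.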
