SSMDA

The document outlines basic statistical concepts and visualizations using R, including mean, median, variance, box plots, scatter plots, and histograms. It also covers classical probability, its properties, advantages, limitations, and real-world applications, along with R code examples for implementing these concepts. Additionally, it includes a viva-voce section with questions related to probability theory and its applications.


Experiment-6

Aim: To find basic statistics and visualization of a given data set in R.


Software Used: RStudio
Theory:
Mean

The arithmetic mean of a variable, often referred to as the average, is calculated by summing up all the values and then dividing the total by the count of values.
Population Mean (μ): μ = (Σ xi) / N
Sample Mean (x̄): x̄ = (Σ xi) / n

Median

The median of a variable is determined by identifying the middle value within a dataset
when the data are arranged in ascending order. It effectively divides the data into two
equal halves, with 50% of the data points falling below the median and the remaining
50% above it.
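As a small illustration in R (the example values here are assumed, not taken from the experiment's dataset), median() returns the middle value for an odd count of values and the mean of the two middle values for an even count:

```r
# Odd number of values: the middle value after sorting
v_odd <- c(7, 1, 5)            # sorted: 1 5 7
median(v_odd)                  # 5

# Even number of values: mean of the two middle values
v_even <- c(7, 1, 5, 3)        # sorted: 1 3 5 7 -> (3 + 5) / 2
median(v_even)                 # 4
```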

Range

The range of a variable is determined by subtracting the smallest value from the largest
value within a quantitative dataset, making it the most basic measure that relies solely on
these two extreme values.

Variance

Variance involves the computation of the squared differences between each value and the arithmetic mean. This approach accommodates both positive and negative deviations. The sample variance (s²) serves as an unbiased estimator of the population variance (σ²), with (n − 1) degrees of freedom.
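The (n − 1) divisor can be checked directly; this small sketch (with assumed example values) compares a manual computation against R's built-in var():

```r
# Sample variance computed by hand with (n - 1) degrees of freedom
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)   # 32 / 7
# R's var() uses the same unbiased (n - 1) formula
stopifnot(isTRUE(all.equal(manual_var, var(x))))
print(manual_var)
```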

Box Plot

A box plot is a chart used to display information in the form of a distribution by drawing a box for each group. This distribution of data is based on five values (minimum, first quartile, median, third quartile, and maximum).
Boxplots in R Programming Language
Boxplots are created in R by using the boxplot() function.
Syntax: boxplot(x, data, notch, varwidth, names, main)
Parameters:
x: This parameter is set as a vector or a formula.
data: This parameter sets the data frame.
notch: This parameter is a logical value. Set as TRUE to draw a notch in each box.
varwidth: This parameter is a logical value. Set as TRUE to draw the width of each box proportionate to the sample size.
main: This parameter is the title of the chart.
names: This parameter gives the group labels that will be shown under each boxplot.

Scatter Plot

A scatter plot is a set of dotted points representing individual data pieces plotted on the horizontal and vertical axes. In a graph in which the values of two variables are plotted along the X-axis and Y-axis, the pattern of the resulting points reveals the correlation between them.

R- Scatter plots

We can create a scatter plot in R Programming Language using the plot() function.
Syntax: plot(x, y, main, xlab, ylab, xlim, ylim, axes)
Parameters:
x: This parameter sets the horizontal coordinates.
y: This parameter sets the vertical coordinates.
xlab: This parameter is the label for the horizontal axis.
ylab: This parameter is the label for the vertical axis.
main: This parameter is the title of the chart.
xlim: This parameter sets the range of values plotted on the x-axis.
ylim: This parameter sets the range of values plotted on the y-axis.
axes: This parameter indicates whether both axes should be drawn on the plot.

Histogram
A histogram uses rectangular bars to display statistical information: the height of each bar is proportional to the frequency of a variable within successive numerical intervals. It is a graphical representation that organises a group of data points into different specified ranges. A special feature is that it shows no gaps between the bars, which makes it similar to a vertical bar graph.

R- Histograms
We can create histograms in R Programming Language using the hist() function.
Syntax: hist(v, main, xlab, xlim, ylim, breaks, col, border)
Parameters:
v: This parameter contains the numerical values used in the histogram.
main: This parameter is the title of the chart.
col: This parameter is used to set the color of the bars.
xlab: This parameter is the label for the horizontal axis.
border: This parameter is used to set the border color of each bar.
xlim: This parameter sets the range of values plotted on the x-axis.
ylim: This parameter sets the range of values plotted on the y-axis.
breaks: This parameter sets the number of bars (bins) or their break points.

Code

# Load the dataset
data <- mtcars[, c("mpg", "cyl")]

# Display the first few rows of the dataset
print("First Few Rows of the Dataset:")
head(data)

# Summary of the dataset
print("Summary Statistics of the Dataset:")
summary(data)

# Structure of the dataset
print("Structure of the Dataset:")
str(data)

# Choose a numeric column
column_name <- "mpg"
column_data <- data$mpg

# Basic statistics
mean_value <- mean(column_data, na.rm = TRUE)
median_value <- median(column_data, na.rm = TRUE)
variance <- var(column_data, na.rm = TRUE)
std_dev <- sd(column_data, na.rm = TRUE)
min_value <- min(column_data, na.rm = TRUE)
max_value <- max(column_data, na.rm = TRUE)
quantiles <- quantile(column_data, na.rm = TRUE)

# Print statistics
cat("Mean:", mean_value, "\n")
cat("Median:", median_value, "\n")
cat("Variance:", variance, "\n")
cat("Standard Deviation:", std_dev, "\n")
cat("Minimum:", min_value, "\n")
cat("Maximum:", max_value, "\n")
cat("Quantiles:\n")
print(quantiles)

# Histogram
hist(column_data,
     breaks = 10,
     col = "lightblue",
     main = "Histogram",
     xlab = column_name)

# Boxplot
boxplot(column_data,
        main = "Boxplot",
        col = "orange",
        horizontal = TRUE)

# Scatterplot (the dataset has two numeric columns)
plot(data$mpg, data$cyl,
     main = "Scatterplot",
     xlab = "mpg",
     ylab = "cyl",
     col = "blue",
     pch = 19)

Output

[Plots produced: Histogram of mpg, horizontal Boxplot of mpg, and Scatterplot of mpg vs cyl]

Viva- Voce

Q1. What is a histogram, and how is it different from a bar chart?
A histogram is a graphical representation of the distribution of a continuous variable. It groups the data into bins (intervals) and shows the frequency of data points in each bin. A bar chart, on the other hand, represents categorical data and displays frequencies or values for distinct categories.
Key Difference: Histograms use bins for continuous data, while bar charts use distinct categories with gaps between bars.
Q2. What can you infer from the pattern of points in a scatter plot?
Positive Correlation: Points slope upward, indicating that as one variable increases, the other also increases.
Negative Correlation: Points slope downward, indicating that as one variable increases, the other decreases.
No Correlation: Points are scattered randomly, showing no relationship.
Clusters or Outliers: Specific groupings or isolated points may indicate data subgroups or anomalies.
Q3. What is a bar chart, and what type of data does it represent?
A bar chart represents categorical data, where each bar corresponds to a category, and
the bar's height represents the frequency or value for that category.
Experiment-7
Aim: To implement concepts of probability and distributions in R.

Software Used: R

Theory:

Classical Probability
Classical probability, often referred to as "a priori" probability, is a branch of probability theory that deals with situations where all possible outcomes are equally likely. It provides a foundational understanding of how probability works and forms the basis for more advanced probability concepts.
Mathematical Foundations
Sample Space: The sample space represents the set of all possible outcomes in a given experiment. It serves as the foundation for calculating probabilities. For instance, when rolling a fair six-sided die, the sample space is {1, 2, 3, 4, 5, 6}.
Events: An event is a subset of the sample space, representing a specific outcome or set of outcomes. Events can range from simple, such as rolling an even number, to complex, like drawing a red card from a deck.
Probability Distribution: A probability distribution assigns probabilities to each event in the sample space. For classical probability, all outcomes are equally likely, so each event has the same probability.
Calculating Classical Probability
Classical probability is based on the principle of equally likely outcomes. Consider an
experiment with a finite sample space S, consisting of n equally likely outcomes. Let A
be an event of interest within S.
The classical probability of event A, denoted as P(A), is calculated as:

P(A) = (Number of favourable outcomes for event A) / (Total number of equally likely outcomes in S)

Mathematically, this can be expressed as:

P(A) = n(A) / n(S)

Where:
P(A) is the probability of event A.
n(A) is the number of favourable outcomes for event A.
n(S) is the total number of equally likely outcomes in the sample space S.
This formula allows us to calculate the probability of an event by counting the favourable outcomes and dividing by the total number of equally likely outcomes. In R, you can use this formula to calculate classical probabilities for various events, making it a fundamental concept in probability theory for data analysis and statistics.
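As a minimal sketch of the formula in R (the event "roll an even number" is an assumed example):

```r
# P(A) = n(A) / n(S) for rolling an even number on a fair die
S <- 1:6                        # sample space
A <- S[S %% 2 == 0]             # favourable outcomes: 2, 4, 6
p_A <- length(A) / length(S)    # 3 / 6
print(p_A)                      # 0.5
```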
Properties of Classical Probability
Complementary Probability - The probability of an event not occurring is known as the complementary probability. It can be calculated as: 1 − P(E).
Mutually Exclusive Events - Events are mutually exclusive if they cannot
occur simultaneously. For example, rolling a die and getting both a 2 and a 4
in a single roll is impossible.
Independent Events - Events are considered independent if the outcome
of one event does not affect the outcome of another. For instance, tossing a
coin does not influence the roll of a die.
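These properties can be sketched numerically in R (the coin and die probabilities below are the standard assumed values):

```r
# Complementary probability: P(not E) = 1 - P(E)
p_four     <- 1 / 6             # P(rolling a 4)
p_not_four <- 1 - p_four        # 5/6

# Independent events: the joint probability is the product
p_heads <- 1 / 2                # coin toss
p_six   <- 1 / 6                # die roll
p_both  <- p_heads * p_six      # 1/12, since neither event affects the other

# Mutually exclusive events: P(rolling both 2 and 4 in one roll) is simply 0
print(c(p_not_four, p_both))
```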

Advantages and Limitations of Classical Probability

Advantages:
Simplicity: Classical probability offers an easy-to-understand framework for modelling and analysing random events, making it approachable for novices and the basis for more complex probability ideas.
Theoretical Foundation: It provides the foundation for more intricate
probability theories, allowing for a thorough comprehension of probability
concepts.
Fairness: Classical probability is unbiased and simple to use in circumstances with well-defined sample spaces because it assumes that each outcome is equally likely.
Limitations:
Limited Applicability: When dealing with continuous or complicated data, or when events are not all equally likely, classical probability may not correctly reflect real-world scenarios.
Limited Complexity: It may not be able to handle complex probabilistic
issues, necessitating the use of more sophisticated models like Bayesian
probability for in-depth investigations.
Discreteness: Due to the inherent discreteness of classical probability,
continuous probability distributions may not match it in some real-world
situations.

Real-world Applications
Weather Forecasting: Classical probability is used in weather forecasting
to estimate the likelihood of various weather conditions based on historical
data.
Quality Control: In manufacturing, classical probability is applied to assess the probability of defects in a production process, aiding in quality control.
Code 1:

# Rolling a fair six-sided die
die <- 1:6
probabilities <- rep(1/6, 6)  # Each face has equal probability

# Probability of rolling a 4
prob_4 <- probabilities[die == 4]
print(paste("Probability of rolling a 4:", prob_4))

# Simulating 10 rolls of the die
rolls <- sample(die, size = 10, replace = TRUE, prob = probabilities)
print("Simulated rolls:")
print(rolls)

# Uniform distribution between 0 and 1
x <- seq(0, 1, by = 0.01)
# PDF
pdf <- dunif(x, min = 0, max = 1)
# CDF
cdf <- punif(x, min = 0, max = 1)
# Random numbers
random_values <- runif(10, min = 0, max = 1)

# Plotting PDF and CDF
plot(x, pdf, type = "l", col = "blue", main = "Uniform Distribution", ylab = "Density")
lines(x, cdf, col = "red")
legend("bottomright", legend = c("PDF", "CDF"), col = c("blue", "red"), lty = 1)

# Normal distribution with mean = 0, sd = 1
x <- seq(-4, 4, by = 0.01)
# PDF
pdf <- dnorm(x, mean = 0, sd = 1)
# CDF
cdf <- pnorm(x, mean = 0, sd = 1)
# Random numbers
random_values <- rnorm(1000, mean = 0, sd = 1)

# Plotting PDF and CDF
plot(x, pdf, type = "l", col = "blue", main = "Normal Distribution", ylab = "Density")
lines(x, cdf, col = "red")
legend("bottomright", legend = c("PDF", "CDF"), col = c("blue", "red"), lty = 1)

# Histogram of Random Values
hist(random_values, probability = TRUE, col = "lightblue", main = "Histogram of Random Values")
lines(density(random_values), col = "red")

Probability Distribution
R makes it easy to draw probability distributions and demonstrate statistical concepts.
Some of the more common probability distributions available in R are given below.

Distribution     R name    Distribution         R name
Beta             beta      Lognormal            lnorm
Binomial         binom     Negative Binomial    nbinom
Cauchy           cauchy    Normal               norm
Chisquare        chisq     Poisson              pois
Exponential      exp       Student t            t
F                f         Uniform              unif
Gamma            gamma     Tukey                tukey
Geometric        geom      Weibull              weibull
Hypergeometric   hyper     Wilcoxon             wilcox
Logistic         logis

The functions available for each distribution follow this format:

Name        Description
dname()     density or probability function
pname()     cumulative distribution function
qname()     quantile function
rname()     random deviates
For example, pnorm(0) = 0.5 (the area under the standard normal curve to the left of zero). qnorm(0.9) = 1.28 (1.28 is the 90th percentile of the standard normal distribution). rnorm(100) generates 100 random deviates from a standard normal distribution.
Each function has parameters specific to that distribution. For example, rnorm(100, mean = 50, sd = 10) generates 100 random deviates from a normal distribution with mean 50 and standard deviation 10.
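The naming scheme above can be verified interactively; this sketch applies each prefix to the normal distribution and checks that the quantile function (q) inverts the cumulative distribution function (p):

```r
# d/p/q/r prefixes applied to the normal distribution
pnorm(0)                # 0.5: area left of zero under the standard normal
qnorm(0.9)              # ~1.2816: 90th percentile
dnorm(0)                # ~0.3989: density at zero, i.e. 1 / sqrt(2 * pi)
length(rnorm(100))      # 100 random deviates

# qnorm inverts pnorm
stopifnot(isTRUE(all.equal(qnorm(pnorm(1.5)), 1.5)))
```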

Output:
[1] "Simulated rolls:"
> print(rolls)
 [1] 3 1 5 2 2 5 6 4 3 3

[Plots produced: "Uniform Distribution" and "Normal Distribution", each showing the PDF (blue) and CDF (red), and "Histogram of Random Values" with a density curve overlaid]
Viva-Voce:

Q.1. What is classical probability?
Classical probability is a branch of probability theory that deals with events having equally likely outcomes. It forms the basis of probability theory and is widely used in statistics and data science.

Q.2. How can I use R for probability calculations?
R is a powerful programming language for statistical analysis and data manipulation. You can use R packages like 'prob' and 'gtools' to perform various probability calculations.
Q.3. What are some real-world applications of probability in data science?
Probability plays a crucial role in data science applications like risk assessment, predictive modelling, quality control, and decision-making under uncertainty.
Q.4. Can you recommend any additional resources for learning probability in R?
Certainly! There are numerous online courses, books, and tutorials available for learning probability in R. Some popular resources include Coursera's "Probability and Statistics in R," the book "Introduction to Probability" by Joseph K. Blitzstein and Jessica Hwang, and the online R documentation.
Q.5. What are the main challenges when working with classical probability in R?
Challenges in classical probability include making simplifying assumptions, handling limited realism, dealing with data quality issues, and addressing computationally intensive calculations. In such cases, alternative approaches like Bayesian probability or advanced machine learning techniques may be considered.
Experiment-8
Aim: To implement linear regression using R.

Software Used: R

Theory:
Regression analysis is a very widely used statistical tool to establish a relationship model between two variables. One of these variables is called the predictor variable, whose value is gathered through experiments. The other variable is called the response variable, whose value is derived from the predictor variable.

In linear regression these two variables are related through an equation, where the exponent (power) of both variables is 1. Mathematically, a linear relationship represents a straight line when plotted as a graph. A non-linear relationship, where the exponent of any variable is not equal to 1, creates a curve.
The general mathematical equation for a linear regression is -
y = ax + b

Following is the description of the parameters used -


y is the response variable.
x is the predictor variable.
a and b are constants which are called the coefficients.
Steps to Establish a Regression
A simple example of regression is predicting the weight of a person when his height is known. To do this we need to have the relationship between height and weight of a person.
The steps to create the relationship are -

Carry out the experiment of gathering a sample of observed values of height and corresponding weight.
Create a relationship model using the lm() function in R.
Find the coefficients from the model created and create the mathematical equation using these.
Get a summary of the relationship model to know the average error in prediction, also called residuals.
To predict the weight of new persons, use the predict() function in R.
lm() Function

This function creates the relationship model between the predictor and the response variable.

Syntax

The basic syntax for the lm() function in linear regression is -

lm(formula, data)

Following is the description of the parameters used -

formula is a symbol presenting the relation between x and y.
data is the vector on which the formula will be applied.
predict() Function
Syntax

The basic syntax for predict() in linear regression is -

predict(object, newdata)
Following is the description of the parameters used -
object is the formula which is already created using the lm() function.
newdata is the vector containing the new value for the predictor variable.

Code:
# Input Data
# Below is the sample data representing the observations -

# Values of height
# 151, 174, 138, 186, 128, 136, 179, 163, 152, 131
# Values of weight
# 63, 81, 56, 91, 47, 57, 76, 72, 62, 48
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Create Relationship Model & get the Coefficients
# Apply the lm() function.
relation <- lm(y ~ x)

print(relation)
print(summary(relation))

# Predict
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)

# Visualize the Regression Graphically

# Give the chart file a name.
png(file = "linearregression.png")

# Plot the chart.
plot(y, x, col = "blue", main = "Height & Weight Regression",
     cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")
abline(lm(x ~ y))

# Save the file.
dev.off()

Output:
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
   -38.4551       0.6746

> print(summary(relation))

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-6.3002 -1.6629  0.0412  1.8944  3.9775

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -38.45509    8.04901  -4.778  0.00139 **
x             0.67461    0.05191  12.997 1.16e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.253 on 8 degrees of freedom
Multiple R-squared: 0.9548, Adjusted R-squared: 0.9491
F-statistic: 168.9 on 1 and 8 DF, p-value: 1.164e-06

> # Predict
> # Find weight of a person with height 170.
> a <- data.frame(x = 170)
> result <- predict(relation, a)
> print(result)
       1
76.22869

[Plot: "Height & Weight Regression" — scatter of weight (kg) vs height (cm) with fitted regression line]

Viva-Voce:
Q.1. What are the assumptionsof a linear regression model?

The assumptions of a linear regression model are:

The relationship between the independent and dependent variables is linear.


The residuals, or errors, are normally distributed with a mean of zero and a constant variance.
The independent variables are not correlated with each other (i.e. they are not
collinear).
The residuals are independent of each other (i.e. they are not autocorrelated).
The model includes all the relevant independent variables needed to accurately
predict the dependent variable.
Q.2. What is multicollinearity and how does it affect linear regression analysis?
Multicollinearity refers to a situation in which two or more independent variables in a linear regression model are highly correlated with each other. This can create problems in the regression analysis, as it can be difficult to determine the individual effects of each independent variable on the dependent variable.

When two or more independent variables are highly correlated, it becomes difficult to isolate the effect of each variable on the dependent variable. The regression model may indicate that both variables are significant predictors of the dependent variable, but it can be difficult to determine which variable is actually responsible for the observed effect.
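A small simulated sketch (all data here is invented for illustration) shows the symptom: two nearly identical predictors have a correlation close to 1, and lm() then struggles to apportion the effect between them:

```r
# Simulate two highly collinear predictors
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.01)    # x2 is almost a copy of x1
y  <- 3 * x1 + rnorm(100)

print(cor(x1, x2))                  # very close to 1
fit <- lm(y ~ x1 + x2)
# Standard errors on x1 and x2 are inflated, so neither coefficient is
# estimated precisely even though x1 alone truly drives y
print(summary(fit)$coefficients)
```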

Q.3. What are the common techniques used to improve the accuracy of a linear regression model?

Feature selection: selecting the most relevant features for the model to improve its predictive power.
Feature scaling: scaling the features to a similar range to prevent bias towards
certain features.
Regularization: adding a penalty term to the model to prevent overfitting and
improve generalization.
Cross-validation: dividing the data into multiple partitions and using a different
partition for validation in each iteration to avoid overfitting.
Ensemble methods: combining multiple models to improve the overall accuracy
and reduce variance.

Q.4. What is a residual in linear regression and how is it used in model evaluation?
In linear regression, a residual is the difference between the predicted value of the dependent variable (based on the model) and the actual observed value. It is used to evaluate the performance of the model by measuring how well the model fits the data. If the residuals are small and evenly distributed around the mean, it indicates that the model is a good fit for the data. However, if the residuals are large and not evenly distributed, it indicates that the model may not be a good fit for the data and may need to be improved or refined.
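Using the height-weight data from the Code section above, the residuals can be extracted and inspected directly:

```r
# Refit the height-weight model and examine its residuals
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

res <- residuals(relation)            # observed y minus fitted values
stopifnot(isTRUE(all.equal(unname(res), unname(y - fitted(relation)))))
stopifnot(abs(mean(res)) < 1e-10)     # OLS residuals average to (near) zero
print(summary(res))                   # matches the Residuals block of summary(relation)
```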

Q.5. What is heteroscedasticity?

Heteroscedasticity is a statistical term that refers to the unequal variance of the error
terms (or residuals) in a regression model. In a regression model, the
residuals
represent the difference between the observed values and the predicted values of the
dependent variable. When heteroscedasticity occurs:
The variance of the error terms is not constant across the range of the independent variables.
Error terms tend to be larger for some values of the independent variables than for others.
This can result in biased and inconsistent estimates of the regression coefficients
and standard errors, which can affect the accuracy of the statistical inferences and
predictions made from the model.

Heteroscedasticity can be caused by a number of factors, including:


Outliers
Omitted variables
Measurement errors
Nonlinear relationships between the variables
