Ds Practical
Loading the Dataset: data(mtcars) loads the mtcars dataset into the R
environment.
Printing Quantiles: cat and print functions are used to display the quartile
values for the mpg column.
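The two steps above can be sketched as follows. This is a minimal illustration, assuming the quartiles come from quantile() with its default probabilities; the variable name mpg_quartiles is chosen here for illustration and does not come from the original script.

```r
# Load the built-in mtcars dataset into the environment
data(mtcars)

# quantile() with default probs returns the 0%, 25%, 50%, 75% and 100% points
mpg_quartiles <- quantile(mtcars$mpg)

# Display the quartile values with cat() and print()
cat("Quartiles of mpg:\n")
print(mpg_quartiles)
```

Passing probs = c(0.25, 0.5, 0.75) to quantile() would restrict the output to the three quartiles proper.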
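# Example data (hypothetical values, for illustration only)
data <- data.frame(group = c("A", "A", "B", "B"), value = c(10, 20, 30, 40))
print(data)
# Sum of values by group
sum_by_group <- aggregate(value ~ group, data = data, FUN = sum)
print(sum_by_group)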
Explanation:
● Define Variables: a <- 10 and b <- 5 create two variables a and b with values 10 and
5, respectively.
● Perform Arithmetic Operations: sum_result <- a + b, diff_result <- a - b,
prod_result <- a * b, div_result <- a / b perform addition, subtraction,
multiplication, and division using the variables a and b, storing the results in respective
variables (sum_result, diff_result, prod_result, div_result).
● Print Results: cat() functions are used to print each result to the console.
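Put together, the three steps described above might look like the sketch below. The variable names are the ones used in the explanation; the exact labels passed to cat() are an assumption.

```r
# Define variables
a <- 10
b <- 5

# Perform arithmetic operations, storing each result in its own object
sum_result  <- a + b
diff_result <- a - b
prod_result <- a * b
div_result  <- a / b

# Print results to the console
cat("Sum:", sum_result, "\n")
cat("Difference:", diff_result, "\n")
cat("Product:", prod_result, "\n")
cat("Division:", div_result, "\n")
```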
cat("Sum:", 10 + 5, "\n")
cat("Difference:", 10 - 5, "\n")
cat("Product:", 10 * 5, "\n")
cat("Division:", 10 / 5, "\n")
Explanation:
Comparison:
● Using R Objects: This approach is useful when you want to store intermediate results for
further calculations or when you want to reuse values multiple times.
● Without Using R Objects: This approach is simpler and more concise for one-off
calculations where the result does not need to be stored or reused.
_________________________________________________________________
5 b) R AS CALCULATOR APPLICATION
# Addition
print(sum(10, 5))
# Subtraction
print(10 - 5)  # diff() computes lagged differences within one vector, so it cannot subtract 5 from 10
# Multiplication
print(prod(10, 5))
# Division
print(10 / 5)
Explanation:
Load the Dataset: The iris dataset is loaded, and the first few rows are printed
for inspection.
Calculate the Correlation Matrix: The cor() function computes the correlation
matrix, excluding the Species column because it is a categorical variable.
The iris dataset is a classic dataset used for machine learning and statistical
analysis. It contains measurements of sepal length, sepal width, petal length, and
petal width for three species of iris flowers.
Here's an R script to calculate the correlation matrix for the iris dataset and
visualize it using the corrplot package:
print(correlation_matrix)
Sepal.Length and Petal.Length: There is a strong positive correlation (0.87),
meaning as sepal length increases, petal length also tends to increase.
corrplot(correlation_matrix)
Steps to Analyze the Correlation Plot
● Strong Positive Correlations: Petal length and petal width, sepal length and
petal length, sepal length and petal width.
● Moderate Negative Correlations: Sepal width with petal length and petal
width.
● Weak Correlations: Sepal length with sepal width.
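The analysis above can be reproduced with a short script along these lines. It is a sketch under stated assumptions: the numeric columns are selected with is.numeric() rather than by position, and the plot is drawn with corrplot's default options only when that package is installed.

```r
# Load the built-in iris dataset and inspect the first rows
data(iris)
head(iris)

# Keep only the four numeric columns; Species is categorical
numeric_columns <- iris[, sapply(iris, is.numeric)]

# Pearson correlation matrix of the measurements
correlation_matrix <- cor(numeric_columns)
print(round(correlation_matrix, 2))

# Visualize the matrix if corrplot is available (default options assumed)
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(correlation_matrix)
}
```

Rounding to two decimals makes the strong, moderate, and weak pairs listed above easy to read off the printed matrix.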
__________________________________________________________________
7. Write an R script to perform the F-test, T-test, and Z-test on a given dataset.
F test:
The F-test is a statistical test used to compare two variances to see if they are
significantly different. It is often used in the context of comparing the variances of
two populations or in the analysis of variance (ANOVA) to determine if there are
significant differences among group means.
print(f_test_result)
1. The var.test() function will return the F-statistic and the p-value. The
p-value helps us determine whether to reject the null hypothesis.
2. Interpreting the Results
If the p-value is less than the significance level (commonly 0.05), we reject
the null hypothesis and conclude that there is a significant difference
between the variances of the two groups.
Example Code
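library(datasets)
data("iris")
head(iris)
# Sepal.Length for the two species being compared
setosa <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]
f_test_result <- var.test(setosa, versicolor)
print(f_test_result)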
Output Interpretation
The output will look something like this:
In this case, if the p-value is less than 0.05, we reject the null hypothesis and
conclude that the variances of Sepal.Length for setosa and versicolor are
significantly different.
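The decision rule above can also be applied in code rather than by reading the printed output. The sketch below assumes a 0.05 significance level and pulls the statistic and p-value out of the object returned by var.test(); the names alpha, f_statistic, and p_value are illustrative.

```r
# Sepal.Length for the two species being compared
data(iris)
setosa <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]

# F-test for equality of two variances
f_test_result <- var.test(setosa, versicolor)

# Extract the pieces needed for the decision
alpha <- 0.05                                   # assumed significance level
f_statistic <- unname(f_test_result$statistic)  # ratio of the two sample variances
p_value <- f_test_result$p.value

cat("F statistic:", f_statistic, "\n")
cat("p-value:", p_value, "\n")
if (p_value < alpha) {
  cat("Reject the null hypothesis: the variances differ significantly.\n")
} else {
  cat("Fail to reject the null hypothesis.\n")
}
```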
_________________________________________________________
T test
A T-test is a statistical test used to compare the means of two groups. It helps
determine if the means are significantly different from each other.
In this example, we'll focus on the two-sample T-test using the iris dataset in R
to compare the means of Sepal.Length between two species: setosa and
versicolor.
Load Sample Data
We'll use the built-in iris dataset for this example. The iris dataset contains
measurements of different characteristics of iris flowers from three species.
data("iris")
head(iris)
print(t_test_result)
Example Code
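library(datasets)
data("iris")
head(iris)
# Sepal.Length for the two species being compared
setosa <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]
t_test_result <- t.test(setosa, versicolor)
print(t_test_result)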
Output Interpretation:
● t: The calculated T-statistic.
● df: Degrees of freedom.
● p-value: The probability of obtaining a result at least as extreme as the one
observed, under the assumption that the null hypothesis is true.
● 95 percent confidence interval: A range of plausible values for the true
difference in means; 95% of intervals built this way contain the true value.
● mean of x: Mean of Sepal.Length for setosa.
● mean of y: Mean of Sepal.Length for versicolor.
In this case, if the p-value is less than 0.05, we reject the null hypothesis and
conclude that the means of Sepal.Length for setosa and versicolor are significantly
different.
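Each quantity in the output list above is also available as a component of the object returned by t.test(). The sketch below assumes the default Welch two-sample test (unequal variances) and a 0.05 significance level.

```r
# Sepal.Length for the two species being compared
data(iris)
setosa <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]

# Two-sample T-test; t.test() performs Welch's test by default
t_test_result <- t.test(setosa, versicolor)

# Components matching the items in the printed output
cat("t:", t_test_result$statistic, "\n")
cat("df:", t_test_result$parameter, "\n")
cat("p-value:", t_test_result$p.value, "\n")
cat("95% CI:", t_test_result$conf.int, "\n")
cat("mean of x (setosa):", t_test_result$estimate[1], "\n")
cat("mean of y (versicolor):", t_test_result$estimate[2], "\n")

# Decision at the assumed 0.05 level
significant <- t_test_result$p.value < 0.05
cat("Means significantly different:", significant, "\n")
```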
________________________________________________________________
Z-Test
Step-by-Step Guide
1. Install and Load Necessary Packages
For the Z-test, we'll use the BSDA package in R, which provides a function
for performing Z-tests.
install.packages("BSDA")
library(BSDA)
set.seed(123)
1. The z.test() function will return the Z-statistic, p-value, and confidence
interval for the difference in means.
2. Interpreting the Results
If the p-value is less than the significance level (commonly 0.05), we reject
the null hypothesis and conclude that there is a significant difference
between the means of the two groups.
Example Code
install.packages("BSDA")
library(BSDA)
set.seed(123)
print(z_test_result)
Two-sample z-Test
95 percent confidence interval:
-4.6919583 0.6919583
sample estimates:
mean of x mean of y
50.29094 51.91209
In this case, if the p-value is greater than 0.05, we fail to reject the null hypothesis:
the data do not show a significant difference between the means of
sample1 and sample2.
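To make the calculation behind the Z-test concrete, the sketch below computes the two-sample Z statistic by hand in base R and, if BSDA is installed, cross-checks it with z.test(). The sample sizes, means, and the known standard deviation of 10 are hypothetical values chosen for illustration; they are not taken from the original script.

```r
set.seed(123)

# Hypothetical inputs: sizes, means, and the known sd are illustrative assumptions
sigma <- 10
sample1 <- rnorm(30, mean = 50, sd = sigma)
sample2 <- rnorm(30, mean = 52, sd = sigma)

# Two-sample Z statistic with known population standard deviations
standard_error <- sqrt(sigma^2 / length(sample1) + sigma^2 / length(sample2))
z_statistic <- (mean(sample1) - mean(sample2)) / standard_error

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z_statistic))

cat("Z statistic:", z_statistic, "\n")
cat("p-value:", p_value, "\n")

# Cross-check against BSDA::z.test() when the package is installed
if (requireNamespace("BSDA", quietly = TRUE)) {
  z_test_result <- BSDA::z.test(sample1, sample2, sigma.x = sigma, sigma.y = sigma)
  print(z_test_result)
}
```

A Z-test assumes the population standard deviations are known; when they must be estimated from the samples, the T-test is the usual choice.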
__________________________________________________________
Web scraping in R can be done using packages such as rvest, httr, and xml2.
Here’s a step-by-step guide to scraping data from a website without using an API.
Example Scenario
Let's scrape data from a website like Wikipedia. For this example, we'll scrape the
table of "List of countries by GDP (nominal)" from Wikipedia.
Step-by-Step Guide
1. Install and Load Necessary Packages
install.packages("rvest")
install.packages("httr")
install.packages("xml2")
library(rvest)
library(httr)
library(xml2)
head(gdp_table)
Complete Code
Here’s the complete R script for scraping the GDP data from Wikipedia:
install.packages("rvest")
install.packages("httr")
install.packages("xml2")
library(rvest)
library(httr)
library(xml2)
url <- "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
head(gdp_table)
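The core of the scraping step is reading the HTML, selecting a table node, and converting it to a data frame. The sketch below shows that pattern on a small inline HTML snippet so that it runs without network access. The snippet, its country names, and its values are hypothetical stand-ins; on the live page, which table holds the GDP figures has to be checked by inspecting the page.

```r
library(rvest)

# Hypothetical stand-in for a downloaded page, so the sketch runs offline.
# For a live page, pass the page URL to read_html() instead.
page <- read_html('<table class="wikitable">
    <tr><th>Country</th><th>GDP</th></tr>
    <tr><td>Country A</td><td>100</td></tr>
    <tr><td>Country B</td><td>200</td></tr>
  </table>')

# Select candidate tables by CSS class (assumes tables carry the "wikitable" class)
tables <- html_elements(page, "table.wikitable")

# Convert the first matching table to a data frame
gdp_table <- html_table(tables[[1]])
head(gdp_table)
```

Selecting by CSS class rather than by position alone tends to be more robust when a page contains navigation or layout tables ahead of the data table.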
______________________________________________________
Completed