R Notes Previous Year Paper
12. **Text and Data Mining:** R has packages that facilitate text and data
mining, making it a valuable tool for analyzing unstructured data and
extracting insights.
Disadvantages of R
11. **Limited Support for Real-Time Data:** R may not be the best choice
for real-time data processing and analysis due to its inherent limitations in
speed and efficiency.
Output
The smallest number among 10 , 5 , and 8 is: 5
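The program that produced this output is not shown. The following is a minimal sketch of one that prints the same line using R's built-in `min()`. The variable names `n1`, `n2`, and `n3` are illustrative assumptions.

```r
# Three numbers to compare (values taken from the output above)
n1 <- 10
n2 <- 5
n3 <- 8

# min() returns the smallest of its arguments
smallest <- min(n1, n2, n3)

# paste() joins its arguments with single spaces, which gives the spacing seen in the output
msg <- paste("The smallest number among", n1, ",", n2, ", and", n3, "is:", smallest)
cat(msg, "\n", sep = "")
```

An `if`/`else` chain that compares the numbers pairwise would also work. Using `min()` simply keeps the sketch short.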
1. **mean() Function:**
The `mean()` function in R calculates the arithmetic mean (average) of a numeric vector. It takes the vector as its first argument, `x`, and returns the mean value. To average several separate numbers, first combine them into one vector with `c()`. Here's how you can use the `mean()` function:
# Example 1: Calculate the mean of a numeric vector
numbers <- c(5, 10, 15, 20, 25)
avg <- mean(numbers)
print(avg) # Output: 15
The `mean()` function can handle missing values using the `na.rm`
parameter, which, when set to `TRUE`, ignores NA values while calculating
the mean.
2. **plot() Function:**
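The `plot()` function is R's general-purpose (generic) plotting function. For example, `plot(x, y)` draws a scatterplot of `y` against `x`. Arguments such as `type`, `main`, `xlab`, `ylab`, and `col` control the plot type, title, axis labels, and colors. The function is highly flexible, and you can combine it with other plotting functions and packages in R to build more advanced, customized visualizations.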
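The `na.rm` behavior of `mean()` described above, together with the rule that `mean()` averages only its first argument, can be checked with a short sketch. The sample values are made up.

```r
# A numeric vector with one missing value
values <- c(4, 8, NA, 12)

with_na <- mean(values)                   # NA: any missing value makes the result NA
without_na <- mean(values, na.rm = TRUE)  # 8: NA is dropped, mean of 4, 8 and 12
print(with_na)
print(without_na)

# mean() averages only its first argument, so wrap separate numbers in c()
wrong <- mean(4, 8, 12)     # 4: the extra numbers are matched to other parameters
right <- mean(c(4, 8, 12))  # 8
```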
**Defining a List:**
You can define a list in R using the `list()` function. The basic syntax is `my_list <- list(element1, element2, ...)`.
Each element in the list can be of any data type. You can access elements by their position within the list using double square brackets, for example `my_list[[1]]`.
**Examples:**
1. **Basic List:**
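# Create a list
my_list <- list(42, "Hello", TRUE)
# Access elements by position with double square brackets
first_element <- my_list[[1]]
second_element <- my_list[[2]]
third_element <- my_list[[3]]
# Print elements
print(first_element)
print(second_element)
print(third_element)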
Output
[1] 42
[1] "Hello"
[1] TRUE
-----------------------------------------------------------------------
2. **List of Vectors:**
Lists can also contain vectors or other data structures. Here's an example
of a list containing numeric vectors:
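# Create a list of three numeric vectors
vector_list <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
# Access and print each vector by position
print(vector_list[[1]])
print(vector_list[[2]])
print(vector_list[[3]])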
Output
[1] 1 2 3
[1] 4 5 6
[1] 7 8 9
--------------------------------------------------------------------------
3. **Nested Lists:**
Lists can also contain other lists, allowing for nesting of data structures.
Here's an example of a nested list:
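# A vector of fruits and an inner list of pets ("dog" is an illustrative value)
nested_list <- list(fruits = c("apple", "banana", "cherry"), pets = list("dog", "cat"))
print(nested_list$fruits)     # the fruits vector
print(nested_list$pets[[2]])  # second element of the inner pets list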
Output
[1] "apple" "banana" "cherry"
[1] "cat"
----------------------------------------------------------------------
Lists are particularly useful when you need to store and organize
heterogeneous data, such as different types of variables or data structures,
in a single container. They provide a flexible way to structure and access
your data in R.
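The access rules for lists are easy to mix up. In the sketch below, `$` and `[[ ]]` return an element itself, while single brackets `[ ]` return a smaller list. The names and values are made up for illustration.

```r
# A named list holding values of different types
person <- list(name = "Alice", age = 25, scores = c(90, 85))

person$name                  # access an element by name with $
age <- person[["age"]]       # [[ ]] returns the element itself (a number)
sub_list <- person["age"]    # [ ] returns a smaller list containing that element
class(sub_list)              # "list"

person$city <- "London"      # add a new element by assigning to a new name
length(person)               # 4
```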
**Defining a Vector:**
You can define a vector in R using the `c()` function, which stands for "combine" or "concatenate." The basic syntax is `my_vector <- c(element1, element2, ...)`.
All elements of a vector must be of the same data type.
**Examples:**
1. **Numeric Vector:**
numeric_vector <- c(1, 2, 3, 4, 5)
print(numeric_vector)
Output:
[1] 1 2 3 4 5
----------------------------------------------------------------------
2. **Character Vector:**
character_vector <- c("apple", "banana", "cherry")
print(character_vector)
Output:
[1] "apple" "banana" "cherry"
------------------------------------------------------------------------
3. **Logical Vector:**
logical_vector <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
print(logical_vector)
Output:
[1] TRUE FALSE TRUE TRUE FALSE
------------------------------------------------------------------------
If you combine values of different types with `c()`, R implicitly converts (coerces) all elements to a single common data type. For example:
mixed_vector <- c(1, "apple", TRUE)
print(mixed_vector)
Output:
[1] "1" "apple" "TRUE"
-----------------------------------------------------------------------
In this case, all elements are converted to character type. Any number or logical value can be represented as text, but text cannot in general be converted to a number or logical value.
Vectors are a fundamental building block in R and are commonly used for
storing and manipulating data. They are the basis for more complex data
structures like lists, data frames, and arrays.
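Here is a small sketch of everyday vector manipulation: indexing, logical filtering, element-wise arithmetic, and the type coercion described above. The values are arbitrary.

```r
v <- c(2, 4, 6, 8, 10)

second <- v[2]          # indexing starts at 1, so this is 4
big <- v[v > 5]         # logical filtering keeps 6, 8, 10
doubled <- v * 2        # arithmetic applies element-wise: 4 8 12 16 20
total <- sum(v)         # 30
v[6] <- 12              # assigning past the end extends the vector
length(v)               # 6

mixed <- c(1, "apple", TRUE)
class(mixed)            # "character": every element was coerced to text
```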
Describe Variance
Ans: Variance is a statistical measure that quantifies the spread or
dispersion of a set of data points in a dataset. It provides insight into how
individual data points deviate from the mean (average) of the dataset. In
other words, it measures the average squared difference between each
data point and the mean of the dataset. The variance is an important
concept in statistics and data analysis, and it's often denoted by the symbol
σ² (sigma squared).
**Calculation of Variance:**
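The variance of a population of n values can be calculated using the following formula:
\[ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 \]
Where:
\(\sigma^2\) is the variance.
\(n\) is the number of data points in the dataset.
\(x_i\) represents each individual data point.
\(\mu\) is the mean (average) of the dataset.
For a sample, the sum is divided by \(n - 1\) instead of \(n\). This sample variance is what R's `var()` function returns.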
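**Example: Standard Deviation of the First 10 Integers:**
1. Calculate the mean: \(\mu = (1 + 2 + \cdots + 10) / 10 = 55 / 10 = 5.5\)
2. Calculate the squared differences from the mean for each integer:
\((1 - 5.5)^2 = 20.25\)
\((2 - 5.5)^2 = 12.25\)
\((3 - 5.5)^2 = 6.25\)
\(\vdots\)
\((10 - 5.5)^2 = 20.25\)
3. Average the squared differences to obtain the variance. The ten squared differences sum to 82.5, so \(\sigma^2 = 82.5 / 10 = 8.25\).
4. Take the square root of the result from step 3 to obtain the standard deviation:
Standard Deviation \(\sigma = \sqrt{8.25} \approx 2.872\)
So, the (population) standard deviation of the first 10 integers (1, 2, 3, ..., 10) is approximately 2.872 (rounded to three decimal places). With the sample formula, dividing by \(n - 1 = 9\), it is \(\sqrt{82.5 / 9} \approx 3.028\).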
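The calculation can be reproduced in R. One caveat: R's built-in `var()` and `sd()` use the sample formula (dividing by n - 1), so the population values have to be computed by hand. This is a sketch using only base R.

```r
x <- 1:10
n <- length(x)

# Population variance and standard deviation: divide the squared deviations by n
pop_var <- sum((x - mean(x))^2) / n   # 82.5 / 10 = 8.25
pop_sd <- sqrt(pop_var)               # about 2.872

# R's built-in var() and sd() divide by n - 1 (sample formula)
samp_var <- var(x)                    # 82.5 / 9, about 9.167
samp_sd <- sd(x)                      # about 3.028

round(c(pop_var, pop_sd, samp_var, samp_sd), 3)
```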
The goal here is to create a data frame named `employee_data` with 20 rows, each containing an employee name and a department name. The names and departments are assigned randomly from specified vectors, so the actual data will vary from run to run unless a random seed is set.
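The original code and vectors are not shown. A minimal sketch under stated assumptions: the name and department vectors below are hypothetical, and the column names `Employee` and `Department` are chosen for illustration. Random assignment is done with `sample(..., replace = TRUE)`.

```r
# Hypothetical name and department vectors; the original ones are not given
employee_names <- c("Asha", "Ben", "Chen", "Dina", "Eli", "Farah")
departments <- c("HR", "Finance", "IT", "Sales")

set.seed(123)  # fixes the random draws so the result is reproducible

# sample(..., replace = TRUE) draws 20 values, allowing repeats
employee_data <- data.frame(
  Employee = sample(employee_names, 20, replace = TRUE),
  Department = sample(departments, 20, replace = TRUE)
)

head(employee_data)
table(employee_data$Department)   # how many employees landed in each department
```

Without `set.seed()`, each run produces a different assignment, which matches the note that the actual data will vary.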
1. **Vectors:**
- A vector is a one-dimensional array that can hold elements of the same
data type, such as numeric, character, or logical.
- It is the simplest data structure in R and is created using the `c()`
function.
- Vectors are used for storing and manipulating single variables or small
datasets.
2. **Lists:**
- A list is a versatile data structure that can hold elements of different data
types, including other lists.
- Lists are created using the `list()` function and are often used to store
heterogeneous data and complex structures.
3. **Matrices:**
- A matrix is a two-dimensional data structure with rows and columns,
containing elements of the same data type.
- Matrices are created using the `matrix()` function and are commonly
used in linear algebra and statistical operations.
4. **Data Frames:**
- A data frame is a two-dimensional, tabular data structure similar to a
matrix but with the flexibility to store columns of different data types.
- Data frames are often used to store and analyze structured data, such
as datasets from spreadsheets or databases.
- They are created using functions like `data.frame()` or by reading data
from external files.
5. **Factors:**
- Factors are used to represent categorical data or nominal data with a
fixed set of categories.
- Factors are created using the `factor()` function and are particularly
useful in statistical modeling and analysis.
6. **Arrays:**
- An array is a multi-dimensional data structure that can have more than
two dimensions.
- Arrays are used for more complex data that requires multiple
dimensions.
- They are created using the `array()` function.
7. **Time Series:**
- Time series objects are used to represent time-based data, such as
stock prices, sensor readings, or economic indicators.
- Time series objects are created with the base R `ts()` function, or with packages such as `xts` for more advanced time series analysis.
These data structures can be combined and nested to handle diverse data
requirements. Understanding the characteristics and appropriate use cases
for each data structure is essential for effective data manipulation and
analysis in R.
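The sketch below creates one small example of each structure listed above, using arbitrary values, and uses `class()` to confirm what R calls it. Only base R is needed, since `ts()` is part of the base `stats` package.

```r
v  <- c(1.5, 2.5, 3.5)                                # vector
l  <- list(1, "a", TRUE)                              # list
m  <- matrix(1:6, nrow = 2)                           # 2 x 3 matrix
df <- data.frame(id = 1:3, grade = c("A", "B", "A"))  # data frame
f  <- factor(c("low", "high", "low"))                 # factor
a  <- array(1:24, dim = c(2, 3, 4))                   # 3-dimensional array
ts_obj <- ts(c(5, 7, 9, 11), start = 2020)            # yearly time series

# class() reports each object's type; [1] keeps only the first class (matrices also report "array")
structures <- list(vector = v, list = l, matrix = m, data_frame = df,
                   factor = f, array = a, time_series = ts_obj)
classes <- sapply(structures, function(obj) class(obj)[1])
print(classes)
```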
How can new objects be created in R? Also discuss methods and
arguments.
Ans:
In R, you can create new objects using a variety of data types, such as
vectors, matrices, data frames, lists, and more. To create objects, you
typically assign values to variables using the assignment operator "<-" or
the "=" sign. Here's how you can create objects with some of the basic data
types and an explanation of methods and arguments:
1. Vectors:
Vectors are one-dimensional data structures that can hold elements of
the same data type (e.g., numeric, character, logical). You can create
vectors using the `c()` function.
Example:
```R
# Creating a numeric vector
numeric_vector <- c(1, 2, 3, 4)
```
2. Matrices:
Matrices are two-dimensional data structures. You can create matrices
using the `matrix()` function.
Example:
```R
# Creating a matrix
data_matrix <- matrix(1:9, nrow = 3, ncol = 3)
```
3. Data Frames:
Data frames are similar to matrices, but they can store different data
types in different columns. You can create data frames using the
`data.frame()` function.
Example:
```R
# Creating a data frame
df <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 22),
Married = c(TRUE, FALSE, TRUE)
)
```
4. Lists:
Lists are versatile data structures that can store objects of different types.
You can create lists using the `list()` function.
Example:
# Creating a list
my_list <- list(
numbers = c(1, 2, 3),
names = c("John", "Mary"),
matrix = matrix(1:6, nrow = 2)
)
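The question also asks about methods and arguments. Briefly, arguments are the named inputs a function accepts. They are matched by name or position, and defaults fill in any that are left out. In R's S3 system, a method is a function such as `print.data.frame` that a generic function dispatches to based on the object's class. The `describe` generic and its methods below are hypothetical, written only to illustrate dispatch.

```r
# Arguments: matched by name or by position; unspecified ones take defaults
m_cols <- matrix(1:6, nrow = 2)                # byrow defaults to FALSE: filled by column
m_rows <- matrix(1:6, nrow = 2, byrow = TRUE)  # filled row by row
m_cols[1, ]   # 1 3 5
m_rows[1, ]   # 1 2 3

# Methods (S3): a generic function dispatches on the class of its first argument
describe <- function(x, ...) UseMethod("describe")
describe.numeric <- function(x, digits = 2, ...) paste("mean =", round(mean(x), digits))
describe.character <- function(x, ...) paste(length(x), "strings")
describe.default <- function(x, ...) "no method for this class"

describe(c(1, 2, 4))               # "mean = 2.33"
describe(c(1, 2, 4), digits = 1)   # "mean = 2.3"
describe(c("a", "b"))              # "2 strings"
describe(TRUE)                     # falls back to describe.default
```

Built-in functions such as `print()` and `summary()` are generics of this kind, which is why they behave differently for vectors, data frames, and fitted models.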
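Example (plotting multiple curves in the same plot):
```R
# Create some sample data
x <- seq(0, 2 * pi, length.out = 100)
y1 <- sin(x)
y2 <- cos(x)
# Draw sin(x) as a line plot, then overlay cos(x) on the same axes
plot(x, y1, type = "l", col = "blue", main = "sin(x) and cos(x)", xlab = "x", ylab = "y")
lines(x, y2, col = "red")
legend("topright", legend = c("sin(x)", "cos(x)"), col = c("blue", "red"), lty = 1)
```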
In this example, we first create a plot of the `sin(x)` curve and then add the `cos(x)` curve to the same plot using the `lines()` function. The `type = "l"` argument in the `plot()` function specifies a line plot. You can customize the appearance of each curve with arguments like `col` for color, and set the axis labels and title with `xlab`, `ylab`, and `main`.
To show separate plots side by side in one window instead, change the layout with `par()` before plotting:
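```R
x <- rnorm(100)  # create some sample data
y <- rnorm(100)
par(mfrow = c(1, 2))              # 1 row, 2 columns
plot(x, y, main = "Scatterplot")
hist(x, main = "Histogram of x")
par(mfrow = c(1, 1))              # restore the default layout
```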
In this example, we first set the layout to have one row and two columns
using `par(mfrow = c(1, 2))`. Then, we create two separate plots within this
layout using `plot()` and `hist()`. After plotting, we reset the layout to the
default (one plot per window) using `par(mfrow = c(1, 1))`.
You can adjust the layout to accommodate more plots as needed. The
`mfrow` argument in `par()` controls the number of rows and columns in the
layout.
These are just some basic examples of how to plot multiple curves or
graphs in the same plot or window in R. Depending on your specific data
and visualization needs, you can customize the appearance, layout, and
other aspects of your plots.
2. **Mixed-Effects Models**:
Mixed-effects models, also known as hierarchical or multilevel models,
are used when data has a hierarchical structure or repeated
measurements. They incorporate both fixed effects (population-level
parameters) and random effects (individual/group-level variations). These
models are often used in longitudinal and clustered data analysis.
3. **Survival Analysis**:
Survival analysis is used to model time-to-event data, such as time until a
patient's recovery or failure. The Cox proportional hazards model and
Kaplan-Meier survival curves are common techniques in this field.
8. **Bayesian Models**:
Bayesian modeling uses Bayes' theorem to update prior beliefs with new data. Bayesian methods apply to various statistical tasks, such as Bayesian regression and Bayesian networks, and the models are often fitted with Markov Chain Monte Carlo (MCMC) methods.
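The survival analysis item can be made concrete with a minimal sketch using the `survival` package, a recommended package that ships with standard R installations. The patient data below are made up purely for illustration.

```r
library(survival)

# Hypothetical follow-up data: time in days, status 1 = event observed, 0 = censored
patients <- data.frame(
  time   = c(5, 12, 20, 30, 8, 15, 25, 40),
  status = c(1, 1, 0, 1, 1, 0, 1, 1),
  group  = rep(c("A", "B"), each = 4)
)

# Kaplan-Meier survival curves, one per group
km <- survfit(Surv(time, status) ~ group, data = patients)
summary(km)

# Cox proportional hazards model: hazard ratio of group B relative to group A
cox <- coxph(Surv(time, status) ~ group, data = patients)
summary(cox)

plot(km, col = c("blue", "red"), xlab = "Days", ylab = "Survival probability")
```

`Surv()` pairs each time with its censoring status. Both `survfit()` and `coxph()` take the same formula interface, which makes it easy to move from descriptive curves to a regression model.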
**Classes**:
- A class is a blueprint or template for creating objects. It defines the
structure, attributes (data members), and behaviors (methods) that objects
of that class will have.
- Classes provide a way to encapsulate data and functionality into a single
unit, promoting modularity and code organization.
- Classes are typically defined with attributes (variables) and methods
(functions) that describe the object's properties and actions.
**Objects**:
- An object is an instance of a class. It's a concrete realization of the
blueprint defined by the class.
- Objects have specific values for the attributes defined in the class and can
perform actions through the methods defined in the class.
- Objects represent real-world entities or concepts and can interact with
one another.
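class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age
    def greet(self):  # Method to greet
        print(f"Hello, my name is {self.name} and I'm {self.age} years old.")
person1, person2 = Person("Alice", 30), Person("Bob", 25)  # Alice's age is illustrative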
In this example, we have a class called "Person" with attributes `name` and `age`, and a method `greet`. We create two objects, `person1` and `person2`, each with its own set of attribute values. We can access object attributes using dot notation (for example, `print(person1.name)`) and call methods on objects (for example, `person2.greet()`), which produces the following output.
Output:
```
Alice
Hello, my name is Bob and I'm 25 years old.
```
The `person1` and `person2` objects are instances of the "Person" class,
each with its own data (name and age) and the ability to execute the `greet`
method.
Classes and objects provide a powerful way to model and manage complex
systems by encapsulating data and behavior into reusable units. They are
a fundamental concept in object-oriented programming and are widely used
in languages like Python, Java, C++, and many others.
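The Person example above is written in Python. R itself supports several object systems (S3, S4, and Reference Classes). A rough R counterpart using Reference Classes (`setRefClass()` from the `methods` package) might look like the sketch below. It illustrates the same class/object idea rather than translating the notes, and Alice's age is a made-up value.

```r
library(methods)  # provides setRefClass(); attached by default in most R sessions

# Reference class with two typed fields and one method
Person <- setRefClass("Person",
  fields = list(name = "character", age = "numeric"),
  methods = list(
    greet = function() {
      # Fields are visible directly inside methods
      cat("Hello, my name is", name, "and I'm", age, "years old.\n")
    }
  )
)

# Create two objects (instances); Alice's age is an illustrative value
person1 <- Person$new(name = "Alice", age = 30)
person2 <- Person$new(name = "Bob", age = 25)

print(person1$name)  # fields are accessed with $
person2$greet()      # methods are called with $ as well
```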
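To create a factor in R, you can use the `factor()` function. For example, `colors <- factor(c("Red", "Green", "Blue", "Red"))` creates a factor with three levels. By default, the levels are sorted alphabetically ("Blue", "Green", "Red").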
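A short sketch of factor creation follows. It covers the default level order, the `levels` argument discussed next, and a level that does not occur in the data. The color values, including "Yellow", are illustrative.

```r
colors <- c("Red", "Green", "Blue", "Red")

f_default <- factor(colors)
levels(f_default)    # "Blue" "Green" "Red" (sorted alphabetically)

# Explicit levels fix the order used in tables, plots and models
f_ordered <- factor(colors, levels = c("Red", "Green", "Blue"))
levels(f_ordered)    # "Red" "Green" "Blue"

# A level that never occurs in the data is still kept, with a count of 0
f_all <- factor(colors, levels = c("Red", "Green", "Blue", "Yellow"))
table(f_all)

as.integer(f_ordered)  # factors store integer codes: 1 2 3 1
```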
You can also specify the levels explicitly using the `levels` argument to
ensure that the factor retains a specific order or that all possible levels are
included, even if not present in the data. For example, `levels = c("Red",
"Green", "Blue")` specifies the order of levels as "Red," "Green," and
"Blue."
**Bar Chart**:
- A bar chart is used to display categorical data with discrete categories on
one axis and the corresponding values on the other axis.
- In a bar chart, the bars are separated, and there are gaps between them,
as the categories are distinct and not connected in a continuous range.
- Bar charts are suitable for visualizing and comparing categories or
groups. They are typically used for displaying frequencies, counts, or
proportions of categories. Bar charts are commonly used to represent data
such as survey results, product sales by category, or the number of
students in different grade levels.
**Histogram**:
- A histogram, on the other hand, is used to display the distribution of
continuous data. It divides the range of continuous data into intervals (bins)
and shows the frequency or density of data points within each interval.
- In a histogram, the bars are typically connected, as the data points fall
along a continuous scale.
- Histograms are useful for visualizing the shape, central tendency, and
spread of a continuous data distribution. They are commonly used in
statistics to assess the distribution of data, such as the distribution of ages
in a population, exam scores, or heights of individuals.
Here are some key differences between bar charts and histograms:
1. **Data Type**:
- Bar charts are used for categorical data with distinct categories.
- Histograms are used for continuous data with a range of values.
2. **Bar Separation**:
- Bar charts have distinct bars with gaps between them.
- Histogram bars are typically connected, forming a continuous
distribution.
3. **Purpose**:
- Bar charts are used for comparing and displaying discrete categories or
groups.
- Histograms are used to visualize the distribution and characteristics of
continuous data.
4. **Axis Scales**:
- In a bar chart, one axis lists the categories and the other shows a numeric value (count, proportion, or amount).
- In a histogram, one axis represents the range of values (continuous
scale), and the other axis represents frequencies or densities.
5. **When to Use**:
- Use a **bar chart** when you have categorical data and want to compare
categories or groups. For example, you might use a bar chart to compare
sales of different products.
- Use a **histogram** when you have continuous data and want to
understand the distribution, shape, or characteristics of the data. For
example, you might use a histogram to visualize the distribution of exam
scores in a class.
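The difference shows up directly in base R. `barplot()` takes precomputed counts per category, while `hist()` bins raw continuous values itself and draws adjacent bars. The sales and score data in this sketch are made up.

```r
# Made-up categorical data: product sold in each of six transactions
products <- c("Laptop", "Phone", "Phone", "Tablet", "Laptop", "Phone")
product_counts <- table(products)   # counts per category
barplot(product_counts, main = "Sales by Product", ylab = "Units sold")

# Made-up continuous data: exam scores
scores <- c(55, 62, 68, 71, 74, 77, 79, 81, 85, 88, 92, 95)
# hist() groups the values into bins of width 10 and draws adjacent bars
h <- hist(scores, breaks = seq(50, 100, by = 10),
          main = "Distribution of Exam Scores", xlab = "Score")
h$counts   # number of scores in each bin: 1 2 4 3 2
```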
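Ans: The mean deviation about the mean for grouped data is
MD = Σ f * |x - μ| / Σ f
where x is the midpoint of each class, f is the number of persons in that class, and μ is the mean. To calculate it, follow these steps:
1. Find the midpoint x of each weight class.
2. Compute the mean μ = Σ f * x / Σ f.
3. For each class, multiply the absolute deviation |x - μ| by the frequency f.
4. Add these products and divide by the total number of persons Σ f.
Let's calculate the mean deviation about the mean for the given data:

| Weight (kg) | Persons (f) | Midpoint (x) |
|---|---|---|
| 50-55 | 12 | 52.5 |
| 55-60 | 18 | 57.5 |
| 60-65 | 15 | 62.5 |
| 65-70 | 14 | 67.5 |
| 70-75 | 8 | 72.5 |
| Total | 67 | |

μ = [(52.5 * 12) + (57.5 * 18) + (62.5 * 15) + (67.5 * 14) + (72.5 * 8)] / (12 + 18 + 15 + 14 + 8)
μ = (630 + 1035 + 937.5 + 945 + 580) / 67
μ = 4127.5 / 67
μ ≈ 61.60 kg
Using μ ≈ 61.60, the products f * |x - μ| are:
12 * |52.5 - 61.60| = 12 * 9.1 = 109.2
18 * |57.5 - 61.60| = 18 * 4.1 = 73.8
15 * |62.5 - 61.60| = 15 * 0.9 = 13.5
14 * |67.5 - 61.60| = 14 * 5.9 = 82.6
8 * |72.5 - 61.60| = 8 * 10.9 = 87.2
Σ f * |x - μ| = 366.3
MD = 366.3 / 67 ≈ 5.47 kg
So, the mean deviation about the mean for the given data is approximately 5.47 kg.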
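The same grouped-data calculation can be checked in base R with `weighted.mean()`. This sketch uses the class midpoints and frequencies from the weight table.

```r
# Class midpoints and frequencies from the weight table
midpoints <- c(52.5, 57.5, 62.5, 67.5, 72.5)
persons   <- c(12, 18, 15, 14, 8)

# Weighted mean of the midpoints
mu <- weighted.mean(midpoints, persons)

# Mean deviation about the mean: frequency-weighted average of |x - mu|
md <- sum(persons * abs(midpoints - mu)) / sum(persons)

round(mu, 2)  # 61.6
round(md, 2)  # 5.47
```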
6. **Data Visualization**:
7. **Resource Management**:
Data Frame in R.
Ans: In R, a data frame is a two-dimensional, tabular data structure. Each column is a vector of the same length, and different columns can hold different data types. It is one of the most fundamental and widely used data structures for data analysis and statistics in R. Data frames are part of base R, and packages like `dplyr`, `tidyr`, and `ggplot2` use them extensively for data manipulation and visualization. A data frame is created with the `data.frame()` function:
```R
# Creating a DataFrame
data <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Age = c(25, 30, 22),
Gender = c("Female", "Male", "Male")
)
# Display the DataFrame
print(data)
```
This code creates a data frame named `data` with columns for Name, Age, and Gender. The `data.frame()` function builds the data frame, and the `print()` function displays its contents.
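Once a data frame exists, a few base R operations cover most everyday work. The sketch below inspects the structure, extracts a column, filters rows, and adds a column. It reuses the same small example data.

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Gender = c("Female", "Male", "Male")
)

str(data)                          # number of rows, column names and types
ages <- data$Age                   # extract one column as a vector
over_24 <- data[data$Age > 24, ]   # keep rows where Age is greater than 24
data$Senior <- data$Age >= 30      # add a new logical column
mean_age <- mean(data$Age)         # about 25.67
dim(data)                          # 3 rows, 4 columns
```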
Graphics in R.
Ans: In R, you can create a wide variety of graphical visualizations to
explore and communicate your data effectively. R provides a robust and
versatile set of graphics and plotting functions that allow you to create static
and interactive visualizations for a wide range of data types. Here are some
key concepts and packages for creating graphics in R:
1. **Base Graphics**:
- R's base graphics system provides a simple and built-in way to create
static plots and charts.
- Functions like `plot()`, `hist()`, `boxplot()`, and `barplot()` are commonly
used for basic data visualization.
- Base graphics allow customization, but they have limited interactivity.
Example:
```R
# Create a scatterplot
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 1, 5, 3)
plot(x, y, type = "p", main = "Scatterplot", xlab = "X-axis", ylab = "Y-axis")
```
2. **ggplot2**:
- `ggplot2` is a popular package for creating high-quality, customizable,
and visually appealing graphics in R.
- It follows the Grammar of Graphics concept, which allows you to build
plots layer by layer using a declarative syntax.
- ggplot2 is known for its flexibility and wide range of options for
customization.
Example:
```R
# Create a scatterplot using ggplot2
library(ggplot2)
data <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2, 4, 1, 5, 3))
ggplot(data, aes(x, y)) + geom_point() + labs(title = "Scatterplot", x =
"X-axis", y = "Y-axis")
```
3. **Interactive Graphics**:
- R offers packages such as `plotly` and `leaflet` for creating dynamic and interactive visualizations, and the `shiny` framework for building interactive web applications around them.
- These packages allow users to interact with plots, zoom in, pan, and
explore data points.
Example:
```R
# Create an interactive scatterplot using plotly
library(plotly)
data <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2, 4, 1, 5, 3))
plot_ly(data, x = ~x, y = ~y, type = "scatter", mode = "markers", text =
~paste("X:", x, "<br>Y:", y))
```
4. **Specialized Packages**:
- R has numerous specialized packages for creating specific types of
visualizations, such as `ggmap` for maps, `lattice` for trellis plots, and
`networkD3` for network graphs.
Example:
```R
# Create a trellis plot using lattice
library(lattice)
data <- data.frame(x = rnorm(100), y = rnorm(100), group = factor(rep(1:5, each = 20)))
xyplot(y ~ x | group, data = data, type = "p", main = "Trellis Plot")
```
5. **3D Graphics**:
- For 3D plotting, the `rgl` package is commonly used. It allows you to
create interactive 3D plots and animations.
Example:
```R
# Create a 3D scatterplot using rgl
library(rgl)
data <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
plot3d(data$x, data$y, data$z, type = "s", col = "blue", size = 2)
```
6. **Exporting Plots**:
- You can export plots to various file formats (e.g., PDF, PNG, SVG) using
functions like `pdf()`, `png()`, or `ggsave()` (for ggplot2).
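The base R pattern is to open a file device, draw, and then close the device with `dev.off()`. Here is a minimal sketch that writes a PDF to a temporary file. `png()` works the same way, with width and height given in pixels. For `ggplot2`, `ggsave()` saves the most recent plot by default.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 1, 5, 3)

# Write to a temporary file so the example does not clutter the working directory
out_file <- file.path(tempdir(), "scatterplot.pdf")

pdf(out_file, width = 6, height = 4)  # open a PDF device; size is in inches
plot(x, y, main = "Scatterplot", xlab = "X-axis", ylab = "Y-axis")
dev.off()                             # close the device so the file is written

file.exists(out_file)  # TRUE
```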