
UNIT-IV

Simulation - Motivating Examples, Simulation Modeling Method, case study. Introduction to optimization –
Introduction, Methods in Optimization- Linear Programming, Integer Programming—Enforcing Integrality
Restrictions on Decision Variables, Nonlinear Optimization Models. Forecasting Analytics - Methods and
Quantitative Approaches of Forecasting, Applied Forecasting Analytics Process, Applications, Evaluating Forecast
Accuracy. Survival Analysis – Introduction, Motivating Business Problems, Methods of Survival Analysis, case
study
Simulation
Simulation is a method used to examine the “what if” without having real data. We just make it up! We can
use pre-programmed functions in R to simulate data from different probability distributions or we can
design our own functions to simulate data from distributions not available in R.

When we do a simulation, we have to make many assumptions. One major assumption is the choice of the
distribution to use for a particular variable. Each particular distribution has parameters that are integral to
generating data from the distribution. We need to set a value for these parameters to simulate a value from a
distribution.

How we decide on the choice of distribution, and the manner in which the values of the parameters are estimated, are beyond the scope of this unit.

Given a particular distribution and known parameters, we can generate values from that distribution.
However, in reality, we never know the true distribution, and we never know the exact parameter values
needed to generate values from that distribution. Practical statistical analysis helps us to identify a good
choice of the distribution and estimate reasonable parameters.

We can think of the simulated values as a sample of a larger population. Samples with a large number of
values will better reflect the properties of the distribution of the larger population.

➢ Normal Distribution
➢ Setting the Seed
➢ Indicator (Bernoulli) Variables
➢ Uniform (Continuous Version)
➢ Poisson Variables (Optional)
➢ Gamma Variables (Optional)
➢ Compound Distribution (Optional)

Normal Distribution

The normal distribution is used if the variable is continuous. We usually refer to the density of a normal
random variable as a bell-shaped curve. We require a value for the mean and another for the standard
deviation to simulate a value from a normal distribution. (The mean and standard deviation (or variance) are
the parameters of a normal distribution.)

We can easily simulate 1000 values from a normal distribution with a mean of 10 and a standard deviation
of 4 as follows:

x <- rnorm(n = 1000, mean = 10, sd = 4)

Notice the notation: n = (number of values to be simulated), mean = , and sd = . If we plot a histogram,
we can see a somewhat bell-shaped curve.

hist(x)
We can check the mean and standard deviation of the sample values. We could start with the
summary() function:

summary(x)

Note in the results that the mean is shown. While the mean does not equal 10 exactly, it is close to 10. Sample
statistics will never be exactly equal to the parameters used (in this case the mean and standard deviation).
The larger the sample, the closer the simulated sample values will be to those set in the rnorm() function.
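For example, you can compare the sample mean for a small and a large sample (a quick sketch):

x_small <- rnorm(n = 100, mean = 10, sd = 4)
x_large <- rnorm(n = 100000, mean = 10, sd = 4)
mean(x_small)   # typically a few tenths away from 10
mean(x_large)   # typically very close to 10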

The standard deviation is not shown. We can use the sd() function or load the psych package.

sd(x)

library(psych)

describe(x)

Again we see that the standard deviation is close to the parameter of 4.

Simulation is a nice tool as you can re-do everything and get different samples. This way you can see how
quantities (like the mean, standard deviation, histogram) vary from one sample to another, even though they
were generated from the same underlying distribution.

You can change the sample size, or mean, or standard deviation. Plotting the values helps you see the data.
See what happens if we just change the sample size to 10, instead of 1000.

x <- rnorm(n = 10, mean = 10, sd = 4)

describe(x)

hist(x)

Setting the Seed

Technically, when we simulate from a particular distribution, we are using a pre-programmed function to
generate the numbers. Thus, we may call these pseudo-random numbers.

Because a deterministic (recursive) algorithm lies behind the generation of the numbers, we can control the
sequence of values generated. This is useful when you want to be able to reproduce the simulation results at another time.

To do this, we initialize the seed of the data-generating process:

set.seed(1)

Then generate 10 values from a normal distribution with mean of 10 and standard deviation of 4.

x <- rnorm(n = 10, mean = 10, sd = 4)

describe(x)

And then, if we do this again:

set.seed(1)

y <- rnorm(n = 10, mean = 10, sd = 4)

describe(y)

Then we get the same values again. You can verify this by printing the values of x and y:
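print(x)
print(y)
identical(x, y)   # TRUE, because both vectors were generated after set.seed(1)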

Indicator (Bernoulli) Variables

A special case of a categorical variable is an indicator variable, sometimes referred to as a binary or dummy
variable. The underlying distribution of an indicator variable is called a Bernoulli distribution.

Suppose we are interested in whether a flip of a coin comes up heads or tails. Here we could define Head as
the variable of interest.

We can simulate this random variable using a Binomial distribution. (Technically, the Bernoulli
distribution is a special case of a Binomial.) We need to set values for n = , size = , and prob = , where n is
the number of values you want to simulate, size in this case is 1 (as we want to simulate an indicator
variable), and prob is the probability that you will flip a head (or tail, depending on your random variable).

Simulating indicator variables is done using the rbinom function. Here, we simulate 6 coin flips with a
probability of 1/2 of getting a head on each flip (i.e., a fair coin).

rbinom(n = 6, size=1, prob=0.5)

If you are simulating 6 coin flips using a fair coin, how many heads do you expect? What did you get?

Re-run the above code several times to see how the sequence of 6 coin flips varies.

Note: this is one way of simulating what is known as a Bernoulli random variable. You can also use a
function called rbernoulli that is part of the purrr package.
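A quick sketch of that alternative (depending on your purrr version, rbernoulli() may be flagged as deprecated, in which case rbinom() above remains the standard approach):

library(purrr)
rbernoulli(n = 6, p = 0.5)   # returns TRUE/FALSE instead of 1/0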

Uniform (Continuous Version)

Suppose instead of flipping a coin with 2 outcomes, you want to simulate test scores of students, where any
score between 0 and 100 is possible.

As this random variable X is continuous, we define its probability with its density function:

f(x) = 1 / (100 − 0) for 0 <= x <= 100, and f(x) = 0 otherwise.

In the fraction I include the two endpoints (0 and 100) to illustrate how you would adjust for other bounds.
These bounds are defined as the parameters of the uniform distribution.

As an aside, this definition makes sense, as if you are asked what is the chance that someone’s exam score
is between 0 and 25, you would say 1/4, as the length of the interval between 0 and 25 is one-quarter of the
length between 0 and 100.
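We can check this claim by simulation (a quick sketch):

scores <- runif(n = 100000, min = 0, max = 100)
mean(scores <= 25)   # should be close to 0.25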

To simulate one exam score we would use the following code:

runif(n = 1, min = 0, max = 100)

You could save the results in another variable for use.

dat <- runif(n = 100, min = 0, max = 100)

Then you can check the mean (average) of the sample. What result do you expect to find?

mean(dat)

And provide a histogram of the data to see the frequency of occurrence. (You can re-run this to see
what happens if you lower the sample size or raise it to 5000.)

hist(dat)

Poisson Variables (Optional)

Another discrete distribution that you may learn about is the Poisson distribution, which is used to model
counts of some event occurring within a given time interval; for example, the number of car accidents that
you have in a year.

As you would guess, there is an R function for simulating this random variable. Here, in addition to the
number of values to simulate, we just need the parameter for the mean (called lambda).

rpois(n = 6, lambda = 20)

Gamma Variables (Optional)

Another continuous distribution that you may learn about is the Gamma distribution. This distribution is
used for random variables that have some skewness and are not symmetric, unlike the normal distribution.

The Gamma distribution requires a little more background to understand how to define the parameters.

There is an R function for simulating this random variable. Here, in addition to the number of values to
simulate, we just need two parameters, one for the shape and one for either the rate or the scale. The rate is
the inverse of the scale. The general formula is: rgamma(n, shape, rate = 1, scale = 1/rate).

If α is the shape parameter and β is the scale parameter, the mean of the Gamma distribution is α∗β.
Since the rate is the reciprocal of the scale (rate = 1/β), you can equivalently supply the rate, in which case
the mean is α/rate. The two calls below therefore describe the same distribution.

We can verify these concepts with the following code.
set.seed(1)
rgamma(n = 5, shape = 3, scale = 2)
set.seed(1)
rgamma(n = 5, shape = 3, rate = 0.5)
And verifying the mean equal to 6 (shape*scale):
set.seed(1)
x <- rgamma(n = 1000, shape = 3, scale = 2)
mean(x)
Or verifying the mean equal to 6 (shape*1/rate):
set.seed(1)
x <- rgamma(n = 1000, shape = 3, rate = 0.5)
mean(x)
Compound Distribution (Optional)
An example of where simulation is useful is shown below. Suppose you have one variable,
X that is assumed to have a Poisson distribution with lambda = 3 and another random variable Y that is
assumed to have a Gamma distribution with shape = 3 and rate = 0.5.

A practical story where this scenario might be useful is that we want to explore the annual sum of costs of
individuals with car accidents. The number of annual car accidents is Poisson and the cost per car accident
is Gamma.

The code below contains a for-loop as discussed in Section 7.4 and data frames as discussed in Section 10.

The example assumes that there are 6 people to consider. The data frame is initialized in the first block of
code, where one Poisson value is simulated for the number of accidents, a Gamma cost is generated for each
accident, and the costs are summed to give the total. The number of accidents and the total cost are written
to a data frame. A second block of code contains a for loop that does the same thing, but adds the results to
the already created data frame.

The interim values x, y, z are printed so that you can verify the process and the final results.

x <- rpois(n = 1, lambda = 3)


y <- rgamma(n = x, shape = 3, rate = 0.5)
z <- sum(y)
print(x)
print(y)
print(z)
dat <- data.frame(x,z)
dat

for (count in 1:5){


x <- rpois(n = 1, lambda = 3)
y <- rgamma(n = x, shape = 3, rate = 0.5)
z <- sum(y)
print(x)
print(y)
print(z)
dat[nrow(dat) + 1,] <- list(x, z)
}

If the simulation is completed with more iterations, we can plot the new simulated distribution, along with
some summary statistics.

x <- rpois(n = 1, lambda = 3)


y <- rgamma(n = x, shape = 3, rate = 0.5)
z <- sum(y)
dat1 <- data.frame(x,z)

for (count in 1:999){


x <- rpois(n = 1, lambda = 3)
y <- rgamma(n = x, shape = 3, rate = 0.5)
z <- sum(y)
dat1[nrow(dat1) + 1,] <- list(x, z)
}

hist(dat1$z)
describe(dat1)

Simulation Modeling Method


Simulation modeling is a powerful technique used in various fields to mimic real-world systems or
processes and study their behavior under different conditions. In R, you can perform simulation modeling
using various packages and methodologies. Here, I'll introduce a basic simulation modeling method in R
using the simmer package, which is a versatile tool for discrete-event simulation.

Step 1: Install and Load the simmer Package

If you haven't already installed the simmer package, you can do so using the following command:
install.packages("simmer")
library(simmer)
Step 2: Define the Simulation Environment and Entities
In simulation modeling, you typically start by defining the entities that will interact within the simulated
system and the characteristics of the environment.
# Create a simulation environment
sim_env <- simmer()

# Define resources (e.g., servers)
sim_env %>% add_resource("server", capacity = 2)

# Entities (customers) are created by a generator, which is attached in Step 3
# once the trajectory they follow has been defined.
Step 3: Define Processes
Processes represent the activities that entities go through during the simulation. You can define the order
and logic of these processes.
customer <- trajectory("customer_process") %>%
seize("server", 1) %>%
timeout(function() rnorm(1, mean = 5, sd = 1)) %>%
release("server", 1)
# Attach a generator that creates customers with inter-arrival times of about 10 time units
sim_env %>% add_generator("customer", customer, function() rnorm(1, mean = 10, sd = 2))
In this example, entities (customers) arrive at the simulation and go through a process of seizing a server,
being served for a random amount of time, and then releasing the server.
Step 4: Run the Simulation
Now, you can run the simulation for a specified duration or until a particular condition is met.
sim_env %>% run(until = 100)
This command runs the simulation until a simulation time of 100 units.

Step 5: Collect and Analyze Data


During or after the simulation, you can collect data to analyze the results. For example, you might want to
calculate statistics related to system performance.
# Collect per-customer arrival data from the simulation
data <- get_mon_arrivals(sim_env)

# Analyze the data (e.g., calculate mean waiting time: time in system minus service time)
mean_wait_time <- mean(data$end_time - data$start_time - data$activity_time)
Step 6: Visualization and Reporting

Create plots and reports to visualize and communicate the simulation results effectively, using R's plotting
libraries such as ggplot2, plotly, or lattice.

Step 7: Iterate and Refine


Simulations often require multiple iterations to explore different scenarios or parameter values. Adjust the
model and rerun the simulation as needed to gain insights.

Simulation modeling in R can be as simple or as complex as your specific problem requires. You can build
more elaborate models with intricate logic and multiple entities interacting in different ways. The simmer
package, along with R's data manipulation and visualization capabilities, provides a flexible platform for
simulation modeling and analysis.
Example
install.packages("simmer")
library(simmer)
# Create a simulation environment
sim_env <- simmer()
# Define entities (e.g., customers)
sim_env %>% add_generator("customer", at(0), function() rnorm(1, mean = 10, sd = 2))
# Define resources (e.g., servers)
sim_env %>% add_resource("server", capacity = 2)
# Define processes (e.g., customer arrival and service)
sim_env %>% add_trajectory("customer_process") %>%
seize("server", 1) %>%
timeout(function() rnorm(1, mean = 5, sd = 1)) %>%
release("server", 1)
sim_env %>% run(until = 100)
# Collect data from the simulation
data <- get_mon_arrivals(sim_env)
# Analyze the data (e.g., calculate mean waiting time)
mean_wait_time <- mean(data$waiting_time)


Introduction to optimization

Optimization in R refers to the process of finding the best solution to a problem by systematically searching
for the optimal values of one or more variables within specified constraints. This can be applied to a wide
range of problems, from mathematical and statistical modeling to machine learning and data analysis. R
provides a variety of tools and packages to perform optimization tasks efficiently.
Types of optimization:
Unconstrained Optimization: This involves finding the minimum or maximum of a function without any
constraints on the variables.
In certain cases the variables can be selected freely within their full range. The optim() function in R
can be used for one-dimensional or n-dimensional problems. The general format of the optim() function is:
optim(par, fn, gr = NULL, ..., method = "Nelder-Mead", lower = -Inf, upper = Inf, control = list(), hessian = FALSE)
where par is the vector of starting values and fn is the objective function to be minimized.
We start off with an example. First, let's define the objective function that we want to minimize:
> f <- function(x) 4 * (x[1]-1)^2 + 7 * (x[2]-3)^2 + 30
>f
function(x) 4 * (x[1]-1)^2 + 7 * (x[2]-3)^2 + 30
Next we set the starting values for the parameters (note that the variable name c shadows R's built-in c() function, so a name such as start is generally safer):
> c <- c(1, 1)
>c
[1] 1 1
The optimization function is invoked
> r <- optim(c, f)
>r
$par
[1] 0.9999207 3.0001660
$value
[1] 30

$counts
function gradient
69 NA
$convergence
[1] 0
$message
NULL
Next we check if the optimization converged to a minimum or not. The easy way to do this is to check if
> r$convergence == 0
[1] TRUE
The optimization has converged to a minimum. The parameter values at the minimum can be obtained by
> r$par
[1] 0.9999207 3.0001660
The value of objective function at the minimum is obtained by
> r$value
[1] 30

Constrained Optimization: Here, optimization is performed subject to certain constraints on the variables.
Constraints can be equality constraints (e.g., linear equations) or inequality constraints (e.g., bounds on
variables).
Optimization Packages in R:
R offers several packages for optimization, with some of the most commonly used ones being:
➢ optim(): A built-in optimization function for general-purpose optimization.
➢ nloptr: A package for nonlinear optimization with a wide range of algorithms and support for both
constrained and unconstrained problems.
➢ lpSolve: Useful for linear programming problems.
➢ quadprog: Designed for quadratic programming problems.
➢ DEoptim, GA, GenSA: Packages for global optimization using differential evolution, genetic
algorithms, and simulated annealing, respectively.
Using optim() for Unconstrained Optimization:
The optim() function is a versatile optimizer in R that can handle unconstrained optimization problems.
It takes the objective function to be minimized (or maximized) as an argument and allows you to specify
starting values for optimization.
Example :
# Define the objective function
objective_function <- function(x) {
return(x^2 + 2*x + 1)
}
# Perform optimization
result <- optim(par = 0, fn = objective_function, method = "BFGS")
print(result)
Using Optimization Packages for Constrained Optimization:
For constrained optimization problems, you may need to use specialized packages like nloptr.
You need to define your objective function and constraints, and then use the appropriate functions from the
package to perform optimization.
Example usage with nloptr for a simple constrained optimization problem
library(nloptr)
# Define the objective function
objective_function <- function(x) {
return(x[1]^2 + x[2]^2)
}
# Define the inequality constraint in the form g(x) <= 0 (here x1 + x2 - 1 <= 0)
constraint_function <- function(x) {
return(x[1] + x[2] - 1)
}
# Perform constrained optimization (COBYLA handles inequality constraints without gradients)
result <- nloptr(x0 = c(0, 0), eval_f = objective_function, eval_g_ineq = constraint_function,
opts = list(algorithm = "NLOPT_LN_COBYLA", xtol_rel = 1e-6))
print(result)
Visualization: Visualization can be a valuable tool to understand the optimization process. You can create
plots to visualize the objective function, constraints, and the path taken by the optimizer.
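As a minimal sketch (assuming the two-variable objective and constraint from the nloptr example above), you could draw the contours of the objective and overlay the constraint boundary:

x1 <- seq(-1, 2, length.out = 100)
x2 <- seq(-1, 2, length.out = 100)
z <- outer(x1, x2, function(a, b) a^2 + b^2)   # objective values on a grid
contour(x1, x2, z, xlab = "x1", ylab = "x2")   # contours of the objective
abline(a = 1, b = -1, col = "red", lty = 2)    # constraint boundary x1 + x2 = 1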

Performance Considerations: Optimization can be computationally expensive, especially for complex


problems or large datasets. Consider the choice of optimization algorithm and fine-tuning its parameters for
better performance.

Optimization in R is a powerful tool for solving a wide range of problems, and the choice of optimization
method depends on the specific problem you are trying to address. Familiarity with R's optimization
packages and techniques is essential for data scientists, statisticians, and researchers working on
mathematical modeling and machine learning tasks.

Methods in Optimization

In R, there are various optimization methods and algorithms available for solving different types of
optimization problems. These methods can be broadly categorized into unconstrained optimization methods
and constrained optimization methods. Here's an overview of some commonly used optimization methods
in R:
Unconstrained Optimization Methods:
Gradient-based methods: optim() with method = "BFGS" (a quasi-Newton method) or method = "CG" (conjugate gradient) can be used
for gradient-based optimization.
These methods are suitable for smooth, differentiable functions.
The variant "L-BFGS-B" additionally supports box bounds on the variables (see the sketch after this list).
Nelder-Mead Simplex:
optim() with method = "Nelder-Mead" is a derivative-free optimization method.
It works well for functions with noisy or discontinuous derivatives.
However, it may converge slowly for high-dimensional problems.
Simulated Annealing:
Simulated annealing can be implemented using the GenSA package.
It is useful for global optimization and is often used when the objective function has multiple local minima.
Genetic Algorithms:
Genetic algorithms can be implemented using the GA package.
They are suitable for optimization problems with a large solution space and no clear gradient information.
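A minimal sketch of the box-bounded variant mentioned above (the objective and bounds here are illustrative assumptions):

# Minimize (x1 - 2)^2 + (x2 - 3)^2 subject to box bounds 0 <= x1, x2 <= 2.5
f <- function(x) (x[1] - 2)^2 + (x[2] - 3)^2
res <- optim(par = c(0, 0), fn = f, method = "L-BFGS-B",
lower = c(0, 0), upper = c(2.5, 2.5))
res$par    # x2 is pushed against its upper bound of 2.5
res$value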

Constrained Optimization Methods:


Sequential Quadratic Programming (SQP):
SQP methods solve a constrained problem through a sequence of quadratic programming subproblems; the quadprog package provides a solver for such quadratic programs.
These methods are suitable for smooth, convex constrained optimization problems.
Nonlinear Programming (NLP):
The nloptr package offers various optimization algorithms for constrained optimization.
It supports both local and global optimization with inequality and equality constraints.
Linear Programming (LP):
Linear programming problems can be solved using the lpSolve package.
LP is used when both the objective function and constraints are linear.
Mixed-Integer Linear Programming (MILP):
The lpSolve package also supports mixed-integer linear programming.
MILP is used when some variables are constrained to be integers.
Constraint Handling in Genetic Algorithms:

Genetic algorithms in the GA package can handle constraints using penalty functions or other techniques.
Bayesian Optimization:
The DiceOptim package provides Bayesian optimization methods, which are useful for optimizing
expensive and noisy functions.
Surrogate Optimization:
Surrogate optimization methods approximate an expensive objective function with a cheaper surrogate model
(for example, the Kriging models used by DiceOptim), making them suitable for problems where each function evaluation is costly.
Interior-Point Methods:

Interior-point methods can be used for large-scale linear and nonlinear programming problems. Packages
like ROI provide interfaces to interior-point solvers.
When choosing an optimization method in R, consider the nature of your objective function (e.g., smooth,
non-smooth, convex, non-convex), the presence of constraints, the dimensionality of the problem, and
computational resources available. It's often a good practice to experiment with different methods and
algorithms to find the one that works best for your specific optimization problem. Additionally, fine-tuning
parameters and incorporating problem-specific knowledge can often lead to better optimization results.
Linear Programming

Linear programming is a mathematical method that is used to determine the best possible outcome or
solution from a given set of parameters or list of requirements, which are represented in the form of linear
relationships. It is most often used in computer modeling or simulation in order to find the best solution in
allocating finite resources such as money, energy, manpower, machine resources, time, space and many
other variables. In most cases, the "best outcome" needed from linear programming is maximum profit or
lowest cost.
An example of a LP problem is -
Maximize or Minimize objective function: f(y1, y2) = g1.y1 + g2.y2
Subjected to inequality constraints:
g11.y1 + g12.y2 <= p1
g21.y1 + g22.y2 <= p2
g31.y1 + g32.y2 <= p3
y1 >= 0, y2 >=0
Step 1: Install and Load the lpSolve Package
If you haven't already installed the lpSolve package, you can do so using the following command:
install.packages("lpSolve")
Once installed, load the package:
library(lpSolve)
Step 2: Define the Objective Function and Constraints
A linear programming problem typically consists of an objective function to maximize or minimize
and a set of linear constraints. The objective function and constraints should be defined in a specific format
suitable for lpSolve.

Consider the problem: maximize Z = 5x + 3y subject to 2x + y <= 10, x + 3y <= 12, and x, y >= 0. In R, you can define this problem as follows:
# Objective coefficients
obj_coeffs <- c(5, 3)
# Coefficients matrix for constraints
constr_matrix <- matrix(c(2, 1, 1, 3), nrow = 2, byrow = TRUE)
# Right-hand side of constraints
rhs <- c(10, 12)
# Direction of inequalities (less than or equal)
direction <- c("<=", "<=")
# lpSolve assumes all decision variables are non-negative by default,
# so no explicit bounds need to be supplied

Step 3: Solve the Linear Programming Problem


Now that you have defined the objective function and constraints, you can use the lp() function from the
lpSolve package to solve the linear programming problem:
# Solve the linear programming problem
lp_result <- lp("max", obj_coeffs, constr_matrix, direction, rhs)
# Print the results
print(lp_result)
The lp() function takes several arguments:

➢ "max" indicates that you want to maximize the objective function. You can use "min" for
minimization.
➢ obj_coeffs is the vector of coefficients in the objective function.
➢ constr_matrix is the coefficients matrix for the constraints.
➢ direction specifies the direction of inequalities.
➢ rhs is the right-hand side of the constraints.
➢ By default, lp() treats all decision variables as non-negative; integer variables can be requested with the int.vec or all.int arguments.
Step 4: Interpret the Results
The lp() function returns a list with information about the optimization results. You can access the optimal
objective value and variable values using the $ operator:
# Optimal objective value
optimal_value <- lp_result$objval
cat("Optimal Value (Z):", optimal_value, "\n")
# Optimal variable values
optimal_variables <- lp_result$solution
cat("Optimal Variables (x, y):", optimal_variables, "\n")
This will give you the optimal value of the objective function and the values of the decision variables that
achieve this optimal value.
Example 1
A company wants to maximize the profit for two products A and B which are sold at $ 25 and $ 20
respectively. There are 1800 resource units available every day and product A requires 20 units while B
requires 12 units. Both products require a production time of 4 minutes, and the total available working time
is 8 hours per day. What should be the production quantity for each product to maximize profit?

An LP problem can either be a maximization or a minimization problem. The above problem is a
maximization problem. The steps to follow while defining an LP problem are:
Solution
➢ Identify the decision variables
➢ Write the objective function
➢ Mention the constraints
➢ Explicitly state the non-negativity restriction
As already defined, this is a maximization problem, first we define the objective function.
max(Sales) = max(25 y1 + 20 y2)
where,
y1 is the units of Product A produced
y2 is the units of Product B produced
y1 and y2 are called the decision variables
25 and 20 are the selling price of the products
We are trying to maximize the sales while finding out the optimum number of products to manufacture.
Now we set the constraints for this particular LP problem. We are dealing with both resource and time
constraints.
20y1 + 12 y2 <= 1800 (Resource Constraint)
4y1 + 4y2 <= 8*60 (Time constraint)
There are two ways to solve an LP problem
➢ Graphical Method
➢ Simplex Method
We will solve this problem using the simplex method, but in R. (The same problem can also be set up in a
spreadsheet tool such as Excel's Solver.) There are a couple of packages in R to solve LP problems:
➢ lpSolve
➢ lpSolveAPI (a short sketch with this package appears at the end of this example)
Implementation in R using lpSolve
Let's use lpSolve for this problem. First we need to set the objective function, which has already been
defined.
> require(lpSolve)
Loading required package: lpSolve
> objective.in <- c(25, 20)
> objective.in
[1] 25 20
Next we create a 2 by 2 matrix of constraint coefficients, and then set the constraint limits.

> const <- matrix(c(20, 12, 4, 4), nrow=2, byrow=TRUE)


> const
[,1] [,2]
[1,] 20 12
[2,] 4 4
> time_constraints <- (8*60)
> resource_constraints <- 1800
> time_constraints
[1] 480
> resource_constraints
[1] 1800
Now we are basically creating the equations that we have already defined by setting the rhs and the
direction of the constraints.
> rhs <- c(resource_constraints, time_constraints)
> rhs
[1] 1800 480
> direction <- c("<=", "<=")
> direction
[1] "<=" "<="
The final step is to find the optimal solution.
The syntax for the lpSolve package is:
lp(direction, objective.in, const.mat, const.dir, const.rhs)

> optimum <- lp(direction="max", objective.in, const, direction, rhs)


> optimum
Success: the objective function is 2625
> summary(optimum)
Length Class Mode
direction 1 -none- numeric
x.count 1 -none- numeric
objective 2 -none- numeric
const.count 1 -none- numeric
constraints 8 -none- numeric
int.count 1 -none- numeric
int.vec 1 -none- numeric
bin.count 1 -none- numeric
binary.vec 1 -none- numeric
num.bin.solns 1 -none- numeric
objval 1 -none- numeric
solution 2 -none- numeric
presolve 1 -none- numeric
compute.sens 1 -none- numeric
sens.coef.from 1 -none- numeric
sens.coef.to 1 -none- numeric
duals 1 -none- numeric
duals.from 1 -none- numeric
duals.to 1 -none- numeric
scale 1 -none- numeric
use.dense 1 -none- numeric
dense.col 1 -none- numeric
dense.val 1 -none- numeric
dense.const.nrow 1 -none- numeric
dense.ctr 1 -none- numeric
use.rw 1 -none- numeric
tmp 1 -none- character
status 1 -none- numeric

Now we get the optimum values for y1 and y2, i.e. the number of units of product A and product B that should be
manufactured.

> optimum$solution
[1] 45 75

The maximum sales figure is -

> optimum$objval
[1] 2625
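The same problem can also be set up with the lpSolveAPI package mentioned above; a minimal sketch, building the model row by row, might look like this:

library(lpSolveAPI)
lprec <- make.lp(nrow = 0, ncol = 2)          # model with 2 decision variables (y1, y2)
lp.control(lprec, sense = "max")              # maximize the objective
set.objfn(lprec, c(25, 20))
add.constraint(lprec, c(20, 12), "<=", 1800)  # resource constraint
add.constraint(lprec, c(4, 4), "<=", 480)     # time constraint
solve(lprec)                                  # returns 0 on success
get.objective(lprec)                          # maximum sales, 2625
get.variables(lprec)                          # optimal production quantities, 45 and 75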

Integer Programming
Integer programming is an area of constrained optimization that involves optimizing a linear program (LP)
with the additional constraint that some or all variables are restricted to nonnegative integers. Integer
programming algorithms solve such problems by using LP relaxation, heuristic methods and k-opt
heuristics.
LP relaxation is a technique used in integer programming where one relaxes the integrality
condition on each variable so as to convert it into a standard LP problem which can be solved quickly.
Heuristic methods used for solving IPs include rounding off variables to the nearest integer, local search
techniques and branch & bound algorithms. The k-opt heuristic is another approach based on randomly
selecting points from the solution space while considering their objective value before making decisions
regarding their inclusion or exclusion within the final optimal solution.
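To make the idea of LP relaxation concrete, here is a minimal sketch with lpSolve, using the small problem solved later in this section (maximize 3x + 2y subject to 2x + y <= 8 and x + 2y <= 6):

library(lpSolve)
obj <- c(3, 2)
A <- matrix(c(2, 1, 1, 2), nrow = 2, byrow = TRUE)
relaxed <- lp("max", obj, A, c("<=", "<="), c(8, 6))                 # LP relaxation: variables may be fractional
integer <- lp("max", obj, A, c("<=", "<="), c(8, 6), all.int = TRUE) # integrality enforced
relaxed$solution   # fractional optimum (x = 10/3, y = 4/3), objective about 12.67
integer$solution   # integer optimum (x = 4, y = 0), objective 12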
The process of formulating these types of problems requires understanding how different constraints
interact together in order to achieve an overall goal; this could be compared to creating a bumper sticker out
of words as they must fit together perfectly in order to create something meaningful and useful. To do this
effectively, every decision made should contribute towards minimizing or maximizing an objective
function subject to various restrictions given by constraints in order for them to satisfy certain criteria.
Nonnegative integers are used when fractional values are not meaningful for the decision (for example, you
cannot manufacture half a unit), even though enforcing integrality generally makes the problem harder to solve than its LP relaxation.
Integer programming is a mathematical optimization technique used to solve problems where some or all of
the decision variables are required to be integers. You can perform integer programming in R using various
optimization libraries and packages. One popular package for this purpose is "lpSolve."
1.Install and load the lpSolve package:
install.packages("lpSolve")
library(lpSolve)
2.Define your objective function and constraints:
You need to specify the objective function you want to optimize and any constraints that must be satisfied.
For example, suppose you want to maximize the following linear objective function:
Maximize: 3x + 2y
Subject to the following constraints:
2x + y <= 8
x + 2y <= 6
x, y >= 0 (and integers)
You can represent this in R as follows:
# Coefficients of the objective function
obj_coef <- c(3, 2)
# Coefficients of the constraint matrix
A <- matrix(c(2, 1, 1, 2), nrow = 2, byrow = TRUE)
# Right-hand side of the constraints
rhs <- c(8, 6)
# Constraint types (<=)
con_types <- c("<=", "<=")
# Create and solve the integer programme; all.int = TRUE forces every
# decision variable to be an integer (use int.vec to restrict only some of them)
lp_solve_result <- lp("max", obj_coef, A, con_types, rhs, all.int = TRUE)
# Print the results
print(lp_solve_result)
print(lp_solve_result$solution)
3.Interpret the results:
The lp() function returns the optimal solution, including the values of the decision variables
(lp_solve_result$solution) and the objective function value (lp_solve_result$objval). Because all.int = TRUE
was specified, lpSolve will find an integer solution if one exists.
EXAMPLE: Solving an Integer Programming Problem
Let's consider a simple integer programming problem: maximize 4x + 3y, subject to the constraints 2x + y
≤ 14, x + 2y ≤ 16, and x, y ≥ 0, where x and y are integers.
First, we need to formulate this problem in R using the lpSolve package:
Solution
library(lpSolve)
# Define the objective function
f.obj <- c(4, 3)
# Define the matrix of left-hand side coefficients
f.con <- matrix(c(2, 1, 1, 2), nrow=2, byrow=TRUE)
# Define the right-hand side coefficients
f.rhs <- c(14, 16)
# Define the direction of the constraints
f.dir <- c("<=", "<=")
# Solve the problem
solution <- lp("max", f.obj, f.con, f.dir, f.rhs, all.int=TRUE)
# Print the solution
print(solution$solution)
Enforcing Integrality Restrictions on Decision Variables
Enforce integrality restrictions on decision variables by telling the solver, when defining your optimization
problem, which variables must take integer values. Packages expose this in different ways: lpSolve uses the
int.vec and all.int arguments, while interfaces such as ROI use a vector of variable types. Here's how to
enforce integrality restrictions using the lpSolve package as an example:
1.Install and load the lpSolve package if you haven't already:
install.packages("lpSolve")
library(lpSolve)
2.Define your objective function, constraints, and variable types:
Let's use the same example as before, where we want to maximize the objective function 3x + 2y subject to
constraints, but this time we enforce that both x and y should be integers.
# Coefficients of the objective function
obj_coef <- c(3, 2)
# Coefficients of the constraint matrix
A <- matrix(c(2, 1, 1, 2), nrow = 2, byrow = TRUE)
# Right-hand side of the constraints
rhs <- c(8, 6)
# Constraint types (<=)
con_types <- c("<=", "<=")
# Indices of the decision variables that must be integers (here both x and y)
int_vars <- c(1, 2)
# Create and solve the LP with integrality enforced on those variables
lp_solve_result <- lp("max", obj_coef, A, con_types, rhs, int.vec = int_vars)
# Print the results
print(lp_solve_result)
print(lp_solve_result$solution)

By setting int.vec = c(1, 2) in the lp() function (or all.int = TRUE when every variable must be an integer),
you enforce that both x and y take integer values. The solver will find an optimal solution that satisfies this
integrality requirement.
Make sure to adapt the code and list the appropriate integer-variable indices for your specific integer
programming problem; any variables not listed in int.vec remain continuous.
You can also enforce integrality restrictions when solving optimization problems with other
optimization packages. I'll provide an example using the ROI (R Optimization Interface) package, which
provides a unified framework for solving optimization problems and allows you to specify integrality
restrictions on decision variables.
Here's how you can enforce integrality restrictions on decision variables using the ROI package:
1.Install and load the ROI package:
install.packages("ROI")
library(ROI)
2.Define your optimization problem:
Suppose you want to solve the following integer programming problem:
Maximize: 3x + 2y
Subject to:
2x + y <= 8
x + 2y <= 6
x, y >= 0 (integer)
You can define this problem in R as follows:
# Load ROI and a MILP-capable solver plugin (GLPK here; other plugins also work)
library(ROI)
library(ROI.plugin.glpk)
# Define the optimization problem: linear objective, linear constraints and variable types.
# types = c("I", "I") marks both decision variables as integer; ROI treats
# variables as non-negative unless other bounds are supplied.
op <- OP(
objective = L_objective(c(3, 2)),
constraints = L_constraint(L = rbind(c(2, 1), c(1, 2)),
dir = c("<=", "<="),
rhs = c(8, 6)),
types = c("I", "I"),
maximum = TRUE
)
# Solve the optimization problem
result <- ROI_solve(op, solver = "glpk")
# Extract and print the solution
print(solution(result))
In this code, we use the ROI.plugin.glpk package to interface with the GLPK solver. You can choose a
different solver plugin (for example, one based on lp_solve) depending on what you have installed.

The types = c("I", "I") argument in OP() enforces integrality restrictions on the decision variables x and y.
When you solve the optimization problem, the ROI package will search for an integer solution that satisfies
the constraints.
Nonlinear Optimization Models
Nonlinear optimization is a powerful technique used to find the optimal solution to problems where
the objective function or constraints are nonlinear. R is a versatile programming language and environment
for statistical computing and data analysis, and it offers several packages and tools for solving nonlinear
optimization problems. Two popular packages for nonlinear optimization in R are optim and nloptr.
1.optim Package:
The optim function is a built-in optimization function in R that can be used to solve nonlinear optimization
problems. It can handle both unconstrained and constrained optimization problems.
# Example: Solve a simple unconstrained optimization problem
# Define the objective function
objective_function <- function(x)
{
return((x[1] - 2)^2 + (x[2] - 3)^2)
}
# Initial guess
initial_guess <- c(0, 0)
# Call the optim function to find the minimum
result <- optim(initial_guess, fn = objective_function, method = "BFGS")
# Print the result
print(result)
Here we defined an objective function and used the optim() function with the "BFGS" method to find the
minimum of the function.
2.nloptr Package:
The nloptr package provides more advanced tools for nonlinear optimization, including a wide range of
optimization algorithms and the ability to handle both constrained and unconstrained problems.
# Install and load the nloptr package if not already installed
# install.packages("nloptr")
library(nloptr)
# Example: Solve a simple unconstrained optimization problem
# Define the objective function
objective_function <- function(x)
{
return((x[1] - 2)^2 + (x[2] - 3)^2)
}
# Initial guess
initial_guess <- c(0, 0)
# Define and solve the optimization problem in one call to nloptr()
result <- nloptr(x0 = initial_guess,
eval_f = objective_function,
opts = list(algorithm = "NLOPT_LN_NELDERMEAD",
xtol_rel = 1e-8))
# Print the result
print(result)
Here we used the nloptr package with the derivative-free Nelder-Mead algorithm to find the minimum.

These are just basic examples, and both optim and nloptr offer a wide range of options and algorithms for
solving more complex nonlinear optimization problems. Depending on your specific problem and
requirements, you may need to explore the documentation and choose the appropriate algorithm and
options to achieve the best results.
Install and Load the Package:
First, you need to install the nloptr package if you haven't already. You can do this with install.packages():
install.packages("nloptr")
library(nloptr)
Define the Objective Function:
Create an R function that represents your objective function. This function should take a numeric vector as
input (the optimization parameters) and return a scalar value that you want to minimize or maximize.
objective_function <- function(x)
{
# Compute the value to be minimized
# Replace this with your actual objective function
return(x[1]^2 + x[2]^2)
}
Define Constraints (if any):
If your optimization problem has constraints, define them as functions that return a numeric value (or vector
of values); nloptr uses the convention that an inequality constraint is satisfied when g(x) <= 0.
constraint_function <- function(x)
{
# Constraint x1 + x2 <= 1, written in the form g(x) <= 0
return(x[1] + x[2] - 1)
}
Set Optimization Parameters:
Create a list specifying the optimization parameters, including the objective function, constraints (if any),
and optimization algorithm options.
opt <- list(
"algorithm" = "NLOPT_LN_COBYLA", # Optimization algorithm
"xtol_rel" = 1e-6, # Relative tolerance for stopping
"maxeval" = 1000 # Maximum number of function evaluations
)
You can choose from various optimization algorithms provided by nloptr, such as COBYLA, SLSQP, etc.
Solve the Optimization Problem:
Use the nloptr function to solve the optimization problem:
result <- nloptr(
x0 = c(0, 0), # Initial guess for the optimization parameters
eval_f = objective_function, # Objective function to minimize
lb = c(-Inf, -Inf), # Lower bounds for parameters (if any)
ub = c(Inf, Inf), # Upper bounds for parameters (if any)
eval_g_ineq = constraint_function, # Inequality constraint(s) g(x) <= 0 (if any)
opts = opt # Optimization parameters
)
Retrieve Results:
You can access the optimized parameters and the optimal objective function value from the result object:
optimized_parameters <- result$solution
optimal_value <- result$objective
Forecasting Analytics
Forecasting analytics in R involves using statistical and machine learning techniques to make
predictions about future values based on historical data. R offers a wide range of packages and tools for
time series forecasting and predictive analytics.
Install and Load Relevant Packages:
Depending on your specific forecasting needs, you may need to install and load different R
packages. Common packages for time series forecasting and predictive analytics include forecast, prophet,
tsibble, tidyverse, and caret. Install and load the required packages

install.packages("forecast")
install.packages("prophet")
install.packages("tsibble")
install.packages("tidyverse")
install.packages("caret")

library(forecast)
library(prophet)
library(tsibble)
library(tidyverse)
library(caret)
Data Preparation:
Load and preprocess your historical data. Ensure that your data is in a suitable format for time series
analysis. Common steps include converting data to a time series object (e.g., using ts() or tsibble), handling
missing values, and ensuring a consistent time interval between observations.
Exploratory Data Analysis (EDA):
Conduct exploratory data analysis to understand the patterns and characteristics of your data. Visualize
your time series data using plots and summary statistics.
Time Series Decomposition:
Decompose your time series data into its constituent components, which typically include trend,
seasonality, and noise. You can use functions like decompose() or stl() for decomposition.
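For example, a minimal sketch with the built-in AirPassengers series (monthly data, so frequency = 12):

ap_log <- log(AirPassengers)                  # log transform to stabilise the variance
decomp <- stl(ap_log, s.window = "periodic")  # seasonal-trend decomposition using loess
plot(decomp)                                  # trend, seasonal and remainder components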
Model Selection:
Choose an appropriate forecasting model based on the characteristics of your data. Common models
include:
➢ Exponential Smoothing Models: Fit models like ETS (Error, Trend, Seasonality), which
can handle different combinations of these components.
➢ ARIMA Models: Use the auto.arima() function to automatically select the best ARIMA
model for your data.
➢ Prophet: For forecasting with seasonality and holidays, you can use the prophet package,
which simplifies the modeling process.
➢ Machine Learning Models: Explore machine learning algorithms like Random Forest,
Gradient Boosting, or neural networks if your data is complex or nonlinear.
Model Training:
Train your selected forecasting model on historical data. Use functions like forecast(), prophet(), or
train() (from the caret package) to fit the model.

Model Evaluation:
Evaluate the performance of your forecasting model using appropriate metrics such as Mean
Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), or others
depending on your specific problem.
Forecasting:
Use your trained model to make future forecasts. Depending on the chosen package and model, you may
use functions like forecast(), predict(), or the prophet package's functionality.
Visualize and Communicate Results:
Visualize your forecasts alongside historical data to communicate your findings effectively. You can use
autoplot() from the forecast package for quick visualizations.
Fine-Tuning and Monitoring:
Continuously monitor the performance of your forecasting model and consider fine-tuning or updating it as
new data becomes available.
Remember that the choice of the forecasting method and model selection should be guided by the
nature of your data and your specific forecasting goals. Different techniques may be more appropriate for
different types of time series data (e.g., univariate, multivariate, with or without seasonality).
Methods and Quantitative Approaches of Forecasting
The appropriate forecasting methods depend largely on what data are available.
If there are no data available, or if the data available are not relevant to the forecasts, then qualitative
forecasting methods must be used. These methods are not purely guesswork—there are well-developed
structured approaches to obtaining good forecasts without using historical data. These methods are
not covered further here.
Quantitative forecasting can be applied when two conditions are satisfied:
1. numerical information about the past is available;
2. it is reasonable to assume that some aspects of the past patterns will continue into the future.
There is a wide range of quantitative forecasting methods, often developed within specific disciplines for
specific purposes. Each method has its own properties, accuracies, and costs that must be considered when
choosing a specific method.
Most quantitative prediction problems use either time series data (collected at regular intervals over
time) or cross-sectional data (collected at a single point in time). Here we are concerned with
forecasting future data, and we concentrate on the time series domain.
Time series forecasting
Examples of time series data include:
➢ Daily IBM stock prices
➢ Monthly rainfall
➢ Quarterly sales results for Amazon
➢ Annual Google profits
Anything that is observed sequentially over time is a time series. Here, we will only consider
time series that are observed at regular intervals of time (e.g., hourly, daily, weekly, monthly, quarterly,
annually). Irregularly spaced time series can also occur, but are beyond the scope of this unit.

When forecasting time series data, the aim is to estimate how the sequence of observations will
continue into the future. Figure 1.1 shows the quarterly Australian beer production from 1992 to the second
quarter of 2010.

The blue lines show forecasts for the next two years. Notice how the forecasts have captured the seasonal
pattern seen in the historical data and replicated it for the next two years. The dark shaded region shows

80% prediction intervals. That is, each future value is expected to lie in the dark shaded region with a
probability of 80%. The light shaded region shows 95% prediction intervals. These prediction intervals are
a useful way of displaying the uncertainty in forecasts. In this case the forecasts are expected to be accurate,
and hence the prediction intervals are quite narrow.
The simplest time series forecasting methods use only information on the variable to be forecast, and make
no attempt to discover the factors that affect its behaviour. Therefore they will extrapolate trend and
seasonal patterns, but they ignore all other information such as marketing initiatives, competitor activity,
changes in economic conditions, and so on.
Time series models used for forecasting include decomposition models, exponential smoothing models and
ARIMA models; exponential smoothing (ETS) and ARIMA appear again among the methods listed later in this section.
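As a minimal sketch of such pure time series methods, using the built-in AirPassengers series:

library(forecast)
fit_snaive <- snaive(AirPassengers, h = 24)   # seasonal naive: repeat last year's pattern
fit_ets <- ets(AirPassengers)                 # exponential smoothing (ETS) model
autoplot(forecast(fit_ets, h = 24))           # point forecasts with prediction intervals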
Predictor variables and time series forecasting
Predictor variables are often useful in time series forecasting. For example, suppose we wish to forecast the
hourly electricity demand (ED) of a hot region during the summer period. A model with predictor variables
might be of the form
ED=f(current temperature, strength of economy, population,time of day, day of week, error).
The relationship is not exact — there will always be changes in electricity demand that cannot be accounted
for by the predictor variables. The “error” term on the right allows for random variation and the effects of
relevant variables that are not included in the model. We call this an explanatory model because it helps
explain what causes the variation in electricity demand.
Because the electricity demand data form a time series, we could also use a time series model for
forecasting. In this case, a suitable time series forecasting equation is of the form
EDt+1=f(EDt,EDt−1,EDt−2,EDt−3,…,error),
where t is the present hour, t+1 is the next hour, t−1 is the previous hour, t−2 is two hours ago, and

so on. Here, prediction of the future is based on past values of a variable, but not on external variables which may
affect the system. Again, the “error” term on the right allows for random variation and the effects of relevant
variables that are not included in the model.
There is also a third type of model which combines the features of the above two models. For example, it might
be given by

EDt+1=f(EDt, current temperature, time of day, day of week, error).


These types of mixed models have been given various names in different disciplines. They are known as
dynamic regression models, panel data models, longitudinal models, transfer function models, and linear system

models (assuming that f is linear).
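A minimal sketch of such a mixed (dynamic regression) model with the forecast package, where demand, temperature and future_temperature are hypothetical, time-aligned series:

library(forecast)
fit <- auto.arima(demand, xreg = cbind(temp = temperature))   # regression with ARIMA errors
fc <- forecast(fit, xreg = cbind(temp = future_temperature))  # future predictor values are required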

An explanatory model is useful because it incorporates information about other variables, rather than only
historical values of the variable to be forecast. However, there are several reasons a forecaster might select a time
series model rather than an explanatory or mixed model. First, the system may not be understood, and even if it
was understood it may be extremely difficult to measure the relationships that are assumed to govern its behavior.
Second, it is necessary to know or forecast the future values of the various predictors in order to be able to
forecast the variable of interest, and this may be too difficult. Third, the main concern may be only to predict
what will happen, not to know why it happens. Finally, the time series model may give more accurate forecasts
than an explanatory or mixed model.
The model to be used in forecasting depends on the resources and data available, the accuracy of the competing
models, and the way in which the forecasting model is to be used.
Forecasting is a crucial task in various fields, and R is a popular programming language for time series
forecasting due to its extensive libraries and packages. There are several methods and quantitative
approaches for forecasting in R. Here are some common methods and packages you can use:
1.Exponential Smoothing:
The forecast package in R provides functions like ets() for Exponential Smoothing methods.
Example:
library(forecast)
fit <- ets(your_time_series_data)
forecasted_values <- forecast(fit, h = n) # Forecast the next 'n' periods
2.ARIMA (AutoRegressive Integrated Moving Average):
The forecast package also includes the auto.arima() function for automatic ARIMA model selection.
Example:
library(forecast)
fit <- auto.arima(your_time_series_data)
forecasted_values <- forecast(fit, h = n)
3.Prophet:
The prophet package is suitable for forecasting time series data with daily observations that may contain
missing data.
Example:
library(prophet)
df <- data.frame(ds = your_dates, y = your_values)
m <- prophet(df)
future <- make_future_dataframe(m, periods = n)  # extend the date range n periods ahead
forecast <- predict(m, future)
4.VAR (Vector Autoregression):
For multivariate time series forecasting, you can use the vars or tsDyn packages.
Example:
library(vars)
your_data <- ts(your_matrix, start = start_date, frequency = frequency)
var_model <- VAR(your_data, p = lag_order)
forecasted_values <- predict(var_model, n.ahead = n)
5.Neural Networks and Deep Learning:
The forecast package, as well as other packages like keras, can be used to implement neural networks for
time series forecasting.
Example using keras:
library(keras)
model <- keras_model_sequential()
# Build and compile your neural network model
model %>% compile(...)
# Fit the model to your time series data
model %>% fit(...)
# Forecast using the trained model
forecasted_values <- model %>% predict(...)
6.Machine Learning Methods:
You can use machine learning algorithms like Random Forest, XGBoost, and Support Vector Machines for
time series forecasting using the randomForest, xgboost, and e1071 packages, respectively.
7.State Space Models:
You can use the KFAS package for state space modeling and forecasting.
Example:
library(KFAS)
model <- SSModel(your_time_series_data, ...)
forecasted_values <- predict(model, n.ahead = n)  # KFS(model) gives filtered/smoothed states
8.Bayesian Time Series Forecasting:
The bsts (Bayesian Structural Time Series) package is useful for Bayesian time series modeling and
forecasting.
Example:
library(bsts)
model <- bsts(your_time_series_data, ...)
forecasted_values <- predict(model, horizon = n)
When performing time series forecasting in R, it's essential to preprocess your data, select an appropriate
method based on the characteristics of your data, and validate the model's performance using various
evaluation metrics. Additionally, make sure to update your R packages regularly, as new packages and
improvements are continually being developed.
Applied Forecasting Analytics Process

Applied forecasting analytics in R involves a systematic process that encompasses data preparation, model
selection, training, validation, and generating forecasts. Here's a step-by-step guide for conducting
forecasting analytics in R:

1.Data Preparation:
Import your time series data into R, which can be in various formats like CSV, Excel, or from a database.
Ensure the data is in a suitable format with a date-time index and numeric values.
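For example, a minimal sketch (the file name sales.csv and its columns date and sales are hypothetical):

library(tidyverse)
library(tsibble)
sales <- read_csv("sales.csv") %>%    # hypothetical file with columns date and sales
mutate(date = as.Date(date)) %>%      # make sure the index is a proper Date
as_tsibble(index = date)              # convert to a tsibble time series object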

2.Data Exploration and Visualization:


Explore your time series data to understand its patterns, trends, seasonality, and potential outliers.
Visualize the data using tools like ggplot2 or the base R plotting functions.

3.Data Preprocessing:
Handle missing values by imputation or interpolation.
Transform the data if needed (e.g., logarithmic transformation for stabilizing variance).
Decompose the time series into trend, seasonality, and residual components using methods like
decompose() or stl().

4.Model Selection:
Choose an appropriate forecasting method based on the characteristics of your data (e.g., Exponential
Smoothing, ARIMA, Prophet, etc.).
You can use automated methods like auto.arima() from the forecast package to help with model selection.

5.Model Training:
Fit the selected forecasting model to your data using the chosen R package.
Store the model parameters and any other necessary information.

6.Model Validation:
Split your data into training and testing sets to validate the model's accuracy.
Use appropriate metrics (e.g., Mean Absolute Error, Root Mean Squared Error) to assess the model's
performance.

7.Hyperparameter Tuning:
If your chosen model has hyperparameters (e.g., ARIMA order, Prophet parameters), perform
hyperparameter tuning to improve forecasting accuracy.
Techniques like grid search or optimization algorithms can be used for this purpose.

8.Cross-Validation (Optional):
Implement k-fold cross-validation to assess the model's generalization performance on multiple subsets of
your data.

9.Forecast Generation:
Apply your trained model to generate forecasts for future time periods.
Store the forecasted values in a suitable data structure.

10.Visualization and Reporting:


Visualize your forecasts alongside historical data.
Create reports or dashboards using packages like shiny or flexdashboard to share your findings with
stakeholders.
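Example (a forecast object can be plotted together with the historical data in a single call):
library(forecast)
autoplot(fc) +
  ggplot2::labs(title = "Forecast with 80% and 95% prediction intervals",
                x = "Date", y = "Value")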

11.Evaluation and Refinement:


Continuously monitor and evaluate the forecasting model's performance, and make adjustments as needed.
Refine the model or consider retraining with updated data.

12.Deployment (if applicable):


If your forecasting model is part of an automated system, deploy it to make real-time predictions.

13.Documentation:
Maintain comprehensive documentation of your forecasting process, including the data sources,
preprocessing steps, model details, and evaluation results.

14.Communication:
Communicate your findings and forecasts to stakeholders in a clear and understandable manner.
R offers a rich ecosystem of packages for time series forecasting and analytics, so you can choose the
appropriate method based on the specific characteristics of your data and the requirements of your
forecasting task.

Applications, Evaluating Forecast Accuracy

Evaluating forecast accuracy is a critical step in assessing the quality of your forecasting models in R.
There are various methods and metrics available to measure forecast accuracy based on the nature of your
time series data and forecasting techniques. Here are some commonly used evaluation methods and R
packages for different forecasting applications:

1.Single-Series Forecast Accuracy:

For univariate time series, common metrics include Mean Absolute Error (MAE), Mean Squared Error
(MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). You can use
these metrics to evaluate forecasts generated by methods like ARIMA, Exponential Smoothing, and
Prophet.
library(forecast)
actual_values <- your_actual_time_series_data
forecasted_values <- your_forecasted_data

# Calculate MAE
mae <- mean(abs(actual_values - forecasted_values))

# Calculate RMSE
rmse <- sqrt(mean((actual_values - forecasted_values)^2))

# Calculate MAPE
mape <- mean(abs((actual_values - forecasted_values) / actual_values)) * 100
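When the forecast was produced with the forecast package, the same metrics (and more) are available in a single call. Example (assuming fc is a forecast object and test_set the held-out observations):
library(forecast)
accuracy(fc, test_set)    # returns ME, RMSE, MAE, MPE, MAPE, MASE in one table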
2.Multiple-Series Forecast Accuracy:
When dealing with multiple time series, you can calculate metrics like Weighted MAE or Weighted RMSE to account for varying importance across series. The Mcomp package provides the M-competition benchmark datasets that are commonly used to compare methods over many series; per-series accuracy can be computed with accuracy() from the forecast package and then combined with weights.
library(forecast); library(Mcomp)        # Mcomp supplies the M1 and M3 competition data
series <- M3[[1]]                        # each element has $x (training) and $xx (test)
fc <- forecast(auto.arima(series$x), h = length(series$xx))
accuracy(fc, series$xx)
3.Time Series Cross-Validation:
To evaluate forecast accuracy using cross-validation, you can split your time series data into training and
testing sets for multiple time periods, calculate metrics for each fold, and compute the average error.
library(forecast)
folds <- your_data_split_into_folds
accuracy_metrics <- numeric(length(folds))

for (i in 1:length(folds)) {
train_data <- your_training_data_in_fold_i
test_data <- your_testing_data_in_fold_i
forecast_model <- your_forecasting_method(train_data)
forecasted_values <- forecast(forecast_model, h = length(test_data))
accuracy_metrics[i] <- your_chosen_accuracy_metric(test_data, forecasted_values$mean)  # compare to point forecasts
}
mean_accuracy <- mean(accuracy_metrics)
4.Probabilistic Forecasting:
If you are working with probabilistic forecasts, evaluation methods include the quantile (pinball) loss, the continuous ranked probability score (CRPS), and calibration checks based on the Probability Integral Transform (PIT). Specialized packages such as scoringRules implement proper scoring rules, and the quantile loss is simple to compute directly, as sketched below.
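A minimal sketch of the quantile (pinball) loss for a single quantile level, assuming actual_values is the observed series and forecasted_q90 a hypothetical 90% quantile forecast:
pinball_loss <- function(actual, q_forecast, tau) {
  diff <- actual - q_forecast
  mean(ifelse(diff >= 0, tau * diff, (tau - 1) * diff))
}
pinball_loss(actual_values, forecasted_q90, tau = 0.9)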
5.Classification-Based Forecast Accuracy:
In scenarios where forecasts are categorical (e.g., up, down, or stable), you can use classification metrics
like Accuracy, Precision, Recall, and F1-score.
library(caret)
confusion_matrix <- confusionMatrix(your_forecasts, your_actuals)
accuracy <- confusion_matrix$overall['Accuracy']
6.Evaluating Machine Learning Models:
When using machine learning models for time series forecasting, you can evaluate accuracy using
regression metrics (e.g., R-squared, Mean Absolute Error) or classification metrics if the forecast is
transformed into a classification problem.
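Example (with numeric vectors of observed and predicted values; the names are placeholders):
mae <- mean(abs(actual_values - predicted_values))
r_squared <- 1 - sum((actual_values - predicted_values)^2) /
  sum((actual_values - mean(actual_values))^2)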
7.Visualization:
Visualizing the actual vs. forecasted values through time series plots can provide valuable insights into
forecast accuracy.
library(ggplot2)
ggplot(data = your_data, aes(x = your_dates)) +
geom_line(aes(y = actual_values), color = "blue", linetype = "solid") +
geom_line(aes(y = forecasted_values), color = "red", linetype = "dashed") +
labs(title = "Actual vs. Forecasted", x = "Date", y = "Value")
Choosing the appropriate evaluation method depends on the specific characteristics of your data, the nature
of your forecasting problem, and the chosen forecasting method. It's common to use a combination of
multiple metrics to get a comprehensive view of forecast accuracy.
Survival Analysis – Introduction
Survival analysis, also known as time-to-event analysis, is a statistical technique used to analyze the time it
takes for an event of interest to occur. This event can be anything with a temporal component, such as the
time to failure of a machine, the time until a patient experiences a specific outcome, or the time until a
customer churns. In R, you can perform survival analysis using the survival package.
Here's a basic introduction to survival analysis in R:
1.Installing and Loading the survival Package:
If you haven't already, you can install the survival package from CRAN and load it into your R
environment:
install.packages("survival")
library(survival)
2.Survival Data Structure:
Survival analysis typically involves two primary variables:
Time to Event (time): The time it takes for an event to occur or a subject to experience the outcome of
interest.
Event Indicator (event): A binary variable that indicates whether the event occurred (event = 1) or not
(event = 0) at the given time.
You can create a Surv object, which is a special data structure used for survival analysis, with these
variables:
survival_data <- Surv(time, event)
3.Kaplan-Meier Estimator:
The Kaplan-Meier estimator is a non-parametric method for estimating the survival probability over time. It
can be used to create survival curves that show how the probability of survival changes over time.
# Creating a Kaplan-Meier survival curve
km_fit <- survfit(survival_data ~ 1) # ~ 1 indicates no covariates
plot(km_fit, main = "Kaplan-Meier Survival Curve")
4.Log-Rank Test:
The log-rank test is used to compare survival curves between two or more groups. It tests whether there is a
significant difference in survival probabilities between groups.
# Creating Kaplan-Meier survival curves for two or more groups
# (a single call with the grouping variable produces one curve per group)
km_fit_groups <- survfit(survival_data ~ group_variable)
plot(km_fit_groups)
# Performing the log-rank test
log_rank_test <- survdiff(survival_data ~ group_variable)
5.Cox Proportional-Hazards Model:
The Cox proportional-hazards model is a popular method for estimating the effect of multiple covariates on
survival. It assumes that the hazard ratio is constant over time.
# Fitting a Cox proportional-hazards model
cox_model <- coxph(Surv(time, event) ~ covariate1 + covariate2, data = your_data)
summary(cox_model)
6.Survival Plots:
You can create survival plots for specific groups or covariate values using the survfit function and then
visualize the results using the plot function.
# Creating and plotting survival curves by group
km_fit_groups <- survfit(survival_data ~ group_variable, data = your_data)
plot(km_fit_groups, main = "Survival Curves by Group")
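As a self-contained illustration, the lung dataset that ships with the survival package (where status is coded 1 = censored, 2 = dead, and sex is 1 = male, 2 = female) can be used end to end:
library(survival)
fit_by_sex <- survfit(Surv(time, status) ~ sex, data = lung)
plot(fit_by_sex, col = c("blue", "red"),
     xlab = "Days", ylab = "Survival probability")
survdiff(Surv(time, status) ~ sex, data = lung)    # log-rank test between the sexes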
This is a basic introduction to survival analysis in R. Depending on your specific research question and
dataset, you may need to explore more advanced techniques and visualization methods, especially when
dealing with complex survival data. The survival package and other related packages in R offer a wide
range of tools for conducting in-depth survival analysis.
Motivating Business Problems
R is a versatile programming language for addressing various business problems. Below are some
motivating business problems that can be solved using R:
1.Demand Forecasting:
Use time series forecasting techniques to predict future demand for products or services.
Optimize inventory management and production scheduling based on forecasts.
Apply methods like ARIMA, Exponential Smoothing, or machine learning models for forecasting in R.
2.Customer Churn Prediction:
Analyze customer data to identify factors leading to churn.
Build predictive models to forecast which customers are likely to churn.
Develop retention strategies and targeted marketing campaigns to reduce churn.
3.Market Basket Analysis:
Analyze transaction data to identify associations between products frequently purchased together.
Use association rule mining techniques to improve cross-selling and product recommendations.
4.Credit Risk Assessment:
Evaluate the creditworthiness of loan applicants or customers.
Build credit scoring models using techniques like logistic regression, decision trees, or gradient boosting in
R.
5.Employee Attrition Prediction:
Analyze HR data to identify factors contributing to employee turnover.
Build predictive models to forecast employee attrition and develop retention strategies.
6.Sales and Revenue Forecasting:
Predict future sales and revenue based on historical data.
Optimize pricing, marketing, and sales strategies using forecasts.
Apply regression analysis, time series analysis, or machine learning models for forecasting in R.
7.Social Media Sentiment Analysis:
Analyze social media data to understand customer sentiments and opinions.
Use text mining and natural language processing (NLP) techniques to gain insights from unstructured text
data.
8.Fraud Detection:
Detect fraudulent activities in financial transactions, insurance claims, or e-commerce transactions.
Utilize anomaly detection, clustering, and classification models to identify unusual patterns and behaviors.
9.Recommendation Systems:
Build personalized recommendation systems for e-commerce, content, or streaming platforms.
Use collaborative filtering, content-based filtering, or hybrid models to provide tailored recommendations.
10.Supply Chain Optimization:
Optimize supply chain operations to reduce costs and improve efficiency.
Use optimization algorithms, simulation, and predictive analytics to enhance supply chain management.
11.Market Segmentation:
Segment customers or markets to tailor marketing strategies.
Apply clustering techniques to group similar customers based on behavior and demographics.
12.A/B Testing and Experimentation:
Conduct A/B tests to assess the impact of changes in products, websites, or marketing campaigns.
Use statistical analysis and hypothesis testing in R to draw conclusions from experiments.
13.Time-to-Event Analysis (Survival Analysis):
➢ Analyze time-to-event data for business problems like customer retention, product lifetimes, or
equipment maintenance.
➢ Use survival analysis techniques to model and predict event durations.
R's rich ecosystem of packages, statistical functions, and data visualization capabilities makes it a valuable
tool for addressing a wide range of business problems. Whether it's data analysis, predictive modeling, or
decision support, R can be applied effectively to drive data-driven solutions in various industries.
Methods of Survival Analysis
Survival analysis in R involves various methods and techniques for analyzing time-to-event data. Here are
some of the most common methods and R packages used for survival analysis:
1.Kaplan-Meier Estimator:
The Kaplan-Meier estimator is used to estimate the survival function and create survival curves that show
how the probability of survival changes over time.
R Package: survival
library(survival)
survival_object <- Surv(time, event)
km_fit <- survfit(survival_object ~ 1)
2.Log-Rank Test:
The log-rank test is used to compare survival curves between different groups to determine if there is a
statistically significant difference in survival times.
R Package: survival
logrank_test <- survdiff(survival_object ~ group_variable)
3.Cox Proportional-Hazards Model:
The Cox proportional-hazards model is a popular method for analyzing the effect of covariates
(independent variables) on survival times. It estimates hazard ratios.
R Package: survival
cox_model <- coxph(Surv(time, event) ~ covariate1 + covariate2, data = your_data)
4.Accelerated Failure Time (AFT) Model:
The AFT model is a parametric alternative to the semi-parametric Cox model for analyzing the impact of covariates on survival times. It
models the log of survival time as a linear function of covariates.
R Package: survival
aft_model <- survreg(Surv(time, event) ~ covariate1 + covariate2, data = your_data)
5.Nelson-Aalen Estimator:
The Nelson-Aalen estimator is used to estimate the cumulative hazard function, which is related to the
survival function.
R Package: survival
na_fit <- survfit(Surv(time, event) ~ 1)
nelson_aalen <- na_fit$cumhaz    # the cumhaz component holds the Nelson-Aalen cumulative hazard
6.Parametric Survival Models:
Parametric models, such as Weibull, Exponential, and Log-Normal, assume specific distribution functions
for the survival times. These models can be used when the data distribution is known or can be assumed.
R Package: survival, flexsurv
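Example (Weibull and exponential models fitted with survreg by changing the dist argument; covariate and data names are placeholders):
library(survival)
weibull_fit <- survreg(Surv(time, event) ~ covariate1, data = your_data, dist = "weibull")
exp_fit <- survreg(Surv(time, event) ~ covariate1, data = your_data, dist = "exponential")
summary(weibull_fit)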
7.Time-Dependent Covariates:
You can analyze how the impact of covariates on survival times changes over time by including time-
dependent covariates in Cox models.
R Package: survival
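One common implementation is the counting-process (start, stop) data format, where each subject contributes one row per interval over which covariate values are constant. A sketch, assuming a hypothetical long-format data frame with tstart, tstop, event, and a time-varying covariate:
library(survival)
cox_td <- coxph(Surv(tstart, tstop, event) ~ covariate1 + time_varying_covariate,
                data = your_long_data)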
8.Stratified Survival Analysis:
In a stratified analysis, a separate baseline hazard is fitted for each stratum (subgroup); this adjusts for variables whose
effect may violate the proportional-hazards assumption, without estimating a coefficient for them.
R Package: survival
cox_model_stratified <- coxph(Surv(time, event) ~ covariate1 + covariate2 + strata(stratification_variable),
data = your_data)
9.Competing Risks Analysis:
In cases where there are multiple competing events or outcomes, competing risks analysis can be performed
using methods like Fine-Gray models.
R Package: cmprsk
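A sketch with cmprsk, assuming event_type is coded 0 = censored, 1 = event of interest, 2 = competing event (column names are placeholders):
library(cmprsk)
ci <- cuminc(ftime = your_data$time, fstatus = your_data$event_type)    # cumulative incidence functions
fg <- crr(ftime = your_data$time, fstatus = your_data$event_type,       # Fine-Gray model
          cov1 = as.matrix(your_data[, c("covariate1", "covariate2")]))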
10.Survival Plots and Visualization:
Use various R packages (e.g., ggplot2, survminer) to create Kaplan-Meier survival plots, forest plots for
Cox models, and other visualizations for presenting results.
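Example (survminer builds annotated Kaplan-Meier plots on top of ggplot2; group and data names are placeholders):
library(survival)
library(survminer)
fit_by_group <- survfit(Surv(time, event) ~ group_variable, data = your_data)
ggsurvplot(fit_by_group, data = your_data,
           pval = TRUE,          # log-rank p-value annotation
           conf.int = TRUE,      # confidence bands
           risk.table = TRUE)    # numbers at risk below the plot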
These methods provide a comprehensive toolkit for survival analysis in R. The choice of method depends
on the nature of your data, the research question, and the underlying assumptions about the data distribution
and the relationship between covariates and survival times.

*******************************UNIT IV END**************************
