How To Do A Logistic Regression in Excel
Logistic regression is a statistical technique that transforms a linear function’s output into a
probability value. Unlike linear regression, which predicts continuous outcomes, logistic
regression uses the logistic function to predict the probability of a binary outcome. Predictions
that sort cases into one of two outcomes are called binary classification tasks.
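As a minimal sketch of that transformation, the logistic (sigmoid) function can be written in a few lines of Python. The function name sigmoid and the sample inputs are illustrative choices, not part of the Excel workflow.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued linear output z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# A linear output of 0 sits exactly on the 50/50 boundary.
print(sigmoid(0.0))
# Positive outputs map above 0.5, negative outputs below 0.5.
print(sigmoid(2.0), sigmoid(-2.0))
```

Whatever the size of the linear output, the result stays between 0 and 1, which is what lets it be read as a probability.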
For our example, you’ll perform a logistic regression in Excel to determine whether a college
basketball player is likely to get drafted into the NBA. Your dataset includes basic performance
metrics from the previous season:
Average points
Rebounds
Assists
Because logistic regression is a binary classification method, the target is a simple binary
value indicating whether the player was drafted:
0 = no
1 = yes
The first step is to create the tabular structure in Excel for holding your dataset and performing
calculations and transformations.
1. In a new Excel sheet, create four columns labeled “Points,” “Rebounds,” “Assists,” and
“Drafted?”.
Create a corresponding cell for each of your columnar variables—Points, Rebounds, Assists—to
hold your regression coefficients.
1. Skipping a row after the dataset, create three subsequent cells labeled B1, B2, and B3.
2. On the next row, create a cell for the logistic regression’s intercept.
3. Set all four of these values to 0.001 for now; we’ll optimize them in a later step.
Step 3: Create Columns for Coefficient Optimizations
Next we’ll create columns for optimizing the regression coefficients. We’ll need these to
calculate predictions in later steps, but for now we’ll focus on populating four new columns:
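Logit: The logarithm of the odds that a player gets drafted, where p is the probability of
getting drafted.
Elogit: The exponential of the logit, which converts the log odds back into odds.
Probability: The predicted probability for each player, derived from elogit.
Log Likelihood: Goodness of fit, expressed as a negative number; the closer to zero, the
better.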
1. Beginning in the first empty column to the right of the dataset, label the four subsequent
columns as follows: “logit,” “elogit,” “probability,” and “log likelihood.”
2. Calculate the logit values. The logit is the logarithm of the odds of an event that occurs
with probability p:
In our example:
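As a sketch, assuming the standard definition of the logit and a linear predictor built from the intercept and the three coefficients labeled B1, B2, and B3 (written b_0 through b_3 below, a notation chosen here for illustration), the two relationships can be written as:

```latex
\[
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1 - p}\right)
\]
\[
\operatorname{logit} = b_0 + b_1 \cdot \text{Points} + b_2 \cdot \text{Rebounds} + b_3 \cdot \text{Assists}
\]
```

Here p is the probability being modeled, b_0 is the intercept, and b_1 through b_3 multiply the Points, Rebounds, and Assists values in each row.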
1. Enter the formula =LN(G2) in the first log likelihood cell, then drag the bottom right
corner of the highlighted cell down to the last log likelihood cell to populate the column.
2. Sum the log likelihood values to get the number Solver will maximize when solving for
the regression coefficients. Place the formula =SUM(H2:H13) in the cell below the last log
likelihood cell.
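The chain of columns (logit, elogit, probability, log likelihood, and their sum) can be sketched in Python as follows. The rows are made-up toy values rather than the article's dataset, and the sketch assumes the standard convention that each row contributes the log of the probability assigned to its observed outcome.

```python
import math

# Hypothetical toy rows (points, rebounds, assists, drafted); not the article's data.
rows = [
    (12.0, 3.0, 4.0, 0),
    (20.0, 7.0, 5.0, 1),
    (8.0, 2.0, 1.0, 0),
    (16.0, 6.0, 6.0, 1),
]

def log_likelihood(coefs, intercept, data):
    """Sum of per-row log likelihoods, mirroring the spreadsheet columns."""
    total = 0.0
    for points, rebounds, assists, drafted in data:
        logit = intercept + coefs[0] * points + coefs[1] * rebounds + coefs[2] * assists
        elogit = math.exp(logit)            # "elogit" column
        p = elogit / (1.0 + elogit)         # "probability" column, P(drafted = 1)
        p_observed = p if drafted == 1 else 1.0 - p
        total += math.log(p_observed)       # "log likelihood" column
    return total

# Starting values of 0.001 for every coefficient, as in the sheet.
start = log_likelihood([0.001, 0.001, 0.001], 0.001, rows)
print(start)
```

The total is always negative, and coefficient values that fit the data better move it closer to zero.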
Your spreadsheet should now look like this
Step 5: Solve For Regression Coefficients
The last step involves using Excel’s Solver add-in to automatically calculate the regression
coefficient estimates.
1. Install Excel’s Solver add-in by clicking first on the Home menu and then the Add-Ins menu.
3. Select the Data menu from the top-level navigation and click Solver on the right-hand side to
run the add-in.
Set Objective: select cell H14 with the sum of the log likelihoods
To: Max
By Changing Variable Cells: Select cells B15:B18 containing your regression coefficients
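To illustrate what these settings ask Solver to do, the sketch below maximizes the same kind of objective with plain gradient ascent. This is a stand-in technique chosen for simplicity, not Solver's own algorithm; the data rows, step count, and learning rate are hypothetical values picked for the example.

```python
import math

# Hypothetical toy rows (points, rebounds, assists, drafted); not the article's data.
ROWS = [
    (12, 3, 4, 0), (20, 7, 5, 1), (8, 2, 1, 0), (16, 6, 6, 1),
    (16, 6, 6, 0), (14, 5, 3, 1), (10, 4, 2, 1), (18, 5, 5, 0),
]

def predict(params, features):
    """params = [b1, b2, b3, intercept]; returns P(drafted = 1)."""
    z = params[3] + sum(b * x for b, x in zip(params[:3], features))
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(params, rows):
    """The objective: the analogue of the summed log likelihood cell."""
    total = 0.0
    for *features, drafted in rows:
        p = predict(params, features)
        total += math.log(p if drafted == 1 else 1.0 - p)
    return total

def fit(rows, steps=2000, learning_rate=0.001):
    """Nudge the four 'variable cells' uphill on the log likelihood."""
    params = [0.001, 0.001, 0.001, 0.001]  # same starting values as the sheet
    for _ in range(steps):
        grad = [0.0] * 4
        for *features, drafted in rows:
            error = drafted - predict(params, features)
            for i, x in enumerate(features):
                grad[i] += error * x
            grad[3] += error  # intercept gradient
        params = [b + learning_rate * g for b, g in zip(params, grad)]
    return params

fitted = fit(ROWS)
print("log likelihood before:", log_likelihood([0.001] * 4, ROWS))
print("log likelihood after: ", log_likelihood(fitted, ROWS))
```

The fitted parameters here predict Draft? = 1 directly, and a fixed number of small steps only shows the objective rising; Solver handles step sizes and stopping on its own.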
By default, the coefficients Solver returns describe the probability of not being drafted
(Draft? = 0). To get the probability of being drafted (Draft? = 1), reverse the sign of each
regression coefficient. For example, the -4.643753 in the p(x=0) column becomes a positive
4.643753 in the p(x=1) column.
Now that you have your regression coefficient estimates, you can plug them into the probability
equation to find out whether a new player will get drafted. For this example, let’s say the new
player averages 15 points per game, 4 rebounds per game, and 6 assists per game. Again, the
formula for calculating the probability of being drafted is:
In this example, the formula would look like the following:
Evaluating this equation yields 0.66, or a 66 percent probability this new player will get drafted.
1. To calculate this in Excel, add the new player’s data to your spreadsheet in a new row
and compute their probability of getting drafted.
As you can see, the probability of the new player being drafted is also 66 percent, which lines up
with the previous manual calculation.
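The same calculation can be sketched outside Excel. The coefficient values below are placeholders for illustration only, so the result will not match the 66 percent figure; substitute the sign-reversed estimates that Solver returns for your sheet.

```python
import math

def draft_probability(points, rebounds, assists, coefs, intercept):
    """Probability of being drafted, using coefficients for Draft? = 1."""
    logit = intercept + coefs[0] * points + coefs[1] * rebounds + coefs[2] * assists
    return math.exp(logit) / (1.0 + math.exp(logit))

# Placeholder coefficients (hypothetical); replace with your Solver estimates.
example_coefs = [0.1, 0.2, 0.3]
example_intercept = -3.0

# New player: 15 points, 4 rebounds, and 6 assists per game.
p = draft_probability(15, 4, 6, example_coefs, example_intercept)
print(round(p, 2))
```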
Logistic regression involves predicting the probability of a binary event occurring (for example,
success/failure, yes/no, or churn/no churn). By definition, probability is a measure of the
likelihood of an event occurring, ranging from 0 (impossible) to 1 (certain).
Odds, on the other hand, express the likelihood of success compared to the likelihood of failure.
For example, if the probability of success is 0.8, the odds of success are 0.8 / (1 – 0.8) = 4. This
means there are four times as many favorable outcomes as unfavorable ones.
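The conversion between probability and odds is small enough to sketch directly. The helper names below are illustrative.

```python
def odds_from_probability(p: float) -> float:
    """Odds of success: probability of success divided by probability of failure."""
    return p / (1.0 - p)

def probability_from_odds(odds: float) -> float:
    """Inverse conversion: odds of 4 mean 4 successes for every 1 failure."""
    return odds / (1.0 + odds)

print(round(odds_from_probability(0.8), 6))
print(probability_from_odds(4.0))
```

Running the example with a probability of 0.8 reproduces odds of about 4, and converting odds of 4 back returns 0.8.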
Log odds ratio is a way of transforming these odds into a more workable range of values.
Specifically, the logistic regression model uses the sigmoid function, denoted as σ(z), to convert
the log odds ratio (the logarithm of the odds of success) back into a probability.
Mathematically, log odds ratio is represented as: