Logistic Regression
Using Excel
Linear Regression and Multiple Regression
• Linear regression models are used to identify the relationship
between a continuous dependent variable and one or more
independent variables.
• When there is only one independent variable and one dependent
variable, it is known as simple linear regression, but as the number of
independent variables increases, it is referred to as multiple linear
regression.
• Each type of linear regression seeks to fit a line of best fit through a
set of data points, typically calculated using the least squares method.
Logistic regression
• Similar to linear regression, logistic regression is also used to estimate
the relationship between a dependent variable and one or more
independent variables, but it is used to predict a categorical variable
rather than a continuous one. A categorical variable can be true or false,
yes or no, 1 or 0, et cetera. The output also differs from linear
regression: logistic regression produces a probability.
How much data is required to run analysis
• While both models are used in regression analysis to make
predictions about future outcomes, linear regression is typically
easier to understand.
• Linear regression also does not require as large a sample size as
logistic regression, which needs an adequate sample to represent values
across all the response categories. Without a larger, representative
sample, the model may not have sufficient statistical power to detect
a significant effect.
We are predicting employee performance score (y) based on hours of training (x).
Regression Equation: y=β0+β1x+ϵ
Let’s assume we have the following relationship between the two variables based on a dataset:
• Intercept (β0) = 50
• Slope (β1) = 2
• Independent variable (hours of training, x) = 10
• Substituting these values into the regression equation:
• y = 50 + 2(10)
• y = 70
Interpretation: For an employee who received 10 hours of training, their predicted performance
score is 70.
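A quick sketch of this worked example (Python is used purely for illustration; the slides themselves work in Excel):

```python
# Simple linear regression prediction: y = b0 + b1 * x
b0 = 50   # intercept (baseline performance score)
b1 = 2    # slope (score gained per hour of training)
x = 10    # hours of training

y_hat = b0 + b1 * x
print(y_hat)  # 70 -> predicted performance score
```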
• If y = mx + b, then m is the slope and b is the y-intercept (i.e., the value of y when
x = 0).
The simple linear regression model is essentially a linear equation of the form
y = c + b*x;
• where y is the dependent variable (outcome),
• x is the independent variable (predictor),
• b is the slope of the line; also known as regression coefficient
• c is the intercept
So, to calculate further in the given data set with ten predictors:
• Y = intercept + (x1 × b1 coefficient) + (x2 × b2 coefficient) + … + (x10 × b10 coefficient)
• i.e., each predictor value is multiplied by its estimated coefficient and the products are summed with the intercept.
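A small sketch of that sum with hypothetical coefficients and predictor values (three predictors for brevity; the ten-predictor case works identically):

```python
# Multiple linear regression prediction: y = intercept + sum(b_i * x_i)
intercept = 40.0
coefficients = [2.0, -1.5, 0.8]   # hypothetical b1, b2, b3
predictors = [10, 4, 25]          # hypothetical x1, x2, x3 for one employee

y_hat = intercept + sum(b * x for b, x in zip(coefficients, predictors))
print(y_hat)  # 40 + 20 - 6 + 20 = 74.0
```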
Logistic Regression
Step 1: Log (odds) or Logit = b0 + b1X
• The natural logarithm of the odds is known as log-odds or
logit.
• A link function is simply a function of the mean of the response
variable Y that we use as the response instead of Y itself.
• All that means is when Y is categorical, we use the logit of Y as
the response in our regression equation instead of just Y:
How to calculate Logit
• Odds express the ratio of favorable to unfavorable outcomes in a
situation
• Odds = Probability of success / Probability of failure
Step 2: EXP(logits)
• Log odds can be difficult to make sense of within a logistic regression
data analysis. As a result, exponentiating the beta estimates is
common to transform the results into an odds ratio (OR), easing the
interpretation of results.
• The OR represents the odds that an outcome will occur given a
particular event, compared to the odds of the outcome occurring in
the absence of that event.
• The odds ratio can be interpreted as follows: the odds of success
change by a factor of exp(c·B1) for every c-unit increase in x.
Step 3: Calculating probability: Probability = Odds / (1 + Odds)
• Probabilities range from zero to one, i.e., p∈[0,1], whereas logits can
be any real number from minus infinity to infinity; L∈(−∞,∞)
• A probability of 0.5 corresponds to a logit of 0. Negative logit values
indicate probabilities smaller than 0.5, positive logits indicate
probabilities greater than 0.5.
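A short sketch of Steps 1–3 with assumed coefficients (b0 and b1 below are illustrative, not taken from any dataset in these slides); it also shows that a logit of 0 corresponds to a probability of 0.5:

```python
import math

b0, b1 = -3.0, 0.5           # assumed logistic regression coefficients
x = 6                        # value of the predictor

logit = b0 + b1 * x          # Step 1: log-odds (logit)
odds = math.exp(logit)       # Step 2: EXP(logit) -> odds
p = odds / (1 + odds)        # Step 3: probability

print(logit, odds, p)        # 0.0, 1.0, 0.5 -> a logit of 0 maps to p = 0.5
```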
Interpreting logistic regression
• If the OR is greater than 1, then the event is associated with a higher
odds of generating a specific outcome. Conversely, if the OR is less
than 1, then the event is associated with a lower odds of that
outcome occurring. Based on the equation above, the interpretation of
an odds ratio can be stated as follows: the odds of success change by a
factor of exp(c·B1) for every c-unit increase in x.
Probability vs Likelihood
• Probability deals with the possibility of a random experiment
occurring. The term "probability" refers to the possibility of
something happening.
• The term Likelihood refers to the process of determining the best
data distribution given a specific situation in the data.
• Step 4: Probabilities of correct match
• Step 5: Loglikelihood
A measure of how well a statistical model fits a given dataset.
Defined as the logarithm of the likelihood function, which is the
probability of observing the data given the model parameters
• Step 6: Sum of Likelihood
• When calculating the probability of a given outcome, you assume the
model's parameters are reliable.
• However, when you calculate the likelihood, you’re attempting to
determine whether the parameters in a model can be trusted based
on the sample data you have observed.
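Steps 4–6 can be mirrored in a few lines: for each observation, compute the predicted probability, take the probability of the outcome that actually occurred (Step 4), log it (Step 5), and sum across rows (Step 6). The coefficients and rows below are made up for illustration:

```python
import math

b0, b1 = -2.0, 0.8                       # assumed coefficients
data = [(1, 0), (2, 0), (3, 1), (4, 1)]  # made-up (x, actual outcome) rows

log_likelihood = 0.0
for x, actual in data:
    p = 1 / (1 + math.exp(-(b0 + b1 * x)))   # predicted P(outcome = 1)
    p_match = p if actual == 1 else 1 - p    # Step 4: probability of the observed outcome
    log_likelihood += math.log(p_match)      # Step 5: log-likelihood of the row

print(round(log_likelihood, 3))              # Step 6: summed log-likelihood
```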
Example
• Suppose you have an unbiased coin. If you flip the coin, the
probability of getting a head equals the probability of getting a tail:
0.5 each.
• Now suppose the same coin is tossed 50 times and it shows heads
only 14 times. You would conclude that the likelihood of the coin being
unbiased is very low; if the coin were fair, it would have shown heads
and tails roughly the same number of times.
• When calculating the probability of the coin landing heads, you assume
that P(head) = 0.5.
• However, when calculating the likelihood, you are trying to find
whether the model parameter (p = 0.5) is correctly specified or not.
• The fact that a coin only lands on heads 14 times out of 50 makes you
highly suspicious that the true probability of a coin landing on heads
on a given toss is p = 0.5.
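The coin example can be checked numerically with the binomial likelihood: the likelihood of p = 0.5 given 14 heads in 50 tosses is far smaller than the likelihood at the observed proportion 14/50 = 0.28:

```python
from math import comb

heads, tosses = 14, 50

def likelihood(p):
    # Binomial likelihood of observing `heads` heads in `tosses` tosses
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

print(likelihood(0.5))    # tiny -> p = 0.5 explains the data poorly
print(likelihood(0.28))   # much larger -> the observed proportion fits the data best
```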
• Step 7: Use Solver to maximize the summed log-likelihood (Step 6); it will generate the coefficients.
• Step 8: Prediction – use an IF formula: if the calculated probability is more than the cutoff value
(0.5), put 1, else 0.
• Step 9: Define Names – the Status column as “Actual” and the Prediction column as
“Predicted”.
• Step 10: Prepare the confusion matrix from Actual and Predicted – use the COUNTIFS
function.
• Step 11: Calculate the accuracy, precision, and sensitivity percentages.
• Step 12: Transpose and calculate accuracy, precision, and sensitivity at cutoff
probabilities of 0.1, 0.2 … 0.9 – use the What-If Analysis function.
• Step 13: Insert a graph – plot the values as a scatter plot.
• Step 14: Calculate the true positive and false positive rates.
• Step 15: Transpose and calculate the true positive and false positive rates at cutoff
probabilities of 0.1, 0.2 … 0.9 – use the What-If Analysis function.
• Step 16: Insert graph – plot the values – ROC curve
• Accuracy = how often the model predicted correctly: the ratio of
correct predictions to all cases.
• Precision = how often a positive prediction turned out to be true: the
ratio of true positives to all cases predicted positive.
• Recall or Sensitivity = the ratio of true positives to total (actual)
positives in the data.
• Accuracy is how correctly you predict across all events.
• Sensitivity: of all actual positive events, how many you predicted as positive.
• Precision: of all events predicted positive, how many actually were positive.
• Specificity: of all actual negative events, how many you predicted as negative.
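Steps 10–11 boil down to tallying the four confusion-matrix cells and turning them into these ratios. A small sketch with made-up Actual and Predicted columns (in Excel the four counts would come from COUNTIFS):

```python
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # made-up "Actual" column
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # made-up "Predicted" column (after the 0.5 cutoff)

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # correct calls / all cases
precision   = tp / (tp + fp)                    # true positives / predicted positives
sensitivity = tp / (tp + fn)                    # true positives / actual positives (recall)
specificity = tn / (tn + fp)                    # true negatives / actual negatives

print(accuracy, precision, sensitivity, specificity)
```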
How do we know which one to go for?
• If false positives are the bigger concern, go for precision.
• If false negatives are the bigger concern, recall is a good measure.
• Accuracy works well when the classes are balanced; with imbalanced
classes, precision and recall are more informative.
• What-If Analysis is the process of changing the values in cells to see
how those changes will affect the outcome of formulas on the
worksheet.
• A false positive is a mistake and a true positive is a correct call, so
true positives should be high (maximize) and false positives should be
low (minimize).
Receiver Operating Characteristic curve (ROC)
• ROC curves in logistic regression are used for determining the best
cutoff value for predicting whether a new observation is a "failure" (0)
or a "success" (1).
• An ROC curve shows the relationship between sensitivity and
specificity for every possible cut-off.
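A sketch of Steps 14–16: sweep the cutoff from 0.1 to 0.9, compute the true positive rate (sensitivity) and false positive rate (1 − specificity) at each cutoff, and plot the (FPR, TPR) pairs to trace the ROC curve. The outcomes and predicted probabilities below are made up for illustration:

```python
actual = [1, 0, 1, 1, 0, 0, 1, 0]                   # made-up outcomes
probs  = [0.9, 0.4, 0.7, 0.55, 0.2, 0.6, 0.8, 0.3]  # made-up predicted probabilities

for cutoff in [i / 10 for i in range(1, 10)]:        # cutoffs 0.1, 0.2, ..., 0.9
    pred = [1 if p >= cutoff else 0 for p in probs]
    tp = sum(a == 1 and y == 1 for a, y in zip(actual, pred))
    fn = sum(a == 1 and y == 0 for a, y in zip(actual, pred))
    fp = sum(a == 0 and y == 1 for a, y in zip(actual, pred))
    tn = sum(a == 0 and y == 0 for a, y in zip(actual, pred))
    tpr = tp / (tp + fn)       # true positive rate (sensitivity)
    fpr = fp / (fp + tn)       # false positive rate (1 - specificity)
    print(cutoff, tpr, fpr)    # plotting (fpr, tpr) pairs traces the ROC curve
```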
People Analytics
Session: 7
Course Instructor:
Amita Shivhare
Binary Data
Binary data is a type of categorical data that can take only two
possible values or categories.
Characteristics:
•It has only two distinct categories or classes.
•Often coded as 0 or 1, True or False, Yes or No, On or Off,
Success or Failure.
•The categories are usually mutually exclusive and exhaustive (i.e.,
each observation falls into one and only one category).
Binary Data
Example:
•Survival chances (in the binary case: Yes = 1, No = 0).
•A light switch (On = 1, Off = 0).
•Pass/Fail on an exam.
Common Use: Binary data is often used in logistic regression,
binary classification problems, and yes/no type decisions.
Categorical Data
• Categorical data refers to data that can take on two or more distinct
categories. The categories are typically qualitative in nature and do not
have an inherent numerical order (though sometimes they can be ordinal).
Characteristics:
• Can have two or more categories (including binary data as a special case of
categorical data).
• Categories are usually labels and represent different groups or types.
• The categories do not imply any kind of ranking or order (in case of
nominal data).
Categorical Data
Types of Categorical Data:
1. Nominal Data: Categories without a meaningful order (e.g., Colors:
Red, Blue, Green).
2. Ordinal Data: Categories with a meaningful order, but the difference
between them is not measurable (e.g., Education level: High school,
Bachelor's, Master's).
Categorical Data
Example:
• Car brands (Toyota, Ford, BMW, etc.).
• Types of fruits (Apple, Orange, Banana).
• Blood type (A, B, AB, O).
• Common Use: Categorical data is used in classification models (e.g.,
decision trees, multinomial logistic regression) and grouping analysis.
Discrete Variable
• Data that can take on specific, countable values, often representing whole
numbers.
Characteristics:
• Countable: Represents distinct, separate values that can be counted, like
the number of employees or departments.
• No fractions: The variable does not take fractional values, only whole
numbers.
• Finite or infinite: there may be a finite number of options (e.g., the
number of employees in a team) or an infinite but countable set.
• Examples: number of training sessions attended by an employee, job
applications received, number of interviews conducted, number of hires.
Continuous Variable
• Represents data that can take any value within a range, usually measuring
something with a high degree of precision.
Characteristics:
• Measured: Represents measurable quantities that can vary continuously,
such as salary or work hours.
• Fractions possible: Can take fractional or decimal values (e.g., an
employee’s working hours can be 8.5 hours).
• No gaps: There are no gaps between the values; for example, an
employee’s age could be 29.2 years.
Continuous Variable
Examples
• Salary: The annual or monthly salary of employees (e.g., $50,000).
• Years of experience: The years of experience an employee has (e.g., 7.5
years in the industry).
• Working hours: The number of hours worked per week by an employee
(e.g., 40.5 hours).
• Performance rating: Employee performance measured on a scale (e.g.,
a rating of 4.3 on a scale of 1 to 5).
• Age of employees: Age of employees within the organization (e.g., 32.5 years old).
Discrete vs. Continuous Variables
• Type of values – Discrete: countable, distinct, and separate values. Continuous: can take any value within a range, including decimals/fractions.
• Examples – Discrete: number of employees, number of job applications received, number of training sessions completed. Continuous: employee salary, working hours per week, years of experience.
• Measurement – Discrete: typically counted (e.g., number of hires or interviews). Continuous: typically measured (e.g., salary, working hours, performance rating).
• Gaps between values – Discrete: yes, there are gaps between values (e.g., 1, 2, 3 hires). Continuous: no, the values form a continuum (e.g., 40.5 hours of work).
Binary vs. Categorical Data
• Number of categories – Binary: always two categories (e.g., 0/1, Yes/No). Categorical: two or more categories (e.g., Red, Blue, Green).
• Nature – Binary: a special case of categorical data. Categorical: includes binary and multi-class categories.
• Example – Binary: Pass/Fail, On/Off, True/False. Categorical: car brands, blood types, colors.
• Ordering – Binary: no ordering, always two choices. Categorical: can be nominal (no order) or ordinal (ordered).
• Applications – Binary: binary classification, logistic regression. Categorical: multinomial classification, grouping analysis.
Different Regression Techniques
are used for
Predictive Modeling
Linear Regression
Purpose:
Used when you want to predict a continuous dependent variable
based on one or more continuous or categorical independent
variables.
Example in HRM:
Predicting an employee's salary based on their years of experience,
education level, and job role.
Type of Prediction:
Predicts continuous outcomes (e.g., a numerical value like salary or
performance score).
Multiple Linear Regression
• Purpose:
Similar to simple linear regression but involves multiple independent
variables to predict a continuous outcome.
• Example in HRM:
Predicting an employee's performance rating based on age, education
level, years of experience, and training hours.
• Type of Prediction:
Predicts continuous outcomes from multiple predictors.
Logistic Regression
Purpose:
Used when you want to predict a binary categorical outcome (two
categories, like 0/1, Yes/No) based on one or more independent variables.
Example in HRM:
Predicting whether an employee will leave the organization (attrition)
based on factors like work hours, job satisfaction, and performance.
Type of Prediction:
Predicts the probability of an event occurring, with the outcome being
either 0 or 1 (e.g., employee stays or leaves).
Multinomial Logistic Regression
Purpose:
Used when the dependent variable has three or more categories that are
not ordered, predicting which category an observation will fall into.
Example in HRM:
Predicting an Employee job role (e.g., Manager, Analyst, Associate),
Reason for leaving (e.g., better salary offer, relocation, career change).
Type of Prediction:
Predicts the probability of multiple categorical outcomes.
Polynomial / Exponential Regression
Purpose:
An extension of linear regression where the relationship between the
independent variable(s) and the dependent variable is non-linear. This is
used when the data shows a curved or non-linear pattern.
Example in HRM:
Predicting an employee's performance rating based on their experience,
where the relationship may not be a straight line (e.g., performance
increases rapidly in early years, stabilizes, and then declines).
Type of Prediction:
Predicts continuous outcomes where the relationship is not linear.
Ordinal Regression
• Purpose:
When the dependent variable is ordinal—meaning it has categories with a
natural ranking but unknown or unequal distances between categories. The
goal is to predict which category an observation (e.g., an employee) will fall
into based on several independent variables.
• Example:
Predicting Employee Satisfaction Levels (low, medium, high)
Employee Performance Rating (e.g., below average, average, above average)
• Type of Prediction
Predicts the probability of an employee being in a specific category (e.g.,
high satisfaction).
Ridge, Lasso, and Elastic Net Regression
Purpose:
These are advanced techniques that modify the linear regression model by
applying penalties to prevent overfitting. They are useful when there are
many predictor variables or when the model has multicollinearity
(correlated predictors).
Example in HRM:
Predicting employee engagement levels based on a large number of
factors like job role, location, compensation, age, work-life balance, etc.,
where some predictors may not be useful and should be minimized in the
model.
Type of Prediction:
Predict continuous outcomes while controlling for model complexity.
Time Series Regression
•Purpose:
Used when the data is collected over time, and the goal is to make
predictions based on past trends.
•Example in HRM:
Predicting employee turnover rate or absenteeism in the future months
based on historical data.
•Type of Prediction:
Predicts continuous outcomes that vary over time, with dependencies
between time periods.
When to Use Regression for Predictive Modeling:
•When the outcome is continuous (e.g., salary, performance score):
Linear or polynomial regression models are ideal.
•When the outcome is categorical (e.g., employee attrition – stay/leave):
Logistic or multinomial logistic regression is used.
•When there are many predictors and potential for multicollinearity:
Ridge, Lasso, or Elastic Net regression may be used to fine-tune
predictions.
•When the data follows a time pattern (e.g., monthly absenteeism rates):
Time series regression is appropriate.
Other Predictive Techniques (Beyond Regression):
• While regression is widely used for predictive modeling, other
techniques such as decision trees, random forests, support vector
machines (SVM), and neural networks can also be used for more
complex or non-linear relationships.
• These models can be particularly useful when the relationships
between variables are intricate, or the dataset is large and
unstructured.
Summary of Statistical tests based on types of variables
People Analytics
Session: 6
Course Instructor:
Amita Shivhare
Case: HR Analytics at Barney
• What is happening in this company? What problem is it facing?
• As HR managers, what do we want to prove to the board?
“We want to prove that the company’s decline in performance is
due to demotivation and stress.”
Case Questions
How do we want to prove it?
What type of data do we have?
Type of Data
1. Objective Data
• Objective performance measures include information on salaries, gender,
tenure, level of education, age, number of sick days, among many others.
• These measures are not subject to personal opinion or interpretation
of results.
2. Subjective Data
• Are influenced by the observer’s personal judgment of how the skill was
performed. These measures are often criticized and scrutinized as they are
open to interpretation and opinion.
Questionnaire Design
What do you think about this questionnaire
in particular?
Analysis
• Besides the questionnaire answers, the HR manager also has the following
available:
Gender: This is a dummy variable, with a value of 1 if the employee is
female and 0 if male.
JobTenure 2017: The years the employee has been with the organization.
PerfoRating2017: Performance rating (1–5) by the employee’s head of
department, given in January 2017.
PerfoRating2018: Performance rating (1–5) by the employee’s head of
department, given in January 2018.
SickDays2017: Total number of sick days taken by the employee in 2017.
Timeline of Data Gathering
Analysis
• What would the dependent variable be?
• What would we like to explain?
• “We want to prove that the company’s decline in performance is
due to demotivation and stress.”
(Dependent variable / Y) = PerfoRating2017 − PerfoRating2018
Correlation Matrix
Regression Analysis
Y (Dependent variable)
= Difference in performance rating (2018-2017)
X (independent variable) = All variables
R-squared (R²)
• Interpretation: R-squared explains the proportion of variance in the
dependent variable that is explained by the independent variables.
• Range: It ranges from 0 to 1.
• High R²: The model explains a large proportion of the variance in the
dependent variable.
• Low R²: The model explains a small proportion of the variance.
• Example: If R² = 0.75, then 75% of the variation in the dependent variable is
explained by the independent variables in the model.
Adjusted R-squared
• Interpretation: Adjusted R-squared adjusts for the number of
predictors in the model and is more accurate when multiple
independent variables are present.
• It can be lower than R² if unnecessary variables are added to the
model.
Coefficients (β or B)
• Interpretation: The coefficient values represent the change in the
dependent variable (outcome) for every one-unit change in the
independent variable, assuming all other variables are held constant.
Positive coefficient: The independent variable has a positive relationship
with the dependent variable (when X increases, Y increases).
Negative coefficient: The independent variable has a negative relationship
with the dependent variable (when X increases, Y decreases).
Example: If a coefficient is 2.5 for "hours studied" in predicting exam
scores, it means that for every additional hour of study, exam scores
increase by 2.5 points, all else being equal.
Standard Error (SE)
• Interpretation: The standard error measures the accuracy of the
coefficient estimate. A smaller SE suggests that the coefficient
estimate is more precise.
• Usage: Standard errors are used to calculate confidence intervals and
test hypotheses about the coefficients.
t-Statistic
• Interpretation: The t-statistic is calculated by dividing the coefficient
by its standard error. It is used to test whether a coefficient is
significantly different from zero.
• High t-statistic: Suggests that the independent variable has a
significant effect on the dependent variable.
• Rule of Thumb: A t-value greater than +2 or less than -2 is often
considered significant.
p-Value
• Interpretation: The p-value tells you the probability that the
coefficient is different from zero by chance.
• Low p-value (typically < 0.05): The coefficient is statistically
significant, meaning the independent variable likely has an effect on
the dependent variable.
• High p-value (> 0.05): The coefficient is not statistically significant,
and the independent variable likely doesn't affect the dependent
variable.
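All of these quantities (R², coefficients, standard errors, t-statistics, and p-values) appear together in one regression output. A sketch using Python's statsmodels on made-up data (both the library and the data are illustrative assumptions, not part of the case):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
hours = rng.uniform(0, 20, size=100)                     # made-up predictor: hours studied
scores = 50 + 2.5 * hours + rng.normal(0, 5, size=100)   # made-up outcome with noise

X = sm.add_constant(hours)            # adds the intercept column
model = sm.OLS(scores, X).fit()

print(model.rsquared)                 # R-squared: share of variance explained
print(model.params)                   # intercept and coefficient estimates
print(model.bse)                      # standard errors of the coefficients
print(model.tvalues, model.pvalues)   # t-statistics and p-values
```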
CYNET SYSTEMS: READY TO LEVERAGE MILEAGE FROM
HUMAN RESOURCE ANALYTICS?
• Help Sharma develop a report that can be presented to Cynet management
about the recruiters’ performance based on the data from 2019 to 2021.
What was the total headcount growth of Cynet recruiters over the three
years?
What was the recruiter headcount working for Cynet Systems versus Cynet
Health?
Which sectors had the lowest number of onboards during the three-year
time frame?
Which sector has the highest conversion ratio?
• Recruiter resource allocation?
• Offer-to-join ratio?
• trainee recruiters (freshers) hired by Cynet sector wise?
• sixty-three different titles ?
• employee turnover?
• What per cent of the employee population drives the overall
recruitment throughput of offers and starters?
• Work from home productivity comparison?
Recommendations and Next Steps
• The number of offers converting to joiners (onboards) declined by almost 34 per cent in 2021;
building a stronger relationship with candidates post-offer, before they onboard at the
clients’ end, can help address this.
• Cynet should look at reducing the number of client industries serviced, particularly where
onboarding numbers have been less than 1 per cent.
• Based on the analysis of submissions versus offers, health care has the highest conversion
ratio, of about 17 per cent; the hiring conversion for IT infrastructure and IT services is
around 11 per cent; software implementation conversion is only at 5 per cent—this
indicates a specialist requirement in these roles, which are typically hard to find given the
specific combination of skills.
• Reducing the number of role titles will help the organization maintain consistency
in job descriptions, expectations, and remuneration. This change will also ensure
there is fairness and equity at the organization. It will also help Cynet reduce
hiring, training, and compensation costs, including payroll administration.
• These clients are in the BFSI, education and learning, engineering, travel and
hospitality, utilities, wireless and IT infrastructure, retail, logistics, and IT services
sectors, specifically (industries with the lowest number of onboards during the
three-year time frame).
• Reassigning the workforce can help Cynet focus more on clients where it has had
better success with onboards, which can increase revenue for the organization
• Overall, having fewer role titles for employees doing the same or similar jobs can
help provide greater clarity, consistency, and cost savings in the organization.
Employee Turnover
• A quarter of recruiters were inactive for over a month, indicating a
misalignment between the role and actual work to be done.
• The employee turnover is at such a high rate as to require a separate
study.
• The average tenure of the recruiters working continuously in Cynet is
only about six months.
• Cynet must work on interventions to help employees stay longer in the
organization.
What per cent of the employee population drives recruitment throughput?
The data shows that roughly 5 per cent of the employee population drives the
overall recruitment throughput of offers and starters, considering submission
data.
Cynet can track what happens with the remaining 95 per cent of the recruiter
population, who have continued to work but haven’t had much impact given their
low number of onboards.
Cynet’s HR department needs to identify the reasons for the low performance
among this population and consider training interventions to help build skills and
expertise.
By looking at the prior years of experience and expertise in the industry, Sharma
can seek to realign the workforce.
Cynet can look at the overall recruitment strategy to help assess the effectiveness
of its recruitment channel, job description, client briefing and documentation, and
candidate screening and assessment to build a better candidate pool.
• In the case of Linear Regression, the outcome is continuous while in the
case of Logistic Regression outcome is discrete (not continuous)
• To perform Linear regression we require a linear relationship between the
dependent and independent variables. But to perform Logistic
regression we do not require a linear relationship between the dependent
and independent variables.
• Linear Regression is all about fitting a straight line in the data
while Logistic Regression is about fitting a curve to the data.
• Linear Regression is a regression algorithm for Machine Learning
while Logistic Regression is a classification Algorithm for machine learning.
• Linear regression assumes Gaussian (or normal) distribution of the
dependent variable. Logistic regression assumes the binomial distribution
of the dependent variable
People Analytics
Session: 4
Descriptive Analytics
Course Instructor:
Amita Shivhare
Case: Talent Acquisition Group at HCL Technologies:
Improving the Quality of Hire Through Focused Metrics
1. Did the Talent realignment really improve the
recruitment experience for all critical stakeholders of HCL?
• What is the objective of structural realignment?
• Who are the key stakeholders of TAG?
• Did the structure help TAG achieve its purpose and goals?
• What was the structure before Narayan joined HCL?
• What is the new structure? How does it help? What was the intention
behind it?
• Satisfaction of stakeholders?
Case Questions
• Did the Talent realignment really improve the recruitment experience for
all critical stakeholders of HCL?
• In your opinion, which of the metrics of TAG are truly relevant from a
business partner point of view?
• How did the POFU gamification help TAG improve its fulfilment metrics?
• What, in your opinion, constitutes a true "Quality of Hire" and how should
organizations measure this? What was TAG doing to measure this?
Profit vs Profitability
Let’s take 2 companies, both with a Rs.10,000 profit.
• Company A earned Rs.50,000 in one month, and its expenses were
Rs.40,000.
• Company B earned Rs.20,000 with its expenses at Rs.10,000.
Although the two have equal profits, they don’t have the same profitability,
where profitability = (profit ÷ revenue) × 100. Applying this equation:
• Company A’s profitability = (Rs.10,000/Rs.50,000) * 100, which is then
equal to 20%.
• Company B’s profitability = (Rs.10,000/Rs.20,000) * 100 = 50%.
What are HR Metrics?
• HR metrics quantify the impact and measure the success of
human resource programs and processes
• How HR activities contribute to business performance
• Formulas that are used to show the effectiveness and
efficiency of the HR department
Levels of Metrics
Efficiency is about doing things right
(optimizing resources)
Effectiveness is about doing the right things
(achieving desired outcomes)
Three Levels of Metrics
Efficiency
Focus: Resource utilization.
Goal: Achieving the desired outcome with minimal resources (time, cost, effort).
Question Answered: How well are we using our resources?
Characteristics:
• Prioritizes cost reduction and speed.
• May involve reducing waste, streamlining processes, or using fewer resources.
• Can be measured as a ratio (e.g., output per unit of input).
Three Levels of Metrics
Effectiveness
•Focus: Quality of outcomes.
•Goal: Achieving the best possible results or outcomes.
•Question Answered: Are we achieving the desired outcome or goal?
Characteristics:
• Prioritizes achieving the goal, regardless of the resources used.
• It often looks at longer-term outcomes, such as performance, satisfaction, or retention.
• Effectiveness is about whether the intended goals or standards are met.
Three Levels of Metrics
Impact : An impact matrix is a tool used to evaluate and prioritize various
factors based on their potential impact and likelihood. It's commonly used in
project management, risk assessment, and strategic planning to help organizations
make informed decisions by visualizing the effects of different factors or decisions.
Focus: Evaluating and prioritizing initiatives or decisions based on their
impact (value) and the effort (resources) required to implement them.
Question Answered:
• Which initiatives provide the highest impact for the least effort, and how
should we prioritize our actions?
•Identify Factors: List the factors or issues you need to assess.
These could be risks, opportunities, actions, or any elements that
might affect a project or decision.
•Define Impact and Likelihood: For each factor, determine the
potential impact it could have and the likelihood of that impact
occurring. Impact is usually rated on a scale (e.g., low, medium,
high) and likelihood is similarly rated.
Business Indicators
Lagging Indicators
• Lagging indicators are outcome measures that help you measure your
HR progress by examining the final end result or outcomes of your
collective efforts
• Use of the “lagging” term reflects the delay or gap between your
actions and a change in the final end result
CHARACTERISTICS
• Outcome measures
• Delayed feedback to the system
• Indicates the end result of the system
• Tells what happened, not what is happening
• Can be tracked over time
• Not very responsive to changes in the system
Examples:
Employee retention, Employee performance, Organizational performance
Customer retention, Employee productivity
Lagging Indicators
•Matrix Type: Effectiveness Matrix.
•Focus: Outcomes and results that have already occurred.
•Purpose: To evaluate the effectiveness of past actions or processes based
on actual results.
•Examples:
•Quality of Hire: Reflects how well new hires perform over time.
•Employee Retention Rates: Measures how successful retention strategies
have been in keeping employees.
•Characteristics:
• Outcome-based: Indicates the success of strategies or processes after the fact.
• Historical: Provides insights into past performance.
Leading Indicators
• Leading indicators are process measures that measure incremental
progress toward key HR outcome (lagging) measures
• Since leading indicators measure the results from processes, there is less
of a delay between actions and a change in the system
• Performance drivers — the key factors that enable the overall end result
(outcome) to be achieved
CHARACTERISTICS
• Immediate feedback to the system
• Tells what is happening now
• Can be tracked over time
Examples:
• Increasing retention,
• A reduction in absenteeism in key positions,
• % increase in internal people expressing interest in position,
• Number of positive comments from customers
Leading Indicators
Matrix Type: Efficiency Matrix.
• Focus: Predictive measures that signal future performance or outcomes.
• Purpose: To assess the efficiency of current processes and predict future
success.
• Examples:
• Time-to-Fill: Measures how efficiently the hiring process is filling positions.
• Cost-per-Hire: Evaluates the cost-effectiveness of the recruitment process.
Characteristics:
• Predictive: Offers early signals about future outcomes.
• Process-focused: Reflects current process efficiency and its potential
impact on future results.
[Figure: Leading indicators over time (Time 1 through Time 4, spanning roughly 3–12, 3–18, and 12–36 months) – disengagement, decreased satisfaction, and job searching progress into absenteeism, reduced quality, and decreased individual performance, and in turn into turnover, poor customer relationships, and ultimately decreased firm performance.]
Dimension: EFFICIENCY
Explanation: Metrics which assess HR efficiency through a focus on productivity and cost.
Metrics and focus:
• Cost per joining – adhere to recruitment budgets
• Average lead time from requirement received by TAG to offer made – recruiter efficiency
• Channel mix – adhere to recruitment budgets
• Joiner per recruiter – recruiter productivity

Dimension: EFFECTIVENESS
Explanation: HR programs and practices that have the intended effect on the people or talent pools toward which they are directed. Typical metrics include measures of strategic skills and core competencies in the workforce, how pivotal jobs are filled, etc.
Metrics and focus:
• Gender mix – affirmative action to increase employee diversity
• Offer reject and renege – this measure of recruiter/employer engagement ability impacts demand fulfilment and overall workforce productivity
• Channel mix – how well different recruitment channels work to attract and hire the right talent
Dimension: IMPACT
Explanation: Demonstrating the link between what HR does and the tangible effect on the organization’s ability to gain and sustain competitive advantage. Operational effectiveness impact metrics focus on changes in business performance (increased speed and reduced defects) that occur when the quality of talent is improved.
Metrics and focus:
• Fulfilment ratio – impacting billability and utilization, and influencing profitability
• Company-initiated attrition during probation – improving quality of talent
• Early attrition – linked to quality of hire, P–O and P–J fit, and loss owing to attrition (revenue loss, loss of training investment done, new training investment required, etc.)
• Panel performance rating distribution – linked to selecting quality talent, better P–J fit
• Panel tenure distribution – linked to selecting talent with a better P–O fit
• % of TP hiring – salary costs; directly drives profitability for projects
• Average resource cost vs. hiring premium – salary costs; directly drives profitability for projects
Anatomy of Statistical Modeling
• Data Discovery - Understanding the business problem
• Data Preparation
• Data Modeling - Model Selection and Building
• Model Validation and Implementation
Understanding the Business Problem
• Define the business problem
• Define the objectives
• Investigate the question and gather requirements
• Convert business problem into a statistical problem
Data Discovery and Collection
Process of acquiring appropriate data from the database of an
organization for solving any business problem
Understanding Data architecture
Data list preparation and identification of data sources
Initial Data collection
Data Dictionary – Define variables and create data dictionary
Data Verification – validate for correctness
• Data sets for an individual, like: distance from home, gender, marital
status, time of offer, education level, total experience
• Data sets for groups like: recruitment data, absenteeism figures,
productivity data, personal development interviews, competence
profiles, staff satisfaction and health data
Data Preparation
Process of gathering, combining, and structuring the data so that we
can perform analysis by feeding the right information to the model
1. Univariate analysis
2. Data cleaning
   a. Outlier treatment
   b. Missing value treatment
3. Feature engineering
   a. Variable creation
   b. Data transformation
   c. Dimension reduction
4. Bivariate analysis and hypothesis testing
5. Data split
   a. Training set
   b. Testing set
Analytics Life Cycle
What is Factor Analysis ??
• Variable reduction technique: decreases the number of variables and
clusters them under factors
• To identify the underlying structure of relationships among variables
and classify them into homogeneous groups or clusters, referred to as
factors
• To remove redundancy or duplicity from a set of correlated variables
• To identify and distinguish between Latent variables (that are called
factors) and Observed variables within the data set
• Factor analysis is a technique that requires a large sample size.
• Factor analysis is based on the correlation matrix of the
variables involved, and correlations usually need a large
sample size before they stabilize.
• Factor analysis aims to find independent latent variables.
What is a factor?
• The key concept of factor analysis is that multiple observed variables
have similar patterns of responses because they are all associated
with a latent (i.e. not directly measured) variable.
• For example, people may respond similarly to questions about
income, education, and occupation, which are all associated with the
latent variable socioeconomic status.
What is a Latent variable ??
• A latent variable is a variable that is inferred using models from observed
data
• Variables that are not directly observed but are rather inferred from
other variables that are observed (directly measured)
• Latent variables are measured using a series of questions that are all
designed to measure the latent variable.
• This is known as a multi-item scale, where an “item” is a question, and a
“scale” is the resulting estimate of the latent variable.
• An example in psychology is intelligence (a.k.a. cognitive ability),
which is inferred from the answers to an IQ test (the observed data)
obtained by asking lots of questions.
Assumptions of Factor Analysis
• Data set usually interval in nature – Measurement of variables in
interval scale
• Ordinal scale- scores are presented in Likert scale form
• Variables related to factor analysis – need to be linearly correlated
• Variables should exhibit a moderate to high degree of correlation
Uses of Factor Analysis
• Developing psychometric test and different scales
• Helps in deciding the factor structure of the items
• Identify latent dimensions
Types of Factoring
• Principal Components Analysis – extracts the maximum variance for the 1st factor,
removes it, and then extracts the maximum remaining variance for the 2nd factor, and so on.
• Common Factor Analysis: extracts the common variance and puts
it into factors. This method is used in SEM.
• Image Factoring: this method is based on the correlation matrix. An OLS
regression method is used to predict the factor in image factoring.
• Maximum Likelihood Method: this method also works on the correlation
matrix but uses the maximum likelihood method to extract factors.
• Other methods: Alpha Factoring, Weighted Least Squares
Process in Factor Analysis
• Estimate Communalities: the proportion of each variable’s
variance that can be explained by the factors.
• Also denoted h², defined as the sum of squared factor loadings
for the variable.
• Factor Loading: the relation of each variable to the underlying factor. It
shows the variance explained by the variable on that particular factor.
• Eigenvalue: a number telling you how much variance there is in the data
in a given direction, i.e., how spread out the data are along that factor.
• Factor score: a weighted sum of the items.
Factor rotation: a procedure in which the eigenvectors (factors) are
rotated in an attempt to achieve simple structure. Rotations can be
orthogonal or oblique; most studies use an orthogonal (varimax) rotation.
• Orthogonal rotations impose the restriction that the factors
cannot be correlated.
• Oblique rotations, such as promax, allow the factors to be
correlated with one another.
Factor Loading
Criteria for determining the number of factors (see the sketch below):
• Eigenvalues are a good criterion for determining a factor.
• If a factor’s eigenvalue is greater than one, we should consider it a
factor; if its eigenvalue is less than one, we should not.
• According to the variance extraction rule, the variance extracted should
be more than 0.7; if it is less than 0.7, we should not consider that a
factor.
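The eigenvalue-greater-than-one rule can be checked directly from the correlation matrix. A sketch on made-up survey items (the data are simulated around a single latent trait, so roughly one factor should be retained):

```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))                                        # one made-up latent trait
items = latent @ np.ones((1, 6)) + rng.normal(scale=0.8, size=(200, 6))   # six correlated survey items

corr = np.corrcoef(items, rowvar=False)                  # correlation matrix of the items
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]    # eigenvalues, largest first

n_factors = int((eigenvalues > 1).sum())                 # retain factors with eigenvalue > 1
print(eigenvalues.round(2), n_factors)                   # expect roughly one dominant factor
```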
Types of Factor Analysis
• Exploratory Factor Analysis (EFA)
• Confirmatory Factor Analysis (CFA)
Exploratory Factor Analysis (EFA)
• To identify the total number of factors that exist for the given correlated
variables.
• It also helps to determine the correlation between the variables and the
factors in the data set.
Confirmatory Factor Analysis (CFA)
• To confirm or validate the priori theorized or hypothesized factor structure
and its underlying variables.
Consumer Engagement
People Analytics
Session 3
HR Analytics Tools and Techniques
Case: TrustSphere: Building a Market for Relationship
Analytics
• What are TrustSphere’s main products? What are their primary uses?
• How could TrustSphere’s product line be applied to each of the three
primary markets they targeted?
• Rate the value of TrustSphere’s products on a scale of 1 (worst) to 5
(best).
If data analytics is so valuable, why is TrustSphere not
more successful?
• How exactly is TrustSphere creating value? What organizational
problems are they trying to solve?
• Challenge #1: Insight without Action
• Challenge #2: Advocates without Purchasing Power
• Challenge #3: Missed Opportunities
• Challenge #4: Data Quality
Action Plan
• Action #1: Providing Actionable Insights
• Action #2: Marketing to Leaders and Building a Network of Users
• Action #3: Exploiting More Opportunities
• Action #4: Improving Data Quality
Types of Data measurement scales
• Nominal : used for labeling variables, without any quantitative value
• Values represent categories with no intrinsic ranking (gender, religion,
industry)
• Nominal scales could simply be called “labels.”
Ordinal
• Value represent categories with some intrinsic ranking (eg: Likert scale)
• With ordinal scales, the order of the values is what’s important and significant,
but the differences between them are not really known
• Ordinal scales are typically measures of non-numeric concepts like satisfaction,
happiness, discomfort, etc.
Interval
• Interval scales are numeric scales in which we know both the
order and the exact differences between the values.
• Interval data - form of a numerical value where the difference
between points is standardized and meaningful.
• Example: Celsius temperature because the difference between each
value is the same.
Ratio
• The order is known, as is the exact difference between units
• Also have an absolute zero–which allows for a wide range of
both descriptive and inferential statistics to be applied.
• Everything above about interval data applies to ratio scales, plus ratio
scales have a clear definition of zero.
• Examples : height, weight, and duration.
Performing Factor Analysis – SPSS, MATLAB, R: Understanding Interpretations
• http://www.cs.uu.nl/docs/vakken/arm/SPSS/spss7.pdf
• https://stats.idre.ucla.edu/spss/output/factor-analysis/
• https://www.projectguru.in/publications/interpretation-of-factor-analysis-
using-spss/
• https://www.promptcloud.com/blog/exploratory-factor-analysis-in-r/
People Analytics
Session: 2
Course Instructor:
Amita Shivhare
People analytics at McKinsey: A case study
*Source: https://medium.com/oreillymedia/ai-adoption-in-the-enterprise-2020-e2263f781647
Supervised Machine Learning
• In supervised machine learning, you train the machine using data
that is well “labelled”, i.e. the input data is already tagged with the
correct answer, and the algorithm learns from this training data to
help you predict future outcomes.
How Does Supervised Learning Work?
• What are Features?
• In machine learning, features are the measurable properties or
characteristics of the data that the model uses to make decisions.
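A minimal supervised-learning sketch: made-up labelled HR data (feature rows plus an attrition label) trains a classifier, which then predicts a label for a new employee. scikit-learn's decision tree is used purely as an illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Features: [years of experience, training hours, weekly overtime hours] - made up
X = [[1, 5, 10], [7, 40, 2], [2, 8, 12], [10, 60, 1], [3, 10, 9], [8, 45, 3]]
y = [1, 0, 1, 0, 1, 0]   # label: 1 = employee left, 0 = stayed

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)                      # learns from the labelled examples

print(model.predict([[4, 12, 8]]))   # predicted label for a new, unseen employee
```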
Types of Supervised ML Algorithms - Techniques
Regression:
• Linear Regression
• Polynomial Regression
• Regression Trees
Classification:
• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machine
• K-Nearest Neighbors (KNN)
• Naive Bayes
Applications of Supervised Learning
Supervised learning is used in a wide variety of applications, including:
• Image classification: Identify objects, faces, and other features in images.
• Natural language processing: Extract information from text, such as
sentiment, entities, and relationships.
• Speech recognition: Convert spoken language into text.
• Recommendation systems: Make personalized recommendations to users.
• Predictive analytics: Predict outcomes, such as sales, customer churn, and
stock prices.
• Medical diagnosis: Detect diseases and other medical conditions.
• Fraud detection: Identify fraudulent transactions.
• Autonomous vehicles: Recognize and respond to objects in the
environment.
Applications of Supervised Learning
• Email spam detection: Classify emails as spam or not spam.
• Quality control in manufacturing: Inspect products for defects.
• Credit scoring: Assess the risk of a borrower defaulting on a loan.
• Gaming: Recognize characters, analyze player behavior, and create NPCs.
• Customer support: Automate customer support tasks.
• Weather forecasting: Make predictions for temperature, precipitation, and
other meteorological parameters.
• Sports analytics: Analyze player performance, make game predictions, and
optimize strategies.
Advantages of Supervised Learning
• Full control over what the machine is learning
• Easily test and debug model
• Can determine the number of classes
Disadvantages of Supervised Learning
• Has limited scope
• Collecting a labelled data set is expensive and time consuming
• Can produce wrong predictions
Unsupervised Machine Learning
• In unsupervised machine learning, you train the machine using data
that is “unlabelled”, and the model itself finds hidden patterns and
insights in the given data.
How does unsupervised learning work?
The goal of unsupervised learning is to group unlabelled data according
to similarities, patterns, and differences, without any prior training on
labelled data.
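A matching unsupervised-learning sketch: the records below carry no labels, and k-means groups them purely by similarity. The algorithm choice and feature values are illustrative assumptions:

```python
from sklearn.cluster import KMeans

# Unlabelled employee records: [age, weekly hours, engagement score] - made up
X = [[25, 45, 3.1], [48, 38, 4.5], [29, 50, 2.8], [52, 36, 4.7], [31, 47, 3.0], [45, 37, 4.4]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # cluster assignments discovered without any labels being provided
```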
Types of Unsupervised ML Algorithms
Unsupervised Learning Algorithms
Applications of Unsupervised Learning
• Here are some common applications of unsupervised learning:
• Clustering: Group similar data points into clusters.
• Anomaly detection: Identify outliers or anomalies in data.
• Dimensionality reduction: Reduce the dimensionality of data while
preserving its essential information.
• Recommendation systems: Suggest products, movies, or content
to users based on their historical behavior or preferences.
• Topic modeling: Discover latent topics within a collection of
documents.
• Image and video compression: Reduce the amount of storage
required for multimedia content.
• Data preprocessing: Help with data preprocessing tasks such as
data cleaning, imputation of missing values, and data scaling.
• Market basket analysis: Discover associations between products.
• Genomic data analysis: Identify patterns or group genes with similar
expression profiles.
• Image segmentation: Segment images into meaningful regions.
• Community detection in social networks: Identify communities or
groups of individuals with similar interests or connections.
• Customer behavior analysis: Uncover patterns and insights for
better marketing and product recommendations.
• Content recommendation: Classify and tag content to make it easier
to recommend similar items to users.
• Exploratory data analysis (EDA): Explore data and gain insights
before defining specific tasks.
Advantages of Unsupervised Learning
• Used for more complex tasks
• Helpful in finding patterns in data
• Saves lot of work and expense
Disadvantages of Unsupervised Learning
• Less accuracy
• Time Consuming
• More the features, more the complexity
AI Tools for Recruitment
Ideal
• An innovative recruitment platform using artificial intelligence to
streamline the process of finding exceptional talent swiftly and
accurately.
Textio
• An online platform that uses machine learning in HR to scrutinize job
descriptions.
Employee Attrition Prediction and Retention Strategies
AI Tools for Employee Attrition
• Humanlytics
This platform utilizes advanced machine learning algorithms to
meticulously analyze employee data and provide invaluable insights.
Its sophisticated technology digs deep into the intricate factors that
influence employee turnover rates, engagement levels, and overall
productivity within an organization.
Personalized Learning and Development
• AI Tools for Learning and Development
• Degreed
Degreed is an innovative learning platform that utilizes the power of machine
learning (ML) to curate personalized educational experiences for each user. Analyzing
individual preferences, learning styles, and skill levels, Degreed's sophisticated
algorithms generate tailored recommendations for courses, resources, and
development opportunities.
• Pluralsight
It is an innovative online education platform that utilizes machine learning
technology to create customized learning paths tailored to each individual's needs
and preferences.
Diversity and Inclusion Enhancement
Textio Tone
• Textio Tone is an amazing machine learning software that helps identify
biased or unfair language and tones in job descriptions and other
important communications. This powerful tool analyses the texts and
highlights any potentially discriminatory or exclusionary wording, allowing
companies to create more inclusive and welcoming job postings.
Pymetrics
• Their platform employs a series of interactive tests designed to analyze an
individual's cognitive abilities and emotional traits objectively.
Workforce Optimization
• Visier
It is an innovative people analytics platform that uses the power of
machine learning (ML) to analyze workforce data. Its advanced algorithms
analyze complex datasets, providing valuable insights that empower
strategic workforce management.
• SuccessFactors:
It is a comprehensive HR software suite designed to streamline and
enhance various aspects of human resource management.
Sentiment Analysis and Employee Feedback
AI Tools for Employee Feedback
• Xander
Employee feedback plays a big role in keeping workers happy and satisfied.
Xander is a useful tool that looks at what employees say and how they feel.
It uses machine learning applications for HR to analyze employee feedback.
• Peakon
Peakon is another helpful program for employee engagement. It also uses
machine learning to study what employees say. The goal is to find areas
where things could be better.
People Analytics
Session: 1
Introduction- Why, What
and How of HR Analytics
Course Instructor:
Amita Shivhare
HR Continues to Evolve
& the model defines our aspirations
[Figure: the evolution of the HR function – from Labor Relations and Employee Relations (limiting liability and protecting the downside through safety and compliance, workers’ compensation, and labor/union relations) through Personnel and Human Resources (staffing, EEO/AA, training and development, performance management, compensation, benefits, HRIS, survey action planning) to Organizational Effectiveness (strategic HR planning, culture, organizational design and image, HR as business partner), with rising impact and contribution to the business – adding value and maximizing the upside.]
A Century of Evolution in the Function
What is HR/People Analytics?
Why you should apply analytics to your
people strategy?
Or
Why do we need HR Analytics?
• Need to understand the efficiency, effectiveness, and financial impact
of HR initiatives.
• Empowers executives, line managers, and HR teams to easily gain
insight into detailed information about the people and the processes
in their organization
What does HR Analytics do?
• To make better decisions and get better results
• Create business value
• To improve organizational performance and profitability
People Analytics Maturity Model
Descriptive Analytics
• Answer the question: WHAT happened
• Give the organization insights about its current state, enable comparison
against other similar organizations, and allow managers to identify and
solve current issues
• Findings may highlight that something is wrong; however, they may not
explain WHY
For example:
• What percentage of your workforce will retire in the next year or in the
next five years?
• What is the average age in your organization? In the last five years, has it
increased, decreased, or remained the same?
• What is your revenue per employee? Is it higher than your competitors’?
How does it vary across lines of business or geographies?
• How many workplace injuries per 100 employees occur on an annual basis?
How does this compare across your locations and within your industry?
• What percentage of your employees are fully engaged and are doing their
best work at all times? How does this vary across functions and lines of
business?
• What percentage of your employees would say that they are committed to
the organization?
• What is your voluntary turnover rate among key executives? What are the
associated costs to the business?
• What percentage of new hires are terminated in the first six months?
Diagnostic Analytics
• Answer the question: WHY it happened
• Understands the causal relation between two variables.
• It gives us in-depth insights into a particular problem.
• It uncovers the cause of events as revealed by descriptive data.
• So, if you know the problem you can resolve the problem
Business Problems such as:
• WHY performance is bad or good
• WHY a percentage of critical roles are unfilled as of today?
• WHY your high performers are at high risk of departure?
• WHY your involuntary termination rate is higher?
People Analytics Maturity Model
Predictive Analytics
• Primarily means telling what is likely to happen
• It predicts the future
• What can happen in the future, based on details of past events
• Instead of just reporting what has happened, it helps in stopping bad
outcomes before they happen
Prescriptive Analytics
• To prescribe - Actions to be taken
• What and Why both.
• HR is moving from a technical/administrative role to providing the
strategic guidance that organizations need.
People Analytics Maturity Model
Descriptive Analytics and Diagnostics Analytics offer a
reactive approach
Predictive Analytics and Prescriptive Analytics offer a
proactive approach
People Analytics Stairway
Measuring People Analytics’ Analytical Maturity
• Question 1: Does HR provide dashboards, such as headcount, terminations, transfers,
leave, and recruitment, with relevant metrics to managers and executives?
• Question 2: Can HR analyse diversity, pay, and attrition rates for different groups of
employees based on performance, years of service, etc.?
• Question 3: Does HR conduct external benchmarking of employee data, such as job grading
and salary surveys?
• Question 4: Does HR continually develop predictive models to support strategic decision
making, such as conducting an A/B test to check whether an HR intervention worked? An
example is studying the effectiveness of a two-day face-to-face onboarding training vs.
an online onboarding program.
Points that really matter
• HR actually needs to know how the business makes money.
• HR needs to understand the science behind talent.
• Then there is the quant part, which has become emergent: HR doesn’t
have to be a statistician, but HR does need to know enough to be a good
consumer of statistics.
Instinct has no role to play in the recruiting, development,
management, and retention of employees – or in identifying the
combination of people skills that drives great performance.
It appears that executives who can complement experience-based
wisdom with analytically driven insight stand a much better chance of
linking their talent efforts to business value