Ba Report

Uploaded by

hm4000981


1. Introduction

This report analyzes a dataset containing personal and academic information about a group of students. The goal is to predict salaries and estimate the probability of placement, particularly for one specific student, Sarah.

The dataset includes variables such as age, gender, entry exam scores, work experience, and known salary details. Statistical techniques such as hypothesis testing, regression analysis, and descriptive statistics are used to predict placement and salary for individuals whose data are incomplete or missing.

The primary objectives of this report are:

1. Probability of Placement Prediction for Sarah: Based on her profile (age, exam score, work experience, etc.), this report estimates the likelihood that Sarah will secure a job placement.

2. Hypothesis Testing on Known Salaries: Hypothesis tests are conducted on the available salary data to determine whether variables such as gender, work experience, and exam scores have a statistically significant impact on salary outcomes.

3. Salary Estimation for Missing Data: For students whose salary information is missing, predictive models are used to estimate salaries based on the known relationships between other factors in the dataset.

4. Correlation and Regression Analysis: The relationships between key variables, such as age, work experience, and salary, are examined through correlation analysis. Additionally, multiple regression analysis is used to predict Sarah’s salary based on her demographic and academic profile.

5. Descriptive and Visual Analysis: To summarize and visualize the data, descriptive statistics such as mean, median, and mode are calculated for various factors, along with graphical representations of trends in the dataset.

2. Data Cleaning & Preparation

The dataset provided included several missing values, marked as either “NA” (Not Available) or “NP” (Not Provided), which were removed or otherwise addressed before further analysis.

The data cleaning process involved the following steps:

1. Handling Missing Values:
   a. Removal of Entries: Entries with “NA” or “NP” in key fields such as salary, age, or work experience were removed. This step ensured that the remaining dataset included only complete entries that could be used for accurate analysis.
2. Consistency Check:
   a. After handling missing values, the dataset was reviewed for consistency across all columns. For example, ages were checked to ensure they fell within a reasonable range for students, and work experience was cross-checked with age to avoid inconsistencies (e.g., unusually high work experience for very young individuals).
3. Variable Formatting:
   a. All numeric fields, such as age, exam scores, work experience, and salary, were converted to a consistent numeric format. This was important for conducting statistical analyses like correlation, regression, and hypothesis testing.
4. Categorical Variables:
   a. Categorical variables, such as gender, were converted into numerical codes to allow for statistical analysis. For example, gender was coded as “0” for male and “1” for female to enable comparisons in hypothesis testing and regression models.

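The cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used for the report: the column names, the sample rows, and the minimum working age used in the consistency check are all assumptions made for the example.

```python
# Hypothetical raw rows; column names and values are illustrative only.
RAW_ROWS = [
    {"age": "25", "gender": "F", "exam_score": "680", "work_exp": "2", "salary": "42000"},
    {"age": "28", "gender": "M", "exam_score": "710", "work_exp": "5", "salary": "NA"},
    {"age": "NP", "gender": "F", "exam_score": "650", "work_exp": "1", "salary": "40000"},
    {"age": "22", "gender": "M", "exam_score": "600", "work_exp": "15", "salary": "38000"},
]

MISSING_MARKERS = {"NA", "NP"}
NUMERIC_FIELDS = ("age", "exam_score", "work_exp", "salary")
GENDER_CODES = {"M": 0, "F": 1}  # coding stated in the report: 0 = male, 1 = female
MIN_WORKING_AGE = 16             # assumed threshold for the consistency check


def clean_rows(rows):
    """Apply the four cleaning steps and return only complete, consistent rows."""
    cleaned = []
    for row in rows:
        # Step 1: drop entries with "NA" or "NP" in any numeric field.
        if any(row[field].strip() in MISSING_MARKERS for field in NUMERIC_FIELDS):
            continue
        # Step 3: convert numeric fields to a consistent numeric type.
        record = {
            "age": int(row["age"]),
            "exam_score": float(row["exam_score"]),
            "work_exp": float(row["work_exp"]),
            "salary": float(row["salary"]),
            # Step 4: encode gender as 0/1.
            "gender": GENDER_CODES[row["gender"]],
        }
        # Step 2: work experience cannot exceed the years since MIN_WORKING_AGE.
        if record["work_exp"] > record["age"] - MIN_WORKING_AGE:
            continue
        cleaned.append(record)
    return cleaned


cleaned = clean_rows(RAW_ROWS)
print(len(cleaned), "row(s) kept:", cleaned)
```

In this sample, the second and third rows are dropped for missing values and the fourth fails the consistency check (15 years of experience at age 22), so only the first row survives.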
3. Probability of Placement for Sarah

A key objective of this analysis is to estimate the probability of placement for Sarah. Using logistic regression, we can calculate the likelihood that Sarah will secure a job placement based on the factors available in the dataset.

3.1 Variables Considered

The logistic regression model was built using the following variables:

• Age: Sarah is 25 years old.
• Work Experience: Sarah has 2 years of professional experience.
• Entry Exam Score: Sarah's score is 680.
• Gender: Sarah is female.

3.2 Logistic Regression for Placement Probability

Logistic regression is a statistical method for predicting the probability of a binary outcome (in this case, whether or not a student is placed) based on one or more independent variables. The dependent variable here is placement status (coded as 1 for placed and 0 for not placed), and the independent variables are age, work experience, entry exam score, and gender. Sarah's values for these variables are then entered into the fitted model to obtain her placement probability.
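As a rough sketch of how a fitted logistic model turns a profile into a probability, the snippet below applies the logistic function to a linear predictor. The coefficients are hypothetical placeholders chosen only for illustration; the report does not list the fitted values, so the number printed here is not a result of the analysis.

```python
import math

# Hypothetical coefficients for illustration only; the report does not give
# the fitted values, so these numbers are NOT results from the analysis.
ILLUSTRATIVE_COEFFS = {
    "intercept": -8.0,
    "age": 0.02,
    "work_exp": 0.60,
    "exam_score": 0.01,
    "gender": 0.0,  # gender coded 0 = male, 1 = female
}


def placement_probability(profile, coeffs):
    """Return P(placed = 1) = 1 / (1 + exp(-z)), where z is the linear predictor."""
    z = coeffs["intercept"] + sum(coeffs[name] * value for name, value in profile.items())
    return 1.0 / (1.0 + math.exp(-z))


# Sarah's profile as stated in Section 3.1.
sarah = {"age": 25, "work_exp": 2, "exam_score": 680, "gender": 1}
p_sarah = placement_probability(sarah, ILLUSTRATIVE_COEFFS)
print(f"Illustrative placement probability: {p_sarah:.3f}")
```

With a positive coefficient on work experience, increasing `work_exp` in the profile raises the returned probability, which mirrors the direction of the relationship the report describes.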

3.3 Key Findings

• Work Experience: The analysis revealed that work experience has a strong positive impact on placement probability. Students with more years of professional experience were more likely to be placed.
• Exam Score: Higher entry exam scores were also positively associated with placement probability, suggesting that students with higher academic performance were more attractive to employers.
• Gender: Gender was not found to have a statistically significant impact on placement probability in this analysis.

For Sarah, the logistic regression model estimated a high probability of placement, driven primarily by her exam score and work experience. This suggests that Sarah is well positioned to secure a job, and that her chances of placement are favorable compared with the average student in the dataset.

3.4 Visual Representation

To better understand how Sarah's probability compares to others, Figure 1 plots the relationship between work experience and placement probability.

Figure 1: Placement Probability vs. Work Experience

Figure 2: Logistic Regression

4. Hypothesis Testing on Known Salaries

This section tests whether factors such as gender, work experience, and entry exam scores have a statistically significant impact on salaries. We focus on differences in salary outcomes based on the demographic data available in the dataset. By conducting t-tests and ANOVA, we assess whether certain groups have significantly different salary levels.

4.1 Hypothesis Testing for Gender and Salary

One of the key questions in salary analysis is whether there is a significant difference in salaries based on gender. To test this, a two-sample t-test was conducted, comparing the mean salaries of males and females in the dataset.

• Null Hypothesis (H₀): There is no significant difference in the average salaries between males and females.
• Alternative Hypothesis (H₁): There is a significant difference in the average salaries between males and females.

The t-test statistic was calculated to determine whether the observed difference in mean salary between the two groups is statistically significant. The test was performed at the 5% significance level (95% confidence level).

Results:

The p-value obtained from the t-test was 0.23, which is higher than the
alpha level of 0.05. This result indicates that there is no statistically
significant difference between the salaries of males and females in the
dataset. Therefore, the null hypothesis cannot be rejected, suggesting that
gender alone does not explain salary variations in this sample.
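For readers who want to see how the test statistic in a two-sample comparison like the one above is formed, here is a minimal sketch. The report does not say whether equal variances were assumed, so this example uses Welch's unequal-variance form as an assumption; the salary figures (in thousands) are made up and do not come from the dataset.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical salaries in thousands; NOT taken from the report's dataset.
male_salaries = [52, 48, 55, 60, 45]
female_salaries = [50, 47, 53, 58, 49]

t_stat, dof = welch_t(male_salaries, female_salaries)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```

Turning the statistic into a p-value requires the t distribution, which the standard library does not provide; in practice a statistics package would typically be used for that step, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`.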

4.2 Hypothesis Testing for Work Experience and Salary

Work experience was another variable tested for its impact on salary. We hypothesized that individuals with more work experience would have higher salaries. An ANOVA (Analysis of Variance) test was conducted to determine whether the mean salaries differ significantly among different levels of work experience.

• Null Hypothesis (H₀): There is no significant difference in the average salaries across different work experience groups.
• Alternative Hypothesis (H₁): There is a significant difference in the average salaries across different work experience groups.

The ANOVA test was conducted across various experience ranges (e.g., 0-2
years, 3-5 years, 6+ years).

Results:

The ANOVA test resulted in a p-value of 0.01, which is below the 0.05 threshold, so the null hypothesis is rejected. This indicates that mean salary differs significantly between at least some of the work experience groups. In this dataset, individuals with more work experience tend to have higher salaries, which is consistent with experience being a strong determinant of salary outcomes.
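The one-way ANOVA used above compares variation between group means with variation inside each group. The sketch below computes the F statistic by hand for three experience bands; the group values (salaries in thousands) are invented for illustration and are not the report's data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of samples."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical salaries (thousands) for the 0-2, 3-5, and 6+ year bands.
experience_groups = [
    [40, 42, 44],
    [50, 52, 54],
    [60, 62, 64],
]

f_stat, df_between, df_within = one_way_anova_f(experience_groups)
print(f"F({df_between}, {df_within}) = {f_stat:.1f}")  # prints F(2, 6) = 75.0
```

A large F value means the group means are far apart relative to the spread within groups. The p-value is then read from the F distribution with the two degrees of freedom shown, which normally requires a statistics library.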

4.3 Hypothesis Testing for Exam Scores and Salary

A similar analysis was conducted to explore whether entry exam scores have a significant effect on salary. Higher scores could be expected to correlate with better job placements and higher salaries.

• Null Hypothesis (H₀): There is no significant difference in the average salaries across different exam score ranges.
• Alternative Hypothesis (H₁): There is a significant difference in the average salaries across different exam score ranges.

The ANOVA test was conducted by grouping the data into ranges of exam
scores (e.g., below 600, 600-700, above 700).

Results:

The test produced a p-value of 0.18, which is above the 0.05 threshold, indicating no significant difference in salary across different exam score groups. This suggests that while academic performance may influence placement probability, it does not necessarily translate to higher salaries in this dataset.

4.4 Overall Hypothesis Testing

• Gender: No statistically significant difference in salaries based on gender.
• Work Experience: A significant impact on salary, with higher experience associated with higher salaries.
• Exam Scores: No statistically significant effect of exam scores on salary, although they may still influence placement outcomes.

5. Salary Estimation for Individuals with Missing Placement Data

In this analysis, some individuals lacked salary information, making it necessary to estimate their potential salaries based on other available data, such as age, work experience, gender, and exam scores. Using statistical modeling techniques, we can predict the likely salary for these individuals, allowing for a more complete understanding of salary distribution across the dataset.

5.1 Methodology for Salary Estimation

To estimate the salaries for individuals with missing placement data, a multiple linear regression model was used. This allowed us to predict the salary based on several independent variables that are known for each individual. The independent variables included in the model are:

• Age: A key demographic factor that could influence salary.
• Work Experience: One of the most significant predictors of salary, as shown in the previous hypothesis testing.
• Entry Exam Score: While not statistically significant in hypothesis testing, exam scores were still included as they may hold some predictive value in combination with other factors.
• Gender: Although no significant impact of gender on salary was found in the hypothesis testing, it remains part of the model to control for any potential confounding with other variables.

5.2 Regression Model Results

The regression model was trained using the available data of individuals whose salary information was known. The resulting coefficients from the model were applied to predict the salaries of individuals with missing salary data. Key findings include:

• Work Experience: As expected, work experience had the largest positive effect on predicted salary, with more experienced individuals likely to earn higher salaries.
• Age: Age also contributed positively, though to a lesser extent than work experience.
• Entry Exam Score: While the exam score had a smaller influence, it still contributed positively to the salary prediction.
• Gender: Gender had no significant effect on the salary prediction, aligning with the results from hypothesis testing.
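The fit-then-predict workflow described in Sections 5.1 and 5.2 can be sketched as follows. This is an illustration under stated assumptions, not the report's actual model: the software used is not named in the report, so a small normal-equations solver stands in for it, and the training rows are synthetic values generated from a made-up formula (salary in thousands) so that the fit can be checked.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_ols(features, targets):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in features]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Predicted salary for one (age, work_exp, exam_score, gender) row."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))


def synthetic_salary(row):
    """Made-up generating formula (thousands); NOT the report's fitted model."""
    age, work_exp, exam_score, _gender = row
    return 10 + 0.5 * age + 4.0 * work_exp + 0.02 * exam_score


# Synthetic training rows: (age, work_exp, exam_score, gender with 1 = female).
known_rows = [
    (22, 0, 600, 0), (24, 2, 680, 1), (25, 1, 640, 0), (27, 4, 700, 1),
    (28, 5, 710, 0), (30, 6, 650, 1), (26, 3, 720, 0),
]
known_salaries = [synthetic_salary(row) for row in known_rows]

coeffs = fit_ols(known_rows, known_salaries)

# An individual whose salary is missing is scored with the fitted coefficients.
missing_row = (23, 1, 660, 1)
estimate = predict(coeffs, missing_row)
print(f"Estimated salary (thousands): {estimate:.2f}")
```

Because the synthetic data are exactly linear, the fitted coefficients reproduce the generating formula and the estimate matches it closely. With real salary data the fit would be approximate, and the residuals and coefficient significance would need to be examined before trusting the estimates.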

5.3 Estimating Salaries for Missing Data

Using the regression model, we estimated the salaries of individuals whose placement and salary data were missing. The estimated salaries were incorporated into the dataset to ensure a more comprehensive analysis of salary trends and distributions.

For example:

• Individual A: A 28-year-old male with 5 years of work experience and an exam score of 710 was predicted to have a salary of approximately Rs. 60,000.
• Individual B: A 24-year-old female with 2 years of work experience and an exam score of 680 was predicted to have a salary of approximately Rs. 45,000.

These predictions allowed for a more complete understanding of how salary outcomes might vary for individuals whose placement data was initially missing.
