Ba Report

Uploaded by hm4000981
Copyright © All Rights Reserved

1. Introduction

This report provides an analysis of a dataset that contains personal and
academic information about a group of students, with the goal of predicting
salaries and determining the probability of placement, particularly for a
specific student, Sarah.

The dataset includes variables such as age, gender, entry exam scores, work
experience, and known salary details. Using statistical techniques such as
hypothesis testing, regression analysis, and descriptive statistics, the
analysis makes predictions about the placement and salary of individuals
whose data may be incomplete or missing.

The primary objectives of this report are:

1. Placement Probability for Sarah: Based on her profile (age, exam
score, work experience, etc.), this report will determine the likelihood
that Sarah will secure a job placement.

2. Hypothesis Testing on Known Salaries: Hypothesis tests are
conducted on the available salary data to determine whether specific
variables such as gender, work experience, and exam scores have a
statistically significant impact on salary outcomes.

3. Salary Estimation for Missing Data: For students whose salary
information is missing, predictive models will be used to estimate their
salaries based on the known relationships between other factors in the
dataset.

4. Correlation and Regression Analysis: The relationships between key
variables, such as age, work experience, and salary, will be examined
through correlation analysis. Additionally, multiple regression analysis
will be used to predict Sarah’s salary based on her demographic and
academic profile.

5. Descriptive and Visual Analysis: To summarize and visualize the
data effectively, descriptive statistics such as the mean, median, and
mode will be calculated for various factors, along with graphical
representations of trends in the dataset.
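
As a rough illustration of the descriptive and correlation steps in objectives 4 and 5, the Python sketch below computes the mean, median, and mode of one column and the Pearson correlation between two columns, using only the standard library. The report does not state which software it used; the function names and the sample columns are hypothetical and only show the calculations.

```python
import statistics
from math import sqrt

def describe(values):
    """Mean, median and mode(s) of one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode lists every most-frequent value, so ties are not hidden
        "mode": statistics.multimode(values),
    }

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length (at least 2 values)")
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # A constant column has zero variance and no defined correlation
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / sqrt(sxx * syy)

# Hypothetical columns, for illustration only (not values from the report)
ages = [22, 24, 25, 25, 28]
work_experience = [0, 1, 2, 2, 5]
print(describe(ages))
print(pearson_r(ages, work_experience))
```
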
2. Data Cleaning & Preparation

The dataset contained several missing values, marked as either “NA” (Not
Available) or “NP” (Not Provided), which were removed or otherwise addressed
before further analysis.

The data cleaning process involved the following steps:

1. Handling Missing Values:
a. Removal of Entries: Entries with “NA” or “NP” in key fields
such as salary, age, or work experience were removed. This step
ensured that the remaining dataset included only complete
entries that could be used for accurate analysis.
2. Consistency Check:
a. After handling missing values, the dataset was reviewed for
consistency across all columns. For example, ages were checked
to ensure they fell within a reasonable range for students, and
work experience was cross-checked with age to avoid
inconsistencies (e.g., unusually high work experience for very
young individuals).
3. Variable Formatting:
a. All numeric fields, such as age, exam scores, work experience,
and salary, were standardized to ensure uniformity in format.
This was important for conducting statistical analyses like
correlation, regression, and hypothesis testing.
4. Categorical Variables:
a. Categorical variables, such as gender, were converted into
numerical codes to allow for statistical analysis. For example,
gender was coded as “0” for male and “1” for female to enable
comparisons in hypothesis testing and regression models.

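The four cleaning steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each record is a dict with hypothetical keys (`age`, `work_exp`, `exam_score`, `salary`, `gender`), the age range of 18 to 60 and the rule that work experience cannot exceed age minus 16 are illustrative thresholds rather than the report's own, and "standardized" is read as converting every numeric field to a common float type.

```python
MISSING = {"NA", "NP"}  # markers used in the dataset for missing values
# Hypothetical column names; adjust to the dataset's actual headers
KEY_FIELDS = ("age", "work_exp", "exam_score", "salary", "gender")
GENDER_CODES = {"male": 0, "female": 1}

def is_missing(value):
    return value is None or str(value).strip().upper() in MISSING

def clean(records, min_age=18, max_age=60, min_working_age=16):
    """Drop incomplete or inconsistent rows, cast numbers to float, code gender as 0/1."""
    cleaned = []
    for row in records:
        # 1. Missing values: skip rows with NA/NP (or absent) key fields
        if any(is_missing(row.get(f)) for f in KEY_FIELDS):
            continue
        gender = str(row["gender"]).strip().lower()
        if gender not in GENDER_CODES:
            continue
        age, work_exp = float(row["age"]), float(row["work_exp"])
        # 2. Consistency: plausible age, and work experience that fits the age
        #    (the age limits and working-age cut-off are assumptions)
        if not (min_age <= age <= max_age):
            continue
        if work_exp < 0 or work_exp > age - min_working_age:
            continue
        # 3-4. Uniform numeric types and a numeric gender code
        cleaned.append({
            "age": age,
            "work_exp": work_exp,
            "exam_score": float(row["exam_score"]),
            "salary": float(row["salary"]),
            "gender": GENDER_CODES[gender],
        })
    return cleaned
```

Rows that fail any check are dropped rather than corrected, which matches the removal approach described in step 1.
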
3. Probability of Placement for Sarah

A key objective of this analysis is to estimate Sarah's probability of
placement. Using logistic regression, we can calculate the likelihood that
Sarah will secure a job placement based on the factors available in the
dataset.

3.1 Variables Considered

The logistic regression model was built using the following variables:

• Age: Sarah is 25 years old.
• Work Experience: Sarah has 2 years of professional experience.
• Entry Exam Score: Sarah's score is 680.
• Gender: Sarah is female.

3.2 Logistic Regression for Placement Probability

Logistic regression is a statistical method for predicting the probability of
a binary outcome (in this case, whether or not a student is placed) from one
or more independent variables. The dependent variable here is placement
status (coded as 1 for placed and 0 for not placed), and the independent
variables are age, work experience, entry exam score, and gender. Once the
model has been fitted on the students in the dataset, Sarah's values for
these variables are substituted into it to obtain her predicted probability.

3.3 Key Findings

• Work Experience: The analysis revealed that work experience has a
strong positive impact on placement probability. Students with more
years of professional experience were more likely to be placed.
• Exam Score: Higher entry exam scores were also positively correlated
with placement probability, indicating that students with higher
academic performance were more attractive to employers.
• Gender: Gender was not found to have a statistically significant
impact on placement probability in this analysis.

For Sarah, the logistic regression model estimated a high probability of
placement, driven primarily by her exam score and work experience. This
suggests that Sarah is well-positioned to secure a job and that her chances
of placement are favorable compared with those of the average student in the
dataset.

3.4 Visual Representation

To show how Sarah's probability compares with that of other students, the
relationship between work experience and placement probability is plotted
below.

Figure 1: Placement Probability vs. Work Experience

Figure 2: Logistic Regression


4. Hypothesis Testing on Known Salaries

Hypothesis tests were used to examine whether factors such as gender, work
experience, and entry exam scores have a significant impact on salaries. This
section focuses on testing differences in salary outcomes across the
demographic and academic groups available in the dataset. Using t-tests and
ANOVA, we assess whether certain groups have significantly different salary
levels.

4.1 Hypothesis Testing for Gender and Salary

• One of the key questions in salary analysis is whether there is a
significant difference in salaries based on gender. To test this, a
two-sample t-test was conducted, comparing the mean salaries of males
and females in the dataset.
• Null Hypothesis (H₀): The mean salaries of males and females are
equal.
• Alternative Hypothesis (H₁): The mean salaries of males and females
differ.

The t-statistic was calculated to determine whether the observed difference
in mean salary between the two groups is statistically significant. The test
was performed at the 5% significance level (95% confidence).
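
For reference, the pooled (Student's) two-sample t-statistic can be computed with Python's `statistics` module as sketched below. The function name is hypothetical, and the sketch assumes equal variances in the two groups, which the report does not state. The p-value comes from the t distribution with n1 + n2 - 2 degrees of freedom; with SciPy installed, `scipy.stats.ttest_ind(male_salaries, female_salaries)` (hypothetical list names) returns the same statistic together with a two-sided p-value.

```python
import math
import statistics

def two_sample_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    m1, m2 = statistics.mean(a), statistics.mean(b)
    # Sample variances use the n - 1 denominator
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```

H₀ is rejected when the p-value falls below 0.05. If the two groups' variances look clearly different, Welch's version (`equal_var=False` in SciPy) is the safer choice.
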

Results:

The p-value obtained from the t-test was 0.23, which is higher than the
alpha level of 0.05. This result indicates that there is no statistically
significant difference between the salaries of males and females in the
dataset. Therefore, the null hypothesis cannot be rejected, suggesting that
gender alone does not explain salary variations in this sample.

4.2 Hypothesis Testing for Work Experience and Salary

Work experience was another variable tested for its impact on salary. We
hypothesized that individuals with more work experience would have higher
salaries. An ANOVA (Analysis of Variance) test was conducted to
determine whether the mean salaries differ significantly among different
levels of work experience.

• Null Hypothesis (H₀): Mean salaries are equal across all work
experience groups.
• Alternative Hypothesis (H₁): At least one work experience group has a
different mean salary.

The ANOVA test was conducted across various experience ranges (e.g., 0-2
years, 3-5 years, 6+ years).
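
A one-way ANOVA compares the between-group and within-group variability of salaries. The sketch below, an illustration rather than the report's own code, bins work experience into the example ranges above and computes the F statistic with its degrees of freedom; the row keys and function names are hypothetical. The p-value is the upper tail of the F distribution, available for example through `scipy.stats.f_oneway(*groups)` or `scipy.stats.f.sf(F, df_between, df_within)`.

```python
import statistics
from collections import defaultdict

def experience_group(years):
    """Bin years of work experience into the example ranges 0-2, 3-5 and 6+.
    Non-integer values above 2 (e.g. 2.5) fall into 3-5, which is an assumption."""
    if years <= 2:
        return "0-2"
    if years <= 5:
        return "3-5"
    return "6+"

def salaries_by_experience(rows):
    """Group salaries by experience band; rows use hypothetical keys."""
    groups = defaultdict(list)
    for r in rows:
        groups[experience_group(r["work_exp"])].append(r["salary"])
    return dict(groups)

def one_way_anova(groups):
    """F statistic and (df_between, df_within) for a one-way ANOVA."""
    groups = [list(g) for g in groups if len(g) > 0]  # ignore empty bands
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = statistics.mean([x for g in groups for x in g])
    means = [statistics.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)
```

For a cleaned list of rows, `one_way_anova(salaries_by_experience(rows).values())` gives the F statistic for the experience bands.
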

Results:

The ANOVA test resulted in a p-value of 0.01, which is below the 0.05
threshold. This indicates that there is a statistically significant difference in
salaries based on work experience. Specifically, individuals with more work
experience tend to have higher salaries, indicating that experience is an
important determinant of salary outcomes.

4.3 Hypothesis Testing for Exam Scores and Salary

A similar analysis was conducted to explore whether entry exam scores have
a significant effect on salary. Higher scores could be expected to correlate
with better job placements and higher salaries.

• Null Hypothesis (H₀): Mean salaries are equal across all exam score
ranges.
• Alternative Hypothesis (H₁): At least one exam score range has a
different mean salary.

The ANOVA test was conducted by grouping the data into ranges of exam
scores (e.g., below 600, 600-700, above 700).
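
Because the example ranges share their endpoints (600 and 700), each score needs an unambiguous band before the test is run. The sketch below assumes that scores of exactly 600 or 700 belong to the 600-700 band; that boundary rule, the row keys, and the function names are assumptions, not details given in the report.

```python
def exam_score_group(score):
    """Bin an entry exam score into the example ranges.
    Assumption: scores of exactly 600 and 700 go into the 600-700 band."""
    if score < 600:
        return "below 600"
    if score <= 700:
        return "600-700"
    return "above 700"

def group_by_score(rows):
    """Collect salaries per exam-score band; rows are dicts with hypothetical keys."""
    bands = {"below 600": [], "600-700": [], "above 700": []}
    for r in rows:
        bands[exam_score_group(r["exam_score"])].append(r["salary"])
    return bands
```

The non-empty band lists are the groups compared by the one-way ANOVA, for example `scipy.stats.f_oneway(*[g for g in bands.values() if g])`.
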

Results:

The test produced a p-value of 0.18, indicating no significant difference in
salary across the exam score groups. This suggests that while academic
performance may influence placement probability, it does not necessarily
translate into higher salaries in this dataset.

4.4 Summary of Hypothesis Testing

• Gender: No statistically significant difference in salaries based on
gender.
• Work Experience: A significant effect on salary, with more experience
associated with higher salaries.
• Exam Scores: No statistically significant effect of exam scores on
salary, although they may still influence placement outcomes.

5. Salary Estimation for Individuals with Missing Placement Data

In this analysis, some individuals lacked salary information, making it
necessary to estimate their potential salaries based on other available data,
such as age, work experience, gender, and exam scores. Using statistical
modeling techniques, we can predict the likely salary for these individuals,
allowing for a more complete understanding of salary distribution across the
dataset.

5.1 Methodology for Salary Estimation

To estimate the salaries of individuals with missing placement data, a
multiple linear regression model was used. This allowed us to predict
salary from several independent variables that are known for each
individual. The independent variables included in the model are:

• Age: A key demographic factor that could influence salary.
• Work Experience: One of the most significant predictors of salary, as
shown in the previous hypothesis testing.
• Entry Exam Score: Although not statistically significant in the
hypothesis tests, exam scores were still included because they may hold
some predictive value in combination with other factors.
• Gender: Although no significant impact of gender on salary was found
in the hypothesis testing, it remains in the model as a control variable,
so that the other coefficients are adjusted for any association between
gender and those variables.

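Multiple linear regression chooses the coefficients that minimize the sum of squared prediction errors, which amounts to solving the normal equations (XᵀX)b = Xᵀy. The dependency-free sketch below does exactly that. It assumes each row lists the predictors in the order [age, work experience, exam score, gender], and the helper names are hypothetical. In practice, a library routine such as `numpy.linalg.lstsq` or statsmodels `OLS` is numerically more robust and also reports standard errors.

```python
def _solve(M, v):
    """Solve M b = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    b = [0.0] * n
    for i in range(n - 1, -1, -1):
        b[i] = (aug[i][n] - sum(aug[i][j] * b[j] for j in range(i + 1, n))) / aug[i][i]
    return b

def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + ... + bk*xk."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    return _solve(xtx, xty)

def predict(coefs, x):
    """Predicted salary for one row of predictors in the same order as X."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))
```
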
5.2 Regression Model Results

The regression model was trained using the available data of individuals
whose salary information was known. The resulting coefficients from the
model were applied to predict the salaries of individuals with missing salary
data. Key findings include:

• Work Experience: As expected, work experience had the largest
positive effect on predicted salary, with more experienced individuals
likely to earn higher salaries.
• Age: Age also contributed positively, though to a lesser extent than
work experience.
• Entry Exam Score: While the exam score had a smaller influence, it
still contributed positively to the salary prediction.
• Gender: Gender had no significant effect on the salary prediction,
aligning with the results from hypothesis testing.

5.3 Estimating Salaries for Missing Data

Using the regression model, we estimated the salaries of individuals whose
placement and salary data were missing. The estimated salaries were
incorporated into the dataset to ensure a more comprehensive analysis of
salary trends and distributions.
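
The imputation step itself can be sketched as below. Rows whose salary is missing receive the model's prediction, and a flag records which values are estimates. The coefficient order [intercept, age, work experience, exam score, gender], the row keys, and the function name are assumptions for illustration, not taken from the report.

```python
MISSING = {None, "NA", "NP"}  # treat absent values and the dataset's markers alike

def impute_salaries(rows, coefs, features=("age", "work_exp", "exam_score", "gender")):
    """Fill missing salaries with regression predictions, flagging which values are estimates."""
    completed = []
    for r in rows:
        r = dict(r)  # copy so the input rows are not modified
        if r.get("salary") in MISSING:
            x = [float(r[f]) for f in features]
            r["salary"] = coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))
            r["salary_imputed"] = True
        else:
            r["salary_imputed"] = False
        completed.append(r)
    return completed
```

Keeping the flag is a deliberate design choice. Regression-imputed values sit exactly on the fitted line, so they understate the true spread of salaries, and summaries of the salary distribution are more trustworthy when reported both with and without the imputed rows.
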

For example:

• Individual A: A 28-year-old male with 5 years of work experience and
an exam score of 710 was predicted to have a salary of approximately
Rs. 60,000.
• Individual B: A 24-year-old female with 2 years of work experience
and an exam score of 680 was predicted to have a salary of
approximately Rs. 45,000.

These predictions allowed for a more complete understanding of how salary
outcomes might vary for individuals whose placement data were initially
missing.
