BA15 Machine Learning Assignment Guidelines Assignment 01

The assignment focuses on developing a machine learning-based classification model to assist the Office of Foreign Labor Certification (OFLC) in processing visa applications more efficiently. Students will analyze data related to visa applications, perform exploratory data analysis, and implement various machine learning techniques to predict visa outcomes. The final deliverables include a comprehensive analysis, model building, and actionable insights to improve the visa approval process.


Assignment Guidelines

Program: PGDM/MS in Business Analytics Module: Machine Learning


Batch: BA_15 Mentor: Yuvaraju M
Date: 15th July 2025 Max Marks: 60
Prepared by: Utsav Chatterjee Type: Individual

Assignment Description:

In FY 2016, the OFLC processed 775,979 employer applications for 1,699,957 positions for temporary and
permanent labor certifications. This was a nine percent increase in the overall number of processed applications
from the previous year. The process of reviewing every case is becoming a tedious task as the number of applicants
is increasing every year.

The increasing number of applicants every year calls for a machine learning-based solution that can help shortlist
candidates with higher chances of visa approval. OFLC has hired the firm EasyVisa for data-driven solutions.
As a data scientist at EasyVisa, you have to analyze the data provided and, with the help of a classification model:

1. Facilitate the process of visa approvals.
2. Recommend a suitable profile for the applicants for whom the visa should be certified or denied based on
the drivers that significantly influence the case status.

Assignment Objectives:
By completing this assignment, you will be able to:

• Understand and frame a business problem in a labor market context.
• Conduct univariate and bivariate EDA on structured datasets.
• Perform feature engineering and handle missing values and outliers.
• Implement a Decision Tree baseline and ensemble models including Bagging, Random Forests, AdaBoost, and Gradient Boosting.
• Tune hyperparameters and compare model performance using classification metrics.
• Generate actionable insights to guide visa decision processes.

For more information, visit the program LMS https://racelms.reva.edu.in


©Copyright 2024. REVA Academy for Corporate Excellence, REVA University. All Rights Reserved.

Assignment Guidelines:

Your analysis should follow these structured steps:

Dataset to Use:

Each row in the dataset corresponds to a visa application and includes both employee- and employer-related features:

Feature                  Description
case_id                  ID of each visa application
continent                Continent of the applicant
education_of_employee    Education level of the employee
has_job_experience       Y/N indicating prior job experience
requires_job_training    Y/N indicating need for job training
no_of_employees          Number of employees in employer company
yr_of_estab              Year of company establishment
region_of_employment     Region of employment in the U.S.
prevailing_wage          Market wage for the job in that location
unit_of_wage             Wage unit (Hourly, Weekly, Monthly, Yearly)
full_time_position       Y/N indicating full-time role
case_status              Target variable: Certified / Denied
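The sketch below shows one way to load the data in pandas. It assumes the columns are named exactly as in the table above. Three details are illustrative assumptions and are not part of the assignment brief: the helper name `prepare_frame`, the file name in the usage line, and the choice of `Certified` as the positive class (1).

```python
import pandas as pd

# Column names as listed in the feature table above.
EXPECTED_COLUMNS = [
    "case_id", "continent", "education_of_employee", "has_job_experience",
    "requires_job_training", "no_of_employees", "yr_of_estab",
    "region_of_employment", "prevailing_wage", "unit_of_wage",
    "full_time_position", "case_status",
]


def prepare_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Check the schema, drop the ID column and encode the target as 0/1."""
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    # case_id only identifies an application, so it is dropped as a feature.
    out = df.drop(columns=["case_id"])
    # Assumption: 'Certified' is the positive class (1); anything else is 0.
    out["case_status"] = (out["case_status"] == "Certified").astype(int)
    return out
```

Usage, with a hypothetical file name: `df = prepare_frame(pd.read_csv("EasyVisa.csv"))`.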

Problem Statement:

The U.S. labor market is experiencing an increasing demand for skilled workers, driving employers to
seek qualified individuals both domestically and internationally. The Office of Foreign Labor
Certification (OFLC) oversees the complex process of evaluating visa applications submitted by
employers wishing to hire foreign talent. In FY 2016 alone, over 775,000 applications were processed —
a 9% increase from the previous year. As this volume grows annually, the manual review of applications
becomes increasingly inefficient, error-prone, and resource-intensive.

To address this challenge, EasyVisa, a data science consultancy firm, has been hired by OFLC to develop
a machine learning–based classification model that can assist in identifying applications that are most
likely to be approved. By analyzing a variety of employer and employee attributes — such as education
level, work experience, job type, prevailing wage, and location — the model aims to:

• Predict whether a visa application will be certified or denied.
• Identify key factors influencing visa outcomes.
• Streamline the visa review process by enabling data-driven decision-making.

This project will involve comprehensive exploratory data analysis (EDA), data preprocessing, and the
application of ensemble techniques such as Bagging (Random Forest) and Boosting (AdaBoost,
Gradient Boosting). The end goal is to build a robust and interpretable model that improves the
efficiency, fairness, and scalability of the visa approval process.

Analysis Expectations:

Your Jupyter notebook should include:

1. Exploratory Data Analysis (EDA) — 15 marks

1.1 Problem Definition (2 marks)

• Q1: Clearly define the business problem and its relevance in the current labor market scenario. (2 marks)

1.2 Univariate Analysis (4 marks)

• Q2: Perform univariate analysis on categorical and numerical variables using appropriate plots. (2 marks)
• Q3: Comment on the distribution and patterns observed from univariate analysis. (2 marks)
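For Q2, the sketch below computes the numbers behind the univariate plots. The helper `univariate_summary` is hypothetical. It returns category shares for text columns and `describe()` statistics for numeric ones. The matching plots would typically be count plots for categorical features, and histograms or box plots for numeric features such as `prevailing_wage`.

```python
import pandas as pd


def univariate_summary(df: pd.DataFrame) -> dict:
    """Per-column summary: category shares or numeric describe() statistics."""
    summary = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # count, mean, std, min, quartiles and max
            summary[col] = df[col].describe()
        else:
            # normalize=True returns each category's share of the non-missing rows
            summary[col] = df[col].value_counts(normalize=True)
    return summary
```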

1.3 Bivariate Analysis (5 marks)

• Q4: Perform bivariate analysis between independent features and the target variable using visualizations. (3 marks)
• Q5: Provide insights on how features like education, experience, pay unit, continent, and prevailing wage influence visa status. (2 marks)
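For Q4 and Q5, a row-normalised crosstab gives the certification rate within each level of a categorical feature. This is the same information a stacked bar chart of that feature against `case_status` displays. The helper names are hypothetical, and the sketch assumes `case_status` still holds its text labels. Because `prevailing_wage` is recorded in different units (see `unit_of_wage`), the wage comparison is made within each unit rather than across units.

```python
import pandas as pd


def certification_rate(df: pd.DataFrame, feature: str,
                       target: str = "case_status") -> pd.DataFrame:
    """Share of each case_status within each level of `feature` (rows sum to 1)."""
    return pd.crosstab(df[feature], df[target], normalize="index")


def wage_by_status(df: pd.DataFrame, target: str = "case_status") -> pd.Series:
    """Median prevailing wage for each (unit_of_wage, case_status) pair."""
    # Medians are less sensitive than means to a few very high wages.
    return df.groupby(["unit_of_wage", target])["prevailing_wage"].median()
```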

1.4 Insight Generation (4 marks)

• Q6: Summarize key insights based on EDA. (2 marks)
• Q7: Answer the predefined EDA questions using visualizations and reasoning. (2 marks)

2. Data Preprocessing — 10 marks

2.1 Missing Value & Outlier Treatment (4 marks)

• Q8: Identify missing values and justify the chosen treatment method. (2 marks)
• Q9: Detect and treat outliers (if any), and provide rationale. (2 marks)
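The sketch below covers Q8 and Q9 for numeric columns such as `prevailing_wage` and `no_of_employees`. IQR-based capping is shown as one common option, not the required treatment. Whether an extreme wage or company size is an error or a genuine value is a judgment the analysis should justify. Values that are impossible by definition (for example, a negative employee count, if any appear) may warrant a different fix. The helper names are hypothetical.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Count of missing values per column, listing only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
```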

2.2 Feature Engineering (3 marks)

• Q10: Create or transform features that help improve model performance and explain the reasoning. (3 marks)
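For Q10, here are two example transformations. The first puts `prevailing_wage` on a common yearly scale using `unit_of_wage`. The second replaces `yr_of_estab` with company age. The conversion factors (2,080 working hours and 52 weeks per year) and the reference year 2016 are assumptions. The dictionary keys follow the unit labels in the feature table; match them to the exact strings in the data file, because any unmatched unit produces `NaN`.

```python
import pandas as pd

# Assumed annualisation factors: 2080 = 40 hours x 52 weeks.
# Keys follow the unit labels in the feature table; match them to the data file.
ANNUAL_FACTOR = {"Hourly": 2080, "Weekly": 52, "Monthly": 12, "Yearly": 1}


def add_features(df: pd.DataFrame, reference_year: int = 2016) -> pd.DataFrame:
    """Add annual_wage and company_age columns to a copy of df."""
    out = df.copy()
    # Unmatched units map to NaN, which makes labelling mistakes easy to spot.
    out["annual_wage"] = out["prevailing_wage"] * out["unit_of_wage"].map(ANNUAL_FACTOR)
    # Assumption: FY 2016 as the reference year for company age.
    out["company_age"] = reference_year - out["yr_of_estab"]
    return out
```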

2.3 Train-Test Split (3 marks)

• Q11: Properly split the dataset into training and testing sets with justification of the split ratio. (3 marks)
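For Q11, the sketch below one-hot encodes the categorical columns and then makes a stratified split. The 70/30 ratio and the `random_state` value are illustrative choices. Stratifying on the target keeps the Certified/Denied proportions the same in both sets, which matters when one class dominates. The sketch assumes the target is already encoded as 0/1 and the ID column has been dropped. The helper name is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data(df: pd.DataFrame, target: str = "case_status",
               test_size: float = 0.3, seed: int = 1):
    """One-hot encode categorical columns, then make a stratified split."""
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True)
    y = df[target]
    # stratify=y preserves the class ratio of y in both train and test sets.
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)
```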

3. Model Building - Bagging — 10 marks

3.1 Initial Model Building (6 marks)

• Q12: Build and evaluate Decision Tree, Bagging, and Random Forest classifiers. (3 marks)
• Q13: Compare model performance using metrics like Accuracy, Precision, Recall, F1-Score. (3 marks)
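For Q12 and Q13, a minimal sketch that fits the three models with default settings and tabulates train and test metrics. It assumes Certified has been encoded as 1. Reporting train scores next to test scores makes overfitting visible; an unpruned decision tree typically fits its training data almost perfectly. The helper names are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.tree import DecisionTreeClassifier


def evaluate(model, X_train, y_train, X_test, y_test) -> dict:
    """Fit `model`, then score it on both splits (positive class = 1)."""
    model.fit(X_train, y_train)
    scores = {}
    for split, X, y in [("train", X_train, y_train), ("test", X_test, y_test)]:
        pred = model.predict(X)
        scores[f"{split}_accuracy"] = accuracy_score(y, pred)
        scores[f"{split}_precision"] = precision_score(y, pred)
        scores[f"{split}_recall"] = recall_score(y, pred)
        scores[f"{split}_f1"] = f1_score(y, pred)
    return scores


def compare_bagging_models(X_train, y_train, X_test, y_test, seed=1) -> pd.DataFrame:
    """One row of metrics per model, all with default hyperparameters."""
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Bagging": BaggingClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
    }
    rows = {name: evaluate(m, X_train, y_train, X_test, y_test)
            for name, m in models.items()}
    return pd.DataFrame(rows).T
```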

3.2 Metric Selection (1 mark)

• Q14: Select and justify the evaluation metric(s) appropriate for this classification task. (1 mark)
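For Q14, the metrics can be written in terms of confusion-matrix counts, taking Certified as the positive class. A false positive (FP) is a denied application that the model would have shortlisted. A false negative (FN) is a certifiable application that the model would have flagged as denied. Which error is costlier for OFLC is a judgment the notebook should argue; F1 is one common choice when both errors matter. The second line is a worked example with hypothetical counts.

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}

TP = 80,\; FP = 20,\; FN = 10:\quad
\text{Precision} = 0.80,\quad \text{Recall} = \tfrac{80}{90} \approx 0.889,\quad
F_1 = \tfrac{160}{190} \approx 0.842
```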

3.3 Model Interpretation (3 marks)

• Q15: Interpret feature importance and model behavior. (3 marks)
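For Q15, the sketch below reads impurity-based importances from a fitted tree model. `BaggingClassifier` does not expose `feature_importances_`, so the second helper averages the importances of its fitted trees instead. This assumes the default decision-tree base estimator; `estimators_features_` maps each tree's inputs back to the original columns. Impurity-based importances can favour features with many distinct values, so `sklearn.inspection.permutation_importance` on the test set is a useful cross-check. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def top_importances(model, feature_names, n=10) -> pd.Series:
    """Impurity-based importances of a fitted tree or forest, largest first."""
    imp = pd.Series(model.feature_importances_, index=feature_names)
    return imp.sort_values(ascending=False).head(n)


def bagging_importances(bag, feature_names) -> pd.Series:
    """Average tree importances of a fitted BaggingClassifier of decision trees."""
    total = np.zeros(len(feature_names))
    for est, feats in zip(bag.estimators_, bag.estimators_features_):
        # Each tree was fit on X[:, feats], so map its importances back;
        # np.add.at also handles repeated indices (bootstrap_features=True).
        np.add.at(total, feats, est.feature_importances_)
    imp = pd.Series(total / len(bag.estimators_), index=feature_names)
    return imp.sort_values(ascending=False)
```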

4. Model Improvement - Bagging — 7 marks

4.1 Hyperparameter Tuning (5 marks)

• Q16: Perform hyperparameter tuning for Decision Tree, Bagging, and Random Forest models. (3 marks)
• Q17: Evaluate and compare performance of tuned models across all metrics. (2 marks)
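For Q16 and Q17, the sketch below runs a cross-validated grid search on the training set only. The tuned model is then scored once on the held-out test set. The helper name, the grids, and F1 as the selection metric are assumptions to adapt. Cost grows quickly with grid size: the Random Forest grid below already has 16 combinations, each fitted five times.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def tune_model(estimator, param_grid, X_train, y_train, scoring="f1", seed=1):
    """Grid search with stratified 5-fold CV on the training data only."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(estimator, param_grid, scoring=scoring, cv=cv)
    search.fit(X_train, y_train)  # refits the best model on all of X_train
    return search.best_estimator_, search.best_params_, search.best_score_


# Illustrative grids only; adjust the ranges to the data and compute budget.
DT_GRID = {"max_depth": [3, 5, 10, None], "min_samples_leaf": [1, 5, 20],
           "class_weight": [None, "balanced"]}
BAGGING_GRID = {"n_estimators": [10, 50], "max_samples": [0.7, 1.0],
                "max_features": [0.7, 1.0]}
RF_GRID = {"n_estimators": [100, 200], "max_depth": [None, 10],
           "max_features": ["sqrt", 0.5], "min_samples_leaf": [1, 5]}
```

Example call: `best_rf, params, cv_f1 = tune_model(RandomForestClassifier(random_state=1), RF_GRID, X_train, y_train)`. Passing `n_jobs=-1` to `GridSearchCV` parallelises the search.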

4.2 Insights from Tuning (2 marks)

• Q18: Comment on the effect of hyperparameter tuning on model performance. (2 marks)

5. Model Building & Improvement - Boosting — 10 marks

5.1 Initial Model Building (4 marks)

• Q19: Build AdaBoost and Gradient Boosting models and evaluate their performance. (2 marks)
• Q20: Compare results with Bagging models using chosen metrics. (2 marks)
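For Q19 and Q20, the sketch below fits default AdaBoost and Gradient Boosting models and scores them on the same test metrics used for the bagging models in Q13, so the results can be compared side by side. It assumes numeric (one-hot encoded) features and Certified encoded as 1. The helper name is hypothetical.

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compare_boosting_models(X_train, y_train, X_test, y_test, seed=1) -> pd.DataFrame:
    """Fit default AdaBoost and Gradient Boosting and score them on the test set."""
    models = {
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(random_state=seed),
    }
    rows = {}
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        # average="binary" reports the metrics for the positive class (1)
        p, r, f1, _ = precision_recall_fscore_support(y_test, pred, average="binary")
        rows[name] = {"accuracy": accuracy_score(y_test, pred),
                      "precision": p, "recall": r, "f1": f1}
    return pd.DataFrame(rows).T
```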

5.2 Hyperparameter Tuning (4 marks)

• Q21: Tune AdaBoost and Gradient Boosting models. (2 marks)
• Q22: Analyze improvement post tuning using metrics and visual tools. (2 marks)
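For Q21 and Q22, the sketch below runs a randomized search over illustrative search spaces. It then computes test-set confusion matrices for the default and tuned models. The spaces, `n_iter`, and helper names are assumptions. Each returned matrix can be drawn with `sklearn.metrics.ConfusionMatrixDisplay(confusion_matrix=cm).plot()` (requires matplotlib) to provide the visual comparison.

```python
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import RandomizedSearchCV

# Illustrative search spaces (assumptions); candidates are drawn from these lists.
ADA_SPACE = {"n_estimators": [50, 100, 200], "learning_rate": [0.01, 0.1, 0.5, 1.0]}
GB_SPACE = {"n_estimators": [100, 200, 300], "learning_rate": [0.01, 0.05, 0.1],
            "max_depth": [2, 3, 4], "subsample": [0.7, 0.85, 1.0]}


def tune_boosting(estimator, space, X_train, y_train, n_iter=10, seed=1):
    """Randomized search with 5-fold CV (stratified for classifiers), scored by F1."""
    search = RandomizedSearchCV(estimator, space, n_iter=n_iter, scoring="f1",
                                cv=5, random_state=seed)
    search.fit(X_train, y_train)
    return search.best_estimator_


def before_after(default_model, tuned_model, X_train, y_train, X_test, y_test):
    """Test-set confusion matrices, each laid out as [[TN, FP], [FN, TP]]."""
    base_pred = default_model.fit(X_train, y_train).predict(X_test)
    # tuned_model is assumed to be fitted already (e.g. a search's best_estimator_)
    tuned_pred = tuned_model.predict(X_test)
    return confusion_matrix(y_test, base_pred), confusion_matrix(y_test, tuned_pred)
```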

5.3 Interpretation (2 marks)

• Q23: Interpret feature importance and how boosting methods capture complex patterns. (2 marks)
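For Q23, note that boosting builds the model one weak learner at a time. Each new learner focuses on what the current ensemble still gets wrong: AdaBoost does this by reweighting samples, and Gradient Boosting by fitting gradient residuals. Tracking a test metric after every stage shows how much each added learner contributes and where extra stages stop helping. The sketch below uses `staged_predict`, which both `AdaBoostClassifier` and `GradientBoostingClassifier` provide. The helper name is hypothetical.

```python
from sklearn.metrics import f1_score


def staged_f1(model, X_test, y_test):
    """Test-set F1 after each boosting stage of an already-fitted model."""
    # staged_predict yields the ensemble's predictions after 1, 2, ..., n stages
    return [f1_score(y_test, pred) for pred in model.staged_predict(X_test)]
```

Plotting the returned list against the stage number shows where the curve flattens, which can also guide the choice of `n_estimators` in Q21.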

6. Actionable Insights & Final Recommendations — 5 marks

• Q24: Based on overall analysis, provide at least 3 actionable insights that could help stakeholders. (2 marks)
• Q25: Justify the final model selection based on performance and explain how it can be used by EasyVisa. (3 marks)
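For Q25, one way EasyVisa could use the chosen model is to rank incoming applications by predicted probability of certification. Reviewers could then prioritise cases rather than have the model decide them. The sketch assumes three things: the model exposes `predict_proba`, Certified was encoded as 1, and new applications go through exactly the same preprocessing and column encoding as the training data. The 0.5 threshold and the helper name are placeholders.

```python
import pandas as pd


def shortlist(model, X_new: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Rank applications by predicted probability of certification (class 1)."""
    proba = model.predict_proba(X_new)[:, 1]
    out = pd.DataFrame({"p_certified": proba}, index=X_new.index)
    # Placeholder rule: flag applications at or above the threshold for review.
    out["shortlisted"] = out["p_certified"] >= threshold
    return out.sort_values("p_certified", ascending=False)
```

Before scoring, new rows encoded with `pd.get_dummies` can be aligned to the training columns with `.reindex(columns=X_train.columns, fill_value=0)`.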

7. Notebook Structure & Execution

• Q26: Ensure smooth structure, appropriate code comments, readable formatting, and no execution errors. (3 marks)

Deliverables:

Students must include:

1. Exploratory Data Analysis (EDA)
2. Data Preprocessing
3. Model Building - Bagging
4. Model Improvement - Bagging
5. Model Building & Improvement - Boosting
6. Actionable Insights & Final Recommendations

7. Notebook Structure & Execution

Use appropriate visualizations.

Submission Details:

• Deadline: 28th July 2025
• Submission Mode: LMS upload as a zipped folder containing the notebook and presentation (if applicable)

Evaluation Criteria:

1. Exploratory Data Analysis (EDA) — 15 marks
2. Data Preprocessing — 10 marks
3. Model Building - Bagging — 10 marks
4. Model Improvement - Bagging — 7 marks
5. Model Building & Improvement - Boosting — 10 marks
6. Actionable Insights & Final Recommendations — 5 marks
7. Notebook Structure & Execution — 3 marks

Plagiarism Policy:
• All submissions must be original
• Cite all external data sources or references
• Academic integrity will be strictly enforced
