BA15 Machine Learning Assignment Guidelines Assignment 01

The assignment focuses on developing a machine learning-based classification model to assist the Office of Foreign Labor Certification (OFLC) in processing visa applications more efficiently. Students will analyze data related to visa applications, perform exploratory data analysis, and implement various machine learning techniques to predict visa outcomes. The final deliverables include a comprehensive analysis, model building, and actionable insights to improve the visa approval process.

Uploaded by

shreeshu.kumar


Assignment Guidelines

Program: PGDM/MS in Business Analytics
Module: Machine Learning
Batch: BA_15
Mentor: Yuvaraju M
Date: 15th July 2025
Max Marks: 60
Prepared by: Utsav Chatterjee
Type: Individual

Assignment Description:

In FY 2016, the OFLC processed 775,979 employer applications for 1,699,957 positions for temporary and
permanent labor certifications. This was a nine percent increase in the overall number of processed applications
from the previous year. The process of reviewing every case is becoming a tedious task as the number of applicants
is increasing every year.

This growth calls for a machine learning based solution that can help shortlist candidates with a higher
chance of visa approval. OFLC has hired the firm EasyVisa to provide data-driven solutions.
As a data scientist at EasyVisa, you must analyze the data provided and, with the help of a classification model:

1. Facilitate the process of visa approvals.

2. Recommend a suitable profile for the applicants for whom the visa should be certified or denied based on
the drivers that significantly influence the case status.

Assignment Objectives:
By completing this assignment, you will be able to:

• Understand and frame a business problem in a labor market context.
• Conduct univariate and bivariate EDA on structured datasets.
• Perform feature engineering and handle missing/outlier values.
• Implement tree-based and ensemble models, including Decision Trees, Bagging, Random Forests, AdaBoost, and Gradient Boosting.
• Tune hyperparameters and compare model performance using classification metrics.
• Generate actionable insights to guide visa decision processes.

For more information, visit the program LMS: https://racelms.reva.edu.in

©Copyright 2024. REVA Academy for Corporate Excellence, REVA University. All Rights Reserved.

Assignment Guidelines:

Your analysis should follow these structured steps:

Datasets to Use:

Each row in the dataset corresponds to a visa application and includes both employee- and
employer-related features:

Feature                   Description
case_id                   ID of each visa application
continent                 Continent of the applicant
education_of_employee     Education level of the employee
has_job_experience        Y/N indicating prior job experience
requires_job_training     Y/N indicating need for job training
no_of_employees           Number of employees in the employer's company
yr_of_estab               Year of company establishment
region_of_employment      Region of employment in the U.S.
prevailing_wage           Market wage for the job in that location
unit_of_wage              Wage unit (Hourly, Weekly, Monthly, Yearly)
full_time_position        Y/N indicating full-time role
case_status               Target variable: Certified / Denied
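Before starting the analysis, it can help to confirm that the loaded file contains the columns listed in the table above. The following is a minimal sketch, not a required step. The helper name check_schema and the two-row sample frame are hypothetical, and the sample values are invented placeholders rather than rows from the real dataset.

```python
import pandas as pd

# Column names copied from the feature table in this guideline.
EXPECTED_COLUMNS = [
    "case_id", "continent", "education_of_employee", "has_job_experience",
    "requires_job_training", "no_of_employees", "yr_of_estab",
    "region_of_employment", "prevailing_wage", "unit_of_wage",
    "full_time_position", "case_status",
]


def check_schema(df: pd.DataFrame) -> list:
    """Return the expected columns that are absent from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Invented two-row sample standing in for the real file, which would
# normally be loaded with pd.read_csv on the dataset provided for the course.
sample = pd.DataFrame({
    "case_id": ["A1", "A2"],
    "continent": ["Asia", "Europe"],
    "education_of_employee": ["Master's", "Bachelor's"],
    "has_job_experience": ["Y", "N"],
    "requires_job_training": ["N", "N"],
    "no_of_employees": [250, 1200],
    "yr_of_estab": [2005, 1998],
    "region_of_employment": ["West", "South"],
    "prevailing_wage": [65000.0, 30.5],
    "unit_of_wage": ["Yearly", "Hourly"],
    "full_time_position": ["Y", "Y"],
    "case_status": ["Certified", "Denied"],
})

missing = check_schema(sample)
print("Missing columns:", missing)
```

An empty list means every documented column is present; any names returned point to a renamed or dropped column that should be resolved before EDA.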

Problem Statement:


The U.S. labor market is experiencing an increasing demand for skilled workers, driving employers to
seek qualified individuals both domestically and internationally. The Office of Foreign Labor
Certification (OFLC) oversees the complex process of evaluating visa applications submitted by
employers wishing to hire foreign talent. In FY 2016 alone, over 775,000 applications were processed —
a 9% increase from the previous year. As this volume grows annually, the manual review of applications
becomes increasingly inefficient, error-prone, and resource-intensive.

To address this challenge, EasyVisa, a data science consultancy firm, has been hired by OFLC to develop
a machine learning–based classification model that can assist in identifying applications that are most
likely to be approved. By analyzing a variety of employer and employee attributes — such as education
level, work experience, job type, prevailing wage, and location — the model aims to:

• Predict whether a visa application will be certified or denied.
• Identify key factors influencing visa outcomes.
• Streamline the visa review process by enabling data-driven decision-making.

This project will involve comprehensive exploratory data analysis (EDA), data preprocessing, and the
application of ensemble techniques such as Bagging (Random Forest) and Boosting (AdaBoost,
Gradient Boosting). The end goal is to build a robust and interpretable model that improves the
efficiency, fairness, and scalability of the visa approval process.

Analysis Expectations:


Your Jupyter notebook should include:

1. Exploratory Data Analysis (EDA) — 15 marks

1.1 Problem Definition (2 marks)

• Q1: Clearly define the business problem and its relevance in the current labor market scenario. (2 marks)

1.2 Univariate Analysis (4 marks)

• Q2: Perform univariate analysis on categorical and numerical variables using appropriate plots. (2 marks)
• Q3: Comment on the distribution and patterns observed from univariate analysis. (2 marks)
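One possible starting point for Q2 and Q3 is to compute the summaries that the plots would display. This is a minimal sketch: the six-row frame is invented placeholder data, and only the column names come from the feature table. The resulting Series can be passed to any plotting library for the charts the question asks for.

```python
import pandas as pd

# Invented placeholder rows; only the column names come from the guideline.
df = pd.DataFrame({
    "education_of_employee": ["Bachelor's", "Master's", "Bachelor's",
                              "Doctorate", "Bachelor's", "Master's"],
    "prevailing_wage": [52000.0, 81000.0, 47000.0, 120000.0, 39000.0, 76000.0],
})

# Categorical variable: share of applications at each education level.
education_share = df["education_of_employee"].value_counts(normalize=True)

# Numerical variable: count, mean, spread, and quartiles in one call.
wage_summary = df["prevailing_wage"].describe()

# Skewness above zero indicates a longer right tail (a few very high wages).
wage_skew = df["prevailing_wage"].skew()

print(education_share)
print(wage_summary)
print("Skewness:", round(wage_skew, 2))
```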

1.3 Bivariate Analysis (5 marks)

• Q4: Perform bivariate analysis between independent features and the target variable using visualizations. (3 marks)
• Q5: Provide insights on how features like education, experience, pay unit, continent, and prevailing wage influence visa status. (2 marks)
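For Q4 and Q5, a row-normalized cross-tabulation is one simple way to compare certification rates across the levels of a categorical feature. The sketch below uses invented placeholder rows, so the rates it prints say nothing about the real data.

```python
import pandas as pd

# Invented placeholder rows; only the column names come from the guideline.
df = pd.DataFrame({
    "education_of_employee": ["Bachelor's", "Bachelor's", "Master's",
                              "Master's", "High School", "High School"],
    "case_status": ["Certified", "Denied", "Certified",
                    "Certified", "Denied", "Denied"],
})

# normalize="index" makes each row sum to 1, so each cell is the share of
# Certified or Denied outcomes within one education level.
rate_table = pd.crosstab(
    df["education_of_employee"], df["case_status"], normalize="index"
)

# Rank the levels by their certification rate.
certified_rate = rate_table["Certified"].sort_values(ascending=False)
print(certified_rate)
```

The same table can be drawn as a stacked bar chart; for a numerical feature such as prevailing_wage, a grouped summary by case_status plays the equivalent role.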

1.4 Insight Generation (4 marks)

• Q6: Summarize key insights based on EDA. (2 marks)
• Q7: Answer the predefined EDA questions using visualizations and reasoning. (2 marks)

2. Data Preprocessing — 10 marks


2.1 Missing Value & Outlier Treatment (4 marks)

• Q8: Identify missing values and justify the chosen treatment method. (2 marks)
• Q9: Detect and treat outliers (if any), and provide rationale. (2 marks)
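A minimal sketch of one way to approach Q8 and Q9 follows. Median imputation and the 1.5 x IQR rule are illustrative choices, not methods prescribed by this guideline, and the data is invented. The helper iqr_bounds is hypothetical. Since tree-based models are relatively insensitive to extreme values, leaving outliers untreated can also be a defensible answer if it is justified.

```python
import numpy as np
import pandas as pd

# Invented placeholder rows with one missing value per column and one
# extreme employer size.
df = pd.DataFrame({
    "no_of_employees": [120, 45, np.nan, 300, 80, 50000],
    "prevailing_wage": [52000.0, 61000.0, 58000.0, np.nan, 49000.0, 55000.0],
})

# Q8: count missing values per column, then impute with the column median
# (an assumed choice; the median is robust to the extreme value).
missing_counts = df.isna().sum()
df_filled = df.fillna(df.median(numeric_only=True))


def iqr_bounds(series, k=1.5):
    """Return (lower, upper) fences at k times the IQR beyond the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Q9: flag values outside the fences, then cap them instead of dropping rows.
low, high = iqr_bounds(df_filled["no_of_employees"])
outlier_mask = (df_filled["no_of_employees"] < low) | (df_filled["no_of_employees"] > high)
capped = df_filled["no_of_employees"].clip(lower=low, upper=high)

print(missing_counts)
print("Outliers flagged:", int(outlier_mask.sum()))
```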

2.2 Feature Engineering (3 marks)

• Q10: Create or transform features that help improve model performance and explain the reasoning. (3 marks)
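Two candidate features for Q10 are sketched below: the age of the employer and a wage expressed on a common yearly scale. Everything beyond the column names is an assumption made for illustration: the reference year of 2016 (taken from the fiscal year quoted in the description), the conversion factors (2,080 working hours, 52 weeks, 12 months per year), and the expectation that unit_of_wage holds exactly the four labels listed in the feature table.

```python
import pandas as pd

REFERENCE_YEAR = 2016  # assumption: fiscal year quoted in the description

# Assumed conversion factors to a yearly wage; adjust if the data differs.
PERIODS_PER_YEAR = {"Hourly": 2080, "Weekly": 52, "Monthly": 12, "Yearly": 1}

# Invented placeholder rows; only the column names come from the guideline.
df = pd.DataFrame({
    "yr_of_estab": [1990, 2010, 2005],
    "prevailing_wage": [25.0, 1200.0, 60000.0],
    "unit_of_wage": ["Hourly", "Weekly", "Yearly"],
})

# Company age is often easier for a model to use than a raw calendar year.
df["company_age"] = REFERENCE_YEAR - df["yr_of_estab"]

# Put all wages on one scale so hourly and yearly figures are comparable.
df["annual_wage"] = df["prevailing_wage"] * df["unit_of_wage"].map(PERIODS_PER_YEAR)

print(df[["company_age", "annual_wage"]])
```

Any unit label missing from the mapping becomes NaN after map, which makes unexpected values easy to spot.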

2.3 Train-Test Split (3 marks)

• Q11: Properly split the dataset into training and testing sets with justification of the split ratio. (3 marks)
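A stratified split, sketched below for Q11, keeps the Certified/Denied proportion the same in both parts. The 75/25 ratio, the random seed, and the 20 invented rows are assumptions for illustration; the ratio used in the notebook should be justified against the real dataset size.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented placeholder data: 12 Certified and 8 Denied applications.
df = pd.DataFrame({
    "prevailing_wage": [float(40000 + 1000 * i) for i in range(20)],
    "has_job_experience": ["Y", "N"] * 10,
    "case_status": ["Certified"] * 12 + ["Denied"] * 8,
})

# One-hot encode the categorical predictor and encode the target as 1/0.
X = pd.get_dummies(df.drop(columns="case_status"), drop_first=True)
y = (df["case_status"] == "Certified").astype(int)

# stratify=y preserves the class ratio in both the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

print("Train share certified:", y_train.mean())
print("Test share certified:", y_test.mean())
```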

3. Model Building - Bagging — 10 marks

3.1 Initial Model Building (6 marks)

• Q12: Build and evaluate Decision Tree, Bagging, and Random Forest classifiers. (3 marks)
• Q13: Compare model performance using metrics like Accuracy, Precision, Recall, F1-Score. (3 marks)
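The loop below is a minimal sketch of how Q12 and Q13 could be organized with scikit-learn. The data is synthetic, generated by make_classification as a stand-in for the preprocessed visa features, so the scores have no bearing on the real problem. Default hyperparameters are used throughout.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded visa dataset (assumption).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    # BaggingClassifier uses decision trees as its default base learner.
    "Bagging": BaggingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Scoring the same models on the training split as well makes overfitting visible, which is common for an unpruned decision tree.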

3.2 Metric Selection (1 mark)

• Q14: Select and justify the evaluation metric(s) appropriate for this classification task. (1 mark)

3.3 Model Interpretation (3 marks)


• Q15: Interpret feature importance and model behavior. (3 marks)
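For Q15, a fitted random forest exposes impurity-based importances through its feature_importances_ attribute. The sketch below ranks them on synthetic data; the names feature_0 to feature_5 are placeholders, not columns from the visa dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with placeholder feature names (assumption).
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features.
importances = pd.Series(
    forest.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances)
```

Impurity-based importances can favor features with many distinct values, so checking the ranking against permutation importance (sklearn.inspection.permutation_importance) on the test set is a reasonable cross-check.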

4. Model Improvement - Bagging — 7 marks

4.1 Hyperparameter Tuning (5 marks)

• Q16: Perform hyperparameter tuning for Decision Tree, Bagging, and Random Forest models. (3 marks)
• Q17: Evaluate and compare performance of tuned models across all metrics. (2 marks)
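A cross-validated grid search is one common way to approach Q16. The sketch below tunes a random forest only; the grid values, the F1 scoring choice, and the 3-fold setting are illustrative assumptions kept small so the example runs quickly. The synthetic arrays stand in for the training split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training split (assumption).
X_train, y_train = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Small illustrative grid: 2 x 2 x 2 = 8 combinations.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
    "min_samples_leaf": [1, 5],
}

# Each combination is scored by mean F1 over 3 cross-validation folds.
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, scoring="f1", cv=3
)
search.fit(X_train, y_train)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 3))
```

The search should see only the training data; the held-out test set is then used once to compare tuned and untuned models for Q17.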

4.2 Insights from Tuning (2 marks)

• Q18: Comment on the effect of hyperparameter tuning on model performance. (2 marks)

5. Model Building & Improvement - Boosting — 10 marks

5.1 Initial Model Building (4 marks)

• Q19: Build AdaBoost and Gradient Boosting models and evaluate their performance. (2 marks)
• Q20: Compare results with Bagging models using chosen metrics. (2 marks)
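For Q19 and Q20, the boosting models can be scored next to a bagging-style reference on the same split. This is a minimal sketch on synthetic data; F1 is used here only as an example of a chosen metric, and the scores say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded visa dataset (assumption).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

candidates = {
    "Random Forest (bagging reference)": RandomForestClassifier(n_estimators=100, random_state=1),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}

# Fit every candidate on the same training split and score on the same test split.
f1_by_model = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    f1_by_model[name] = f1_score(y_test, model.predict(X_test))

for name, score in f1_by_model.items():
    print(f"{name}: F1 = {score:.3f}")
```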

5.2 Hyperparameter Tuning (4 marks)

• Q21: Tune AdaBoost and Gradient Boosting models. (2 marks)
• Q22: Analyze improvement post tuning using metrics and visual tools. (2 marks)
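One way to frame Q21 and Q22 is a before/after comparison on the held-out split. The sketch below uses a randomized search over Gradient Boosting settings; the candidate values, the six sampled combinations, and the synthetic data are assumptions for illustration. A tuned model is not guaranteed to beat the default one, so the comparison should be reported as observed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the encoded visa dataset (assumption).
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=2
)

# Baseline with default hyperparameters.
baseline = GradientBoostingClassifier(random_state=2).fit(X_train, y_train)

# Illustrative candidate values; 6 of the 24 combinations are sampled.
param_distributions = {
    "learning_rate": [0.05, 0.1, 0.2],
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "subsample": [0.8, 1.0],
}
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=2),
    param_distributions=param_distributions,
    n_iter=6,
    scoring="f1",
    cv=3,
    random_state=2,
)
search.fit(X_train, y_train)

# Score both models on the same held-out split.
comparison = {
    "baseline": f1_score(y_test, baseline.predict(X_test)),
    "tuned": f1_score(y_test, search.best_estimator_.predict(X_test)),
}
print(search.best_params_)
print({name: round(score, 3) for name, score in comparison.items()})
```

The two values in comparison can be shown as a simple bar chart to cover the "visual tools" part of Q22.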


5.3 Interpretation (2 marks)

• Q23: Interpret feature importance and how boosting methods capture complex patterns. (2 marks)

6. Actionable Insights & Final Recommendations — 5 marks

• Q24: Based on overall analysis, provide at least 3 actionable insights that could help stakeholders. (2 marks)
• Q25: Justify the final model selection based on performance and explain how it can be used by EasyVisa. (3 marks)

7. Notebook Structure & Execution

• Q26: Ensure smooth structure, appropriate code comments, readable formatting, and no execution errors. (3 marks)

Deliverables:

Students must include:

1. Exploratory Data Analysis (EDA)

2. Data Preprocessing
3. Model Building - Bagging
4. Model Improvement - Bagging
5. Model Building & Improvement - Boosting
6. Actionable Insights & Final Recommendations


7. Notebook Structure & Execution

Use appropriate visualizations.

Submission Details:

• Deadline: 28th July 2025
• Submission Mode: LMS upload as a zipped folder containing the notebook and presentation (if applicable)

Evaluation Criteria:

1. Exploratory Data Analysis (EDA) — 15 marks

2. Data Preprocessing — 10 marks
3. Model Building - Bagging — 10 marks
4. Model Improvement - Bagging — 7 marks
5. Model Building & Improvement - Boosting — 10 marks
6. Actionable Insights & Final Recommendations — 5 marks
7. Notebook Structure & Execution — 3 marks

Plagiarism Policy:
• All submissions must be original.
• Cite all external data sources or references.
• Academic integrity will be strictly enforced.
