DSciHomeworkAssignmentV4

The assignment requires the development of a salary prediction system for job postings that lack salary information, utilizing provided CSV files containing job metadata and salary data. The deliverables include a CSV file with predicted salaries, the code used for the prediction, and answers to specific questions regarding the methodology and data preparation. The accuracy of the predictions will be evaluated using root-mean-square error (RMSE).

Uploaded by

mahmutluck

Data Science Interview Assignment

Assignment Description
As a job search engine, we find it helpful to suggest an approximate salary to job seekers
for a given job post. Unfortunately, not all job postings include the salary. This is where you
come in: your first task as an Indeed Data Scientist is to develop a salary prediction system.
The goal: provide estimated salaries for a new job posting.

Data Supplied
You are given three CSV (comma-separated) data files:
● train_features.csv: Each row represents metadata for an individual job posting.
The “jobId” column represents a unique identifier for the job posting. The remaining
columns describe features of the job posting.
● train_salaries.csv: Each row associates a “jobId” with a “salary”.
● test_features.csv: Similar to train_features.csv, each row represents
metadata for an individual job posting.
The first row of each file contains headers for the columns. Keep in mind that the metadata
and salaries have been extracted by our aggregation and parsing systems. As such, it’s
possible that the data is dirty (may contain errors).
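
As a minimal sketch of how the training files might be combined, the Python below reads two header-first CSV sources and joins them on "jobId". Only the "jobId" and "salary" column names come from the assignment; the "jobType" column, the sample values, and the helper names are hypothetical, and treating a missing, unparseable, or non-positive salary as dirty is an assumption rather than a rule stated here.

```python
import csv
import io


def read_rows(handle):
    """Read a CSV file object whose first row is a header into a list of dicts."""
    return list(csv.DictReader(handle))


def join_on_job_id(feature_rows, salary_rows):
    """Attach a numeric salary to every feature row that has a usable one."""
    salaries = {}
    for row in salary_rows:
        try:
            salaries[row["jobId"]] = float(row["salary"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows whose salary is missing or unparseable

    joined = []
    for row in feature_rows:
        salary = salaries.get(row.get("jobId"))
        # Assumption: a non-positive salary is a parsing error, not a real value.
        if salary is None or salary <= 0:
            continue
        joined.append({**row, "salary": salary})
    return joined


# Tiny in-memory stand-ins for train_features.csv and train_salaries.csv.
features_csv = io.StringIO("jobId,jobType\nJOB1,SENIOR\nJOB2,JUNIOR\nJOB3,JUNIOR\n")
salaries_csv = io.StringIO("jobId,salary\nJOB1,130\nJOB2,0\nJOB3,abc\n")

training = join_on_job_id(read_rows(features_csv), read_rows(salaries_csv))
print(training)
```

With real files, the same functions could be called on handles opened with open(path, newline=""). Whether to drop or repair suspicious rows is a judgment call that the answer to question 3 should document.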

The Task
You must build a model to predict the salaries for the job postings contained in
test_features.csv. The output of your system should be a CSV file entitled
test_salaries.csv where each row has the following format:
jobId,salary
As a reference, your output should mirror the format of train_salaries.csv.
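
The output format described above could be produced roughly as follows. This is a sketch using Python's standard csv module; the function name and the sample predictions are hypothetical. It writes a header row because the supplied files start with one and the output is meant to mirror train_salaries.csv.

```python
import csv
import io


def write_predictions(handle, predictions):
    """Write (jobId, salary) pairs as CSV with a 'jobId,salary' header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["jobId", "salary"])
    for job_id, salary in predictions:
        writer.writerow([job_id, salary])


# Demonstration with an in-memory buffer and made-up predictions.
buffer = io.StringIO()
write_predictions(buffer, [("JOB10", 112.5), ("JOB11", 98)])
output_text = buffer.getvalue()
print(output_text)
```

For the real deliverable, the handle would come from open("test_salaries.csv", "w", newline="") instead of the in-memory buffer.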

To judge the accuracy, we will compare your salary predictions to a ground-truth using the
root-mean-square error (RMSE).
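
For reference, RMSE is the square root of the mean of the squared differences between predicted and true salaries: RMSE = sqrt((1/n) * sum((predicted_i - actual_i)^2)). A minimal Python sketch follows; the function name and the sample numbers are illustrative and are not taken from the assignment data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up predictions and ground truth, in the same units as the salary column.
example_rmse = rmse([100.0, 120.0, 90.0], [110.0, 120.0, 80.0])
print(example_rmse)
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.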

As a guideline, you should expect to spend around 4 hours to complete this exercise
(including model training time). The assignment does not have to be completed all at once.
Please do not share the assignment, the data, or your solutions with anyone other than your
recruiter.

Deliverables
The following deliverables must be submitted to Indeed:
● Your test_salaries.csv file containing the salary predictions and jobIds for the
test data set (please use .zip or .gz compression).
● The code that you wrote to solve the problem.
● Answers to the questions below (in .pdf or .txt).
● Any related files, such as figures.
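
One possible way to produce the .gz version of the predictions file is sketched below with Python's standard gzip module. The helper name and the sample file contents are hypothetical; the temporary directory is used only so that the example cleans up after itself.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(source_path):
    """Compress source_path into source_path + '.gz' and return the new path."""
    target_path = source_path + ".gz"
    with open(source_path, "rb") as src, gzip.open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target_path


with tempfile.TemporaryDirectory() as workdir:
    # Write a tiny stand-in for the real predictions file.
    csv_path = os.path.join(workdir, "test_salaries.csv")
    with open(csv_path, "w", newline="") as f:
        f.write("jobId,salary\nJOB10,112.5\n")

    gz_path = gzip_file(csv_path)

    # Read the archive back to confirm the contents survived compression.
    with gzip.open(gz_path, "rt") as f:
        round_trip = f.read()

print(os.path.basename(gz_path))
```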

Please do not include your name in any of the deliverables, since evaluation is double blind.

Questions
Please answer the following questions.
1. How long did it take you to solve the problem?
2. What software language and libraries did you use to solve the problem? Why did you
choose these languages/libraries?
3. What steps did you take to prepare the data for the project? Was any cleaning
necessary?
4. a) What machine learning method did you apply?
b) Why did you choose this method?
c) What other methods did you consider?
5. Describe how the machine learning algorithm that you chose works.
6. Was any encoding or transformation of features necessary? If so, what
encoding/transformation did you use?
7. Which features had the greatest impact on salary? How did you identify these to be
most significant? Which features had the least impact on salary? How did you identify
these?
8. How did you train your model? During training, what issues concerned you?
9. a) Please estimate the RMSE that your model will achieve on the test dataset.
b) How did you create this estimate?
10. What metrics, other than RMSE, would be useful for assessing the accuracy of salary
estimates? Why?
