final-project - Jupyter Notebook 2021-12-09, 2:30 AM

Automated Resume Screening


COMP 4750: Natural Language Processing
Shawon Ibn Kamal, 201761376

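In [1]: from os import path
from glob import glob

from pdfminer.high_level import extract_text

# NLTK data packages (e.g. punkt, averaged_perceptron_tagger, maxent_ne_chunker,
# words, stopwords) must be fetched once with nltk.download()
import nltk
from nltk.corpus import stopwords
import re
import subprocess

import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import euclidean_distances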

Part 1: Parsing
Read resume PDFs


In [2]: mypath = "resumes-list"

def find_ext(dr, ext):
    # Every file in directory `dr` whose name ends in `.ext`
    return glob(path.join(dr, "*.{}".format(ext)))

resumepaths = find_ext(mypath, "pdf")

df = pd.DataFrame(resumepaths, columns=['path'])

# Extract the raw text of each PDF with pdfminer
df['text'] = df['path'].apply(lambda x: extract_text(x))

df.head()

Out[2]:
  | path | text
0 | resumes-list/resume-example-option-software-en... | justin.green11@[Link]\n(123) 456-7890\nWash...
1 | resumes-list/resume-example-option-project-man... | Stephen Greet\nProject Manager\nPMP certified p...
2 | resumes-list/resume-example-option-[Link] | Ashley Doyle, Esq\n\[Link]@[Link]\n\n(1...
3 | resumes-list/resume-example-option-[Link] | Stephen Greet\nSales Associate\n\nWork Experie...
4 | resumes-list/data-scientist-resume-[Link] | KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\...
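`glob` returns matches in an order that depends on the file system, so the row order of `df` (and of every table below) can differ between machines. A minimal stdlib sketch of a deterministic variant, assuming sorted order is acceptable; `find_ext_sorted` is a hypothetical name, not part of the notebook:

```python
import os
import tempfile
from glob import glob

def find_ext_sorted(directory, ext):
    """Hypothetical variant of find_ext: same glob pattern, sorted result."""
    # glob's order depends on the file system; sorting makes it reproducible
    return sorted(glob(os.path.join(directory, "*.{}".format(ext))))

# Usage: three files in a scratch directory, two of them PDFs
with tempfile.TemporaryDirectory() as scratch:
    for name in ("b.pdf", "a.pdf", "notes.txt"):
        open(os.path.join(scratch, name), "w").close()
    found = [os.path.basename(p) for p in find_ext_sorted(scratch, "pdf")]

print(found)  # ['a.pdf', 'b.pdf']
```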

Retrieve candidate name


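In [3]: def extract_names(txt):
    person_names = []

    # POS-tag and named-entity-chunk each sentence with NLTK
    for sent in nltk.sent_tokenize(txt):
        for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
            # Keep subtrees labelled PERSON, joining their tokens
            if hasattr(chunk, 'label') and chunk.label() == 'PERSON':
                person_names.append(
                    ' '.join(chunk_leave[0] for chunk_leave in chunk.leaves())
                )

    return person_names

# First PERSON entity, or None if none is found (a bare [0] raises IndexError)
df['name'] = df['text'].apply(lambda x: (extract_names(x) or [None])[0])

df.head()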

Out[3]:
  | path | text | name
0 | resumes-list/resume-example-option-software-en... | justin.green11@[Link]\n(123) 456-7890\nWash... | Github SKILLS
1 | resumes-list/resume-example-option-project-man... | Stephen Greet\nProject Manager\nPMP certified p... | Stephen
2 | resumes-list/resume-example-option-[Link] | Ashley Doyle, Esq\n\[Link]@[Link]\n\n(1... | Ashley
3 | resumes-list/resume-example-option-[Link] | Stephen Greet\nSales Associate\n\nWork Experie... | Stephen
4 | resumes-list/data-scientist-resume-[Link] | KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\... | Github
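The NE chunker labels "Github SKILLS" and "Github" as PERSON for rows 0 and 4. One cheap cross-check, a heuristic swapped in here rather than anything the notebook does, relies on the assumption that many resume templates put the candidate's name on the first non-empty line. A stdlib sketch under that assumption; `first_line_name` and its 2 to 4 word rule are hypothetical:

```python
import re

# Hypothetical rule: each word is letters plus optional apostrophes, dots, hyphens
NAME_WORD = re.compile(r"^[A-Za-z][A-Za-z'.\-]*$")

def first_line_name(text, min_words=2, max_words=4):
    """Return the first non-empty line if it looks like a personal name, else None."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return None
    words = lines[0].split()
    if min_words <= len(words) <= max_words and all(NAME_WORD.match(w) for w in words):
        return lines[0]
    return None

print(first_line_name("KANDICE LOUDOR\n\nDATA SCIENTIST"))  # KANDICE LOUDOR
```

It is deliberately strict: it returns None for a first line such as "Ashley Doyle, Esq" (the comma fails the word rule) and for a name split across lines like "Stephen\nGreet", so it suits a fallback or sanity check rather than a replacement for the chunker.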

Extract phone number


In [4]: phone_regex = re.compile(r'[\+\(]?[1-9][0-9 .\-\(\)]{8,}[0-9]')

def extract_phone_number(resume_text):
    phone = re.findall(phone_regex, resume_text)

    if phone:
        number = ''.join(phone[0])

        # Accept the first match only if it is shorter than 16 characters
        if resume_text.find(number) >= 0 and len(number) < 16:
            return number
    return None

df['phone'] = df['text'].apply(lambda x: extract_phone_number(x))

df.head()

Out[4]:
  | path | text | name | phone
0 | resumes-list/resume-example-option-software-en... | justin.green11@[Link]\n(123) 456-7890\nWash... | Github SKILLS | (123) 456-7890
1 | resumes-list/resume-example-option-project-man... | Stephen Greet\nProject Manager\nPMP certified p... | Stephen | (123) 456-7890
2 | resumes-list/resume-example-option-[Link] | Ashley Doyle, Esq\n\[Link]@[Link]\n\n(1... | Ashley | (123) 456-7890
3 | resumes-list/resume-example-option-[Link] | Stephen Greet\nSales Associate\n\nWork Experie... | Stephen | (123) 456-7890
4 | resumes-list/data-scientist-resume-[Link] | KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\... | Github | (123) 456-7890
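In the later Out[6] table, row 6 (resumes-list/shawon_resume.pdf) has phone None even though its text begins with "Mobile: +1 (709) 986-7643". The pattern does match "+1 (709) 986-7643", but that match is 17 characters long, so the `len(number) < 16` check discards it. A stdlib sketch comparing that rule with a hypothetical alternative that bounds the digit count instead (10 to 15 digits, 15 being the E.164 maximum); the helper names are illustrative only:

```python
import re

# Same pattern as phone_regex in In [4]
PHONE_RE = re.compile(r'[\+\(]?[1-9][0-9 .\-\(\)]{8,}[0-9]')

def phone_by_length(text):
    """Mirrors In [4]: first match, kept only if shorter than 16 characters."""
    matches = PHONE_RE.findall(text)
    if matches and len(matches[0]) < 16:
        return matches[0]
    return None

def phone_by_digits(text, min_digits=10, max_digits=15):
    """Hypothetical variant: first match whose digit count is plausible."""
    for match in PHONE_RE.findall(text):
        digits = re.sub(r'\D', '', match)
        if min_digits <= len(digits) <= max_digits:
            return match
    return None

sample = "Mobile: +1 (709) 986-7643\nEducation"
print(phone_by_length(sample))  # None (the match is 17 characters)
print(phone_by_digits(sample))  # +1 (709) 986-7643 (11 digits)
```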

Extract email


In [5]: email_regex = re.compile(r'[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+')

def extract_emails(resume_text):
    # Every substring that matches the pattern, as a list
    return re.findall(email_regex, resume_text)

df['email'] = df['text'].apply(lambda x: extract_emails(x))

df.head()

Out[5]:
  | path | text | name | phone | email
0 | resumes-list/resume-example-option-software-en... | justin.green11@[Link]\n(123) 456-7890\nWash... | Github SKILLS | (123) 456-7890 | [justin.green11@[Link]]
1 | resumes-list/resume-example-option-project-man... | Stephen Greet\nProject Manager\nPMP certified p... | Stephen | (123) 456-7890 | [stephen@[Link]]
2 | resumes-list/resume-example-option-[Link] | Ashley Doyle, Esq\n\[Link]@[Link]\n\n(1... | Ashley | (123) 456-7890 | [[Link]@[Link]]
3 | resumes-list/resume-example-option-[Link] | Stephen Greet\nSales Associate\n\nWork Experie... | Stephen | (123) 456-7890 | [stephen@[Link]]
4 | resumes-list/data-scientist-resume-[Link] | KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\... | Github | (123) 456-7890 | [kloudor@[Link]]
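`email_regex` lists only lowercase letters and is compiled without flags, so an address containing capitals is missed or only partly matched. A minimal stdlib sketch of the same pattern with `re.IGNORECASE`, assuming case-insensitive matching is what is wanted; the addresses are made up:

```python
import re

PATTERN = r'[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+'  # email_regex from In [5]

email_case_sensitive = re.compile(PATTERN)
email_any_case = re.compile(PATTERN, re.IGNORECASE)

def extract_emails_any_case(text):
    """Same pattern as In [5], but letters match in either case."""
    return email_any_case.findall(text)

print(email_case_sensitive.findall("JDoe@example.com"))  # ['oe@example.com']
print(extract_emails_any_case("JDoe@example.com"))      # ['JDoe@example.com']
```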

Extract school

In [6]: school_keywords = [
    'school',
    'college',
    'university',
    'academy',
    'faculty',
    'institute',
    'diploma',
]

def extract_education(input_text):
    organizations = []

    # Collect every named-entity chunk (the ORGANIZATION filter is commented out)
    for sent in nltk.sent_tokenize(input_text):
        for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
            if hasattr(chunk, 'label'):  # and chunk.label() == 'ORGANIZATION':
                organizations.append(' '.join(c[0] for c in chunk.leaves()))

    # Keep the entities that contain a school-related keyword
    education = set()
    for org in organizations:
        for word in school_keywords:
            if org.lower().find(word) >= 0:
                education.add(org)

    return education

df['school'] = df['text'].apply(lambda x: extract_education(x))

df

Out[6]:
  | path | text | name | phone | email
0 | resumes-list/resume-example-option-software-en... | justin.green11@[Link]\n(123) 456-7890\nWash... | Github SKILLS | (123) 456-7890 | [justin.green11@[Link]]
1 | resumes-list/resume-example-option-project-man... | Stephen Greet\nProject Manager\nPMP certified p... | Stephen | (123) 456-7890 | [stephen@[Link]]
2 | resumes-list/resume-example-option-[Link] | Ashley Doyle, Esq\n\[Link]@[Link]\n\n(1... | Ashley | (123) 456-7890 | [[Link]@[Link]]
3 | resumes-list/resume-example-option-[Link] | Stephen Greet\nSales Associate\n\nWork Experie... | Stephen | (123) 456-7890 | [stephen@[Link]]
4 | resumes-list/data-scientist-resume-[Link] | KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\... | Github | (123) 456-7890 | [kloudor@[Link]]
5 | resumes-list/full-stack-developer-resume-examp... | ALEKS LUDKEE\nFull-Stack Developer\n\nludkee.a... | ALEKS | (123) 456-7890 | [[Link]@[Link]]
6 | resumes-list/shawon_resume.pdf | Mobile: +1 (709) 986-7643\nWebsite: [Link] | Education | None | [sikamal@[Link]]
7 | resumes-list/entry-level-data-scientist-resume... | Trish Mathers\nEntry-Level Data Scientist\nInn... | Niantic Data Scientist Intern Seattle | (123) 456-7890 | [tmathers@[Link]]
8 | resumes-list/resume-example-option-college-stu... | Stephen\nGreet\nWeb Developer\n\nWork Experien... | Stephen | (123) 456-7890 | [stephen@[Link]]
9 | resumes-list/resume-example-option-[Link] | ALICE LEWIS, APRN\n\nNurse Practitioner\n\nCON... | San Diego | (123) 456-7890 | [alicelewis409@[Link]]
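In [6] keeps any named-entity chunk that contains a school keyword, and because the `ORGANIZATION` filter is commented out, any chunk label qualifies. A simpler line-based filter is sketched below; it is a different technique from the notebook's chunk-based one and assumes institutions usually sit on their own line in a resume. `education_lines` is a hypothetical helper:

```python
SCHOOL_KEYWORDS = ('school', 'college', 'university', 'academy',
                   'faculty', 'institute', 'diploma')  # same list as In [6]

def education_lines(text, keywords=SCHOOL_KEYWORDS):
    """Hypothetical line-based variant: every non-empty line containing a keyword."""
    found = set()
    for line in text.splitlines():
        cleaned = line.strip()
        # Plain substring test, like str.find(word) >= 0 in In [6]
        if cleaned and any(k in cleaned.lower() for k in keywords):
            found.add(cleaned)
    return found

resume = "EDUCATION\nMemorial University of Newfoundland\nB.Sc. Computer Science, 2021\n"
print(education_lines(resume))  # {'Memorial University of Newfoundland'}
```

Unlike the chunker, it does not depend on the tagger recognising the institution as an entity, but it will also pick up any line that merely mentions a keyword.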

Extract previous job titles

In [7]: df_job_titles = pd.read_csv('job_titles_set.csv')

df_job_titles.[Link]

Out[7]: array(['owner', 'manager', 'president', ...,
               'corporate account executive', 'trade marketing',
               'library director'], dtype=object)

In [8]: JOB_TITLE_DB = df_job_titles.[Link]

def extract_job_titles(input_text):
    stop_words = set(stopwords.words('english'))
    word_tokens = nltk.tokenize.word_tokenize(input_text)

    # preprocessing
    filtered_tokens = [w for w in word_tokens if w not in stop_words]
    # Note: this filters word_tokens, not filtered_tokens, so the
    # stop-word filter on the line above has no effect on what follows
    filtered_tokens = [w for w in word_tokens if [Link]()]

    # Candidate multi-word titles: every 2- and 3-token sequence
    grams = list(map(' '.join, nltk.everygrams(filtered_tokens, 2, 3)))

    found_titles = set()

    # Single-word titles
    for i in filtered_tokens:
        if i.lower() in JOB_TITLE_DB:
            found_titles.add(i)

    # Multi-word titles
    for i in grams:
        if i.lower() in JOB_TITLE_DB:
            found_titles.add(i)

    return found_titles

df['job_titles'] = df['text'].apply(lambda x: extract_job_titles(x))

df.head()

Out[8]:
  | path | text | name | phone | email |
0 | resumes-list/resume-example-option-software-en... | justin.green11@[Link]\n(123) 456-7890\nWash... | Github SKILLS | (123) 456-7890 | [justin.green11@[Link]] |
1 | resumes-list/resume-example-option-project-man... | Stephen Greet\nProject Manager\nPMP certified p... | Stephen | (123) 456-7890 | [stephen@[Link]] | {Admin
2 | resumes-list/resume-example-option-[Link] | Ashley Doyle, Esq\n\[Link]@[Link]\n\n(1... | Ashley | (123) 456-7890 | [[Link]@[Link]] |
3 | resumes-list/resume-example-option-[Link] | Stephen Greet\nSales Associate\n\nWork Experie... | Stephen | (123) 456-7890 | [stephen@[Link]] | {Johns
4 | resumes-list/data-scientist-resume-[Link] | KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\... | Github | (123) 456-7890 | [kloudor@[Link]] |
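In [8] tests membership against `JOB_TITLE_DB`, a NumPy object array; `x in array` compares `x` with every element, so each lookup scans the whole title list. A stdlib sketch of the same single-word plus n-gram lookup against a Python set (average constant-time membership); the five titles are a hypothetical sample standing in for job_titles_set.csv, and `match_job_titles` is an illustrative name:

```python
import re

# Hypothetical sample; the notebook loads its titles from job_titles_set.csv
JOB_TITLES = {"manager", "project manager", "sales associate",
              "software developer", "data scientist"}

def ngrams(tokens, n):
    """All contiguous n-token windows, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def match_job_titles(text, titles=JOB_TITLES, max_n=3):
    """Return 1- to max_n-word phrases whose lowercase form is a known title."""
    tokens = re.findall(r"[A-Za-z]+", text)  # alphabetic tokens only
    found = set()
    for n in range(1, max_n + 1):
        for phrase in ngrams(tokens, n):
            if phrase.lower() in titles:  # set lookup: average O(1)
                found.add(phrase)
    return found

print(sorted(match_job_titles("Stephen Greet\nProject Manager\nPMP certified")))
# ['Manager', 'Project Manager']
```

As in In [8], matches keep the resume's original casing, and a title is reported both on its own and inside any longer title that contains it.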

Part 2: Evaluation
Calculate similarity between the job description and each resume


In [9]: with open("job_description.txt", "r") as f:
    job_description = f.read()

job_description

Out[9]: "Software Developer\nLocation: St. John's;\n\nEach day our Software
Developers get to work on challenging problems. No two days are the same,
each day you’ll collaborate with other Software Developers to problem
solve and write code that has an impact in the real world. Our product,
Verafin, helps fight crime by stopping fraud and money laundering.
Stopping the flow of this money means stopping crimes such as human
trafficking, elder abuse, and drug trafficking. Our Software Developers
get the opportunity to move around the business as there are new teams
and projects developed all the time to help us towards our mission of
stopping crime. Being a Software Developer at Verafin means getting the
opportunity to have an impact on criminal activity by getting to do what
you love – solve cool problems using code.\n\nEssential Skills &
Qualifications\nA university degree or college diploma in Computer
Engineering, Computer Science, or a combination of education and previous
experience would be considered\nStrong analytical skills for complex and
creative problem solving\nExperience in object-oriented software
development \nAutomated testing\nExcellent interpersonal and
organizational skills; able to work closely with team members\nWould be
good to have experience in a few of the following areas\nJava\nExperience
using JavaScript, CSS, REST\nPrevious experience working with Core Banking
Systems\nAmazon Web Services\nIntelligent systems, artificial intelligence
and data science\nDistributed computing\nDatabase technologies
(PostgresSQL)\nBig data technologies\nData extraction,
manipulation/cleansing and integration \n\n\nIndustry and on-the-job
training is provided for all roles at Verafin. \n\n\u200bVerafin places a
high value on building a diverse team, candidates of all backgrounds are
encouraged to apply.\n\nMobile devices are not supported for job
applications currently. Please apply using a desktop device for the best
user experience.\n\nPlease note: we frequently see our jobs posted on job
aggregators, which are essentially search engines for jobs. Generally
those sites ask you to use their sites to apply for the posted job and
they do not send us the application. As a reminder, the the only way to
apply for a job with Verafin is on our site [Link]/careers. We look
forward to reviewing your application."


In [10]: new_row = [Link]({'path':'job_description', 'text': job_description

# Put the job description at row 0, ahead of the resumes
df = pd.concat([new_row, df]).reset_index(drop=True)

df.head()

Out[10]:
  | path | text | name | phone | email
0 | job_description | Software Developer\nLocation: St. John's;\n\nE... | NaN | NaN | NaN
1 | resumes-list/resume-example-option-software-en... | justin.green11@[Link]\n(123) 456-7890\nWash... | Github SKILLS | (123) 456-7890 | [justin.green11@[Link]]
2 | resumes-list/resume-example-option-project-man... | Stephen Greet\nProject Manager\nPMP certified p... | Stephen | (123) 456-7890 | [stephen@[Link]]
3 | resumes-list/resume-example-option-[Link] | Ashley Doyle, Esq\n\[Link]@[Link]\n\n(1... | Ashley | (123) 456-7890 | [[Link]@[Link]]
4 | resumes-list/resume-example-option-[Link] | Stephen Greet\nSales Associate\n\nWork Experie... | Stephen | (123) 456-7890 | [stephen@[Link]]


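In [11]: # Remove stop words and punctuation from text

stop_words_l = stopwords.words('english')

df['text_cleaned']=[Link](lambda x: " ".join([Link](r'[^a-zA-Z]'

# Fit TF-IDF on the job description and all resumes, then vectorise them
tfidfvectoriser = TfidfVectorizer()
tfidfvectoriser.fit(df.text_cleaned)
tfidf_vectors = tfidfvectoriser.transform(df.text_cleaned)

similarities=[Link](tfidf_vectors,tfidf_vectors.T).toarray()

# Row 0 is the job description, so similarities[0] scores it against every
# row; TfidfVectorizer L2-normalises rows by default, making these cosines
for i in range(len(similarities[0])):
    df.loc[i, "similarity"] = similarities[0][i]

df.sort_values(by='similarity', ascending=False, inplace=True)

# Drop the job description itself (index label 0), then renumber the ranking
df = df.drop(0)
df.reset_index(drop=True, inplace=True)

df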

Out[11]:
  | path | text | name | phone | email
0 | resumes-list/full-stack-developer-resume-examp... | ALEKS LUDKEE\nFull-Stack Developer\n\nludkee.a... | ALEKS | (123) 456-7890 | [[Link]@[Link]]
1 | resumes-list/shawon_resume.pdf | Mobile: +1 (709) 986-7643\nWebsite: [Link] | Education | None | [sikamal@[Link]]
2 | resumes-list/resume-example-option-software-en... | justin.green11@[Link]\n(123) 456-7890\nWash... | Github SKILLS | (123) 456-7890 | [justin.green11@[Link]]
3 | resumes-list/resume-example-option-... | Stephen\nGreet\nWeb... | Stephen | (123) 456-... | [stephen@[Link]]
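In [11] multiplies the TF-IDF matrix by its transpose and reads off row 0, the job description. Because `TfidfVectorizer` L2-normalises every row by default (`norm='l2'`), each of those dot products is a cosine similarity. The stdlib sketch below reproduces the idea using scikit-learn's default smoothed idf, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and its default token pattern; it is an illustration on made-up documents, not a drop-in replacement, and it skips the notebook's own cleaning step:

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token_pattern

def tfidf(docs):
    """One L2-normalised {term: weight} dict per document (smoothed idf)."""
    tokenised = [TOKEN_RE.findall(d.lower()) for d in docs]
    n = len(docs)
    doc_freq = Counter(t for toks in tokenised for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + dfreq)) + 1 for t, dfreq in doc_freq.items()}
    vectors = []
    for toks in tokenised:
        weights = {t: c * idf[t] for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors

def dot(u, v):
    """Sparse dot product; equals cosine similarity for unit-length vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def rank_against_first(docs):
    """Score docs[1:] against docs[0] (the job description), best first."""
    vecs = tfidf(docs)
    scores = [(i, dot(vecs[0], vecs[i])) for i in range(1, len(docs))]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

docs = ["Software Developer: Java, REST, automated testing",
        "Java developer with REST API experience",
        "Registered nurse practitioner"]
print(rank_against_first(docs))
```

Keeping only non-zero weights in dicts mirrors the sparse matrix scikit-learn returns; the second resume shares no terms with the job description, so it scores exactly 0.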

Ranking Output


In [12]: df[['path', 'name', 'email', 'similarity']]

Out[12]:
  | path | name | email | similarity
0 | resumes-list/full-stack-developer-resume-examp... | ALEKS | [[Link]@[Link]] | 0.143581
1 | resumes-list/shawon_resume.pdf | Education | [sikamal@[Link]] | 0.138904
2 | resumes-list/resume-example-option-software-en... | Github SKILLS | [justin.green11@[Link]] | 0.101460
3 | resumes-list/resume-example-option-college-stu... | Stephen | [stephen@[Link]] | 0.079581
4 | resumes-list/resume-example-option-project-man... | Stephen | [stephen@[Link]] | 0.079037
5 | resumes-list/data-scientist-resume-[Link] | Github | [kloudor@[Link]] | 0.052557
6 | resumes-list/entry-level-data-scientist-resume... | Niantic Data Scientist Intern Seattle | [tmathers@[Link]] | 0.050098
7 | resumes-list/resume-example-[Link] | San Diego | [alicelewis409@[Link]] | 0.030303
8 | resumes-list/resume-example-[Link] | Stephen | [stephen@[Link]] | 0.028344
9 | resumes-list/resume-example-[Link] | Ashley | [[Link]@[Link]] | 0.021063
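`euclidean_distances` is imported in In [1] but never used. For L2-normalised TF-IDF rows it would not change this ranking: for unit vectors u and v, ||u - v||^2 = 2 - 2 cos(u, v), so a smaller distance always means a higher cosine similarity. A quick numeric check of that identity in plain Python, with made-up three-term vectors:

```python
import math

def unit(v):
    """Scale v to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Dot product; a cosine similarity because u and v are unit length."""
    return sum(a * b for a, b in zip(u, v))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

query = unit([3.0, 1.0, 0.0])            # made-up "job description" vector
candidates = [unit([1.0, 1.0, 0.0]),     # made-up "resume" vectors
              unit([0.0, 1.0, 2.0]),
              unit([2.0, 1.0, 0.5])]

by_cosine = sorted(range(3), key=lambda i: -cosine(query, candidates[i]))
by_distance = sorted(range(3), key=lambda i: euclidean(query, candidates[i]))
print(by_cosine == by_distance)  # True
```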
