final-project - Jupyter Notebook 2021-12-09, 2:30 AM

Automated Resume Screening


COMP 4750: Natural Language Processing
Shawon Ibn Kamal, 201761376

In [1]:
from os import path
from glob import glob

from pdfminer.high_level import extract_text

import nltk
from nltk.corpus import stopwords
import re
import subprocess

import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import euclidean_distances

Part 1: Parsing
Read resume pdf

In [2]:
mypath = "resumes-list"

def find_ext(dr, ext):
    """Return the paths of all files in directory dr with extension ext."""
    return glob(path.join(dr, "*.{}".format(ext)))

resumepaths = find_ext(mypath, "pdf")

# One row per resume; pdfminer extracts the raw text of each PDF.
df = pd.DataFrame(resumepaths, columns=['path'])
df['text'] = df['path'].apply(extract_text)

df.head()

Out[2]:
   path                                               text
0  resumes-list/resume-example-option-software-en...  [email protected]\n(123) 456-7890\nWash...
1  resumes-list/resume-example-option-project-man...  Stephen Greet\nProject Manager\nPMP certified p...
2  resumes-list/resume-example-option-attorney.pdf    Ashley Doyle, Esq\n\[email protected]\n\n(1...
3  resumes-list/resume-example-option-sales.pdf       Stephen Greet\nSales Associate\n\nWork Experie...
4  resumes-list/data-scientist-resume-example.pdf     KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\...
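The file-discovery step in In [2] relies on `glob`, which returns paths in arbitrary order, so the row numbering of `df` can differ between machines. A minimal sketch of a reproducible variant, assuming the same flat directory of resumes; `find_ext_sorted` is a hypothetical helper, not part of the notebook:

```python
from pathlib import Path

def find_ext_sorted(directory, ext):
    """Return sorted paths of files in `directory` whose extension is `ext`.

    Hypothetical helper: the extension is compared case-insensitively and the
    result is sorted so that DataFrame row order is the same on every run.
    """
    wanted = "." + ext.lower().lstrip(".")
    return sorted(
        str(p) for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == wanted
    )
```

Called as `find_ext_sorted("resumes-list", "pdf")`, it could stand in for `find_ext` without changing the rest of the cell.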

Retrieve candidate name

In [3]:
def extract_names(txt):
    """Return every PERSON entity that NLTK's named-entity chunker finds in txt."""
    person_names = []

    for sent in nltk.sent_tokenize(txt):
        for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
            if hasattr(chunk, 'label') and chunk.label() == 'PERSON':
                person_names.append(
                    ' '.join(chunk_leave[0] for chunk_leave in chunk.leaves())
                )

    return person_names

# Keep the first PERSON entity as the candidate name (None if there is none).
df['name'] = df.text.apply(lambda x: next(iter(extract_names(x)), None))

df.head()

Out[3]:
   path                                               text                                                name
0  resumes-list/resume-example-option-software-en...  [email protected]\n(123) 456-7890\nWash...          Github SKILLS
1  resumes-list/resume-example-option-project-man...  Stephen Greet\nProject Manager\nPMP certified p...  Stephen
2  resumes-list/resume-example-option-attorney.pdf    Ashley Doyle, Esq\n\[email protected]\n\n(1...      Ashley
3  resumes-list/resume-example-option-sales.pdf       Stephen Greet\nSales Associate\n\nWork Experie...   Stephen
4  resumes-list/data-scientist-resume-example.pdf     KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\...   Github
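The name column shows the chunker returning section headings such as "Github SKILLS" or only a first name. One possible fallback, sketched here under the assumption that a resume usually states the name on one of its first lines, is a simple line-based heuristic. `guess_name_from_header`, the regular expression, and the `max_lines` threshold are all hypothetical and not part of the notebook:

```python
import re

# Hypothetical heuristic: a name line is 2-4 words made of letters
# (apostrophes, hyphens and periods allowed), with no digits, '@' or ':'.
NAME_LINE = re.compile(r"^[A-Za-z][A-Za-z'.\-]*(?: [A-Za-z][A-Za-z'.\-]*){1,3}$")

def guess_name_from_header(text, max_lines=5):
    """Return the first of the leading non-empty lines that looks like a name."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for line in lines[:max_lines]:
        # Drop trailing credentials such as ", Esq" or ", APRN".
        candidate = line.split(",")[0].strip()
        if NAME_LINE.match(candidate):
            return candidate
    return None

print(guess_name_from_header("KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT"))
# KANDICE LOUDOR
```

The heuristic has clear limits: a resume that splits the name across two lines (as in the text `Stephen\nGreet\nWeb Developer`) would yield "Web Developer", so it is better treated as a cross-check on the NER result than as a replacement.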

Extract phone number

In [4]:
phone_regex = re.compile(r'[\+\(]?[1-9][0-9 .\-\(\)]{8,}[0-9]')

def extract_phone_number(resume_text):
    """Return the first phone-like match shorter than 16 characters, else None."""
    phone = phone_regex.findall(resume_text)

    if phone:
        number = phone[0]
        # The length cap counts spaces and punctuation as well as digits.
        if len(number) < 16:
            return number
    return None

df['phone'] = df.text.apply(extract_phone_number)

df.head()

Out[4]:
   path                                               text                                                name           phone
0  resumes-list/resume-example-option-software-en...  [email protected]\n(123) 456-7890\nWash...          Github SKILLS  (123) 456-7890
1  resumes-list/resume-example-option-project-man...  Stephen Greet\nProject Manager\nPMP certified p...  Stephen        (123) 456-7890
2  resumes-list/resume-example-option-attorney.pdf    Ashley Doyle, Esq\n\[email protected]\n\n(1...      Ashley         (123) 456-7890
3  resumes-list/resume-example-option-sales.pdf       Stephen Greet\nSales Associate\n\nWork Experie...   Stephen        (123) 456-7890
4  resumes-list/data-scientist-resume-example.pdf     KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\...   Github         (123) 456-7890
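A later output in this notebook shows `None` in the phone column for a resume whose text begins `Mobile: +1 (709) 986-7643`. That follows from the length check: the match is 17 characters long, so `len(number) < 16` rejects it even though it contains only 11 digits. A minimal sketch of an alternative that counts digits instead of characters; `extract_phone_by_digits` and its bounds are assumptions (10 digits for a national number, 15 as the E.164 maximum), not the author's method:

```python
import re

# Same pattern as the notebook's phone_regex.
PHONE_REGEX = re.compile(r'[\+\(]?[1-9][0-9 .\-\(\)]{8,}[0-9]')

def extract_phone_by_digits(text, min_digits=10, max_digits=15):
    """Return the first match whose digit count is plausible for a phone number."""
    for match in PHONE_REGEX.finditer(text):
        candidate = match.group(0)
        digits = sum(ch.isdigit() for ch in candidate)
        if min_digits <= digits <= max_digits:
            return candidate
    return None

sample = "Mobile: +1 (709) 986-7643\nWebsite:"
first = PHONE_REGEX.findall(sample)[0]
print(len(first))                        # 17, so the `< 16` check fails
print(extract_phone_by_digits(sample))   # +1 (709) 986-7643
```

Counting digits also rejects year ranges such as `2015 - 2019`, which the pattern matches and the character-length check would accept.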

Extract email

In [5]:
email_regex = re.compile(r'[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+')

def extract_emails(resume_text):
    """Return all matches of email_regex (lower-case addresses only) in resume_text."""
    return email_regex.findall(resume_text)

df['email'] = df.text.apply(extract_emails)

df.head()

Out[5]:
   path                                               text                                                name           phone           email
0  resumes-list/resume-example-option-software-en...  [email protected]\n(123) 456-7890\nWash...          Github SKILLS  (123) 456-7890  [[email protected]]
1  resumes-list/resume-example-option-project-man...  Stephen Greet\nProject Manager\nPMP certified p...  Stephen        (123) 456-7890  [[email protected]]
2  resumes-list/resume-example-option-attorney.pdf    Ashley Doyle, Esq\n\[email protected]\n\n(1...      Ashley         (123) 456-7890  [[email protected]]
3  resumes-list/resume-example-option-sales.pdf       Stephen Greet\nSales Associate\n\nWork Experie...   Stephen        (123) 456-7890  [[email protected]]
4  resumes-list/data-scientist-resume-example.pdf     KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\...   Github         (123) 456-7890  [[email protected]]
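The character classes in `email_regex` accept lower-case letters only, so an address typed with capitals is missed or clipped. A minimal sketch of a case-insensitive variant that returns a single normalised address instead of a list; `extract_first_email` is a hypothetical name and the sample address below is made up for illustration:

```python
import re

# The notebook's pattern, compiled case-insensitively. Treating case as
# irrelevant is an assumption about intent, not something the notebook states.
EMAIL_REGEX = re.compile(r'[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+', re.IGNORECASE)

def extract_first_email(resume_text):
    """Return the first address found, lower-cased, or None if there is none."""
    match = EMAIL_REGEX.search(resume_text)
    return match.group(0).lower() if match else None

print(extract_first_email("Contact: Jane.Doe@Example.com\n(123) 456-7890"))
# jane.doe@example.com
```

Returning a scalar rather than a list also keeps the `email` column directly comparable across rows.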

Extract school

In [6]:
school_keywords = [
    'school',
    'college',
    'university',
    'academy',
    'faculty',
    'institute',
    'diploma',
]

def extract_education(input_text):
    """Return the named entities in input_text that contain a school keyword."""
    organizations = []

    for sent in nltk.sent_tokenize(input_text):
        for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
            if hasattr(chunk, 'label'):  # and chunk.label() == 'ORGANIZATION'
                organizations.append(' '.join(c[0] for c in chunk.leaves()))

    # Keep only the entities whose text contains one of the school keywords.
    education = set()
    for org in organizations:
        for word in school_keywords:
            if org.lower().find(word) >= 0:
                education.add(org)

    return education

df['school'] = df.text.apply(extract_education)

df

Out[6]:
   path                                               text                                                name                                   phone           email
0  resumes-list/resume-example-option-software-en...  [email protected]\n(123) 456-7890\nWash...          Github SKILLS                          (123) 456-7890  [[email protected]
1  resumes-list/resume-example-option-project-man...  Stephen Greet\nProject Manager\nPMP certified p...  Stephen                                (123) 456-7890  [[email protected]
2  resumes-list/resume-example-option-attorney.pdf    Ashley Doyle, Esq\n\[email protected]\n\n(1...      Ashley                                 (123) 456-7890  [[email protected]
3  resumes-list/resume-example-option-sales.pdf       Stephen Greet\nSales Associate\n\nWork Experie...   Stephen                                (123) 456-7890  [[email protected]
4  resumes-list/data-scientist-resume-example.pdf     KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\...   Github                                 (123) 456-7890  [[email protected]
5  resumes-list/full-stack-developer-resume-examp...  ALEKS LUDKEE\nFull-Stack Developer\n\nludkee.a...   ALEKS                                  (123) 456-7890  [[email protected]
6  resumes-list/shawon_resume.pdf                     Mobile: +1 (709) 986-7643\nWebsite: https://sh...    Education                              None            [sikamal@mun
7  resumes-list/entry-level-data-scientist-resume...  Trish Mathers\nEntry-Level Data Scientist\nInn...   Niantic Data Scientist Intern Seattle  (123) 456-7890  [[email protected]
8  resumes-list/resume-example-option-college-stu...  Stephen\nGreet\nWeb Developer\n\nWork Experien...   Stephen                                (123) 456-7890  [[email protected]
9  resumes-list/resume-example-option-nurse.pdf       ALICE LEWIS, APRN\n\nNurse Practitioner\n\nCON...   San Diego                              (123) 456-7890  [[email protected]
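The keyword filter in `extract_education` can be exercised on its own, without the NLTK chunker. A minimal sketch using the notebook's keyword list; `filter_education` is a hypothetical name and the entity strings are invented stand-ins for chunker output:

```python
SCHOOL_KEYWORDS = ('school', 'college', 'university', 'academy',
                   'faculty', 'institute', 'diploma')

def filter_education(entities, keywords=SCHOOL_KEYWORDS):
    """Keep the entity strings that contain any school keyword, ignoring case."""
    return {e for e in entities if any(k in e.lower() for k in keywords)}

# Hypothetical chunker output for one resume.
entities = ["Example State University", "Acme Corp", "GitHub", "Northside College"]
print(sorted(filter_education(entities)))
# ['Example State University', 'Northside College']
```

Because the match is a plain substring test, any entity that merely contains a keyword (for example a job title with "Faculty" in it) passes the filter as well.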

Extract previous job titles

In [7]:
df_job_titles = pd.read_csv('job_titles_set.csv')

df_job_titles.title.values

Out[7]:
array(['owner', 'manager', 'president', ...,
       'corporate account executive', 'trade marketing',
       'library director'], dtype=object)

In [8]:
# A set gives constant-time membership tests.
JOB_TITLE_DB = set(df_job_titles.title.values)

def extract_job_titles(input_text):
    """Return the unigrams, bigrams, and trigrams of input_text found in JOB_TITLE_DB."""
    stop_words = set(nltk.corpus.stopwords.words('english'))
    word_tokens = nltk.tokenize.word_tokenize(input_text)

    # Preprocessing: drop stop words, then drop tokens that are not purely alphabetic.
    filtered_tokens = [w for w in word_tokens if w.lower() not in stop_words]
    filtered_tokens = [w for w in filtered_tokens if w.isalpha()]

    # Candidate multi-word titles: every bigram and trigram of the filtered tokens.
    grams = list(map(' '.join, nltk.everygrams(filtered_tokens, 2, 3)))

    found_titles = set()

    for candidate in filtered_tokens + grams:
        if candidate.lower() in JOB_TITLE_DB:
            found_titles.add(candidate)

    return found_titles

df['job_titles'] = df.text.apply(extract_job_titles)

df.head()

Out[8]:
   path                                               text                                                name           phone           email                school
0  resumes-list/resume-example-option-software-en...  [email protected]\n(123) 456-7890\nWash...          Github SKILLS  (123) 456-7890  [[email protected]]
1  resumes-list/resume-example-option-project-man...  Stephen Greet\nProject Manager\nPMP certified p...  Stephen        (123) 456-7890  [[email protected]]  {Admin
2  resumes-list/resume-example-option-attorney.pdf    Ashley Doyle, Esq\n\[email protected]\n\n(1...      Ashley         (123) 456-7890  [[email protected]]
3  resumes-list/resume-example-option-sales.pdf       Stephen Greet\nSales Associate\n\nWork Experie...   Stephen        (123) 456-7890  [[email protected]]  {Johns
4  resumes-list/data-scientist-resume-example.pdf     KANDICE LOUDOR\n\nDATA SCIENTIST\n\nCONTACT\n\...   Github         (123) 456-7890  [[email protected]]
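The title lookup is an n-gram match against a fixed vocabulary. A minimal sketch of that idea without NLTK, assuming whitespace tokenisation and no stop-word removal; `ngrams` and `match_titles` are hypothetical helpers, and the title set holds only the six entries visible in Out[7]:

```python
def ngrams(tokens, min_len, max_len):
    """Yield every contiguous run of min_len..max_len tokens as a tuple."""
    for n in range(min_len, max_len + 1):
        for start in range(len(tokens) - n + 1):
            yield tuple(tokens[start:start + n])

def match_titles(text, title_db, max_len=3):
    """Return the 1- to max_len-word phrases of `text` found in `title_db`."""
    tokens = [w for w in text.split() if w.isalpha()]
    found = set()
    for gram in ngrams(tokens, 1, max_len):
        phrase = ' '.join(gram)
        if phrase.lower() in title_db:
            found.add(phrase)
    return found

# The six titles visible in Out[7]; the real CSV holds many more.
TITLES = {'owner', 'manager', 'president',
          'corporate account executive', 'trade marketing', 'library director'}

# Hypothetical resume fragment.
print(sorted(match_titles("Corporate Account Executive then Library Director", TITLES)))
# ['Corporate Account Executive', 'Library Director']
```

Matching is exact on the lower-cased phrase, so a title is found only when its words appear adjacent and in the same order as in the vocabulary.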

Part 2: Evaluation
Calculate similarity between job description and resume

In [9]:
# Read the job description; the context manager closes the file afterwards.
with open("job_description.txt", "r") as f:
    job_description = f.read()

job_description

Out[9]:
"Software Developer\nLocation: St. John's;\n\nEach day our Software
Developers get to work on challenging problems. No two days are the same,
each day you’ll collaborate with other Software Developers to problem solve
and write code that has an impact in the real world. Our product, Verafin,
helps fight crime by stopping fraud and money laundering. Stopping the flow
of this money means stopping crimes such as human trafficking, elder abuse,
and drug trafficking. Our Software Developers get the opportunity to move
around the business as there are new teams and projects developed all the
time to help us towards our mission of stopping crime. Being a Software
Developer at Verafin means getting the opportunity to have an impact on
criminal activity by getting to do what you love – solve cool problems using
code.\n\nEssential Skills & Qualifications\nA university degree or college
diploma in Computer Engineering, Computer Science, or a combination of
education and previous experience would be considered\nStrong analytical
skills for complex and creative problem solving\nExperience in
object-oriented software development \nAutomated testing\nExcellent
interpersonal and organizational skills; able to work closely with team
members\nWould be good to have experience in a few of the following
areas\nJava\nExperience using JavaScript, CSS, REST\nPrevious experience
working with Core Banking Systems\nAmazon Web Services\nIntelligent systems,
artificial intelligence and data science\nDistributed computing\nDatabase
technologies (PostgresSQL)\nBig data technologies\nData extraction,
manipulation/cleansing and integration \n\n\nIndustry and on-the-job
training is provided for all roles at Verafin. \n\n\u200bVerafin places a
high value on building a diverse team, candidates of all backgrounds are
encouraged to apply.\n\nMobile devices are not supported for job
applications currently. Please apply using a desktop device for the best
user experience.\n\nPlease note: we frequently see our jobs posted on job
aggregators, which are essentially search engines for jobs. Generally those
sites ask you to use their sites to apply for the posted job and they do not
send us the application. As a reminder, the the only way to apply for a job
with Verafin is on our site www.verafin.com/careers. We look forward to
reviewing your application."

In [10]: new_row = pd.DataFrame({'path':'job_description', 'text': job_description


df = pd.concat([new_row, df]).reset_index(drop = True)

df.head()

Out[10]:
   path                                               text                                                name           phone           email
0  job_description                                    Software Developer\nLocation: St. John's;\n\nE...   NaN            NaN             NaN
1  resumes-list/resume-example-option-software-en...  [email protected]\n(123) 456-7890\nWash...          Github SKILLS  (123) 456-7890  [[email protected]]
2  resumes-list/resume-example-option-project-man...  Stephen Greet\nProject Manager\nPMP certified p...  Stephen        (123) 456-7890  [[email protected]]
3  resumes-list/resume-example-option-attorney.pdf    Ashley Doyle, Esq\n\[email protected]\n\n(1...      Ashley         (123) 456-7890  [[email protected]]
4  resumes-list/resume-example-option-sales.pdf       Stephen Greet\nSales Associate\n\nWork Experie...   Stephen        (123) 456-7890  [[email protected]]

In [11]:
# Remove stop words and punctuation from text
stop_words_l = stopwords.words('english')

df['text_cleaned']=df.text.apply(lambda x: " ".join(re.sub(r'[^a-zA-Z]'

# Fit TF-IDF on the cleaned texts. TfidfVectorizer L2-normalises each row by
# default, so the product of two rows is their cosine similarity.
tfidfvectoriser = TfidfVectorizer()
tfidf_vectors = tfidfvectoriser.fit_transform(df.text_cleaned)

similarities = (tfidf_vectors @ tfidf_vectors.T).toarray()

# Row 0 is the job description, so similarities[0] scores every row against it.
df["similarity"] = similarities[0]

df.sort_values(by='similarity', ascending=False, inplace=True)

# Drop the job description itself (index label 0) and renumber the ranking.
df = df.drop(0)
df.reset_index(drop=True, inplace=True)

df
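The comment at the top of In [11] states the intent of the cleaning step (remove stop words and punctuation), but the `apply` line itself is cut off in this export. The following is a sketch of one way such a step could look, not a reconstruction of the author's expression. `clean_text` is a hypothetical helper and `STOP_WORDS` is a tiny stand-in for `stopwords.words('english')`:

```python
import re

# Tiny stand-in for NLTK's English stop-word list; an assumption for illustration.
STOP_WORDS = {'a', 'an', 'and', 'the', 'of', 'to', 'in', 'on', 'is', 'are', 'our'}

def clean_text(text, stop_words=STOP_WORDS):
    """Lower-case text, replace non-letters with spaces, and drop stop words."""
    letters_only = re.sub(r'[^a-zA-Z]', ' ', text).lower()
    return ' '.join(w for w in letters_only.split() if w not in stop_words)

print(clean_text("Experience using JavaScript, CSS, REST"))
# experience using javascript css rest
```

Applied to a DataFrame column, this would be used as `df.text.apply(clean_text)`.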

Out[11]:
   path                                               text                                                name           phone           email
0  resumes-list/full-stack-developer-resume-examp...  ALEKS LUDKEE\nFull-Stack Developer\n\nludkee.a...   ALEKS          (123) 456-7890  [[email protected]
1  resumes-list/shawon_resume.pdf                     Mobile: +1 (709) 986-7643\nWebsite: https://sh...    Education      None            [sikamal@mun
2  resumes-list/resume-example-option-software-en...  [email protected]\n(123) 456-7890\nWash...          Github SKILLS  (123) 456-7890  [[email protected]
3  resumes-list/resume-example-option-                Stephen\nGreet\nWeb                                 Stephen        (123) 456-      [[email protected]
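The similarity score in In [11] is a cosine similarity between TF-IDF vectors, with row 0 (the job description) as the query. The same computation can be sketched in plain Python to make the weighting explicit. The sketch assumes scikit-learn's default weighting (raw term counts, smoothed idf `ln((1 + n) / (1 + df)) + 1`, L2 normalisation) and a simplified tokenizer; all function names and the three-document corpus are hypothetical:

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case and keep alphabetic tokens of two or more letters (a simplification)."""
    return re.findall(r'[a-z]{2,}', text.lower())

def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        # Smoothed idf, as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1.
        weights = {t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
                   for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

def cosine(u, v):
    """Dot product of two L2-normalised sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def rank_against_first(docs):
    """Score docs[1:] against docs[0]; return (index, score) pairs, best first."""
    vecs = tfidf_vectors(docs)
    scores = [(i, cosine(vecs[0], vecs[i])) for i in range(1, len(docs))]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical miniature corpus: a job description followed by two resumes.
docs = [
    "software developer java javascript testing",
    "sales associate retail customer service",
    "full stack developer javascript java",
]
print([i for i, _ in rank_against_first(docs)])  # [2, 1]
```

A resume that shares no vocabulary with the job description scores exactly 0, which is why the scores in the ranking stay small: only the overlapping terms contribute.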

Ranking Output

In [12]: df[['path', 'name', 'email', 'similarity']]

Out[12]:
   path                                               name                                   email                similarity
0  resumes-list/full-stack-developer-resume-examp...  ALEKS                                  [[email protected]]  0.143581
1  resumes-list/shawon_resume.pdf                     Education                              [[email protected]]  0.138904
2  resumes-list/resume-example-option-software-en...  Github SKILLS                          [[email protected]]  0.101460
3  resumes-list/resume-example-option-college-stu...  Stephen                                [[email protected]]  0.079581
4  resumes-list/resume-example-option-project-man...  Stephen                                [[email protected]]  0.079037
5  resumes-list/data-scientist-resume-example.pdf     Github                                 [[email protected]]  0.052557
6  resumes-list/entry-level-data-scientist-resume...  Niantic Data Scientist Intern Seattle  [[email protected]]  0.050098
7  resumes-list/resume-example-option-nurse.pdf       San Diego                              [[email protected]]  0.030303
8  resumes-list/resume-example-option-sales.pdf       Stephen                                [[email protected]]  0.028344
9  resumes-list/resume-example-option-attorney.pdf    Ashley                                 [[email protected]]  0.021063
