LN and ML-based Model Architecture For Recruiting IT Professionals
2298/CSIS123456789X
1. Introduction

Personnel selection is the process of obtaining the quantity and quality of employees
needed for the business and involves a large number of activities (planning,
recruitment, selection and incorporation of new employees).

[16] indicates that one of the disadvantages of the recruitment process is the operating
cost of applying appropriate selection techniques. Choosing the candidate that meets
the requirements of the position offered is a complicated task because the Human
Resources area must invest large resources, distributed among activities such as
profile review, filtering and personal interviews.

Human resources management and its problems are being addressed by Artificial
Intelligence (AI) and its branches. For example, the literature review of [6] offers a
diverse set of suggestions on how specific AI techniques could be applied to specific
Human Resources tasks.

An example is the proposal of [4], which addresses the problem of candidate
classification with the help of Machine Learning. For this purpose, the authors
evaluated supervised learning algorithms (linear regression, M5 model tree, REP
decision tree and support vector machine) in combination with a semantic skill
matching mechanism to achieve automated electronic recruitment.
First Author et al.
We analyzed a total of 20 investigations and divided them into 3 categories according
to the techniques applied: Machine Learning, Natural Language Processing and
Semantic Correspondence.

[4] proposes a system for candidate selection through the analysis of the candidate’s
LinkedIn and blogger profiles. For this purpose, the authors evaluated supervised
learning algorithms (linear regression, M5 model tree, REP decision tree and support
vector machine) and combined them with a semantic skill matching mechanism.

Supported by the strengths of semantic knowledge (concept similarity) and of Machine
Learning methods, [3] propose a scalable and stateless architecture for an automated
Human Capital Management system, with which they seek to recommend jobs to a
candidate and, vice versa, candidates to a company.

[17] propose a recommendation system that uses a Gradient Boosting Decision Tree
(GBDT) and a hybrid convolutional neural network model to compute a correlation
between a job seeker and a job offer, with the goal of improving the quality of human
resource recommendation.

[20] proposes a convolutional neural network model with the objective of solving
the person-job matching problem. The authors’ proposal is a neural network that
learns the joint representation of person-job fit from historical job applications.
Authors’ Instructions
[11] proposes an architecture for automation through recommendation using
machine learning and statistical methods. The authors’ proposal extends the
research of [3] and aims to achieve better system robustness and recommendation
quality by implementing features such as candidate career interests, scoring
functions for academic information and professional experience, string matching, etc.

[15] presents an automated Machine Learning-based model for CV recommendation,
in which a CV is preprocessed for cleaning and feature extraction using the TF-IDF
approach and is then assigned to a category by the classification model.

In the recruitment process, recruiters do not focus exclusively on a person’s
technical skills to determine their suitability for an offered position, but also take
into account characteristics such as education, personality, experience, etc.
A job recommendation system based on user profile is proposed by [8], which also
seeks to predict career advancement from the user’s work history.

A content-based recommendation algorithm that extends and updates the Minkowski
distance is proposed by [1], with the objective of matching people and jobs. The
authors’ proposal quantifies the suitability of a job seeker/candidate by analyzing a
structured form of the job and of the candidate’s profile, created from the content
analysis of their unstructured forms.
[7] proposes a resume matching system called RésuMatcher, which determines
the suitability of a job by calculating the similarity between the models generated
from the resume and the job description.

A career path recommendation system that relies on text mining and collaborative
filtering techniques, and that also recommends skills based on related job offers
generated from the skills in the user’s profile, is proposed by [13].

[12] proposes a candidate recommendation system called Smart Applicant Ranker,
which uses ontologies to compare CV models (consisting of education, work
experience and skills) and job requirement models to find the best candidates based on
the similarity of the generated ontological models.

A bidirectional semantic correspondence system is proposed by [2] to measure the
degree of semantic similarity between the skills and qualifications of a job seeker and
an offered vacancy. In addition, the authors apply machine learning techniques for
bidirectional matching of job vacancies and occupational standards to improve the
content of job vacancies and job seeker profiles based on social network analysis and
occupational standards.

[18] propose the use of weighted tree algorithms to calculate the similarity
between job advertisements and keywords or criteria used by job seekers.

[14] propose an ontology-based system that recommends the most relevant jobs and
is built from the basic information collected and the lists of jobs that the user has
marked as favorites or viewed.

In the proposals of [8], [1], [12], [2], [18] and [14], the authors propose solutions
that require the information to be analyzed to have a certain structure. On the other
hand, the proposals of [7] and [13] apply unstructured analysis, taking into account
that the information contained in a CV does not follow a unique style or format.
Table 2. Format of the information to be processed

            Information to process
Author(s)   Structured   Not structured
[8]         X
[1]         X
[12]        X
[2]         X
[18]        X
[14]        X
[7]                      X
[13]                     X
[8], [7] and [2] present proposals that approach the selection problem from the
perspective of similarity between a candidate’s CV/profile and the vacancy/position
offered. In contrast, [18] addresses the problem through the similarity between the
content of a job offer and the search keywords used by a user.

Although the proposals of [8], [7] and [2] share the same similarity approach,
each one has its own peculiarity. [8] applies recommendation based on the content of
the candidate’s work history. [7] rely on the qualifications, skills and work experience
described in the candidate’s CV and those required in the job offer, and generate
recommendations based on the similarity between them. Finally, [2] take into account
the similarity of qualifications and skills as well as the candidate’s connections, since
their testimony enhances the process of evaluating whether or not a candidate is
suitable for a vacancy.

Table 3. Data source

Author(s)                             Data source                       Quantity
[Error: Reference source not found]   LinkedIn                          2400
[Error: Reference source not found]   Kaggle                            100
[Error: Reference source not found]   Indeed                            1000
[Error: Reference source not found]   Universidad Estatal de San José   1000
[Error: Reference source not found]   -                                 -
[Error: Reference source not found]   Not specified                     175
[Error: Reference source not found]   Not specified                     100
[Error: Reference source not found]   -                                 -
An online recruitment system that exploits multiple semantic resources and uses
statistical measures of concept relatedness is proposed by [10]. Moreover, it relies on
NLP to identify and extract candidate concept lists from job postings and candidate
CVs.

[9] propose a solution focused on job matching for older workers. In this solution,
keywords are extracted from the description entered in the system search engine
after tokenizing sentences and filtering words based on morphological analysis. Then,
the search for related job offer documents is performed based on the top 10 keywords.

To solve the resume-job offer matching problem of job portals, [19] propose a hybrid
approach and incorporate resume categorization to reduce the dataset to be
analyzed; that is, instead of evaluating all resumes, the analysis is only applied
to resumes that fall within the category described in the job offer.

To address the problem of CV retrieval based on the description of a job offer, [5]
propose the use of the average word embedding (AWE) model, together with Principal
Component Analysis (PCA) to solve the dimensionality problem that AWE can
present.

Table 4. Weighting techniques applied in proposals using NLP

Author(s)   Weighting technique/approach
[10]        TF-IDF
[9]         BM25
[19]        TF-IDF
[5]         AWE

In the proposals of [10], [9], [19] and [5], different techniques are applied to
information retrieval, as shown in Table 4. [10] applied the TF-IDF weighting scheme
to eliminate concepts that do not present significant value. [9] made use of the
Solr/Lucene scores of the BM25 algorithm, which performs scoring based on term
frequency and document length normalization. [19] relied on the TF-IDF technique to
filter and refine the concept list by removing concepts with low weights. On the other
hand, [5] indicate that classical information retrieval models such as Bag of Words
(BOW) and BM25 have certain weaknesses and require complementary techniques
such as latent semantic indexing (LSI). Therefore, they rely on the average word
embeddings (AWE) model.
Figure 1. Model Architecture

Figure 1 shows the architecture of our model and its components:

 • Data form
 • Pre-processing module
 • Categorization module
 • Matching module

The data form represents the core of the system and is the component that receives the
information the model needs to work. Through it, the actors (applicant and candidate)
trigger the behavior of the model, since they provide the data that pass through each
of the components of the model and ultimately generate a ranking of candidates for
the job offer entered or a ranking of job offers for the CV entered.
In the pre-processing module, the corpus of the text entered in the skills section goes
through a cleaning process that detects and eliminates punctuation marks or symbols
that do not provide context-relevant meaning or that cause an IT skill not to be
detected.

Figure 2. Skills corpus cleaning

Figure 2 presents the proposed flowchart for data cleaning. Since our skills detection
process relies on an IT dictionary, it is necessary to ensure that an IT skill (contained
in the skills section of each form) does not contain characters that would cause it to be
omitted during the process. Therefore, the first step is to convert the text of the skills
section into a list of characters. After that, we parse each element of the generated list
and remove the signs and symbols. Finally, we rejoin this list of characters and obtain
a clean corpus to process.
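
The three cleaning steps (split into characters, drop signs and symbols, rejoin) could be sketched as follows. This is only an illustration under stated assumptions: the function name `clean_skills_corpus`, the lowercasing, the replacement of removed symbols by a space, and the `KEEP` set of protected characters are all hypothetical and are not taken from the flowchart.

```python
# Assumption: symbols that are part of real skill names ("c#", "c++", "vue.js")
# must survive cleaning; this set is illustrative, not taken from the paper.
KEEP = set("#+.")


def clean_skills_corpus(text):
    """Return the skills text with signs and symbols removed."""
    chars = list(text.lower())  # step 1: text -> list of characters
    # step 2: inspect each character; anything that is not a letter, digit,
    # whitespace or a protected symbol is replaced by a space
    kept = [ch if (ch.isalnum() or ch.isspace() or ch in KEEP) else " "
            for ch in chars]
    joined = "".join(kept)  # step 3: rejoin the characters
    # drop dots left at token edges ("java." -> "java") and collapse spaces
    tokens = (tok.strip(".") for tok in joined.split())
    return " ".join(tok for tok in tokens if tok)


print(clean_skills_corpus("Java, Spring-Boot; C#, Vue.js (2 years)!"))
# prints: java spring boot c# vue.js 2 years
```

Protecting a few characters is a design choice: stripping every symbol would turn "c#" into "c" and could make the dictionary lookup miss or confuse a skill.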
An important element in this module is Word2vec, a neural network composed of an
input layer, a hidden layer and an output layer that allows us to calculate the semantic
relationship between words in a given context. We take advantage of this tool and
train it with IT skills.

This model helps us to fulfill the objective of this module, which is to obtain a
subset of skills with a strong semantic relationship and thus reduce the number of
queries to be made later in the categorization module. This is under the premise that a
set of strongly related skills will result in an equally related set of IT occupations.
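
One possible way to select a strongly related subset from word vectors is sketched below. It is not the procedure used in the model, whose exact selection rule is not described here: the toy 3-dimensional vectors stand in for trained Word2vec embeddings, and the rule (keep a skill when its mean cosine similarity to the other detected skills reaches a threshold) and the threshold value are assumptions made for illustration.

```python
import math

# Toy vectors standing in for trained Word2vec embeddings (made-up values).
VECTORS = {
    "html":       (0.9, 0.1, 0.0),
    "css":        (0.8, 0.2, 0.0),
    "javascript": (0.7, 0.3, 0.1),
    "rxjava":     (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def related_subset(skills, vectors, threshold=0.5):
    """Keep skills whose mean similarity to the other skills >= threshold."""
    kept = []
    for skill in skills:
        others = [o for o in skills if o != skill]
        if not others:  # a single skill has nothing to be compared with
            kept.append(skill)
            continue
        mean_sim = sum(cosine(vectors[skill], vectors[o]) for o in others) / len(others)
        if mean_sim >= threshold:
            kept.append(skill)
    return kept


print(related_subset(["html", "css", "javascript", "rxjava"], VECTORS))
# prints: ['html', 'css', 'javascript']
```

In this toy data the three web skills point in similar directions, so they reinforce each other, while "rxjava" is nearly orthogonal to them and is dropped.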

With the categorization module we obtain the IT occupations related to each of the
skills detected in the previous module. These occupations help us to categorize the
document (job offer or CV) that is being processed and also serve to reduce the volume
of data to be worked with in the next module.

Table 5. IT dictionary excerpt

IT Skill    IT Professions
Expressjs   backend, js developer
Extjs       frontend, js developer
Firebase    backend, mobile developer
Flask       python developer, backend, web developer

Table 5 shows a small excerpt of how the IT dictionary is composed.

An IT skill is not exclusive to one profession, so when consulting our IT skill
dictionary it is possible that one or more IT skills have one or more IT
professions/occupations in common.

Taking this into account, during each query to our dictionary we assign a frequency
value to the professions found. Then, at the end of the query process, we calculate the
average frequency and categorize the document under evaluation (job offer or CV)
with those professions that have a value greater than or equal to the average.
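
The frequency rule can be illustrated with the dictionary excerpt of Table 5. This is a minimal sketch: the names `IT_DICTIONARY` and `categorize` are hypothetical, and the lowercase keys are an assumption about how skills are stored.

```python
from collections import Counter

# Excerpt of the IT dictionary (Table 5); the real dictionary is larger.
IT_DICTIONARY = {
    "expressjs": ["backend", "js developer"],
    "extjs": ["frontend", "js developer"],
    "firebase": ["backend", "mobile developer"],
    "flask": ["python developer", "backend", "web developer"],
}


def categorize(skills, dictionary=IT_DICTIONARY):
    """Return the professions whose frequency is >= the average frequency."""
    frequency = Counter()
    for skill in skills:  # one dictionary query per detected skill
        frequency.update(dictionary.get(skill, []))
    if not frequency:  # no skill was found in the dictionary
        return []
    average = sum(frequency.values()) / len(frequency)
    return sorted(p for p, count in frequency.items() if count >= average)


print(categorize(["expressjs", "extjs", "firebase", "flask"]))
# prints: ['backend', 'js developer']
```

Here "backend" is found 3 times and "js developer" 2 times, while the other four professions are found once each; the average is 9/6 = 1.5, so only the first two categorize the document.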

In the matching module, when a job offer is being processed, the list of professional
categories obtained is taken and, for each of these, the CVs of the same category are
extracted from the database. When a profile or CV is being processed, the documents
extracted from the database are job offers.

With the set of documents obtained, a data table is built. Its column headers are the IT
skills detected in the filtered set and in the item being processed, and each row
represents a profile or CV. Each row-column intersection takes a value that depends
on the following conditions:

 • 0 is assigned if the CV does not possess the IT skill described in the column.
 • 1 is assigned if the CV possesses the IT skill described in the column.
 • 2 is assigned if the CV contains the IT skill described in the column and it
   matches one of the requirements of the job offer.

When a profile or CV is being processed, the criteria are the same, with the
difference that each row represents a job offer.
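
The 0/1/2 encoding for the case in which a job offer is processed could be built as in the following sketch. The function name, the use of sets for skills and the alphabetical column order are assumptions made for the example.

```python
def build_data_table(cvs, offer_requirements):
    """Encode each CV as a row of 0/1/2 values, one column per IT skill.

    cvs: dict mapping a CV id to the set of IT skills detected in it.
    offer_requirements: set of IT skills required by the job offer.
    """
    requirements = set(offer_requirements)
    # columns: every skill seen in the filtered CVs or in the offer
    columns = sorted(set().union(*cvs.values()) | requirements)
    table = {}
    for cv_id, skills in cvs.items():
        row = []
        for skill in columns:
            if skill not in skills:
                row.append(0)  # the CV does not possess the skill
            elif skill in requirements:
                row.append(2)  # possessed and required by the offer
            else:
                row.append(1)  # possessed but not required
        table[cv_id] = row
    return columns, table


columns, table = build_data_table(
    {"cv_a": {"java", "spring"}, "cv_b": {"python"}},
    {"java", "react"},
)
print(columns)  # ['java', 'python', 'react', 'spring']
print(table)    # {'cv_a': [2, 0, 0, 1], 'cv_b': [0, 1, 0, 0]}
```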
This data table is the input for clustering. The unsupervised Mean-shift algorithm
is in charge of analyzing this set and assigning a group or cluster number to each
element. Unlike other algorithms, it does not require the number of clusters to be
specified in advance; it iterates over the elements of the set and establishes the
number of clusters itself. Once the process is finished, we have the cluster number to
which each element belongs. The elements that are in the same cluster as the
document (job offer or CV) being processed represent the output of the clustering
component.

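
To make the clustering step concrete, the sketch below implements a small flat-kernel mean shift in pure Python and applies it to rows of 0/1/2 values. It is an illustration only: the `bandwidth` value, the toy rows and the choice of encoding the processed job offer as an extra row are assumptions, and a production system would more likely rely on a library implementation such as the `MeanShift` class of scikit-learn.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift; returns one cluster label per input point."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    modes = []
    for p in points:
        current = tuple(float(x) for x in p)
        for _ in range(max_iter):
            # points inside the window centred on the current position
            window = [q for q in points if dist(current, q) <= bandwidth]
            if not window:
                break
            shifted = tuple(sum(c) / len(window) for c in zip(*window))
            moved = dist(current, shifted)
            current = shifted
            if moved < tol:  # the window stopped moving: a mode was reached
                break
        modes.append(current)

    # points whose modes lie within one bandwidth share a cluster
    centers, labels = [], []
    for mode in modes:
        for idx, center in enumerate(centers):
            if dist(mode, center) <= bandwidth:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels


rows = {"cv_a": [2, 1, 0, 0], "cv_b": [2, 2, 0, 0],
        "cv_c": [0, 0, 1, 1], "cv_d": [0, 0, 1, 2]}
offer_row = [2, 2, 0, 0]  # assumption: the offer is encoded like a CV row
labels = mean_shift(list(rows.values()) + [offer_row], bandwidth=1.5)
same_cluster = [cv for cv, lab in zip(rows, labels) if lab == labels[-1]]
print(labels)        # [0, 0, 1, 1, 0]
print(same_cluster)  # ['cv_a', 'cv_b']
```

The number of clusters (two here) is never passed in; it emerges from the bandwidth, which is the property that motivates the choice of Mean-shift in the text.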
Our final objective is to obtain a ranking of candidates; therefore, we order the CVs
(obtained during clustering) based on the percentage of skills that a CV fulfills with
respect to those specified in the job offer. Put differently, consider a CVi, where
i ∈ ℕ, which contains a list of skills HCV, and a job offer, which contains the
required skills (RS) and the desirable skills (DS). The percentage of RS (%RS) is
calculated as the number of RS that are contained in HCV over the total number of
HCV items; the percentage of DS (%DS) is calculated in the same way.

As an example, given a CV and a job offer with RS and DS, the percentages of RS and
DS are calculated as follows:
 HCV = [Java, Spring, JSF, Oracle, Android, Flutter, Spring Boot]
 • n(HCV) = 7
 • RS = [Java, Android, React, Flutter]    %RS = 3/7 ≈ 42.9%
 • DS = [Spring, Spring Boot]              %DS = 2/7 ≈ 28.6%
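
The worked example can be reproduced with a few lines of code. This is a sketch: the function names are hypothetical, and ordering CVs by %RS first and using %DS only to break ties is an assumption, since the text does not state how the two percentages are combined.

```python
def skill_percentages(hcv, rs, ds):
    """Return (%RS, %DS): matched skills over the number of skills in the CV."""
    if not hcv:
        return 0.0, 0.0
    cv_skills = set(hcv)
    pct_rs = 100 * len(cv_skills & set(rs)) / len(hcv)
    pct_ds = 100 * len(cv_skills & set(ds)) / len(hcv)
    return pct_rs, pct_ds


def rank_cvs(cvs, rs, ds):
    """Order CV ids by %RS, then %DS (assumed tie-break), highest first."""
    return sorted(cvs, key=lambda cv: skill_percentages(cvs[cv], rs, ds),
                  reverse=True)


HCV = ["Java", "Spring", "JSF", "Oracle", "Android", "Flutter", "Spring Boot"]
RS = ["Java", "Android", "React", "Flutter"]
DS = ["Spring", "Spring Boot"]

pct_rs, pct_ds = skill_percentages(HCV, RS, DS)
print(round(pct_rs, 1), round(pct_ds, 1))  # 42.9 28.6
```

Note that the denominator is the size of HCV, as defined in the text, so a CV listing many skills unrelated to the offer obtains a lower percentage than a shorter CV with the same matches.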
In this section, for the evaluation and discussion of results, we used 200 job offers
and 50 profiles or CVs. In addition, we rely on an IT dictionary which consists of 225
skills and the occupations associated with each of these.

As we indicated in the theoretical input chapter, our model consists of 3
components: pre-processing, categorization and clustering. In this chapter we show
the results of processing a document (job offer or CV) with each of these components.
Table 6. CV: pre-processing results

CV        IT skills detected                            Most similar IT skills
cv_0001   9 = ['html', 'css', 'javascript',             8 = ['html5', 'css3', 'javascript',
          'java', 'php', 'laravel', 'vuejs',            'php', 'vue.js', 'java', 'spring',
          'rxjava', 'spring']                           'laravel']
cv_0002   13 = ['html', 'css', 'javascript',            9 = ['html5', 'css3', 'javascript',
          'typescript', 'java', 'php', 'python',        'php', 'typescript', 'angular',
          'angular', 'nodejs', 'azure', 'react',        'python', 'react', 'nodejs']
          'js', 'nestjs']
cv_0049   16 = ['java', 'hibernate', 'jpa',             10 = ['java', 'spring', 'android',
          'mybatis', 'spring', 'spring',                'hibernate', 'mybatis', 'javascript',
          'javascript', 'c', 'python', 'flask',         'html5', 'css3', 'python', 'linux']
          'html', 'css', 'datastage', 'sql',
          'linux', 'android']
cv_0050   11 = ['python', 'django', 'drf',              5 = ['android', 'java', 'css3',
          'flask', 'angular', 'android', 'java',        'html5', 'javascript']
          'css', 'html', 'js', 'net']

Table 6 shows the results obtained by pre-processing the skills section. At this point,
Word2vec helps us to reduce the skills detected in that section and, as a result, we
obtain those IT skills that are most related to each other or, in other words, those that
have the greatest semantic relationship.

4.1.2 Case 2: job offer

In the case of a job offer, the sections that go through pre-processing are the required
skills and desirable skills, since these include IT skills.

Table 7. Job offer: pre-processing results

Offer     IT skills detected                            Most similar IT skills
Oferta_1  9 = ['html', 'css', 'javascript',             8 = ['html5', 'css3', 'javascript',
          'nodejs', 'angular', 'php', 'laravel',        'php', 'nodejs', 'laravel', 'aws',
          'aws', 'azure']                               'azure']
Oferta_2  13 = ['php', 'javascript', 'typescript',      10 = ['php', 'python', 'symfony',
          'c#', 'xamarin', 'python', 'symfony',         'css3', 'javascript', 'html5',
          'django', 'html', 'css', 'aws',               'typescript', 'angular', 'c#',
          'dynamo', 'angular']                          'xamarin']
With the subset obtained, as shown in Table 10, we created a ranking of the best
job offers for each CV.

In the proposal made by [4], semantic matching is employed to calculate the
distance between the skills and experience in the candidate’s profile and the job offer
requirements. On the other hand, [2] use string matching to evaluate the
correspondence between a vacancy (job offer) and a profile. These methods do not
consider that some IT skills can be written in more than one form (e.g., JavaScript
can be found in some offers or profiles as JS). Therefore, in our proposal we create an
IT dictionary to deal with this problem. This dictionary not only informs us about the
occupations related to a skill, but also considers the various written forms with which
an IT skill can be represented. The latter broadens the detection of skills and thus
yields a better-quality result.
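
How a dictionary can absorb alternative spellings of a skill is illustrated by the sketch below. The alias entries, the canonical names and the function `detect_skills` are hypothetical examples and do not reproduce the actual content of the IT dictionary.

```python
# Hypothetical alias map: every written form points to one canonical skill.
ALIASES = {
    "js": "javascript",
    "javascript": "javascript",
    "node.js": "nodejs",
    "nodejs": "nodejs",
    "vue.js": "vuejs",
    "vuejs": "vuejs",
}


def detect_skills(tokens, aliases=ALIASES):
    """Return canonical skill names found in tokens, in order, without repeats."""
    seen, found = set(), []
    for token in tokens:
        canonical = aliases.get(token.lower())
        if canonical is not None and canonical not in seen:
            seen.add(canonical)
            found.append(canonical)
    return found


print(detect_skills(["JS", "JavaScript", "Vue.js", "cobol"]))
# prints: ['javascript', 'vuejs']
```

Plain string matching would treat "JS" and "JavaScript" as two unrelated terms; mapping both to one canonical entry is what allows an offer and a profile that use different spellings to match.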
The study conducted by [19] proposes a method to automatically classify CVs to
their respective job offers. The authors categorize/label the documents with the
objective of comparing only the elements of the same category. To this end, they
combined two knowledge bases (DICE and O*NET), from which they obtained the
occupation associated with each skill. In our proposal, on the other hand, we
constructed an IT dictionary with 226 skills. In this dictionary, each skill has a set of
associated IT occupations according to the current market. The latter is what
differentiates us from the aforementioned proposal: unlike the authors’ proposal, ours
focuses exclusively on the IT area, using a knowledge base built manually for this
purpose.

6. References

1. Almalis, N. D., Tsihrintzis, G. A., Karagiannis, N., & Strati, A. D. (2016). FoDRA - A
   new content-based job recommendation algorithm for job seeking and recruiting. IISA
   2015 - 6th International Conference on Information, Intelligence, Systems and
   Applications.
2. Chala, S. A., Ansari, F., Fathi, M., & Tijdens, K. (2018). Semantic matching of job seeker
   to vacancy: a bidirectional approach. International Journal of Manpower, 39(8), 1047–
   1063.
3. Chaudhary, A., Jobanputra, M., Shah, S., Gandhi, R., Chaudhary, S., & Goswami, R.
   (2018). Automated human capital management system. 12th Annual IEEE International
   Systems Conference, SysCon 2018 - Proceedings, 1–8.
4. Faliagka, E., Iliadis, L., Karydis, I., Rigou, M., Sioutas, S., Tsakalidis, A., & Tzimas, G.
   (2014). On-line consistent ranking on e-recruitment: Seeking the truth behind a well-
   formed CV. Artificial Intelligence Review, 42(3), 515–528.
5. Fernández-Reyes, F. C., & Shinde, S. (2019). CV Retrieval System based on job
   description matching using hybrid word embeddings. Computer Speech and Language, 56,
   73–79.
6. Figueroa-García, J. C., Kalenatic, D., & López-Bello, C. A. (2015). Artificial Intelligent
   Techniques in Human Resource Management. Intelligent Systems Reference Library, 87,
   623–643.
7. Guo, S., Alamudun, F., & Hammond, T. (2016). RésuMatcher: A personalized résumé-job
   matching system. Expert Systems with Applications, 60, 169–182.
8. Heap, B., Krzywicki, A., Wobcke, W., Bain, M., & Compton, P. (2014). Combining career
   progression and profile matching in a job recommender system. Lecture Notes in
   Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture
   Notes in Bioinformatics), 8862, 396–408.
9. Kaoru, S., Kenichi, S., Masatomo, K., & Atsuhi, H. (2017). Towards extracting recruiters’
   tacit knowledge based on interactions with a job matching system. Lecture Notes in
   Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture
   Notes in Bioinformatics), 10298, 557–568.
10. Kmail, A. B., Maree, M., Belkhatir, M., & Alhashmi, S. M. (2016). An automatic online
    recruitment system based on exploiting multiple semantic resources and concept-
    relatedness measures. Proceedings - International Conference on Tools with Artificial
    Intelligence, ICTAI, 2016-Janua, 620–627.
11. Mehta, M., Derasari, R., Patel, S., Kakadiya, A., Gandhi, R., Chaudhary, S., & Goswami,
    R. (2019). A service-oriented human capital management recommendation platform.
    SysCon 2019 - 13th Annual IEEE International Systems Conference, Proceedings, 1–8.
12. Mohamed, A., Bagawathinathan, W., Iqbal, U., Shamrath, S., & Jayakody, A. (2018).
    Smart Talents Recruiter - Resume Ranking and Recommendation System. 2018 IEEE 9th
    International Conference on Information and Automation for Sustainability, ICIAfS 2018,
    1–5.
13. Patel, B., Kakuste, V., & Eirinaki, M. (2017). CaPaR: A career path recommendation
    framework. Proceedings - 3rd IEEE International Conference on Big Data Computing
    Service and Applications, BigDataService 2017, 23–30.
14. Rimitha, S. R., Abburu, V., Kiranmai, A., Marimuthu, C., & Chandrasekaran, K. (2019).
    Improving Job Recommendation Using Ontological Modeling and User Profiles. 2019
    15th International Conference on Information Processing: Internet of Things, ICINPRO
    2019 - Proceedings.
15. Roy, P. K., Chowdhary, S. S., & Bhatia, R. (2020). A Machine Learning approach for
    automation of Resume Recommendation system. Procedia Computer Science, 167(2019),
    2318–2327.
16. Vallejo Chávez, L. M. (2016). Gestión del talento humano ESPOCH 2016.
17. Wang, H., Liang, G., & Zhang, X. (2018). Feature Regularization and Deep Learning for
    Human Resource Recommendation. IEEE Access, 6, 39415–39421.
18. Wierfi, A. D., Utami, E., & Sunyoto, A. (2019). The application of extended weighted tree
    similarity algorithm for similarity searching. 2019 International Conference on
    Information and Communications Technology, ICOIACT 2019, 428–433.
19. Zaroor, A., Maree, M., & Sabha, M. (2018). A Hybrid Approach to Conceptual
    Classification and Ranking of Resumes and Their Corresponding Job Posts. International
    Conference on Intelligent Decision Technologies, 2, 13–21.
20. Zhu, C., Zhu, H., Xiong, H., Ma, C., Xie, F., Ding, P., & Li, P. (2018). Person-Job Fit.
    ACM Transactions on Management Information Systems, 9(3), 1–17.