Harnessing Text and Web Analytics To Enhance Decision-Making in Job Opportunity Categorization
Keywords: Text analytics, Clustering, Job Opportunities, Predictive Analytics, Web Analytics
1 Introduction
In today’s era of globalisation, Big Data is frequently linked to the growth of real-time data acquired from social
media and online portals such as job websites. According to Mezzanzanica and Mercorio [1], big data is a technology
that enables businesses to get value from massive volumes of data to better track the adoption of services in the
market. In other words, big data is concerned with analytical processes that combine data volume, velocity, and
variety, and which may involve advanced algorithms and multiple data types. This paper presents a study on the
application of text analytics to online job postings.
2 Problem Statement
As Malaysia enters a phase of rapid globalisation, more digital jobs are being offered, so fresh graduates need
to be highly skilled to meet present industrial demands. One in five of the more than 290,000 fresh graduates
each year remains unemployed six months after graduating, accounting for up to 55% of those unemployed
[9]. The Department of Statistics Malaysia reported that the unemployment rate increased from 3.3% in June 2019 to
4.7% in August 2020 [10]. Analytics on online job postings are therefore needed to address some of these
problems.
3 Proposed solution
In this research, text mining techniques such as clustering will be applied to data on online job postings
and job seekers’ behaviour gathered from job portals and social media. The findings are then summarized as
data visualizations and displayed on an analytical dashboard. Specific competencies are captured from job
descriptions to identify the skills required for each job title available in the current job market,
according to location. The required data will be scraped from online job portals and social media, processed, and
filtered using a set of keywords related to the particular job title to address the problems stated above.
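The keyword-based filtering step can be sketched as follows. This is a minimal illustration only: the record layout (`title`, `location`, `description`), the function name `filter_postings`, and the sample postings are assumptions made for the example, not the data or code used in this research.

```python
def filter_postings(postings, keywords):
    """Keep postings whose title or description mentions any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    matches = []
    for posting in postings:
        text = f"{posting['title']} {posting['description']}".lower()
        if any(k in text for k in lowered):
            matches.append(posting)
    return matches


# Hypothetical sample records; real data would come from the scraping step.
postings = [
    {"title": "Data Analyst", "location": "Kuala Lumpur",
     "description": "SQL, Python and dashboard skills required."},
    {"title": "Social Media Manager", "location": "Johor Bahru",
     "description": "Plan marketing campaigns and manage content."},
    {"title": "Project Manager", "location": "Kuala Lumpur",
     "description": "Lead delivery teams and manage budgets."},
]

data_jobs = filter_postings(postings, ["data analyst", "sql"])
print([p["title"] for p in data_jobs])
```

Simple substring matching is used here for brevity; it would also match keywords embedded inside longer words, so a production filter would likely tokenise the text first.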
3.1 Data Integration
Unstructured data cannot be easily merged and examined using a relational database management system
(RDBMS). The goal of this research is to bridge the gap between unstructured and structured data by converting
unstructured input into structured column values and mapping them to database entities. In this research, we propose
an unstructured data integration and analysis system that analyses online job postings using text analytics approaches
to extract important information.
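To illustrate the idea of converting unstructured input into structured column values, the sketch below parses a free-text posting and stores the result in a relational table. It assumes, purely for illustration, that each posting labels its fields as "Title:", "Location:" and "Requirements:"; the table name `job_posting` and the function `parse_posting` are hypothetical, and real postings would need more robust extraction.

```python
import re
import sqlite3

# Assumed layout: each field appears on its own line as "Label: value".
FIELD_PATTERN = re.compile(r"^(Title|Location|Requirements):\s*(.+)$", re.MULTILINE)


def parse_posting(raw_text):
    """Turn one free-text posting into a dict of structured column values."""
    record = {"title": None, "location": None, "requirements": None}
    for label, value in FIELD_PATTERN.findall(raw_text):
        record[label.lower()] = value.strip()
    return record


raw = """Title: Data Analyst
Location: Kuala Lumpur
Requirements: SQL, Python, data visualisation
We are a growing company looking for a motivated analyst."""

record = parse_posting(raw)

# Map the structured values onto a relational table (in-memory for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_posting (title TEXT, location TEXT, requirements TEXT)")
conn.execute(
    "INSERT INTO job_posting VALUES (:title, :location, :requirements)", record
)
rows = conn.execute("SELECT title, location FROM job_posting").fetchall()
print(rows)
```

Once the values are in columns, the postings can be joined, grouped, and queried like any other relational data.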
SANTORINI SURABANI
The first visualization is a frequency bar chart of job titles. The bar chart shows that the job title with the
highest frequency, 8, is “Data Analyst”, followed by “project manager” and “social media manager” with a
frequency of 4 each. “Business Analyst”, “Data Engineer” and “Digital Marketing” have the lowest
frequency, with a value of 1 each. In the tree map, the job postings are all located in Malaysia, where
Kuala Lumpur accounts for the highest number of job postings, with Johor Bahru following closely behind.
The pie chart shows the languages in which job reviews are written. Based on Figure 4 above, the job reviews
are mostly in English, at 83.33%. The other languages present are Korean, Malay, French and Indonesian. The
last visualization in the dashboard above is the word cloud, which represents the keywords associated with job
requirements. It can be seen that “skills”, “data” and “marketing” are the most frequent keywords among job
postings.
Part of the sentiment analysis is to find the sentiment score of each job review, where a score above 0.5 and
up to 1 is considered positive sentiment. Figure 5 above represents the job reviews with a positive sentiment
score. The highest sentiment score is 0.999854207, which is almost a perfect 1. This means that job seekers are
highly satisfied with the corresponding job posting, hence the positive review. Figure 6 below represents web
analytic insights on an online job portal.
Based on Figure 6, among the four most common web browsers, “Chrome” takes the lead with the highest count
in terms of bounce rate. The bounce rate is obtained by dividing the number of single-page sessions by the total
number of sessions on the site. As for the average time spent by job seekers on a page by location, Malaysia has
the highest average, at 30 minutes, whereas Melaka has the lowest, at 3 minutes.
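As a worked example of the bounce rate calculation, the sketch below uses the conventional definition: single-page sessions divided by total sessions, expressed as a percentage. The function name and the session counts per browser are hypothetical values chosen for illustration and are not taken from Figure 6.

```python
def bounce_rate(single_page_sessions, total_sessions):
    """Return the bounce rate as a percentage of all sessions."""
    if total_sessions == 0:
        return 0.0  # no traffic, so no bounces to report
    return 100.0 * single_page_sessions / total_sessions


# Hypothetical counts per browser: (single-page sessions, total sessions).
sessions_by_browser = {
    "Chrome": (45, 120),
    "Firefox": (10, 40),
}

rates = {
    browser: bounce_rate(single, total)
    for browser, (single, total) in sessions_by_browser.items()
}
print(rates)  # {'Chrome': 37.5, 'Firefox': 25.0}
```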
According to Figure 7 above, a geo map is used for geospatial analysis to represent the countries from which
job seekers log in to the job website. It can be seen that the job seekers are all from Asian countries. The
majority of job seekers log in to the website from Malaysia, making it the country with the highest frequency, 7,
whereas Korea and Thailand are the countries with the lowest frequency, 2.
5.2 Web Crawling
A web crawler is a script that crawls the Internet in a systematic and automatic manner. Crawlers are
programs that retrieve web pages and store them in a database. They create a copy of the retrieved web
pages, which is subsequently processed by a search engine that indexes the downloaded pages to support rapid
searches. In this paper, web crawling is used to collect the text of job seekers’ comments on particular job posts
or tweets on social media, as well as job postings according to location, title, and requirements. Figure 8 below
displays a flow chart of the web crawler process.
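The retrieve-and-store loop described above can be sketched as a small breadth-first crawler. This is an illustrative outline under stated assumptions, not the crawler used in this paper: the `fetch` callable, the `crawl` function, and the in-memory `FAKE_SITE` (with its `jobs.example` URLs) are hypothetical stand-ins, used so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collect hyperlinks and text fragments from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(seed_url, fetch, max_pages=10):
    """Breadth-first crawl from seed_url; returns {url: page text}."""
    queue = deque([seed_url])
    seen = {seed_url}
    store = {}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # fetch returns the page HTML, or None on failure
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        store[url] = " ".join(parser.text_parts)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return store


# Hypothetical in-memory "site" standing in for a real job portal.
FAKE_SITE = {
    "https://jobs.example/":
        '<a href="/job/1">Data Analyst</a> <a href="/job/2">Project Manager</a>',
    "https://jobs.example/job/1": "<h1>Data Analyst</h1><p>Kuala Lumpur</p>",
    "https://jobs.example/job/2": "<h1>Project Manager</h1><p>Johor Bahru</p>",
}

pages = crawl("https://jobs.example/", FAKE_SITE.get)
print(pages["https://jobs.example/job/1"])  # Data Analyst Kuala Lumpur
```

Passing `fetch` as a parameter keeps the crawl logic separate from the download step; a real deployment would supply an HTTP-based fetcher and should respect each site's robots.txt and rate limits.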
5.5 Data Visualisation
As the world accelerates further into the “age of Big Data”, data visualization becomes a significant tool for
making sense of the vast number of rows of data created daily, especially with job postings. For example, a word
cloud organises keywords by word frequency, then arranges them according to defined rules and visualises them with
graphic attributes such as font size and colour. Due to their readability, understandability, and simplicity, word
clouds are among the most commonly utilised techniques for determining current trends in keywords from job
descriptions.
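The frequency counting behind a word cloud can be sketched as follows. The stop-word list, the sample descriptions, and the linear font-size rule are assumptions made for this example; an actual word cloud tool may apply different scaling and layout rules.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"and", "the", "in", "of", "to", "a", "with", "for", "is"}


def keyword_frequencies(descriptions, top_n=3):
    """Count non-stop-word tokens across descriptions, most frequent first."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


def font_sizes(frequencies, min_size=12, max_size=48):
    """Scale each frequency linearly to a font size (assumed rule)."""
    top = max(count for _, count in frequencies)
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in frequencies
    }


# Hypothetical job description snippets.
descriptions = [
    "Strong data skills and marketing skills",
    "Data analysis skills with SQL",
    "Marketing data reporting",
]

freqs = keyword_frequencies(descriptions)
print(freqs)
print(font_sizes(freqs))
```

In this sample, "data" and "skills" each occur three times and "marketing" twice, so the first two would be drawn at the largest size and "marketing" somewhat smaller.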
7 Conclusion
In conclusion, by implementing text analytics, text data may be grouped to provide outcomes in the form of
word frequency distributions, pattern identification, and predictive analytics. Text analytics may create
unique value for improving decision-making and business processes, as well as for developing new business
models. As for the dashboard, visual trends help one quickly determine the ideal next step in a decision, as
the dashboard simplifies the data and makes it more shareable and accessible.
BIBLIOGRAPHY
[1]. M. Mezzanzanica and F. Mercorio, “Big Data enables Labor Market Intelligence,” Encyclopedia of Big
Data Technologies, pp. 226–236, 2019. doi:10.1007/978-3-319-77525-8_276.
[2]. B. Tucker, E. Santhanam, and E. Zaitseva, “Future directions and challenges in text analytics,” Analysing
Student Feedback in Higher Education, pp. 205–217, Dec. 2021. doi:10.4324/9781003138785-18.
[3]. T.-S. Nguyen, Z. Wu, and D. C. Ong, “Attention uncovers task-relevant semantics in emotional narrative
understanding,” Knowledge-Based Systems, vol. 226, p. 107162, Aug. 2021.
doi:10.1016/j.knosys.2021.107162.
[4]. S. Naeemi, “Social Media Actions Analytics,” Social Media Analytics in Predicting Consumer Behavior,
pp. 111–129, Mar. 2023. doi:10.1201/9781003200154-6.
[5]. A. S. Dange and Dr. M. E, “Text matching technique based intelligent web crawler in hybrid mode,” SSRN
Electronic Journal, 2022. doi:10.2139/ssrn.4053442.
[6]. B. S. McGowan, “Using text mining tools to inform search term generation: An introduction for librarians,”
portal: Libraries and the Academy, vol. 21, no. 3, pp. 603–618, 2021. doi:10.1353/pla.2021.0032.
[7]. K. Sinha, P. Sharma, H. Sharma, and K. Asawa, “Web scraping and job recommender system,” 2023
Second International Conference on Informatics (ICI), Nov. 2023. doi:10.1109/ici60088.2023.10420941.
[8]. A. Goldfarb, B. Taska, and F. Teodoridis, Could machine learning be a general purpose technology? A
comparison of emerging technologies using data from online job postings, Feb. 2022. doi:10.3386/w29767.
[9]. L. M, “Fresh graduate unemployment in Malaysia,” EduAdvisor,
https://eduadvisor.my/articles/what-didnt-know-fresh-graduate-unemployment-malaysia-infographic (accessed Mar. 5, 2024).
[10]. DOSM, Department of Statistics Malaysia, https://www.dosm.gov.my/v1/index.php (accessed Mar. 5,
2024).
[11]. H. Bhorat, “Links between education and the Labour Market: Narrowing the mismatch between demand
and supply,” Skill Formation and Globalization, pp. 145–160, Jun. 2019. doi:10.4324/9781351149006-9.
[12]. D. Deming, The growing importance of decision-making on the job, Apr. 2021. doi:10.3386/w28733.
[13]. H. Surbakti, “Pemodelan Arsitektur Enterprise pada Perguruan Tinggi Untuk Peningkatan Layanan
Pendidikan (Studi Kasus: Universitas Respati Yogyakarta),” UAJY E-Print Thesis. Jan. 2018.
https://e-journal.uajy.ac.id/13582/.