Utilizing Text Mining and Feature-Sentiment-Pairs
1 Introduction
A Massive Open Online Course (MOOC) is an online course offered by a university with the option of free or open registration. The teachers are faculty members of the university or practitioners in their fields. Classes are conducted through weekly video lectures, online assessments, and discussion forums. However, some educators argue that the quality of learning in a MOOC differs from that of a face-to-face class because it cannot replace face-to-face classroom engagement, laboratory work, fieldwork, and other aspects [1][2][3].
2 Methods
The proposed framework is illustrated in Fig. 1. The framework aims to find the differences between the machine model and the human model. There are four steps to generate FSPs automatically.
The data were collected from the Coursera website using the WebHarvy web scraper [9]. An example of a Coursera review page can be seen in Fig. 2. WebHarvy requires the URL of each course's online review page, where the original data reside. The user determines which attributes are crawled. In this research, we crawled "online review", "course's name", and "star review". Table 1 gives a sample of the crawled data. Besides these attributes, the page provides others, such as "username" and "date"; however, we excluded them since we did not categorize the online reviews based on those attributes. Fig. 3 displays the interface of the WebHarvy web scraper.
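For illustration, the sketch below performs the same kind of attribute extraction programmatically; the URL and CSS selectors are hypothetical placeholders, since WebHarvy is a visual tool and Coursera's actual markup differs.

import requests
from bs4 import BeautifulSoup

# Hypothetical review-page URL and selectors; Coursera's real markup differs.
url = "https://www.coursera.org/learn/some-course/reviews"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

course_name = soup.select_one("h1").get_text(strip=True)
rows = []
for review in soup.select(".review"):  # one block per review (hypothetical class)
    rows.append({
        "online review": review.select_one(".review-text").get_text(strip=True),
        "course's name": course_name,
        "star review": len(review.select(".star-filled")),  # count filled stars
    })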
[Fig. 1. The proposed framework. Data collection: Coursera website, online reviews, database. Data pre-processing: tokenize, stemming, transform cases. Machine Model sentiment analysis: tag word type, generate FSP, count frequency. Sentiment classification: sentiment prediction, feature sentiment.]
In total, 9677 online reviews were crawled. To balance the data based on review sentiment, they were divided into two types, namely negative and positive sentiment. The sentiment was used as the label of each review. Each review was labeled manually based on its star rating: a 1-2-star review was categorized as negative sentiment, while a 4-5-star review was categorized as positive sentiment. The categorization was based on [10]. Table 2 shows the total number of data.
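A minimal sketch of this labeling rule is shown below; how 3-star reviews are handled is not stated in the text, so this sketch simply leaves them unlabeled.

def star_to_label(stars):
    # 1-2 stars -> negative, 4-5 stars -> positive, following [10]
    if stars in (1, 2):
        return "negative"
    if stars in (4, 5):
        return "positive"
    return None  # 3-star reviews are left unlabeled in this sketch

reviews = [("Great course!", 5), ("Too basic and boring.", 1), ("It was okay.", 3)]
labeled = [(text, star_to_label(stars)) for text, stars in reviews
           if star_to_label(stars) is not None]
print(labeled)  # [('Great course!', 'positive'), ('Too basic and boring.', 'negative')]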
The optimal hyperplane is obtained by maximizing the distance between the closest points of each class [18]. The hyperplane can be represented algebraically using Eq. 1 [17].
f(x) = w^T x + b    (1)
Eq. 2 describes the training hyperplane, where x is the point closest to the hyperplane. The points closest to the hyperplane are referred to as support vectors. The distance between a point x_i and the hyperplane (r_i) is calculated using Eq. 3. The formula of the margin (M) is shown in Eq. 4.
|w^T x + b| = 1    (2)

r_i = y_i (w^T x_i + b) / ||w||    (3)

M = 2 / ||w||    (4)
It should be noted that SVM requires more training data for the accuracy to be higher [19]. SVM performs better than other classification algorithms when multiple classes in a large dataset are involved. The pseudo-code of SVM is shown in Table 4.
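As a complement to the pseudo-code in Table 4, the sketch below fits a linear SVM with scikit-learn (an assumption of this sketch, not the paper's stated tooling) and recovers the margin of Eq. 4 from the learned weight vector; the toy data are illustrative only.

import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (illustrative only)
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # a large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]                    # weight vector w in f(x) = w^T x + b (Eq. 1)
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)    # Eq. 4: M = 2 / ||w||
print(w, b, margin)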
In general, text classification using SVM proceeds in the order of term weighting, training, and testing. Term weighting uses the number of occurrences of terms in the documents and is often called feature extraction. Feature extraction is used to transform a text document from any format into a list of features/terms that can be easily processed by text classification techniques. It is one of the significant pre-processing techniques in text classification and computes the value of the features/terms in the documents. There are several methods of term weighting/feature extraction, such as Term Frequency (TF), Inverse Document Frequency (IDF), and the combination of TF and IDF called TF.IDF. Term Frequency (TF) is the frequency of occurrence of a term (t) in a document (d_i), as shown in Table 5. For example, the term "Increase" appears in four documents, and each of those documents has a different TF. Meanwhile, Document Frequency (DF) is the number of documents in which a term (t) appears. From Table 5, DF was calculated as illustrated in Table 6. After the DF value is obtained, the IDF value is calculated using Eq. 5. The TF.IDF value is obtained by multiplying TF and IDF, as seen in Eq. 6. The results of IDF and TF.IDF are shown in Table 6. The result of weighting is the formation of feature vectors reflecting the presence of each word in the document.
IDF = 1 / DF    (5)

TF.IDF = TF × IDF    (6)
Table 5. TF Weighting
Term (t) Document 1 (d1) Document 2 (d2) Document 3 (d3) Document 4 (d4) Document 5 (d5)
Increase 0 1 2 5 4
Poor 1 4 0 0 3
Short 2 2 0 3 1
Course 0 0 1 2 0
Teacher 5 0 2 6 0
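The sketch below reproduces this computation from the Table 5 counts, using the paper's definition IDF = 1/DF (Eq. 5); note that the more common convention IDF = log(N/DF) is not used here.

# Term -> TF per document d1..d5 (the counts of Table 5)
tf = {
    "Increase": [0, 1, 2, 5, 4],
    "Poor":     [1, 4, 0, 0, 3],
    "Short":    [2, 2, 0, 3, 1],
    "Course":   [0, 0, 1, 2, 0],
    "Teacher":  [5, 0, 2, 6, 0],
}

for term, counts in tf.items():
    df = sum(1 for c in counts if c > 0)   # DF: documents containing the term
    idf = 1.0 / df                         # Eq. 5 (paper's definition)
    tfidf = [c * idf for c in counts]      # Eq. 6: TF.IDF = TF x IDF
    print(term, df, round(idf, 2), tfidf)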
Furthermore, a feature vector space was formed by counting the number of distinct terms present in the entire document collection. If there are 5 different terms in the entire collection, as shown in Table 5, then the feature vector has 5 dimensions, where each component of the vector represents one term. Vector dimensions can be reduced by deleting insignificant words. After the vector space was formed, the vectors representing each document were built using one of the weighting methods; in this study, TF weighting was chosen. If the first document contains the term "Poor" once, "Short" twice, and "Teacher" five times, then the feature vector for the first document is x2 = 1, x3 = 2, and x5 = 5. After the feature vectors were formed, complete with their respective labels, they were ready to be fed into the SVM as training data [20]. The output of the SVM training process is the best hyperplane to be used as a classifier; the formula of the hyperplane can be seen in Eq. 1. At testing time, the feature vectors used as testing data are entered into the SVM without their labels. The output of the test is a class label. The class labels in this study determine whether the sentiment of a review is positive or negative. To check the accuracy of the classifier, the output label is compared with the original label, as described in sub-section 4 on the evaluation of classification performance.
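The end-to-end procedure of this paragraph (TF feature vectors, SVM training, unlabeled testing) is sketched below; scikit-learn and the two toy reviews are assumptions of the sketch, not the paper's exact setup.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Illustrative training reviews with their sentiment labels
train_reviews = ["great course and good teacher", "poor material, short and boring"]
train_labels = ["positive", "negative"]

vectorizer = CountVectorizer()                     # TF weighting: raw term counts
X_train = vectorizer.fit_transform(train_reviews)  # feature vectors
clf = LinearSVC().fit(X_train, train_labels)       # training yields the hyperplane

X_test = vectorizer.transform(["good and interesting course"])  # no label attached
print(clf.predict(X_test))                         # predicted sentiment label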
Accuracy = (TP + TN) / (TP + FP + FN + TN)    (7)

Precision = TP / (TP + FP)    (8)

Recall = TP / (TP + FN)    (9)

F1 Score = 2 × (Recall × Precision) / (Recall + Precision)    (10)
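Eqs. 7-10 translate directly into code; the confusion-matrix counts below are illustrative, not the paper's results.

def evaluate(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)            # Eq. 7
    precision = tp / (tp + fp)                            # Eq. 8
    recall = tp / (tp + fn)                               # Eq. 9
    f1 = 2 * (recall * precision) / (recall + precision)  # Eq. 10
    return accuracy, precision, recall, f1

print(evaluate(tp=900, fp=80, fn=100, tn=856))  # illustrative counts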
In this research, we asked annotators for assistance in modeling the sentiment analysis. The annotators were involved to label the review data and to determine the FSPs. The results of the annotators' analysis were used as a control variable.
Interpret data: The annotators observed the sentiment categorization independently of the star-review generalization of the crawled review data. In this phase, the annotators checked whether the sentiment was in accordance with the review data.
Infer patterns from data: Unlike machine models, annotators could recognize synonyms and sarcasm. For example, given the two reviews "great professors" and "good teacher", the annotators knew that both reviews were equivalent: "professor" refers to the "teacher" who taught the course. The annotators formulated the FSPs more easily than the machine model.
Measure the frequency of feature-sentiment-pairs: The frequency of every FSP's appearance in the reviews was counted; the easiest way to find out the significance of an FSP is to count its appearances across the reviews. The output of this annotators' model was the set of FSPs with the highest frequencies.
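This frequency count is a plain tally over (feature, sentiment) pairs, as in the sketch below; the pairs themselves are illustrative only.

from collections import Counter

# Illustrative (feature, sentiment word) pairs extracted from reviews
fsps = [("course", "good"), ("course", "good"), ("teacher", "great"),
        ("material", "poor"), ("course", "boring"), ("course", "good")]

freq = Counter(fsps)
for pair, count in freq.most_common():  # highest-frequency FSPs first
    print(pair, count)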
3 Results
The model was trained using the SVM algorithm. The review data were divided into two groups: training data and testing data. For evaluation purposes, accuracy, recall, and precision were calculated. The results are shown in Table 8, where Ratio means the ratio of training data to testing data. In the first experiment, training and testing data had the same ratio, each 50% of the total data. In the second experiment, the training data were 80% and the testing data only 20%. The evaluation results differed, and the best result was obtained with the 80:20 ratio. With 9677 reviews in total, the 80:20 ratio yields 7741 training reviews and 1936 testing reviews.
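A minimal sketch of this 80:20 split, assuming scikit-learn and dummy data standing in for the 9677 labeled reviews:

from sklearn.model_selection import train_test_split

# Dummy stand-ins for the 9677 review texts and their labels
reviews = [f"review text {i}" for i in range(9677)]
labels = ["positive" if i % 2 == 0 else "negative" for i in range(9677)]

X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.20, stratify=labels, random_state=42)
print(len(X_train), len(X_test))  # 7741 and 1936, matching the 80:20 ratio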
The result of the machine model is illustrated in Fig. 6 and Fig. 7. The y-axis of both charts shows the frequency of negative and positive FSPs, and the x-axis represents the FSPs. The most frequent negative FSPs in Fig. 6 indicate that the course was basic, poor, difficult, and bored. Fig. 7 shows that the most frequent positive FSPs are great, good, understand, and interesting.
As displayed in Fig. 6, the users of the course thought that the course was more basic than they expected. They also thought that the course was poor and boring, and they faced difficulties while following the course. The positive features can be seen in Fig. 7. The users felt that the course was great and good; since both words are synonyms, it is implied that the course was good enough. They found the course understandable and interesting to follow. The positive and negative features are contrary to each other, so the results can be compared numerically. For example, the word "good" is the antonym of "poor", but the frequency of occurrence of "good" is higher than that of "poor". The same applies to the pairs "understand"-"difficult" and "interesting"-"bored".
The result of the human model is illustrated in Fig. 8 and Fig. 9. The y-axis of both charts shows the frequency of negative and positive FSPs, and the x-axis represents the FSPs. The most frequent negative FSPs in Fig. 8 indicate that the course only offered a few materials and that they were difficult to comprehend. Fig. 9 shows that the most frequent positive FSPs indicate the course was good and interesting and the introduction was great.
As displayed in Fig. 8, the users thought that only a few materials were provided in the course, which did not meet their expectations. They also thought that the course was bad and its material was difficult. The positive features can be seen in Fig. 9. The users felt that the course was good and interesting, and they found that the introduction of each course was great. The positive and negative features are contrary to each other, so the results can be compared numerically. For example, the word "good" is the antonym of "bad", but the frequency of occurrence of "good" is higher than that of "bad".
4 Discussion
The Machine Model and Human Model were compared to measure the success of the framework. The Human Model was also referred to as the ground truth because its results came from expert analysis. The success of the framework is indicated by the similarity between the Human Model and the Machine Model. The positive features produced by the Machine Model were "course-good", "course-interesting", "course-easy", "course-understand", "course-recommended", and "material-good".
Those positive features were also the most frequently occurring FSPs in the Human Model. Even though the count of each positive feature, such as "course-recommended", in the Machine Model does not exactly match its count in the Human Model, the frequent FSPs in the two models are mostly the same. As illustrated in Fig. 7 and Fig. 9, most of the positive FSPs in the Machine Model appear in the Human Model. Similarly, for the negative features, the most frequently occurring FSPs produced by the Machine Model are nearly the same as those produced by the Human Model. Further study needs to be carried out to improve the accuracy of the FSP counts in both the Machine Model and the Human Model.
The agreement of the FSPs from the Machine Model with those of the Human Model shows that the framework managed to transform unstructured data into meaningful information. Since the data were used as the main source for decision making in this framework, the approach used is data-driven design: decisions regarding the development of content-related design and system design are made fully based on the data collected from the MOOC, specifically data about how MOOC users interact with the system, as seen in the MOOC discussion forum.
There are similar studies that used the same kind of data from MOOC discussion forums. For example, Wise et al. in 2016 developed a linguistic model to categorize and identify whether posts in a MOOC discussion forum are substantially related to the course content by searching for predefined keywords [22]. It helps the
instructors identify content-related questions from the learners so that they can improve their course material. However, the predefined keywords they proposed were only for a certain course; thus, if the course changed, the keywords had to be changed accordingly. Brinton et al used large-scale statistical analysis of discussion forums in Coursera [23]. They investigated user behaviour on the discussion forum and looked for the most course-relevant discussions, ranking the discussion topics by their relevance to the course using a unified generative model. Gamage et al [24] also studied user behaviour, as done by [23]. The difference was the method used: [24] applied an ethnographic method based on deep interviews with two groups of participants, one that had never used a MOOC and one that had. From the deep interviews, the researchers could derive recommendations for MOOC design. Agrawal et al made the Stanford MOOCPost Dataset, which was tagged manually along six dimensions, namely confusion, question, answer, opinion, sentiment, and urgency, to address confusion in the MOOC discussion forum; they also recommended instructional video clips to the users [25]. They classified the confusion by using "tracking log data" from logs of the learners' actions [25].
The dataset in this study was limited to under 10000 online reviews, yet the Machine Model was able to produce information to support the decision-making process in MOOCs. The data were divided into equal numbers of positive and negative labels, as shown in Table 2; the number of data for each label should be equal to avoid bias towards the dominant label. The accuracy, precision, recall, and F1 score were all above 80%, which is high enough given the relatively moderate size of the dataset. Nevertheless, adding more data in further studies is necessary to obtain higher accuracy, precision, recall, and F1 score. The framework transforms qualitative online review data into quantitative data. Using this framework, the designer does not need to perform a manual analysis to identify user preferences for each feature in a MOOC; all decisions are generated automatically from the online review data given as input.
The quantitative result helps the designer find the words that often appear in the review data and their frequencies. It suggests to the designer new features to add or existing features to improve in the MOOC, especially content-related features. Quantitative results are easier to validate and analyse, and they also work well on large-scale data. However, there are several weaknesses in using quantitative results. Unlike qualitative data, the quantitative results carry less specific information because the sentences from the online review data are split into individual words. Since the sentences are split, the results never reveal the causation of why a user gave a positive or negative review about a particular feature. FSPs only pair the adjectives and nouns contained in a sentence; they lack detail and likewise do not reveal causation. The designer used FSPs to identify popular comments about content-related features in the data. Since an FSP consists of an adjective and a noun, the designer was able to conclude which features received a positive response and vice versa.
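A naive sketch of this adjective-noun pairing is shown below, assuming NLTK's part-of-speech tagger; it pairs every noun with every adjective in a sentence, as an illustration rather than the paper's exact FSP-generation procedure.

import nltk

# Download tokenizer and tagger models on first use (names may vary by NLTK version)
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def extract_fsps(sentence):
    tokens = nltk.word_tokenize(sentence.lower())
    tagged = nltk.pos_tag(tokens)                 # tag word type
    nouns = [w for w, t in tagged if t.startswith("NN")]
    adjs = [w for w, t in tagged if t.startswith("JJ")]
    # Naive pairing: every noun with every adjective in the sentence
    return [(n, a) for n in nouns for a in adjs]

print(extract_fsps("The course was interesting but the material was poor."))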
There were differences between the results, as shown in Figs. 6-9. Comparing the results of the Machine Model and the Human Model, the Human Model was always more accurate than the Machine Model in grouping the FSPs. Some of the causes of computational processing errors include:
5 Conclusion
Further study is needed to improve the accuracy of the results of the Machine Model by adding more data and reducing computational processing errors.
6 References
[1] Cooper, S., Sahami, M. (2013). Reflections on Stanford’s MOOCs. Communications of the
ACM, 56(2), pp. 28-30. https://doi.org/10.1145/2408776.2408787
[2] Harder, B. (2013). Are MOOCs the future of medical education? British Medical Journal,
346, pp. 1-3.
[3] Martin, F. G. (2012). Will massive open online courses change how we teach? Communi-
cations of the ACM, 55(8), pp. 26-28. https://doi.org/10.1145/2240236.2240246
[4] Anand, S.S. and Büchner, A.G. (1998). Decision support using data mining. Financial Times Management.
[5] Bertoni A. (2018). Role and Challenges of Data-Driven Design in the Product Innovation
Process. In: Proceedings of 16th IFAC Symposium on Information Control Problems in
Manufacturing, Italy, 2018. Bergamo: Elsevier, pp. 1107-1112. https://doi.org/10.1016/j.ifacol.2018.08.455
[6] Miner, L., Bolding, P., Hill, T., Nisbet, R., Walton, N. and Miner, G. (2014). Practical Pre-
dictive Analytics and Decisioning Systems for Medicine. 1st ed. United States: Academic
Press. https://doi.org/10.1016/b978-0-12-411643-6.00044-2
[7] Dina, N.Z. (2020). Tourist sentiment analysis on TripAdvisor using text mining: a case
study using hotels in Ubud, Bali. African Journal of Hospitality Tourism and Leisure, 9(2),
pp. 1-10.
[8] Mallik, R., Sahoo, A. K. (2018). A novel approach to spam filtering using semantic based
naive bayesian classifier in text analytics. In: Proceedings of IEMIS, India, 2018. Singa-
pore: Springer, pp. 301-309. https://doi.org/10.1007/978-981-13-1498-8_27
[9] WebHarvy: Intuitive Powerful Visual Web Scraper, https://www.webharvy.com/
[10] Decker, R. and Trusov, M. (2010). Estimating Aggregate Consumer Preferences from
Online Product Reviews. International Journal of Research in Marketing, 27(4), pp. 293-
307. https://doi.org/10.1016/j.ijresmar.2010.09.001
[11] Dina, N.Z. and Juniarta, N. (2020). Aspect based Sentiment Analysis of Employee’s Re-
view Experience. Journal of Information Systems Engineering and Business Intelligence,
6(1), pp. 79-88. https://doi.org/10.20473/jisebi.6.1.79-88
[12] Vyas, V. and Uma, V. (2018). An Extensive study of Sentiment Analysis tools and Binary
Classification of tweets using Rapid Miner. Procedia Computer Science, 125, pp. 329-335.
https://doi.org/10.1016/j.procs.2017.12.044
[13] Toutanova, K., Klein, D., Manning, C. and Singer, Y. (2003). Feature-Rich Part-of-Speech
Tagging with a Cyclic Dependency Network. In: Proceedings of HLT-NAACL, Canada,
2003. Edmonton: ACM, pp. 252-259. https://doi.org/10.3115/1073445.1073478
[14] Ireland, R. and Liu, A. (2018). Application of data analytics for product design: Sentiment
analysis of online product reviews. CIRP Journal of Manufacturing Science and Technolo-
gy, 23, pp. 128-144. https://doi.org/10.1016/j.cirpj.2018.06.003
[15] Verma, T., Renu, R. and Gaur, D. (2014). Tokenization and Filtering Process in RapidMiner. International Journal of Applied Information Systems, 7, pp. 16-18. https://doi.org/10.5120/ijais14-451139
[16] Jin, J., Ji, P., Liu, Y. (2016). Identifying comparative customer requirements from product
online reviews for competitor analysis. Engineering Applications of Artificial Intelligence,
49, pp. 61-73. https://doi.org/10.1016/j.engappai.2015.12.005
[17] Ray, S. Understanding Support Vector Machine Algorithm from Examples, 6.10.2015 [Online]. Available: https://www.analyticsvidhya.com/blog/2015/10/understaing-support-vector-machine-example-code/ [Accessed 6 July 2020].
[18] Manning, C.D., Raghavan, P., Schütze, H. (2008). Support Vector Machines: The Linearly Separable Case. Cambridge University Press.
[19] Adomavicius, G., Tuzhilin, A. (2005). Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), pp. 734-749. https://doi.org/10.1109/tkde.2005.99
[20] Purnamawan, I. (2015). Support Vector Machine pada Information Retrieval. Jurnal Pen-
didikan Teknologi dan Kejuruan, 12(2), pp. 173-180. https://doi.org/10.23887/jptk.v12i2.6481
[21] Tharwat, A. (2018). Classification assessment methods. Applied Computing and Informat-
ics. (Article in Press)
[22] Wise, A. F., Cui, Y. and Vytasek, J. M. (2016). Bringing order to chaos in MOOC
discussion forums with content-related thread identification. In: Proceedings of the Sixth
International Conference on Learning Analytics & Knowledge, United Kingdom, 2016.
Edinburgh: ACM, pp. 188–197. https://doi.org/10.1145/2883851.2883916
[23] Brinton, C. G., Chiang, M., Jain, S., Lam, H., Liu, Z. and Wong, F. M. F. (2014). Learning about Social Learning in MOOCs: From Statistical Analysis to Generative Model. IEEE Transactions on Learning Technologies, 7(4), pp. 346-359. https://doi.org/10.1109/tlt.2014.2337900
[24] Gamage, D., Perera, I. and Fernando, S. (2020). Exploring MOOC User Behaviors Beyond Platforms. International Journal of Emerging Technologies in Learning, 15(8), pp. 161-179. https://doi.org/10.3991/ijet.v15i08.12493
[25] Agrawal, A., Venkatraman, J., Leonard, S. and Paepcke, A. (2015). YouEDU: addressing confusion in MOOC discussion forums by recommending instructional video clips. In: Proceedings of the 8th International Conference on Educational Data Mining, Spain, 2015. New York: ACM, pp. 297-304.
[26] Moral, C., De Antonio, A., Imbert, R. and Ramírez, J. (2014). A survey of stemming algo-
rithms in information retrieval. Information Research, 19(1), pp. 605.
[27] Jivani, A. G. (2011). A comparative study of stemming algorithms. International Journal of
Computer Technology and Applications, 2(6), pp. 1930-1938.
[28] Hu, M. and Liu, B. (2004). Mining opinion features in customer reviews. American Association for Artificial Intelligence, 4(4), pp. 755-760.
7 Authors
Article submitted 2020-07-17. Resubmitted 2020-09-05. Final acceptance 2020-09-07. Final version published as submitted by the authors.