Understanding Knowledge Areas in Curriculum through Text Mining from Course Materials

Kornraphop Kawintiranon, Peerapon Vateekul, Atiwong Suchato, Proadpran Punyabukkana


Computer Engineering Department, Faculty of Engineering
Chulalongkorn University
Bangkok, Thailand
[email protected], (peerapon.v, atiwong.s, proadpran.p)@chula.ac.th

2016 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), 7-9 December 2016, Dusit Thani Bangkok Hotel, Bangkok, Thailand
978-1-5090-5598-2/16/$31.00 ©2016 IEEE

Abstract—Curriculum analysis is attracting widespread interest in the educational field. There are two main approaches: (i) human-based and (ii) text-based assessments. Although evaluation by teachers and learners is widely used, it is inconvenient and time-consuming, and the results depend on individual attitudes. The text-based approach aims to evaluate the course syllabus directly; however, a syllabus contains only a course description, so it cannot fully express the actual course contents. In this paper, we present an automatic text-based curriculum analysis that directly assesses the entire course materials. Our approach employs a well-known text-mining technique that extracts keywords using TF-IDF. The analysis matches keywords from the course materials against keywords from online documents, which is similar to how a domain expert works. Moreover, a new measurement is proposed to quantify the association between course materials and online documents using the number of matching keywords. The experiment was conducted on materials of three subjects collected from five top universities, mapped to the latest Computer Engineering Curricular Guideline (CE2016). The results illustrate significant relations among courses from different universities and CE2016. To analyze the courses further, each of them is visualized using radar charts.

Keywords—curriculum analysis; curriculum evaluation; course content analysis; keyword extraction; TF-IDF

I. INTRODUCTION
A major current focus in the development of engineering teaching and learning is curriculum analysis [1, 2, 3]. To ensure that a developed curriculum comprises suitable contents, many studies have proposed curriculum evaluation methods [4, 5]. Their contributions focused entirely on assessments by teachers and learners: they collected opinions from instructors and students and analyzed these opinions to evaluate curriculum quality. While some works concentrated on human-based assessment, a number of studies analyzed curricula through course documents [1, 6, 7]. Some of these studies analyzed curricula through course syllabi, called "course syllabus-based analysis" [1, 7, 8]; they mapped their curriculum topics to related topics in a standard curriculum in order to express the connections between their curricula and the standard.

The well-known educational institution named "ACM Education Board and the IEEE Computer Society's Education", which has been working to establish curricular guidelines for over 40 years, released Computer Engineering Curricula CE2016 in October 2015 [9] as a guideline for undergraduate degree programs in computer engineering. The guideline defines 13 Knowledge Areas (KAs) as related areas of computer engineering, as shown in Table I. Moreover, each KA is divided into sub-topics termed Knowledge Units (KUs); the number of KUs in each KA varies with the extent of the KA. A number of researchers have examined the guideline document for various objectives. Sekiya, et al., tried to map linkages between two different guidelines and also between a guideline and course syllabi [1], whereas Marshall quantified differences of structure among the guideline series [6].

TABLE I. KNOWLEDGE AREAS OF COMPUTER ENGINEERING IN CE2016

ID  Abbreviation  Knowledge Area
01  CAE           Circuits and Electronics
02  CAL           Computing Algorithms
03  CAO           Computer Architecture and Organization
04  DIG           Digital Design
05  ESY           Embedded Systems
06  NWK           Computer Networks
07  PFP           Professional Practice
08  SEC           Information Security
09  SET           Strategies for Emerging Technologies
10  SGP           Signal Processing
11  SPE           Systems and Project Engineering
12  SRM           Systems Resource Management
13  SWD           Software Design

However, conducting a human-based assessment to analyze the quality of a curriculum is usually inconvenient and time-consuming. Furthermore, human-based results could be biased toward the personal perspectives of individual assessors. Likewise, most of the literature on analyzing curricula through course syllabi did not sufficiently take into account the raw course materials, which express the actual course contents. Although a syllabus states the topics of a course, the contents taught in the class
might be slightly different. Some topics defined in a course syllabus might be neglected for several reasons, such as improper subject scope and unsuitable time duration. Therefore, course materials can express the precise topics of knowledge that are actually covered in the course.

In this paper, we present a novel curriculum analysis that examines course materials directly, in a human-like manner, and compares them to the Body of Knowledge (BOK), including the KAs and KUs, defined in CE2016. One way to become an expert in a science is to read a large number of documents related to that science. Therefore, we extracted the KUs and KAs from CE2016 and used them as queries to search the Internet using search engine APIs. As a result, we obtained abundant information concerning these topics. Afterwards, keywords were extracted from both the retrieved documents and the course materials using a widely used text mining technique known as Term Frequency–Inverse Document Frequency (TF-IDF) [10]. Finally, keywords from the different sources were matched; the matching results represent the association between course materials and CE2016 in terms of our evaluation, which is described in Section V.

The rest of this paper is organized as follows: Section II covers related works on curriculum analysis and Section III describes our data collection. Section IV presents our proposed method for analyzing curricula through text mining from course materials. Section V describes our experiments, and Section VI concludes this work.

II. RELATED WORKS
In recent decades, the "ACM Education Board and the IEEE Computer Society's Education" institution has continuously worked on computer science-related curricular guidelines. The latest computer engineering curricular guideline it released is called "CE2016". Before this guideline was established, the institution had published several versions of computer science-related curricular guidelines, and many studies have tried to analyze this series of guidelines. For example, Marshall quantified differences of structure among the computing science curricular guideline series (CS series) [6]. Graph visualization was used in the study to illustrate connections among KAs. The paper represented KAs as a network graph and showed KUs and topics as edges of the graph. It identified corresponding topics among the computing curricular guideline series to compare their discrepancies. In contrast, our study aims to quantify matching topics between a guideline and actual courses in order to indicate their association; this association could express the quality of actual courses compared to the guideline CE2016.

Moreover, Sekiya, et al., proposed a mapping analysis method to investigate the changes between CS2008 and CS2013 using supervised Latent Dirichlet Allocation (sLDA) and Isomap [1]. Their method improved on the method that Marshall used. They showed discrepancies among topics extracted from CS2008 and CS2013 by network graph visualization. Furthermore, they also examined the changes using actual courses from three universities. Nonetheless, they used only course syllabi to extract topics and mapped them into a network graph; they compared this graph to the graph created from the topics of CS2013.

Considering the issue of using only course syllabi, Yiling, et al., attempted to use additional external text to find the correlation between courses and the knowledge structure of CS2013 [5]. Their motivation was that further related information obtained from the Internet could improve the accuracy of their system. They therefore searched the Internet using KAs and KUs as queries. A search engine API named Google Custom Search API was used to retrieve documents. Afterwards, keywords were extracted from the retrieved documents; these keywords were used in conjunction with KAs and KUs for mapping analysis. However, similarly to Sekiya, et al., they considered only keywords from syllabi and did not take raw course materials into account. Our curriculum analysis method uses actual course materials instead of only syllabi. We extracted keywords from the materials and from the topics of CE2016 and matched these keywords. Afterwards, the association scores were computed using our association score computation and plotted in radar charts. The charts illustrate the characteristics of each course. For instance, if the chart shapes of courses from two different universities are quite similar, it could be assumed that the course styles of these universities are also quite similar because their course materials focus on the same KUs or KAs. Moreover, radar chart visualization can show the aims of a course: if the score of one topic in the chart is extremely high whereas the others are very low, the topic represented by the longest spoke of the chart is the main focus of the course.

A number of studies address the text mining technique for eliciting keywords from text documents, widely known as keyword extraction [10, 11, 12]. One of the most efficient techniques is Term Frequency–Inverse Document Frequency (TF-IDF); it uses the frequency of a word's appearance in both the target material and the corpus to quantify the significance of the word. A word with a high significance score is selected as a keyword. This technique requires an appropriate corpus as a reference word database in order to identify notable words. In our experiment, several corpora were used for keyword extraction, including Wikipedia, DBpedia and Linked Open Data.

III. OUR DATA COLLECTION
Our data were collected from course materials of five top institutions: Chulalongkorn University (CU) and four other universities ranked by QS World University Rankings in Computer Science & Information Systems [13], namely Massachusetts Institute of Technology (MIT), Stanford University, Carnegie Mellon University (CMU), and University of Cambridge. Although CU is not among the top 100 universities ranked by QS, it is one of the top universities in Thailand. Moreover, we could fully access course materials from the university and also gather information from CU students in order to conduct the experimental evaluation, which is described in Section V.B.

In addition, we focused on three computer engineering-related subjects. We chose three subjects whose course materials are openly provided by the selected universities.
These subjects comprise computer networks (CN), computer system architecture (CA), and operating systems (OS).

We gathered the most up-to-date material sets that the institutions publicly provide on their websites and on MIT OpenCourseWare (OCW). Table II shows the academic year of each material set provided by the institutions. Only one of the 15 material sets is outdated (before 2010): the material set of the CA subject gathered from Stanford University, which was used in academic year 2001. The other sets are contemporary. In the OS subject in particular, all institutions provided updated course materials whose academic years range from 2014 to 2016. Therefore, our data are close to each other in terms of recency.

TABLE II. THE ACADEMIC YEARS OF MATERIAL SETS PROVIDED BY FIVE INSTITUTIONS

Institution  CN    CA    OS
Cambridge    2014  2011  2015
CMU          2013  2015  2016
CU           2014  2014  2014
MIT          2015  2015  2015
Stanford     2011  2001  2015

Computer Networks (CN), Computer System Architecture (CA), Operating System (OS)

In our collected data, the number of course materials provided by each university is imbalanced. In the CN subject, there are 7, 38, 5, 21 and 17 documents provided by University of Cambridge, CMU, CU, MIT and Stanford University, respectively. Similarly, there are 1, 37, 13, 25 and 12 documents for the CA subject and 12, 47, 11, 16 and 16 documents for the OS subject. Although University of Cambridge provided only one document for the CA subject, it contains adequate information to be analyzed because its total number of pages is much higher than that of the others. However, we disregarded the quantity of these materials because we were concerned only with the characteristic styles. Analysis of the quantity of materials is outside our research scope.

IV. OUR PROPOSED ANALYSIS METHOD
In this study, we propose a novel analysis method for understanding Knowledge Areas (KAs) in curricula based on course materials. Text mining, a well-known text analysis technique, was used in the experiment to examine the KAs that represent course style. Our procedure imitates human behavior. To become an expert in any subject, people have to read numerous related documents or books, and they must understand and retain the knowledge in these documents. In this study, however, we aim to analyze and understand courses using only text documents. Viewed holistically, we treat one side as a knowledge database and the other side as knowledge chunks from course materials. The two sides are then compared to each other by keyword matching. As a result, the matching keywords demonstrate the association between the knowledge database and the knowledge from course materials.

For the knowledge database, the standard knowledge guideline in CE2016 was selected. The Computer Engineering Curricula released by the ACM Education Board and the IEEE Computer Society's Education, which comprise specific KAs, have been frequently used for investigating curriculum style [1]; the newest curricular guideline is CE2016. We extracted the KUs of each KA and analyzed the material style using these KUs.

The course materials were collected from five top institutions ranked by QS World University Rankings; they correspond to three computer engineering-related subjects, as described in Section III. These materials were used in our preliminary experiment to investigate the quantitative association between the intended course style stated in the guideline document CE2016 and the actual course style represented by the materials. Our proposed approach has five main steps: (A) Data Preprocessing, (B) Keyword Extraction, (C) Dict-Matrix Construction, (D) Comparison-Matrix Construction and (E) Association Score Computation. These steps are illustrated in Fig. 1. Steps (A) and (B) are used twice because preprocessing and keyword extraction have to be applied to both the curriculum guideline and the course materials. The final outputs of this method are the association scores that represent the quantitative association mentioned earlier.

Fig. 1. A process diagram of our proposed method

A. Data Preprocessing
Data preprocessing is one of the most important steps in data mining [14]. To obtain the most accurate result, careful and intensive data preprocessing was required. Our materials were preprocessed using a number of text mining preprocessing techniques: HTML-tag removal, non-character removal, lowercase conversion, and stop word removal using NLTK [15], in that order. Because some materials were gathered from websites, we needed to eliminate HTML tags from the course materials, since they are unrelated to the contents. Some symbols from the Internet, however, were not removed by the previous step. Therefore, all non-characters were discarded. Afterwards, we converted all characters to lowercase; stop words were also removed in order to simplify the data before applying the text mining technique.
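The four preprocessing steps of Section IV.A can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `preprocess`, the regular expressions and the tiny `STOP_WORDS` set are assumptions made for illustration (the paper uses the NLTK stop-word list), and only the standard library is used so that the example stays self-contained.

```python
import html
import re

# Hypothetical, deliberately tiny stop-word list (assumption);
# the paper relies on NLTK's stop-word list instead.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

TAG_RE = re.compile(r"<[^>]+>")             # matches any HTML tag
NON_LETTER_RE = re.compile(r"[^A-Za-z\s]")  # anything that is not a letter or whitespace


def preprocess(raw_text: str) -> list[str]:
    """Apply the four preprocessing steps in order and return the tokens."""
    text = TAG_RE.sub(" ", raw_text)     # 1. HTML-tag removal
    text = html.unescape(text)           # decode entities such as &amp; before step 2
    text = NON_LETTER_RE.sub(" ", text)  # 2. non-character removal
    text = text.lower()                  # 3. lowercase conversion
    # 4. stop word removal
    return [tok for tok in text.split() if tok not in STOP_WORDS]


if __name__ == "__main__":
    sample = "<p>The TCP/IP stack &amp; routing in the Internet</p>"
    print(preprocess(sample))
```

Replacing non-letters with a space rather than deleting them is a design choice of this sketch: it keeps "TCP/IP" as two tokens instead of fusing them into one.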
B. Keyword Extraction
Many studies use keyword extraction to elicit essential keywords from their documents. One of the best-known techniques, TF-IDF [10], was used in this study. The equation of simple TF-IDF is shown in (1). Consider a term $t$ and a document $d \in D$, where $t$ appears in $n$ of the $N$ documents in $D$. Commonly used TF and IDF functions are shown in (2) and (3), respectively. After preprocessing, the data were ready for keyword extraction. TF-IDF requires an appropriate corpus in order to extract legitimate keywords from the target documents. Therefore, a factor that strongly influences the accuracy of the result is the choice of a suitable corpus for keyword extraction. To address this issue, we used the Aylien API, which performs keyword extraction well. A number of renowned corpora were selected as reference corpora, including Wikipedia, DBpedia and Linked Open Data; this minimizes errors from keyword extraction and makes irrelevant keywords negligible.

$$\mathrm{TFIDF}(t, d, n, N) = \mathrm{TF}(t, d) \times \mathrm{IDF}(n, N) \tag{1}$$

$$\mathrm{TF}(t, d) = \sum_{word \in d} \begin{cases} 1, & \text{if } word = t \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

$$\mathrm{IDF}(n, N) = \log\left(\frac{N - n}{n}\right) \tag{3}$$

C. Dict-Matrix Construction
To determine the association between the course style defined in CE2016 and the actual course style represented in course materials, we constructed the Dict-Matrix, a set of keywords and their weights, in order to compare them to the keywords extracted from course materials. To construct the Dict-Matrix, we took the KUs of each KA in CE2016 and used them as queries to search the Internet for external documents. Furthermore, an additional word was appended as a suffix to each query, because webpages found without any suffix were often unreasonable and unrelated to computer engineering. Wikipedia API [16] and Google Custom Search API [17] were our search engine APIs for this task. We collected only the top four webpages found by the Wiki API and the top five webpages found by the Google API. We relied on Google Custom Search API more than on the other because of its high accuracy; most of the time, the Google API gave more reasonable results. Of the four webpages found by the Wiki API, two were searched using the original queries without the additional word, and the other two were searched using the additional word. Moreover, within each pair, the two webpages were searched using different API methods, the 'suggest' method and the 'search' method. These methods belong to the Wiki API; further details can be found in the Wiki API documentation. Each method was thus used twice per search query to ensure that we gathered adequate documents. Both the original queries and the additional queries were used because the additional queries sometimes did not lead to computer engineering-related webpages when using the Wiki API. Google Custom Search API, in contrast, is more effective at webpage search, so the original queries were unnecessary; the API could find appropriate webpages with the additional queries alone. Afterwards, keywords were extracted from the webpages using the data preprocessing and keyword extraction techniques described in Sections IV.A and IV.B. An example set of keywords from this process, corresponding to each KU of a KA, is shown in Table III. The left column of the table shows the extracted keywords and the top row shows the KUs, whereas each value represents the number of times the same keyword was extracted from the external documents corresponding to that KU.

TABLE III. AN EXAMPLE SET OF KEYWORDS IN DICT-MATRIX

Keyword  KU1  KU2  KU3
ios      1    6    0
android  2    0    1
debian   4    3    7
Total    7    9    8

Keywords extracted from one document might be identical to keywords extracted from other documents; this creates redundancy, because keywords extracted from nine external documents could lead to at most nine identical keywords. Hence, each value in the table ranges from zero (keyword not found) to nine, given nine external documents. We consider this reasonable because a higher value represents a higher association. For example, the 'ios' keyword in the table relates to KU1 with weight 1 and to KU2 with weight 6, but has no relation to KU3 (weight 0); it could be assumed that this keyword relates to KU2 more strongly than the 'android' keyword, which was not found for KU2, and the 'debian' keyword, which was found in only three external documents. By this process, 13 sets of keywords and their values, called Dict-Matrices, were constructed for the 13 KAs.

D. Comparison-Matrix Construction
This process compares the keywords extracted from materials (KWMs) to the keywords in the Dict-Matrix (KWDs). There are a number of materials for each subject, so the KWMs of every document were compared to the KWDs. Afterwards, we summed the values of the exactly matching keywords, called counted values. The counted values were then filled into a matrix called the Comparison-Matrix. As a result, the values in the Comparison-Matrix might be considerably greater than the values in the Dict-Matrix because every course material was examined; each material contributes once to the values in the Comparison-Matrix. Finally, the Comparison-Matrix contains the values from the comparison between the KWMs of all materials and the KWDs. Table IV shows an example of matching keywords in the Comparison-Matrix, together with their computed values. For instance, the value of the 'android' keyword in every KU is three times its value in the Dict-Matrix because three course materials contained this keyword. In contrast, the values of the 'ios' keyword did not change because only one course material contained this keyword. On the other hand, the 'debian' keyword was not found in
any course materials (value 0 in all KUs). Consequently, it was omitted from the table for clarity.

TABLE IV. AN EXAMPLE SET OF KEYWORDS IN COMPARISON-MATRIX

Keyword  KU1  KU2  KU3
ios      1    6    0
android  6    0    3
Total    7    6    3

E. Association Score Computation
To evaluate the association between the two sides described in Section IV, the association score computation was performed. The scores were computed using equation (4). $P(i)$ represents the matching value of keyword $i$ in the Comparison-Matrix, while $Q(j)$ represents the value of keyword $j$ in the Dict-Matrix. The total numbers of keywords of KU $k$ in the Comparison-Matrix and the Dict-Matrix are denoted $n$ and $m$, respectively.

$$\mathrm{AssoScore}_{KU}(k) = \sum_{i=1}^{n} P(i) \div \sum_{j=1}^{m} Q(j) \tag{4}$$

In other words, the sum of the values in each column of the Dict-Matrix is used as the heuristic value of each KU, that is, the value that the matching keywords of the KU could reach in the best case. Likewise, all values in each column of the Comparison-Matrix were summed to obtain the amount for each KU, as shown in the bottom row of Table IV. Next, we divided the amount for each KU in the Comparison-Matrix by the heuristic value of that KU. This yields the association scores, which indicate the association between the standard course styles and the actual course styles represented by course materials. For instance, considering Table III and Table IV, the association score of KU1 is 1 because the total for KU1 in the Dict-Matrix is 7 and the total for KU1 in the Comparison-Matrix is also 7. In addition, the scores of KU2 and KU3 are 0.67 and 0.375, computed by dividing 6 by 9 and 3 by 8, respectively. The association scores computed from the data in Table III and Table IV are shown in Table V.

TABLE V. AN EXAMPLE OF ASSOCIATION SCORES COMPUTATION IN KU LEVELS

                           KU1  KU2     KU3
Total (Dict-Matrix)        7    9       8
Total (Comparison-Matrix)  7    6       3
Computation                7/7  6/9     3/8
Association score          1.0  0.6667  0.375

However, the results depend on various factors, including the performance of the search engine, the number of external documents and the keyword extraction technique; if these factors are adjusted, the results will change as well. Nonetheless, the preliminary results show that our experiment is acceptable and reasonable. Further analysis of these association scores is described in Section V.

V. EXPERIMENTS ON COURSE MATERIALS
Following the methodology described in the previous section, we employed materials collected from Chulalongkorn University to demonstrate the characteristics of our course materials. Moreover, we compared the results from each institution to find the relationship of course material styles among those universities based on the latest Computer Engineering curriculum (CE2016). Experiments were conducted and the results were compared from three perspectives: (i) the association scores of materials of three CU courses, (ii) a comparison of the association ranking from our method and actuality, and (iii) a comparison of association scores among five institutions.

A. The Association Scores of Three CU Courses
In this experiment, we aim to find the characteristics of CU course materials compared to CE2016 by applying our proposed method to materials from three courses of CU. We computed the association scores at both the Knowledge Unit and the Knowledge Area level. Fig. 2 illustrates example association scores of four different KAs of the CN subject, 05-ESY, 06-NWK, 07-PFP and 08-SEC, displayed as four radar charts. These example scores belong to Chulalongkorn University. The numbers at the vertices of each chart are the numbers of the KUs as ordered in CE2016. For instance, number 11 in the top-left chart represents the KU named Mobile and networked embedded systems, whereas number 11 in the top-right chart represents the KU named Wireless sensor networks. Each axis of a chart represents the quantitative association score of one KU on a logarithmic scale; we used this scale for a clearer view only in Fig. 2. All KUs of the same KA are shown in the same chart. Consequently, there are 13 charts representing the 13 Knowledge Areas, as shown in Fig. 3. The charts have unequal numbers of axes because the KAs have different numbers of Knowledge Units.

Fig. 2. Association scores of four different KAs of computer networks course materials belonging to Chulalongkorn University (Logarithmic scale).
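The KU-level scores plotted in the radar charts follow equation (4). The Python sketch below reproduces the toy numbers of Tables III to V under stated assumptions: the names `DICT_MATRIX`, `MATERIAL_KEYWORDS`, `build_comparison_matrix` and `ku_association_scores` are hypothetical, the four material keyword sets are invented so that 'ios' occurs in one material, 'android' in three and 'debian' in none, and the zero-denominator guard is an addition of this sketch rather than something the paper specifies.

```python
# Toy Dict-Matrix from Table III: keyword -> weights for KU1..KU3.
DICT_MATRIX = {
    "ios": [1, 6, 0],
    "android": [2, 0, 1],
    "debian": [4, 3, 7],
}

# Hypothetical keyword sets of four course materials (assumption):
# 'ios' appears in one material, 'android' in three, 'debian' in none.
MATERIAL_KEYWORDS = [
    {"ios", "android"},
    {"android", "kernel"},
    {"android"},
    {"scheduler"},
]


def build_comparison_matrix(dict_matrix, materials):
    """Add a keyword's Dict-Matrix row once for every material that contains it."""
    comparison = {}
    for keywords in materials:
        for kw in dict_matrix.keys() & keywords:
            row = comparison.setdefault(kw, [0] * len(dict_matrix[kw]))
            for ku, weight in enumerate(dict_matrix[kw]):
                row[ku] += weight
    return comparison


def column_totals(matrix, width):
    """Sum every KU column over all keywords of the matrix."""
    return [sum(row[ku] for row in matrix.values()) for ku in range(width)]


def ku_association_scores(dict_matrix, comparison, width=3):
    """Equation (4): Comparison-Matrix column sum over Dict-Matrix column sum."""
    p = column_totals(comparison, width)
    q = column_totals(dict_matrix, width)
    # Guard against an empty Dict-Matrix column (assumption of this sketch).
    return [p_k / q_k if q_k else 0.0 for p_k, q_k in zip(p, q)]


if __name__ == "__main__":
    comparison = build_comparison_matrix(DICT_MATRIX, MATERIAL_KEYWORDS)
    print(comparison)
    print(ku_association_scores(DICT_MATRIX, comparison))
```

With these inputs the Comparison-Matrix rows match Table IV and the three scores are 7/7, 6/9 and 3/8, as in Table V.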
Fig. 3. An example of association scores of computer networks course materials belonging to Chulalongkorn University (Normal scale).

After considering the shapes of the charts, we found that the scores of each KU in each subject are quite distinctive. In the CN subject, only 05-ESY, 06-NWK and 08-SEC are conspicuous, whereas in the CA subject the conspicuous KAs are 03-CAO, 04-DIG and 12-SRM. Likewise, the charts that stand out in the OS subject are 03-CAO, 05-ESY and 08-SEC. These charts demonstrate that the three courses of CU are strongly related to these KAs.

Nonetheless, there are many charts and the scores are quite ambiguous. Therefore, a representative score for each chart was created using equation (5). $P_k(i)$ is the matching value of keyword $i$ corresponding to KU $k$ in the Comparison-Matrix, whereas $Q_k(j)$ is the value of keyword $j$ corresponding to KU $k$ in the Dict-Matrix. The total numbers of keywords of KU $k$ in the Comparison-Matrix and the Dict-Matrix are denoted $n$ and $m$, respectively. $s$ represents the total number of KUs of KA $t$.

$$\mathrm{AssoScore}_{KA}(t) = \sum_{k=1}^{s} \sum_{i=1}^{n} P_k(i) \div \sum_{k=1}^{s} \sum_{j=1}^{m} Q_k(j) \tag{5}$$

In other words, unlike the association scores in Section IV.E, these scores represent the association at KA level instead of KU level. We summed the values of all matching keywords of every KU in the same KA and divided the result by the sum of the Dict-Matrix values of the same KUs. For instance, considering Table V, the sum of Total (Dict-Matrix) is 24 and the sum of Total (Comparison-Matrix) is 16. Dividing 16 by 24 gives a KA-level association score of 0.6667.

Our data were processed in this way, producing the representative scores of each KA as the KA-level association scores. Table VI shows the computed scores, which summarize all KU scores of each KA. These scores show that the KA most related to the CN subject is 06-NWK. Moreover, 03-CAO is the KA most related to the CA subject and 12-SRM is the KA most related to the OS subject. These scores support the idea that our method is rational.

TABLE VI. THE ASSOCIATION SCORES OF KA LEVEL (THE STARS (*) EMPHASIZE THE HIGHEST SCORE OF EACH SUBJECT)

KA      CN       CA       OS
01-CAE  0.0402   0.0561   0.0430
02-CAL  0.0341   0.0522   0.1211
03-CAO  0.0481   *0.1908  0.2087
04-DIG  0.0379   0.0761   0.0458
05-ESY  0.0852   0.0527   0.1569
06-NWK  *0.1631  0.0347   0.1171
07-PFP  0.0318   0.0291   0.0329
08-SEC  0.0624   0.0269   0.1222
09-SET  0.0195   0.0341   0.0461
10-SGP  0.0237   0.0376   0.0593
11-SPE  0.0407   0.0498   0.0931
12-SRM  0.0608   0.0606   *0.2863
13-SWD  0.0392   0.0434   0.1210

B. A Comparison of Association Ranking from Our Method and Actuality
To evaluate the performance of our method, we produced a questionnaire to collect opinions from 11 fourth-year students of Chulalongkorn University. In the questionnaire, students were asked to rank the five KAs most related to each subject in descending order. Table VII shows the results of our questionnaire, in which the consensus of students (the largest vote count in each rank column) was highlighted in red. In the CN subject, the table shows that 10 out of 11 students chose 06-NWK as the KA most related to the CN subject. Their opinions, however, gradually diverged, with a consensus of five, three, three and two out of 11 for the second, third, fourth and fifth most related KA, respectively. Their opinions followed the same pattern in the other two subjects. The majority vote for first place always exceeds half of the number of students: ten students chose 06-NWK for the CN subject, while all eleven students and seven students chose 03-CAO for the CA and OS subjects, respectively.

TABLE VII. THE NUMBERS OF VOTES OF CU STUDENTS ON THREE SUBJECTS

ID of  CN                     CA                     OS
KA     1st 2nd 3rd 4th 5th    1st 2nd 3rd 4th 5th    1st 2nd 3rd 4th 5th
01     0   1   0   1   1      0   5   1   3   1      0   0   1   0   0
02     0   2   1   1   1      0   0   1   0   1      0   1   3   1   2
03     1   3   3   1   0      11  0   0   0   0      7   4   0   0   0
04     0   0   0   0   0      0   6   1   1   0      0   1   0   1   1
05     0   0   0   1   1      0   0   6   2   1      0   1   2   1   1
06     10  0   0   0   0      0   0   0   0   3      0   1   1   2   0
07     0   0   0   0   1      0   0   0   0   1      0   0   0   0   1
08     0   5   2   3   1      0   0   0   0   0      0   0   0   4   2
09     0   0   1   1   1      0   0   0   2   2      0   0   2   0   1
10     0   0   2   2   2      0   0   0   3   1      0   0   0   0   0
11     0   0   1   0   1      0   0   0   0   0      0   0   0   1   0
12     0   0   1   1   2      0   0   2   0   1      4   3   1   1   0
13     0   0   0   0   0      0   0   0   0   0      0   0   1   0   2

To evaluate performance, we chose the student consensus at each rank as the KA representatives and compared them to the ranking produced by our method. Table VIII shows the scores and ranking obtained by applying our method to the course materials of
CU. We found that the first-rank score usually far exceeds the second-rank score, so the two are clearly distinguishable. The first-rank score is roughly twice the second-rank score or more for the CN and CA subjects, and about 1.4 times the second-rank score for the remaining OS subject. These results show that our method can clearly identify the first rank and separate it from the second rank.

TABLE VIII. RESULTS AND RANKING OF COURSE MATERIALS OF CHULALONGKORN UNIVERSITY

               CN               CA               OS
KA         Score  Rank     Score  Rank     Score  Rank
01-CAE    0.0402    7     0.0561    4     0.0430   12
02-CAL    0.0341   10     0.0522    6     0.1211    5
03-CAO    0.0481    5     0.1908    1     0.2087    2
04-DIG    0.0379    9     0.0761    2     0.0458   11
05-ESY    0.0852    2     0.0527    5     0.1569    3
06-NWK    0.1631    1     0.0347   10     0.1171    7
07-PFP    0.0318   11     0.0291   12     0.0329   13
08-SEC    0.0624    3     0.0269   13     0.1222    4
09-SET    0.0195   13     0.0341   11     0.0461   10
10-SGP    0.0237   12     0.0376    9     0.0593    9
11-SPE    0.0407    6     0.0498    7     0.0931    8
12-SRM    0.0608    4     0.0606    3     0.2863    1
13-SWD    0.0392    8     0.0434    8     0.1210    6

After considering the top five ranks of each subject, we found that the top five KAs from the student opinions and from our method are very close. Table IX shows the top five related KAs from both sources. However, the students' opinions are not identical, so we selected the KA with the most votes as the representative of each rank in all three subjects. If that KA had already been selected for a higher rank, it could not be selected again, and we chose the KA with the next most votes instead. The yellow highlight with underlining in Table VII marks the representatives chosen in this situation, where the same KA received the most votes at more than one rank. For example, the first yellow highlight shows that 10-SGP is the representative of the fourth rank for the CN subject instead of 08-SEC, which received the most votes, because 08-SEC was already the representative of the second rank. Consequently, 10-SGP, which received fewer votes than 08-SEC, was selected as the representative.

TABLE IX. A COMPARISON OF ASSOCIATIVE KA RANKING FROM OUR METHOD AND STUDENT OPINIONS REPRESENTATIVES

Subject               CN                         CA                         OS
Rank          1st  2nd  3rd  4th  5th    1st  2nd  3rd  4th  5th    1st  2nd  3rd  4th  5th
%            90.9 45.5 27.3 18.2 18.2    100 54.5 54.5 27.3 27.3   63.6 27.3 27.3 36.4 18.2
*Students      06   08   03   10   12     03   04   05   01   06     03   12   02   08   13
*Our method    06   05   08   12   03     03   04   12   01   05     12   03   05   08   02
*01 to 13 in the Students and Our method rows are the IDs of KAs.

After that, we computed a similarity score by counting the matches between the student representatives and the ranking from our method, in order to measure their likeness. The score is computed using (6), where $M_i$ is the number of KAs that match between the student opinions and our method when only the top-$i$ ranks are considered, for $i$ up to $n$. In our experiment, we considered up to the top five ranks ($n = 5$). For instance, for the CN subject $M_5$ is 4, because four of the student representatives, 06-NWK, 08-SEC, 03-CAO and 12-SRM, also appear among the top five KAs from our method. The similarity scores are 69.33%, 84.33% and 64.33% for the CN, CA and OS subjects, respectively. Hence, the average similarity score is 72.67%.

$$\mathrm{SimilarityScore} = \left( \sum_{i=1}^{n} \frac{M_i}{i} \right) \div n \qquad (6)$$

C. A Comparison of Association Scores Among Five Institutions

In this experiment, we computed the association scores of each institution and compared them in terms of ranking. First, we calculated the association scores using our proposed method. The radar charts of these scores are shown in Fig. 4. The numbers 1 to 13 on the axes denote the KAs from 01-CAE to 13-SWD. From the empirical results, we found that the shapes of the charts of the five institutions are markedly similar within the same subject. This finding indicates that the course materials of these institutions have a similar focus.

Fig. 4. The scores of KAs among five institutions (radar chart panels: CN, CA, OS)

Nevertheless, the shapes alone are insufficient for an explicit comparison. To quantify the relationship, we applied Spearman's rank correlation coefficient [18]. We ranked the representative association scores of the KAs for each subject; in other words, the ranks of every institution were created in the same way as those of CU shown in Table VII. Subsequently, for each subject, the rankings of each pair of institutions were compared using Spearman's rank correlation coefficient. In addition, all p-values are less than 0.025.

Table X shows the rank correlation coefficients between institutions for the CN subject. The coefficients for the CA and OS subjects are shown in Tables XI and XII, respectively. Each row gives the coefficients between the institution in the left column and the institution in each remaining column. For instance, for the CN subject, the coefficient between CU and the University of Cambridge is 0.93. We do not report the coefficient between an institution and itself because it is always 1. Table X demonstrates that the course material style of CU closely resembles that of the other institutions, with high coefficient scores. Especially in the OS subject, the scores of

coefficient are very high: 0.97, 0.92, 0.94 and 0.92 between CU and the University of Cambridge, Carnegie Mellon University (CMU), the Massachusetts Institute of Technology (MIT) and Stanford University, respectively.

Additionally, the highest-scoring pairs in Tables X, XI and XII are Cambridge-Stanford, CMU-MIT and Cambridge-CU for the CN, CA and OS subjects, respectively (in Table XI, Cambridge-CU has the same rounded value of 0.90 as CMU-MIT). These scores identify the pairs of institutions whose material styles are the most similar.

TABLE X. RANK RELATION COEFFICIENTS OF COMPUTER NETWORKS SUBJECT AMONG FIVE INSTITUTIONS

             Cambridge   CMU     CU      MIT     Stanford
Cambridge       X        0.77    0.93    0.81    0.98
CMU            0.77       X      0.75    0.73    0.78
CU             0.93      0.75     X      0.75    0.92
MIT            0.81      0.73    0.75     X      0.80
Stanford       0.98      0.78    0.92    0.80     X

TABLE XI. RANK RELATION COEFFICIENTS OF COMPUTER SYSTEM ARCHITECTURE SUBJECT AMONG FIVE INSTITUTIONS

             Cambridge   CMU     CU      MIT     Stanford
Cambridge       X        0.65    0.90    0.84    0.70
CMU            0.65       X      0.72    0.90    0.82
CU             0.90      0.72     X      0.84    0.78
MIT            0.84      0.90    0.84     X      0.87
Stanford       0.70      0.82    0.78    0.87     X

TABLE XII. RANK RELATION COEFFICIENTS OF OPERATING SYSTEM SUBJECT AMONG FIVE INSTITUTIONS

             Cambridge   CMU     CU      MIT     Stanford
Cambridge       X        0.90    0.97    0.96    0.90
CMU            0.90       X      0.92    0.95    0.95
CU             0.97      0.92     X      0.94    0.92
MIT            0.96      0.95    0.94     X      0.93
Stanford       0.90      0.95    0.92    0.93     X

VI. CONCLUSION

In this paper, we proposed an alternative curriculum analysis method for understanding and evaluating curricula. The curricular guideline CE2016 was used as the standard knowledge [1]; moreover, the KUs of the guideline were used as queries to search the Internet for external documents [5]. In our experiment, we used the course materials of three subjects provided by five universities. We applied the TF-IDF keyword extraction technique to extract keywords from the course materials and the external documents. The keywords extracted from the external documents, together with their counts, were used to construct the Dict-Matrix. Subsequently, we compared the keywords extracted from the course materials with the keywords in the Dict-Matrix to construct the Comparison-Matrix. Then, the association scores were computed to show how the course material style compares with CE2016. Furthermore, we found that the shapes of the charts plotted from the association scores can express the similarity of course material style between universities, as shown in Section V. To verify the potential of our method, eleven students were asked to rank the top five KAs related to each subject. Likewise, the KAs were ranked by their corresponding association scores. We compared these two ranking lists using Spearman's Rank Correlation Coefficient. The results show that the ranking produced by our method is significantly similar to the ranking given by the students. These preliminary results demonstrate the feasibility of using course materials as a factor to analyze and understand curricula. However, although this finding is promising, it should be validated on a larger amount of data. Additionally, further investigation that integrates other analysis methods, such as human-based and course syllabus-based analysis, is needed to further improve curriculum analysis.

VII. REFERENCES

[1] Sekiya, T., et al. (2014). Mapping analysis of CS2013 by supervised LDA and isomap. Teaching, Assessment and Learning (TALE), 2014 International Conference on, IEEE.
[2] Gluga, R., et al. (2012). PROGOSS: Mastering the curriculum. Proceedings of The Australian Conference on Science and Mathematics Education (formerly UniServe Science Conference).
[3] Impagliazzo, J. and E. Durant (2014). Toward a modern curriculum for computer engineering. Teaching, Assessment and Learning (TALE), 2014 International Conference on, IEEE.
[4] Welch, W. W. and H. J. Walberg (1972). "A national experiment in curriculum evaluation." American Educational Research Journal: 373-383.
[5] Dai, Y., et al. "Course Content Analysis: An Initiative Step toward Learning Object Recommendation Systems for MOOC Learners."
[6] Marshall, L. (2012). A comparison of the core aspects of the ACM/IEEE Computer Science Curriculum 2013 Strawman report with the specified core of CC2001 and CS2008 Review. Proceedings of Second Computer Science Education Research Conference, ACM.
[7] Ota, S. and H. Mima (2011). "Machine learning-based syllabus classification toward automatic organization of issue-oriented interdisciplinary curricula." Procedia-Social and Behavioral Sciences 27: 241-247.
[8] Sekiya, T., et al. (2015). Curriculum analysis of CS departments based on CS2013 by simplified, supervised LDA. Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, ACM.
[9] Durant, E., et al. (2015). CE2016: Updated computer engineering curriculum guidelines. Proceedings of the 2015 IEEE Frontiers in Education Conference (FIE), IEEE Computer Society: 1-2.
[10] Lott, B. (2012). "Survey of Keyword Extraction Techniques." UNM Education.
[11] Hasan, K. S. and V. Ng (2014). Automatic Keyphrase Extraction: A Survey of the State of the Art. ACL (1).
[12] Salton, G. and C. Buckley (1988). "Term-weighting approaches in automatic text retrieval." Information processing & management 24(5): 513-523.
[13] "QS Top Universities", Topuniversities.com, 2016. [Online]. Available: http://www.topuniversities.com/. [Accessed: 18-Aug-2016].
[14] Han, J., et al. (2011). Data mining: concepts and techniques, Elsevier.
[15] "Natural Language Toolkit — NLTK 3.0 documentation", Nltk.org, 2016. [Online]. Available: http://www.nltk.org. [Accessed: 18-Aug-2016].
[16] "API:Main page - MediaWiki", Mediawiki.org, 2016. [Online]. Available: https://www.mediawiki.org/wiki/API:Main_page. [Accessed: 18-Aug-2016].
[17] "Custom Search | Google Developers", Google Developers, 2016. [Online]. Available: https://developers.google.com/custom-search/. [Accessed: 18-Aug-2016].
[18] Pirie, W. (1988). "Spearman rank correlation coefficient." Encyclopedia of statistical sciences.
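
As an illustration of the KA-level association score in (5), the following Python sketch sums the keyword values of all KUs in one KA and divides the Comparison-Matrix total by the Dict-Matrix total. The function name `ka_association_score` and the per-KU keyword values are hypothetical: they were chosen only so that the totals equal the 16 and 24 of the worked example in Section V, and they are not the actual contents of Table V.

```python
def ka_association_score(comparison_matrix, dict_matrix):
    """KA-level association score as in (5).

    Each argument maps a KU name to the list of keyword values of that KU:
    matching values P_k(i) for the Comparison-Matrix and values Q_k(j)
    for the Dict-Matrix.
    """
    matched = sum(sum(values) for values in comparison_matrix.values())
    total = sum(sum(values) for values in dict_matrix.values())
    if total == 0:
        raise ValueError("Dict-Matrix has no keywords for this KA")
    return matched / total


# Hypothetical keyword values for one KA with three KUs (not from Table V).
comparison = {"KU-1": [3, 2], "KU-2": [4, 1, 1], "KU-3": [5]}           # sums to 16
dictionary = {"KU-1": [3, 2, 2], "KU-2": [4, 2, 1, 2], "KU-3": [5, 3]}  # sums to 24

score = ka_association_score(comparison, dictionary)
print(round(score, 4))  # 0.6667
```

Because the numerator and denominator are both summed over all KUs before dividing, a KU with many keywords weighs more than a KU with few, which is what distinguishes this score from an average of KU-level scores.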

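
The selection of student representatives and the similarity score in (6) can be sketched in Python as follows. The vote counts are the CN columns of Table VII and the method ranking is the CN row of Table IX. The function names are hypothetical, and breaking ties by the lowest KA ID is an assumption of this sketch; the paper does not state a tie-breaking rule, although this choice reproduces the CN representatives in Table IX.

```python
# Votes from Table VII for the CN subject: VOTES_CN[ka_id] = votes at ranks 1..5.
VOTES_CN = {
    1: [0, 1, 0, 1, 1],
    2: [0, 2, 1, 1, 1],
    3: [1, 3, 3, 1, 0],
    4: [0, 0, 0, 0, 0],
    5: [0, 0, 0, 1, 1],
    6: [10, 0, 0, 0, 0],
    7: [0, 0, 0, 0, 1],
    8: [0, 5, 2, 3, 1],
    9: [0, 0, 1, 1, 1],
    10: [0, 0, 2, 2, 2],
    11: [0, 0, 1, 0, 1],
    12: [0, 0, 1, 1, 2],
    13: [0, 0, 0, 0, 0],
}


def select_representatives(votes, top=5):
    """Pick, for each rank, the most-voted KA that is not yet selected.

    Ties are broken by the lowest KA ID (an assumption of this sketch).
    """
    chosen = []
    for rank in range(top):
        candidates = [ka for ka in sorted(votes) if ka not in chosen]
        # max() returns the first maximum, i.e. the lowest ID among ties.
        best = max(candidates, key=lambda ka: votes[ka][rank])
        chosen.append(best)
    return chosen


def similarity_score(students, method):
    """Similarity score as in (6): mean of M_i / i over i = 1..n."""
    n = len(students)
    total = 0.0
    for i in range(1, n + 1):
        m_i = len(set(students[:i]) & set(method[:i]))  # matches within top-i
        total += m_i / i
    return total / n


students_cn = select_representatives(VOTES_CN)
method_cn = [6, 5, 8, 12, 3]  # top five KAs of our method for CN (Table IX)

print(students_cn)  # [6, 8, 3, 10, 12]
print(round(100 * similarity_score(students_cn, method_cn), 2))  # 69.33
```

For the CN subject the prefix matches are $M_1 = 1$, $M_2 = 1$, $M_3 = 2$, $M_4 = 2$ and $M_5 = 4$, so the score is $(1 + 1/2 + 2/3 + 2/4 + 4/5) / 5 \approx 0.6933$, in agreement with the 69.33% reported in Section V.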

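
The comparison between institutions relies on Spearman's rank correlation coefficient [18]. A minimal Python sketch for two rankings without ties is given below, using the standard formula $\rho = 1 - 6 \sum d^2 / (n (n^2 - 1))$. The CU ranks are the CN column of Table VIII; the ranks of the second institution are hypothetical, since the paper reports only the resulting coefficients for the other institutions. The function name `spearman_rho` is also an assumption, and the sketch does not compute p-values.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two rankings without ties.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the
    difference between the two ranks of the same KA.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must have the same length")
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# CN ranks of the 13 KAs (01-CAE to 13-SWD) for CU, taken from Table VIII.
cu_cn = [7, 10, 5, 9, 2, 1, 11, 3, 13, 12, 6, 4, 8]
# Hypothetical ranks for a second institution (not from the paper).
other_cn = [8, 10, 4, 9, 3, 1, 12, 2, 13, 11, 6, 5, 7]

print(round(spearman_rho(cu_cn, other_cn), 4))
```

Two identical rankings give a coefficient of 1 and two exactly reversed rankings give -1, which is why the diagonal entries of Tables X to XII are omitted.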