Understanding Knowledge Areas in Curriculum Through Text Mining From Course Materials
Abstract: Curriculum analysis is attracting widespread interest in the educational field. There are two main approaches: (i) human-based and (ii) text-based assessment. Although evaluation by teachers and learners is widely used, it is inconvenient and time-consuming, and its results depend heavily on individual attitudes. The text-based approach aims to evaluate the course syllabus directly; however, a syllabus contains only a course description, which cannot fully express the actual course contents. In this paper, we present an automatic text-based curriculum analysis that directly assesses entire course materials. Our approach employs a well-known text-mining technique that extracts keywords using TF-IDF. The analysis matches keywords from the course materials against keywords from online documents, which is similar to how a domain expert works. Moreover, a new measurement is proposed to quantify associations between course materials and online documents using the number of matching keywords. The experiment was conducted on materials of three subjects collected from five top universities, mapped to the latest Computer Engineering Curricular Guideline (CE2016). The results illustrate significant relations among courses from different universities and CE2016. To further analyze the courses, each of them is visualized using radar charts.

Keywords: curriculum analysis; curriculum evaluation; course content analysis; keyword extraction; TF-IDF

I. INTRODUCTION

A major current focus in the development of engineering teaching and learning is curriculum analysis [1, 2, 3]. To ensure that a developed curriculum comprises suitable contents, many studies have proposed curriculum evaluation methods [4, 5]. These contributions focused on assessments by teachers and learners: they collected opinions from instructors and students and analyzed them to evaluate curriculum quality. While some works concentrated on human-based assessment, a number of studies analyzed curricula through course documents [1, 6, 7]. Some of these analyzed curricula through course syllabi, called “course syllabus-based analysis” [1, 7, 8]; they mapped their curriculum topics to related topics in a standard curriculum in order to express connections between their curricula and the standard.

The well-known educational institution named “ACM Education Board and the IEEE Computer Society's Education”, which has been working to establish curricular guidelines for over 40 years, released Computer Engineering Curricula CE2016 in October 2015 [9] as a guideline for undergraduate degree programs in computer engineering. In the guideline, 13 Knowledge Areas (KAs) are defined as related areas of computer engineering, as shown in Table I. Moreover, each KA is divided into sub-topics termed Knowledge Units (KUs); the number of KUs in each KA varies with the extent of the KA. A number of researchers have examined the guideline documents for various objectives. Sekiya et al. mapped linkages between two different guidelines and also between a guideline and course syllabi [1], whereas Marshall quantified structural differences among the guideline series [6].

TABLE I. KNOWLEDGE AREAS OF COMPUTER ENGINEERING IN CE2016

ID  Abbreviation  Knowledge Area
01  CAE           Circuits and Electronics
02  CAL           Computing Algorithms
03  CAO           Computer Architecture and Organization
04  DIG           Digital Design
05  ESY           Embedded Systems
06  NWK           Computer Networks
07  PFP           Professional Practice
08  SEC           Information Security
09  SET           Strategies for Emerging Technologies
10  SGP           Signal Processing
11  SPE           Systems and Project Engineering
12  SRM           Systems Resource Management
13  SWD           Software Design

However, conducting a human-based assessment to analyze the quality of a curriculum is usually inconvenient and time-consuming. Furthermore, human-based results can be biased toward the personal perspectives of individual assessors. Likewise, most of the literature on analyzing curricula through course syllabi did not sufficiently take into account the raw course materials, which express the actual course contents. Although a syllabus states the topics of a course, the contents taught in the class
978-1-5090-5598-2/16/$31.00 ©2016 IEEE 7-9 December 2016, Dusit Thani Bangkok Hotel, Bangkok, Thailand
2016 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE)
Page 161
might be slightly different. Some topics defined in a course syllabus might be neglected for several reasons, such as improper subject scope or unsuitable time duration. Therefore, course materials can express the topics of knowledge more precisely and show the contents that are actually covered in the course.

In this paper, we present a novel curriculum analysis that examines course materials directly, in a manner similar to a human expert, and compares them to the Body of Knowledge (BOK), including the KAs and KUs defined in CE2016. One way to become an expert in a field is to read many documents related to it. Therefore, we elicited KUs and KAs from CE2016 and used them as queries to search the Internet through search engine APIs. As a result, we obtained abundant information on these topics. Afterwards, keywords were extracted from both the retrieved documents and the course materials using a widely used text-mining technique known as Term Frequency–Inverse Document Frequency (TF-IDF) [10]. Finally, keywords from the different sources were matched; the matching results represent the association between course materials and CE2016 in terms of our evaluation, which is described in Section V.

The rest of this paper is organized as follows. Section II reviews related work on curriculum analysis, and Section III describes our data collection. Section IV presents our proposed method for analyzing curricula through text mining of course materials. Section V describes our experiments, and Section VI concludes this work.

II. RELATED WORKS

In recent decades, the “ACM Education Board and the IEEE Computer Society's Education” institution has continuously worked on computer science-related curricular guidelines. The latest computer engineering curricular guideline to be released is “CE2016”. Before this guideline, the institution published several versions of computer science-related curricular guidelines, and many studies have analyzed this series. For example, Marshall quantified structural differences among the computing science curricular guideline series (CS series) [6]. Graph visualization was used in the study to illustrate connections among KAs. The paper represented KAs as a network graph and showed KUs and topics as edges of the graph. Corresponding topics among the guideline series were identified to compare their discrepancies. In contrast, our study aims to quantify matching topics between a guideline and actual courses in order to indicate their association; this association can express the quality of actual courses compared to the guideline CE2016.

Moreover, Sekiya et al. proposed a mapping analysis method to investigate the changes between CS2008 and CS2013 using supervised Latent Dirichlet Allocation (sLDA) and Isomap [1]. Their method improved on the one Marshall used. They showed discrepancies among topics elicited from CS2008 and CS2013 by network graph visualization. Furthermore, they examined the changes using actual courses from three universities. Nonetheless, they used only course syllabi to extract topics and mapped them into a network graph, which they compared to the graph created from the topics of CS2013.

Considering the issue of using only course syllabi, Dai et al. used additional external text to find the correlation between courses and the knowledge structure of CS2013 [5]. Their motivation was that further related information obtained from the Internet could improve the accuracy of their system. They therefore searched the Internet using KAs and KUs as queries, retrieving documents with the Google Custom Search API. Afterwards, keywords were extracted from the retrieved documents and used in conjunction with KAs and KUs for mapping analysis. However, like Sekiya et al., they considered only keywords from syllabi and did not take raw course materials into account. Our curriculum analysis method uses actual course materials instead of only syllabi. We extracted keywords from the materials and from the topics of CE2016 and matched them. Afterwards, association scores were computed using our association score computation and plotted in radar charts. The charts illustrate the characteristics of each course. For instance, if the chart shapes of courses at two different universities are quite similar, it can be assumed that the course styles of these universities are also similar, because their course materials focus on the same KUs or KAs. Moreover, radar chart visualization can show the aims of a course: if the score of one topic in the chart is extremely high whereas the others are very low, the topic represented by the longest spoke of the chart is the main focus of the course.

To elicit keywords from text documents, several studies address the text-mining technique known as keyword extraction [10, 11, 12]. One of the most efficient techniques is Term Frequency–Inverse Document Frequency (TF-IDF), which uses the frequency of a word in both the target material and a corpus to quantify the significance of the word. Words with high significance scores are selected as keywords. This technique requires an appropriate corpus as a reference word database in order to identify notable words. In our experiment, several corpora were used for keyword extraction, including Wikipedia, DBpedia and Linked Open Data.

III. OUR DATA COLLECTION

Our data were collected from course materials of five top institutions: Chulalongkorn University (CU) and four other universities ranked by the QS World University Rankings in Computer Science & Information Systems [13], namely Massachusetts Institute of Technology (MIT), Stanford University, Carnegie Mellon University (CMU), and University of Cambridge. Although CU is not among the top 100 universities ranked by QS, it is one of the top universities in Thailand. Moreover, we had full access to course materials from the university and could gather information from CU students in order to conduct the experimental evaluation described in Section V.B.

In addition, we focused on three computer engineering-related subjects, chosen because their course materials are openly provided by the selected universities.
These subjects comprise computer networks (CN), computer system architecture (CA), and operating systems (OS).

We gathered the most up-to-date material sets that the institutions publicly provide on their websites and on MIT OpenCourseWare (OCW). Table II shows the academic year of each material set. Only one of the 15 material sets is outdated (before 2010): the CA material set from Stanford University, used in academic year 2001. The other sets are contemporary. In particular, for the OS subject, all institutions provided course materials from academic years 2014 to 2016. Therefore, our data are close to each other in terms of recency.

TABLE II. THE ACADEMIC YEARS OF MATERIAL SETS PROVIDED BY FIVE INSTITUTIONS

Institution  CN    CA    OS
Cambridge    2014  2011  2015
CMU          2013  2015  2016
CU           2014  2014  2014
MIT          2015  2015  2015
Stanford     2011  2001  2015

Computer Networks (CN), Computer System Architecture (CA), Operating System (OS)

Curricula released by the ACM Education Board and the IEEE Computer Society's Education, comprising specific KAs, have been frequently used to investigate curriculum style [1]; the newest curricular guideline is CE2016. We extracted the KUs of each KA and analyzed the material style using these particular KUs.

The course materials were collected from five top institutions ranked by QS World University Ranking; the course materials correspond to three computer engineering-related subjects, as described in Section III. These materials were used in our preliminary experiment to investigate the quantitative association between the intended course style stated in the guideline document CE2016 and the actual course style represented by the materials. Our proposed approach has five main steps: (A) Data Preprocessing, (B) Keyword Extraction, (C) Dict-Matrix Construction, (D) Comparison-Matrix Construction and (E) Association Score Computation. These steps are illustrated in Fig. 1. Steps (A) and (B) are applied twice, because preprocessing and keyword extraction have to be applied to both the curriculum guideline and the course materials. The final outputs of this method are the association scores that represent the quantitative association mentioned earlier.
B. Keyword Extraction

Many studies use keyword extraction to elicit essential keywords from their documents. One of the best-known techniques, TF-IDF [10], was used in this study. The equation of simple TF-IDF is shown in (1). Consider a term t and a document d ∈ D, where t appears in n of the N documents in D. Commonly used TF and IDF functions are shown in (2) and (3), respectively. After preprocessing, the data were ready for keyword extraction. TF-IDF requires an appropriate corpus in order to extract legitimate keywords from the target documents; the choice of corpus therefore strongly influences the accuracy of the results. For this reason, we used the Aylien API, which performs keyword extraction well. A number of renowned corpora were selected as reference corpora, including Wikipedia, DBpedia and Linked Open Data; this minimizes errors in keyword extraction and makes irrelevant keywords negligible.

API. Nevertheless, the Google Custom Search API is more effective for webpage search. It is therefore unnecessary to use the original queries as search words; the API can find appropriate webpages with only the additional queries. Afterwards, keywords were extracted from the webpages using the data preprocessing and keyword extraction techniques described in Sections IV.A and IV.B. An example set of keywords from this process, corresponding to each KU of a KA, is shown in Table III. The left column of the table shows the extracted keywords and the top row shows the KUs; each value is the number of times the keyword was extracted from the external documents corresponding to that KU.

TABLE III. AN EXAMPLE SET OF KEYWORDS IN DICT-MATRIX

Keyword  KU1  KU2  KU3
ios      1    6    0
android  2    0    1
debian   4    3    7
Total    7    9    8
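The keyword extraction and Dict-Matrix steps can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the study used the Aylien API with Wikipedia, DBpedia and Linked Open Data as reference corpora, whereas the sketch computes a plain TF-IDF (raw term count times log(N/n), one common variant) against the toy documents themselves. All documents, function names and the `k` parameter are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens, corpus):
    """Score each term of one tokenized document against a reference corpus.

    tf is the raw count of the term in the document; idf is log(N / n),
    where n is the number of corpus documents that contain the term.
    """
    n_docs = len(corpus)
    scores = {}
    for term, tf in Counter(doc_tokens).items():
        n = sum(1 for ref in corpus if term in ref)
        if n:  # terms that never occur in the corpus are skipped in this sketch
            scores[term] = tf * math.log(n_docs / n)
    return scores


def top_keywords(doc_tokens, corpus, k=1):
    """Return up to k terms with the highest positive TF-IDF score."""
    scores = tfidf_scores(doc_tokens, corpus)
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return [term for term in ranked[:k] if scores[term] > 0]


def build_dict_matrix(docs_by_ku, k=1):
    """Count, per KU, how many external documents yielded each keyword."""
    # Assumption: the external documents double as the reference corpus.
    corpus = [set(doc) for docs in docs_by_ku.values() for doc in docs]
    matrix = {}
    for ku, docs in docs_by_ku.items():
        for doc in docs:
            for keyword in top_keywords(doc, corpus, k):
                matrix.setdefault(keyword, Counter())[ku] += 1
    return matrix


# Hypothetical, already preprocessed external documents retrieved per KU.
docs_by_ku = {
    "KU1": [["ios", "ios", "system", "kernel"], ["debian", "debian", "system"]],
    "KU2": [["ios", "ios", "ios", "kernel", "system"]],
}
dict_matrix = build_dict_matrix(docs_by_ku)
for keyword, counts in sorted(dict_matrix.items()):
    print(keyword, dict(counts))
```

The word "system" occurs in every toy document, so its IDF is zero and it never becomes a keyword, which mirrors the role the reference corpus plays in suppressing uninformative terms.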
any course materials (value 0 in all KUs). Consequently, it was discarded from the table for clarity.

TABLE IV. AN EXAMPLE SET OF KEYWORDS IN COMPARISON-MATRIX

Keyword  KU1  KU2  KU3
ios      1    6    0
android  6    0    3
Total    7    6    3

E. Association Score Computation

To evaluate the association between the two sides described in Section IV, the association score is computed using (4). P(i) represents the matching value of keyword i in the Comparison-Matrix, while Q(j) represents the value of keyword j in the Dict-Matrix. The total numbers of keywords of KU k in the Comparison-Matrix and the Dict-Matrix are denoted n and m, respectively.

\[ \mathrm{AssocScore}(k) = \frac{\sum_{i=1}^{n} P(i)}{\sum_{j=1}^{m} Q(j)} \qquad (4) \]

In other words, the sum of the values in each column of the Dict-Matrix is used as the heuristic maximum that the number of matching keywords of each KU could reach in the best case. Likewise, all values in each column of the Comparison-Matrix are summed to obtain the total for each KU, as shown in the bottom row of Table IV. Next, we divide the total of each KU in the Comparison-Matrix by the heuristic value of that KU. The result is the association score, which indicates the association between the standard course style and the actual course style represented by the course materials. For instance, considering Table III and Table IV, the association

LEVELS

                           KU1  KU2     KU3
Total (Dict-Matrix)        7    9       8
Total (Comparison-Matrix)  7    6       3
Computation                7/7  6/9     3/8
Association score          1.0  0.6667  0.375

However, the results depend on various factors, including the performance of the search engine, the number of external documents and the keyword extraction technique; if these factors are adjusted,

V. EXPERIMENTS ON COURSE MATERIALS

Following the methodology described in the previous section, we used materials collected from Chulalongkorn University to demonstrate the characteristics of our course materials. Moreover, we compared the results from each institution to find the relationship of course material styles among those universities based on the latest computer engineering curriculum (CE2016). Experiments were conducted and the results compared from three perspectives: (i) the association scores of the materials of three CU courses, (ii) a comparison of the association ranking from our method with actuality, and (iii) a comparison of association scores among the five institutions.

A. The Association Scores of Three CU Courses

In this experiment, we aim to find the characteristics of CU course materials compared to CE2016 by applying our proposed method to materials from three CU courses. We computed the association scores at both the Knowledge Unit and Knowledge Area levels. Fig. 2 illustrates example association scores for four KAs of the CN subject, 05-ESY, 06-NWK, 07-PFP and 08-SEC, displayed as four radar charts. These example scores belong to Chulalongkorn University. The numbers at the vertices of each chart are the KU numbers as ordered in CE2016. For instance, number 11 in the top-left chart represents the KU named Mobile and networked embedded systems, whereas number 11 in the top-right chart represents the KU named Wireless sensor networks. Each axis of a chart represents the quantitative association score of one KU on a logarithmic scale; we used this scale only in Fig. 2, for readability. All KUs of the same KA are shown in the same chart. Consequently, there are 13 charts representing the 13 Knowledge Areas, as shown in Fig. 3. The charts have different numbers of axes because the KAs have different numbers of Knowledge Units.
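The KU-level scores behind these charts follow the computation of Section IV.E: each column total of the Comparison-Matrix is divided by the corresponding column total of the Dict-Matrix. A minimal sketch using the example values of Table III and Table IV is given below; the function and variable names are ours and are not taken from the authors' implementation.

```python
def column_totals(matrix, kus):
    """Sum the keyword counts of each KU column."""
    return {ku: sum(row.get(ku, 0) for row in matrix.values()) for ku in kus}


def association_scores(comparison_matrix, dict_matrix, kus):
    """Equation (4): Comparison-Matrix total divided by Dict-Matrix total, per KU."""
    p_totals = column_totals(comparison_matrix, kus)
    q_totals = column_totals(dict_matrix, kus)
    # A KU with no Dict-Matrix keywords gets score 0.0 (an assumption of this sketch).
    return {ku: (p_totals[ku] / q_totals[ku] if q_totals[ku] else 0.0) for ku in kus}


kus = ["KU1", "KU2", "KU3"]

# Values copied from Table III (Dict-Matrix).
dict_matrix = {
    "ios": {"KU1": 1, "KU2": 6, "KU3": 0},
    "android": {"KU1": 2, "KU2": 0, "KU3": 1},
    "debian": {"KU1": 4, "KU2": 3, "KU3": 7},
}

# Values copied from Table IV (Comparison-Matrix).
comparison_matrix = {
    "ios": {"KU1": 1, "KU2": 6, "KU3": 0},
    "android": {"KU1": 6, "KU2": 0, "KU3": 3},
}

scores = association_scores(comparison_matrix, dict_matrix, kus)
for ku in kus:
    print(ku, round(scores[ku], 4))
```

The three results correspond to the computations 7/7, 6/9 and 3/8 listed in the example table of Section IV.E.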
TABLE VI. THE ASSOCIATION SCORES OF KA LEVEL (THE STARS (*) EMPHASIZE THE MAJORITY SCORES ON THE KU OF EACH SUBJECT)
KA CN CA OS
01-CAE 0.0402 0.0561 0.0430
02-CAL 0.0341 0.0522 0.1211
03-CAO 0.0481 *0.1908 0.2087
04-DIG 0.0379 0.0761 0.0458
05-ESY 0.0852 0.0527 0.1569
06-NWK *0.1631 0.0347 0.1171
07-PFP 0.0318 0.0291 0.0329
08-SEC 0.0624 0.0269 0.1222
09-SET 0.0195 0.0341 0.0461
10-SGP 0.0237 0.0376 0.0593
11-SPE 0.0407 0.0498 0.0931
12-SRM 0.0608 0.0606 *0.2863
13-SWD 0.0392 0.0434 0.1210
Fig. 3. An example of association scores of computer networks course
materials belonging to Chulalongkorn University (Normal scale).
After considering the shapes of the charts, we found that the scores of each KU in each subject are quite distinctive. In the CN subject, only 05-ESY, 06-NWK and 08-SEC are conspicuous, whereas in the CA subject they are 03-CAO, 04-DIG and 12-SRM. Likewise, the clearly apparent chart areas in the OS subject are 03-CAO, 05-ESY and 08-SEC. These charts demonstrate that the three CU courses are strongly related to these KAs.

Nonetheless, there are too many charts and the scores are hard to compare. Therefore, a representative score for each chart was computed using (5). P_k(i) is the matching value of keyword i corresponding to KU k in the Comparison-Matrix, whereas Q_k(i) is the value of keyword i corresponding to KU k in the Dict-Matrix. The total numbers of keywords of KU k in the Comparison-Matrix and the Dict-Matrix are denoted n and m, respectively, and s is the total number of KUs of KA t.

B. A Comparison of Association Ranking from Our Method and Actuality

To evaluate the performance of our method, we used a questionnaire to collect opinions from 11 fourth-year students of Chulalongkorn University. In the questionnaire, students were asked to rank the top five KAs related to each subject in descending order. Table VII shows the questionnaire results, in which the red highlight marks the consensus of the students. For the CN subject, 10 out of 11 students chose 06-NWK as the most related KA. Agreement, however, gradually decreased to five, three, three and two out of 11 for the second, third, fourth and fifth most related KAs, respectively. The pattern was the same for the other two subjects. The majority vote for first place always exceeded half of the students: ten students chose 06-NWK for the CN subject, while all students and seven students chose 03-CAO for the CA and OS subjects, respectively.
CU. We found that the score of the first rank usually far surpasses that of the second rank, so the two are clearly distinguishable. The first-rank score is approximately twice the second-rank score in the CN and CA subjects. Similarly, in the OS subject the first-rank score (0.2863) exceeds the second (0.2087) by about 37%. These results show that our method can precisely identify the first rank and discriminate it from the second rank.

TABLE VIII. RESULTS AND RANKING OF COURSE MATERIALS OF CHULALONGKORN UNIVERSITY

KA      CN Score  CN Rank  CA Score  CA Rank  OS Score  OS Rank
01-CAE  0.0402    7        0.0561    4        0.0430    12
02-CAL  0.0341    10       0.0522    6        0.1211    5
03-CAO  0.0481    5        0.1908    1        0.2087    2
04-DIG  0.0379    9        0.0761    2        0.0458    11
05-ESY  0.0852    2        0.0527    5        0.1569    3
06-NWK  0.1631    1        0.0347    10       0.1171    7
07-PFP  0.0318    11       0.0291    12       0.0329    13
08-SEC  0.0624    3        0.0269    13       0.1222    4
09-SET  0.0195    13       0.0341    11       0.0461    10
10-SGP  0.0237    12       0.0376    9        0.0593    9
11-SPE  0.0407    6        0.0498    7        0.0931    8
12-SRM  0.0608    4        0.0606    3        0.2863    1
13-SWD  0.0392    8        0.0434    8        0.1210    6

number of KAs matching between the student opinions and our method when only the top-i ranks are considered, for i up to n. In our experiment, we considered up to the top-5 ranks. For instance, in the CN subject, M_5 is 4 because four KA representatives from the student opinions are exactly the same as the KAs from our method, namely 06-NWK, 08-SEC, 03-CAO and 12-SRM. The similarity scores are 69.33%, 84.33% and 64.33% in the CN, CA and OS subjects, respectively. Hence, the average similarity score is 72.67%.

\[ \mathrm{SimilarityScore} = \left( \sum_{i=1}^{n} \frac{M_i}{i} \right) \div n \qquad (6) \]

C. A Comparison of Association Scores Among Five Institutions

In this experiment, we computed and compared the association scores of each institution in terms of ranking. First, we calculated association scores using our proposed method. The radar charts representing the scores are shown in Fig. 4. The numbers on the axes, 1 to 13, denote the KAs from 01-CAE to 13-SWD. From the empirical results, we found that the chart shapes of the five institutions are markedly similar within the same subject. This finding indicates that the course material styles of these institutions have a similar focus.
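Both ranking comparisons used in this section, the similarity score in (6) and a rank correlation between two rankings, are short computations. The sketch below is illustrative only. For the similarity score, only M_5 = 4 is stated in the text; the remaining M values are hypothetical and were chosen so that the result agrees with the reported 69.33% for the CN subject. For the rank correlation, the standard tie-free Spearman formula is assumed, and the CN and CA rank columns of CU from Table VIII serve purely as sample input, since the per-institution ranks compared in the paper are not listed.

```python
def similarity_score(matches):
    """Equation (6): average of M_i / i for i = 1..n, with M_i = matches[i - 1]."""
    n = len(matches)
    return sum(m / i for i, m in enumerate(matches, start=1)) / n


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two tie-free rankings of the same items."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# M_1..M_5 for the CN subject: only M_5 = 4 comes from the text,
# the other four values are hypothetical.
cn_matches = [1, 1, 2, 2, 4]
cn_similarity = similarity_score(cn_matches)

# CU rank columns from Table VIII (KA order 01-CAE to 13-SWD), sample input only.
cn_ranks = [7, 10, 5, 9, 2, 1, 11, 3, 13, 12, 6, 4, 8]
ca_ranks = [4, 6, 1, 2, 5, 10, 12, 13, 11, 9, 7, 3, 8]
rho_cn_ca = spearman_rho(cn_ranks, ca_ranks)

print(f"similarity = {cn_similarity:.4f}, rho = {rho_cn_ca:.4f}")
```

Dividing M_i by i rewards agreement near the top of the ranking, so a match in first place contributes more to the score than a match that only appears within the top five.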
coefficients are very high, at 0.97, 0.92, 0.94 and 0.92 when compared with University of Cambridge, Carnegie Mellon University, Massachusetts Institute of Technology and Stanford University, respectively.

Additionally, the highest scores in Tables X to XII belong to the pairs Cambridge-Stanford, CMU-MIT and Cambridge-CU in the CN, CA and OS subjects, respectively (in the CA subject, Cambridge-CU ties with CMU-MIT at 0.90). These scores identify the pairs of institutions whose material styles are the most similar.

TABLE X. RANK RELATION COEFFICIENTS OF COMPUTER NETWORKS SUBJECT AMONG FIVE INSTITUTIONS

           Cambridge  CMU   CU    MIT   Stanford
Cambridge  X          0.77  0.93  0.81  0.98
CMU        0.77       X     0.75  0.73  0.78
CU         0.93       0.75  X     0.75  0.92
MIT        0.81       0.73  0.75  X     0.80
Stanford   0.98       0.78  0.92  0.80  X

TABLE XI. RANK RELATION COEFFICIENTS OF COMPUTER SYSTEM ARCHITECTURE SUBJECT AMONG FIVE INSTITUTIONS

           Cambridge  CMU   CU    MIT   Stanford
Cambridge  X          0.65  0.90  0.84  0.70
CMU        0.65       X     0.72  0.90  0.82
CU         0.90       0.72  X     0.84  0.78
MIT        0.84       0.90  0.84  X     0.87
Stanford   0.70       0.82  0.78  0.87  X

TABLE XII. RANK RELATION COEFFICIENTS OF OPERATING SYSTEM SUBJECT AMONG FIVE INSTITUTIONS

           Cambridge  CMU   CU    MIT   Stanford
Cambridge  X          0.90  0.97  0.96  0.90
CMU        0.90       X     0.92  0.95  0.95
CU         0.97       0.92  X     0.94  0.92
MIT        0.96       0.95  0.94  X     0.93
Stanford   0.90       0.95  0.92  0.93  X

VI. CONCLUSION

In this paper, we proposed an alternative curriculum analysis method for understanding and evaluating curricula. The curricular guideline CE2016 was used as the standard knowledge [1]; moreover, the KUs of the guideline were used as queries to search the Internet for external documents [5]. In our experiment, we used course materials of three subjects provided by five universities. We applied the TF-IDF keyword extraction technique to elicit keywords from the course materials and the external documents. The keywords extracted from the external documents, and their counts, were used to construct the Dict-Matrix. Subsequently, we compared the keywords elicited from the course materials with the keywords in the Dict-Matrix to construct the Comparison-Matrix. Then, association scores were computed to show the course material style compared to CE2016. Furthermore, we found that the shapes of the charts plotted from the association scores can express the similarity of course material style between universities, as shown in Section V. To verify the potential of our method, eleven students were asked to rank the top five KAs related to each subject. Likewise, the KAs were ranked by their corresponding association scores. We compared these two ranking lists using Spearman's rank correlation coefficient. The results show that the ranking produced by our method is closely similar to the ranking given by the students. These preliminary results demonstrate the feasibility of using course materials as a factor to analyze and understand curricula. Although promising, this finding should be validated on a larger amount of data. Additionally, further investigation that integrates other analysis methods, such as human-based and course syllabus-based analysis, is needed to further improve curriculum analysis.

VII. REFERENCES

[1] Sekiya, T., et al. (2014). Mapping analysis of CS2013 by supervised LDA and isomap. Teaching, Assessment and Learning (TALE), 2014 International Conference on, IEEE.
[2] Gluga, R., et al. (2012). PROGOSS: Mastering the curriculum. Proceedings of The Australian Conference on Science and Mathematics Education (formerly UniServe Science Conference).
[3] Impagliazzo, J. and E. Durant (2014). Toward a modern curriculum for computer engineering. Teaching, Assessment and Learning (TALE), 2014 International Conference on, IEEE.
[4] Welch, W. W. and H. J. Walberg (1972). "A national experiment in curriculum evaluation." American Educational Research Journal: 373-383.
[5] Dai, Y., et al. "Course Content Analysis: An Initiative Step toward Learning Object Recommendation Systems for MOOC Learners."
[6] Marshall, L. (2012). A comparison of the core aspects of the ACM/IEEE Computer Science Curriculum 2013 Strawman report with the specified core of CC2001 and CS2008 Review. Proceedings of Second Computer Science Education Research Conference, ACM.
[7] Ota, S. and H. Mima (2011). "Machine learning-based syllabus classification toward automatic organization of issue-oriented interdisciplinary curricula." Procedia-Social and Behavioral Sciences 27: 241-247.
[8] Sekiya, T., et al. (2015). Curriculum analysis of CS departments based on CS2013 by simplified, supervised LDA. Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, ACM.
[9] Durant, E., et al. (2015). CE2016: Updated computer engineering curriculum guidelines. Proceedings of the 2015 IEEE Frontiers in Education Conference (FIE), IEEE Computer Society: 1-2.
[10] Lott, B. (2012). "Survey of Keyword Extraction Techniques." UNM Education.
[11] Hasan, K. S. and V. Ng (2014). Automatic Keyphrase Extraction: A Survey of the State of the Art. ACL (1).
[12] Salton, G. and C. Buckley (1988). "Term-weighting approaches in automatic text retrieval." Information processing & management 24(5): 513-523.
[13] "QS Top Universities", Topuniversities.com, 2016. [Online]. Available: http://www.topuniversities.com/. [Accessed: 18-Aug-2016].
[14] Han, J., et al. (2011). Data mining: concepts and techniques, Elsevier.
[15] "Natural Language Toolkit - NLTK 3.0 documentation", Nltk.org, 2016. [Online]. Available: http://www.nltk.org. [Accessed: 18-Aug-2016].
[16] "API:Main page - MediaWiki", Mediawiki.org, 2016. [Online]. Available: https://www.mediawiki.org/wiki/API:Main_page. [Accessed: 18-Aug-2016].
[17] "Custom Search | Google Developers", Google Developers, 2016. [Online]. Available: https://developers.google.com/custom-search/. [Accessed: 18-Aug-2016].
[18] Pirie, W. (1988). "Spearman rank correlation coefficient." Encyclopedia of statistical sciences.