Ant Colony Algorithm For Text Classification in Multicore-Multithread Environment
January 2017
I certify that a Panel of Examiners has met on 10th August 2016 to conduct the final
examination of Ahmad Nazmi Bin Fadzal on his Master of Computer Science thesis
entitled "Ant Colony Algorithm on Text Classification in Multicore-Multithread
Environment" in accordance with Universiti Teknologi MARA Act 1976 (Akta 173).
The Panel of Examiners recommends that the student be awarded the relevant degree.
The Panel of Examiners was as follows:
DR MOHAMMAD NAWAWI DATO' HAJI SEROJI, PhD
Dean
Institute of Graduate Studies
Universiti Teknologi MARA
Date: 19th January 2017
I declare that the work in this thesis was carried out in accordance with the regulations
of Universiti Teknologi MARA. It is original and is the result of my own work,
unless otherwise indicated or acknowledged as referenced work. This thesis has not
been submitted to any other academic or non-academic institution for any
degree or qualification.
I hereby acknowledge that I have been supplied with the Academic Rules and
Regulations for Post Graduate, Universiti Teknologi MARA, regulating the conduct of
my study and research.
In the age of widespread digital usage, text classification is one of the most significant
capabilities required to automatically organize emails, articles and other textual
data in an organization. Unclassified data can lead to slower data retrieval; thus, a
reliable method is required to retrieve data efficiently and systematically.
Ant Colony Optimization (ACO) is a bio-inspired technique that was introduced to
solve, in a probabilistic way, NP-hard problems such as the Traveling Salesman
Problem (TSP), to which the high dimensionality of text data is similar. The pheromone
concept is the main criterion that distinguishes ACO from other algorithms. Based on this
concept, pheromone saturation is used to combine the stackable solution patterns that
the ants discover while moving between term nodes to build a path. ACO classification
accuracy is compared with that of a Genetic Algorithm (GA) classifier, which is also a
wrapper method. To integrate the technique, ACO is proposed to work in a multicore-multithread
environment to gain an additional execution time advantage. In the multicore-multithread
environment, the adjustment aims to let artificial ants communicate across the
physical cores of the processor. In return for the investment in more computing power,
execution time is expected to decrease without compromising the original
classification accuracy. The unthreaded and multicore-multithreaded versions of ACO
were tested and compared in terms of accuracy and execution time. The results show a
positive improvement.
Firstly, I wish to thank God for giving me the opportunity to embark on my master's
programme and for completing this long and challenging journey successfully. My
gratitude and thanks go to my supervisor, Assoc. Prof. Dr. Mazidah Puteh, and my
co-supervisors, Assoc. Prof. Dr. Adnan Ahmad and Dr. Norlela Samsudin. Thank you for
the support, patience and ideas in assisting me with this project.
Finally, this thesis is dedicated to the loving memory of my very dear mother and late
father for the vision and determination to educate me. This piece of victory is
dedicated to both of you. Alhamdulillah.
COPYRIGHT UNIVERSITI TEKNOLOGI MARA
TABLE OF CONTENTS
REFERENCES
AUTHOR'S PROFILE
INTRODUCTION
In the digital age, text classification is one of the important features required to
organize the articles, emails and other text data of an organization effectively. Larger
companies usually have larger databases to which different data are added constantly. These
companies require a systematic method that is not only efficient but also fast in
analysing and compiling their data by category. However,
unclassified data can lead to slower data retrieval, and text data can accumulate and
become unmanageable over the years. One way to overcome this problem is to
categorize the data manually using human labour. However, this is not a good solution
because it can be costly, slow and susceptible to human error. Therefore, fully
automatic classification is required, and numerous
classification techniques have been developed to handle massive unstructured data.
Research in this area ranges from statistical approaches to bio-inspired
classification algorithms.
Ant Colony Optimization (ACO) (Dorigo, 1991) has been used in text classification.
In contrast to statistical classification methods, the whole ACO algorithm is
based on the behaviour of real-life ants, which live as an organized community. The
algorithm imitates their communication method, which relies on substances known as
pheromones rather than on direct communication such as speech. Based on this model, a
large database is scanned continuously by several artificial ants searching for text
classification criteria. The search covers both specific words and words that appear
repeatedly in a text, and compares each article with other forms of text data.
Repetition and the incremental build-up of solutions are features that differentiate
the ACO algorithm from traditional approaches. The technique is chosen as the text
classification approach to further explore the significance of natural algorithms and
biologically inspired mechanisms (Bonasio et al., 2010; Dressler & Akan, 2010; Emily, 2014).
This research develops the use of ACO in a multicore-multithread environment
to achieve text classification. Its implementation can be used in modern search
engines, online libraries and data centres. The approach is anticipated to improve
In a short period, articles and other forms of text data such as emails can
accumulate massively. This phenomenon creates various problems in text
classification as data grow larger over time, especially when supervised and
semi-supervised classification algorithms are used. Larger databases
can have complex text patterns and classification criteria.
The following are the problems found in text classification:
1) The main disadvantage of wrapper methods is their computational cost
(Saeys, Inza, & Larranaga, 2007), which results from invoking the induction
algorithm on every candidate feature set (Langley, 1994). In the
ACO algorithm, using multiple artificial ants could require an equal
number of pheromone copies, which consumes a large amount of
memory at one time. Similarly, the heuristic calculation and the
cross-validation process also have a high computational cost.
One modern technology that is suitable for handling large-scale data
processing is multicore-multithread technology. In particular, an
effective design that addresses the computational cost, and that
overcomes data inconsistency issues, is needed for an ACO-based
text classification system in a multicore-multithread environment.
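To make the memory argument in problem 1 concrete, the following Java sketch compares one pheromone table shared by every ant with one private copy per ant. All sizes are hypothetical and chosen only for illustration; the sketch assumes the table stores one `double` per (term, category) pair and ignores JVM object overhead.

```java
// Hypothetical sizing sketch: one pheromone table of terms x categories doubles,
// either shared by all ants or copied once per ant. JVM array headers are ignored.
class PheromoneMemory {

    // Bytes for one table: terms * categories * 8 bytes per double.
    static long tableBytes(long terms, long categories) {
        return terms * categories * Double.BYTES;
    }

    // Bytes when every ant holds its own private copy of the table.
    static long perAntCopiesBytes(long ants, long terms, long categories) {
        return ants * tableBytes(terms, categories);
    }

    public static void main(String[] args) {
        long terms = 50_000, categories = 20, ants = 64; // illustrative sizes, not thesis data
        System.out.println("shared table:   " + tableBytes(terms, categories) + " bytes");
        System.out.println("per-ant copies: " + perAntCopiesBytes(ants, terms, categories) + " bytes");
    }
}
```

With these illustrative numbers the shared table needs 8,000,000 bytes while 64 private copies need 512,000,000 bytes, which is the kind of growth that motivates sharing a single pheromone array.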
technologies. The lack of an integration module is addressed by simulating multiple
artificial ant instances and observing the resulting reduction in execution time.
The following research questions are addressed in this research:
1) Could an algorithm shorten the execution time by eliminating the use of shared
variables?
2) Is the greatest time reduction obtained with a large number of threads?
If not, what is the optimal number of threads to specify in the
parameter settings?
3) Is pheromone sharing suitable for reducing the cost of maintaining multiple
copies of the pheromone array?
1.3 Objectives
1.4 Scope
The scope of this research, which reflects its objectives, is as follows:
i. Non-text tokens such as numbers and symbols are omitted.
ii. The algorithm is applied to a single colony of artificial ants.
iii. Only English text files are covered.
iv. A multicore-multithreading environment is used for concurrent processing.
This section explains and summarizes the processes involved in solving the
research problem and fulfilling the research objectives. They are divided into seven
phases, as follows:
Data Pre-processing: explains how data are filtered
1.6 Significance
One of the benefits from the suggested improvement in this research would be
the increments of running efficiency. This impact is less noticeable at the beginning
but in a long run or for big data problem, it will add up and become significantly
useful. Such improvement could give better response time in searching database
query. This can be achieved when multiple cores are working together where low
overhead is needed to distribute the workload. The nature of pheromone presented
also allows it to be updated at any time when other computer demands it. The benefits
could affect various mediums such as blogs, searching engines, newspaper reviews
and bibliography categorization.
Another efficiency gain that could result from this research is a reduction in
execution time. For an evolutionary algorithm, processing time is an
important factor in its usability. The time reduction becomes noticeable when dealing
with larger data sets, as data dimensionality also increases proportionally. Depending
on the method chosen and the amount of data, the classification process could take
from a few minutes to a whole year. Some techniques even suffer from a data-crunching
problem in which the algorithm has no lead towards a better result. In addition,
the looping condition could continue indefinitely when this problem occurs.
Understanding the meaning of each main term helps towards a
better understanding of this thesis. This section explains the definitions of the
frequently used terms that are critical to the main concept of this research, such as
pheromone, concurrency and parallelism.
LITERATURE REVIEW
2.1 Introduction
This chapter explains previous research that is related and relevant to
ACO text classification. The discussion starts by describing text classification in
general from past research. It continues with GA, an algorithm that is
comparable to ACO, and briefly explains the GA implementations reported in
previous research. Then, the developed method of ACO is discussed in a subsequent
section. The research suggests a repetitive or cyclic algorithm that builds up its
solution quality over increasing iterations. Therefore, the current ACO version that is
widely used for text classification is explained to give an overview of its main
process when applied to text classification. Finally, an overview of the
multicore-multithread environment is given. The multicore-multithread environment is
intended to run together with the ACO implementation in this research. For that
purpose, a discussion of previous research on multicore-multithread computing is included.
[Figure labels: Documents, Preprocess Document, Information Extraction, Clustering, Summarization]
Figure 2.1: Flowchart for Text Mining Architecture (Nalini & Sheela, 2014)
In the text classification problem, GA (Wang & Cao, 2002) can search for the optimal
feature variables in a given set of documents. Candidates are selected
naturally through survival of the fittest and genetic evolution mechanisms, which make it
possible to isolate inaccurate discrimination in the algorithm and produce better text
classification results (Baharudin, Lee, & Khan, 2010). GA is a type of evolutionary
algorithm that has been widely used for its performance in classifying text
documents. In 2008, its use was extended to the induction of classification
rules (Pietramala, Policicchio, Rullo, & Sidhu, 2008). In the Olex-GA implementation,
the method relies on an efficient several-rules-per-individual binary
representation. To evaluate solutions, the fitness function uses the F-measure to
keep the algorithm accurate. In this research, ACO, which makes selections
randomly with minimal heuristics, is compared with Olex-GA in terms of
classification accuracy. Figure 2.2 shows the flowchart for a basic GA:
14
Generate initial
population
*
Evaluate fitness function
Rank individual fitness
15
16
17
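The F-measure used as the Olex-GA fitness can be written out explicitly. The Java sketch below computes precision, recall and their harmonic mean (F1) from true-positive, false-positive and false-negative counts. It is the standard formula, not code taken from Olex-GA, and the counts in `main` are made up.

```java
// Standard F-measure (F1) computed from binary classification counts.
class FMeasure {

    // Fraction of predicted positives that are correct; 0 when nothing was predicted positive.
    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Fraction of actual positives that were found; 0 when there are no actual positives.
    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // Harmonic mean of precision and recall; 0 when both are 0.
    static double f1(int tp, int fp, int fn) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        System.out.println("F1 = " + f1(40, 10, 20)); // made-up counts
    }
}
```

For the sample counts, precision is 0.8, recall is about 0.667, and F1 equals 2TP/(2TP + FP + FN) = 80/110, roughly 0.727.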
[Figure labels: GA optimizes the weights of the 1st category ... GA optimizes the weights of the nth category; 5 generations; 15 times]
Figure 2.4: Coevolutionary Genetic Algorithm for Cluster (Gasanova et al., 2014)
Micro-averaged PRavg results obtained by 5-fold cross-validation on Oh0, Oh5, Oh10, Oh15, BlogsGender, Ohscale and R10 (80/20 split) and by holdout on the remaining data sets (70/30 split).
Figure 2.5: Sample Result of 5-Fold Between GAMoN and other Techniques (Policicchio et al., 2012)
One of the factors that makes GA work better in GAMoN induction is that the
M-of-N relationship creates a better representation for GA. This representation can
accommodate complex relationships in the DNA (chromosome) map. The M-of-N
allocation can structure the classical concept by including negation and disjunction in
term-to-term relationships. The relationship is designed as a binary categorization
Since its introduction, several enhancements have been made to the algorithm. One of
them integrates Euclidean distance (Euclidean-SVM) to classify text
(Lee, Wan, Rajkumar, & Isa, 2012). Also known as the Euclidean
metric, Euclidean distance is the ordinary straight-line distance between two points in
Euclidean space. The SVM approach is applied in the training phase and implements a
classifier by drawing a decision surface, which is the optimal separating hyper-plane
shown in Figure 2.6, to separate the distinct classes of data points in the resulting
vector space.
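For reference, the Euclidean distance mentioned above can be computed as follows. This is a minimal Java sketch over dense vectors (for example term-frequency vectors); how Euclidean-SVM combines this distance with the trained SVM is not reproduced here.

```java
// Euclidean (straight-line) distance between two points of equal dimension.
class Euclidean {

    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d; // squared difference per dimension
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        System.out.println(distance(new double[]{0, 0}, new double[]{3, 4})); // 5.0
    }
}
```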
The idea of the optimal separating hyper-plane can be generalized to the non-linearly
separable case by using a kernel function to map the data points from the input
space (Figure 2.7) into a high-dimensional feature space, where they can be separated
by a linear hyper-plane. As a result, the choice of kernel function has a strong
influence on the classification accuracy of the SVM.
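To illustrate what a kernel function is, the Java sketch below shows two common choices: the linear kernel (a plain dot product) and the Gaussian radial basis function (RBF) kernel. The source does not state which kernels Lee et al. (2012) compared, so these are examples only; `gamma` is the usual RBF width parameter.

```java
// Two common SVM kernels, shown for illustration.
class Kernels {

    // Linear kernel: dot product in the original input space (no mapping).
    static double linear(double[] x, double[] y) {
        double dot = 0.0;
        for (int i = 0; i < x.length; i++) dot += x[i] * y[i];
        return dot;
    }

    // Gaussian RBF kernel: exp(-gamma * ||x - y||^2), with gamma > 0.
    static double rbf(double[] x, double[] y, double gamma) {
        double sq = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sq += d * d;
        }
        return Math.exp(-gamma * sq);
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0}, b = {0.0, 1.0};
        System.out.println("linear = " + linear(a, b));
        for (double gamma : new double[]{0.1, 1.0, 10.0}) {
            System.out.println("rbf(gamma=" + gamma + ") = " + rbf(a, b, gamma));
        }
    }
}
```

Because the RBF value depends only on the distance between the two vectors, identical vectors always give 1.0 and distant vectors approach 0; `gamma` controls how quickly that happens, which is one reason the kernel choice affects accuracy.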
Besides the kernel function, the soft-margin parameter C is another
important element in determining the performance of the SVM classifier. Therefore, one of the main
Figure 2.8: Vector Space of the Conventional SVM Classifier with Optimal
Separating Hyper-Plane (Lee et al., 2012)
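The role of C can be seen in the standard soft-margin SVM formulation below (the textbook form, not an equation reproduced from Lee et al., 2012). Here $\mathbf{w}$ and $b$ define the hyper-plane, $\xi_i$ are slack variables for training points that violate the margin, $y_i \in \{-1, +1\}$ are the class labels and $\phi$ is the feature map implied by the kernel:

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\qquad \text{subject to} \qquad
y_i \left( \mathbf{w}^{\top} \phi(\mathbf{x}_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N
```

A large C penalizes margin violations heavily and tends toward a narrower margin that fits the training data closely, while a small C tolerates more violations in exchange for a wider margin.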
Extensive research has been done to understand how a small animal with
limited vision is able to find food far from its nest. The route used by the ants becomes
optimal as time goes by. This happens because a real ant leaves pheromone on
its trail and adjusts the strength of its scent as a means of effective communication
(Rekaya et al., 2013). This is one of the abilities gifted by God to His creation, and it
is a miracle that each ant is able to learn such behaviour in order to adapt
to its challenging surroundings. The mechanism by which ants find the
shortest route to food has been studied, and the Ant Colony Optimization algorithm
was formed as a result. Based on this analogy, the algorithm is suitable to be applied
in text classification.
ACO algorithms were inspired by an experiment carried out by
Goss et al. in 1989 using a real colony of Argentine ants (Iridomyrmex humilis)
(Dorigo & Thomas, 2004). When a food source was placed in an area separated from the
nest by a bridge with two branches of different lengths, most of the ants used the shorter
branch after a few minutes. The ants used pheromone as an indirect form of
communication known as stigmergy (Grasse, 1959).
Figure 2.10: Two Paths with Different Length between Nest and Food (Wu et al., 2008)
Figure 2.10 illustrates two paths of different lengths, with the ants divided
into two equal groups. The ants that follow the shorter path
make more trips to the destination, which leads to a higher pheromone intensity.
The other group eventually changes its route to the shorter path after the pheromone
intensity is evaluated at each branch node.
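The positive feedback described above can be sketched with a small deterministic model. The Java code below is a mean-field approximation (expected values rather than individually simulated ants), and the branch lengths, number of ants and evaporation rate are hypothetical. In each iteration the ants split between the branches in proportion to pheromone, both branches evaporate, and each branch receives a deposit equal to the ants on it divided by its length, which stands in for the shorter branch completing more trips per unit time.

```java
// Mean-field sketch of the two-branch bridge experiment (expected values, no randomness).
class DoubleBridge {

    // Returns the fraction of ants expected to take the short branch after the given iterations.
    static double shortBranchShare(double shortLen, double longLen, int ants, double rho, int iterations) {
        double tauShort = 1.0, tauLong = 1.0; // equal initial pheromone on both branches
        for (int t = 0; t < iterations; t++) {
            double pShort = tauShort / (tauShort + tauLong);           // choice proportional to pheromone
            tauShort = (1 - rho) * tauShort + ants * pShort / shortLen; // evaporation + deposit
            tauLong = (1 - rho) * tauLong + ants * (1 - pShort) / longLen;
        }
        return tauShort / (tauShort + tauLong);
    }

    public static void main(String[] args) {
        for (int it : new int[]{0, 5, 20, 100}) {
            System.out.printf("after %3d iterations: %.3f of ants take the short branch%n",
                    it, shortBranchShare(1.0, 2.0, 10, 0.1, it));
        }
    }
}
```

Starting from equal pheromone, the share of ants on the short branch rises from 0.5 toward 1.0, while two branches of equal length stay at exactly 0.5.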
According to Kugu and Sahingoz (2013), ACO imitates the population behaviour of
actual ant colonies in a computer simulation. It is
used to deal with complex optimization problems by manipulating artificial
ants. It has been used over the decades in application domains that address
real-world challenges. However, there are subtle distinctions between artificial ants and
actual ants. An artificial ant:
• Explores a discrete, computer-simulated environment.
• Has its own memory and possesses a knowledge base.
• Can sense the surrounding environment in addition to depositing
pheromones.
• Is able to lay pheromone not only while moving but also after
completing a tour.
• Is able to synchronize its movements with
other ants.
• Can be given additional abilities.
Research on the ACO algorithm has resulted in many extensions. The five main ACO
algorithms are Ant System, Elitist Ant System, Rank-Based
Ant System, Max-Min Ant System and Ant Colony System. The algorithms differ
mainly in their pheromone mechanisms, since pheromone is the main part of the
ACO algorithm.
Figure 2.11: MMAS Pseudo-code with Local Search (DeleVacq, Delisle, Gravel, &
Krajecki, 2013)
The ACO algorithm has been used to tackle complex computational
problems such as the TSP and text classification. The Max-Min Ant System (MMAS) (Stutzle
& Hoos, 2000) is one of the most widely used versions of the algorithm, as it
is one of the most effective ACO algorithms at present. Its metaheuristic uses a
general memory structure and main processes shared by many ACO
algorithms. Figure 2.11 shows a short pseudo-code of MMAS. In the
algorithm design, the number of ants m is set equal to the number of cities n.
Tour construction is performed in n consecutive steps. To do so, each ant ant_k is
initially placed on a randomly chosen city. Then, at every construction step,
ant_k extends its tour T_k by applying a state transition rule to select the next city
from its list of unvisited cities. When all artificial ants have
finished building their tours, the pheromone is updated according to a rule that
serves two objectives. The first is to increase the influence of the arcs belonging to
the best solution found so far, T_gb, or to the best solution of the
current iteration, T_ib. Also, the influence value is used to minimize the number of the
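The MMAS pheromone update described above can be sketched in Java as follows. This is a simplified illustration, assuming a symmetric TSP, a deposit of 1/L on each edge of the chosen best tour (either T_gb or T_ib), and explicit bounds `tauMin` and `tauMax` passed in as parameters; in the formulation of Stutzle and Hoos (2000) the bounds are derived from the best tour length rather than fixed by hand.

```java
import java.util.Arrays;

// Simplified MAX-MIN Ant System style update on a symmetric TSP pheromone matrix.
class MmasUpdate {

    // Evaporates every edge, lets only the best tour deposit, then clamps to [tauMin, tauMax].
    static void update(double[][] tau, int[] bestTour, double bestLength,
                       double rho, double tauMin, double tauMax) {
        int n = tau.length;
        // Evaporation applies to all edges, visited or not.
        for (double[] row : tau) {
            for (int j = 0; j < n; j++) row[j] *= (1 - rho);
        }
        // Deposit 1/L on each edge of the closed best tour, in both directions.
        double deposit = 1.0 / bestLength;
        for (int k = 0; k < bestTour.length; k++) {
            int a = bestTour[k], b = bestTour[(k + 1) % bestTour.length];
            tau[a][b] += deposit;
            tau[b][a] += deposit;
        }
        // The lower bound keeps every edge selectable; the upper bound limits saturation.
        for (double[] row : tau) {
            for (int j = 0; j < n; j++) row[j] = Math.min(tauMax, Math.max(tauMin, row[j]));
        }
    }

    public static void main(String[] args) {
        double[][] tau = new double[4][4];
        for (double[] row : tau) Arrays.fill(row, 1.0);
        update(tau, new int[]{0, 1, 2, 3}, 10.0, 0.5, 0.2, 0.6);
        System.out.println(Arrays.deepToString(tau));
    }
}
```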
2.5.2 Evaporation
Pheromone used by a real foraging ant evaporates in open air. Besides
preventing the path from becoming sticky and wet, pheromone evaporation is a
strategic means of informing other ants that find the laid pheromone. Given that the
pheromone is laid in the same area and at the same time, it evaporates at a
constant rate (Ivan et al., 2009; Mavrovouniotis & Yang, 2013). The absolute
evaporation rate is not important; it is the relative pheromone levels that represent
meaningful information. Since pheromone evaporates continuously, a longer path
is refreshed less frequently, giving other ants the impression that the path is less
efficient to use. When ants arrive at a divergence point on a path, they compare the
scent concentrations to evaluate which path leads to the food source over a shorter distance.
Evaporation also plays an important role in resetting less optimal solutions, so that the
pheromone on paths visited in previous cycles gradually decreases. Note that,
besides the term nodes that the artificial ants visit, all other nodes are also affected by
the evaporation calculation. Pheromone evaporation can continue until the pheromone
depletes to a preset minimum cap that is set manually by the user (Ivan et al.,
2009; Mavrovouniotis & Yang, 2013). The same situation can also be observed with
real pheromone. In the searching algorithm, all nodes that reach the minimum cap will
2.5.3 Stagnation
Compared with GA, the ACO algorithm, which includes a random-based heuristic and
pheromone concentration, has a chance to overcome the problem of two peaks of
different heights in the solution space. The two algorithms can produce different
results in this situation. In the case of ACO, the algorithm has a chance to move from the
lower peak to the higher peak while the stopping criteria have not yet been met.
In GA, however, when the lower peak is found by the majority of individuals, they stay
at that point because the mutation mechanism cannot lead the majority to a different
exploration. More details are explained below.
Both algorithms are able to choose the better solution if more than
one solution exists in the search space. The ACO algorithm has a random
chance to search along a different path. Its heuristic basis allows the ants
to choose a path that diverges from the highest pheromone concentration. This
happens when the pheromone concentrations on different paths are indistinguishable.
Such divergence usually happens at the early stage of searching, when
all pheromones have the same initial value. The other situation arises in the
stagnation state, where pheromone values remain at their maximum cap over
consecutive iterations. In the stagnation state, solution quality hardly increases, and
the artificial ants look for other possible paths that differ from the current
path. Stagnation can happen at a peak because the pheromone concentration has
bounced back to the maximum value: artificial ants that continuously lay more
pheromone lead to saturation, which stalls the pheromone concentration. Because
pheromone evaporation happens at all times, the artificial ants also lay pheromone on
different paths (Ivan et al., 2009; Mavrovouniotis & Yang, 2013). If they find a path
shorter than the current optimal one, the new path will get a stronger pheromone
concentration than the previous one over time. Therefore, a new peak could be
discovered at the cost of extra cycles after reaching stagnation.
[Figure: elapsed time versus processing time]
[Figure: tasks mapped to sub-tasks (e.g. Task 3 → Sub-Task 3)]
[Figure labels: City i, City n; Thread 0, Thread i, Thread m; Tour]
Figure 2.15: Pheromone Deposit with Atomic Instructions (Cecilia et al., 2013)
Figure 2.16: Scatter to Gather Transformation for Pheromone Deposit (Cecilia et al., 2013)
The length of an artificial ant's travel path can be greater than the maximum
number of threads that a thread block can support (NVIDIA, 2010).
The algorithm proposed by Cecilia et al. (2013) avoids this problem by
using an empirically determined optimal thread block size and dividing the
travel path into tiles. A problem arises when the length of the travel path is not
evenly divisible by the tile size. This issue was solved by padding the ant travel
array so that warp divergence is avoided (see Figure 2.16).
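The padding arithmetic can be illustrated on the CPU side. The Java sketch below rounds a tour length up to the next multiple of the tile size and fills the extra slots with a sentinel value; the tile size of 128 and the sentinel -1 are assumptions for illustration, not values taken from Cecilia et al. (2013).

```java
import java.util.Arrays;

// Padding a tour array so that its length divides evenly into tiles.
class TilePadding {
    static final int PAD = -1; // hypothetical sentinel marking padded slots

    // Smallest multiple of tileSize that is >= length (integer ceiling division).
    static int paddedLength(int length, int tileSize) {
        return ((length + tileSize - 1) / tileSize) * tileSize;
    }

    // Copies the tour into a longer array; the extra slots hold PAD.
    static int[] pad(int[] tour, int tileSize) {
        int[] out = Arrays.copyOf(tour, paddedLength(tour.length, tileSize));
        Arrays.fill(out, tour.length, out.length, PAD);
        return out;
    }

    public static void main(String[] args) {
        int tile = 128, len = 1002;
        int padded = paddedLength(len, tile);
        System.out.println(len + " cities -> " + padded + " slots in " + padded / tile + " tiles");
    }
}
```

A 1002-city tour with 128-element tiles is padded to 1024 entries (eight full tiles), so every thread in a tile follows the same steps and threads holding the sentinel can simply skip their work.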
According to Cecilia et al. (2013), redundant device memory operations can
be avoided by exploiting the symmetric structure of the TSP. Therefore,
the thread instances can be divided into two parts, which effectively halves
the number of device memory operations. This reduction version of the
algorithm minimizes the total number of access operations to both shared and
device memory and also implements tiling. The number of access operations per
thread stays the same.
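The difference between the atomic (scatter) deposit and the gather deposit, and the saving from TSP symmetry, can be sketched with Java threads. This is a CPU analogue, not the CUDA code of Cecilia et al. (2013): `DoubleAdder` stands in for a GPU atomic add, parallel streams stand in for thread blocks, each tour is assumed to be a permutation of 0..n-1 with n ≥ 3, and each ant deposits 1/(tour length) on its edges.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.stream.IntStream;

// CPU-thread sketch of scatter (atomic-style) versus gather pheromone deposit on a symmetric TSP.
class PheromoneDeposit {

    // Scatter: one task per ant writes to every edge of its tour. Several ants may hit
    // the same cell concurrently, so each cell is a DoubleAdder (an atomic-style add).
    static double[][] scatter(int n, int[][] tours, double[] lengths) {
        DoubleAdder[][] acc = new DoubleAdder[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) acc[i][j] = new DoubleAdder();
        IntStream.range(0, tours.length).parallel().forEach(k -> {
            int[] t = tours[k];
            double d = 1.0 / lengths[k];
            for (int s = 0; s < t.length; s++) {
                int a = t[s], b = t[(s + 1) % t.length];
                acc[a][b].add(d);
                acc[b][a].add(d); // symmetric edge
            }
        });
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) out[i][j] = acc[i][j].sum();
        return out;
    }

    // Gather: one task per row i computes cells (i, j) with j > i by reading every tour,
    // then mirrors the value. No two tasks write the same cell, so no atomics are needed,
    // and symmetry halves the number of cells actually computed.
    static double[][] gather(int n, int[][] tours, double[] lengths) {
        int[][] pos = new int[tours.length][n]; // pos[k][city] = position of city in tour k
        for (int k = 0; k < tours.length; k++)
            for (int s = 0; s < n; s++) pos[k][tours[k][s]] = s;
        double[][] out = new double[n][n];
        IntStream.range(0, n).parallel().forEach(i -> {
            for (int j = i + 1; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < tours.length; k++) {
                    int gap = Math.abs(pos[k][i] - pos[k][j]);
                    if (gap == 1 || gap == n - 1) sum += 1.0 / lengths[k]; // adjacent in closed tour
                }
                out[i][j] = sum;
                out[j][i] = sum;
            }
        });
        return out;
    }

    public static void main(String[] args) {
        int[][] tours = {{0, 1, 2, 3, 4}, {0, 2, 4, 1, 3}};
        double[] lengths = {10.0, 20.0};
        System.out.println(Arrays.deepToString(gather(5, tours, lengths)));
    }
}
```

Both methods give the same matrix. Scatter needs synchronized adds because several ants can update the same cell, while gather reads every tour for every cell but never writes a cell from two threads, and only the upper triangle is computed.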
RESEARCH METHODOLOGY
3.1 Introduction
[Figure: Knowledge acquisition → Text data collection and identification → Data pre-processing → Development of Ant Colony Algorithm → ACO text classification → Set up multicore-multithread environment → Multicore-multithread ACO model → END]
This phase involved reviewing the literature and acquiring information
related to text classification, GA, ACO and multicore-multithread environment
issues. The information is important to make sure that this research is up to date and
is heading in a valid direction. Table 3.1 shows the summary of the collected
information:
From the table, GA is a valuable method for comparison with ACO. Its
classification accuracy provides a useful reference point for assessing the
advantages of the chosen algorithm. GA has an organized and clear approach based
on population generation and tournament selection. ACO's primary attribute is its
use of pheromone as stigmergy, which helps the artificial agents operate in a
simple manner. The literature review also shows that concurrency has potential
to further speed up execution. Parallelism has been explored in much
previous research, and the use of different platforms and hardware makes each
study unique.
After acquiring knowledge of the whole process, the next step is to
collect and identify data for text classification. The criteria for the data set
documents are that they are publicly available with easy online access and that the
categories are general and simple to understand. The 20 Newsgroups data set (Jrennie,
There are two main phases in text classification. The first is the pre-processing
phase, which produces the working input for the next phase. The ACO
will use the training set documents to analyse text classification rules and produce ACO
[Figure panels: (a) Training: input text → feature selection → terms → ACO algorithm, with category labels; (b) Testing: input text → feature selection → terms → ACO model → output category]
Figure 3.2: Text Classification with Classifier (ACO) Framework (adapted from Bird, Klein and Loper (2009))
Figure 3.3 illustrates the main tasks in the pre-processing phase. The text files of
the 20 Newsgroups documents are accessed one by one during the pre-processing
phase. Based on the fold iteration number of the current execution, the content of the
text files is split into training and testing parts. The training set covers 90% of the
content and the testing set covers 10%, which means that the 10 folds together give a
complete rotation of the testing set. Tokenization is performed when converting text into
the input of the frequency table. It is the process of dividing a stream of text into a list of
symbols, words or other meaningful units known as tokens. In this research, the
tokens are called terms; the artificial ants visit them and decide their affinity
towards a category. Each token also acts as a key in the frequency table, so each
entry is unique. In the text documents, tokens are originally separated by white space.
Using Java's native split method, the white space is removed automatically and
the method returns an array of type String. Uniqueness is checked using Java's
built-in comparison methods. During splitting, tokens consisting only of numbers and
words combined with numbers are also dropped, as this research excludes numeric
tokens. At the end of the phase, the frequency table is created.
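The pre-processing steps above can be sketched in Java. The code is a hypothetical illustration: it splits on white space with `String.split`, drops empty strings and every token that contains a digit, and assigns documents to the test part of fold k when their index modulo 10 equals k. The thesis does not state exactly how documents are assigned to folds, and punctuation handling is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-processing sketch: white-space tokenization and a 10-fold train/test rotation.
class Preprocess {

    // Splits on runs of white space, then drops empty strings and any token containing
    // a digit (numbers only, and words combined with numbers).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty() && !t.matches(".*\\d.*")) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns [training, testing] for one fold: item i is tested when i % folds == fold.
    static <T> List<List<T>> split(List<T> items, int fold, int folds) {
        List<T> train = new ArrayList<>();
        List<T> test = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            (i % folds == fold ? test : train).add(items.get(i));
        }
        return List.of(train, test);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("SPONSOR: NESS (Navy Engineering Software System) 1993 win95"));
        List<Integer> docs = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
        System.out.println(split(docs, 1, 10)); // [[0, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 11]]
    }
}
```

Across folds 0 to 9 every document lands in the test part exactly once. Punctuation such as the colon in "SPONSOR:" stays attached in this sketch, so an extra cleaning step would be needed to match the terms listed in Table 3.2.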
The file content can contain a number of irrelevant terms that do not give
meaningful information about the intended category or influence the searched
criterion. For example, Figure 3.4 is one of the text files from the 20 Newsgroups
email collection, in the graphics category:
SPONSOR: NESS (Navy Engineering Software System) is sponsoring a
one-day Navy Scientific Visualization and Virtual Reality Seminar.
The purpose of the seminar is to present and exchange information for
Navy-related scientific visualization and virtual reality programs,
research, developments, and applications.
Similarly, Figure 3.5 shows one of the text files from the Microsoft
miscellaneous category:
If you create a permanent swapfile larger than the recommended size, you will receive a message
telling you that Windows will not use anything larger than the recommended size. THIS ERROR
MESSAGE IS INCORRECT, we will allow the use of the larger swapfile, up to four times the
amount of RAM on your machine. - (file 9482)
Figure 3.5: Sample Content of Microsoft Miscellaneous Category Document (Jrennie, 2008)
During scanning, the words that appear more than once are counted
and indexed in a frequency array. For simplicity, any word of three characters or
fewer is removed, such as 'is', 'the', 'or' and 'to'. An example of the
frequency table is shown in Table 3.2:
Table 3.2: Example of Term-Frequency Table
Terms Frequency
SPONSOR
NESS
Navy
Engineering
Sponsoring
The frequency counting process is repeated for each file in the categories to
obtain the overall frequencies. Because tokens are used as they are, capitalization
is left unchanged. Tokens are compared as whole units, and the number of
occurrences is recorded as the frequency.
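The counting rules described above (skip words of three characters or fewer, keep capitalization, compare whole tokens) can be sketched with `Map.merge`. The class name and the choice of a `TreeMap` (for a deterministic printed order) are assumptions, and the input is taken to be already tokenized.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the term-frequency table built during pre-processing.
class TermFrequency {

    // Adds one file's tokens to the running table shared by all files.
    // Tokens of three characters or fewer are skipped; case is preserved,
    // so "Navy" and "navy" are counted as different terms.
    static void count(List<String> tokens, Map<String, Integer> table) {
        for (String token : tokens) {
            if (token.length() > 3) {
                table.merge(token, 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> table = new TreeMap<>();
        count(List.of("Navy", "navy", "Navy", "the", "Seminar"), table);
        System.out.println(table); // {Navy=2, Seminar=1, navy=1}
    }
}
```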
Text classification starts by arranging the list of terms into nodes in a graph,
where every node is connected to every other node by a bidirectional relationship.
This setting is adapted from solving the Traveling Salesman Problem (TSP). In the TSP,
nodes represent cities, and a businessman needs to visit each city exactly once.
Similarly, in this adaptation, artificial ants are created to simulate the agent
travelling to all cities and to measure the total distance. The main problem of this
scenario is Non-Polynomial
[Figure labels: positive class, negative class]
Letting the ants travel every different path one by one would consume too much
time even for a supercomputer; exhaustively travelling all possible paths in this way
is known as the brute-force approach. For example, if a businessman needs to visit 60
cities, there are 60! ≈ 8.32 × 10^81 possible visiting orders. The ant colony algorithm
can find the optimal path in fewer than 500 attempts.
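The size of the brute-force search can be checked with exact integer arithmetic. The Java sketch below uses `BigInteger` to compute n!, the number of orders in which n cities can be visited, and (n-1)!/2, the number of distinct closed tours in a symmetric TSP once the start city and direction are fixed.

```java
import java.math.BigInteger;

// Exact counts for the brute-force TSP search space.
class TourCount {

    // n! = number of orders in which n cities can be visited.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Distinct closed tours of a symmetric TSP: fix the start city and ignore direction.
    static BigInteger symmetricTours(int n) {
        return factorial(n - 1).divide(BigInteger.valueOf(2));
    }

    public static void main(String[] args) {
        BigInteger orders = factorial(60);
        System.out.println("60! = " + orders + " (" + orders.toString().length() + " digits)");
        System.out.println("distinct symmetric tours = " + symmetricTours(60));
    }
}
```

For 60 cities, 60! has 82 digits (about 8.32 × 10^81), which is why exhaustive search is impractical even for a supercomputer.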
In this research, instead of taking one word at a time, category scanning uses
all words at once. This is achieved by adapting ACO for the TSP to map all terms.
The adjustment lets the artificial ants choose a category for the term at each node
while depositing pheromone on the chosen category. The process is repeated until the
ants reach a training score close to, or equal to, the desired score.
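The term-category adaptation can be pictured as a pheromone table with one row per term and one column per category. The Java sketch below is one hypothetical reading of that design and of the per-ant update in Figure 3.9: the category an ant chose for a term receives a deposit equal to the ant's training score, and every other cell evaporates by rate `rho` down to a minimum cap. The class name, the score-sized deposit and the parameter values are assumptions, and the maximum cap is not modelled.

```java
import java.util.Arrays;

// Hypothetical data model: pheromone per (term, category) pair.
class TermCategoryPheromone {
    final double[][] tau; // tau[term][category]

    TermCategoryPheromone(int terms, int categories, double initial) {
        tau = new double[terms][categories];
        for (double[] row : tau) Arrays.fill(row, initial);
    }

    // One ant's update: chosen (visited) cells gain the ant's score,
    // all other cells evaporate but never fall below minCap.
    void update(int[] chosenCategory, double score, double rho, double minCap) {
        for (int term = 0; term < tau.length; term++) {
            for (int c = 0; c < tau[term].length; c++) {
                if (c == chosenCategory[term]) {
                    tau[term][c] += score;                                     // pheromone addition
                } else {
                    tau[term][c] = Math.max(minCap, (1 - rho) * tau[term][c]); // evaporation
                }
            }
        }
    }

    public static void main(String[] args) {
        TermCategoryPheromone p = new TermCategoryPheromone(2, 3, 1.0);
        p.update(new int[]{0, 2}, 0.8, 0.5, 0.1); // term 0 -> category 0, term 1 -> category 2
        System.out.println(Arrays.deepToString(p.tau));
    }
}
```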
procedure ACO_search();
    while (termination_condition_not_occur)
        schedule_tasks
            create_and_assign_ants();
            update_path_selection();
            update_measurement();
            update_pheromone();
        end schedule_tasks
    end while
end procedure
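The pseudo-code above can be sketched in Java with a thread pool acting as the task scheduler. This is a hypothetical, self-contained illustration: six terms with made-up target labels (`TARGET`) stand in for the training data, each ant is one task that builds a term-to-category assignment from a read-only view of the shared pheromone table, and the pheromone update runs on the main thread only after every ant has finished, so no pheromone cell is written concurrently. The evaporation rate, number of ants and seeds are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical Java rendering of ACO_search with a thread pool as the task scheduler.
class AcoSearch {
    static final int CATEGORIES = 2, ANTS = 4;
    static final double RHO = 0.1;                  // evaporation rate (illustrative)
    static final int[] TARGET = {0, 1, 1, 0, 1, 0}; // made-up training labels, one per term

    // update_measurement: fraction of terms whose chosen category matches TARGET.
    static double score(int[] assignment) {
        int hits = 0;
        for (int t = 0; t < TARGET.length; t++) if (assignment[t] == TARGET[t]) hits++;
        return (double) hits / TARGET.length;
    }

    // update_path_selection: one ant picks a category for every term, proportional to pheromone.
    static int[] buildSolution(double[][] tau, Random rnd) {
        int[] chosen = new int[tau.length];
        for (int t = 0; t < tau.length; t++) {
            double total = 0.0;
            for (double v : tau[t]) total += v;
            double r = rnd.nextDouble() * total;
            int c = 0;
            while (c < CATEGORIES - 1 && r >= tau[t][c]) {
                r -= tau[t][c];
                c++;
            }
            chosen[t] = c;
        }
        return chosen;
    }

    // Runs the search and returns the best training score reached.
    static double run(int maxIter, int threads) {
        double[][] tau = new double[TARGET.length][CATEGORIES];
        for (double[] row : tau) Arrays.fill(row, 1.0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        double best = 0.0;
        try {
            for (int iter = 0; iter < maxIter && best < 1.0; iter++) { // termination condition
                List<Future<int[]>> ants = new ArrayList<>();
                for (int k = 0; k < ANTS; k++) {                        // create_and_assign_ants
                    final long seed = 1000L * iter + k;                 // fixed seeds: reproducible runs
                    ants.add(pool.submit(() -> buildSolution(tau, new Random(seed))));
                }
                List<int[]> solutions = new ArrayList<>();
                for (Future<int[]> f : ants) solutions.add(f.get());    // wait until every ant is done
                for (int[] s : solutions) {                             // update_pheromone on main thread
                    double sc = score(s);
                    best = Math.max(best, sc);
                    for (int t = 0; t < tau.length; t++) {
                        for (int c = 0; c < CATEGORIES; c++) tau[t][c] *= (1 - RHO);
                        tau[t][s[t]] += sc;
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return best;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("best training score: " + run(200, threads));
    }
}
```

Because each ant uses a fixed seed and the results are collected in submission order, `run(200, 1)` and `run(200, 4)` return the same value; only the wall-clock time should change with the number of threads.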
Figure 3.7 is the pseudo-code of the ACO searching method. The method starts
with a sub-method that creates and assigns new artificial ants in its task scheduler. In
sequential search, this method is invoked regularly. Next is update_path_selection,
which is invoked after the artificial ants are created. This function requires the
term-frequency table prepared during the pre-processing phase to initialize the search
space for the artificial ants. The update_measurement method is responsible for
measuring a newly generated route after all term nodes are visited. The method will
provide the total length or score to
procedure create_and_assign_ants();
    while (not_reach_max_number())
        set_ant_and_pheromone();
        assign_ant_to_starting_route();
    end while
end procedure
Figure 3.8 shows the procedure that places more than one artificial ant in the search space. The number of ants is usually set to more than one so that the best path or route can be determined by comparing the results of two or more artificial ants. The methods provided initialize each artificial ant at a randomly chosen starting point. Random placement imitates real ants, which are able to find food regardless of where they start. All newly created ants use the centralized pheromone and share the same array table.
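The point of Figure 3.8, that every ant starts at a random location but shares one centralized pheromone table, can be illustrated with a small Python sketch. The Ant class, its fields and the create_and_assign_ants signature are hypothetical; the key detail is that each ant holds a reference to the same table rather than a copy, so a deposit made through one ant is visible to all of them.

```python
import random
from dataclasses import dataclass, field
from typing import List

@dataclass
class Ant:
    """Hypothetical artificial ant: private memory, shared pheromone table."""
    ant_id: int
    start_term: int
    pheromone: List[List[float]]                     # the same object for every ant
    memory: List[int] = field(default_factory=list)  # categories chosen on this tour

def create_and_assign_ants(n_ants: int, n_terms: int,
                           pheromone: List[List[float]], rng: random.Random) -> List[Ant]:
    ants: List[Ant] = []
    while len(ants) < n_ants:                          # while not_reach_max_number()
        start = rng.randrange(n_terms)                 # random starting route
        ants.append(Ant(len(ants), start, pheromone))  # set_ant_and_pheromone()
    return ants

shared = [[1.0, 1.0] for _ in range(9)]                # 9 terms x 2 categories
ants = create_and_assign_ants(3, 9, shared, random.Random(42))
ants[0].pheromone[4][1] += 0.5                         # a deposit through one ant...
print(ants[2].pheromone[4][1])                         # ...is seen by another: 1.5
```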
    end while
    while (not_end_of_terms)
        if (is_visited)
            calculate_pheromone_addition();
            update_pheromone_on_route();
        else
            deduct_evaporation();
        end if
    end while
    die();
end procedure
The pseudo-code in Figure 3.9 shows the main activities of an artificial ant, from its creation until the end of its life cycle. The procedure starts by initializing the artificial ant's properties, as described for Figure 3.8. Once the ant's memory is available, a randomly selected category for the first term in the cycle is recorded in it. The next step in the loop is category selection for the remaining terms. Each iteration starts by fetching the term frequency of the next term from the term-frequency table. This value is used to calculate the selection probabilities for that term, and a random value between 0.0 (inclusive) and 1.0 (exclusive) is generated as input to the selection.
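The per-ant loop described above (a random category for the first term, then probability-driven choices for the rest) might look like the following Python sketch. The exact probability formula is not given here, so this assumes the textbook ACO rule, p_c proportional to tau_c^ALPHA times eta_c^BETA, with a hypothetical heuristic eta_c = (frequency of the term in class c) + 1; ALPHA, BETA and the function names are illustrative.

```python
import random
from typing import List, Sequence

ALPHA, BETA = 1.0, 1.0  # hypothetical weights for pheromone vs. heuristic

def selection_probabilities(tau: Sequence[float], freq: Sequence[int]) -> List[float]:
    """p_c = tau_c^ALPHA * eta_c^BETA / sum over categories, with eta_c = freq_c + 1 (assumed)."""
    weights = [(t ** ALPHA) * ((f + 1) ** BETA) for t, f in zip(tau, freq)]
    total = sum(weights)
    return [w / total for w in weights]

def run_ant(pheromone: Sequence[Sequence[float]],
            term_freq: Sequence[Sequence[int]], rng: random.Random) -> List[int]:
    """One ant's cycle over all terms; returns the category chosen for each term."""
    n_terms, n_cat = len(term_freq), len(term_freq[0])
    start = rng.randrange(n_terms)
    memory = {start: rng.randrange(n_cat)}  # random category for the first term
    for step in range(1, n_terms):
        term = (start + step) % n_terms     # next term in the cycle
        probs = selection_probabilities(pheromone[term], term_freq[term])
        # random.choices draws random() in [0.0, 1.0), scaled by the total weight
        memory[term] = rng.choices(range(n_cat), weights=probs)[0]
    return [memory[t] for t in range(n_terms)]

path = run_ant([[1.0, 1.0]] * 3, [[3, 0], [0, 5], [2, 2]], random.Random(7))
print(path)
```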
Figures 3.10, 3.11 and 3.12 in this subchapter describe a simulation demonstrating how artificial ants discover the shortest path after completing a few tours. The simulation was generated using the web application TSPAntSim and was set up with 10 artificial ants and nine randomly generated nodes, with the pheromone trail intensity update mode switched on. Figure 3.10 shows the nine nodes distributed randomly before the simulation runs. In all of these figures (Figures 3.10, 3.11 and 3.12), the nodes are tagged with numbers and represent words or terms during the process of rule discovery.
Figure 3.12: The Path after Third Tour
Figures 3.11 and 3.12 show the different paths discovered by the artificial ants. As the intensity of the pheromone trail at node 1 increases, the path changes from node 0-7 to node 1-7. Comparing Figure 3.12 with Figure 3.11, the pheromone intensity has increased on vertices 1-0-2. This implies that their distances are shorter than the alternatives, and the heuristic calculation responds to this condition when analysing the vertex routing. The artificial ants then reroute through those vertices and evaluate the total distance again. Meanwhile, the pheromone concentration on the other vertices also changes to reflect their length: by the same rule, the shorter segment 1-0-2 shows denser pheromone than the longer ones.
In text feature selection, there is no guarantee that a given term-frequency list is completely free of common terms. Terms that exist in both categories can have arbitrary frequencies, possibly the same frequency in each. Since such terms give no useful information to the computation, it is best for the algorithm to include a built-in common-term filter. The scoring function automatically skips the rest of its calculation when it finds a term that meets the common-term criterion. So as not to disrupt the flow of the whole computation, a detected common term is still given a score so that the stopping criterion can be reached. If no score were given to a common term upon discovery, the artificial ant would stay in a loop forever, trying to obtain a full score equal to the total number of terms. Validating common terms when the scoring function is applied also means the artificial ants have less chance of guessing a different category, because common terms usually do not influence category selection. The artificial ants can therefore skip the heuristic calculation and random-value generation for these terms, so the ACO algorithm gains additional execution speed depending on the number of common terms found in the term-frequency list. The function can be useful when an effective common-term filter is applied before the text classification process takes place. In terms of pheromone
[Figure 3.13: an artificial ant moving between term nodes such as 'sponsor', 'ness' and 'navy', choosing category A or B along the way]
Figure 3.13 shows how an artificial ant moves from one term to another. An artificial ant completes a full visit of all term nodes in every cycle until it reaches the stopping criterion, and at the same time produces the latest classification rule list. If the model is set to run x artificial ants, x classification rules are obtained per cycle. The cycle is repeated to search for a more suitable classification. Until the stopping criteria are met, the classification rules from earlier cycles are considered inferior to those of the latest cycle, because ACO builds up its solution over time. With multiple artificial ants, the best-quality classification rule must be selected for the current cycle, because each artificial ant produces its own classification rule. Unlike the pheromone array, which is centralized and shared between
In this research, a state refers to a possible category for each term. Upon visiting a term, the ant inspects the pheromone value from the previous iteration. The formula used to calculate pheromone is described in the next section. The ant also uses heuristic rules that influence how the pheromone is used. Figure 3.14 shows some of the array values:
uycie u
term 0, category 0 [0 to 0] = 99.5, [0 to 1] = 99.5, category 1 [1 to 0] = 99.5, [1 to 1] 99.5
term 1, category 0 [0 to 0] = 99.5, [0 to 1] = 99.5, category 1 [1 to 0] 99.5, [1 to 1] = 99.5
term 2, category 0 [0 to 0] = 102.83333333333334, [0 to 1] = 99.5, category 1 [1 to 0] = 99.5,
[1 to 1] = 99.5
term 3, category 0 [0 to 0] = 99.5, [0 to 1] = 99.5, category 1 [1 to 0] = 99.5, [1 to 1] = 99.5
term 4, category 0 [0 to 0] = 99.5, [0 to 1] = 99.5, category 1 [1 to 0] = 99.5, [1 to 1] = 99.5
term 5, category 0 [0 to 0] = 99.5, [0 to 1] = 99.5, category 1 [1 to 0] = 99.5, [1 to 1] = 99.5
term 6, category 0 [0 to 0] = 102.83333333333334, [0 to 1] = 99.5, category 1 [1 to 0] = 99.5,
[1 to 1] = 99.5
term 7, category 0 [0 to 0] = 99.5, [0 to 1] = 99.5, category 1 [1 to 0] = 99.5, [1 to 1] = 99.5
term 8, category 0 [0 to 0] = 99.5, [0 to 1] = 99.5, category 1 [1 to 0] = 99.5, [1 to 1] = 99.5
Figure 3.14: Sample of Pheromone Value
Table 3.3: Relationship between Frequencies and Visited Category
Frequency of class                                     Category currently visited
Frequency of class 1 more than class 2                 Category 1
Frequency of class 2 more than class 1                 Category 2
Frequency of class 1 more than 0 while class 2 is 0    Category 0
Frequency of class 2 more than 0 while class 1 is 0    Category 1
As shown in Table 3.3, the ant makes more attempts on the category that is most likely to be the solution. However, there is no guarantee that the foreseen solution is the best one: in TSP, a longer path sometimes leads to the shortest overall route. In that case, the pheromone becomes stronger on the newly found best route during the iteration process. Note that category 2 indicates a common term and plays no part in the heuristic influence or any other part of the calculation. It exists only to be skipped, so that the classification as a whole is more efficient and avoids unnecessary calculation.
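One possible reading of Table 3.3 and of the common-term rule, sketched in Python. Table 3.3 mixes category numbering, so this sketch assumes 0-based categories 0 and 1 for the two classes and category 2 for common terms (as stated in the text); treating equal frequencies as the common-term criterion is also an assumption. The scoring function shows why a common term must still earn a point: otherwise the full score, equal to the number of terms, could never be reached.

```python
from typing import Sequence

COMMON = 2  # category used for common terms, as stated in the text (category 2)

def preferred_category(freq_class1: int, freq_class2: int) -> int:
    """Assumed reading of Table 3.3: 0 for class 1, 1 for class 2, COMMON on a tie."""
    if freq_class1 > freq_class2:
        return 0
    if freq_class2 > freq_class1:
        return 1
    return COMMON  # assumed common-term criterion: equal frequency in both classes

def training_score(chosen: Sequence[int], expected: Sequence[int]) -> int:
    """Count terms whose chosen category is correct; common terms always earn a point."""
    score = 0
    for got, want in zip(chosen, expected):
        if want == COMMON:
            score += 1  # still scored, otherwise the full score could never be reached
            continue    # skip the rest of the calculation for this term
        score += int(got == want)
    return score

expected = [preferred_category(a, b) for a, b in [(5, 2), (0, 3), (4, 4)]]
print(expected)                             # [0, 1, 2]
print(training_score([0, 1, 0], expected))  # 3: full score despite the common term
```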
Figure 3.15 shows, step by step, how the heuristic value is calculated to produce a category prediction. The calculation requires a table that contains all of the terms
The table above shows a generated random value of 0.4, which falls in the interval from 0.0 (inclusive) up to the cap of 0.7 (exclusive). In this case, the artificial ant moves to, and marks its travel on, the first category. If the random value had fallen between 0.7 (inclusive) and 1.0 (exclusive), the second category would have been selected.
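The interval rule in this example (0.4 falls in [0.0, 0.7) and selects the first category; a value in [0.7, 1.0) selects the second) is a cumulative, roulette-wheel lookup. A minimal Python sketch, with the hypothetical function name pick_category:

```python
from itertools import accumulate
from typing import Sequence

def pick_category(probabilities: Sequence[float], r: float) -> int:
    """Return the category whose cumulative interval [low, high) contains r."""
    for category, upper in enumerate(accumulate(probabilities)):
        if r < upper:
            return category
    return len(probabilities) - 1  # guard against a cumulative sum slightly below 1.0

print(pick_category([0.7, 0.3], 0.4))  # 0: the first category, as in the example
print(pick_category([0.7, 0.3], 0.7))  # 1: 0.7 is excluded from the first interval
```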
Real ants use many types of pheromone for different situations, such as attacking other ants or sending them into a frenzy when in danger. This research refers only to the pheromone that ants use while searching for food.
In this research, an ant that visits all nodes examines the amount of pheromone left on the trail in the previous iteration. Its decision is influenced by the pheromone concentration of the term node at the current time. Unlike real-world ants, the artificial ants do not update the pheromone value as they leave a node, because the score for the whole path is not yet available. Artificial ants can update the value only after all of the terms have been visited. Pheromone evaporates by a constant amount as time passes by
$$\tau_{ij} \leftarrow \left[ (1-\rho)\,\tau_{ij} + \Delta\tau_{ij}^{best} \right]_{\tau_{min}}^{\tau_{max}} \qquad (3.1)$$
where τ_ij is the pheromone on the path between term i and term j, Δτ_ij^best is the pheromone added for the best path, ρ is the pheromone evaporation rate, τ_min is the lower bound and τ_max is the upper bound. For each solution found, the artificial ants need to determine the correct amount of pheromone concentration for each node. In the setup for this research, each correct term node chosen by the ants has its pheromone concentration increased by 1% of its original value. To balance this against incorrect decisions by the artificial ants, 1% of the pheromone concentration from the previous iteration is deducted for each incorrect training score.
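Equation (3.1) and the 1% adjustment rule can be sketched in Python as two small functions. The default values of rho, tau_min and tau_max are placeholders rather than the thesis settings, and percent_rule is one hypothetical reading of the text: a correct choice adds 1% of the original value, and each incorrect one removes 1% of the previous iteration's value.

```python
def mmas_update(tau: float, delta_best: float, rho: float = 0.02,
                tau_min: float = 0.01, tau_max: float = 10.0) -> float:
    """Equation (3.1): evaporate, add the best-path deposit, clamp to [tau_min, tau_max]."""
    return min(max((1.0 - rho) * tau + delta_best, tau_min), tau_max)

def percent_rule(tau_previous: float, tau_original: float, correct: bool) -> float:
    """Hypothetical 1% rule: +1% of the original value if correct, -1% of the previous if not."""
    if correct:
        return tau_previous + 0.01 * tau_original
    return tau_previous - 0.01 * tau_previous

print(mmas_update(1.0, 0.5, rho=0.1))  # 0.9 * 1.0 + 0.5, within the bounds
print(mmas_update(9.9, 5.0))           # 10.0, clipped at tau_max
```

The clamp is what keeps any single term-category choice from either dominating completely or disappearing, which is the usual purpose of the upper and lower bounds in this style of update.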
[Figure: the main thread dispatches artificial ants to threads 1 to 9, each performing term node routing over time (9 concurrent route searching)]
Figure 3.17 illustrates the main part of the ACO model. It shows the overall procedure of this research, from the pre-processing step to the end of concurrent execution, and reveals that the mechanism for nine concurrent executions differs greatly from the single-threaded version of ACO.
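The nine concurrent route searches over one shared pheromone table in Figure 3.17 can be sketched with Python's standard thread pool. This illustrates the structure only: the problem size, the deposit of 0.1 and the snapshot-then-deposit locking scheme are assumptions, and in CPython the global interpreter lock interleaves pure-Python threads rather than running them in parallel on separate cores, so the sketch shows coordination, not the measured speed-up.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List

N_THREADS = 9                  # nine concurrent route searches, as in Figure 3.17
N_TERMS, N_CATEGORIES = 12, 2  # hypothetical problem size
DEPOSIT = 0.1                  # hypothetical pheromone deposit per chosen category

pheromone: List[List[float]] = [[1.0] * N_CATEGORIES for _ in range(N_TERMS)]
pheromone_lock = threading.Lock()  # one shared table, so writes must be guarded

def ant_task(ant_id: int) -> List[int]:
    rng = random.Random(ant_id)                  # per-ant generator, no shared RNG state
    with pheromone_lock:                         # read a consistent snapshot
        snapshot = [row[:] for row in pheromone]
    path = [rng.choices(range(N_CATEGORIES), weights=row)[0] for row in snapshot]
    with pheromone_lock:                         # deposit only once the whole path is known
        for term, category in enumerate(path):
            pheromone[term][category] += DEPOSIT
    return path

with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
    paths = list(pool.map(ant_task, range(N_THREADS)))

print(len(paths), round(sum(map(sum, pheromone)), 6))  # 9 34.8 (24.0 + 9 x 12 x 0.1)
```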
Artificial ants are dispatched to threads by iterating a loop. Although they are released sequentially, there is no guarantee that they keep that order during their life cycles, because their speed is uncertain: certain events can interrupt a thread, especially a child thread. Figure 3.18 shows how the operating system divides tasks and schedules them into a linear sequence. Among the many tasks an operating system must handle, a task may take a short time or a normal length of time to finish. However, some tasks take so long that other tasks wait in the queue and are not executed in a timely manner, even when those tasks need only a very short time to finish. The problem becomes even more significant when a waiting task has high priority and must be completed in real time.
An example of this type of task is the time recording function. If it is not executed immediately, the data becomes inconsistent, because the recording is made later, after the long queue. This can produce a large time difference, whereas recording the operating time of each artificial ant requires high accuracy, down to the nanosecond. To overcome this problem, the task scheduler repeatedly applies context switching between the tasks given to the threads. Every task receives a small slice of time depending on its priority and the time it needs to complete, so shorter and simpler tasks are executed in less time than heavier and longer ones. However, this approach accumulates context-switching overhead as the number of tasks in the queue increases. A thread can pause or be interrupted if its task takes even slightly longer than the allocated time slice. Context switching by the scheduler therefore makes the threads appear to execute random parts of the code even though they were released at the same time. To overcome the uneven execution speed while preserving the responsiveness of context switching, the way multiple threads are signalled and stopped is adjusted. The implementation details are presented in chapter 3.3.6.2.
[Figure: tasks T1 to T4 with different time lengths, executed in the sequence T1 T2 T3 T4 T1 T2 T3 T2]
Figure 3.18: Task with Different Length Scheduled and Execution Sequence
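The interleaving in Figure 3.18 is what a round-robin scheduler with a fixed time slice produces. The Python sketch below reproduces the figure's execution sequence under assumed task lengths (T1 = 2, T2 = 3, T3 = 2, T4 = 1 slices), which were chosen to match the figure and are not given in the thesis; real operating-system schedulers also weigh task priorities.

```python
from collections import deque
from typing import Dict, List

def round_robin(lengths: Dict[str, int], time_slice: int = 1) -> List[str]:
    """Run tasks in arrival order for one slice each; unfinished tasks rejoin the queue."""
    queue = deque(lengths.items())
    order: List[str] = []
    while queue:
        name, remaining = queue.popleft()
        order.append(name)                   # the task runs for one time slice
        remaining -= time_slice
        if remaining > 0:
            queue.append((name, remaining))  # context switch: back of the queue
    return order

print(round_robin({"T1": 2, "T2": 3, "T3": 2, "T4": 1}))
# ['T1', 'T2', 'T3', 'T4', 'T1', 'T2', 'T3', 'T2']
```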
To overcome the uneven speed of the artificial ants caused by thread race conditions (chapter 3.3.6.1), this research implements an adjusted setting. Instead of the main thread remaining active to wait for one of the artificial ants to find the
[Figure: ants 1 to 9 run along a time axis; after time x, one thread issues a stopping signal to the others]
Figure 3.19: Implementation on How a Thread Sends a Stopping Signal to Other Threads
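The stopping-signal scheme of Figure 3.19, in which the first ant to finish tells every other thread to stop, maps naturally onto a shared event flag. In this Python sketch threading.Event is the signal; the random test standing in for "reached the desired training score", the target value and the step limit are hypothetical placeholders, and the joins at the end are only there so the example exits cleanly.

```python
import random
import threading
from typing import List, Optional

stop_signal = threading.Event()       # shared stopping signal
winner_lock = threading.Lock()
winner: List[Optional[int]] = [None]  # id of the first ant to finish

def ant_worker(ant_id: int, target: float, max_steps: int = 100_000) -> None:
    rng = random.Random(ant_id)
    for _ in range(max_steps):
        if stop_signal.is_set():      # another ant finished first: stop early
            return
        if rng.random() > target:     # placeholder for "desired training score reached"
            with winner_lock:
                if winner[0] is None:
                    winner[0] = ant_id
            stop_signal.set()         # issue the stopping signal to all other threads
            return

threads = [threading.Thread(target=ant_worker, args=(i, 0.999)) for i in range(9)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("first ant to finish:", winner[0])
```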
$$\text{Accuracy} = \frac{tp + tn}{tp + fp + fn + tn} \qquad (3.3)$$
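Equation (3.3) as a small Python function; the confusion-matrix counts in the example are made up for illustration.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Equation (3.3): share of all classified documents that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no classified documents")
    return (tp + tn) / total

print(accuracy(tp=40, tn=35, fp=10, fn=15))  # 0.75
```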
4.1 Introduction
4.2.1 Experiment
This experiment was designed to measure ACO text classification accuracy. The results are recorded in Table 4.1, and the differences in accuracy as more files were added were analysed.
Table 4.1: Average Accuracy from Every 10 Runs of Every 50 Files Addition
Text Files Average Accuracy (%)
50 68.66
100 75.16
150 82.33
200 82.25
Table 4.2 shows that ACO text classification accuracy increased overall. Accuracy rose by 6.5 percentage points from 50 to 100 files, and by a further 7.17 points in the experiment involving 150 files. However, the experiment with 200 files showed a small decrease of 0.08 points. Overall, the improvement clearly outweighs this small decrease.
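The increments quoted above follow directly from Table 4.1. A short Python check (the helper name increments is illustrative):

```python
from typing import Dict, List, Tuple

table_4_1: Dict[int, float] = {50: 68.66, 100: 75.16, 150: 82.33, 200: 82.25}  # accuracy (%)

def increments(table: Dict[int, float]) -> List[Tuple[int, int, float]]:
    """Change in accuracy (percentage points) between consecutive file counts."""
    files = sorted(table)
    return [(a, b, round(table[b] - table[a], 2)) for a, b in zip(files, files[1:])]

for a, b, change in increments(table_4_1):
    print(f"{a}-{b} files: {change:+.2f} points")
# 50-100 files: +6.50 points
# 100-150 files: +7.17 points
# 150-200 files: -0.08 points
```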
4.2.2 Discussion
4.3.1 Experiment
Table 4.3 shows that the average accuracy of GA text classification varies as the number of files increases. The experiment was done to analyse how the GA classification algorithm handles larger data sets.
Table 4.4: Accuracy Difference between File Addition Increment on GA
Text Files    Percentage Difference
50 - 100      Decrease by 3.50%
100 - 150     Increase by 0.20%
150 - 200     Increase by 5.30%
Average       Increase by 0.67%
Table 4.4 shows that GA text classification accuracy increased overall as more data was tested, although not at every step. Accuracy decreased by 3.5% from 50 to 100 files, increased slightly by 0.2% in the experiment with 150 files, and increased by an acceptable 5.3% in the experiment with 200 files. The overall experiment
4.3.2 Discussion
Table 4.5 compares the ACO and GA text classification accuracy from the previous experiments. For 50 files, GA has better accuracy, 2.4 percentage points higher than ACO. For 100 files, ACO surpasses GA by 7.66 percentage points. The experiment with 150 files shows the largest difference, with ACO 14.63 percentage points above GA's average accuracy. For 200 files, ACO accuracy is 9.25 percentage points higher than GA.
Note that both ACO and GA are wrapper methods that build up a set of solutions over time to solve the text classification problem. They are fundamentally similar in that both iterate and include random generation in their heuristic decision making. However, they approach the text classification problem through different natural and biological analogies. ACO is inspired by a colony of ants using pheromone in its food-foraging mechanism, whereas GA draws on tournament selection and an analogy with the activities of living cells.
First of all, although the results show that ACO surpasses GA in text classification most of the time, the test should not be taken as an absolute reference. Many factors affect classification accuracy, such as ambiguous terms and abbreviations, misspelled words and words containing numbers, and it is difficult to test all of them in a single set of experiments. Therefore
A small test was done manually to determine the most suitable number of artificial ants to run concurrently. The test used 100 documents as input data and was run multiple times with the same settings except for the number of concurrent artificial ants, which was increased from one to sixteen, twice the thread factory's maximum number (eight). All test runtimes were recorded to determine the shortest time. For the current test machine, running nine threads concurrently gave the best performance. Table 4.6 shows the test results, each averaged over 10 runs. The best time reduction occurred with 9 threads, an improvement of 35.53%.
CPU utilization was also recorded in the same test, using the Windows 7 Task Manager. Figure 4.1 shows a screenshot for the unthreaded version of ACO. The red region indicates processing from the start to the end of execution. Since there are 10 runs (10-fold), the graph shows a brief drop at intervals, which represents file reading; file reading consumes less processing power, hence the drops. As shown in the graph, only one core of the processor is highly utilized, while the other cores show little activity.
For comparison, CPU utilization was also recorded for the 9-threaded version of ACO. Figure 4.2 shows a screenshot of the 9-thread ACO run. Observe that the average processing power used by this version is higher.
From the graph, we can infer that multithreading benefits ACO in terms of processing time and efficiency. It also uses the multicore-multithread processor closer to its potential, fulfilling its purpose well.
4.5.1 Discussion
By using context switching, it is possible to run more than one artificial ant on a single thread. However, this gives no additional time reduction, because processing time itself is a fixed resource. Continuous thread switching requires the scheduler to work even harder, and the scheduler itself has execution costs such as method overloading and algorithm calculation. When writing code, there is no specific rule limiting the number of threads a single process can create. All of the threads that are requested
4.6.1 Experiment
This experiment measures the average accuracy of ACO text classification with multicore-multithreading. The average accuracy differs as the number of files
Table 4.8 shows that ACO text classification accuracy increased with each file addition. Accuracy rose by 7.17 percentage points from 50 to 100 files and by 6.28 points in the experiment involving 150 files. The experiment with 200 files showed a small increase of 0.11 points. Overall, the results show a beneficial improvement with every file addition.
Note that, for the same data set, the single-threaded version achieves very similar accuracy to the multicore-multithreaded version. The accuracy differences between the single-threaded and multicore-multithreaded versions are recorded in Table 4.9:
Table 4.9: Accuracy Difference between Single Threaded and Multicore-Multithreading Version of ACO
Text Files   Single Threaded ACO Accuracy (%)   Multicore-Multithread ACO Accuracy (%)   Percentage Difference (%)
50 68.66 68.66 0.0
100 75.16 75.83 0.67
150 82.33 82.11 0.22
200 82.25 82.22 0.03
Table 4.9 shows only slight differences in classification accuracy between the single-threaded and multicore-multithreaded versions of ACO. For classification involving 50 files, there is no accuracy difference at all. The difference for 100 files is 0.67%, the highest of all the test series. The next file addition which uses
COPYRIGHT UNIVERSITI TEKNOLOGI MARA
150 files shows a 0.22 percent difference. The last file addition in the test, with 200 files, produced an ACO accuracy difference of 0.03 percent.
4.7.1 Experiment
This experiment was conducted to measure the time ACO takes to classify text. The results are recorded in Table 4.10, and the time differences as more files were added were analysed.
Table 4.10: Time Taken of Text Classification Test for every 50 File Addition Without Multicore-Multithreading
Text Files Time Taken (seconds)
50 0.21
100 0.76
150 1.71
200 5.68
The experiments show that ACO text classification records different average running times as the number of files increases, as listed in Table 4.11. The experiment was done to analyse ACO classification running time as the data set becomes larger.
Table 4.11: Time Taken to Classify Text using ACO Featuring Multicore-Multithreading
Text Files Time Taken in Multicore-Multithreading (seconds)
50 0.18
100 0.49
150 1.27
200 2.28
Table 4.11 shows the time taken to run the classification test using ACO with multicore-multithreading. With 50 files, classifying the entire test set took 0.18 seconds. Classifying 100 files took 0.49 seconds, and the next addition, 150 files, needed 1.27 seconds. With the multicore-multithreading feature, ACO required 2.28 seconds to classify 200 files, the longest of all the file additions.
Table 4.12 shows the differences in the time required for ACO text classification between the single-threaded and multicore-multithreaded versions. With 50 files, the multicore-multithreaded ACO is faster than the single-threaded ACO by 0.03 seconds, a 14.28% difference. With 100 files, the multicore-multithreaded ACO reduces the time by 35.52%, finishing 0.27 seconds ahead of the single-threaded ACO; this is the second-highest time reduction among all the file additions tested. With 150 files, the multicore-multithreaded ACO reduces the time by 25.73%, saving 0.44 seconds. The largest file addition in the test, 200 files, produced the highest time reduction: 59.85%, a gap of 3.4 seconds.
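The percentages in Table 4.12 can be recomputed from Tables 4.10 and 4.11 as (single - multi) / single x 100. A short Python check follows (the function name time_reduction is illustrative); rounding to two decimals gives 14.29%, 35.53%, 25.73% and 59.86%, so the reported 14.28%, 35.52% and 59.85% appear to be truncated rather than rounded.

```python
from typing import Tuple

single = {50: 0.21, 100: 0.76, 150: 1.71, 200: 5.68}  # Table 4.10 (seconds)
multi = {50: 0.18, 100: 0.49, 150: 1.27, 200: 2.28}   # Table 4.11 (seconds)

def time_reduction(t_single: float, t_multi: float) -> Tuple[float, float]:
    """Seconds saved and percentage reduction relative to the single-threaded time."""
    saved = t_single - t_multi
    return saved, 100.0 * saved / t_single

for n in single:
    saved, pct = time_reduction(single[n], multi[n])
    print(f"{n} files: {saved:.2f} s saved, {pct:.2f}% reduction")
# 50 files: 0.03 s saved, 14.29% reduction
# 100 files: 0.27 s saved, 35.53% reduction
# 150 files: 0.44 s saved, 25.73% reduction
# 200 files: 3.40 s saved, 59.86% reduction
```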
4.7.2 Discussion
This chapter presented the tests carried out in this research and their discussion. One of the tests compared ACO and GA in terms of classification accuracy. The results show that GA has higher accuracy on smaller numbers of text files, whereas ACO has better classification accuracy than GA on medium- to high-dimensional data. With respect to the methods used, the discussion relates this to the effectiveness of GA's tournament concept, in which individual elitism increases in a scaled population. Multiple-peak solutions, which are more likely in high-dimensional data, could degrade GA classification accuracy as the training data set grew in the tests. To counter this problem, ACO has a stagnation mechanism that reacts when it reaches a peak by bounding or resetting its pheromone so that new exploration takes place. A further test measured time reduction for ACO in a multicore-multithread environment. Based on the results, the overall speed gain meets the objective of reducing ACO classification time. The reasons discussed for this positive result are pheromone sharing and the use of a global variable to eliminate unnecessary shared variables. This directly makes classification efficient, as memory resources are operated on in a shorter time and require no more than a single copy.
5.1 Introduction
The first objective of this research is to identify an optimization algorithm for text classification. Within the scope of bio-inspired algorithms, ACO is chosen to classify text from a general collection consisting of two distinct classes. Whatever the topics are, the classes can at least be distinguished by normal human evaluation. ACO adjusted from TSP problem solving is needed to find systematic criteria for running a classification process. The objective is achieved when the adjustment brings satisfying results and is used for further research by applying it in a multicore-multithread environment. ACO is found to be suitable for text classification when classification rules can be built cooperatively. Many artificial ant instances can be set out to work together, naturally using pheromone as a means of communication. The behaviour of the travelling artificial ants closely resembles that of real ants foraging for food, where more routes are created until the food source is found and the comparatively shorter path is chosen. This random path-visiting behaviour is carried over into the text classification algorithm. With collective classification rules over repeated cycles, the rules are constantly adjusted and evaluated to build up a general classification solution. Since ACO is a wrapper-type classification method, it is compared with other techniques in the same category. GA is chosen because it is also bio-inspired and produces solutions in an evolutionary way. The ACO model is run against the third party
The main contribution of this research is an ACO algorithm for mining text classification rules to categorize text documents. The developed framework integrates the following activities:
1) Text classification using the ACO algorithm in a multicore-multithread environment.
2) Replication of a theory that depicts natural problem solving in the real-life travelling salesman problem. The similarity of problem criteria addressed by bio-inspired theory could simplify learning and understanding. Statistical methods may be more straightforward, but bio-inspired methods can be more natural and human-friendly.
3) Conversion of the ACO algorithm from a TSP solution into one adjusted to term-node relationships.
In addition, this research contributes to multicore-multithread environment implementation as follows:
1) A multicore-multithread environment design adapted to the ACO model, which must comply with the model's specific requirements. One of these is that multiple cores must handle a number of processing threads. In contrast to parallelism, multithreading here does not wait for slower or interrupted child threads, so the pheromone value must be updated dynamically to keep its integrity at all times.
2) Improvement over the single-threaded version of ACO in terms of execution efficiency and time reduction. A bottleneck was also found: increasing the number of threads does not give a stable rate of time reduction. This is true when the nine threads used in the experiment
The research has achieved its initial objectives. However, many improvements can be made in future research on text classification in a multicore-multithread environment. Some of the main suggestions, covering both ACO for text classification and its multicore-multithread implementation, are as follows:
a) In future, the ACO in this research could be adapted to a cloud environment. In a cloud environment, ACO could use a multi-colony method, as opposed to the single-colony method known as the cellular method. The test cases could be adjusted to suit a multi-colony structure, and the classification speed could be compared with the cellular method; the results could indicate the overall performance of ACO for wide application. A multi-colony setup can be partitioned at coarse, medium or fine granularity.
b) The technique developed in this research could also be used to mine other fields of data, such as opinion mining. Opinion mining has a more abstract definition and thus requires more care to differentiate classification rules. It is useful for evaluating a person's behaviour and favourite subjects, has marketing value and requires more exploration in classification research.
AUTHOR'S PROFILE
Ahmad Nazmi bin Fadzal completed his Master of Computer Science at the Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA. He received his Bachelor of Computer Science from the Kulliyyah of Information and Communication Technology, International Islamic University Malaysia.
LIST OF PUBLICATIONS