Information and Software Technology: Gillian J. Greene, Marvin Esterhuizen, Bernd Fischer
Article info

Article history: Received 29 January 2016; Revised 2 December 2016; Accepted 5 December 2016; Available online xxx

Keywords: Formal concept analysis; Tag clouds; Browsing software repositories; Interactive tag cloud visualization

Abstract

Context: version control repositories contain a wealth of implicit information that can be used to answer many questions about a project’s development process. However, this information is not directly accessible in the repositories and must be extracted and visualized.

Objective: the main objective of this work is to develop a flexible and generic interactive visualization engine called ConceptCloud that supports exploratory search in version control repositories.

Method: ConceptCloud is a flexible, interactive browser for SVN and Git repositories. Its main novelty is the combination of an intuitive tag cloud visualization with an underlying concept lattice that provides a formal structure for navigation. ConceptCloud supports concurrent navigation in multiple linked but individually customizable tag clouds, which allows for multi-faceted repository browsing, and scriptable construction of unique visualizations.

Results: we describe the mathematical foundations and implementation of our approach and use ConceptCloud to quickly gain insight into the team structure and development process of three projects. We perform a user study to determine the usability of ConceptCloud. We show that untrained participants are able to answer historical questions about a software project better using ConceptCloud than using a linear list of commits.

Conclusion: ConceptCloud can be used to answer many difficult questions such as “What has happened in this project while I was away?” and “Which developers collaborate?”. Tag clouds generated from our approach provide a visualization in which version control data can be aggregated and explored interactively.
© 2016 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.infsof.2016.12.001
0950-5849/© 2016 Elsevier B.V. All rights reserved.
Please cite this article as: G.J. Greene et al., Visualizing and exploring software version control repositories using interactive tag clouds over formal concept lattices, Information and Software Technology (2016), http://dx.doi.org/10.1016/j.infsof.2016.12.001
JID: INFSOF
ARTICLE IN PRESS [m5G;December 12, 2016;20:48]
2 G.J. Greene et al. / Information and Software Technology 000 (2016) 1–19
aspects of the project history. ConceptCloud makes use of a novel combination of tag clouds and an underlying concept lattice [5] to support exploratory search [6,7] tasks on software repositories.

When users have no previous knowledge of a project or have not yet formulated a direct query, their task becomes one of exploratory search instead of direct search or retrieval [7]. While there are already approaches supporting specific retrieval tasks and visualizing aspects of software repositories [8], support for exploratory search in software repository data remains unavailable. The goal of our work is to build a flexible and interactive visualization engine that allows users to visualize different aspects of a project interactively and therefore supports exploratory search tasks, instead of presenting the user with one static, pre-configured view.

An exploratory search approach can provide an overview of the repository data and allow the user to further investigate any aspects of the project which they might find interesting. Therefore, exploratory search approaches can support new developers on a project in understanding the project history and team structure. An exploratory approach can also be used to answer more general questions (e.g., “Which developers collaborate?”) which cannot be formulated as a single search query, as would be possible if the question were more focused (e.g., “Who collaborates with Alice?”).

Tag clouds (or word clouds) are a simple visualization method for textual data where the frequency of each tag is reflected in its size. We use a tag cloud visualization to present aggregated software repository data, as tag clouds support exploratory search tasks and have been found to be effective when the information discovery task is wide [9]. While our tag cloud visualization may not be the optimal visualization for all aspects of the data, it is flexible enough to visualize many aspects of the software project such as developer expertise (e.g., which developers have worked on particular files or directories and would be good candidates to ask questions about this functionality), co-changed methods in a software project, project activity (e.g., in which years and months has there been a lot of development, and on which parts of the system), or developer collaboration (e.g., which developers are working together on which parts of the project) in a uniform way. Our interactive tag clouds allow developers to aggregate commits into groups and filter commits that apply to a certain topic, which has been noted by developers to be useful [2].

We generate tags directly from the data that we extract from software repositories, instead of relying on user-generated labels as tags for particular content, as often done in Web 2.0 applications (such as Flickr’s early tag cloud view). The data available in a version control archive is often large (for example, more than 500,000 revisions for the Linux [10] repository) and so we allow the user to make incremental refinements (i.e., navigate) in the tag cloud in order to generate smaller, more detailed visualizations. The navigation in our tag clouds is crucial for facilitating exploratory search tasks. Navigation using tag clouds has previously been explored using a Bayesian approach [11]; however, navigation in our browser is supported by a novel combination of tag clouds and concept lattices [5,12,13].

We conjecture that a concept lattice [5] provides a high level of internal structure for the repository data and therefore allows users to explore the data through multiple navigation paths. Concept lattices have been shown to be useful for browsing data [14–16] but large lattices do not provide a suitable data visualization because the relationships between the concepts are difficult to identify in a large Hasse diagram. Therefore, we make use of a concept lattice to facilitate navigation in the more intuitive and scalable tag cloud visualization.

Fig. 1 shows an overview of our approach. We construct a formal context from data in a version control archive (see Section 4.1) and generate a concept lattice directly from the context. Note that we have used a small illustrative example as larger context tables and lattices (for example, one derived from the Linux repository) become incomprehensible. Our refinement-based navigation algorithm (see Section 3.4) then enables interactive repository browsing through our tag cloud interface (see Section 4.1). Our navigation algorithm maintains a focus concept in the underlying lattice which represents the user’s current tag selection. We derive the tag cloud visualization from the current focus concept and update it after each navigation step. Navigation is driven by the user’s selection (or de-selection) of tags in the tag cloud. Fig. 1(i) shows the initial focus concept generating the first tag cloud; after the selection of the tag “Alice”, the focus moves to (ii) and the tag cloud is updated.

By using different objects in the formal contexts (see Section 3.2) that are used to construct concept lattices, we are able to generate tag clouds that provide different perspectives on the same underlying data in the same familiar visualization. Our foundation in formal concept analysis allows us to change the objects easily to get different insights on the same repository.

We have implemented our approach in the ConceptCloud browser (available at www.conceptcloud.org) which includes advanced visualizations, such as multiple interlinked tag clouds. Section 5 shows the application of ConceptCloud to three different repositories.

In this paper, we extend our previous work [17] by providing a formalization for our formal context construction from repositories (see Section 2), combining multiple archives (such as issue databases and version control repositories) in the same context in order to support data fusion (see Section 3.2.5) and developing a browser scripting language for ConceptCloud to support advanced customizations (see Section 4.2.4). We have also conducted additional evaluation in the form of a user study (see Section 7).

2. Modeling software repositories

We use a simple repository model derived from Hindle and Germán’s SCQL [18] to formalize how we construct the contexts that underpin our browser: a repository is simply a collection of versions of a set of files that are grouped into revisions. Note that we follow the SVN terminology [19] here. Hindle and Germán [18] refer to versions as revisions, while revisions are called modification requests; elsewhere revisions are called transactions.

A version v ∈ V denotes the abstract state of a file f ∈ F created by an author a ∈ A at a time t ∈ T. We ignore the actual file contents and only use meta-data and abstract modifications. Versions constitute a version history if they are ordered by a precedence relation ≺ that holds only between versions of the same file and is compatible with the file creation times. We say that v evolves into v′ if v ≺ v′ holds; two versions v1 and v2 are merged into v if v1 ≺ v and v2 ≺ v.

Definition 1. Let $V \subseteq F \times T \times A$ be a set of versions over files $F$ and ${\prec} \subseteq V \times V$ be an irreflexive partial order. $(V, \prec)$ is called a version history iff $v = (f, t, a) \in V$, $v' = (f', t', a') \in V$, and $v \prec v'$ imply $f = f'$ and $t < t'$.

A revision r is a set V of file versions that are committed to the repository R at time t by an author a; on commit, some meta-data (i.e., author, time, and an additional log message l ∈ L) is stored together with the versions. We assume that each revision r ∈ R contains only one version of a file (which need not be the most recent version), and that each revision is uniquely determined by an abstract identifier id(r).

Definition 2. Let $(V, \prec)$ be a version history and $R \subseteq \mathcal{P}(V) \times T \times A \times L$ be a set of revisions. $R$ is called a repository iff $r = (V, t_r, a_r, l) \in R$ and $v = (f, t_v, a_v) \in V$ imply $t_v \le t_r$ and $v' \not\prec v$ for all $v' \in V$.
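The two definitions above can be read as executable checks. The following Python code is a minimal sketch under stated assumptions, not part of ConceptCloud: the class and function names (`Version`, `Revision`, `is_version_history`, `is_repository`) are hypothetical, times are plain integers, the precedence relation is given as an explicit set of pairs, and the second condition of Definition 2 is read as "no two versions inside one revision are related by ≺".

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Version:
    file: str    # f in F
    time: int    # t in T (an integer timestamp, by assumption)
    author: str  # a in A


@dataclass(frozen=True)
class Revision:
    versions: FrozenSet[Version]  # the committed file versions
    time: int                     # commit time t_r
    author: str                   # commit author a_r
    log: str                      # log message l


# The precedence relation as explicit pairs (v, w) meaning v ≺ w.
Precedence = Set[Tuple[Version, Version]]


def is_version_history(prec: Precedence) -> bool:
    """Definition 1: v ≺ w implies same file and strictly increasing time.

    Only this implication is checked; transitivity of ≺ is not verified.
    """
    return all(v.file == w.file and v.time < w.time for v, w in prec)


def is_repository(revisions: Set[Revision], prec: Precedence) -> bool:
    """Definition 2: no version is newer than its revision, and no two
    versions committed in the same revision are related by ≺."""
    for r in revisions:
        for v in r.versions:
            if v.time > r.time:
                return False
            if any((w, v) in prec for w in r.versions):
                return False
    return True


# Toy data (hypothetical): two successive versions of one file.
v1 = Version("build.xml", 10, "Alice")
v2 = Version("build.xml", 20, "Bob")
prec = {(v1, v2)}
r1 = Revision(frozenset({v1}), 10, "Alice", "initial build file")
r2 = Revision(frozenset({v2}), 20, "Bob", "fix build")
bad = Revision(frozenset({v1, v2}), 20, "Bob", "two versions of one file")

print(is_version_history(prec))       # True
print(is_repository({r1, r2}, prec))  # True
print(is_repository({bad}, prec))     # False
```

The last call fails because `bad` contains two versions of the same file that are ordered by ≺, which the model excludes.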
Fig. 1. Navigating concept lattices with tag clouds: tag clouds correspond to the matching colored concepts in the lattice (tag clouds from left to right correspond to concepts
i, ii and iii respectively). Context table (top left) used to generate concept lattice (top right). Tag clouds are refined on each tag selection (selected tags shown in red). (For
interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
We can easily extend this basic model towards common revision control systems. For example, in CVS [20], the notions of versions and revisions are conflated; in our model we thus have for all revisions r = (V, t, a, l) ∈ R that V = {(f, t, a)}. Note that we do not model revision tagging explicitly, but assume that the tags are part of the log messages. In SVN, each revision can only contain the most recent version of a file, and only the commit author and time are recorded but not the file author or modification time. Hence, in our model we have for all r = (V, t_r, a_r, l) ∈ R and all versions v = (f, t_f, a_f) that v ∈ V implies t_f = t_r and a_f = a_r. Note that we are only interested in the linear sequence of revisions and therefore do not model explicit branching and merging, but again assume that this information is encoded into the log messages, if requested. For distributed revision control systems such as Git we analyze a clone of the repository. Note that clones of the repository in different states will generate different contexts, as the contexts are generated using the commit information extracted from the repository. Therefore, if a repository is not up-to-date (i.e., has changes available to be pulled) then the generated context will differ from that of the up-to-date repository, as the list of commits differs.

3. Navigation framework

In our model, we have a set of revisions and a set of attributes for each revision; the attributes are divided into separate categories such as author, date, or file name. Our goal in browsing is to retrieve a set of revisions which share a common attribute such as the same author, and then to refine this set gradually by adding more attributes. We use formal concept analysis (FCA) as the framework to achieve this goal.

3.1. Formal concept analysis

Formal concept analysis (FCA) [5,12,13] uses lattice-theoretic methods to investigate abstract relations between objects and their attributes. Such contexts can be imagined as cross tables where the rows are objects and the columns are attributes (cf. Fig. 1).

Definition 3. A formal context is a triple $(\mathcal{O}, \mathcal{A}, I)$ where $\mathcal{O}$ and $\mathcal{A}$ are sets of objects and attributes, respectively, and $I \subseteq \mathcal{O} \times \mathcal{A}$ is an arbitrary incidence relation.

Definition 4. Let $(\mathcal{O}, \mathcal{A}, I)$ be a context, $O \subseteq \mathcal{O}$, and $A \subseteq \mathcal{A}$. The common attributes of $O$ are defined by $\alpha(O) = \{a \in \mathcal{A} \mid \forall o \in O : (o, a) \in I\}$, the common objects of $A$ by $\omega(A) = \{o \in \mathcal{O} \mid \forall a \in A : (o, a) \in I\}$.

For example, the common attributes of the objects revision-1 and revision-2 in Fig. 1 are Alice, 10/14 and build.xml.

Concepts are pairs of sets of objects and attributes which are synonymous. They are maximal rectangles (modulo permutation of rows and columns) in the context table. For example, ({revision-1, revision-2}, {Alice, 10/14, build.xml}) in Fig. 1 is a concept, since adding another revision object loses common attributes, while adding another attribute loses common objects.

Definition 5. Let $C$ be a context. $c = (O, A)$ is called a concept of $C$ iff $\alpha(O) = A$ and $\omega(A) = O$. $\pi_O(c) = O$ and $\pi_A(c) = A$ are called $c$’s extent and intent, respectively. The set of all concepts of $C$ is denoted by $\mathfrak{B}(C)$.

Concepts are partially ordered by inclusion of extents such that a concept’s extent includes the extent of all of its subconcepts; the intent-part follows by duality.

Definition 6. Let $C$ be a context, $c_1 = (O_1, A_1), c_2 = (O_2, A_2) \in \mathfrak{B}(C)$. $c_1$ and $c_2$ are ordered by the subconcept relation, $c_1 \le c_2$, iff $O_1 \subseteq O_2$. The structure of $\mathfrak{B}(C)$ and $\le$ is denoted by $\underline{\mathfrak{B}}(C)$.

The basic theorem of FCA states that the structure induced by the concepts of a formal context and their ordering is always a complete lattice. Such concept lattices have strong mathematical
properties and reveal hidden structural and hierarchical properties of the original relation. They can be computed automatically from any given relation between objects and attributes. The greatest lower bound or meet and least upper bound or join can also be expressed by the common attributes and objects.

Theorem 7 (Wille [5]). Let $C$ be a context. Then $\underline{\mathfrak{B}}(C)$ is a complete lattice, the concept lattice of $C$. Its meet and join operations for any set $I \subset \mathfrak{B}(C)$ of concepts are given by

$$\bigwedge_{i \in I} (O_i, A_i) = \Bigl(\bigcap_{i \in I} O_i,\ \alpha\bigl(\omega\bigl(\bigcup_{i \in I} A_i\bigr)\bigr)\Bigr)$$

$$\bigvee_{i \in I} (O_i, A_i) = \Bigl(\omega\bigl(\alpha\bigl(\bigcup_{i \in I} O_i\bigr)\bigr),\ \bigcap_{i \in I} A_i\Bigr)$$

Each attribute and object has a uniquely determined defining concept in the lattice. For example, the defining concept for Alice is indicated in blue in the concept lattice in Fig. 1(ii). The defining concepts can be calculated directly from the attribute or object, respectively, and need not be searched in the lattice.

Definition 8. Let $\underline{\mathfrak{B}}(\mathcal{O}, \mathcal{A}, I)$ be a concept lattice. The defining concept of an attribute $a \in \mathcal{A}$ (object $o \in \mathcal{O}$) is the greatest (smallest) concept $c$ such that $a \in \pi_A(c)$ ($o \in \pi_O(c)$) holds. It is denoted by $\mu(a)$ ($\sigma(o)$). We use $\delta(x)$ to denote $\mu(x)$ if $x$ is an attribute and $\sigma(x)$ otherwise.

Efficient algorithms exist for the computation of the concept lattices and the meet and join of concepts in the lattice, such as Lindig’s algorithm [21].

3.2. Contexts from repositories

In order to construct a concept lattice from repository data we need a context table. The first step in the construction of such a context table is to determine which field in the data will be taken as the object and which fields are suitable as attributes for that object. We use three different object types, namely revisions, files, and revision-file pairs (i.e., changes) in order to construct different types of contexts, which enables us to create different tag cloud visualizations for the same repository, providing new insights into the data. We are able to combine multiple data sources in the same context to support data fusion as object types in the context table need not be homogeneous. We use a combination of issue and version control data, in the same context, to provide a more complete overview of a project.

3.2.1. Basic preprocessing

When we construct context tables we pre-process the meta-data that we extract from the revision control system, in particular the log messages, file names, and commit times from each revision in the repository. We use a function W : L → P(W) that segments each log message into individual words w ∈ W, removes words on a default stop list, and reduces each word to its stem, using the Apache Lucene implementation of Porter’s [22] stemming algorithm. Since the stem is not necessarily a proper word we take the most frequently used word that evaluates to a given stem as representative in the cloud.

We group both file names and commit times into increasingly coarser bins. For file names, we use a function D : F → P(F) that decomposes each file name into a set of all path prefixes, similar to recursively applying the Unix dirname command. For commit times, we use a function T : T → P(T) that truncates the times at different precision levels (days, months, and years).

In addition, we also use aggregators (such as aggregating files with the same names, even across directories) to capture regularities that appear across the bins, e.g., similarities between identically named files such as README.txt in different directories. We use d, n, and t to denote mappings from each time to the corresponding weekday, and from each file to its base name and type, respectively.

Note that we do not perform more complicated pre-processing steps such as word sense disambiguation [23] or identity merging [24]. We instead prefer to leave the user in control of such decisions.

3.2.2. Revision-based contexts

In a revision-based context we interpret the revisions, represented by their revision number, as objects and the commit meta-data (e.g., author or words from the log message) as attributes; each revision is associated with its own meta-data as attributes. This context type represents the canonical view of repositories. Its concepts are sets of revisions and their common attributes (e.g., all revisions that include a common set of files). It is useful to get a historical overview of a project, for example to identify when the most changes have been made to a project, which developers have worked on particular files and which directories have been development hotspots.

Definition 9. Let $R$ be a repository, and $A_R = W \cup A \cup T \cup F$. $C_R = (\mathit{id}(R), A_R, I_R)$ is called the revision-based context of $R$ if for all $r = (V, t, a, l) \in R$, $v = (f, t', a') \in V$, and $x \in A_R$, we have $(r, x) \in I_R$ iff
(i) $x \in W(l)$, or
(ii) $x = a$, or
(iii) $x = d(t)$ or $x \in T(t)$, or
(iv) $x = n(f)$ or $x \in D(f)$, or
(v) $x = t(f)$.

3.2.3. File-based contexts

In a file-based context we interpret the files as objects but derive the attributes from the revisions’ pre-processed meta-data; more precisely, each file receives all attributes from all revisions that involve the file. Concepts from such contexts are sets of files with common attributes (e.g., the set of all files on which a group of developers have all worked); in particular, each commit induces a concept: since a developer can only commit one set of files at any given time, the set of committed files is maximal with respect to the set of all attributes derived from the commit meta-data.

Definition 10. Let $R$ be a repository, and $A_F = W \cup A \cup T \cup \mathit{id}(R)$. $C_F = (F, A_F, I_F)$ is called the file-based context of $R$ if for all $r = (V, t, a, l) \in R$, $v = (f, t', a') \in V$, and $x \in A_F$, we have $(f, x) \in I_F$ iff
(i) $x \in W(l)$, or
(ii) $x = a$, or
(iii) $x = d(t)$ or $x \in T(t)$, or
(iv) $x = n(f)$ or $x \in D(f) \setminus \{f\}$, or
(v) $x = t(f)$, or
(vi) $x = \mathit{id}(r)$.

Note that revision- and file-based contexts give complementary views on the repository. For example, the author tags from a revision-based context will be scaled according to the number of revisions that the author has committed over the project lifetime; during browsing only one author tag can be selected at a time since each revision has only one author. In a file-based context, the author tags will be scaled according to how many files a particular author has changed. Selecting an author tag will reveal all collaborators, i.e., all other authors who have also changed any of the same files. Selecting two author tags will then reveal the extent of their collaboration, i.e., all files they have both worked on. Therefore file-based contexts can be used to visualize the collaboration in the project, showing which developers work together and on which files.
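The construction of revision-based and file-based contexts can be illustrated on a toy repository. The Python code below is a hedged sketch, not the ConceptCloud implementation: the repository data and all function names are hypothetical, the word function omits stop-word removal and stemming, tags are not separated by category, and the file "type" is approximated by the file extension.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import PurePosixPath

# Hypothetical toy repository: (revision id, author, commit time, log, files).
REVISIONS = [
    ("r1", "Alice", datetime(2014, 10, 3), "add build file", ["build.xml"]),
    ("r2", "Alice", datetime(2014, 10, 9), "update build",
     ["build.xml", "src/Main.java"]),
    ("r3", "Bob", datetime(2014, 11, 2), "fix main", ["src/Main.java"]),
]


def path_prefixes(path):
    """Directory prefixes of a file, like repeatedly applying `dirname`."""
    return {str(p) for p in PurePosixPath(path).parents if str(p) != "."}


def time_bins(t):
    """The commit time truncated to day, month, and year."""
    return {t.strftime("%Y-%m-%d"), t.strftime("%Y-%m"), t.strftime("%Y")}


def words(log):
    """Words of a log message (no stop list or stemming in this sketch)."""
    return set(log.lower().split())


def meta_attributes(author, t, log):
    """Attributes from the commit meta-data: words, author, weekday, time bins."""
    return words(log) | {author, t.strftime("%A")} | time_bins(t)


def file_attributes(f):
    """Attributes from a file: base name, extension, and directory prefixes."""
    p = PurePosixPath(f)
    return {p.name, p.suffix} | path_prefixes(f)


def revision_based_context():
    """Objects are revision ids; each revision gets its own meta-data."""
    ctx = {}
    for rid, author, t, log, files in REVISIONS:
        attrs = meta_attributes(author, t, log)
        for f in files:
            attrs |= {f} | file_attributes(f)
        ctx[rid] = attrs
    return ctx


def file_based_context():
    """Objects are files; each file inherits attributes from all its revisions."""
    ctx = defaultdict(set)
    for rid, author, t, log, files in REVISIONS:
        for f in files:
            ctx[f] |= meta_attributes(author, t, log) | file_attributes(f) | {rid}
    return dict(ctx)


rev_ctx = revision_based_context()
file_ctx = file_based_context()

# In the file-based context, selecting two author tags yields the shared files.
alice_files = {f for f, attrs in file_ctx.items() if "Alice" in attrs}
shared = {f for f in alice_files if "Bob" in file_ctx[f]}
print(sorted(alice_files))  # ['build.xml', 'src/Main.java']
print(sorted(shared))       # ['src/Main.java']
```

In this toy example, each revision in `rev_ctx` carries exactly one author, whereas a file in `file_ctx` can carry several, which is what makes collaboration visible in the file-based view.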
Fig. 2. Multiple linked tag clouds of the JUnit Repository in ConceptCloud, showing changed files (top), authors (bottom) and years (left). The tag cloud is constructed from
a revision-based context.
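To make the step from a revision-based context to a tag cloud more concrete, the following Python code gives a minimal sketch under stated assumptions. It is not the ConceptCloud implementation: the context table is hypothetical (only the attributes shared by revision-1 and revision-2 follow the example given for Fig. 1), selections are restricted to attribute tags, and the focus is recomputed from scratch from the selected tags rather than updated incrementally.

```python
from collections import Counter

# Hypothetical revision-based context: object -> set of attributes.
CONTEXT = {
    "revision-1": {"Alice", "10/14", "build.xml"},
    "revision-2": {"Alice", "10/14", "build.xml", "Main.java"},
    "revision-3": {"Bob", "11/14", "Main.java"},
}


def common_attributes(objects):
    """Attributes shared by every object in the given set."""
    sets = [CONTEXT[o] for o in objects]
    all_attrs = set().union(*CONTEXT.values())
    return set.intersection(*sets) if sets else all_attrs


def common_objects(attributes):
    """Objects that carry every attribute in the given set."""
    return {o for o, attrs in CONTEXT.items() if attributes <= attrs}


def focus(selected):
    """Focus concept for a set of selected attribute tags.

    The extent is the set of objects sharing all selected tags; the intent
    is the set of attributes common to that extent.
    """
    extent = common_objects(set(selected))
    return extent, common_attributes(extent)


def tag_cloud(selected):
    """Multiset of the objects in the focus extent and all their attributes."""
    extent, _ = focus(selected)
    cloud = Counter(extent)
    for o in extent:
        cloud.update(CONTEXT[o])
    return cloud


initial = tag_cloud(set())        # empty selection: the whole repository
refined = tag_cloud({"Alice"})    # after selecting the tag "Alice"
print(initial["Main.java"], refined["Main.java"])  # 2 1
print(focus({"Alice"}))

# De-selection: recompute the focus from the remaining selected tags.
selection = {"Alice", "Main.java"}
print(focus(selection - {"Main.java"}) == focus({"Alice"}))  # True
```

The counts in the `Counter` are the tag weights: a tag that occurs in more objects of the current extent would be drawn larger, and tags whose count drops to zero disappear from the refined cloud.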
3.2.4. Change-based contexts

In a change-based context we use pairs of files and revisions as objects, so that for example (hello.java, revision-1) and (hello.java, revision-3) become separate objects in the context. This allows us to use the content of the files as additional attributes, which we cannot do with revision- or file-based contexts. In our implementation we focus on the changes (rather than the entire contents), and use a lightweight fact extractor [25] to get the signatures of the changed methods from each file. We could therefore have, for example, the attributes public int equals(), public static void main(), and Alice associated with the object (hello.java, revision-1) to represent the fact that revision-1 by Alice changes the methods equals and main. Selecting a method tag m then produces a tag cloud which contains all other methods that have been co-changed with m, scaled according to how often they have been changed together (cf. Fig. 3). Therefore change-based contexts can be used to construct visualizations that depict the co-changed methods in the project as well as showing other method information, for example, which methods are development hotspots and in which time periods.

In our model, we assume a set M of abstract modifications (in the spirit of the atomic changes of Ren et al. [26]), and use Δ(v′, v) ⊆ M to denote the (non-symmetric) difference between two versions v′ ≺ v of a file.

Definition 11. Let $R$ be a repository, and $A_C = W \cup A \cup T \cup F \cup \mathit{id}(R) \cup M$. $C_C = (F \times \mathit{id}(R), A_C, I_C)$ is called the change-based context of $R$ if for all $r = (V, t, a, l) \in R$, $v = (f, t', a') \in V$, $v' \in V$ with $v' \prec v$, and $x \in A_C$, we have $((f, r), x) \in I_C$ iff
(i) $x \in W(l)$, or
(ii) $x = a$, or
(iii) $x = d(t)$ or $x \in T(t)$, or
(iv) $x = n(f)$ or $x \in D(f)$, or
(v) $x = \mathit{id}(r)$, or
(vi) $x \in \Delta(v', v)$.

3.2.5. Combined contexts: bug reports and revision control data

Software development projects often make use of dedicated tools for different tasks, such as issue databases, task trackers, and source code repositories, or use a tool that provides a combination of these such as GitHub [27]. Moreover, archive entries can be linked across the different tools, by, for example, adding an issue identifier to the log message of a revision which references that issue. Ideally, visualization tools should be able to “fuse” the information from different archives for the same project into a single combined data structure, such as Hipikat’s uniform artifact database [4] or Codebook’s central graph [28].

Here, we combine data from multiple archives (or different features of GitHub) into a single context using multiple object types. In particular, we combine repository data and GitHub issue data into the same context. In the combined contexts we use the revisions and bug reports as objects (since the object types in the context table need not be homogeneous) and derive the attributes from both the revisions’ pre-processed meta-data and the text from the bug reports. Therefore, where bug reports and revisions share a common attribute they will be grouped together in the same concepts, indicating the relation of the bug reports to the revisions. The combined context gives a more complete overview of the project activities.

Note that the objects in a combined context are a union of revisions and issue IDs; this is different to the construction of the change-based contexts where the objects are pairs of revisions and files. The combined context’s attributes are the union of the original attributes for both the revisions and the issues, and each object keeps its own attributes. We merge corresponding attribute categories from the data sources, e.g., log messages and issue descriptions. This assumes that words have the same meaning in the different archives, but in return it provides us with implicit links between bugs and revisions that both talk about a specific topic (e.g., “Linux”), because their log messages and descriptions share a common attribute. The issues and revisions are therefore connected automatically, without the need to create any links, as for example described by Śliwerski et al. [29]. However, for a data source such as GitHub, which stores explicit references between commits and issues, we are able to link these in the context table by using a “surrogate key” attribute which we assign to both the revision object and the issue object in the context table. A surrogate key is therefore an additional attribute which serves exclusively to indicate an explicit link between the revision and the issue in the concept lattice. Section 5.2 provides examples of tag clouds generated from Git repositories and issues in the GitHub issue-tracking
Fig. 3. JUnit: vacation cloud for David Saff constructed from a change-based context. Main tag cloud view (top). Changes by Alex Yursha (bottom left) and Kevin Cooney
(bottom right). Alex Yursha and Kevin Cooney are selected with sticky tags. Only tags with occurrence greater than two are shown.
system. Combined contexts can be used to visualize which files have been changed when a bug has been fixed as well as showing the project activity both in terms of commits and issue reports.

3.3. Tag clouds from concepts

We visualize repository data with a tag cloud that we construct from the focus concept in the lattice. Since a concept comprises a set of objects and a set of attributes, it is tempting to use the attributes (i.e., the intent) as the tag cloud. However, this produces degraded clouds because (i) the intent only contains the attributes common to all objects, and (ii) each attribute only occurs once so that all tags would have the same size. Instead, we use the intents of the extents; more precisely, we collect all attributes of the defining concept of each object in the extent of the focus concept; we also add the objects themselves, to allow their direct selection in the tag cloud.

Definition 12. The tag cloud from a concept $c = (O, A) \in \mathfrak{B}(C)$ is defined as

$$\tau(c) = O \uplus \biguplus_{o \in O} \pi_A(\sigma(o)).$$

Here $\uplus$ denotes multiset union. By construction, the objects in the tag cloud induce subconcepts of the concept from which the tag cloud was derived; moreover, all tags have a non-bottom meet with that concept.

3.4. Navigating concept lattices with tag clouds

The browser maintains a focus concept, from which it renders the tag cloud as described above; when the user selects (or deselects) a tag, the browser updates the focus and re-renders the tag cloud. The focus, or more precisely, its extent contains the subset of objects in the repository that share all currently selected tags. The initial focus (corresponding to an empty selection set) is therefore the lattice’s top element, whose extent contains the entire repository (see Fig. 1(i)).

Navigation is refinement-based: when the user selects another tag, the browser updates the focus by computing the meet of that tag’s defining concept and the old focus.

Intuitively, deselection should be the inverse of selection: de-selecting the last selected tag should move the focus back to its previous position. Because of the duality in the concept lattice, we would expect the de-selection operation to be implemented by the join in the lattice. However, using the join operation to de-select an attribute a would move the focus up in the lattice and effectively de-select all other currently selected attributes except a, which leads to counterintuitive results. We must therefore recompute the focus as the meet of the defining concepts of the remaining selected tags, in order to provide a de-selection operation which is the inverse of the selection operation.

3.5. Relation to information retrieval

Our lattice-based browsing approach is related to classical information retrieval (IR) [30,31]. The context table can be seen as a Boolean version of the document-term matrix, while the concept lattice can be seen as a representation of the usual indexes. A concept in the lattice contains, for each document in its extent, the set of terms that occur in the document in its intent. For each term the set of objects in its introducing concept is its inverted index entry. If we see the selected tags as a conjunctive query, then the focus’ extent is the query’s result.

The tag cloud can also be seen as the aggregation of the Boolean term frequencies for each document in the query result, scaled according to the size of the document collection. The concept lattice provides us with an efficient way to compute this tag cloud; a computation from only the inverted index would be impractically inefficient: we would first need to retrieve all documents indexed by the selected tags, then iterate over the entire vocabulary and compute the size of the intersection of each term’s inverted index with the query’s result. Hence, any efficient IR-based implementation must use the same information in essentially the same way as our lattice-based implementation. However,
Please cite this article as: G.J. Greene et al., Visualizing and exploring software version control repositories using interactive tag clouds
over formal concept lattices, Information and Software Technology (2016), http://dx.doi.org/10.1016/j.infsof.2016.12.001
JID: INFSOF
ARTICLE IN PRESS [m5G;December 12, 2016;20:48]
G.J. Greene et al. / Information and Software Technology 000 (2016) 1–19 7
we can exploit the lattice structure, e.g., to update the focus incre- and structure to the tag clouds. By selecting a tag in the tag cloud
mentally, or to show which other tags are implied by (i.e. always the resulting cloud will provide contextual information for the cur-
occur along with) the current selection set. rently selected tag.
The initial tag cloud shown in ConceptCloud includes tags from
4. ConceptCloud browser all attributes and objects in the context table (using the top con-
cept in the lattice as the focus). This allows the user to select any
We have implemented our approach in the ConceptCloud tag from the extracted repository information. Tags in the initial
browser. The VISSOFT 2015 evaluated artifact [17] is available at tag cloud will be at their largest size because we scale all tags ac-
vissoft15.conceptcloud.org/ and the continuously updated web ap- cording the maximum and minimum tags in this cloud. Making
plication is available at www.conceptcloud.org. Our browser can selections in the initial tag cloud will result in clouds with smaller
automatically index Git and SVN repositories and create tag cloud tags (cf. Fig. 1), indicating that the cloud is only showing attribute
visualizations from them. It also supports more advanced pre- tags from a subset of the total objects in the context table.
processing and interface customizations. By construction, the objects in the tag cloud induce subconcepts
ConceptCloud comprises three main components that extract of the concept from which the tag cloud was derived; moreover, all
meta-data from the revision control system, construct a context ta- tags have a non-bottom meet with that concept.
ble in the desired format, and display the tag cloud of the resulting
lattice. ConceptCloud automates the process of creating a tag cloud Proposition 13. Let c ∈ B (O, A, I ) be a concept, o ∈ O, and t ∈ O ∪
visualization from a version control archive and its user interface A. Then (i) o ∈ τ (c)⇒σ (o) ≤ c, and (ii) t ∈ τ (c)⇒δ (t)∧c = ⊥.
supports customization of the tag clouds. The browser is generic Since the tag clouds can be very large we provide functional-
and can show tag clouds of different context types. It is also com- ity in the interface to limit clouds to one particular category (e.g.,
pletely automatic: there are no manual pre-processing steps, and commit authors), or to remove unwanted categories from them.
the user only needs to enter the URL of the repository. A more de- The cloud can also be adjusted to show only a certain number of
tailed description of the tool architecture and usage is available in tags or to show only tags that occur more than a given number of
[32]. times. Since all the tags are textual, users are also able to search
ConceptCloud currently supports extraction of meta-data and in the tag cloud to find a tag if they already know which tag they
construction of context tables from SVN [19] and Git [33] reposito- want to select (such as their commit name).
ries, both locally and remotely. For Git repositories, the hashes are Customized visualizations can be created from the initial tag
converted into sequential revision numbers. Both extractors sup- cloud by selecting relevant tags and by moving categories of tags
port the revision-, file-, and change-based contexts, as described in into separate viewers. For example, Fig. 2 shows a view of the year,
Section 3.2. The construction of change-based contexts requires the filename and author clouds for the JUnit repository where the file-
identification of methods changed in consecutive versions, which name tag AllTests.java has been selected. The visualization shows
requires the extraction to be language-aware. Such contexts are in which years this file has been changed, who has changed this
currently limited to Java files. The generated context tables can be file and what other files are often changed in the same commit as
saved in XML format so that they can be loaded again without ex- this one, scaled according to how often they are changed together.
traction. Fig. 2 allows us to answer questions such as “Who has changed
For the lattice construction, we use a method based on the Col- this file?” (i.e., expertise) , “Is this file still under development?”
ibri/Java library [34] which constructs concepts on the fly. We thus and “What other files should I be looking at if I want to change
never need to compute the full lattice and are able to render an this file?” (i.e., co-changed files).
initial tag cloud relatively quickly. Viewers can also be opened with a “sticky” tag that always re-
mains selected and cannot be deselected. This enables us to open
4.1. Tag cloud interface multiple parallel viewers with different tag selections in the same
category (such as months, cf. Fig. 4) which update simultaneously
We make use of a tag cloud visualization that can be cus- when another tag is selected in any viewer. Sticky tags therefore
tomized to show different views on the repository. Multiple dif- enable us to show mutually exclusive views in two tag clouds next
ferent visualizations for different metrics were found to confuse to each other.
users [35]. We therefore propose one uniform visualization that A tag is implied if it has not been selected explicitly, but corre-
can be used to explore various different aspects of a version con- sponds to an attribute in the focus’ intent. Implied tags reveal the
trol archive. repository’s internal structure, similar to the way association rules
The simplest and most popular tag cloud layout [36] is as an reveal the implicit structure of shopping baskets [39] but without
alphabetically sorted list of tags in a roughly rectangular shape any additional cost.
which was found by Schrammel et al. to perform better than ran-
dom or semantic layouts [37]; we use this layout because it sim- 4.2. Advanced visualization in ConceptCloud
plifies textual search within the tag cloud. We scale each tag i be-
tween the given minimum and maximum font sizes fmin and fmax , In addition to the interface customizations that can be per-
according to its weight ti in relation to the minimum and maxi- formed on the tag cloud there are also two customizations that
mum weights in the context table, tmin and tmax ; hence, can be performed during construction, namely personalization and
filtering. A combination of these two customizations allows us to
( fmax − fmin ) · (ti − tmin )
size(i ) = + fmin − 1 produce a “vacation cloud” as described in Section 4.2.3 below.
tmax − tmin
ConceptCloud also supports a number of advanced visualiza-
for ti > tmin and size(i ) = fmin otherwise. tions such as customizing a specific tag cloud or using a scripting
A variety of alternative tag layout methods have been proposed, language to automatically layout the ConceptCloud interface.
such as tag flakes by Caro et al. [38]. Tag flakes are used in order
to provide context for tags as basic tag clouds fail to show how 4.2.1. Personalization in tag clouds
the tags are related [38]. However, instead of using a more com- We can personalize a tag cloud for a particular developer by
plex visualization that depicts the relationships between the tags, identifying all tags that apply to that developer (e.g., files they have
we use incremental refinement in the tag cloud to provide context changed) in our pre-processing step. We then assign these tags
Fig. 4. JUnit: author clouds (top); changes to TestRunner.java (bottom). Tag Clouds constructed from a file-based context and months/files are selected as sticky tags.
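The tag cloud of Definition 12, the focus extent of Section 3.4 and the implied tags of Section 4.1 can be illustrated with a minimal sketch. It is not the ConceptCloud implementation: it works directly on a small, hypothetical context table instead of an incrementally constructed lattice, it assumes that $\sigma(o)$ is the object concept of $o$ (so that its intent is exactly the attribute set of $o$), and it only allows attribute tags to be selected. The names `CONTEXT`, `extent`, `tag_cloud` and `implied_tags` are invented for this illustration.

```python
from collections import Counter

# Hypothetical revision-based context: each object (revision) maps to its attributes.
CONTEXT = {
    "r1": {"author:egamma", "year:2000", "file:TestRunner.java"},
    "r2": {"author:egamma", "year:2001", "file:TestRunner.java"},
    "r3": {"author:kbeck", "year:2001", "file:AllTests.java"},
    "r4": {"author:kbeck", "year:2001", "file:TestRunner.java"},
}


def extent(context, selected):
    """Objects that carry every selected tag (the result of the conjunctive query)."""
    return {o for o, attrs in context.items() if selected <= attrs}


def tag_cloud(context, selected):
    """Multiset of tags for the current focus: the objects plus all their attributes."""
    objs = extent(context, selected)
    cloud = Counter(objs)             # every object in the extent occurs once
    for o in objs:
        cloud.update(context[o])      # multiset union of the objects' attribute sets
    return cloud


def implied_tags(context, selected):
    """Unselected attributes shared by all objects in the focus extent."""
    objs = extent(context, selected)
    if not objs:
        return set()
    common = set.intersection(*(context[o] for o in objs))
    return common - selected
```

With an empty selection the extent is the whole table, which corresponds to the top concept. Selecting `{"year:2001"}` narrows the extent to `r2`, `r3` and `r4`, and the weight of `author:kbeck` in the resulting cloud is 2. Selecting `{"author:egamma"}` makes `file:TestRunner.java` an implied tag, because every revision in that extent touches the file.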
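The font scaling rule of Section 4.1 can be written down directly. The sketch below follows the formula as printed and returns a fractional size; any rounding to whole font sizes, and the guard for the degenerate case where all weights are equal, are assumptions made here for illustration. The function name `tag_size` is hypothetical.

```python
def tag_size(t_i, t_min, t_max, f_min, f_max):
    """Font size of a tag with weight t_i, scaled between f_min and f_max."""
    # Tags at the minimum weight get the minimum font size. The second
    # condition (all weights equal) is an added guard against division by zero.
    if t_i <= t_min or t_max == t_min:
        return f_min
    return (f_max - f_min) * (t_i - t_min) / (t_max - t_min) + f_min - 1
```

For example, with weights between 0 and 10 and font sizes between 10 and 30, a tag of weight 5 gets size 19.0 and a tag of weight 10 gets size 29.0, so under the printed formula the heaviest tag stays one unit below the maximum font size.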
Listing 2. ConSL script for generating author by month view of the JUnit repository (Fig. 4).

5. Illustrative application examples

We apply our ConceptCloud browser to two open source repositories and one industrial application to demonstrate the insights that can be obtained using the browser. We repeat and expand on a previous case study on the JUnit repository in Section 5.1 to highlight the flexibility of our browser. We also show how the browser can be used to explore both combined version control and issue data simultaneously using the RubyGems repository in Section 5.2. We have also applied our browser to generate insights from a small industrial project (see Section 5.3) in order to evaluate the appropriateness of the insights that can be gathered with ConceptCloud.

5.1. JUnit repository

JUnit is a popular open-source testing framework for Java which has been used in previous studies [40,41]. Here we repeat Weissgerber’s study [40], which investigates developer roles up until 2006, and extend it to a more current date. We show that we can easily extend the previous observations on the repository through our interface even though our interface was not specialized only to identify collaboration patterns. We also show that we can make the same observations using our ConceptCloud browser as the customized visualizations for each aspect presented in [40]. We created the revision-based context for the JUnit project from its first revision on 03/12/2000 up until 26/02/2014 (1772 revisions).

Overview: in order to get an initial view of the project we open a commit time view and restrict it to years. This shows that project activity increases dramatically from the first full year in 2001 until 2007 and remains relatively steady thereafter. Selecting the year tag 2000 in the full cloud shows us that developer egamma started the project in December 2000. In an author cloud for the first full year of development (2001) we see that developers kbeck and emeade join the project in 2001 but egamma remains the most prolific author in that year (cf. [40]).

Authors by month: Weissgerber et al. [40] look specifically at the file changes made in the months March to June 2002. To repeat this we open viewers with “sticky tags” for March, April, and June 2002 (there was no commit in May 2002) and limit these to show only authors (cf. Fig. 4, top). Selecting an author tag shows us which files the author has worked on in each month. Fig. 4 shows that kbeck contributes less and less in the given period. The cloud for June 2002 shows the addition of developers vbossica and clarkware to the project.

Selecting the file TestRunner.java shows that egamma and kbeck have changed this file in April 2002 and only egamma has made changes in June 2002 (cf. bottom of Fig. 4). We also see that there is a group of files which have been changed at the same time.

Differences in visualizations: while the visualization presented by Weissgerber et al. [40] is an author file graph which shows for each author lines connecting the author to a specific file (which is represented as a dot in the graph), our visualization shows the different tag sizes for the developers according to their amount of contribution. In the author file graph visualization the number of nodes connected to a developer can be used to assess the amount of their activity, whereas in the tag cloud their tag size directly corresponds to the amount of activity. Additionally, by selecting author names in the tag cloud the names of the corresponding files that these authors have been changing will be shown in the tag cloud. It is unclear how the author file graph presents the names of the files which have been changed. The author file graph [40] also allows the identification of developer collaboration: if two developers are linked to the same node they have collaborated on a file. In our tag cloud view from a file-based context the selection of a particular author would update all other author tags to show only authors that have been collaborating with the selected author in a size that represents the amount of collaboration. Therefore, in our tag cloud view the identification of collaboration is interactive and it is also scalable, since the tags for all collaborating developers can be shown at the same time. Using the sticky tag function, comparisons between different groups of collaborators can also be easily drawn, by comparing the tag clouds. The file author matrix [40] shows a grid-like summary of which developers have been working on which files across the project, where each pixel color indicates the amount of activity on a file. In our tag cloud visualization files can be selected to see which authors have been working most actively on a file and authors can also be selected to indicate on which files they have been working. A summary view across a group of developers (or files) can be created by making sticky tag viewers for the group of developers and comparing the tag clouds created.

Conclusions: ConceptCloud allowed us to gather the same insights as the dedicated tool presented by Weissgerber et al. [40]. However, ConceptCloud does not produce a static picture but allows the user to refine the analysis, and access the other information (e.g., log messages) that remains available.

5.2. RubyGems repository

We constructed the combined context for commits and issues from the RubyGems GitHub repository [42] to show how we can combine issue and repository data in the same tag cloud. The GitHub issue tracking system provides links between issues and commits that either close an issue or reference it. We extract these links, using the GitHub API, to create explicit links between issues and commits in our tag cloud, but we also extract keywords from the issues and commit messages and use these to create implicit links between issues and commits that discuss the same topics. For other issue tracking systems that do not include explicit links between issues and commits we would still be able to extract implicit links from the commit messages.

Linked issues and commits appear in the same tag cloud, showing which files have been changed in order to close an issue. For example, Fig. 5 shows the tag cloud containing information for issue 227 which was closed by commit 3642. We can immediately see that files rubygems.rb and specification.rb were fixed in relation to the bug reported about inactive gems. We see here tags #227 as well as tag 227, where #227 represents the issue object and 227 is part of the commit message for commit 3642. We see also tags r3642 and 3642, where 3642 represents the revision object and r3642 is used as a link between both the revision and
Fig. 6. RubyGems: main changed files, committers and bug reporters from commits and issues mentioning Gem Install. Tag clouds constructed from a combined context of
GitHub issues and commits.
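The idea of implicit links in a combined context (Section 5.2) can be sketched as follows: issues and commits become objects of one context table, their keywords become attributes, and an issue and a commit are implicitly linked when they share a keyword. This is only an illustration under simplifying assumptions. The tokenizer, the stop-word list, the object names (`issue:…`, `commit:…`) and the sample texts are all hypothetical and not taken from ConceptCloud or from the RubyGems data.

```python
import re

# Small, hypothetical stop-word list; a real extractor would use a richer one.
STOPWORDS = {"the", "a", "an", "in", "of", "to", "for", "and", "is", "are", "on"}


def keywords(text):
    """Lower-cased word tokens of a message, minus the stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def combined_context(issues, commits):
    """One context table whose objects are both issues and commits."""
    context = {}
    for number, title in issues.items():
        context[f"issue:{number}"] = keywords(title)
    for revision, message in commits.items():
        context[f"commit:{revision}"] = keywords(message)
    return context


def implicit_links(context):
    """Map each (issue, commit) pair to the keywords that both objects share."""
    issues = [o for o in context if o.startswith("issue:")]
    commits = [o for o in context if o.startswith("commit:")]
    return {
        (i, c): context[i] & context[c]
        for i in issues
        for c in commits
        if context[i] & context[c]
    }


# Hypothetical sample data, for illustration only.
ISSUES = {227: "Inactive gems are loaded"}
COMMITS = {3642: "Fix loading of inactive gems", 3650: "Update documentation"}
```

Here `implicit_links(combined_context(ISSUES, COMMITS))` links `issue:227` to `commit:3642` through the shared keywords `inactive` and `gems`, while the unrelated commit stays unlinked. In a tag cloud over such a context, selecting a keyword tag would bring both kinds of objects into the focus extent.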
Fig. 9. Industrial application study: collaboration with developer LS. Tag cloud built from a file-based context.
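The collaboration view obtained from a file-based context can be approximated in a few lines: selecting an author restricts the extent to the files that author has changed, and every other author is then weighted by how many of those files they have also changed. The sketch below is a simplified illustration; the file names and the author-to-file assignment are invented, and only the developer labels are borrowed from the figure captions.

```python
from collections import Counter

# Hypothetical file-based context: each file maps to the authors who changed it.
FILE_AUTHORS = {
    "core/Engine.java": {"LS", "SM"},
    "core/Parser.java": {"LS", "SM", "P9"},
    "ui/Window.java": {"AV"},
    "ui/Dialog.java": {"AV", "LS"},
}


def collaborators(file_authors, author):
    """Weight each other author by the number of files shared with `author`."""
    counts = Counter()
    for authors in file_authors.values():
        if author in authors:                  # file is in the selected author's extent
            counts.update(authors - {author})  # count every co-author of that file
    return counts
```

For this sample table, `collaborators(FILE_AUTHORS, "LS")` gives SM a weight of 2 and P9 and AV a weight of 1 each, which is the information the author tags would convey through their sizes after LS has been selected.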
Fig. 10. Industrial application study: directories and collaboration of developers (a) LS, SM and P9 and (b) AV. Tag cloud built from a file-based context.
prising to them. However, we have seen that there are many valu-
able insights, such as team collaboration, areas of expertise and
activities, contained in the version control repository. Using Con-
ceptCloud we were able to gather these insights which would be
very valuable for new developers starting on the team and teams
in which the collaboration patterns or activities are not obvious to
the project manager. We could identify the different roles of de-
velopers in the team by examining the directory structure of the
files they committed, which indicated what parts of the system
members were working on. We were also able to identify when
developers joined and left the team and how the different team
members collaborate.
Fig. 11. Industrial application study: weekdays of developer commits. (left) tags sized according to number of commits (right) tags sized according to number of files changed.

5.4. Conclusions
Table 1
Metrics for revision-based contexts.
Xeon 8-core 2.0 GHz CPUs to analyze several Git and SVN repositories in order to evaluate its performance.

We created revision-based contexts (using local clones of Git repositories and remotely accessing the SVN repositories). Table 1 summarizes the characteristics of and runtimes for these repositories, showing the number of revisions |O|, the number of attributes |A|, and the size of the incidence relation (i.e., the number of object/attribute pairs) |I|, as well as the time to create the context table (i.e., indexing) and to draw the repository’s full tag cloud.

We see that the indexing times (including the extraction of all of the log information for the repositories) are only a few seconds for smaller repositories, and a few minutes for medium-sized ones; even the largest repository with 155,627 revisions requires only 34 min. Note that these times are not directly related to either the size or the density (i.e., |I|) of the context tables but are to a large extent determined by the (lexical) pre-processing.

The initial cloud creation times are given for the full tag cloud for the repository, which contains |O| + |A| tags. The table thus gives an indication of the cloud computation in the worst case; in practice, we can limit the number of tags shown to substantially improve this. However, the initial tag cloud is cached and so can be generated off-line in a pre-processing step. Subsequent loads of the initial tag cloud from cache are instantaneous.

Tag clouds become smaller with subsequent navigation steps and are therefore created substantially faster. Overall, navigation is instantaneous for small and medium repositories, with some degradation on the initial clouds for very large repositories.

Note that drawing the initial cloud requires us to compute the defining concepts of all objects; however, since we use an incremental lattice construction approach and therefore never compute the full lattice, we do not experience the high runtimes commonly associated with FCA.

To reduce drawing time for larger repositories we could limit the number of tags shown in the initial tag cloud to only those that apply to a larger portion of the revisions in the repository and then show the full tag set when the user has made selections to refine the tag cloud. For large repositories that are indexed repeatedly, our approach allows us to incrementally update the context table (and therefore the concept lattice) so that updates can be performed quickly and the initial indexing needs to be performed only once.

7. User study

We performed a user study in order to evaluate whether untrained users are able to answer questions about the history of a software project using ConceptCloud more or less effectively than with current widely-used interfaces. In particular we compare ConceptCloud to the default list-view of commits as implemented in GitK and the GitHub interface, which is graph-based. Both GitHub and linear list commit views, such as GitK, are widely used in practice and we therefore use these interfaces as the controls for comparison against ConceptCloud. Linear list commit views are implemented in many popular Git GUIs (such as SourceTree¹ and TortoiseGit²), but we use GitK as it is packaged as standard with Git. GitK provides a searchable linear list of commits and shows the diffs between two revisions. GitHub’s interface is widely used in order to visualize the history of a software project and provides graph views of users’ activity in repositories. GitHub also provides a code search interface. GitK, GitHub and ConceptCloud present the same underlying information through different interfaces. We therefore compare the effectiveness of our tag cloud interface to that of a searchable list interface and an interactive graph-based interface. Since the participants in our study had never used our ConceptCloud browser before, we also investigate whether the browser can be used successfully by untrained users.

In this study, we aim to answer the following research questions:

RQ1: is a rich exploratory interface, such as our interactive tag cloud interface, accessible to untrained users?
RQ2: does our interactive tag cloud interface allow users to achieve higher correctness than the familiar linear list view of commits when answering questions about the history of a software project in a set time period?
RQ3: does our interactive tag cloud interface allow users to achieve higher correctness than a graph-based interface, such as the one provided by GitHub, when answering questions about the history of a software project in a set time period?

¹ https://www.sourcetreeapp.com/
² https://tortoisegit.org/
Table 2
Question set for user study: (a) RubyGems, (b) Backbone, (c) Retrofit.

(a) RubyGems:
1  Who is the contributor with the most commits on the RubyGems project?
2  In which year were the most commits made to the project?
3  Which file types has Charlie Somerville changed in his commits?
4  Which contributors have worked on the file lib/rubygems/psych_additions.rb?
5  Who has been making the most changes on the project since Samuel E. Giddins last worked on it?
6  When was this repository created?

(b) Backbone:
1  In which month was the most activity on the project?
2  Who was the most active developer in this month?
3  Who is the most prolific author of the backbone/test directory?
4  Who was the last person to change the file backbone.js?
5  Which file has been changed the most in this project?
6  Who has made the most changes to the images in the project (jpg, png)?
7  Who has changed the most files that Brad Dunbar has also changed?

(c) Retrofit:
1  Where are the tests for the main project located?
2  Who has edited the .yml files?
3  Who contributed the most to this project in its first year?
4  Who has worked on JacksonConverter.java?
5  Who merged pull request #1017?

Table 3
Descriptive statistics for average percentages obtained with each of the three tools across all questions.

         GitK    ConceptCloud    GitHub
Mean     0.52    0.71            0.67
Sd       0.21    0.10            0.10
Min.     0.27    0.55            0.53
Max.     0.84    0.90            0.85
Range    0.57    0.36            0.32

7.1. Experimental setup

We used a between-subjects design to conduct the experiment, where each participant uses only one of the three tools to answer questions about the software development process in specified projects. We constructed three question sets, based on three different software repositories that were also available on GitHub. All participants were asked to answer three question sets using a tool (GitK, GitHub or ConceptCloud) which was randomly assigned to them. We then evaluated the correctness of the answers supplied by the participants. Each participant was supplied with a user manual, detailing how their tool showed the history of software projects. We marked all of the question answers that were submitted by the participants and calculated their results. We investigate the hypothesis that there is no difference between the correctness results obtained by the participants over all three tools.

Our user study took place in a computer lab at Stellenbosch University. All participants took part at the same time to avoid communication about the tasks. Participants were not permitted to communicate during the study.

Resources for the user study, including question sets, sample solutions and the versions of the repositories used, are available at www.conceptcloud.org/userstudy15.

7.2. Population

We performed our user study with students in our third year Software Engineering class of 2015. Previous courses required the students to submit assignments using Git repositories, so all were familiar with Git. The participating group consisted of 47 students in total. Participation was voluntary for all students.

7.3. Tasks

We developed three question sets using three different repositories available on GitHub, namely RubyGems [42], Backbone [45] and Retrofit [46] (see Table 2). We selected these repositories as they are popular projects available on GitHub, and they differ in size. At the time of the user study the RubyGems repository was the largest with 6388 revisions, Backbone consisted of 3130 revisions and Retrofit had 998 revisions. We used repositories of different sizes so that the results of our study would not be biased towards one repository size.

The question sets were developed by exploring the repositories equally using GitK [47], GitHub [27] and ConceptCloud. Question sets included questions about the location of files, collaboration of users, expertise of the contributors as well as the history of the projects. The question answers were then verified using all three tools to make sure that the results were consistent. All questions were weighted equally. We used all three tools to generate the question sets because the different tools have different strengths and weaknesses and using only one tool would have made the questions easier to answer for the participants assigned to a specific tool.

Participants were given 15 min to answer each question set (6, 7 and 5 questions respectively), after which they were given the next question set and corresponding repository. Participants were made aware of this time limit at the beginning of the user study and before each new question set was started. Participants were asked to answer as many questions as they could in the time provided and to move on from a question when they were unable to answer it.

7.4. Analysis and results

We used R for the analysis of the experimental results. We performed the Shapiro and Wilk [48] test to determine whether participants’ scores were normally distributed, in order to determine what further analysis could be performed. We obtained a p-value of 0.06; at a significance level of 0.05 we cannot reject the null hypothesis of normality and therefore treat the data as normally distributed.

7.4.1. Summary statistics

Fig. 13 shows a summary of average correctness percentages achieved by participants for each question set, in the order that the question sets were answered (RubyGems, Backbone and Retrofit). In the first question set users of GitHub performed the best, and for the second and third question sets users of ConceptCloud performed the best. Fig. 14 shows a box-and-whisker plot for the average scores obtained across all questions for each tool. We see that the median as well as the minimum value for participants using ConceptCloud is the highest, followed by GitHub and then GitK. The range of results of participants using GitK is the highest, with some participants achieving high averages and others achieving much lower results than those using either GitHub or ConceptCloud. Fig. 15 shows the box-and-whisker diagrams for the percentages obtained across each of the question sets for each of the tools. Participants using ConceptCloud achieved higher median percentages for each new question set, which indicates there might have been some learning effect observed over the different question sets. However, participants using GitK or GitHub performed worse in the second question set and then again better in the third question set.
Fig. 14. Box and whisker plots for average percentages obtained using ConceptCloud, GitK or GitHub.

Table 4
P-values for Tukey test.

GitHub–ConceptCloud    0.6343916
GitK–ConceptCloud      0.0006546
GitK–GitHub            0.0059474

Fig. 15. Average percentage obtained by participants using (a) GitK, (b) ConceptCloud, (c) GitHub across all three question sets.
for wider applications (e.g., clinical trial data, see Section 8.3). In Section 8.4, we discuss previous applications of formal concept analysis to tasks in software engineering that are most closely related to the goals of our approach, for example, detection of co-changed methods and methods related to a particular bug report.

8.1. Visualizing software and bug repositories

8.1.1. Team structure and developer expertise visualizations

Girba et al. use an “ownership map” visualization [50] in order to identify developer interaction and development patterns using the CVS log of a project. Girba et al. also identify several behavioral patterns of developers, such as teamwork, takeover, and cleaning, and show how these can be identified in their ownership map visualization. These collaboration patterns could also be observed in our tag clouds constructed from a file-based context. While the ownership map visualization serves to provide an overview of the project developer patterns in a single visualization, our tag cloud views are aimed at allowing users to interactively explore the contributions. Therefore, while the collaboration patterns might not be visible in a single tag cloud view, our approach aims to support users in exploring the information at varied levels of detail. The user can then also continue exploring other aspects of the project when they have identified interesting collaboration patterns.

Alonso et al. [51] also use a tag cloud visualization to display information from CVS version control repositories. Their “expertise cloud visualization” creates a tag cloud of committers that are identified using a rule-based classification of CVS log information. Users are then able to select the names in this cloud to display a cloud of the developers’ expertise. The expertise cloud visualization [51] differs from that of ConceptCloud as the different types of information can only be displayed in separate clouds, meaning that the combinations of tags a user can select are limited. In contrast, our underlying concept lattice only limits the available tag selections to tags that will not cause an empty tag cloud to be displayed.

Weissgerber et al. [40] develop a transaction overview visualization, file-author matrix, and author-file graph to allow identification of team structure, developer collaboration, and project activity over a certain time period from data contained in the version control system. Section 5.1 compares these visualization techniques to the tag cloud view provided by ConceptCloud in the context of the JUnit case study.

8.1.2. Co-evolution of production and test code

Zaidman et al. [52] develop a change-history view and a growth-history view to study the co-evolution of production and test code. The change-history view is a plot of the changed files over the revisions of a project’s repository, distinguishing between production and test code. In our tag clouds we can distinguish between production and test code by observing the project’s directory structure.

8.1.3. Centralized data structures and visualizations for multiple software development artifacts

Codebook [28] is a social network inspired toolset to analyze information implicitly contained in software repositories. Its central data structure is a graph, where the nodes represent the artifacts and actors (e.g., change set, developer), and the edges represent the different relations between these (e.g., contains, committer). This graph is built from different sources including revision archives, bulletin boards, mails, and directory information. Direct queries in a specific format can be given to Codebook to answer different types of questions. Results are displayed in a web interface that provides a ranked result list including images of people associated with artifacts. Results from our user study indicate that list-based interfaces do not support exploration tasks effectively. Codebook has been built with the aim of supporting multiple information needs from software development archives. While the Codebook data storage is flexible enough to support users in answering different types of questions, the applications built on top of Codebook are aimed at answering specific questions. With ConceptCloud we aim to have a single application that is flexible enough to support users in answering different types of questions, rather than a centralized data structure that can be used as the base for different applications. However, our context tables can also be seen as a central data structure for storing multiple types of project information.

Hipikat [4] also monitors multiple information sources (Bugzilla, CVS, email, newsgroups) and builds a uniform artifact database. It has a number of heuristics (based on text similarity and activity times) to create links between the artifacts, and provides lists of related artifacts on request. Hipikat queries are made using the Eclipse IDE and results are displayed in a Hipikat list view Eclipse plugin. However, the goal of Hipikat is more to recommend relevant items to project newcomers and not to provide them with an interface through which to explore the artifacts. Cubranic et al. [4] also note that project artifacts are not easily accessible to developers, as searching the archives requires them to know the correct search terms for finding relevant information. In our work we also argue that searching the software development archives does not support all use cases, because to conduct a search the developer already needs to have some information about the archive. In our approach we aim to make the information contained in software development archives accessible to users for interactive exploration so that they can access the information even before they have formulated a direct query. This is a different approach to the recommendations provided in [4] and supports users in exploring the full archives in an unbiased way.

Cubranic et al. [4] also note that while a list-based presentation of results (as used by Hipikat) is very common “when the user’s purpose is exploratory browsing of a collection, such a flat-list presentation does not indicate relationships within the results, only to the query itself.” We propose interactive tag clouds as an alternative view, as they allow users to explore query results in an aggregated form and support users in further filtering the results and identifying relationships between them.

Information fragments [53] provide answers to developers’ questions by combining subsets of relevant project information. Information fragments are composed of nodes of different types, such as a team member or work item. Node types are similar to tag categories in ConceptCloud. The presentation of results uses an Eclipse plugin and supports a counting feature to get an overview of the number of occurrences of nodes, for example, the number of items a developer has been working on. Our tag cloud automatically gives the user an overview of the number of occurrences of each item, as the tags are sized according to occurrence frequency. The information fragments prototype is aimed at answering specific questions that developers ask on a day-to-day basis and not at allowing exploration of the underlying archives. While our approach can be used to answer the questions identified by Fritz and Murphy [53], it is specifically aimed at supporting exploration of the underlying archives even for users who have not yet formulated a direct query. While the list-based interface presented in the information fragments prototype groups items together, to show for example which developers have been working on a section of the code, our tag clouds present this type of information through navigation, where the user can select the relevant file or directory and observe the developers that have made changes to it. The “queries” that can be composed through our tag cloud interface are also more flexible in that different
kinds of information (e.g., years, files and developers) can be selected at the same time.

8.2. Tag cloud visualizations of software

There have been applications of tag cloud visualizations directly to software for different purposes.

Guido [54] includes a tag cloud to visualize names of types, variables, parameters and methods in source code. Selecting nodes in the graph visualization that Guido also provides will highlight the corresponding tags in the tag cloud, and selecting a name in the tag cloud will highlight corresponding source code elements in the graph view. The visualizations are linked in Guido similarly to the multiple tag clouds that update simultaneously in ConceptCloud. Anslow et al. [55] use a tag cloud to visualize the structure of Java class names. Emerson et al. use tag clouds to visualize Java methods and explore several different tag cloud layouts using the TAGGLE tool [56]. TAGGLE extends basic tag cloud views and allows highlighters to be associated with tags so that if a tag is selected, related tags in the cloud will be highlighted. Tag clouds in TAGGLE are customizable, as they are in ConceptCloud, with TAGGLE additionally allowing tag layouts to be changed.

8.3. Tag clouds and navigation

Mesnage and Carman [11] use a Bayesian approach for navigation in tag clouds that allows tags related to one or more selected tags to be shown in the cloud, where previously clouds could only be created for one selected tag. Gwizdka and Bakelaar [57] look at displaying a tag cloud history, which allows users to keep track of their previous navigation steps, when clouds are used for pivot navigation. This approach is not directly applicable to our tag clouds since we use refinement navigation where multiple tags can be selected. Hernandez et al. [58] use multiple linked tag clouds to browse semi-structured clinical trial data. These tag clouds are generated from the results of an initial search query and each represents one facet (e.g., medical condition) of the data. A multi-faceted view can also be created in ConceptCloud by moving tag categories into separate tag clouds.

8.4. Software and formal concept analysis

Poshyvanyk and Marcus [59] use a combination of latent semantic indexing and concept lattices to find methods that are relevant to a bug report. Girba et al. [60] use concept analysis to detect co-change patterns in revision control systems. Objects are packages, classes, or methods, while properties are the validity of expressions over certain metrics of the objects (e.g., number of classes, methods, or statements); the specific expression is determined by which co-change pattern is to be detected. Similar ideas could be integrated into our approach.

There have also been direct applications of formal concept analysis to source code analysis and re-engineering [61,62], but these only consider an individual program, not a repository.

9. Conclusions and future work

In this paper, we have developed an interactive browser for revision control archives. We use a novel combination of concept lattices and tag clouds to make the information implicitly contained in repositories accessible to users. Our browser can thus be used to answer many difficult questions such as “What has happened in this project while I was away?”, “Which developers collaborate?”, or “What are the co-changed methods?”

By changing the type of objects in the context table (e.g., revisions, files, etc.), we are able to provide complementary views on the same underlying data and observe collaboration patterns of the developers. By using changes (i.e., revision-file pairs) as objects, we are able to easily identify the co-changed methods in a project. Additionally, our context tables can be used as a centralized data structure for multiple sources of information, such as version control data and bug reports.

Our tag clouds provide a visualization in which version control data can be aggregated and explored interactively to support developers in tasks such as keeping up with project changes. Our interface is customizable through the use of a scripting language, which can be used to repeatedly access a constructed view on the dataset. Our interactive visualization supports users in exploratory search tasks when they have no previous knowledge of a project.

We have used the ConceptCloud browser to repeat a previous case study [40] and to make observations about the internal structure of a small commercial development project. We have also performed a user study to determine the usability of ConceptCloud and to compare its effectiveness in allowing users to answer historical questions about a project to that of other existing information representations. Through our user study we conclude that untrained users are able to make use of our ConceptCloud browser to answer questions about the history of a software project.

In the future, we plan to conduct an additional user study that compares our ConceptCloud browser to other tools mentioned in related work (which index repositories as well as additional information sources such as email archives) to determine how the different tools perform in both search and exploratory search tasks. We are currently working on building a generic framework from our ConceptCloud browser so that this can be used to visualize a variety of semi-structured data archives (such as academic paper data) [63]. We are also applying ConceptCloud in a different domain and conducting another user study in which we specifically evaluate the learning effects present when using the tool.

Acknowledgments

This research is funded in part by a STIAS Doctoral Scholarship, NRF Grant 93582, CAIR, and the MIH Media Lab.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at 10.1016/j.infsof.2016.12.001.

References

[1] J. Sillito, G.C. Murphy, K. De Volder, Questions programmers ask during software evolution tasks, in: Proceedings of the SIGSOFT ’06/FSE-14 International Symposium on Foundations of Software Engineering, 2006, pp. 23–34.
[2] M. Codoban, S. Srinivasa Ragavan, D. Dig, B. Bailey, Software history under the lens: a study on why and how developers examine it, in: Proceedings of the International Conference on Software Maintenance and Evolution (ICSME), 2015.
[3] S.E. Sim, R.C. Holt, The ramp-up problem in software projects: a case study of how software immigrants naturalize, in: Proceedings of the International Conference on Software Engineering (ICSE), IEEE, 1998, pp. 361–370.
[4] D. Cubranic, G.C. Murphy, J. Singer, K.S. Booth, Hipikat: a project memory for software development, IEEE Trans. Softw. Eng. 31 (6) (2005) 446–465.
[5] R. Wille, Restructuring lattice theory: an approach based on hierarchies of concepts, in: Ordered Sets, Reidel, 1982, pp. 445–470.
[6] R.W. White, R.A. Roth, Exploratory search: beyond the query-response paradigm, Synth. Lect. Inf. Concept. Retr. Serv. 1 (1) (2009) 1–98.
[7] G. Marchionini, Exploratory search: from finding to understanding, Commun. ACM 49 (4) (2006) 41–46.
[8] H. Kagdi, M.L. Collard, J.I. Maletic, A survey and taxonomy of approaches for mining software repositories in the context of software evolution, J. Softw. Maint. Evol. Res. Pract. 19 (2) (2007) 77–131.
[9] J. Sinclair, M. Cardew-Hall, The folksonomy tag cloud: when is it useful? J. Inf. Sci. 34 (1) (2008) 15–29.
[10] Linux github repository, (https://github.com/torvalds/linux).
[11] C.S. Mesnage, M.J. Carman, Tag navigation, in: Proceedings of the SoSEA 2nd International Workshop on Social Software Engineering and Applications, ACM, 2009, pp. 29–32.
[12] B. Ganter, R. Wille, Formal Concept Analysis - Mathematical Foundations, Springer, Berlin, 1999.
[13] B.A. Davey, H.A. Priestley, Introduction to Lattices and Order, 2nd. ed., Cambridge University Press, Cambridge, 2002.
[14] B. Fischer, Specification-based browsing of software component libraries, Autom. Softw. Eng. (ASE) 7 (2) (2000) 179–200.
[15] C. Lindig, Concept-based component retrieval, in: Proceedings of IJCAI, 1995, pp. 21–25.
[16] C. Carpineto, G. Romano, A lattice conceptual clustering system and its application to browsing retrieval, Mach. Learn. 24 (2) (1996) 95–122.
[17] G.J. Greene, B. Fischer, Interactive tag cloud visualization of software version control repositories, in: Proceedings of the IEEE 3rd Working Conference on Software Visualization (VISSOFT), IEEE, 2015, pp. 56–65.
[18] A. Hindle, D.M. Germán, SCQL: a formal model and a query language for source control repositories, ACM SIGSOFT Softw. Eng. Notes 30 (4) (2005) 1–5.
[19] C.M. Pilato, B. Collins-Sussman, B.W. Fitzpatrick, Version Control with Subversion - the Standard in Open Source Version Control, O’Reilly Media, Inc, Sebastopol, California, 2008.
[20] J. Vesperman, Essential CVS, O’Reilly Media, Inc., Sebastopol, California, 2006.
[21] C. Lindig, Fast concept analysis, Work. Concept. Struct. Contrib. ICCS 2000 (2000) 152–161.
[22] M.F. Porter, An algorithm for suffix stripping, Prog. Electron. Lib. Inf. Syst. 14 (3) (1980) 130–137.
[23] R. Navigli, Word sense disambiguation: a survey, ACM Comput. Surv. 41 (2) (2009) 10:1–10:69.
[24] G. Robles, J.M. Gonzalez-Barahona, Developer identification methods for integrated data from various sources, SIGSOFT Softw. Eng. Notes 30 (4) (2005) 1–5.
[25] G.C. Murphy, D. Notkin, Lightweight lexical source model extraction, ACM Trans. Softw. Eng. Methodol. 5 (3) (1996) 262–292.
[26] X. Ren, F. Shah, F. Tip, B.G. Ryder, O. Chesley, Chianti: a tool for change impact analysis of java programs, in: Proceedings of the Object-Oriented Programming, Systems, Languages and Applications, OOPSLA, 2004, pp. 432–448.
[27] Github, (http://github.com).
[28] A. Begel, Y.P. Khoo, T. Zimmermann, Codebook: discovering and exploiting relationships in software repositories, in: Proceedings of the International Conference on Software Engineering (ICSE), 2010, pp. 125–134.
[29] J. Śliwerski, T. Zimmermann, A. Zeller, When do changes induce fixes? in: Proceedings of the International Workshop on Mining Software Repositories (MSR), ACM, 2005, pp. 1–5.
[30] C. Van Rijsbergen, Information Retrieval.
[31] C. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval.
[32] G.J. Greene, B. Fischer, Conceptcloud: a tagcloud browser for software archives, in: Proceedings of the ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), 2014, pp. 759–762.
[33] J. Loeliger, M. McCullough, Version Control with Git: Powerful Tools and Techniques for Collaborative Software Development, O’Reilly Media, Inc., Sebastopol, California, 2012.
[34] D.N. Götzmann, Colibri/java, 2007, (http://code.google.com/p/colibri-java/).
[35] C. Anslow, S. Marshall, J. Noble, R. Biddle, Sourcevis: collaborative software visualization for co-located environments, in: Proceedings of the IEEE Working Conference on Software Visualization (VISSOFT), 2013, pp. 1–10.
[36] S. Lohmann, J. Ziegler, L. Tetzlaff, Comparison of tag cloud layouts: task-related performance and visual exploration, in: Proceedings of the International Conference on Human-Computer Interaction (INTERACT), 2009, pp. 392–404.
[37] J. Schrammel, M. Leitner, M. Tscheligi, Semantically structured tag clouds: an empirical evaluation of clustered presentation approaches, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2009, pp. 2037–2040.
[38] L.D. Caro, K.S. Candan, M.L. Sapino, Navigating within news collections using tag-flakes, J. Vis. Lang. Comput. 22 (2) (2011) 120–139.
[39] M.J. Zaki, M. Ogihara, Theoretical foundations of association rules, in: Proceedings of the 3rd ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 1998.
[40] P. Weissgerber, M. Pohl, M. Burch, Visual data mining in software archives to detect how developers work together, in: Proceedings of the Fourth International Workshop on Mining Software Repositories (MSR), 2007, pp. 9–17.
[41] S. Thummalapenta, T. Xie, Spotweb: detecting framework hotspots and coldspots via mining open source code on the web, in: Proceedings of the International Conference on Automated Software Engineering (ASE), 2008, pp. 327–336.
[42] Rubygems, (https://github.com/rubygems/rubygems).
[43] M. Goeminne, T. Mens, A comparison of identity merge algorithms for software repositories, Sci. Comput. Program. 78 (8) (2013) 971–986.
[44] Bus factor, (http://deviq.com/bus-factor/).
[45] Backbone, (https://github.com/jashkenas/backbone).
[46] Retrofit, (https://github.com/square/retrofit).
[47] Gitk, (https://git-scm.com/docs/gitk).
[48] S.S. Shapiro, M.B. Wilk, An analysis of variance test for normality (complete samples), Biometrika 52 (3/4) (1965) 591–611.
[49] J.W. Tukey, Comparing individual means in the analysis of variance, Biometrics 5 (2) (1949) 99–114.
[50] T. Girba, A. Kuhn, M. Seeberger, S. Ducasse, How developers drive software evolution, in: Proceedings of the International Workshop on Principles of Software Evolution, 2005, pp. 113–122.
[51] O. Alonso, P.T. Devanbu, M. Gertz, Expertise identification and visualization from cvs, in: Proceedings of the International Working Conference on Mining Software Repositories (MSR), 2008, pp. 125–128.
[52] A. Zaidman, B. Van Rompaey, S. Demeyer, A. Van Deursen, Mining software repositories to study co-evolution of production and test code, in: Proceedings of the International Conference on Software Testing, Verification, and Validation, IEEE, 2008, pp. 220–229.
[53] T. Fritz, G.C. Murphy, Using information fragments to answer the questions developers ask, in: Proceedings of the International Conference on Software Engineering (ICSE), ACM, 2010, pp. 175–184.
[54] R. Cottrell, B. Goyette, R. Holmes, R. Walker, J. Denzinger, Compare and contrast: visual exploration of source code examples, in: Proceedings of the International Workshop on Visualizing Software for Understanding and Analysis (VISSOFT), 2009, pp. 29–32.
[55] C. Anslow, J. Noble, S. Marshall, E. Tempero, Visualizing the word structure of java class names, in: Proceedings of the Object-Oriented Programming Systems Languages and Applications (OOPSLA), 2008, pp. 777–778.
[56] J. Emerson, N. Churcher, C. Deaker, From toy to tool: extending tag clouds for software and information visualisation, in: Proceedings of the Australian Software Engineering Conference, 2013, pp. 155–164.
[57] J. Gwizdka, P. Bakelaar, Tag trails: navigation with context and history, in: Proceedings of the CHI’09 Extended Abstracts on Human Factors in Computing Systems, ACM, 2009, pp. 4579–4584.
[58] M.-E. Hernandez, S.M. Falconer, M.-A. Storey, S. Carini, I. Sim, Synchronized tag clouds for exploring semi-structured clinical trial data, in: Proceedings of the Conference of the Center for Advanced Studies on Collaborative Research: Meeting of Minds (CASCON), ACM, 2008, pp. 4:42–4:56.
[59] D. Poshyvanyk, A. Marcus, Combining formal concept analysis with information retrieval for concept location in source code, in: Proceedings of the International Conference on Program Comprehension (ICPC), 2007, pp. 37–48.
[60] T. Gîrba, S. Ducasse, A. Kuhn, R. Marinescu, R. Daniel, Using concept analysis to detect co-change patterns, in: Proceedings of the IWPSE Ninth International Workshop on Principles of Software Evolution: In Conjunction with the 6th ESEC/FSE Joint Meeting, 2007, pp. 83–89.
[61] G. Snelting, Reengineering of configurations based on mathematical concept analysis, ACM Trans. Softw. Eng. Methodol. 5 (2) (1996) 146–189.
[62] G. Snelting, F. Tip, Reengineering class hierarchies using concept analysis, SIGSOFT Softw. Eng. Notes 23 (6) (1998) 99–110.
[63] G.J. Greene, A generic framework for concept-based exploration of semi-structured software engineering data, in: Proceedings of the Automated Software Engineering (ASE), IEEE, 2015, pp. 894–897.