Creating IDEs for the Eclipse Platform
Bruno Dinis Ormonde Medeiros, Student no. 50966
Introdução à Investigação (Introduction to Research) survey
Instituto Superior Técnico, Lisbon, Portugal, http://www.ist.utl.pt
Abstract. Modern IDEs support a set of impressive features, such as code navigation, code assistance, and code refactoring, which greatly enhance the productivity of IDE users. Of these, Eclipse JDT stands out as one of the most advanced open-source IDEs available, and it is based on the Eclipse Platform, an extensible framework for the creation of custom IDEs. This survey explores the issues and techniques concerning the creation of IDEs with rich code manipulation features for the Eclipse Platform. The architecture and the various components that form an IDE are examined, with particular detail given to the data structures that are the basis for IDE functionality. This survey is based on literature pertaining to the topics of Eclipse, IDE creation, and code refactoring tools, as well as on analysis of JDT's source and overall design.
Table of Contents
1 Introduction
2 Eclipse Platform
2.1 Eclipse Architecture
2.2 IDE Architecture
3 Core Component
3.1 Parser
3.2 AST
3.3 Project Description
3.4 Project Builder
3.5 Project Nature
4 UI Integration
4.1 The editor
4.2 IDocument
4.3 Source Viewer
5 Advanced Concepts
5.1 Basic AST Design
5.2 AST - Homogeneous vs. Heterogeneous tree
5.3 Parser
5.4 Entity References
5.5 DOM AST
5.6 Model scalability
5.7 Model updating
5.8 Refactoring
6 Conclusions
Introduction
Nowadays, the success of a modern programming language is influenced not only by the characteristics of the language itself, but also by the availability and quality of the whole language toolchain: compilers, build tools, debuggers, Integrated Development Environments (IDEs), profilers, etc. The IDE integrates with the other development tools and provides a rich editing environment, responsible for manipulating a project and its source files. As such, the IDE is a crucial part of the toolchain: it is where the developer workflow is centered, and where developers spend the majority of their time[1].

IDE tools have seen a large amount of development since their dawn, and have greatly increased in power. As an example, consider modern IDEs such as Eclipse¹, Microsoft Visual Studio², or IntelliJ IDEA³. For the Java and C# languages, these IDEs have a rich set of features, including semantic code navigation (go to declaration, code outline, etc.), full tool integration (compiler, builder, debugger), code assistance and completion, and, more recently, various automated refactorings. This level of IDE functionality greatly enhances developer productivity[2], and for this reason, aspiring new languages whose IDEs offer little more than basic text editing may quickly find themselves at a serious disadvantage against languages for which such full-featured IDEs are available. Indeed, the existence of high-quality IDEs for a given language may play a role as important as the quality of the language itself.

Important as it is, implementing an IDE from scratch is also a complex and laborious task. For this purpose, Eclipse presents a very attractive option: the Eclipse Platform. The Eclipse Platform is an extensible IDE development platform that offers[3]:

- A comprehensive framework for the development of custom IDEs, providing support for generic IDE functionalities.
- Integration with complementary development tools, such as the build tool Apache Ant, source control systems such as CVS or Subversion, or any other tools available as extensions.
- A standard visual and behavioral environment across different languages, and perhaps even the possibility of inter-language integration.

The Eclipse Platform is host to two IDEs that are officially part of Eclipse: JDT⁴ and CDT⁵. Both of these have become very popular, particularly JDT, which is recognized as one of the most powerful and feature-rich IDEs in existence[4], rivaled only by IntelliJ IDEA and, to a lesser extent, NetBeans.
¹ Eclipse: http://www.eclipse.org
² Visual Studio: http://msdn.microsoft.com/vstudio
³ IntelliJ IDEA: http://www.jetbrains.com/idea/
⁴ Java Development Tools: http://www.eclipse.org/jdt
⁵ C/C++ Development Tooling: http://www.eclipse.org/cdt
The success and continuous growth of JDT and CDT, as well as of the underlying Eclipse Platform, is well known, and has attracted many to the development of Eclipse-based products. Recent projects have started with the goal of creating Eclipse IDEs for newer languages, such as PHP, Ruby, AspectJ, and others.

With this motivation, this survey examines the state of the art in creating IDEs with rich code manipulation features for the Eclipse Platform (although many of the considerations made here also apply outside of Eclipse). Two particular terms are introduced:

Target Language - The language for which the custom IDE is developed.
Host Language - The language in which the custom IDE is developed. In the case of Eclipse, that is Java.

This paper starts with a short overview of Eclipse and of the two basic components of an IDE, and then explores fundamental considerations for the implementation of advanced IDE features, particularly in the design of the IDE's data structures. Basic knowledge of Eclipse (as a user) is assumed.
Eclipse Platform
The Eclipse Platform is a comprehensive open-source framework for the development of IDEs. It offers a rich foundation of building blocks on which custom IDEs (as well as other kinds of applications) can be built and integrated together[3].

2.1 Eclipse Architecture
Eclipse is designed and built as a plug-in architecture. A project developed for the Eclipse Platform consists of a set of one or more plug-ins, where each plug-in represents a logical module of that project and can depend on other plug-ins. Each plug-in contributes to Eclipse by providing implementations (called extensions) of well-defined extension points, and many of Eclipse's components are plug-ins themselves (this is represented in Fig. 1). A set of inter-related plug-ins can be grouped into what is called a feature, which allows them to be installed and updated together.

2.2 IDE Architecture
Most Eclipse IDEs, regardless of the target language, follow a similar general architecture. They are divided into separate components, where each component is packaged into a plug-in of its own, with a well-defined API for interaction. There are usually at least two IDE components: the Core and the User Interface (UI). Additionally, there may be components for debug, the build system, or documentation, if their size merits a separate plug-in. Fig. 2 shows one such basic IDE architecture.
The core: The core is the brain of the IDE, and it is responsible for managing the domain logic of the projects of the target language. The domain logic is composed primarily of two things: the project description, which specifies the project's options, files, etc., and the language model, a structured, semantic representation of the source code of the project. This structure is usually an Abstract Syntax Tree (AST), or a derivative of it. It is generated by a language parser (see Sect. 3.1), and is a very important data structure, used extensively throughout the IDE. The core is also responsible for building and launching the project, which usually consists of calling an external compiler with the various project configuration options. The build system may also collect compiler messages from this process (especially if the compiler is external) and report them back to the user. If this system is complex enough, it may be placed into its own plug-in.

UI: The UI is responsible for user interaction and for controlling the lifecycle and operations of domain data, such as the language elements or filesystem resources. The UI consists of several components such as the editor, views, outline, actions, wizards, preference pages, etc. The UI is responsible for these components, as well as for the interactions among them and between them and the Eclipse workbench.

Debug: The debug component is responsible for launching and running the target language's programs in a debug environment, providing the functionality required for interactive debugging. Debug may implement a debugger of its own, or it may interface with an external one⁶. The Debug component may have a UI component of its own, to host debug-specific UI features. The implementation of the Debug component is not examined in this survey.
Core Component
The core is, as the name says, the center of the IDE. In this section the basic subcomponents of an IDE core are examined. In Sect. 5 more advanced IDE features are analysed.

3.1 Parser
One of the primary subcomponents of the core is the language parser. The parser's job is to recognize the language from a source file, while creating an in-memory structured representation of the source code. Aho, Sethi, and Ullman [5,17] distinguish between two kinds of representations: an abstract syntax tree (AST) and a parse tree⁷. The AST differs from the parse tree in that the nodes that are redundant or irrelevant for the program structure are removed or simplified, whereas the parse tree is a structure directly or very closely obtained from the parser's grammar productions.

⁶ Such as, for example, the GNU Debugger (GDB), which is what CDT uses.
⁷ Also called Concrete Syntax Tree (CST).

Creating a parser is a task fairly outside the scope of the Eclipse Platform, and as such, Eclipse does not offer direct support for it: the parser is coded independently of Eclipse. There are several options to do so: pre-existing external compiler tools can be used to produce the AST (if such tools exist for the target language); the parser can be developed from scratch; or the parser can be created using a parser generator. Here is a list (by no means exhaustive) of some parser generators that generate code in the Java language:

ANTLR⁸ - Parser generator developed by Terence Parr, with over 15 years of development. Very well supported and documented. Generates LL(*) parsers as of version 3.
JikesPG⁹ - Parser generator that is part of the Jikes compiler. Fast, but not well documented and no longer actively developed as of 2006. Still, it is the one used by JDT.
JavaCC¹⁰ - Java-based LL(1) parser generator.
SableCC¹¹ - Java-based LALR(1) parser generator.

3.2 AST
One of the central data structures that the core must manage is the AST. The AST, as mentioned before, is created by the parser from a source file, and is a structured, semantic representation of the source code, in tree form. At the very minimum, the AST's nodes contain information such as the node type, the children nodes and their roles, and the position in the source code where the node originates. But since it is the AST structure (and related data types) that allows for many of the advanced IDE features (such as code assist and code refactoring), the more of these features one wants the IDE to support, the greater the complexity and amount of information that the AST must hold. In Sect. 5 the design issues and functional requirements of the AST for such advanced features are examined in greater detail.

3.3 Project Description
Another important piece of data that the core manages is the project description. This data consists of all the information necessary to describe a project, such as the project's source folders, files and required libraries (generically called the build path¹²), compiler and build options, compile-time conditionals and variables, etc. This information is used not only to build a project; some of it is also necessary for other core IDE features. For example, in target languages with structured module systems¹³, the build path information is necessary to create a complete language model, because the core needs to know which source files are part of the project.

⁸ ANTLR URL: http://www.antlr.org/
⁹ JikesPG URL: http://jikes.sourceforge.net/
¹⁰ JavaCC URL: http://javacc.dev.java.net
¹¹ SableCC URL: http://www.sablecc.org/
¹² Or classpath, in Java-specific terms.
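The role of the build path in selecting the inputs of the language model can be sketched in plain Java. This is only an illustration under stated assumptions: the class ProjectDescriptionSketch, its methods, and the file names are hypothetical and are not part of Eclipse or JDT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a project description; every name here is hypothetical. */
final class ProjectDescriptionSketch {

    /** Holds the build path and the build options of one project. */
    public static final class ProjectDescription {
        private final List<String> sourceFolders = new ArrayList<>();
        private final List<String> libraries = new ArrayList<>();
        private final Map<String, String> compilerOptions = new LinkedHashMap<>();

        public ProjectDescription addSourceFolder(String folder) {
            sourceFolders.add(folder);
            return this;
        }

        public ProjectDescription addLibrary(String library) {
            libraries.add(library);
            return this;
        }

        public ProjectDescription setOption(String key, String value) {
            compilerOptions.put(key, value);
            return this;
        }

        public List<String> libraries() { return libraries; }

        public Map<String, String> compilerOptions() { return compilerOptions; }

        /** A file belongs to the project if it lies inside one of the source folders. */
        public boolean isOnBuildPath(String filePath) {
            for (String folder : sourceFolders) {
                if (filePath.startsWith(folder + "/")) {
                    return true;
                }
            }
            return false;
        }

        /** Selects the files that the core would parse to create the language model. */
        public List<String> modelInputs(List<String> workspaceFiles) {
            List<String> inputs = new ArrayList<>();
            for (String file : workspaceFiles) {
                if (isOnBuildPath(file)) {
                    inputs.add(file);
                }
            }
            return inputs;
        }
    }

    public static void main(String[] args) {
        ProjectDescription description = new ProjectDescription()
                .addSourceFolder("src")
                .addLibrary("lib/runtime.jar")
                .setOption("warnings", "all");
        List<String> files = Arrays.asList("src/app/Main.x", "docs/readme.txt");
        // Only files under a source folder become part of the language model.
        System.out.println(description.modelInputs(files));
    }
}
```

The same description object serves the builder (libraries and options) and the language model (source folders), which is why the core keeps it as a single data structure.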
3.4 Project Builder
Another core subcomponent is the project builder (or simply builder). Although not strictly required, this component is usually present in any non-trivial IDE, as the user experience is better if the builder is partially or fully integrated with the IDE and does not require external action to invoke it. The builder is responsible for executing whatever steps are necessary for building and compiling the target artefact, such as an executable or a library. Additionally, it may also be in charge of processing the output of the build process and reporting any notable messages or events back to the UI. One example is compilation errors and warnings, which are reported back to the Eclipse workbench in the form of problem markers¹⁴.

Eclipse builders can work in two ways: full or incremental. Incremental builders take into consideration only the elements that have changed since the last build, and produce the build output without doing a full build. This is enabled in Eclipse by a mechanism that tracks and collects resource deltas and passes them to the builder when requested.
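The difference between the two build modes can be sketched in plain Java. This is a simplified model and not the Eclipse builder API: BuildSketch, ResourceChange, and the "compile" step are hypothetical stand-ins (in Eclipse itself, the corresponding roles are played by org.eclipse.core.resources.IncrementalProjectBuilder and IResourceDelta).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of full vs. incremental builds; all names are hypothetical. */
final class BuildSketch {

    /** Kind of change recorded for one resource since the last build. */
    public enum ChangeKind { ADDED, CHANGED, REMOVED }

    /** One entry of a resource delta: which file changed, and how. */
    public static final class ResourceChange {
        final String path;
        final ChangeKind kind;

        public ResourceChange(String path, ChangeKind kind) {
            this.path = path;
            this.kind = kind;
        }
    }

    /** Builder that keeps one "compiled" output per source file. */
    public static final class Builder {
        private final Map<String, String> outputs = new LinkedHashMap<>();
        private final List<String> compileLog = new ArrayList<>();

        /** Full build: discard all outputs and recompile every source file. */
        public void fullBuild(Map<String, String> sources) {
            outputs.clear();
            for (Map.Entry<String, String> entry : sources.entrySet()) {
                compile(entry.getKey(), entry.getValue());
            }
        }

        /** Incremental build: touch only the files named in the delta. */
        public void incrementalBuild(Map<String, String> sources, List<ResourceChange> delta) {
            for (ResourceChange change : delta) {
                if (change.kind == ChangeKind.REMOVED) {
                    outputs.remove(change.path);
                } else {
                    compile(change.path, sources.get(change.path));
                }
            }
        }

        /** Stand-in for invoking the target language compiler on one file. */
        private void compile(String path, String text) {
            compileLog.add(path);
            outputs.put(path, "compiled(" + text + ")");
        }

        public Map<String, String> outputs() { return outputs; }

        public List<String> compileLog() { return compileLog; }
    }

    public static void main(String[] args) {
        Map<String, String> sources = new LinkedHashMap<>();
        sources.put("a.src", "A");
        sources.put("b.src", "B");
        Builder builder = new Builder();
        builder.fullBuild(sources);

        // The user edits a.src; only that file appears in the delta.
        sources.put("a.src", "A2");
        builder.incrementalBuild(sources,
                Arrays.asList(new ResourceChange("a.src", ChangeKind.CHANGED)));
        System.out.println(builder.compileLog());
    }
}
```

A real incremental builder must also recompile files that depend on the changed ones; the sketch leaves dependency tracking out to keep the delta mechanism visible.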
3.5 Project Nature
Another task of the IDE core is to define a project nature for the target language. An Eclipse project nature is a mechanism that identifies a workbench project as having a particular characteristic (such as the project's language). Recall that the Eclipse Platform can host several IDEs, each targeting its own language, and as such a project can belong to any of these languages. A project nature identifies the language to which the project actually belongs. This allows Eclipse to know for which projects it should enable a particular IDE and its respective features (such as UI elements). A project can have multiple natures, as is the case, for example, of a PDE project, which has a PDE nature as well as a Java nature.

A project nature is also an adequate and common way to attach and configure (and possibly detach as well) a builder for that project[6]. When a nature is added to a project (which usually happens during project creation), that nature configures an appropriate builder for the project (and does the inverse when the nature is removed).
¹³ Such as Java, but not C/C++.
¹⁴ Eclipse markers are a mechanism to annotate resources with various kinds of information, such as compiler messages, to-do items, bookmarks, breakpoints, etc.
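The way a nature attaches and detaches a builder can be modelled in a few lines of plain Java. The sketch below is an assumption-laden illustration: NatureSketch, Project, BuilderNature, and the identifiers used in main are hypothetical, and only loosely mirror the configure/deconfigure pair of Eclipse's IProjectNature interface.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Plain-Java model of project natures configuring builders; names are hypothetical. */
final class NatureSketch {

    /** A nature tags a project and sets it up when added or removed. */
    public interface Nature {
        String id();
        void configure(Project project);
        void deconfigure(Project project);
    }

    /** A project holding its natures and the builders attached to it. */
    public static final class Project {
        private final List<Nature> natures = new ArrayList<>();
        private final List<String> builderIds = new ArrayList<>();

        /** Adding a nature lets it configure the project (e.g. attach a builder). */
        public void addNature(Nature nature) {
            if (hasNature(nature.id())) {
                return;
            }
            natures.add(nature);
            nature.configure(this);
        }

        /** Removing a nature lets it undo its configuration. */
        public void removeNature(String natureId) {
            for (Iterator<Nature> it = natures.iterator(); it.hasNext();) {
                Nature nature = it.next();
                if (nature.id().equals(natureId)) {
                    it.remove();
                    nature.deconfigure(this);
                }
            }
        }

        public boolean hasNature(String natureId) {
            for (Nature nature : natures) {
                if (nature.id().equals(natureId)) {
                    return true;
                }
            }
            return false;
        }

        public List<String> builderIds() { return builderIds; }
    }

    /** A nature whose only job is to attach and detach one builder. */
    public static final class BuilderNature implements Nature {
        private final String id;
        private final String builderId;

        public BuilderNature(String id, String builderId) {
            this.id = id;
            this.builderId = builderId;
        }

        @Override public String id() { return id; }

        @Override public void configure(Project project) {
            if (!project.builderIds().contains(builderId)) {
                project.builderIds().add(builderId);
            }
        }

        @Override public void deconfigure(Project project) {
            project.builderIds().remove(builderId);
        }
    }

    public static void main(String[] args) {
        Project project = new Project();
        // A project may carry several natures at once, each with its own builder.
        project.addNature(new BuilderNature("example.lang.nature", "example.lang.builder"));
        project.addNature(new BuilderNature("example.other.nature", "example.other.builder"));
        project.removeNature("example.lang.nature");
        System.out.println(project.builderIds());
    }
}
```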
UI Integration
The next integration element is the UI component of the IDE. This component consists of elements such as views, outline, actions and menus, preference pages, wizards, and of course the editor itself. Figure 3 illustrates the various UI elements of the Eclipse workbench. Most of these UI subcomponents are simple and straightforward to implement, and one can easily learn how to create them by looking at implementation examples or the respective documentation. But some subcomponents are a bit more intricate, namely the source code editor and its underlying text framework, so an overview of this framework and its customization mechanism is given.
4.1 The editor
An Eclipse editor is a contribution to the org.eclipse.ui.editors extension point, together with a class implementing the org.eclipse.ui.IEditorPart interface, which specifies the required interface for any kind of editor, be it a text editor, a visual editor, a multi-page editor, etc. To create a custom editor, one could implement this interface from scratch, but that is not the recommended approach for source code editors, since Eclipse provides pre-existing abstract implementations for this kind of editor: the class org.eclipse.ui.texteditor.AbstractTextEditor for generic text editors, and its subclass org.eclipse.ui.texteditor.AbstractDecoratedTextEditor for source code editors.

An AbstractTextEditor has two main components of interest, the source viewer (org.eclipse.jface.text.source.SourceViewer) and the document (org.eclipse.jface.text.IDocument). The SourceViewer is a JFace adapter that wraps the raw SWT (Standard Widget Toolkit) widget StyledText into a higher-level abstraction. The IDocument instance represents the document which is the input to the editor and source viewer. These components are structured in a Model-View-Controller (MVC) design. The IDocument instance is the domain Model (which wraps both language model information and textual information). The SourceViewer is the Controller (despite its name), and the View is the underlying StyledText widget. The SourceViewer is then responsible for updating the view's presentation according to changes in the input document, and vice versa.

The editor is integrated tightly with these MVC components, and provides a layer of interaction between them and the Eclipse workbench context, meaning that it controls the contributions to and interactions with the Eclipse workbench. Eclipse editors follow an open-save-close lifecycle.

4.2 IDocument
An IDocument is a JFace component which holds the textual data of its underlying input source (usually a file). It provides support for:

Text Manipulation - Modifying the underlying source text.
Line Information - Providing the line number for a given offset.
Positions - Maintaining information about several positions of interest in the document, and updating them during document changes (these positions are actually ranges).
Partitions - Maintaining information about document partitions.
Document change listeners and document partition change listeners - Keeping a list of document and partition change listeners, and notifying them when their respective events occur.

An IDocument allows itself to be divided into several non-overlapping regions called partitions, where each partition has an associated partition type (also called content type). These partitions represent logical divisions in the document's text (such as comments or strings in a source code file), and are used in the configuration of several features of the editor and SourceViewer. For example, features such as syntax highlighting and code completion are configured differently for each partition type. An IDocument is set up with an instance of a partitioner, a class that knows how to calculate the document's partitions, as well as update them when there are document changes.

An IDocument instance is not directly provided to an editor. Instead, the editor uses an org.eclipse.ui.texteditor.IDocumentProvider implementation to create an IDocument instance from the given editor input. For example, when the input is a file (as is the usual case), the IDocumentProvider acts as a persistence manager, and is responsible for loading and saving the IDocument from and to the filesystem (the storage medium), keeping track of already created documents, and notifying interested listeners of changes in the filesystem. Eclipse provides a default implementation for this kind of document provider, which is org.eclipse.ui.editors.text.FileDocumentProvider.

An editor's IDocumentProvider is also responsible for creating the annotation model from the editor input, as well as for setting up the partitioner for the created editor document. For this reason the IDocumentProvider is one of the first entry points for editor customization, usually done by subclassing a class such as FileDocumentProvider and overriding the appropriate methods.

4.3 Source Viewer
The SourceViewer is the editor's main component, as it is responsible for the editor's document presentation and editing features. The SourceViewer abstracts away much of the base functionality common to source code editors, thus offering an extensive component for the customization of common editor features (such as syntax highlighting, code completion, hovering, etc.). To implement custom editor features, one subclasses not the SourceViewer itself, but the org.eclipse.jface.text.source.SourceViewerConfiguration class, a component of a SourceViewer, which is the central point for editor customizations. The editor's SourceViewer is configured with the custom SourceViewerConfiguration when the editor is initialized.

SourceViewerConfiguration contains a series of getter methods, each responsible for one particular editor feature. To implement such a feature, one overrides the respective getter method, making it return a custom class that will further control how the feature operates. The available features to implement are:

Syntax Highlighting - Highlights regions of the text according to a damage-repair mechanism and a token scanner.
Method: IPresentationReconciler getPresentationReconciler()

Text Hovers - Displays small tooltips in the editor area presenting information about the current selection or the element the mouse is pointing to. Usually this information is the documentation comments of methods, classes, etc.
Method: ITextHover getTextHover()

Auto Edits - Configures various auto edits, which are automatic textual changes that the editor performs, like adding a closing brace or quote when the opening one is typed.
Method: IAutoEditStrategy[] getAutoEditStrategies()

Code Completion - Displays a list of possible completions for a method, variable or class that the user is typing, according to the partial name already entered. This same extension option is responsible for code templates, which are pre-defined pieces of code that are inserted into the source when the user selects a template during code completion (code templates appear in the same popup list as code completion proposals).
Method: IContentAssistant getContentAssistant()

Double-Click Selection - Enables language-aware selections when double-clicking in the editor. This feature can select identifiers (according to what the language considers identifiers), the text range between braces or parentheses, or any other selection deemed useful.
Method: ITextDoubleClickStrategy getDoubleClickStrategy()

Hyperlink Detection - Detects whether what the mouse is pointing to in the text editor is a hyperlink, and if so, what to do if the user requests to follow the link. Links can be language-neutral, like URL addresses inside strings or comments, or language-specific, such as a link from an entity reference to its definition (an alternative way to invoke the Go To Definition action).
Method: IHyperlinkDetector[] getHyperlinkDetectors()

Code Formatting - Defines a formatter for the editor. The formatter reorganizes certain textual elements of the source file, such as indentation, spaces, newlines, etc., in order to make the code prettier and more presentable to the user.
Method: IContentFormatter getContentFormatter()

Quick Fixes - Quick fixes are a feature, similar to code completion, where the user selects a problem in the project (such as a compilation error, name mismatch, type conflict, etc.), and is presented with a list of fix proposals for that problem. There can be multiple fix proposals for each problem, and it is up to the user to choose which one to apply.
Method: IQuickAssistAssistant getQuickAssistAssistant()

Model Reconciler - This configuration option allows one to specify a class that will be in charge of reconciling the language model (such as the AST or related structures) whenever textual changes occur (such as the user typing new characters). The reconciler runs asynchronously, and can be configured to operate in either incremental or non-incremental mode. In incremental mode the reconciler collects the textual changes since the last update and creates a textual delta that is used to update the model. In non-incremental mode, the update is simply performed with the whole source file. See also Sect. 5.7.
Method: IReconciler getReconciler()
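The customization pattern described above (override a getter so that it returns a feature-specific strategy object) can be sketched without the Eclipse libraries. The following plain-Java model is hypothetical throughout: ViewerConfiguration, AutoEditStrategy, and Viewer are simplified stand-ins for SourceViewerConfiguration, IAutoEditStrategy, and SourceViewer, and the real getters take additional arguments (such as the source viewer) that are omitted here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Plain-Java model of the "getter returns a strategy" pattern; names are hypothetical. */
final class ViewerConfigSketch {

    /** Strategy that may rewrite the text about to be inserted into the document. */
    public interface AutoEditStrategy {
        String customize(String typedText);
    }

    /** Base configuration: every getter returns a neutral default. */
    public static class ViewerConfiguration {
        public List<AutoEditStrategy> getAutoEditStrategies() {
            return Collections.emptyList();
        }
    }

    /** Appends the closing delimiter when the opening one is typed. */
    public static final class ClosingPairStrategy implements AutoEditStrategy {
        private final String open;
        private final String close;

        public ClosingPairStrategy(String open, String close) {
            this.open = open;
            this.close = close;
        }

        @Override public String customize(String typedText) {
            return typedText.equals(open) ? typedText + close : typedText;
        }
    }

    /** Language-specific configuration: overrides only the getter it cares about. */
    public static final class BraceLanguageConfiguration extends ViewerConfiguration {
        @Override public List<AutoEditStrategy> getAutoEditStrategies() {
            return Arrays.<AutoEditStrategy>asList(
                    new ClosingPairStrategy("{", "}"),
                    new ClosingPairStrategy("\"", "\""));
        }
    }

    /** Minimal viewer: reads the configuration once, then applies it on every edit. */
    public static final class Viewer {
        private final StringBuilder document = new StringBuilder();
        private final List<AutoEditStrategy> strategies;

        public Viewer(ViewerConfiguration configuration) {
            this.strategies = configuration.getAutoEditStrategies();
        }

        /** Simplification: text is always appended at the end of the document. */
        public void type(String text) {
            String result = text;
            for (AutoEditStrategy strategy : strategies) {
                result = strategy.customize(result);
            }
            document.append(result);
        }

        public String text() { return document.toString(); }
    }

    public static void main(String[] args) {
        Viewer viewer = new Viewer(new BraceLanguageConfiguration());
        viewer.type("{");
        System.out.println(viewer.text());
    }
}
```

The viewer never needs to know which language it is editing: all language knowledge lives in the configuration subclass and the strategies it returns, which is the property that makes the configuration class a convenient single point of customization.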
Advanced Concepts
The core's data structures are the major point of support for most IDE functionalities. This section describes various design issues and functional requirements necessary for advanced IDE features. Most of these considerations are Eclipse-independent.

5.1 Basic AST Design
The first point to note about the AST (especially if one is trying to reuse existing compiler tools to produce the AST) is that the AST design must preserve all source code information relevant to the user, whereas a compiler only needs to work with the information necessary to generate compiled code. This means that language constructs such as comments or preprocessor directives, which can safely be ignored in a compiler AST, should be recorded in some way in the data structures generated for IDE usage. That is, the AST and overall language model of an IDE should be at the same abstraction level as the user's view of the source code. Code formatting is an example of a feature that is difficult or impractical to implement without an adequate level of information in the IDE's data structures.

Here is some of the basic information an AST node should have:

Parent node: A link to the parent node. This allows traversing the tree in an upward as well as a downward direction.
Source range: The source range is the start position and end position in the source code where the node appears, usually coded as a character offset and length. This information allows the user to navigate back into the original source text, and is used for selections and other features. This range should start with the first significant character of the node, and end with the last significant character. For maximum usefulness, the range should also be present in all nodes, and be recursively consistent, such that the ranges of a node's children are non-overlapping and contained in their parent node's range.
Compilation unit: The compilation unit (if any) to which this AST node belongs.

5.2 AST - Homogeneous vs. Heterogeneous tree
One design aspect of the AST is whether the tree should be homogeneous or heterogeneous. A homogeneous tree is one where there is only one class for all the nodes. In a heterogeneous tree there are different classes for each node type, although they still share a common parent class (usually named ASTNode, as in JDT). Homogeneous trees are simpler and faster to implement, and are easier to traverse, but heterogeneous trees make it easier to work with the particularities of each type of tree node, and so are ultimately considered the most adequate choice (JDT, CDT, and nearly all non-trivial custom IDEs use heterogeneous trees).

To make traversing heterogeneous tree nodes simpler, a Visitor pattern is usually employed, allowing concrete AST visitors to dynamically dispatch to different methods according to the type of a node. JDT introduced some useful additions to the basic visitor pattern ([7], chapter 33). First, the visitor actually has two visit methods, visit() and endVisit(). visit() is called when descending into a node, and endVisit() is called after the node's children are visited. visit() returns a boolean controlling whether the visitor should actually descend into the node's children or not. Second, there are also two generic visit methods, preVisit() and postVisit(), which are used to traverse the node in a non-type-specific way, as opposed to the normal visitor pattern. The overall invocation order is then[8]:

1. preVisit(ASTNode node)
2. visit(<some subclass of ASTNode> node)
3. Visit the node's children, if visit() returned true.
4. endVisit(<some subclass of ASTNode> node)
5. postVisit(ASTNode node)

5.3 Parser
A basic language parser is not too difficult to implement; however, there are certain IDE features that may require special parser capabilities not present in a simple parser. Two such capabilities are error recovery and partial parsing.

Error recovery is the ability of the parser to generate an incomplete, but still meaningful, AST in the presence of syntax errors. This becomes important in keeping the IDE functional in the moments when the user is editing a source file and the file is incomplete and syntactically invalid. When the parser is capable of error recovery, the AST is usually annotated with information about whether each AST node comes from syntactically valid source or not.

Partial parsing is the ability to parse only a segment of the source file, where this segment corresponds to a complete syntax element, such as a statement or declaration. This ability becomes useful when one wants to obtain information about a given language element (a function declaration, for example), but does not want to re-parse the whole compilation unit. Note that partial parsing requires knowing beforehand the source range of the element one wants to parse.

5.4 Entity References
In an AST, many of the nodes represent references to other named language elements. As such, a key aspect of the language model is the ability to, given a reference in the code, find the referenced entity. These references are called bindings in JDT, and they are used by many other IDE features. A basic example is the Go To Definition operation, where, given a selected type or variable reference, the cursor and editor focus are set to the location where the referenced entity is defined. A more advanced example is the inverse operation: given an entity definition, find all references to that entity.

In most languages, an entity reference is usually restricted in the kind of elements it can refer to, such as a type kind or a variable kind. For example, a name reference appearing in a mathematical expression should refer to a variable entity, and not a type entity. It is useful to specify these restrictions in the reference structures of the language model. The references may also be categorized into more fine-grained types, according to the roles they play in the nodes where they appear[8]. Here are some possible roles, taken from JDT's bindings:

1. Name reference - an explicit name reference in an AST to a type, function, variable, etc.
2. Type reference - the type of an AST expression.
3. SuperType reference - the super type of a given type.
4. Thrown exceptions - the exceptions thrown by an expression or statement.
5. Members reference - the members of a given type.

An AST node can have more than one such reference. For instance, an expression node can have a reference to the type of the expression, as well as a name reference to a variable present in the expression. Usually the validity of references ends when there are model changes, as is the case in JDT[9].

5.5 DOM AST
An advanced AST technique is to design the AST in the form of a Document Object Model (DOM). The term DOM comes from the XML/HTML DOM, and is defined as a platform- and language-neutral interface that defines a standard model of the logical structure of a document, as well as a standard interface for accessing and manipulating that structure[5,18]. The term DOM has been generalized from XML to any structure that fits that definition, such as a DOM AST. The benefits of a rich AST manipulation mechanism such as a DOM AST are manifest in AST manipulation operations, particularly refactoring operations, which are usually complex in nature.
Designing an AST with DOM capabilities means the following in terms of code. Instead of just coding the children of an AST node as class fields of that node, such as this:
class WhileStatement extends ASTNode {
    Expression condition;
    Statement body;
    ...
}
one also defines in the node class certain structural information (called structural properties in JDT[8], just as in general XML DOM parlance) that, for each of the node's elements, describes attributes such as the name of the element, whether the element is mandatory or not, the kind of element (single child, list of children, or simple property), etc. Figure 4 shows an example of the structural properties of a Java method declaration, according to JDT.
This structural information allows the IDE to conveniently and dynamically inspect, as well as manipulate, the structure of an AST. This functionality is similar to language reflection, and in fact the reflection capabilities of Java, the host language, would allow this to a certain degree, but the DOM mechanism is much better suited to the task.
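As a minimal sketch of the idea, the WhileStatement node shown earlier can carry a static list of property descriptors. All names below (PropertyKind, StructuralProperty, Leaf, getProperty, setProperty) are hypothetical and only loosely inspired by JDT's structural properties; they are not JDT's actual API.

```java
import java.util.List;

// Hypothetical sketch; names are illustrative and not JDT's actual API.
enum PropertyKind { CHILD, CHILD_LIST, SIMPLE }

// Describes one element of a node: its name, its kind, and whether it is mandatory.
final class StructuralProperty {
    final String name;
    final PropertyKind kind;
    final boolean mandatory;

    StructuralProperty(String name, PropertyKind kind, boolean mandatory) {
        this.name = name;
        this.kind = kind;
        this.mandatory = mandatory;
    }
}

abstract class ASTNode {
    abstract List<StructuralProperty> structuralProperties();
    abstract Object getProperty(StructuralProperty property);
    abstract void setProperty(StructuralProperty property, Object value);
}

// Stand-in for expressions and simple statements; it has no children.
final class Leaf extends ASTNode {
    final String text;
    Leaf(String text) { this.text = text; }
    @Override List<StructuralProperty> structuralProperties() { return List.of(); }
    @Override Object getProperty(StructuralProperty p) {
        throw new IllegalArgumentException("Unknown property: " + p.name);
    }
    @Override void setProperty(StructuralProperty p, Object value) {
        throw new IllegalArgumentException("Unknown property: " + p.name);
    }
}

final class WhileStatement extends ASTNode {
    static final StructuralProperty CONDITION =
            new StructuralProperty("condition", PropertyKind.CHILD, true);
    static final StructuralProperty BODY =
            new StructuralProperty("body", PropertyKind.CHILD, true);
    private static final List<StructuralProperty> PROPERTIES = List.of(CONDITION, BODY);

    private ASTNode condition;
    private ASTNode body;

    WhileStatement(ASTNode condition, ASTNode body) {
        this.condition = condition;
        this.body = body;
    }

    @Override List<StructuralProperty> structuralProperties() { return PROPERTIES; }

    @Override Object getProperty(StructuralProperty p) {
        if (p == CONDITION) return condition;
        if (p == BODY) return body;
        throw new IllegalArgumentException("Unknown property: " + p.name);
    }

    @Override void setProperty(StructuralProperty p, Object value) {
        if (p == CONDITION) condition = (ASTNode) value;
        else if (p == BODY) body = (ASTNode) value;
        else throw new IllegalArgumentException("Unknown property: " + p.name);
    }
}

class StructuralPropertyDemo {
    // Generic traversal: counts nodes without knowing any concrete node class.
    static int countNodes(ASTNode node) {
        int count = 1;
        for (StructuralProperty p : node.structuralProperties()) {
            if (p.kind == PropertyKind.CHILD) {
                Object child = node.getProperty(p);
                if (child != null) count += countNodes((ASTNode) child);
            } else if (p.kind == PropertyKind.CHILD_LIST) {
                for (Object child : (List<?>) node.getProperty(p)) {
                    count += countNodes((ASTNode) child);
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ASTNode loop = new WhileStatement(new Leaf("i < 10"), new Leaf("i++;"));
        System.out.println(countNodes(loop));
    }
}
```

The point of the design is that countNodes, or any other generic visitor or rewriter, needs only the descriptors and never refers to the fields of a concrete node class.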
5.6 Model scalability
An important issue with the language model is scalability. Creating a complete model for a project, such as creating AST nodes for all the source files, implies creating structures with expensive memory footprints. This can become problematic if a project is large and has many source files.
Most IDE operations, like the outline view, navigator view, code completion, etc., revolve around the manipulation of named language elements only (such as classes, methods, variables, etc.) and do not descend into more precise AST elements such as statements, expressions, and their various children. In fact, an AST is too fine-grained a structure for most IDE operations, and its use in such situations is a significant waste of memory. Realizing this, JDT has separate data structures for named language elements, the set of which is called the Java Model, and which is used instead of AST nodes for many IDE features. Full ASTs are usually only used in structured code manipulation and editing (such as in refactoring). The Java Model data structures are lightweight: they hold minimal information, such as the name of the element, the element's parent, its children, and the source range (if applicable). In JDT, the Java Model also includes named language elements that exist outside of source code text, such as packages and the compilation units themselves.
Even though it is much lighter than the AST structure, computing this name model for the whole project might still be too expensive. As such, an additional strategy that can be employed is lazy loading[10] (which again is the case in JDT). Instead of computing a model for the whole project, initially only the files on which the user is working have their element information computed. If, during the course of IDE interaction, some operation (such as opening a new compilation unit, or requesting completion assistance) requires information about another language element, only then will the full element information be retrieved. This is called opening the element.
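The notion of opening an element on demand can be sketched as follows. This is an illustration under stated assumptions, not JDT's implementation: ModelElement, ElementInfo, and the loader function are hypothetical names, and the fake parser merely stands in for whatever computes the element information.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; class and method names are illustrative, not JDT's API.
final class ElementInfo {
    final List<String> childNames;
    ElementInfo(List<String> childNames) { this.childNames = childNames; }
}

// Lightweight handle: holds only a name and a parent until it is opened.
final class ModelElement {
    final String name;
    final ModelElement parent;
    private final Function<String, ElementInfo> loader; // e.g. parses the unit
    private ElementInfo info;                            // null while closed

    ModelElement(String name, ModelElement parent, Function<String, ElementInfo> loader) {
        this.name = name;
        this.parent = parent;
        this.loader = loader;
    }

    boolean isOpen() { return info != null; }

    // "Opening the element": compute the full information on first request only.
    ElementInfo open() {
        if (info == null) info = loader.apply(name);
        return info;
    }
}

class LazyModelDemo {
    static int loads = 0;

    // Stand-in for an expensive parse of one compilation unit.
    static ElementInfo fakeParse(String unitName) {
        loads++;
        return new ElementInfo(List.of(unitName + "#foo", unitName + "#bar"));
    }

    public static void main(String[] args) {
        ModelElement unit = new ModelElement("A.java", null, LazyModelDemo::fakeParse);
        System.out.println(unit.isOpen());            // nothing computed yet
        System.out.println(unit.open().childNames);   // computed on first access
        unit.open();                                  // served from the stored info
        System.out.println(loads);
    }
}
```

Creating a handle costs almost nothing, so a project can hold handles for every compilation unit while paying the parsing cost only for the units that are actually used.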
This lazy loading mechanism is then complemented with caching of the open elements: when a certain limit of open elements is reached, a cache manager tries to close some of the open ones, so as to keep the overall memory footprint from growing over time. In JDT this cache is a size-limited Least Recently Used (LRU) cache.
5.7 Model updating
Another important aspect of the IDE architecture is the model update behavior. Since an IDE is interactive, the user will often be modifying the source code, which causes the underlying language model to become outdated. The model then needs to be recalculated to reflect the latest changes. This can be done in several ways:
1. The model is updated on each project build or file save.
2. The model is updated on the fly, as the user alters the text file.
3. The model is updated at a fixed interval, such as 500 ms or 200 ms.
The first alternative is the simplest behavior and the easiest to implement, but it is also the least satisfactory one, as the user must manually invoke a build or save for the model to be updated.
In the second alternative the model is updated constantly, with each new keystroke. This may seem the ideal behavior, but it has a notable problem: parsing is a costly operation, and performing it with each new keystroke will tax the Eclipse UI to the point where it becomes sluggish or unresponsive. For this behavior to work properly, a very fast and scalable parser is necessary, which usually means the parser needs to be incremental[10] (i.e., able to create the new model without re-parsing the whole file). However, building such a parser is a very complicated task. Not even JDT updates the model in this way.
Instead, the third alternative, which is to update the model periodically at a small interval, is the most common and adequate method for most languages. The model is updated on a background thread, so that the UI is not slowed down even while the update is running. Eclipse provides some support for either of these last two behaviors, in the form of what is called the source Reconciler (see the SourceViewer Model Reconciler option, Sect. 4.3).
5.8 Refactoring
Refactoring is the process of altering existing code in order to improve its readability or internal structure, but without the explicit intent of altering its external behavior[11]. Examples of refactoring operations are: renaming or moving a method or variable; extracting a method; extracting a superclass; etc. Refactoring is regarded as a highly valuable software development technique, but its benefits are hard to realize without the support of automated refactoring tools[12]. Performing a refactoring manually requires various tedious code alterations, as well as the creation of a suite of tests to ensure that the refactoring change was performed correctly and did not introduce a bug[11].
One of the first languages to support automated refactoring was Smalltalk, with its Refactoring Browser tool. According to Opdyke (in [11], Chapter 14), this tool, which initially was a stand-alone tool, was actually mostly ignored until it was integrated into the Smalltalk development tool. This illustrates the importance of having this feature in an IDE.
Language Toolkit
The vast support in JDT for various automated refactorings of Java code has made this technique popular amongst Java developers, and has significantly raised the bar for the code manipulation support of other IDEs, Eclipse-based or not. Recognizing the importance of refactoring, the JDT team has abstracted the generic functionality for refactoring into a language-neutral layer of the Eclipse Platform, which is called the Language Toolkit (LTK) and can be found in the org.eclipse.ltk.core.refactoring and org.eclipse.ltk.ui.refactoring plug-ins. These plug-ins offer generic refactoring support in three main aspects: refactoring lifecycle, refactoring participants, and UI support[13].
Refactoring lifecycle - LTK provides a framework which controls the lifecycle of a refactoring, where each phase of the lifecycle must be explicitly defined by each new refactoring operation. The phases are:
1. Checking the validity of initial conditions, such as whether the selected element supports the requested refactoring operation, or whether the source file is writable.
2. Requesting additional user input, if necessary.
3. Checking the validity of final conditions, so as to ensure the refactoring can be applied correctly (i.e., without altering program behavior).
4. Creating the set of textual changes of the refactoring.
5. Displaying a preview of the changes to the user, so that the user may review them and confirm the application of the refactoring.
Refactoring participants - Sometimes a code entity can be referred to in disparate language domains, such as a Java variable being used in a JSP file, or a Java JNI (see footnote 16) method which is linked to a C function. Usually an IDE is only aware of one of these contexts, and consequently only knows how to perform refactoring in that context. Other contexts must be updated separately. LTK provides support for what it calls refactoring participants, a mechanism that allows other plug-ins to selectively take part in refactoring operations via plug-in extensions, thus effectively enabling automated refactoring across language domains.
UI component for refactoring - Finally, LTK provides UI support for a refactoring operation, which consists of a UI wizard dialog that controls the advance of each step of the refactoring, collects user input, and presents a preview dialog that displays the textual changes that the refactoring will perform (see Fig. 5).
Of course, the actual logic of the refactoring operations must be implemented by the IDE developer, for each desired operation. Fowler details in his book[11] the necessary steps to perform the most common refactoring operations in object-oriented code.
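The order of the lifecycle phases listed above can be illustrated with a small, self-contained sketch. It deliberately does not use the real LTK classes: RefactoringSketch, EditSketch, RenameSketch, and RefactoringDriver are hypothetical names, and the toy rename works on raw text, whereas a real refactoring would locate occurrences through the entity references of Sect. 5.4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; it mirrors the phase order described above, not LTK's real API.
final class EditSketch {
    final int offset;
    final int length;
    final String replacement;
    EditSketch(int offset, int length, String replacement) {
        this.offset = offset;
        this.length = length;
        this.replacement = replacement;
    }
}

interface RefactoringSketch {
    List<String> checkInitialConditions(); // phase 1: returns error messages, empty if OK
    void setUserInput(String input);       // phase 2
    List<String> checkFinalConditions();   // phase 3
    List<EditSketch> createChange();       // phase 4
}

// Toy rename: naive textual matching, used only to exercise the lifecycle.
final class RenameSketch implements RefactoringSketch {
    private final String source;
    private final String oldName;
    private String newName;

    RenameSketch(String source, String oldName) {
        this.source = source;
        this.oldName = oldName;
    }

    @Override public List<String> checkInitialConditions() {
        List<String> errors = new ArrayList<>();
        if (oldName.isEmpty() || !source.contains(oldName)) errors.add("Nothing to rename");
        return errors;
    }

    @Override public void setUserInput(String input) { this.newName = input; }

    @Override public List<String> checkFinalConditions() {
        List<String> errors = new ArrayList<>();
        if (newName == null || newName.isEmpty()) errors.add("New name is empty");
        else if (source.contains(newName)) errors.add("New name is already in use");
        return errors;
    }

    @Override public List<EditSketch> createChange() {
        List<EditSketch> edits = new ArrayList<>();
        for (int i = source.indexOf(oldName); i >= 0;
                 i = source.indexOf(oldName, i + oldName.length())) {
            edits.add(new EditSketch(i, oldName.length(), newName));
        }
        return edits;
    }
}

class RefactoringDriver {
    // Runs the phases in order; the preview callback (phase 5) decides whether to apply.
    static String run(RefactoringSketch refactoring, String source, String input,
                      Predicate<List<EditSketch>> preview) {
        if (!refactoring.checkInitialConditions().isEmpty()) return source;
        refactoring.setUserInput(input);
        if (!refactoring.checkFinalConditions().isEmpty()) return source;
        List<EditSketch> edits = refactoring.createChange();
        if (!preview.test(edits)) return source;
        // Apply edits from last to first so that earlier offsets stay valid.
        StringBuilder result = new StringBuilder(source);
        for (int i = edits.size() - 1; i >= 0; i--) {
            EditSketch e = edits.get(i);
            result.replace(e.offset, e.offset + e.length, e.replacement);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String source = "int count = 0; count++;";
        System.out.println(run(new RenameSketch(source, "count"), source, "total", e -> true));
    }
}
```

Separating the creation of the change from its application is what makes the preview step possible: the edits exist as data that can be shown to the user, and rejected, before the source is touched.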
Conclusions
The development tools that a language has available, particularly the IDE, are of vital importance to the productivity and success of that language. The Eclipse Platform is an extensible IDE development framework that, by abstracting away common IDE functionality, offers great potential to those interested in creating an IDE for a new language. Learning each aspect of the Eclipse Platform may take a while, but in the end the effort is repaid by the development work saved and by the infrastructure and functionality that Eclipse provides. A modern IDE is expected to have several features, such as a code editor, syntax highlighting, a file/project explorer, outline, program builder, code completion, debugger, code formatting, and, in advanced IDEs, refactoring. All of these features have some degree of support in the Eclipse Platform framework.
Footnote 16: Java Native Interface, a protocol which allows Java methods to be implemented by C functions.
To enable that support, it is necessary to extend the framework with implementations of the language-specific parts of the IDE. An Eclipse-based IDE has two main components, the Core and the UI. The Core is the most extensive component, since the majority of it is composed of language-specific functionality. The UI, on the other hand, has most of its base functionality provided by the Eclipse Platform, as most UI elements are common to any IDE. The implementation of the Core is therefore a central task in IDE creation, and a fundamental aspect of this task is the creation of a model that represents the target language's projects and source code. This model must become more advanced and detailed as more advanced IDE features are to be supported. For example, navigation features such as Go To Definition, or its reverse, require a proper code reference management mechanism. Refactoring, on the other hand, requires a rich model manipulation mechanism. The IDE core must also take into consideration some important non-functional requirements, such as performance and scalability, which in turn require that certain specific functionality be added to the model. Finally, as a result of all these requirements, the IDE's model may ultimately be divided into different structures, each specialized for a particular use, such as an AST structure for code manipulation, and lightweight element references for navigation and assistance on named language elements.
References
1. G. Booch, A. Brown: Collaborative Development Environments. Advances in Computers, Vol. 59, Academic Press, (2003). (http://www.booch.com/architecture/blog/artifacts/CDE.pdf)
2. J. des Rivières, J. Wiegand: Eclipse: A platform for integrating development tools. IBM Systems Journal, Vol. 43, No. 2, pp. 371-383, (2004). (http://researchweb.watson.ibm.com/journal/sj/432/desrivieres.html)
3. Eclipse Platform Technical Overview. Eclipse Corner Whitepaper, (2006). (http://www.eclipse.org/articles/Whitepaper-Platform-3.1/eclipse-platform-whitepaper.html)
4. G. Goth: Beware the March of This IDE - Eclipse Is Overshadowing Other Tool Technologies. IEEE Software, Vol. 22, No. 4, pp. 108-111, (2005). (http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1463218)
5. J. Arthorne, C. Laffra: Official Eclipse 3.0 FAQs. Addison-Wesley, (2004).
6. E. Clayberg, D. Rubel: Eclipse: Building Commercial-Quality Plug-Ins. Addison-Wesley, (2004). (http://www.qualityeclipse.com)
7. E. Gamma, K. Beck: Contributing to Eclipse: Principles, Patterns, and Plug-Ins. Addison-Wesley, (2003).
8. T. Kuhn, O. Thomann: Abstract Syntax Tree. Eclipse Corner Articles, (2006). (http://www.eclipse.org/articles/Article-JavaCodeManipulation_AST/index.html)
9. M. Aeschlimann, D. Bäumer, J. Lanneluc: Java Tool Smithing - Extending the Eclipse Java Development Tools. EclipseCon 2005 presentation, (2005). (http://eclipsecon.org/2005/presentations/EclipseCON2005_Tutorial29.pdf)
10. P. Deva: Create a commercial-quality Eclipse IDE, Part 1, 2 and 3. IBM developerWorks, (2006). (http://www.ibm.com/developerworks/edu/os-dw-os-ecl-commplgin1.html)
11. M. Fowler, K. Beck, J. Brant, W. Opdyke, D. Roberts: Refactoring - Improving the design of existing code. Addison-Wesley, (2004).
12. W. Opdyke: Refactoring Object-Oriented Frameworks. Ph.D. thesis, University of Illinois, (1992). (ftp://st.cs.uiuc.edu/pub/papers/refactoring/opdyke-thesis.ps.Z)
13. L. Frenzel: The Language Toolkit: An API for Automated Refactorings in Eclipse-based IDEs. Eclipse Corner Articles (originally in Eclipse Magazin, Vol. 5, January 2006), (2006). (http://www.eclipse.org/articles/Article-LTK/ltk.html)
14. The Eclipse website at www.eclipse.org (2006).
15. The Eclipse JDT source code at CVS://:pserver:[email protected]:/cvsroot/eclipse (2006).
16. Eclipse Help - Platform Plug-in Developer's Guide. http://help.eclipse.org/help31/index.jsp (2006).
17. A. Aho, R. Sethi, J. Ullman: Compilers: Principles, Techniques, and Tools. Addison-Wesley, (1986).
18. W3C (World Wide Web Consortium): Document Object Model (DOM) Level 1 Specification. W3C Technical report, (1998). (http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/)