Open Source Web Content Management in Java
Copyright 2007 Content Here, Inc. All Rights Reserved. Not for Redistribution.
Acknowledgements
Thanks to the following people for reviewing sections of this report for accuracy:

Elie Auvray (Jahia)
Kevin Cochrane (Alfresco)
Arj Cahn
Alexander Kandzior (OpenCms)
Boris Kraft (Magnolia)
Steven Noels (Daisy)

Jennifer Gottlieb provided copyedit services and general encouragement to help me complete this report. Glenn Barnett customized the XSL style sheets used to format the report.

Cover Art

The photograph used on the cover was taken by Tan Quang Tuan [http://www.flickr.com/photos/e8club/] and published under the Creative Commons Attribution 2.0 License on Flickr.
Table of Contents
1. Introduction
    The Demand for Open Source Java Web Content Management
    The Need for This Report
    Organization and Methodology
2. Open Source WCM Marketplace
    State of the Industry: Web Content Management
        Market Characteristics and Trends
        Core Enterprise Requirements
        Market Summary
    Open Source Market Segmentation
        Community Open Source
        Institutional Open Source
        Commercial Open Source
3. Product Evaluations
    Informational Brochure
        What Makes a Good Informational Brochure Platform?
        Informational Brochure Platform Market Overview
        Apache Lenya 2.0
        Daisy 2.1
        Magnolia 3.5 Enterprise
        OpenCms 7.0.3
        Informational Brochure Platform Summary
    Web Content Management Framework
        What Makes a Good WCM Framework?
        WCM Framework Market Overview
        Alfresco 2.2 WCM
        Hippo CMS 6.05.02
        Jahia Enterprise 5.0
        WCM Framework Market Summary
    Round Up
        Comparing with Commercial Products
        Selecting a CMS and Beyond
Glossary
List of Figures
3.1. Lenya Architecture Diagram: Use Case Framework
3.2. Lenya Screenshot: Edit Menu
3.3. Lenya Screenshot: BXE Editor
3.4. Lenya Screenshot: Kupu Editor
3.5. Lenya Screenshot: Editing Structured Content in BXE
3.6. Lenya Screenshot: Site Tab
3.7. Lenya Screenshot: Lenya Localization
3.8. Lenya Screenshot: Image Dialog
3.9. Lenya Screenshot: Workflow Syntax
3.10. Lenya Screenshot: Edit Permissions
3.11. Daisy Architecture Diagram: Daisy Architecture
3.12. Daisy Repository Server Architecture
3.13. Daisy Screenshot: Defining Field Types
3.14. Daisy Screenshot: Content Actions Menu
3.15. Daisy Screenshot: Link Builder
3.16. Daisy Screenshot: Editing a Navigation Document
3.17. Daisy Screenshot: Editing Image Properties
3.18. Daisy Screenshot: Daisy Diff
3.19. Daisy Screenshot: Defining ACLs
3.20. Daisy Screenshot: Faceted Browsing
3.21. Magnolia Screenshot: Configure Subscribers
3.22. Magnolia Screenshot: Browsing in AdminCentral
3.23. Magnolia Screenshot: Page Layout
3.24. Magnolia Screenshot: Edit Dialog
3.25. Magnolia Screenshot: Localized Edit Dialog
3.26. Magnolia Screenshot: Site Designer
3.27. Magnolia Screenshot: Configure Cache
3.28. OpenCms Screenshot: Editing Structured Content
3.29. OpenCms Screenshot: Configure Search Index
3.30. OpenCms Screenshot: Database Replication Module
3.31. OpenCms Screenshot: OpenCms Workplace Interface
3.32. OpenCms Screenshot: Editing XML Pages
3.33. OpenCms Screenshot: Link Checking
3.34. OpenCms Screenshot: Localizing Content
3.35. OpenCms Screenshot: Direct Edit Interface
3.36. OpenCms Screenshot: Insert Image
3.37. OpenCms Screenshot: OCEE LDAP Connector
3.38. OpenCms Screenshot: Content Tools
3.39. Architecture Diagram: Structured Publishing
3.40. Alfresco Architecture Diagram
3.41. Alfresco Architecture Diagram: Repository Services
3.42. Alfresco Screenshot: Web Content Properties
3.43. Alfresco Screenshot: Browse Site View
3.44. Alfresco Screenshot: Sand Boxes
3.45. Alfresco Screenshot: TinyMCE Formatting Buttons
3.46. Alfresco Screenshot: Image Position Dialog
3.47. Alfresco Screenshot: Workflow Dialog
3.48. Alfresco Screenshot: Managing Permissions
3.49. Alfresco Static Deploy Model Diagram
3.50. High Level Hippo Architecture Diagram
3.51. Hippo Architecture Diagram: Forms Generation Architecture
3.52. Hippo Screenshot: Taxonomy Browser
3.53. Hippo Screenshot: Document Browse Interface
3.54. Hippo Screenshot: Xopus Editor Integration
3.55. Hippo Screenshot: Managing Permissions
3.56. Hippo Screenshot: To Do List
3.57. Hippo Screenshot: JSF Repository Browser Demo
3.58. Jahia Architecture Diagram: Distributed Architecture
3.59. Jahia Code Sample: Content Type Definition
3.60. Jahia Screenshot: In-Context Content Management
3.61. Jahia Screenshot: Forms Based Editing
3.62. Jahia Screenshot: Advanced Search Form
3.63. Jahia Screenshot: Version Differences
3.64. Jahia Screenshot: Workflow Approval Page
3.65. Jahia Screenshot: Field Level Access Control
3.66. Jahia Screenshot: Personal Portal Page
List of Tables
1.1. High Level Summary of Products Reviewed
1.2. Scoring Key
2.1. How Community Projects are Governed
2.2. Commercial Open Source Revenue Sources
3.1. Informational Brochure Strengths and Weaknesses
3.2. Lenya Project Overview
3.3. Lenya 2.0 Summary
3.4. Daisy Project Overview
3.5. Daisy 2.1 Summary
3.6. Magnolia Project Overview
3.7. Magnolia 3.5 Enterprise Summary
3.8. OpenCms Project Overview
3.9. OpenCms 7.0.3 Summary
3.10. Informational Brochure Score Summary
3.11. Informational Brochure Strengths and Weaknesses
3.12. Alfresco Enterprise Project Overview
3.13. Alfresco 2.2 Summary
3.14. Hippo CMS Project Overview
3.15. Hippo 6.05.02 Summary
3.16. Jahia Enterprise Project Overview
3.17. Jahia 5.0.3 Summary
3.18. Informational Brochure Score Summary
Chapter 1. Introduction
The Demand for Open Source Java Web Content Management
Not long ago, companies looking for an open source Java web content management system (WCM) had limited options. While the open source content management system (CMS) community as a whole was thriving, most of the activity was on the PHP and Python stacks. The main Java options were Apache Lenya and OpenCms. If you wanted a simple, widely used technology that your users would like, neither of these options looked very attractive. This state of the market was frustrating for many companies that had standardized on the Java platform and wanted to take advantage of the opportunities afforded by open source content technologies.

The building blocks have been available for a long time. The Java world is rich with frameworks that provide core services like persistence, access control, data validation, and presentation. Many companies have used these components to build custom systems that fit their needs. However, these homegrown systems tend to languish without a continuous commitment to maintenance and enhancement. Adding certain core content management features can be prohibitively complex. For example, adding versioning or localization to a data model that was not originally designed for it can disrupt the whole application. Furthermore, many in-house development teams building these systems do not have the wealth of subject matter experience that a dedicated content technology development team would have. Lessons learned can only be applied in the next release of the application - if there is one.

The state of the market is rapidly changing. More products are emerging and some of the older projects are seeing a resurgence. The momentum behind Java web content management (WCM) technologies started to surge in early 2006, when open source business applications began to get the attention of enterprise buyers who were having success with infrastructure products like Linux, Apache, and MySQL. Java was a natural requirement for large enterprises that had standardized on the language. At the same time, commercial open source vendors were starting to notch up their offerings and connect with these interested buyers. Many companies are reporting successful implementations using a new breed of Java WCM technologies. If you were disappointed the last time you looked for a Java web content management platform, it may be time to look again.

Companies that have successfully implemented solutions based on these platforms talk of lower project start-up costs and similar (not greater) integration and maintenance costs. Typically, they have strong development teams or rely on systems integrators to manage the systems for them. These same companies tend to have a history of frustration with commercial software, either because they do not feel that the value is commensurate with the licensing costs (they spend so much time or money doing integration work) or because they feel under-served by technical support and would like to be less dependent on it.

Companies have found the greatest leverage using open source software to power basic informational web sites and to provide content management services to highly dynamic, transactional, or interactive web applications. As you will see in the pages of this report, the Java open source content management marketplace is rich with options in these categories. While the Java products still lag PHP and Python based systems in terms of social media oriented features and community size, they have good support for the more fundamental content management functionality, and several of these products offer the assurance of commercial support packages.
positioning each project in categories of use where it typically excels. In the above-mentioned report, I used the categories Informational Brochure Site, Online Periodical, Collaborative Workspace, and Online Community. The projects described in this report fall into two categories: Informational Brochure and Web Content Management (WCM) Platform. There is also a discussion of the overall web content management marketplace and how open source software fits in.

Many of the products reviewed in this report are commercial open source, meaning that a software company develops the product as part of its business strategy. For these products, I discuss how the company makes money from the software: whether it sells a commercial version that is better than the free version ("tiered product") or whether the revenue comes entirely from selling support services for the free version.
For each of the projects reviewed in this report, I have subscribed to the mailing list and monitored the volume and nature of the activity. I have talked to users of the software. I have built prototypes that involve defining content types, setting permissions, and developing layouts. To ensure factual accuracy, each evaluation has been reviewed by a project committer or company officer.

Within each evaluation, I discuss the architecture and integration potential, usability factors, the community, and how the project seems to be trending. For the business oriented reader, the content contribution and presentation sections describe how the application is used to manage content and what type of visitor facing functionality is possible. For the technical reader, the architecture and development sections describe how the product works behind the scenes and how it can be configured and integrated. Although I do not give an overall rating for each product, I do rate each product along certain common criteria.
are accompanied by other expensive and difficult to predict activities, such as a site re-design or a corporate re-organization, that complicate and delay the implementation work. While companies are used to spending money on the technology, they need to be careful not to divert resources away from the things that matter most: good content and good processes.

When faced with these numbers, technology buyers often feel like they are not getting value out of the platform and may be tempted to try building something on their own. Companies that have gone this route have found mixed results. To someone new to content management, a web content management system looks a lot like any other data management application, and most developers have built plenty of those. But content management is different, and developers usually discover these differences after it is too late to efficiently incorporate them into the design. [For more information on why content management is different from typical data management applications, see the sidebar Homebrew CMS.]

Homebrew CMS

Before you build the one billion and first CMS, here are some things that typically burn generalist architects in the process:

Versioning. Frequently, the single requirement that kills a custom CMS is versioning, especially if it is added after the initial design. Versioning is hard. It is hard because it makes your data model more complicated. It is hard because it is a concept that most generalist architects haven't implemented before. There are interesting nuances, like how often to create a new version (with every save, or every time the asset is published?) and whether a link should point to a specific version of an asset or just the latest version.

Localization. Localization isn't just about Unicode; it is a whole other dimension of your content repository. While adding versioning doubles the complexity of a data model, versioning combined with localization creates chaos if you are not careful. Does each translation have multiple versions? Or does each version have multiple translations? What language do you fall back to if you don't have a translation of an asset in the requested language? What is the relationship between the URLs of the translated sites? How do your presentation templates handle text that runs right to left or up and down? Do all of the attributes of an asset need to be translated, or can some things (like images) be shared?

Deployment and dependency management. Content, especially web content, is interrelated. Pages reference images and have links to other pages. If you are going to deploy a piece of content to the presentation tier, what will you do if the related assets are not ready for publishing or not deployed? Would you even know?

Usability. While the content management market cannot claim to have mastered usability, it has probably spent more time refining user interfaces than you can afford to. Usability is probably the most common reason why companies abandon their homegrown CMS.

Access control. Most software systems are designed to manage access control by function, not by data. Most (although definitely not all) content management systems have figured out a manageable system for controlling permissions around data.

Source: Homebrew CMS [http://contenthere.blogspot.com]

Many potential content technology buyers are just looking to augment a custom web application with content management services so that the text and imagery of the application do not have to be deployed as code. These buyers are usually quickly frustrated by the cost and limitations of commercial products. They don't intend to use many of the features that the WCM system offers and, as a development platform for building custom functionality, the system is sub-optimal. In addition, custom code built on a proprietary platform is not portable, so there may be considerable risk and expense from lock-in. The web content management framework section of this report describes these uses of a WCM platform.
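The deployment and dependency question raised in the Homebrew CMS sidebar ("Would you even know?") can be made concrete with a small sketch. The following is an illustration only, not taken from any product reviewed here: the class name DeploymentCheck, the method missingDependencies, and the sample paths are all hypothetical. It walks the references of an asset, including references of references, and reports every related asset that has not yet been published.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical pre-deployment check; not the API of any reviewed product. */
final class DeploymentCheck {

    /**
     * Returns, in sorted order, every asset reachable from the given asset
     * through references that is not in the published set.
     */
    static List<String> missingDependencies(String asset,
                                            Map<String, List<String>> references,
                                            Set<String> published) {
        Set<String> visited = new HashSet<>();   // guards against circular links
        Set<String> missing = new TreeSet<>();   // sorted, no duplicates
        Deque<String> queue = new ArrayDeque<>();
        queue.add(asset);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (!visited.add(current)) {
                continue; // already examined
            }
            List<String> refs =
                references.getOrDefault(current, Collections.<String>emptyList());
            for (String ref : refs) {
                if (!published.contains(ref)) {
                    missing.add(ref);
                }
                queue.add(ref); // follow links transitively
            }
        }
        return new ArrayList<>(missing);
    }

    public static void main(String[] args) {
        // Sample data, invented for illustration.
        Map<String, List<String>> references = new HashMap<>();
        references.put("/index.html", Arrays.asList("/img/logo.png", "/about.html"));
        references.put("/about.html", Arrays.asList("/img/team.jpg"));
        Set<String> published =
            new HashSet<>(Arrays.asList("/img/logo.png", "/about.html"));

        System.out.println("Blocked by: "
            + missingDependencies("/index.html", references, published));
    }
}
```

A real system would also have to decide what to do with the result: block the deployment, publish the dependencies along with the page, or warn the author. That policy decision is usually the harder part.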
Market Fragmentation
A few years ago there was a count of 1,800 software applications that called themselves content management systems; anecdotal information indicates that the number is growing, not shrinking. New products continue to emerge (I was at a conference in November 2007 and met two people who were starting to build new WCM products) and old products don't seem to die. It is surprising how long a small CMS vendor can survive off the maintenance revenues from a tiny install base. As one long time content management veteran said, "the WCM market is in dire need of a Darwinian event."

Market fragmentation is rife in the open source world, too (especially in the content management sector), and comes at a great cost: developer resources are spread too thinly across too many projects. But the absence of a "winner" in the commercial market takes away a safe, automatic choice and forces technology decision makers to look at alternatives. Every option appears equally risky from a market share perspective. The market has appeared to be on the verge of a massive consolidation for years. That it hasn't happened yet means that it will never happen - or that it is due. Interestingly, one of the few products that did disappear was at one time considered one of the safest bets: Microsoft CMS 2002.
LDAP Integration
In a large company with thousands of employees, administering user accounts across many disparate systems is a real challenge. While a marketing brochure web platform may only have a few users, an IT organization would prefer to be able to terminate access to it, along with every other system, in one place: the centralized LDAP directory. Most large enterprises will want to integrate their content management systems with their corporate LDAP user profile repository for authentication and authorization. All of the content management systems reviewed here support that feature.

Authentication is the easy part; the complexity is in authorization (that is, determining what privileges a user has). Privileges are usually determined by the user's roles, which are applied either directly to the user or to a group that the user is a member of. The question becomes where to manage the groups and roles. Because the corporate LDAP directory is central to the whole company, it may not be easy for a content management initiative to insert the necessary information into this shared resource. For example, say you need an "author" group. The rest of the company might not care enough about this group to add it to the LDAP structure. There may also be collisions with existing groups: a web author may be very different from a technical documentation author. Performance may also be at risk if the CMS needs to consult an external system every time it calculates whether a user can see an asset. All these issues can be worked out, but doing so takes collaboration and cooperation across different departments and business units, and that is neither quick nor easy to achieve.

Another issue arises when there are external users who need access to the CMS but do not meet the criteria for an entry in the central LDAP repository. A common design is to have a local profile repository with a fall-back to the LDAP directory. In fact, nearly all systems keep a local store of user profiles. This is necessary because when a profile is removed from the LDAP directory, the CMS still needs to remember that user for ownership information and for its audit and version histories.

The design of the LDAP integration may determine where the roles and groups are managed. Some LDAP integration is done by regularly importing records from the central LDAP directory. If this is the case, any role or group assignment configured in the CMS will be overwritten the next time the user repository is refreshed. If the system uses a pluggable authentication architecture, it can consult external directories when the user is not in the local user repository. Once a user is authenticated against the external directory, a local account is created. However, the system should still verify authentication credentials against the directory at every login to ensure that the user has not been centrally de-activated.
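The fall-back design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used by any product in this report: ExternalDirectory, LocalProfile, and LoginService are hypothetical names, the external directory is reduced to a single verify call standing in for an LDAP bind, and local passwords are compared in plain text only to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for the corporate LDAP directory. */
interface ExternalDirectory {
    boolean verify(String username, String password);
}

/** Local profile, retained for ownership, audit, and version histories. */
final class LocalProfile {
    final String username;
    final boolean external;      // true when credentials live in the external directory
    final String localPassword;  // only set for local-only accounts (plain text for brevity)

    LocalProfile(String username, boolean external, String localPassword) {
        this.username = username;
        this.external = external;
        this.localPassword = localPassword;
    }
}

/** Hypothetical login service: local repository first, external directory as fall-back. */
final class LoginService {
    private final Map<String, LocalProfile> localProfiles = new HashMap<>();
    private final ExternalDirectory directory;

    LoginService(ExternalDirectory directory) {
        this.directory = directory;
    }

    /** Registers a user who exists only in the CMS, such as an outside contributor. */
    void addLocalUser(String username, String password) {
        localProfiles.put(username, new LocalProfile(username, false, password));
    }

    boolean login(String username, String password) {
        LocalProfile profile = localProfiles.get(username);
        if (profile != null && !profile.external) {
            return profile.localPassword.equals(password);
        }
        // Externally managed user: check the directory on every login so that
        // a central de-activation takes effect immediately.
        if (!directory.verify(username, password)) {
            return false; // any existing local profile is kept for history
        }
        if (profile == null) {
            // First successful login: create the local account.
            localProfiles.put(username, new LocalProfile(username, true, null));
        }
        return true;
    }

    Optional<LocalProfile> profile(String username) {
        return Optional.ofNullable(localProfiles.get(username));
    }
}

final class LdapFallbackDemo {
    public static void main(String[] args) {
        Map<String, String> fakeLdap = new HashMap<>();
        fakeLdap.put("alice", "s3cret");
        LoginService service = new LoginService((u, p) -> p.equals(fakeLdap.get(u)));

        System.out.println("first login: " + service.login("alice", "s3cret"));
        fakeLdap.remove("alice"); // centrally de-activated
        System.out.println("after removal: " + service.login("alice", "s3cret"));
        System.out.println("profile kept: " + service.profile("alice").isPresent());
    }
}
```

Note that the sketch only covers authentication. Where roles and groups live, which the text identifies as the harder question, is deliberately left out.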
Resiliency
The people who run data centers do not like midnight calls telling them the CMS is down; they have too many other things to worry about. They look for technologies that can be configured on multiple servers so that when one crashes the whole system does not become unavailable. Mission critical systems usually get deployed across geographically dispersed data centers so that a catastrophic event will not bring down the system. Typically, the single point of failure will be the database, and that can be solved through database replication technology.

Good systems administrators will also care about resistance to data corruption and the ability to back up and restore the system. Again, the underlying components that support persistence are usually responsible for this. In a content management system, complications arise when content is stored across multiple systems that need to be kept in sync. Most commonly this will be the file system for binary files and the database for structured text content and metadata. Search indexes are a third place where data are stored, but they are less of a problem because you can usually re-index the entire repository. Look for technologies that have documented back-up scripts and procedures. Ideally these will not require turning off the system to execute the back-ups. If live back-ups cannot be done, back-ups will be done less frequently, increasing the potential for data loss.

Another resiliency feature is de-coupling the authoring and delivery environments. This prevents intensive authoring activity from degrading the performance of the external web site, and it prevents high traffic loads on the web site from making the authoring environment unresponsive.
Scalability
Even if the initial use of the system is not intensive, large companies tend to avoid applications with limited scalability. If the application turns out to be successful and has the promise of an enterprise-wide deployment, the technology should not stand in the way of realizing a business opportunity. Of course, there is a case for deploying un-scalable prototypes to test the idea and then rebuilding the application if it has business potential. Google has experienced success with this model. Open source certainly fits into this strategy by taking licensing out of the experimentation and start-up costs. This report, however, makes the assumption that the buyer has a lower R&D budget than Google and is looking for a sustainable solution.
to data integration by replicating data across repositories. There are several reasons for this. First, data integration is brittle. Content technologies do not promise to keep their database schemas stable the way they commit to their APIs. Second, a CMS may do a lot of processing when it adds or updates an asset, for example: firing an event to update links, clearing the cache, and updating search indexes. If you follow the rules and integrate at the API level, the storage mechanism makes less of a difference. The only reason to care would be if you had database management competencies specialized on a particular product (like Oracle).

In fact, compatibility with existing technical skills is often more important than system interoperability. In particular, the projects covered in this report rely heavily on specific open source web application frameworks: Cocoon, Struts, MyFaces, etc. Knowledge of these frameworks is very helpful. Having skill in the Java technology stack is critical if you expect to have any responsibility for managing the platform.

Some Java WCM platforms are certified or known to run on specific servlet containers or application servers. If you happen to run Tomcat or JBoss, you are in good shape. The products in this report either ship with Tomcat or (in the case of the Cocoon based projects) have good documentation on how to deploy the application to a Tomcat container. Theoretically, if the application works on Tomcat, it should work on any certified container. However, judging from the mailing lists, users of more elaborate J2EE server platforms (such as WebSphere, WebLogic, or Sun) tend to have configuration questions that the general community is less prepared to answer.
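The argument for integrating at the API level rather than the data level can be illustrated with a small sketch. It assumes a hypothetical ContentApi class whose save method notifies registered listeners, and a hypothetical SearchIndex listener; neither is the API of any product reviewed here. A write that goes through the API keeps the index in sync, while a write made straight to the underlying storage (standing in for a direct database insert) silently bypasses it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback fired whenever an asset is saved through the API. */
interface AssetListener {
    void assetSaved(String path, String body);
}

/** Hypothetical CMS API; the map stands in for the product's database tables. */
final class ContentApi {
    private final Map<String, String> storage = new HashMap<>();
    private final List<AssetListener> listeners = new ArrayList<>();

    void addListener(AssetListener listener) {
        listeners.add(listener);
    }

    /** API-level write: stores the asset, then runs all follow-up processing. */
    void save(String path, String body) {
        storage.put(path, body);
        for (AssetListener listener : listeners) {
            listener.assetSaved(path, body);
        }
    }

    /** Exposes the raw store to simulate an integration that writes to the database directly. */
    Map<String, String> rawStorage() {
        return storage;
    }
}

/** Hypothetical search index kept up to date by save events. */
final class SearchIndex implements AssetListener {
    private final Map<String, String> indexed = new HashMap<>();

    @Override
    public void assetSaved(String path, String body) {
        indexed.put(path, body.toLowerCase());
    }

    boolean contains(String path) {
        return indexed.containsKey(path);
    }
}

final class ApiIntegrationDemo {
    public static void main(String[] args) {
        ContentApi api = new ContentApi();
        SearchIndex index = new SearchIndex();
        api.addListener(index);

        api.save("/news/launch", "Product Launch");          // through the API
        api.rawStorage().put("/news/recall", "Recall");      // straight into storage

        System.out.println("launch indexed: " + index.contains("/news/launch"));
        System.out.println("recall indexed: " + index.contains("/news/recall"));
    }
}
```

The same reasoning applies to the other side effects mentioned above, such as link updates and cache clearing: each would be one more listener that a direct database write never triggers.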
Usability
Regardless of how extensive the platform is, unless the users perceive the user interface as being usable and intuitive, the solution will not be regarded as a success. Users will look for ways to avoid using (or misuse) the system and by doing so undermine the business value of the initiative. James Robertson from Step Two Designs astutely observed that enterprise applications need to be simpler and easier to use because users rarely get adequately trained during company-wide deployments (see article More Users = Simpler CMS [http://www.steptwo.com.au/papers/cmb_moreequalssimpler/index.html]). This goes against the long-standing trend in enterprise software that values number of features over usability. A revolution of sorts is going on where business users are starting to reject the assertion that enterprise software needs to be more complicated than the consumer tools that they like to use. Project managers are forced to behave more like commercial product managers than like people serving a captive audience that has no choices. Despite its importance, usability is hard to measure because it is so subjective. This report attempts to address some obvious strengths and weaknesses and common observations about each product's usability. Only your users, however, will be able to tell you if they consider the solution usable. The basic fact is that content management systems strive to solve a set of hard problems. There is an inherent conflict between the competing needs of the authors of content and the audience. An author just wants to spend time making the asset useful and pleasing to him and tends to focus on the content and layout of the asset. He tends to resent spending the time to add metadata and structure the content in ways that will make the asset easier to find by others and more re-usable. Systems that strictly enforce tasks that are not regarded as important or break the flow of producing content tend to be criticized for usability.
Systems that are lax in this area become unmanageable and content becomes unfindable. Striking the right compromise between convenience and compliance is an art. Using open source technology may provide some advantages in the practice of that art but it will not solve the problem directly. Some open source adopters have found that the user interface is easier
to scale down and simplify than the commercial products that they have tried. Others point out that they used the money saved on license costs to refine the UI to the needs and tastes of their users. However, if the choice of an open source platform is made by developers because of an affinity toward open source, or for other technical reasons, there tends to be less interest in user satisfaction than in the technical aspects of the system. The general assessment of the usability of open source software is mixed. On one hand, many of these applications are written by technical people for technical people and there is a tendency to neglect the sensibilities of a technophobe. On the other hand, these characteristics are not appreciably worse than those of commercial content management software, which is also commonly criticized for usability issues. Open source WCM developers tend to be very interested in and excited by Web 2.0 and rich internet clients. Many have taken up the challenge to use technologies such as AJAX to address usability issues.
Market Summary
While companies are losing enthusiasm for managing their web content in "ECM" products that try to centralize all forms of content onto a single platform, the search for "enterprise grade" tools to manage web content has gotten much more intense. In the absence of inexpensive, safe commercial web content management technologies, many of the perceived advantages of commercial software seem to lose their influence over buying decisions. Companies are starting to add open source software to their selection matrices but are having a hard time evaluating open source alongside commercial products. Selection processes that depend on RFP responses and vendor demos choke on options with small or no sales and marketing bandwidth. Traditional analyst firms are equally confused trying to incorporate open source in their analysis. Unless a selection process is adapted to fully explore open source, the commercial products typically win because of the allure of a polished and well-executed demo. Investing in an open source proof of concept typically levels the playing field, but few companies make the investment unless there is a particular motivation such as a senior-level directive to carefully consider open source. This has essentially happened in many of the governments across Europe that have been mandated to use open source software wherever possible. Anecdotal evidence suggests that there is no correlation between customer success and the licensing model of the application. Open source implementations often fail for the same reasons as commercial software implementations: poor requirements gathering, ineffective scope management and change control, dependencies on other systems, and not enough user training. As discussed later in this report, open source describes a very wide range of products and business models. In many cases two open source products are no more like each other than like a commercial competitor.
Both commercial software and open source software can be oversold. Just as commercial software buyers are misled by claims of features being out-of-the-box, companies that adopt open source because of unreasonable cost expectations tend to fail because they underinvest in other aspects of the project. Companies with reasonable cost expectations and a good understanding of the strengths and limitations of the product they are using tend to fare better. It is not unlike the trend to use off-shore development resources. Companies that blindly pursue a pot of gold in low cost labor tend to abandon the idea after disastrous first experiences. Companies that take the time to understand the model, learn about best practices, and invest in the solution tend to have better outcomes.
you do the implementation yourself, the community will be an important resource for you. If you are working through a systems integrator, the community will help them work efficiently and keep their interest in the platform. There has been much study on the dynamics of communities at both the micro and macro level. Despite all the theory, in practice it is hard to consciously try to build a community and even harder to predict whether a community will grow or decline. Successful communities typically have strong leadership, a mechanism for building social bonds, and a motivating factor (which is nearly always the ability to make money). Turnover is actually a good thing because it infuses the community with new ideas and energy.
Leadership
Leadership may be the most critical and rare ingredient in an open source community. Usually there is one member who is able to set a vision and make decisions. Community members need to trust the leader's motivations and judgement. They need to feel like they are heard and understand the rationale when their ideas are rejected. There may be multiple leaders but each needs to have his own area of control. Good leaders provide leadership opportunities for star contributors. For example, there may be a rotating role of "release manager" who presides over all the activities related to the release of a version of the software. Depending so heavily on a single member of the community is always a risk. What happens when the leader moves on? The best communities have a strong culture of meritocracy and a pipeline of new leaders. It has been said that the single best test of strength for a community is its ability to endure a change in leadership. Most of the big projects have not been tested in this way. The larger projects usually form a legal entity (called a "foundation" or an "association") to institutionalize leadership and make the will of an individual subservient to the organization. This introduces formal governance practices like a board of directors and transparent decision making processes. In addition to ensuring continuity, the establishment of a legal entity creates something that other organizations can interface and partner with. If this happens, the community project may transform into an institutional project (described next) where large corporations contribute to the project in a manner similar to a joint venture. While forming a legal entity distributes the privilege and responsibility of making decisions across a group of people, the need for strong individual leadership does not go away. The leader still needs to encourage and facilitate the activity of the board of directors and step in when dynamics get dysfunctional.
There are some decisions that are hard to make by committee. For example, good user interface design is not a democratic process. There needs to be a visionary who keeps things clean and consistent. Maintaining this vision sometimes requires dictatorial behavior that only a trusted leader can pull off. Good examples of self-styled "benevolent dictators" are Linus Torvalds, whose efforts kept the Linux kernel clean and stable; Dries Buytaert, who kept the Drupal core thin and extensible with modules; and Alexander Limi, who keeps the Plone user interface simple and compliant with accessibility standards. Projects that try to vote on every UI decision usually trend toward bloat and entropy.
Social Interaction
Programmers are not generally known for their social skills but social interaction is an important aspect of community based open source. Social interaction forms a foundation of trust that facilitates communication. Digital communication (like email, IRC, and message boards) is the mainstay of geographically distributed development teams but the most successful projects have face to face events that put a human face behind the email address or IRC handle. Some projects have local user groups. Others have global conferences,
activities, and programming sprints. It is not unusual to see people who no longer use the software listen in on the IRC channel or attend events for purely social reasons. Social interaction is also an important part of forming a culture and social bonds that keep participants engaged and compel them to help each other out. These social dynamics also keep members in line. A member of one of the major PHP based WCM platforms noted that, since they encouraged users to put their real portraits on their forum profiles, members have become more friendly and helpful. When people participate on a personal level, they tend to be more accountable. Social activity also creates the opportunity for non-technical users of the application to get involved. Building and serving a non-technical community is a plateau that only a few of the open source content management projects have achieved. It is an important milestone because it allows for user input to be contributed directly in the users' own words rather than as interpreted by a technical developer who filters the information through his own biases.
Economic Opportunity
The social aspects of a community are important but people can't afford to invest time and energy in a project if there is no potential to make money. Developers, by nature, are attracted to hot technologies to build marketable skills. Projects using dated technologies have a hard time attracting new people. Freelance consultants seek out projects that are widely used and need systems integration work. Fortunately, in the WCM space, all platforms (commercial, open source, and SaaS) need extensive systems integration work. Some open source platforms create opportunities for companies and individuals to sell products and services that enhance the core. For example, the Joomla! community had a very lively marketplace for themes and modules. Most of the money on the Joomla! platform was made by people selling themes (site branding modules). When the project leadership revoked the GPL exception that made add-on modules exempt from GPL licensing, there was an uproar. Like with most things, commitments must be kept.
accountability. Just like with traditional commercial software, commercial open source has support and maintenance packages and account management staff to sell them. There are essentially three ways to make money off open source software: selling a commercially licensed "enterprise" version in addition to the free "community" version; selling support and training; and selling integration services. Commercial open source companies try to stay out of the professional services business because it creates a competitive relationship with other systems integrators that could be out implementing the software and creating new support customers and market-share. While it may carry an open source license, a product that is developed and implemented by a single systems integration company has the same prospects as a commercial "consultingware" product: the install base will be small and there will be no energy in the community.
Show Me the Money
Tiered Product
In the tiered product approach, the free "community" version is a functionally trimmed down or otherwise inferior version of the commercial "enterprise" version of the software. The
community version typically lacks enterprise oriented features like LDAP integration, replication and clustering, and conduits to commonly used applications, or is not certified or supportable. The logic is that companies that use these features are getting more value out of the product and should pay for that value. Companies that use the Tiered Product approach tend to have an awkward relationship with the community version of the software. They want a community to form around the community version and succeed but not at the risk of losing potential enterprise version customers. The language used to describe the two products is interesting as they try to promote the enterprise version without denigrating the community version (at least not too much). Tiered Product approaches work best when a community is able to provide and receive value in the community version and the vendor is able to leverage these non-monetary contributions. If the software vendor treats these non-paying customers like "free riders" the community version becomes little more than lip service and a community is unlikely to form and contribute. The extended features may also be offered in the form of a set of extensions where the community and the enterprise versions share a common core. Depending on how the licensing works, this may create the opportunity for third party software vendors to sell competing or complementary extensions. Another variation of the tiered product model is one in which the free version carries a requirement to display a "powered by" badge on every page. Open source experts (including the Open Source Initiative's Open Source Definition) maintain that these badgeware products are not legitimate open source because they restrict the user from modifying the code to remove the badge.
Informational Brochure
Despite Web 2.0 trends that urge companies to have their web site be more than a static brochure, there is still a need for simple tools to allow non-technical users to manage a basic corporate web site. Still, it is shortsighted to implement a WCM system without thinking of features that will enable a bi-directional conversation with the visitor. At the simplest level, features like web forms and RSS are desirable. Most importantly, however, the system needs to be easy to use so that employees from across the organization can efficiently publish fresh content. For now, it may be enough to have a few people in the marketing department manage the web site. In the future, however, the corporate site will need to connect customers more directly with the employees who understand the products and the vision. Customers will be less satisfied with a formal press release and instead look for CEO or employee blogs. They will want to explore content through faceted navigation that makes sense to them. They will want to throw away paper product manuals and be able to read them online. The products described in this section have the basic features to efficiently run simple informational web sites such as a basic marketing web site, a corporate intranet, and a customer extranet. Many of these systems also have the capability to do more interactive functionality, but their strength is in managing informational resources. This category also includes other informational resources such as corporate intranets and customer extranets.
Content Contribution
The Content Contribution section describes the work environment used to manage content. The key areas to look at are how content is organized and found, the content editing interfaces, localization features, and workflow.
the site. For example, it might not be clear that editing a content component in one area of the site will affect other pages. In a high content reuse scenario, the author composes content in a presentation neutral way and lets the presentation tier worry about content placement. The problem is that most content authors think of everything in terms of Microsoft Word, which has no concept of reuse, only copy and paste. The web CMS user interface has the challenge of resolving the conflict between word processor expectations and the value of managing reusable content components. Some interfaces maintain this balance more successfully than others. There is also a tension between organizing content within a site explicitly, by placing content assets within a hierarchical site map, or implicitly, by tagging assets and using query-based navigation. The former appeals to content managers who want full control over the visitor's experience. The latter is preferable when large volumes of content are in play or when a personalized experience is desired. It is possible to achieve a mixture of the two concepts where certain structural components of the site (such as landing pages) are explicitly organized and dynamic lists of content based on taxonomy or other query-based rules are also presented to the user. In larger web sites, not all the content that a visitor sees originates from within the CMS. Product information and documentation may be authored in other systems and then imported or synchronized into the WCM repository. Although this is often done manually, the stronger platforms will have interfaces to import and exchange content.
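The mixture of explicit and implicit organization described above can be sketched in a few lines. This is an illustration only: `NavigationSketch`, `Asset`, and `titlesTagged` are hypothetical names, and the in-memory list stands in for whatever repository and query mechanism a given product actually provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of mixing explicit placement with tag-driven lists.
class NavigationSketch {

    static final class Asset {
        final String title;
        final Set<String> tags;

        Asset(String title, String... tags) {
            this.title = title;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    /** Implicit organization: a list built by querying tags at request time. */
    static List<String> titlesTagged(List<Asset> assets, String tag) {
        List<String> titles = new ArrayList<>();
        for (Asset asset : assets) {
            if (asset.tags.contains(tag)) {
                titles.add(asset.title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Asset> repository = Arrays.asList(
                new Asset("Widget 2.0 Released", "news", "widgets"),
                new Asset("Widget Data Sheet", "widgets", "documentation"),
                new Asset("Company Picnic", "news"));

        // Explicit organization: an editor hand-places the landing page sections.
        List<String> landingPageSections = Arrays.asList("Products", "News", "About Us");

        System.out.println("Sections: " + landingPageSections);
        System.out.println("Widget content: " + titlesTagged(repository, "widgets"));
    }
}
```

The hand-placed section list never changes unless an editor changes it, while the tagged list picks up new assets as soon as they are tagged.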
Localization
While localization used to be a low priority requirement, many companies now see localization as central to their business strategy. Even for companies in the United States that do not have international aspirations, the growth of non-English speaking populations represents an important business opportunity. Fortunately, all of the products in this report originated in Europe where multilingualism is the norm. Each of these products has support for extended character sets and strategies for maintaining a site in multiple languages. The most primitive of these strategies is to use the multiple site capability of the system to treat the different translations as independent sites. This approach may be desirable for a marketing brochure site if the marketing organization of a company is broken down into independent regional business units. In this case, content reuse across the sites is manual and there is no centralized control over the content, how it is organized, or the branding of the sites. If localized sites are to be centrally coordinated and content is to be shared, more sophisticated localization functionality is needed. In this case, the system should be aware of
the fact that two assets are really different translations of the same content. Maintaining these relationships will enable features like automatically triggering a translation workflow when the primary language version of the asset is updated. Support for this advanced localization functionality varies between the products and each product has its own strategies and best practices. The key is to find a product that aligns with the way your business is organized.
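As a rough illustration of what it means for a system to know that assets are translations of one another, the following sketch tracks which version of the primary text each translation was made from. The class and method names are hypothetical and are not taken from any product in this report; real products model this in their own ways.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a content item that knows about its own translations.
class LocalizedContentSketch {

    private final String primaryLanguage;
    private int primaryVersion = 0;
    // language code -> version of the primary text that translation was made from
    private final Map<String, Integer> translatedFromVersion = new LinkedHashMap<>();

    LocalizedContentSketch(String primaryLanguage) {
        this.primaryLanguage = primaryLanguage;
    }

    /** Records that a translation is up to date with the current primary text. */
    void markTranslated(String language) {
        translatedFromVersion.put(language, primaryVersion);
    }

    /** Updating the primary text makes every existing translation stale. */
    void updatePrimary() {
        primaryVersion++;
    }

    /** Languages for which a translation workflow should be started. */
    List<String> languagesNeedingTranslation() {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : translatedFromVersion.entrySet()) {
            if (entry.getValue() < primaryVersion) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        LocalizedContentSketch page = new LocalizedContentSketch("en");
        page.markTranslated("de");
        page.markTranslated("fr");

        page.updatePrimary(); // the English text changes

        System.out.println("Primary language: " + page.primaryLanguage);
        System.out.println("Start translation workflow for: " + page.languagesNeedingTranslation());
    }
}
```

With the "independent sites" strategy there is no such link between assets, so nothing can tell an editor that the German page has fallen behind the English one.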
Workflow
Realistically speaking, most informational web sites need only the most simplistic workflow models such as a single approval or even no approval at all. While complex workflows may be desirable on paper, in practice they tend to get in the way and are frequently circumvented. What is perhaps more important in a Web 2.0 world is strong monitoring capabilities so managers can see what was published and respond (if necessary), rather than be a bottleneck to publishing any content asset.
Presentation
All of the products in this category provide a framework to build what the visitor sees as the web site. In general, this means offering a presentation templating system to render content as formatted pages and a system (commonly called a controller) for mapping URLs to pages. But the presentation tier is usually used for more than simply rendering content. Most sites today have at least some form of interactivity such as a simple search field or a mail form. More advanced sites support personalization or interactive applications that allow the user to interact with the content or other third party data. Products are evaluated along the following criteria.
some do a better job than others. In general, CMSs that try to auto-generate HTML code are a risk unless a developer can have full control over HTML element classes and IDs. The ease with which CSS files can be deployed to the delivery environment is also a differentiator. Some platforms treat CSS files as code; others treat them as deployable content assets.
Interactivity
In the world of Web 2.0, even something referred to as a "brochure" cannot get away with being entirely static. Simple interactive features like search, registration forms, and registration-protected content have been table stakes for years. Now, with Web 2.0, your visitors are probably expecting a lot more. Dynamic media such as video and audio content have been shown to have a positive impact on site effectiveness. The ability to effectively manage, reuse, and promote this content is important. Forward looking companies are starting to see their marketing web sites as platforms for delivering and deploying applications. To build interactive tools like configurators or ROI calculators, the presentation tier needs characteristics of a development environment. It needs to have decent tools for writing and managing code, the ability to incorporate code that you didn't write (e.g. JavaScript libraries), and allow developers to use familiar or easy to learn skills. There is a tension between providing flexibility to the developer and enforcing clean presentation code with minimal business logic. Technologies that use JSP leave the developers to police themselves to keep complex business logic out of the display templates. XSLT and scripting engines like Velocity rigidly enforce this separation.
Multi-channel publishing
Whether or not you look to the WCM platform to provide blogging functionality, syndication support (RSS, Atom, RDF) is important now and will only become more important in the future. Producing XML views of content is fairly trivial for most WCM technologies. The ability to read in RSS feeds from external sources and incorporate this syndicated content in the web site is less common. Open source technologies have typically been ahead of their commercial peers in recognizing and participating in syndication. More than ever, companies need to think about the mobile platform. With the popularity of the iPhone and other smart phones, more and more users are getting their information from alternate devices. Robust presentation tiers allow content to be presented in multiple presentation templates that are optimized for the devices that access the content.
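To give a sense of why producing an XML view of content is a small job, here is a minimal sketch that writes a list of content items as an RSS 2.0 feed using the StAX writer that ships with the JDK. The class name `RssFeedSketch` and the `{title, link}` array representation are assumptions made for the example; a WCM platform would normally do this through its own templating layer.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: renders content items as an RSS 2.0 feed with the JDK's StAX writer.
class RssFeedSketch {

    /** Each item is a {title, link} pair; a real CMS would pass richer objects. */
    static String toRss(String siteTitle, String siteLink, String description, String[][] items) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("rss");
            xml.writeAttribute("version", "2.0");
            xml.writeStartElement("channel");
            writeTextElement(xml, "title", siteTitle);
            writeTextElement(xml, "link", siteLink);
            writeTextElement(xml, "description", description);
            for (String[] item : items) {
                xml.writeStartElement("item");
                writeTextElement(xml, "title", item[0]);
                writeTextElement(xml, "link", item[1]);
                xml.writeEndElement(); // item
            }
            xml.writeEndElement(); // channel
            xml.writeEndElement(); // rss
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write feed", e);
        }
        return out.toString();
    }

    private static void writeTextElement(XMLStreamWriter xml, String name, String text)
            throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(text); // the writer escapes markup characters such as & and <
        xml.writeEndElement();
    }

    public static void main(String[] args) {
        String[][] items = {
            {"Widget 2.0 Released", "http://www.example.com/news/widget-2"},
            {"Q&A with the CEO", "http://www.example.com/news/ceo-qa"}
        };
        System.out.println(toRss("Example Corp News", "http://www.example.com/",
                "Latest announcements", items));
    }
}
```

Consuming feeds from external sources is the harder direction, since it involves fetching, parsing, and caching content that the site does not control.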
Product Overview
Table 3.2. Lenya Project Overview
Web site: http://lenya.apache.org
Project Inception: 2002
Current Version: 2.0 since January 2008.
Project Type: Community
Licensing Options: Apache 2.0
Geography: Global with concentration in Switzerland.
Common Uses: Brochure site. Used to be used on news sites.
Sample Customers: The University of Zurich [http://www.unicms.unizh.ch/docu/livepubs.html] has several publications running on Lenya. They are working off of a special branch off of the 1.2 code line. Harvard Medical School Countway Library of Medicine [https://www.countway.harvard.edu/lenya/countway/live/index.html] runs on Lenya version 1.2. Committer Andreas Hartmann's company BeCompany GmbH [http://www.becompany.ch] is running on the 2.0 trunk.
Frameworks and Components: Cocoon, ehcache, Jena, Lucene, Websphinx
WYSIWYG Editor: Kupu, BXE
Integration Standards: XML
Java Support: 1.4, 1.5
Databases: No relational database used. Filesystem/Lucene based repository.
Project History
Apache Lenya was originally developed in 1999 as the brainchild of then-Ph.D. student Michael Wechner to manage content for an academic journal. Later, when working at Swiss-based Neue Zürcher Zeitung (NZZ, one of the world's largest German language newspapers), Wechner proved the viability of the technology for NZZ's publishing and called the application XPS (Extensible Publishing System). In 2002, Wechner founded a systems integration firm called Wyona to implement Wyona CMS, as it came to be called. The vision was to build a "nearly out of the box CMS" on the Cocoon platform. Despite successful implementations, under Wyona's ownership, XPS failed to draw attention and participation from the outside world and develop a community. In 2003, Wyona donated the application as "Lenya" (after Wechner's sons Levi and Vanya) to the Apache Software Foundation, where it was incubated under the Cocoon project. In September 2004, Lenya was promoted to a top level project. Around the same time, Wechner and Wyona were collaborating with open source WCM developers from other projects to form a new organization called OSCOM (Open Source Content Management). OSCOM was envisioned as a forum for sharing ideas about content management and hosting shared projects like the popular WYSIWYG editor Kupu that is also used by Plone and Infrae Silva. OSCOM held some promising events in the U.S. and in Europe
but began to deflate when founding members ran out of time to devote and the organization failed to recruit new active members. Today, OSCOM is little more than a web site. Although the project continues to develop and just published a major release, Lenya's momentum seems to have slowed. Despite being a top level Apache project, Lenya gets little attention. Other projects have been more successful than Lenya when it comes to addressing usability and technical complexity problems. More than one Java open source WCM project was founded out of disappointment with the Lenya platform. Apache Cocoon, on which Lenya is based, has fallen out of fashion as a general purpose web application development framework. The Java technology stack as a whole has been challenged by lighter weight, efficient technologies such as PHP and Ruby on Rails. The Java community has countered with its own frameworks that answered the call for simplicity and efficiency (such as Hibernate, Spring, and Wicket). Cocoon is now relegated to applications that are totally XML focused. In January 2008, the project put out the first major release of the platform (2.0) since late 2004. 2.0 was originally named 1.4 and had correspondingly modest improvements over the 1.2 release. However, delays in completing 1.4 caused it to grow in both complexity and size. By the time it was near completion, the new pending release was a major rewrite of the platform and the team voted to rename it 2.0. Part of the struggle may have been a rough transition of leadership from a Wyona dominated development team to a community effort. As mentioned earlier in this report, change in leadership is one of the biggest challenges in community projects and the Lenya project has had its share. Many of the Lenya developers who were very active in the project have moved on. Wechner himself is developing a new content management framework called Yanel and Yulup, and Wyona is no longer doing new Lenya work. U.S.
systems integrators that specialize in Cocoon have shifted over to other Cocoon-based platforms like Daisy and Hippo. Mailing list traffic is still active but as original leaders transitioned out and new members joined, there were long periods where the tone tended toward frustration and confrontation. Most of this turmoil seems to be behind the project. The team has just rewarded several dedicated members with committer status and release 2.0 is finally live. 2.0 introduced many architectural changes and a few user interface improvements but perhaps the biggest impact is just getting out of the state of limbo that comes with an almost complete new release. It will be interesting to see how fast the team is able to move without that anchor to drag around.
Architecture
The Lenya architecture is pretty much all Cocoon with a simple file system based repository. There were plans to develop a more service based repository on Jakarta Slide (see Glossary for Slide) and then, later, Apache JackRabbit but those integrations ran into complications and were never implemented. There was also some hesitation about committing the effort of following the emerging JCR standard (See Glossary for JCR). There are occasional discussions on the mailing list about whether to reconsider JCR support. While Version 2.0 introduced some API improvements to the file system based repository, the Lenya repository is not at the level of projects like Hippo, Daisy, and Alfresco as a stand-alone service. There is no remote API to read and write content in the repository. Since Lenya is just basically a user interface on top of Cocoon, to effectively implement and scale Lenya one needs to be very familiar with Cocoon. The core concept is the notion of "Pipelines" that describe a sequence of logic that gets executed for a request. Pipelines originate from Cocoon's core purpose: to choreograph the execution of a bunch of XSLTs to allow for the layering of display logic. Pipelines are defined in a file called the "Sitemap." Copyright 2007 Content Here, Inc. All Rights Reserved. Not for Redistribution. Version 1.0, Workgroup License Page 29
Struts developers reading this may be thinking of the struts-config.xml file that performs a similar function. Sitemap pre-dates struts-config.xml and is considerably more elaborate and powerful. The idea behind Sitemap is "to allow non-programmers to create web sites and web applications built from logic components and XML documents" (from the Cocoon Users Guide [http://cocoon.apache.org/2.1/userdocs/concepts/sitemap.html]). However, it is clear that project founder Stefano Mazzocchi over-estimated the technical abilities of non-technical users (and even many developers). Prior to 2.0, Lenya had a sitemap for each "use case" called a "use case sitemap." With 2.0, the process has been simplified by automating the wiring of use cases to business logic. In the new system, the developer just writes Java classes to support the business logic and a JX template (see Glossary for JX Template) to display the results.
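The pipeline idea described above (a request passing through a series of XSLTs that layer display logic) can be approximated in plain Java with the standard javax.xml.transform API. This is only a sketch of the concept, not Cocoon code and not a sitemap: the class name, the two stylesheets, and the "article" and "page" element names are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustrative sketch, not Cocoon code: runs XML through a sequence of XSLT steps. */
final class XsltPipelineSketch {

    // Hypothetical step 1: map the content document to an intermediate "page" format.
    static final String TO_PAGE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/article'>"
      + "<page><heading><xsl:value-of select='title'/></heading></page>"
      + "</xsl:template></xsl:stylesheet>";

    // Hypothetical step 2: lay the intermediate format out as XHTML.
    static final String TO_HTML =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/page'>"
      + "<html><body><h1><xsl:value-of select='heading'/></h1></body></html>"
      + "</xsl:template></xsl:stylesheet>";

    /** Applies each stylesheet in order; the output of one step is the input of the next. */
    static String run(String xml, List<String> stylesheets) {
        TransformerFactory factory = TransformerFactory.newInstance();
        String current = xml;
        try {
            for (String xslt : stylesheets) {
                Transformer step = factory.newTransformer(new StreamSource(new StringReader(xslt)));
                StringWriter out = new StringWriter();
                step.transform(new StreamSource(new StringReader(current)), new StreamResult(out));
                current = out.toString();
            }
        } catch (TransformerException e) {
            throw new IllegalStateException("pipeline step failed", e);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(run("<article><title>Hello</title></article>", List.of(TO_PAGE, TO_HTML)));
    }
}
```

Each step re-parses the text produced by the previous one, which is a simplification; it does, however, make the parsing overhead of layered transforms easy to see.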
The new system for managing behavior is more streamlined than the old system of use case sitemaps. Cocoon supports interactive behavior with a framework called "Flow" (short for "Control Flow"). Like Sitemap and Pipelines, Flow is elegant and sophisticated but has a steep learning curve. Flow supports an advanced concept called "Continuations," where logic execution can pause and wait for more user input. Think of a command line shell script that asks the user a question and waits for the answer. This is distinct from the way most web applications work (where each request/response is stateless and atomic) and has the potential to support more complex interaction between the client and server. Newer Java web application frameworks such as WebWork (now Struts2 [http://struts.apache.org/2.x/]) and RIFE [http://rifers.org/] support Continuations, but Cocoon was doing it "before it was cool." In Cocoon, Flow Scripts are written in JavaScript and interpreted through the Mozilla Rhino [http://www.mozilla.org/rhino] engine. The advantage of this is that scripts can be changed without compiling or restarting the application. The fact that most technologies get along without Continuations supports the opinion that they are not as important as originally envisioned. It will be interesting to see if new AJAX oriented programming models will leverage Continuations, or make them obsolete.
While Flow is powerful and elegant, Cocoon falls short of being an ideal platform for building highly transactional web applications because of its complexity. Cocoon has many moving parts that make it difficult to understand and debug and slow to perform. While XSLT processors are continually getting faster, the inherent overhead that comes from text parsing puts a limit on how fast logic can execute. The Cocoon project has largely addressed performance with clustering and caching strategies and Lenya takes advantage of those. In extreme cases, Lenya has been used to generate static HTML pages that are deployed to basic web servers (the "baking" model) and has also been clustered on multiple nodes reading from the same repository file system over NFS.
Lenya represents a single web site as a "publication" and can host multiple publications on an instance. Deployments like the University of Zurich have used this feature extensively to support the school's many departments. Publications store content, code, and user information (unless the LDAP integration is used). Each publication can have multiple languages. Publications are more or less atomic and cannot share content between them. From the basic install, Lenya comes with a starter site, called the "default publication," and the common practice is to use the default publication as a starting place to build a new site. A frequent newbie mistake is to not change the metadata, thus causing your new web site to have the HTML title tag "Default Publication."
By default, content is stored in a subdirectory of the publication directory called "content." With 2.0, that directory can be anywhere on the file system.
Every page on the site gets its own subdirectory that contains an XML file for each translation of the document (distinguished by a naming convention on the file) and, potentially, sub-directories for sub-pages in the navigation structure. A best practice is to put the content directory somewhere other than within the publication directory. This is a good idea to make sure code and content are stored in separate places and deployed separately. This is also a good idea for back-ups (assuming that you use a source code control system to manage your template code).
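To make the directory layout concrete, here is a minimal sketch of how a page path and language might map to a file under the content directory. The report only says that translations are "distinguished by a naming convention on the file"; the index_<language>.xml pattern, the class name, and the method name below are assumptions made for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative sketch of a one-directory-per-page, one-file-per-translation layout. */
final class ContentPathSketch {

    /**
     * Resolves the file for one translation of a page.
     * pagePath is relative to the content root, e.g. "about/team";
     * sub-pages are simply nested directories.
     * The "index_<language>.xml" file name is a hypothetical convention.
     */
    static Path translationFile(Path contentRoot, String pagePath, String language) {
        return contentRoot.resolve(pagePath).resolve("index_" + language + ".xml");
    }

    public static void main(String[] args) {
        // The content root can live outside the publication directory.
        Path root = Paths.get("/srv/lenya-content");
        System.out.println(translationFile(root, "about/team", "de"));
    }
}
```

Keeping the content root as a parameter reflects the best practice above: code stays in the publication directory under source control while content is backed up and deployed separately.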
Content Contribution
Although not generally known for its usability, Lenya employs some user interface concepts that were rather innovative for their time. By default, Lenya uses a "browse to edit" model. The presentation tier operates in two main modes: an "Authoring" mode where editing controls are placed on the top of the page and the user is able to see pre-published content, and a "Live" mode that does not have the editing controls and where only published content is visible.
The Lenya management interface uses a browse to edit model where the edit controls are placed on the top of the page on the staging view of the site. Notice the yellow text indicating a broken link.
The user browses through the Authoring view of the web site and, upon finding a document to edit, selects the edit menu to access the appropriate editing interface. Lenya ships with a number of editing options, probably too many for a project of its size to support. There are two WYSIWYG HTML editor interfaces: Kupu (also the default WYSIWYG editor of Plone) and BitFlux (also known as BXE). Kupu is by far the more stable of the two but BXE has better integration with the Lenya platform and has more features. From the BXE editor, a user can insert images and links using pop-up dialogs that browse the repository. The BXE editor has a reasonable set of formatting buttons but is missing a spell check feature.
The BitFlux editor is the most feature rich editor that ships with Lenya, but it is also the least stable.
The Kupu integration puts Lenya specific functions (like metadata and links) on the right side of the page outside of the main button bar. This avoids the need for pop-ups but it is a little clunky to use. Kupu is missing two features. First, when inserting a link or image in Lenya's implementation, you need to know the URL path of the target page or image. Second, like the BXE editor, Kupu is missing a spell checker. However, a spell checker should not be difficult to add by following instructions provided by other CMS that use Kupu (notably Plone).
The Kupu WYSIWYG editor in Lenya has integration on the right side.
There are also two non-WYSIWYG editors. The awkward "HTML Forms" editor allows a user to edit a document as a list of HTML elements (like paragraph, table, headline). The One Form editor is just a simple HTML text area where a user can manually code HTML or XML. When the document is saved, Lenya checks that the user's submission is a well formed XHTML document. Alternative editors can also be installed. Some customers have configured Lenya to work with the popular XML based editor Xopus and there is some remnant Xopus integration code in the core codebase. Xopus would probably be the best editor for structured content types although there is a tutorial for setting up BXE to edit structured XML content types.
Lenya has an extensible content model that is defined in "resource types." The primary resource type that comes with Lenya is a simple XHTML page that is stored as an XML file in the file system. While Lenya uses "object" rather than the standard "img" tag for image references, the file is a fairly recognizable HTML file - just without the layout markup that is defined within XSL stylesheets that are processed by the presentation tier. There is also an ODT type that allows a user to upload and download the asset in Open Document format to be edited in Open Office. The asset is then rendered as HTML at request time.
New resource types are implemented as modules and the process for setting them up is somewhat complicated and lengthy when compared with other products described in this report. In addition to defining the content type in XML Schema format (XSD), a developer needs to declare it as a resource type by adding an XML file that extends the core Cocoon configuration. Then XML files need to be edited for menu items and instructions for the WYSIWYG editor. At the end of the process, the content is edited as an XML document, which is somewhat of a stretch for a WYSIWYG editor. There are instructions on how to configure the BXE editor to control structured content types but nothing for Kupu.
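The save-time check mentioned for the One Form editor (that the submission is well formed) can be sketched with the JDK's own XML parser. This is not Lenya's implementation; the class and method names are hypothetical, and the check covers well-formedness only, not validity against an XHTML schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative sketch: rejects markup that an XML parser cannot read. */
final class WellFormedCheckSketch {

    /** Returns true if the markup parses as XML, false if the parser reports a fatal error. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // unclosed tag, bad nesting, stray ampersand, and so on
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("parser could not be set up or read", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<html><body><p>ok</p></body></html>"));
        System.out.println(isWellFormed("<p>unclosed"));
    }
}
```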
The BXE WYSIWYG editor can be configured to edit structured content types.
Lenya also stores Dublin Core metadata inside the file using XML tags within the "dc" namespace. However, the editing interface for metadata is on another tab called "Site." Despite a convenient link to edit metadata from the Edit menu that jumps you over to the "Site" tab, this creates a disjointed user experience. Separating the editing interfaces for these two aspects of a document would make more sense if documents could be placed within multiple locations of the site and have different metadata values depending on their location. But this is not the case.
The Site tab provides an interface to organize the site, edit metadata, and edit permissions.
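As an illustration of metadata stored in the "dc" namespace inside a content file, the sketch below reads a Dublin Core element with a namespace-aware DOM parser. The namespace URI is the standard Dublin Core element set; the surrounding document layout (the "page" and "meta" elements) and the class name are hypothetical and do not describe Lenya's actual file format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Illustrative sketch: looks up Dublin Core elements embedded in a content file. */
final class DublinCoreSketch {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Hypothetical document layout; only the dc namespace itself is standard.
    static final String SAMPLE =
        "<page xmlns:dc='http://purl.org/dc/elements/1.1/'>"
      + "<meta><dc:title>Default Publication</dc:title></meta>"
      + "<body><p>Welcome</p></body></page>";

    /** Returns the text of the first dc:<element> in the document, if present. */
    static Optional<String> dcValue(String xml, String element) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace lookups
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagNameNS(DC_NS, element);
            return matches.getLength() == 0
                ? Optional.empty()
                : Optional.of(matches.item(0).getTextContent());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dcValue(SAMPLE, "title").orElse("(no title)"));
    }
}
```

A check like this could also catch the "Default Publication" title mistake mentioned earlier before a site goes live.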
In addition to editing metadata for content, the Site tab performs a number of other functions to organize and manage the web site. For those who are familiar with MediaSurface's commercial product Morello, the distinction between the Site and Authoring tabs is akin to the "Content Contributor" and "Site Planner" interfaces. Within the Site tab a user can cut/copy and paste a page to another location in the navigational tree. Pages in the same folder can be ordered using "move up" and "move down" menu options. Pages can be removed from the live site by selecting the "hide" or delete menu options. Permissions are also controlled here. Theoretically, a manager would work in the Site tab to manage the site while authors and editors spend most of their time in the Authoring view to edit content. Content is managed in a hierarchical tree structure that parallels the path structure of the site. Documents can only be placed in one location of the node hierarchy and there is no mechanism for creating links or pointers in other locations. Localization support is better than average and follows a model of translated copies. Each asset has one or more language versions. The Site tab shows the user which translations exist and which translations are missing. This is a good system for sites that try to keep the different localized versions of the site in sync. Sites that allow the local versions to be more independent typically use different publications but, as mentioned earlier, there is no sharing of code or content between publications.
The Site tab shows the user which translations of an asset exist and which are missing. Here you can also see the unique identifier implemented in version 2.0.
Lenya has full versioning support. From the Site tab, a user can see the history of versions, view a particular version, and roll back to a previous version. By default, Lenya stores nine previous versions (plus the current version, making 10). Some of the other systems reviewed in this report have a visual differencing feature; Lenya does not. Deleting assets can also only be done through the Site tab.
Prior to 2.0, binary, file-based content, such as images and PDFs, were called "Assets" and were managed solely in the Site tab. Now, assets are treated like documents and can be managed from the Authoring tab. Wrapping binary assets in documents also provides improved metadata support. Images and other binaries can be added from the File menu as a "Media Document" or from within the image browsing dialog launched from the BXE or Kupu editor. Images can also be automatically resized when they are added to a page.
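The retention rule described above (nine previous versions plus the current one) amounts to a bounded history. The following sketch shows one way such a history could behave. It is not Lenya's storage code: the class name is hypothetical, and the choice to implement a roll back as saving the old content as a new version is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;

/** Illustrative sketch of a version history capped at N previous versions. */
final class VersionHistorySketch {
    private final int maxPrevious;
    private final Deque<String> previous = new ArrayDeque<>(); // newest first
    private String current;

    VersionHistorySketch(int maxPrevious, String initialContent) {
        this.maxPrevious = maxPrevious;
        this.current = initialContent;
    }

    /** Every save pushes the old current version into history and drops the oldest if full. */
    void save(String content) {
        previous.addFirst(current);
        if (previous.size() > maxPrevious) {
            previous.removeLast();
        }
        current = content;
    }

    /** stepsBack = 0 is the current version, 1 the one before it, and so on. */
    String versionAt(int stepsBack) {
        if (stepsBack == 0) {
            return current;
        }
        return new ArrayList<>(previous).get(stepsBack - 1);
    }

    /** Assumption: rolling back re-saves the old content, so the roll back itself is versioned. */
    void rollBackTo(int stepsBack) {
        save(versionAt(stepsBack));
    }

    String current() {
        return current;
    }

    int totalVersions() {
        return previous.size() + 1;
    }

    public static void main(String[] args) {
        VersionHistorySketch history = new VersionHistorySketch(9, "v0");
        for (int i = 1; i <= 12; i++) {
            history.save("v" + i);
        }
        System.out.println(history.totalVersions() + " versions kept, current is " + history.current());
    }
}
```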
Now with Lenya 2.0, users can add images as they edit pages from the WYSIWYG editor.
The Lenya workflow system uses a state-transition model and can be configured to most needs. The basic install comes with a simple one step approval workflow. New workflows are defined in XML documents called "workflow schemas" that describe the behavior of a simple state machine. Transitions are triggered by events and can have conditions. Workflows can also have variables that hold values through their execution. Workflows are associated with document types, so it is difficult to configure Lenya to apply a workflow to all content that lives in a particular branch of the site tree or meets other criteria. It may also be hard to implement advanced workflow concepts like parallel processes. Lenya's workflow system has a nice audit trail feature that records and displays all events that happened on a piece of content. Lenya 2.0 introduced personal inboxes so that workflow tasks may be assigned to groups or individuals rather than all users who have reviewer permissions.
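The state-transition model described above (states, event-triggered transitions with conditions, workflow variables, and an audit trail) can be sketched as a small state machine. Lenya defines workflows in XML workflow schemas, which are not reproduced here; the Java class, the state and event names, and the "reviewed" variable below are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative sketch of a state-transition workflow with conditions, variables, and an audit trail. */
final class WorkflowSketch {

    /** A transition fires on an event when the workflow is in "from" and the condition holds. */
    private static final class Transition {
        final String from;
        final String event;
        final String to;
        final Predicate<Map<String, Object>> condition;

        Transition(String from, String event, String to, Predicate<Map<String, Object>> condition) {
            this.from = from;
            this.event = event;
            this.to = to;
            this.condition = condition;
        }
    }

    private final List<Transition> transitions = new ArrayList<>();
    private final Map<String, Object> variables = new HashMap<>();
    private final List<String> auditTrail = new ArrayList<>();
    private String state;

    WorkflowSketch(String initialState) {
        this.state = initialState;
    }

    WorkflowSketch on(String from, String event, String to, Predicate<Map<String, Object>> condition) {
        transitions.add(new Transition(from, event, to, condition));
        return this;
    }

    void setVariable(String name, Object value) {
        variables.put(name, value);
    }

    /** Returns true and records the event if a matching transition was taken. */
    boolean fire(String event, String user) {
        for (Transition t : transitions) {
            if (t.from.equals(state) && t.event.equals(event) && t.condition.test(variables)) {
                auditTrail.add(user + ": " + event + " (" + state + " -> " + t.to + ")");
                state = t.to;
                return true;
            }
        }
        return false;
    }

    String state() {
        return state;
    }

    List<String> auditTrail() {
        return auditTrail;
    }

    /** Hypothetical one step approval workflow: authoring -> review -> live. */
    static WorkflowSketch oneStepApproval() {
        return new WorkflowSketch("authoring")
            .on("authoring", "submit", "review", vars -> true)
            .on("review", "reject", "authoring", vars -> true)
            .on("review", "approve", "live", vars -> Boolean.TRUE.equals(vars.get("reviewed")));
    }

    public static void main(String[] args) {
        WorkflowSketch workflow = oneStepApproval();
        workflow.fire("submit", "alice");
        workflow.setVariable("reviewed", true);
        workflow.fire("approve", "bob");
        workflow.auditTrail().forEach(System.out::println);
    }
}
```

Because each machine instance tracks a single current state, the sketch also hints at why parallel processes are awkward in this kind of model.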
Roles can be inherited down or directly assigned.
Lenya has a basic link management system that displays broken intra-site links in yellow in Authoring view. While it would be more helpful to put in warnings if deleting an asset would lead to broken links, the visual cue is useful when previewing content and doing pre-publish spot checking.
Presentation
Presentation templates are written in XSL and stored on the file system within a publication. To some degree, XSL coding can be avoided with a technique called XHTML templating where the overall page layout is defined in a well formed XHTML document that is merged with the content by the presentation engine. Also, because Lenya document content is well formed XML, CSS is a powerful and convenient method for styling a site.
While it may be more a reflection of the developer community than of the capabilities of the platform, the sites listed on the Lenya live sites gallery would not be considered "high design" sites. The layouts are basic and simple and there are few examples that innovate or stand out from a design perspective. This could be because skinning a Lenya site requires knowledge of XSLT, which most web designers lack. Web designers tend to do better with JSP and other templating languages that are more similar to straight HTML.
More elaborate, dynamic functionality is achieved by writing business logic in Flowscript and display templates in JX Templates. JX Templates is an XML based templating language that has replaced XSP as the official templating language for the Cocoon community. JX Templates supports JSP-like tag libraries to call Java code. As mentioned earlier, Lenya's architecture makes it a less than ideal platform for building highly dynamic, transactional web applications. Developers would be far more productive programming interactive functionality in a platform other than Cocoon, whose strength is in managing layouts and multi-channel publishing. Lenya would be a poor choice for building community, Web 2.0-style applications. However, the basic install does have an example of a blogging publication.
Conclusion
Table 3.3. Lenya 2.0 Summary
Columns: Category; Score; Explanation. Score key: Nonexistent; Below Average; Average; Above Average; Exceptional. [Score symbols not preserved.]
Contributor Navigation: Intuitive browse to edit model but division between the Author tab and the Site tab is awkward.
Content Entry: The more feature rich and better integrated BXE editor is not very stable. A spell checker would be helpful. Input validation is weak.
Link Management: Lenya's link management functionality visually highlights broken links but there are no warnings if deleting a page will create a broken link and there is no broken links report.
Versioning: Full versioning support with a new version created with every save. By default Lenya saves up to 10 versions of each document. Users can roll back to earlier versions but there is no visual differencing functionality.
[category label missing]: Content is stored in a hierarchical tree structure. Building dynamic query based pages is more complicated than in other systems in this category.
[category label missing]: Localized versions of an asset are managed in parallel. The Site tab shows which translations exist and which translations are missing.
Content Integration: While connecting to external data sources is possible within Cocoon, it is not as easy as on other platforms. The Lenya repository has no remote API to access content.
Workflow: A basic approval workflow comes out-of-the-box. More complex workflows are supported by Lenya's workflow engine.
Layout and Branding: Theming a Lenya site requires knowledge of XSL.
Interactivity: Cocoon is a complex technology platform for building interactive applications. Performance may be an issue.
SEO: XML orientation tends to output clean XHTML. User friendly, human readable URLs are supported.
Books: None.
Online Documentation: Sparse, particularly with the recent release of 2.0.
User Forums: Active mailing list.
While not a widely used or easy to learn platform, Lenya has decent support for some of the classical content management features such as versioning, localization, workflow, and link management. The areas where Lenya lags behind the market are structured content management and reuse. Both are possible on the platform but implementing these features requires complex configuration compared to other products. Now that the long awaited 2.0 release is official, the Lenya team may be able to focus on some of the finer usability issues that have turned prospective adopters away from the platform. Organizational and leadership issues that have historically plagued the project have also largely been addressed and the mailing list is friendlier and more upbeat.
Daisy 2.1
Abstract
Daisy's simple, wiki-inspired editorial interface - combined with powerful through-the-web administration and configuration functionality - makes it a good choice for rapidly building and maintaining simple informational web sites, intranets, and knowledge bases. Workflow, access control, and structured content types take Daisy into applications that are beyond the capabilities of a traditional wiki, and the de-coupled repository creates new opportunities for integration. Daisy provides a powerful faceted navigation system to make content easier to find and organize.
Daisy's lack of support for clustering and for separate management and production environments may concern architects building high availability, high security web sites. Very large web sites will strain Daisy's search subsystem. However, the de-coupled repository can be used for building enterprise grade publishing systems with a separate delivery tier.
Architects considering using the stand-alone Daisy Repository to deliver persistence services for a custom application should also consider Alfresco, JackRabbit, and pure XML databases. Using the Daisy Repository in this way will constrain your design more than a generic repository would, but the higher level API and additional features (workflow, user management, and plugin framework) may save design and implementation time.
Project Overview
Table 3.4. Daisy Project Overview
Web site: http://cocoondev.org/daisy
Project Inception: 2003
Current Version: 2.1 since September 2007. 2.2 due out in February 2008.
Project Type: Commercial: support based.
Licensing Options: Apache 2.0
Geography: Outerthought is headquartered in Belgium. The install base is global with a concentration in Northern Europe. At least one North American based systems integrator has started to build a Daisy practice.
[label missing]: Brochure, intranet, knowledge base, documentation site
[label missing]: Redback [http://www.redback.com] uses Daisy for a call center knowledge base. QAD [http://www.qad.com] uses Daisy to deliver a live customized product documentation service. Vlerick Leuven Gent Management School [http://www.vlerick.be/en/] uses Daisy as a back end publishing system.
Frameworks and Components: ActiveMQ, Apache Cocoon, Apache Lucene, JBPM, Java Advanced Imaging (JAI), MX4J, Spring
WYSIWYG Editor: htmlArea
Integration Standards: REST, JMS, XML
Java Support: 1.4, 1.5
Application Servers: Jetty (default), Tomcat (also commonly used)
Databases: MySQL
History
Daisy CMS was first released as an open source project (Apache 2.0 license) in October of 2004 after 18 months of development and customer implementations by Belgian-based Outerthought bvba [http://outerthought.org]. Outerthought still manages the development of the platform and has built a business around support and services. Given Outerthought's small size (only five full time employees), the maturity and install base of Daisy is greater than you would expect. This is largely due to Outerthought's relationship with Schaubroeck [www.schaubroeck.be], a large Belgian e-government services company, and some key U.S. based customers. The last pre-open source version of Daisy (V0.9) was built primarily with investment and cooperation from Schaubroeck. Today the copyright on the Daisy code is shared between Outerthought and Schaubroeck. Daisy has been used primarily for intranets and internal knowledge bases but the platform is increasingly being considered for and used in externally facing informational web sites.
In many ways, Daisy has become the user friendly, easy to implement, Cocoon based WCM platform that Apache Lenya [http://lenya.apache.org] has always wanted to be. Some attribute the different trajectories to the fact that Lenya is managed as an Apache project with an Apache governance model that may be more appropriate for infrastructure components and frameworks than for user facing business applications. A scan of the Apache project portfolio would certainly support this theory. Only a few out of the many Apache projects are targeted as business applications: Lenya, Jetspeed 1 and 2, and (recently) the Roller weblog platform. Daisy is internally developed by a dedicated, user oriented team. The Daisy core team may have its leadership spats and personality conflicts, but they don't happen out in the open like they do in the Apache Lenya community. Daisy also has been able to keep to a regular release schedule, which Lenya has been unable to do.
There are many Daisy-powered web sites online today in Belgium. Most of the sites have been built by Outerthought and Schaubroeck (Schaubroeck gallery [http://www.schaubroeck.be/internet/default.htm]; Daisy gallery [http://cocoondev.org/wiki/286-cd.html]). Daisy is less widely used in North America but there are a few small non-profits using Daisy for their public facing web sites. Examples include: The Samueli Institute [http://www.siib.org], Provider's Council [http://www.providers.org], and The Minnesota Newspaper Foundation [http://www.minnesotanewspaperfoundation.org/mnf/index.html]. There are a few North American software companies using Daisy for call center knowledge bases and to produce and maintain their product documentation. One of the leading U.S. based Cocoon specialists has switched from Lenya to Daisy as their go-to platform for basic web sites.
Architecture
Daisy consists of two main components: the stand-alone Daisy Repository server that has an HTTP/XML interface, and a wiki-style front end (based on Apache Cocoon) called the "Daisy Wiki". Starting with release 2.1, the repository server runs in a custom Java container called the Daisy Runtime that is based on the Spring Framework [http://www.springframework.org/]. By default, Daisy Wiki runs on Jetty [http://jetty.mortbay.com/] although customers frequently run it on Apache Tomcat.
Daisy consists of a de-coupled business application (the Daisy Wiki) and a stand-alone repository server that is accessible through an HTTP based API and extended with plugins. The technology neutral API creates the opportunity to integrate with other technologies. Diagram courtesy of Daisy's documentation site.
Daisy Wiki talks to the Daisy Repository server over HTTP, through a publisher component that handles content persistence, search, and retrieval operations. The publisher is implemented as an extension within the Daisy Repository plugin architecture. At the simplest level, the publisher returns an XML document containing all the information necessary for the wiki (or any other client) to render a page. The publisher can also perform other operations like building collections of assets or preparing a difference view between two versions of an asset. A publisher request takes the form of an XML document, sent to the HTTP interface, that contains information about the request. Some requests, such as a simple query, do not require sending an XML document. All the arguments can be passed through query string parameters. Every request needs to send authentication credentials via basic authentication, but there is no support for HTTPS so care should be taken in network setup. The API is powerful and there is documentation with simple examples but expect to spend some time mastering it. A good way to experiment is to use a tool like GNU WGet [http://www.gnu.org/software/wget/] to post documents to the HTTP server.
This architecture would allow other applications, such as a custom CMS written in any technology (not just Java), to use the Daisy Repository. However, it appears that most implementations pair Daisy Repository with the Daisy Wiki front end. This is a primary reason for Daisy's categorization in this report as an informational brochure oriented system. Architects looking for a pure content repository may consider a standard like the Java Content Repository (see Glossary for JCR). Unlike the JCR specification, which is intentionally abstract (consisting of a hierarchical set of "nodes"), the Daisy Repository has a higher level, more specific API based on documents, users, and collections.
Daisy's Repository, like most wikis, is non-hierarchical. On the one hand, the flat structure rules out options like inheriting access control structures or other metadata down a branch of the tree. On the other hand, the non-hierarchical model enables faceted navigation concepts where assets can appear under more than one category (see the Enter Content Here blog post There Is No Folder [http://contenthere.blogspot.com/2006/05/there-is-no-folder.html] for a deeper discussion of this trade-off). The repository stores metadata in a relational database (MySQL or PostgreSQL) and the actual content assets as files in the file system. There is also a Java API for local integrations with the Daisy Repository.
The Daisy Repository supports a plugin framework for extending the Repository with custom functionality. Commonly developed plugins include generic Extensions (which is how much of the non-core functionality, such as email notification, is implemented), authentication schemes, link extractors (that keep track of relationships between pages for a "what links where" view of the site), and HTTP handlers (to extend the API). To build a plugin, you implement the plugin interface, package it in a JAR, and deploy it to the Java runtime container of the Daisy Repository server.
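Returning to the HTTP interface described earlier in this section, the sketch below shows the two pieces any client would need for a simple query: a URL with the arguments in the query string and a basic authentication header. The base URL, the "/query" path, the "q" parameter, and the class name are placeholders invented for illustration and are not taken from the Daisy documentation; only the Basic header format itself is standard.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative sketch of a query-string request with HTTP basic authentication. */
final class RepositoryQuerySketch {

    /** Builds a query URL; the "/query" path and "q" parameter are hypothetical. */
    static URI queryUri(String baseUrl, String query) {
        return URI.create(baseUrl + "/query?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    /** Standard Basic scheme: base64 of "user:password". Not encrypted, so keep it off untrusted networks. */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder host and credentials.
        System.out.println(queryUri("http://repository.example:8080/api", "select name"));
        System.out.println("Authorization: " + basicAuthHeader("user", "pass"));
    }
}
```

Because basic authentication only encodes the credentials and the report notes there is no HTTPS support, a client like this belongs on a trusted network segment.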
The Daisy Repository server is based on a plugin framework. Diagram courtesy of the Daisy documentation site.
Daisy comes with a JMS server (ActiveMQ) built in to handle communications between the various components. For example, every time there is a change made to the repository, a message is posted to a message queue to tell client applications (such as Daisy Wiki) to invalidate their local caches. Architects generally like this messaging design because it is asynchronous and scalable, yet still ensures delivery of the message.
Whenever a published asset is updated, the Lucene based full text indexer is notified. Content that is not in a live state is not indexed. This could be a problem for managing pre-published assets. The search system does some post processing of the Lucene results such as sorting and filtering by permissions. The result limit is also applied outside of Lucene. If your Daisy Repository has 1 million documents and you search for everything but limit the results to 10, Daisy still parses through 1 million asset references coming out of Lucene. While only the metadata ("fields") are loaded into memory, the query system is the part of Daisy that tends to struggle when content repositories get overly large. Up to a certain point, this can be handled with configuration. However, when you get into the millions of documents, it gets difficult to allocate enough memory for the system to operate properly. Essentially, you have to give the Daisy Repository server enough memory to hold all of the metadata of the documents in the repository.
For simple bulk operations on documents within the Daisy Repository (like automatically updating a group of documents, similar to a SQL update statement), Daisy comes with a Document Task Manager. Simple, built-in actions like categorizing assets can be called from the Daisy Wiki. More elaborate operations can be scripted in JavaScript. The Document Task Manager performs its updates on one document at a time and can be interrupted and resumed. A success/failure log is viewable from within the Daisy Wiki.
Daisy's ability to support high traffic web sites has not been tested. Cocoon is a resource hungry framework because of all the XML transformation it does. For the most part, Cocoon has addressed performance through caching techniques and by minimizing the number of XSL transforms in the rendering pipeline. There is no documentation for configuring clustered environments with multiple Daisy Wiki servers talking to the same Daisy Repository. However, the architecture looks like it would support this model given the use of JMS to notify clients of content changes and also the stateless communication between the Repository server and its clients. Since Daisy is predominantly used in smaller intranets and corporate web sites, these configurations have not been actively tested.
While the Daisy Wiki is based on Cocoon, you can still get a lot done without much Cocoon experience. However, you should be comfortable using Cocoon if you want to do anything substantial. This is great for people who love working with Cocoon (and the framework certainly has its appeal for those willing to learn it) and want to rapidly develop web sites without writing a lot of custom code. For most Java developers who are more comfortable with less complex frameworks, getting under the hood can be intimidating.
The Daisy team is aware of this hurdle and is open to moving the UI to another technology stack. That is one of the advantages of having such a clean separation between the repository layer and the user interface layer. You could write a front end application on any technology stack.
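The Document Task Manager behavior described above (one document at a time, interruptible, resumable, with a success/failure log) can be pictured with a small sketch. This is not Daisy's API: the class name `ResumableDocumentTask`, its methods, and the document ids are all hypothetical and exist only to illustrate the processing model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch (not Daisy's API): a bulk task that handles one
// document at a time, can be interrupted, and resumes where it stopped.
class ResumableDocumentTask {
    private final List<String> documentIds;
    private final Consumer<String> action;
    private final List<String> log = new ArrayList<>();
    private int nextIndex = 0;
    private volatile boolean interrupted = false;

    ResumableDocumentTask(List<String> documentIds, Consumer<String> action) {
        this.documentIds = new ArrayList<>(documentIds);
        this.action = action;
    }

    // Ask the task to stop after the document currently being processed.
    void interrupt() {
        interrupted = true;
    }

    // Processes remaining documents until done or interrupted.
    // Returns true once every document has been attempted.
    boolean run() {
        interrupted = false;
        while (nextIndex < documentIds.size() && !interrupted) {
            String id = documentIds.get(nextIndex);
            try {
                action.accept(id);
                log.add(id + ": success");
            } catch (RuntimeException e) {
                // A failure is logged and the task moves on to the next document.
                log.add(id + ": failure (" + e.getMessage() + ")");
            }
            nextIndex++;
        }
        return nextIndex >= documentIds.size();
    }

    List<String> getLog() {
        return log;
    }

    public static void main(String[] args) {
        ResumableDocumentTask task = new ResumableDocumentTask(
                Arrays.asList("doc-1", "doc-2", "doc-3"),
                id -> {
                    if (id.equals("doc-2")) {
                        throw new IllegalStateException("locked");
                    }
                });
        task.run();
        for (String line : task.getLog()) {
            System.out.println(line);
        }
    }
}
```

Keeping the position (`nextIndex`) separate from the log is what makes a second call to `run()` pick up where the first one stopped.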
Content Contribution
Daisy has an extensible content model and you can define new content types through the administrative user interface. The model is based on "Parts" and "Fields." Parts are the actual content (such as an XML or other text file, or a binary file such as a PDF, image, or MS Word document). Fields are metadata attributes and are based on common Java data types: String, date, datetime, long, double, BigDecimal, boolean. The base content class is called a "Daisy Document" that can be sub-classed into custom content types. The most popular content type that comes out-of-the-box is an "XHTML" document that is essentially a generic web page.

A document type is defined with part types, field types, and links. Both part types and field types are re-usable across different document types. For example, you can set up a title field type and use it in an Article or Page document type. A Daisy Document can have more than one part but cannot have multiple instances of the same part. For example, you couldn't have a collection of image parts in a document. Instead, you would define an image1 part, an image2 part, and so on.

Unfortunately, while Daisy does ship with a WYSIWYG rich text editor (htmlArea), there is no XML editor that would allow you to edit a structured XML document part. Adding this capability would involve creating a Cocoon form and registering it as an extension. While the process is documented [http://cocoondev.org/235-cd.html], this is more Cocoon programming than most would want to tackle if they were just trying to build a simple web site. A more practical approach may be to edit the XML document in a client side XML editor and then upload the file.

Fields can be multi-value and hierarchical. The input field can either be a free text field or a select list where the possible values are either statically defined or populated by a query.
Unless you give the user a controlled set of values to pick from, you will have very little control over what they enter. There is no input validation built into Daisy Wiki. You can say that a field is required and you can set a sizing hint (which controls the size of the input box), but you can't enforce a rule like a character limit or only numeric characters.
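To make the gap concrete, the following sketch shows the kind of rule (character limit, numeric-only) that the text says Daisy Wiki does not enforce. It is a stand-alone illustration, not a Daisy extension point: `FieldRule`, `FieldValidator`, and the field names are hypothetical, and where such a check would be hooked in is left open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rule for one field; not part of Daisy.
class FieldRule {
    final String fieldName;
    final boolean required;
    final int maxLength;       // 0 means "no limit"
    final boolean numericOnly;

    FieldRule(String fieldName, boolean required, int maxLength, boolean numericOnly) {
        this.fieldName = fieldName;
        this.required = required;
        this.maxLength = maxLength;
        this.numericOnly = numericOnly;
    }
}

class FieldValidator {
    // Returns a list of human-readable problems; an empty list means valid.
    static List<String> validate(List<FieldRule> rules, Map<String, String> values) {
        List<String> errors = new ArrayList<>();
        for (FieldRule rule : rules) {
            String value = values.get(rule.fieldName);
            if (value == null || value.isEmpty()) {
                if (rule.required) {
                    errors.add(rule.fieldName + " is required");
                }
                continue; // nothing else to check on an empty value
            }
            if (rule.maxLength > 0 && value.length() > rule.maxLength) {
                errors.add(rule.fieldName + " exceeds " + rule.maxLength + " characters");
            }
            if (rule.numericOnly && !value.matches("\\d+")) {
                errors.add(rule.fieldName + " must contain only digits");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<FieldRule> rules = new ArrayList<>();
        rules.add(new FieldRule("title", true, 10, false));
        rules.add(new FieldRule("pageCount", false, 0, true));

        Map<String, String> values = new HashMap<>();
        values.put("title", "A title that is too long");
        values.put("pageCount", "12a");

        for (String error : validate(rules, values)) {
            System.out.println(error);
        }
    }
}
```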
Figure: Defining fields through the administrative user interface.

In addition to fields and parts, Daisy Documents can also have links. Links are more structured than simple anchor links edited through the WYSIWYG editor, and the Daisy Repository manages these dependencies.

All Daisy Documents have one or more variants that can be the same or different. Variants are managed much like branches in a source tree. A common usage for variants is localization, but variants can also be used for content reuse. For example, on the Daisy web site, the product documentation for different releases of the application is managed as variants. Variants go along two axes: branches and languages. Branches are similar to what you would see in a source code control system although there is no merging feature. By default, there is one branch (main) and one language (default) per document. Searching for non-existing variants is a useful way to find content that has not been translated. The URLs have the branch in the path. A little bit of Apache mod_rewrite could turn www.cocoondev.org/fr/about into www.cocoondev.fr. When viewing a document, you can view other variants of the asset using the Variants menu. Doing so adds some query string arguments so you can see the alternative version of the file without losing your current site context (branch/language).
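The idea of searching for non-existing variants can be sketched as a lookup over (document, branch, language) triples. This is an in-memory illustration only; `VariantIndex` and its methods are hypothetical names and say nothing about how the Daisy Repository stores or queries variants.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of variants along two axes: branch and language.
class VariantIndex {
    private final Set<String> variants = new HashSet<>();
    private final Set<String> documentIds = new TreeSet<>();

    void addVariant(String documentId, String branch, String language) {
        documentIds.add(documentId);
        variants.add(key(documentId, branch, language));
    }

    // Lists documents that have no variant for the given branch and language,
    // for example to find content that has not been translated yet.
    List<String> missingVariants(String branch, String language) {
        List<String> missing = new ArrayList<>();
        for (String id : documentIds) {
            if (!variants.contains(key(id, branch, language))) {
                missing.add(id);
            }
        }
        return missing;
    }

    private static String key(String documentId, String branch, String language) {
        return documentId + "|" + branch + "|" + language;
    }

    public static void main(String[] args) {
        VariantIndex index = new VariantIndex();
        index.addVariant("101", "main", "default");
        index.addVariant("101", "main", "fr");
        index.addVariant("102", "main", "default");
        // Document 102 has no French variant on the main branch.
        System.out.println(index.missingVariants("main", "fr"));
    }
}
```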
While the groundwork for localization is there with variants, the Daisy team is hard at work to add features that help editors keep multiple translations in sync. For example, there will be a new feature to add non-translatable content elements that are shared between language variants. There will also be a mechanism to map versions between variants in the repository so that, for example, the system knows that version 3 in French maps to version 5 in English.

Documents are managed within "Collections." The most common use of a collection is to define a group of assets that are used in a particular site. Collections can also be used to organize content for other uses such as Daisy's book feature. Documents can be part of more than one collection. Similar to collections are "Baskets." Each user has a basket that can be populated either individually by selecting documents like a shopping cart, or by query. Once a basket has been filled with the desired assets, the user can execute group operations such as aggregating the assets and displaying them in PDF or an HTML sub-site. This is especially useful in Daisy's common use for documentation sites and knowledge bases.

The Daisy Wiki serves as both the management interface and the presentation tier. The label "wiki" is a holdover from earlier days and perhaps does the application a disservice because it does not conform to most people's vision of a wiki. Daisy uses XHTML and a WYSIWYG editor for rich text areas, not wiki syntax. Daisy's extensible content model enables structured content types to be created through the administrative section of the user interface. Forms are automatically generated based on the document type definition.
Figure: The actions menu shows which actions are available on a piece of content.
Still, many companies use Daisy as a traditional wiki due to its many wiki-like features. Like a wiki, Daisy has an in-context editing model. You browse around the site and then use the actions menu (if you are logged in) to do things like edit the document and review versions. There are other wiki-esque features; for example, you can create a new document by linking to it from another document. Unlike a wiki, you don't have to worry about CamelCase.

The WYSIWYG editor is based on htmlArea, which is stable but is no longer being developed (current development is being done on a derived project called Xinha [http://xinha.webfactional.com/]). The WYSIWYG editor will appear for all parts of type "Daisy HTML." Being an XML-focused WCM system, Daisy ensures that the user authored HTML is clean and valid XHTML. There is an allowed subset of tags (html, body, br, pre, h1-h5, a, strong, em, sup, sub, tt, del, ul, ol, li, blockquote, img, and table and its sub tags). Text styling and image positioning are done with CSS classes. The out-of-the-box configuration includes lots of buttons for tables, links, images, bullets, etc. The editor can be configured by adding and removing buttons. The semi-HTML-literate contributor will at first be encouraged to see that you can turn the WYSIWYG editor off to hack in their own HTML but then be disappointed to learn that Daisy will strip out their dubious HTML code on the server side when the document is saved. In all, the formatting capabilities strike an appropriate balance of user empowerment, content-layout separation, security, and WWW standards compliance.

The rich text editor is nicely integrated with a link browser that allows users to search for their link target and preview what the page looks like. There is also the option to link to a specific version of the target document or a fragment within the target page.
Links built in this way are stored in the metadata of the document and visible in the referrer view of the document. While the editor doesn't have a spell checker, there are a number of other features that one normally does not see. For example, there is a query button that allows a user to define a SQL-like query and have the results embedded in the page. There is also the ability to include variables and other assets by reference. While these functions are powerful, they would be of most use to more technical users as the GUI is fairly low level.
Figure: The WYSIWYG editor is integrated with a nice link builder dialog that allows users to search for and preview target Daisy pages.

The community thinks of Daisy as both a wiki and a traditional web site building platform. They sometimes bristle at not being categorized with other WCM platforms. Other times they do things like submit the platform to WikiMatrix [http://www.wikimatrix.org/].

Out of the box, the Daisy Wiki operates in different modes. Live and Staging modes determine whether to show pre-published content. A user who has multiple roles (such as guest, author, and administrator) can select what role he wants to use when viewing the site. For example, selecting the administrative role exposes an administrative menu that surfaces functionality like defining content types and permissions and making an unpublished asset live.

Daisy does not store content in a hierarchical repository as is the case with most CMS. Like Drupal [http://drupal.org], Daisy stores all the content in one large collection and then uses metadata to create faceted navigation. There is also a navigational element that can be used to create a basic hierarchical tree of content: the "navigation" content asset (called a Navigation doc) contains static references to assets and queries.
Figure: Editing a Navigation document.

When deleting a document, the user has a choice of "archiving," deleting the variant, or deleting all variants. On the bottom of the page there is a listing of all pages that link to the document. However, there is no pop-up warning when a link is about to be severed, and the broken link then returns the Daisy equivalent of a 404 error. The fact that the delete operation is reserved for the "Administrator" role reduces the risk of breaking links.

Images and other binaries are managed like other documents in Daisy: the image file is the "part" and "fields" represent the metadata. Out of the box, Daisy supports some basic image manipulation features, such as resizing, and allows you to add captions and specify positioning within a WYSIWYG text area.
Figure: The WYSIWYG editor has an image properties dialog that controls the size and position of the image.

True to its wiki roots, Daisy has always had a good versioning system complete with in-line differencing. New with version 2.1 is a nice graphical diff'ing feature that shows a color coded view of changes between two versions of a document.
Figure: New with 2.1 is a WYSIWYG version differencing feature.

Daisy 2.0 introduced a new workflow system based on jBPM built into the repository. Three workflows ship with the product: Generic Task (a delegated to-do), Review (simple one-step approval), and Timed Publication. Adding new workflows is done by uploading XML workflow process definition files that can be authored using the Eclipse Process Designer plugin [http://easyeclipse.org/site/plugins/jboss-jbpm.html].

The process for initiating a workflow is a little disjointed from the document editing process. A user creates or edits a document and then creates a new workflow and adds the document to it. However, a basic approval mechanism of "Put this live" is available when a user with publishing access views an asset that is not yet published (or has an updated version that is not yet published). There is a search interface to find assets that are in workflow and in a particular state or assigned to a user.
and groups. Daisy's user profiles are lean: username, full name, password, and email address. Members can be assigned multiple roles. The default role is the role that they assume when they log into the site. Daisy's authentication system allows the use of multiple "Authentication Schemes," which can be set up to authenticate against an external system such as LDAP. However, each user can only be associated with a single authentication scheme.

Daisy has a rules-based access control system that is managed much like a firewall. The ACL interface allows an administrator to create a rule with a text-based condition like "documentType = 'SimpleDocument'" and to set read, write, delete, and publish permissions for individuals or groups on documents meeting those criteria. Conditions can be based on membership in a collection, a document type, or the value of a content attribute (or "field" in Daisy terminology). Rules are defined in the staging site and then published to the live site.
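A firewall-style rule list can be sketched as an ordered set of condition-plus-permission entries. The sketch below assumes that the first matching rule for a role decides the outcome and that nothing is allowed by default; both are assumptions made for illustration, since the text does not say how Daisy orders or combines rules. All class names here (`AccessControlList`, `AclRule`, `AclDocument`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

enum Permission { READ, WRITE, DELETE, PUBLISH }

// Minimal stand-in for a document; only the attributes used in conditions.
class AclDocument {
    final String documentType;
    final String collection;

    AclDocument(String documentType, String collection) {
        this.documentType = documentType;
        this.collection = collection;
    }
}

class AclRule {
    final Predicate<AclDocument> condition;
    final String role;
    final Set<Permission> granted;

    AclRule(Predicate<AclDocument> condition, String role, Set<Permission> granted) {
        this.condition = condition;
        this.role = role;
        this.granted = granted;
    }
}

class AccessControlList {
    private final List<AclRule> rules = new ArrayList<>();

    AccessControlList addRule(Predicate<AclDocument> condition, String role, Set<Permission> granted) {
        rules.add(new AclRule(condition, role, granted));
        return this;
    }

    // ASSUMPTION: first matching rule wins, default deny (firewall-like).
    boolean isAllowed(String role, AclDocument doc, Permission permission) {
        for (AclRule rule : rules) {
            if (rule.role.equals(role) && rule.condition.test(doc)) {
                return rule.granted.contains(permission);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AccessControlList acl = new AccessControlList()
                // Comparable to the condition documentType = 'SimpleDocument'
                .addRule(d -> d.documentType.equals("SimpleDocument"), "author",
                        EnumSet.of(Permission.READ, Permission.WRITE))
                .addRule(d -> true, "author", EnumSet.of(Permission.READ));

        AclDocument page = new AclDocument("SimpleDocument", "intranet");
        System.out.println(acl.isAllowed("author", page, Permission.WRITE));
        System.out.println(acl.isAllowed("author", page, Permission.PUBLISH));
    }
}
```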
Figure: Authorization rules are managed by defining content filters and then assigning read, write, delete, and publish permissions.

There is some work being done to implement a new "fine grained" access control model, though it is not planned for an immediate release. One of the new use cases enabled by fine grained access control is the ability to see that documents exist without having read access to them. The fine-grained access control will be taken to the field level. "Parts" can be set to allow or deny full-text indexing and different access levels on the summaries or full text.
Presentation
The Daisy Wiki can support multiple sites on the same infrastructure and repository. A site is implemented as a view on the Daisy Repository. Each site has a default collection that is used for the on-board full text search box and its own navigation. Although Daisy has good support for CSS, there are few best practices or tools for styling a Daisy site. As of now, skins are written in a combination of XSLT and CSS. Documentation is
a little thin in this area but the systems integrators that specialize in the platform have become very proficient at applying any brand to a Daisy site. But there is a risk in relying on a systems integrator for every re-branding request. There was an interesting discussion on the mailing list about building a "skinning framework" for Daisy. One of the ideas bandied about was a gallery of downloadable themes that can be used as a starting place for building a custom theme. Joomla and Drupal have been very successful with these types of programs. It remains to be seen what becomes of this initiative.

Although it is not enabled in the out-of-the-box install, a very powerful faceted navigation system can be configured. While not as powerful as Endeca, Daisy's faceted navigation delivers a similar experience. A user can search for a term and then see a categorized list of results complete with asset counts.
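The "categorized list of results complete with asset counts" is, at its core, a count of documents per metadata value. The sketch below shows that calculation on plain maps; it is not how Daisy implements faceted navigation, and `FacetCounter`, the field names, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each search hit is modeled as a map of field name to value.
class FacetCounter {
    // Counts how many hits carry each value of the given field.
    // Hits without the field are skipped. Keys come back sorted.
    static Map<String, Integer> countByField(List<Map<String, String>> hits, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> hit : hits) {
            String value = hit.get(field);
            if (value == null) {
                continue;
            }
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    private static Map<String, String> hit(String type, String language) {
        Map<String, String> fields = new HashMap<>();
        fields.put("documentType", type);
        fields.put("language", language);
        return fields;
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = new ArrayList<>();
        hits.add(hit("Article", "en"));
        hits.add(hit("Article", "fr"));
        hits.add(hit("Image", "en"));

        // One facet per metadata field, each value shown with its count.
        System.out.println(countByField(hits, "documentType"));
        System.out.println(countByField(hits, "language"));
    }
}
```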
Figure: Daisy's faceted navigation provides a powerful interface for browsing the nonhierarchical content repository.

The basic Daisy install comes with simple commenting functionality for all document types. The default configuration allows any authenticated user with read access to a document to comment on it. The user can set the visibility of his comment to everyone, editors only, or private. If a user has write access to a document, he can delete comments.

Daisy's on-board search engine supports full text search on both text based content types and popular binary formats such as Microsoft Word. However, the search results are not filtered by access control. The user is denied access when he tries to click through.

One increasingly popular use of Daisy is for producing documentation. A built-in book application can publish a collection of content as static HTML, PDF, and other formats. Some members of the DITA community are starting to show interest in Daisy for this reason.

Being based on Cocoon and primarily designed for editing and displaying information-rich content, Daisy is not well suited for building interactive applications. A Cocoon expert could
probably build any sort of functionality (after all, Daisy itself is built on Cocoon) but most Java programmers would prefer a simpler framework. Furthermore, unlike Lenya, Daisy does not have an optimized use case framework.
Conclusion
Table 3.5. Daisy 2.1 Summary
Contributor Navigation: The wiki oriented user interface is easily understandable for most users. Some users will take a little while to grasp the concept of "variants," but that aspect of the system can be suitably de-emphasized for casual users that do not need that functionality.

Content Entry: The model of "parts" and "fields" is somewhat specialized, but still works well for simple XHTML content types. The WYSIWYG editor is well integrated but is missing a spell check feature. User input validation on both fields and parts is noticeably missing.

Link Management: Similar to many wikis, Daisy has a nice link management feature that shows which pages link to an asset. This referrer view is shown on the delete asset page, but the user is not warned before a link is broken.

Versioning: Daisy stores a version with every save. The new WYSIWYG differencing feature nicely displays differences between versions. Daisy is one of the few CMS that can link to a version of a document.

Daisy improves on the standard wiki model with features like "includes" and "queries" built into the WYSIWYG editor. The faceted navigation is a powerful way to organize the repository.

"Variants" are well suited for parallel localization strategies. Better relationship management between language variants is coming with the next release.

Content Integration: The Cocoon based wiki delivery tier is not easy to extend to read from other sources. Content from within the Daisy repository, however, would be easy to integrate into other platforms using the REST based API.

A basic approval workflow comes out-of-the-box. Complex workflows are supported by the jBPM workflow engine.

Theming a Daisy site requires knowledge of XSL.

Cocoon is a complex technology platform for building interactive applications.

Performance may be an issue.

The faceted navigation makes all content accessible to search engines. URLs are not user friendly, but the flat path structure may be favored by some search engines.

None

The online documentation is actually pretty good. This may be because the Daisy platform that powers the site is often used for documentation.

The mailing list is fairly active, but most customers go directly to Daisy for support.

Score scale: Below Average; Average; Above Average; Exceptional.
Daisy's wiki approach to content management is unconventional but brings with it powerful benefits. Common limitations of the wiki model such as access control and organization have been solved by Daisy's user interface. While Daisy is a powerful platform for building informational web sites and knowledge resources, it is not an ideal platform for building highly interactive applications, primarily because its Cocoon architecture has a steeper learning curve than competing Java frameworks. Daisy's reliance on a small regional software vendor probably does introduce some risk, which may be mitigated by working with a systems integrator that has a long track record on the product. For North American customers, there are a couple of systems integrators to choose from.
Project Overview
Table 3.6. Magnolia Project Overview
Web site: http://www.magnolia.info
Project Inception: 2003
Current Version: 3.5 since December 2007.
Project Type: Commercial: tiered product model.
Licensing Options: Community Edition: GPL. Enterprise Edition: Magnolia Network Agreement [http://www.magnolia.info/mna.html]. The Magnolia Network Agreement is essentially a commercial license that provides access to the source code.
Geography: Magnolia has a primarily European install base although U.S. adoption is growing and Magnolia has opened a New York office.
Brochure sites, intranets, news sites
France24 [http://www.france24.com/] runs on Magnolia Community. Ministry for Public Administration (Spain) [http://www.map.es] runs on Magnolia Enterprise.
Frameworks and Components: Apache Lucene, Apache JackRabbit (substitutable with Day Software's CRX), ehcache, FreeMarker, OpenWFE, Velocity
WYSIWYG Editor: FCKeditor
Integration Standards: JSR 170, JAAS, WSRP*, LDAP*, RSS
Java Support: 1.4, 1.5
Application Servers: Tomcat (default), Glassfish, JBoss, Web Logic**, Websphere**
Databases: MySQL, Oracle, Microsoft SQL Server, Derby, DB2
* Enterprise Edition only
** Certification requires Enterprise Edition
History
Obinary, now Magnolia International, was founded as a services firm in 2000 - the peak of the Internet bubble. Founders Boris Kraft and Pascal Mangold were experts in WebObjects and built a sizeable practice implementing the Icelandic WebObjects-based CMS Soloweb. Given the amount of integration work required to implement a solution, Kraft and Mangold looked for alternative foundations that would bring down the cost of the offering while maintaining services revenues. After a disappointing survey of the existing open source Java WCM options available, Obinary started to build their own platform. At that time, JackRabbit - Apache's reference implementation of the Java Content Repository - was still in its unnamed infancy and lived its life as a subproject of Slide. However, Obinary was convinced that this project could eventually bring significant business benefits and chose it to provide the core functionality for the new
product. Although incumbent Java WCM projects criticized Magnolia for being just a thin shim on top of JackRabbit, Magnolia was able to provide the simplicity and usability which had eluded the more engineering focused products. Magnolia was released under the open source LGPL in November 2003, and was able to shield its users from many of the changes that JSR170 would go through before it was finally released in June 2005. While the product gained attention, particularly in Europe, Obinary looked for different business models. A donation based revenue model failed. Obinary decided on a tiered product approach with an unsupported, open source Community Edition and a supported, commercially licensed Enterprise Edition which includes some value added features, as well. The Enterprise Edition requires an annual subscription of 10k USD per server. Obinary had aspirations in the DMS market and built document management capabilities into the Enterprise product. But the emergence of the well funded and highly visible Alfresco discouraged their hopes of being the open source Java ECM product. While the document management capabilities remain, they are not the focus of the offering. In 2006, Obinary changed its name to Magnolia International. Currently at 12 employees, Magnolia is experiencing growth and is expanding its team. It has signed more than 30 customers since releasing the first Enterprise Edition in November 2006.
Architecture
Like other products in this category, Magnolia is an end-to-end web content management system. To use Magnolia requires adopting it for both your management tier and your web delivery platform. There are many benefits to this architecture including ease of use and development efficiency, but some architects will feel boxed in having to rely so much on the Magnolia architecture - especially if they have invested heavily in an alternative web application framework such as Struts, MyFaces, Tapestry, or Spring. It is conceivable to use an alternative presentation tier (and customers have experimented with many) because of the standards based repository. However, doing so would probably make the product more difficult to use since the editing environment is so closely tied to the delivery tier.

Magnolia's original claim to fame was that it was the first CMS built from the ground up to work on the Java Content Repository (See Glossary for JCR). The JCR is a relatively new Java standard (JSR 170) that defines a repository for managing content. The JCR is well suited for semi structured content that is hierarchical in nature. Unlike relational databases, JCRs natively support content management-specific functions like versioning, workspaces, and content deployment.

The JCR specification has not yet enjoyed widespread adoption. The biggest proponent is Day Software, whose CTO David Nuescheler is the specification lead. Day also has a number of its own developers working on the reference implementation: Apache JackRabbit. Day also sells a commercial JCR implementation called CRX and JCR adaptors for other repositories such as Documentum, FileNet, Lotus Notes, TeamSite, Sharepoint, OpenText Livelink, and Vignette. Outside of Day, however, use of the JCR has been limited. Only a few CMS companies, such as Alfresco, support the JCR standard with their own repositories, but Oracle 11 now supports the JCR natively.
Other products, like Percussion, support the spec at level one (Read Only access) and support the query language for finding assets. There is more JCR interest within internal corporate software engineering departments that are building custom systems and want to reduce their risk by sticking to standards. If the JCR is to become a truly successful standard, it will require corporate architects putting pressure on more software vendors to support the specification. The JCR specification is in the process of being upgraded with a new Java Community Process (JCP) Java Specification Request (JSR) called JSR 283. JackRabbit will serve as a
reference implementation for JSR 283, as well. Magnolia did a good job of keeping up with the evolving JSR 170 spec and improvements proposed in 283 are not expected to pose too much disruption. That is, certainly not like the original 170 specification.

Architects that are new to using the JCR sometimes have the tendency to want to use it for everything. On the developer mailing list, you occasionally see architects getting "talked down" from using the JCR for inherently tabular data like e-commerce catalogs and order history.

By default, Magnolia's content repository is Apache JackRabbit. JackRabbit has a pluggable persistence manager with the default being a file system, which is also the slowest. Other options include relational databases such as Oracle and MySQL. The binary install of Magnolia comes bundled and configured to use JackRabbit with the Java relational database Derby for persistence. Derby is better than the other commonly embedded database, Hypersonic, but high end implementations will probably want to use MySQL. JackRabbit has a WebDAV interface; however, WebDAV support in Magnolia is slated for a future release. The Enterprise Edition comes with the option to use Day Software's commercial CRX JCR product. The Magnolia administrative interface contains a JCR repository browser that lets you view and edit the content tree as one would with a database administration tool. There is a free JCR browser plugin for Eclipse, as well.

For relational data, Magnolia implementations can use the embedded Derby database or connect to any other JDBC compliant relational database. Magnolia does not use the relational database for much but Derby can be replaced with MySQL if a more powerful relational database management system is needed. This configuration is done at the application server level by specifying a data provider. Magnolia is certified to work on Tomcat as well as the full JEE application servers JBoss, WebLogic, and Websphere.
Magnolia was one of the first open source WCM systems with a multi-tier design. In a basic configuration, two instances of the web application are deployed: one is designated as the authoring server and another as the presentation server. The distinction is made with an attribute setting made in the management interface. Content is authored and edited on the authoring server and then deployed to one or more presentation server instances. Each instance of the application is called a "stage." Communication between stages is done over http/https.
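The authoring-to-presentation flow can be modeled in a few lines. In the sketch below, an in-memory method call stands in for the http/https communication between instances, and every name (`AuthoringInstance`, `PresentationInstance`, `Subscriber`, `publish`) is hypothetical rather than taken from Magnolia's code base.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract for anything that receives published content.
interface Subscriber {
    void receive(String path, String content);
}

// Stand-in for a presentation server instance; serves only what was published to it.
class PresentationInstance implements Subscriber {
    private final Map<String, String> published = new HashMap<>();

    @Override
    public void receive(String path, String content) {
        published.put(path, content);
    }

    String get(String path) {
        return published.get(path);
    }
}

// Stand-in for the authoring server: holds working copies and pushes them out.
class AuthoringInstance {
    private final Map<String, String> workingCopies = new HashMap<>();
    private final List<Subscriber> subscribers = new ArrayList<>();

    void addSubscriber(Subscriber subscriber) {
        subscribers.add(subscriber);
    }

    void edit(String path, String content) {
        workingCopies.put(path, content);
    }

    // Sends the current working copy to every subscriber.
    // Returns the number of subscribers that were updated.
    int publish(String path) {
        String content = workingCopies.get(path);
        if (content == null) {
            throw new IllegalArgumentException("No content at " + path);
        }
        for (Subscriber subscriber : subscribers) {
            subscriber.receive(path, content);
        }
        return subscribers.size();
    }

    public static void main(String[] args) {
        AuthoringInstance author = new AuthoringInstance();
        PresentationInstance publicSite = new PresentationInstance();
        author.addSubscriber(publicSite);

        author.edit("/about", "About us, first draft");
        System.out.println(publicSite.get("/about")); // not published yet
        author.publish("/about");
        System.out.println(publicSite.get("/about"));
    }
}
```

Edits made after a publish stay on the authoring side until the next publish, which is the property that makes the two-instance layout useful.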
In the authoring server, an administrator configures subscribers that the authoring server will publish content into. Developers wishing to extend or customize Magnolia can do so by writing add-on modules. The best practice is to write all customizations in modules so that a new instance can easily be brought online by installing the base system, deploying the modules, and then deploying content and template code. There is a Subversion repository for sharing modules, and a couple have been added, but there is no module forge like other projects have. The newly released version 3.5 introduced a more pluggable architecture that may facilitate the sharing of modules. Right now module sharing is limited to the mailing list. Future releases will likely introduce the Spring Aspect Oriented framework. This will enable code to be more easily shared across custom components.
Content Contribution
Magnolia provides two main interfaces that content contributors can use to navigate the web site. The primary user interface is "AdminCentral" that is used for all site administration, configuration, and development tasks. The "web site" section shows a clean view of the hierarchical site tree. Color coding is used to express workflow in three states: red for unpublished, green for published, and yellow means that the asset has changed since it was last published. Context sensitive right-click menus show what actions are available to the user based on his security permissions.
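The three-color scheme amounts to comparing when an asset was last changed with when it was last published. The sketch below illustrates that rule with plain timestamps; `ActivationStatus` and `ActivationStatusResolver` are hypothetical names, and the text does not describe how Magnolia actually tracks these states.

```java
// Hypothetical sketch of the red / green / yellow status rule.
enum ActivationStatus { RED, GREEN, YELLOW }

class ActivationStatusResolver {
    // lastPublishedMillis is null when the asset has never been published.
    static ActivationStatus resolve(long lastModifiedMillis, Long lastPublishedMillis) {
        if (lastPublishedMillis == null) {
            return ActivationStatus.RED;      // unpublished
        }
        if (lastModifiedMillis > lastPublishedMillis) {
            return ActivationStatus.YELLOW;   // changed since it was last published
        }
        return ActivationStatus.GREEN;        // published and unchanged
    }

    public static void main(String[] args) {
        System.out.println(resolve(1000L, null));
        System.out.println(resolve(1000L, 2000L));
        System.out.println(resolve(3000L, 2000L));
    }
}
```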
Figure: The main "AdminCentral" user interface is the primary way to navigate to edit content.

The second method of navigating the web site is through preview mode. Content contributors can browse around the authoring instance in preview mode, which shows editable regions as well as buttons for editing. This interface is also used to edit page metadata and move paragraphs around on the page.
Figure: Content contributors can browse around the authoring instance in preview mode and edit paragraphs by clicking the edit buttons. This interface is also used to edit page metadata and move paragraphs around on the page. Source: Magnolia documentation site.

Magnolia is closely tied to the hierarchical model of the JCR. The primary content type is a generic "page" that is composed of "paragraphs" and metadata. Although pages are not "typed" into content types or classes, a page is defined by a template that determines the structure of the page by creating regions that can hold paragraphs. For example, a template might have a three column layout with the center column containing the main page content and a right column containing sidebar content components. In this example, the user would be able to add paragraphs to the center and right column. The template would also control the display of the paragraphs in the page.

Paragraphs are structured content types in their own right and can have their own metadata. Paragraphs are defined by and edited with pop-up "dialogs" that can have multiple tabs. In addition to paragraphs that accept content, there are also paragraphs that hold dynamic page components. The default Magnolia Enterprise installation comes with paragraph types for: Text and image, File download, Link, Anchor, Documents: List, Search by topics, RSS link, RSS icon, Movie Player, Code example, Navigation, Breadcrumb, Mail form, iFrame, 2 Columns, Javascript, Full text search input field, Full text search result, MP3 Player, and Text scroller. New paragraph types are defined by creating custom dialogs through the administrative user interface.
Each paragraph type has its own edit dialog. Clicking the edit button launches an edit dialog that contains form fields for the structured elements of the paragraph. Paragraphs can also have their own metadata. Image source: Magnolia documentation site.
This model differs from most web content management system designs, which create semantically meaningful content types at the page level. In Magnolia, rather than creating a new "event" asset, one would create a page and add a custom "event info" paragraph containing elements for information like date, time, location, and description. To create an event calendar page that lists events chronologically, a dynamic template would need to find all "event info" paragraphs and display them.
The design of Magnolia is not conducive to sharing content across pages, but there are workarounds. A common strategy is to create a non-navigable collection of pages that are just containers for global assets. Paragraph templates can then be programmed to pull in paragraphs from these pages. While the strategy works fine, it is a break from the overall browse-to-edit metaphor.
Localization, new in version 3.5, is done at the paragraph level. In the Magnolia localization strategy, a language neutral page has a collection of paragraphs. Each paragraph has its own language specific elements that, in the edit dialog, are organized into tabs. For example, if a paragraph is translatable into English and German, there may be English Title and German Title fields that are on different tabs. When the page is rendered, a locale identifier in the URL path tells Magnolia which language specific paragraph fields to show. It is easy to see how this model would break down in sites that are published in many different languages, but it seems adequate for sites that are translated into up to three or four languages. Perhaps companies that work in many languages group the localizations into multiple independent Magnolia instances. However, this design would not effectively enable content sharing between the sites. Another limitation is that URLs exist only in the primary language of the page. For a first attempt, the localization system is not too bad. However, the plan is for a major improvement in release 4.0.
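To make the paragraph-level localization strategy concrete, here is a minimal Java sketch. It is not Magnolia code: the class name `LocalizedParagraph`, the "field_language" key scheme, the assumption that the locale identifier is the first path segment, and the fallback to a default language are all hypothetical choices made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of a paragraph whose fields carry one value per language. */
class LocalizedParagraph {
    // Keys look like "title_en" or "title_de"; this naming scheme is an assumption.
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final String defaultLanguage;

    LocalizedParagraph(String defaultLanguage) {
        this.defaultLanguage = defaultLanguage;
    }

    LocalizedParagraph set(String field, String language, String value) {
        fields.put(field + "_" + language, value);
        return this;
    }

    /** Returns the value for the requested language, or the default language value if absent. */
    String get(String field, String language) {
        String value = fields.get(field + "_" + language);
        return value != null ? value : fields.get(field + "_" + defaultLanguage);
    }

    /** Treats a two-letter first path segment as the language code, else returns the fallback. */
    static String languageFromPath(String path, String fallback) {
        String[] segments = path.split("/");
        // segments[0] is empty for paths that start with "/"
        if (segments.length > 1 && segments[1].length() == 2) {
            return segments[1];
        }
        return fallback;
    }
}
```

A request for a path such as `/de/home.html` would select the German fields, while a path without a language segment would use the default. Because every language adds another set of fields (and another dialog tab) to every paragraph, the sketch also hints at why the approach gets unwieldy beyond a handful of languages.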
In Magnolia's new localization scheme, content is localized at the paragraph level. Each paragraph is given a set of fields for each language supported.
Rich text editing services are provided by FCKeditor. By default, users are not able to embed images directly into the text area using an image button. Instead, they need to go to an image tab of the paragraph dialog to upload an image that will be placed by the display template. The image tab of the paragraph dialog may present the user with options such as alignment, text wrapping, and fields for attributes such as caption. This is generally a good strategy because it makes the CMS aware of images and gives it more control over their display. FCKeditor can be configured to allow other ways of dealing with images, if so desired.
Magnolia has no special link management functionality. Internal links are constructed through a path, but internally the relationship is maintained through the unique identifier of the asset (UUID). This means that if the target asset is moved or renamed, the link will still work. If the target page is deleted and then replaced with another page with the same name, the linking mechanism will fall back and use the URI to link to the new page. However, Magnolia does not manage links and is not able to warn a user when deleting an asset will create a broken link. There is also no view of what pages link to a page, as there is in Daisy, Lenya, and OpenCms. Using third party link checking software is advised.
Not all authors like to think of their pages as collections of paragraph components. It is hard to re-factor text heavy pages when you have to open different pop-up dialogs to move text from one paragraph to another. Authors who write text heavy pages will have a tendency to put in hard returns to create multiple paragraph-looking text blocks within a single paragraph component. This is probably not a bad thing.
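The UUID-first link behavior described above can be sketched in a few lines of Java. This is an illustrative in-memory model, not Magnolia's implementation or the JCR API; `LinkResolver` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory repository illustrating UUID-first link resolution. */
class LinkResolver {
    private final Map<String, String> pathByUuid = new HashMap<>(); // uuid -> current path
    private final Map<String, String> uuidByPath = new HashMap<>(); // current path -> uuid

    void add(String uuid, String path) {
        pathByUuid.put(uuid, path);
        uuidByPath.put(path, uuid);
    }

    void move(String uuid, String newPath) {
        String oldPath = pathByUuid.get(uuid);
        if (oldPath == null) {
            throw new IllegalArgumentException("unknown uuid " + uuid);
        }
        uuidByPath.remove(oldPath);
        add(uuid, newPath);
    }

    void delete(String uuid) {
        String oldPath = pathByUuid.remove(uuid);
        if (oldPath != null) {
            uuidByPath.remove(oldPath);
        }
    }

    /** Resolves by UUID first; if the target is gone, falls back to the path stored with the link. */
    Optional<String> resolve(String targetUuid, String recordedPath) {
        String current = pathByUuid.get(targetUuid);
        if (current != null) {
            return Optional.of(current);
        }
        return uuidByPath.containsKey(recordedPath) ? Optional.of(recordedPath) : Optional.empty();
    }
}
```

Note what the sketch does not contain: a reverse index from a target to the pages that link to it. Without such an index, a system cannot warn a user that a deletion will break links, which is the gap noted above.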
Still, this component page model is a nice compromise between structure and user control over layout. Structuring content in this way allows a template developer to re-organize the layout of the pages by editing the template rather than editing content. However, because page components are not easily re-used across pages, the benefit of the structure is somewhat reduced.
Since release 3.0, Magnolia has used the open source Java workflow engine OpenWFE to manage workflow. Magnolia's CTO Boris Kraft is a committer on the OpenWFE project. Like most workflow engines, OpenWFE uses an XML syntax to create "process definitions," although OpenWFE does not use a standards based process definition language, such as BPEL. In the configuration area of the administration interface, there is a place to paste in a process definition. The workflow definition references "commands" which are mapped to classes in the administration interface. The default definition that comes with the basic installation, "activation," has commented-out sections for functions like email notification.
Otherwise, the best approach is to take the system down and do a dump of the database being used as the persistence layer.
Presentation
Magnolia has a "frying" style presentation system where pages are generated at request time rather than pre-compiled into HTML to be served by a simple web server. Caching is done at the page level using a custom cache implementation or, optionally, the popular open source caching framework ehcache. The cache is invalidated when new content is published and (as of 3.5) when template code is changed. The main issue here is that cache clearing is global, so high traffic sites should publish at regular intervals to limit the number of cache clearing events. The centralized caching mechanism is one of the key issues that limit Magnolia's suitability for multi-site hosting.
Magnolia does power some fairly high traffic sites, such as France24 [http://www.france24.com/]. High throughput is achieved through clustering presentation servers, and session federation is supported. In this configuration, the presentation tier is read-only. Interestingly, France24 is using the Community Edition, so they must have developed their own version of the clustering mechanism that comes with the Enterprise Edition. One common strategy is to publish different sections of the site to different delivery servers. For example, a highly interactive section may be put on another delivery server so that the computational complexity of those pages does not degrade the performance of the entire site. A future version of Magnolia will introduce a "baking" style model where static HTML files are deployed to a web server farm.
As previously mentioned, pages are rendered at request time. Magnolia has its own model-view-controller implementation. URLs reflect the organizational hierarchy of the site. Magnolia does support virtual URLs or URL aliases.
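The cost of global cache clearing is easier to see in code. The following Java sketch is a generic page cache written for illustration; it is neither Magnolia's cache implementation nor the ehcache API, and `PageCache`, `onPublish`, and `renderCount` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical page-level cache that is flushed in full on every publish event. */
class PageCache {
    private final Map<String, String> pages = new ConcurrentHashMap<>();
    private final AtomicInteger renders = new AtomicInteger();

    /** Returns the cached page for the URI, rendering it at request time on a cache miss. */
    String get(String uri, Function<String, String> renderer) {
        return pages.computeIfAbsent(uri, u -> {
            renders.incrementAndGet();
            return renderer.apply(u);
        });
    }

    /** Global invalidation: publishing any content discards every cached page. */
    void onPublish() {
        pages.clear();
    }

    int renderCount() {
        return renders.get();
    }
}
```

After each call to `onPublish()`, every page must be rendered again on its next request, including pages whose content did not change. Batching publish events at regular intervals keeps the number of these full re-render cycles down.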
On the mailing list, some developers have reported success in integrating Magnolia with a Struts based delivery tier, although this would probably make more sense with the unsupported Community Edition than with the Enterprise Edition, where the customization would make a paid installation unsupportable.
Content presentation templates have traditionally been written in JSP with the help of the JavaServer Pages Standard Tag Library (JSTL) and custom Magnolia tags that are provided under the namespace "cms". The third party add-on MagTags, distributed by Noodle Open Source under the LGPL, provides convenient helper tags. Velocity and FreeMarker JARs ship with the product but are not, by default, available for use as an alternative templating language. However, it is possible to build alternative "renderers" that leverage these technologies. There is discussion within the Magnolia community about adopting JavaServer Faces, or potentially an AJAX based framework (such as GWT-Ext, which builds on Google's GWT), as a delivery framework. But for now, the officially sanctioned front end of Magnolia is JSP with Magnolia's tag library. The JSP code files are stored in the file system under the "webapp" directory and pointed to by nodes in the repository.
The Enterprise version comes with a module called "Sitedesigner" that allows templates to be developed and modified through a design environment within the admin interface. This is a convenient feature for companies that do not have web designers at the ready to respond to template change requests. Sitedesigner consists of a generalized parameterized template that allows a business user to edit properties to control the layout. There are no WYSIWYG, drag-and-drop graphical features that a Dreamweaver user may be accustomed to. Instead, Magnolia uses the dialog model with a set of property fields. The edit buttons appear right next to the buttons used to edit content.
Sitedesigner template updates are stored at the page level and can be inherited down the tree structure or overridden by child pages.
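The inherit-or-override behavior can be sketched as a lookup that walks from a page up to the root of the tree. This is an illustration of the general technique only; `PageNode` and the property names are hypothetical and do not reflect how Sitedesigner stores its settings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical page node whose design properties are inherited from ancestors unless overridden. */
class PageNode {
    private final PageNode parent; // null for the root page
    private final Map<String, String> designProperties = new HashMap<>();

    PageNode(PageNode parent) {
        this.parent = parent;
    }

    PageNode set(String key, String value) {
        designProperties.put(key, value);
        return this;
    }

    /** Returns the nearest value found walking from this page up to the root, or null if none is set. */
    String resolve(String key) {
        for (PageNode node = this; node != null; node = node.parent) {
            String value = node.designProperties.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

With this model, a property set on a section's landing page applies to every page beneath it, and any child page can override it locally without affecting its siblings.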
Sitedesigner consists of a parameterized template that allows a user to control the look and behavior of pages by setting properties.
Although Magnolia has a dynamic delivery tier, the separation of the authoring environment and the delivery environment makes it less suitable for visitor contribution functionality. However, there is a strategy to store visitor submitted content in a different workspace within the JCR and replicate that workspace back to the authoring environment. Magnolia does maintain some add-on modules to deliver community oriented functionality (such as forums and polls), but they are extremely simplistic and not well documented. They should, at best, be considered a starting point on which to build custom functionality. Modules can be downloaded from the Magnolia Subversion repository, which organizes them into Community and Enterprise modules. The Enterprise section of the source code repository is password protected. Module support is relatively new for Magnolia, so users can expect more development in this area. Magnolia also plans to introduce more visitor facing interactive features in the future. The company is currently working on a marketing module that will have functionality like A/B testing and SEO tools like a Google sitemap.
Another option for presentation is through a third party JSR 168 (See Glossary JSR 168) compliant portal product. A Magnolia instance can be wrapped in a JSR 168 portlet and subscribe to updates from the authoring instance. Sold separately, there is a Web Services for Remote Portlets (See Glossary WSRP) module that publishes content out to this standard. The WSRP module is new and has not yet been aggressively sold as a product.
Caching configuration is done at the page level based on URL rules. By default, all pages are cached in the delivery tier. Caching is then turned off on a page by page basis for pages that have dynamic or personalized behavior.
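A cache-by-default policy with URL based exceptions can be sketched as follows. This is a generic illustration, not Magnolia's configuration format: the class `CachePolicy`, the `bypass` method, and the "*" wildcard syntax are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical cache policy: every URI is cacheable unless it matches a bypass rule. */
class CachePolicy {
    private final List<Pattern> bypassRules = new ArrayList<>();

    /** Registers a URI pattern to exclude from caching; "*" matches any run of characters. */
    CachePolicy bypass(String uriPattern) {
        String[] parts = uriPattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            // Quote the literal text so characters like "." are not treated as regex syntax
            regex.append(Pattern.quote(parts[i]));
        }
        bypassRules.add(Pattern.compile(regex.toString()));
        return this;
    }

    boolean isCacheable(String uri) {
        for (Pattern rule : bypassRules) {
            if (rule.matcher(uri).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

A delivery filter would consult `isCacheable` before serving from or writing to the page cache, so personalized sections such as a members area are always rendered fresh.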
The Magnolia architecture now tolerates co-existence with other security and cache filters. However, setup has been reported to be tricky. Large, high traffic Magnolia implementations tend to deploy across multiple delivery servers which, although expensive from a hardware perspective, is simple enough to do using Magnolia's publish/subscribe model.
Cache is configured at the page level through AdminCentral by editing URI nodes.
Magnolia is a small company committed to being a dedicated software company rather than a software and services hybrid. The challenge for them is to build the Enterprise Edition license base to increase revenue on licensing and support fees. But it is not easy for the company to convert Community Edition implementations into Enterprise customers because the Community Edition gives most users what they want.
Magnolia is actively growing its partner program. Most of the official integration partners are in Europe, but there is a growing number of North American systems integrators that do Magnolia implementations. However, there is only one official U.S. based partner. The monetary hurdle to be listed as an official partner is minimal ($1,250), but there is a requirement to buy a high tier support package. Consultancies that make this commitment get a 25% commission on sales.
The mailing list for Magnolia is active for a community of its size, and it is your best resource for information other than paid Magnolia support. The documentation is very thin, especially for the 3.x releases. The Community wiki (http://www.magnolia.info/wiki/, recently ported over to Confluence) is frequently better than the official documentation (http://documentation.magnolia.info). There are some Javadoc comments in the code base, and the code is relatively well named and easy to follow.
From a social perspective, most of the action happens in Magnolia's home territory in Switzerland, where there are social gatherings. Magnolia has opened an office in New York where project co-founder Boris Kraft spends some of his time building Magnolia's U.S. presence.
Conclusion
Table 3.7. Magnolia 3.5 Enterprise Summary
Category Score Explanation
Contributor Navigation: The tree-based AdminCentral interface is the content contributor's starting place, but from there he can launch into the preview view for an in-context editing experience.
Content Entry: The content entry model is very page oriented, which has the advantage of being intuitive for most uses, but is weak for semantically meaningful content types such as an "event." WebDAV would help with uploading binary files. Spell check would be helpful too.
Link Management: Magnolia is the only product in this category without a strong link management system that discourages users from actions that break links.
Full versioning support, with a new version created with every save.
The page oriented editing model does not naturally lend itself to high levels of content reuse.
While many customers use Magnolia to run multiple language sites, parallel localization is new to Magnolia and this first attempt is rough.
Content Integration: The standards based JCR repository is well designed for integrating Magnolia content in other applications. Java developers will be familiar with writing JSP/Java code to interact with any relational data source.
Workflow: A basic approval workflow comes out-of-the-box. More complex workflows are supported by the pre-integrated OpenWFE workflow engine.
The JSP based delivery templates are easy for a Java developer to work with, and the paragraph model can be easily translated into page components for building flexible pages. The Sitedesigner tool enables non-programmers to control the look of the site.
Interactivity: The main question is where to put user submitted content, since the publishing model is based on a uni-directional flow between the back-end authoring server and the front end delivery servers.
XML orientation tends to output clean XHTML. User friendly, human readable URLs are supported.
None
The new documentation site (http://documentation.magnolia.info) is extremely thin and does not cover any of the new features introduced in 3.5.
The developer and user mailing lists are fairly active, as the community version of Magnolia is widely used.
Below Average; Average; Above Average; Exceptional.
Magnolia hit the market at precisely the right time: when buyers were looking for a simple, easy to use Java based WCM product and when the JCR was beginning to emerge as a stable platform to build on. As a result, Magnolia International has been able to rapidly build a compelling product without the need for venture funding. While Magnolia CMS has enjoyed widespread adoption, the company has been trying to figure out a business model that will turn the success of the product into corporate growth.
From a business perspective, Magnolia International sits somewhere between Alkacon and Alfresco. Like Alkacon, the free version is, in itself, a useful product that large companies can deploy. Like Alfresco, Magnolia asks its customers to buy the Enterprise Edition to be eligible for support packages. However, Magnolia is not as forceful in pushing users to the Enterprise version, probably because it does not have the same venture capital revenue pressures that Alfresco does.
Although Magnolia (the company) is small, it seems stable and solid and gets high marks from its customers. As an open source project, Magnolia appears to be a safe technology to adopt because of the install base and the development energy behind the underlying technology: the Apache Jackrabbit JCR.
OpenCms 7.0.3
For quite some time, OpenCms has enjoyed the distinction of being considered the most mature and best organized of the community based open source Java web content management systems. With the release of the 6.x series and the recent 7.x release, OpenCms has been able to stay relevant despite an influx of new competition, and remains a viable option for companies looking for comprehensive basic web site management. OpenCms has also avoided Web 2.0 functionality such as user generated content and social media. For a company looking for a solid platform with extensive traditional WCM functionality, OpenCms is an attractive option. The budget conscious will appreciate that, unlike new market entrants, OpenCms is a fully open source product and requires neither commercial licensing fees nor an obligation to buy support contracts. The size of the OpenCms install base makes working with the community a realistic option for getting support. For companies seeking additional support, Germany based Alkacon Software sells commercial-style support packages as well as a bundle of commercial extensions targeted at enterprise installs. Beyond Alkacon, over 100 systems integrators are registered as official OpenCms solution providers. From a usability standpoint, OpenCms is decent, but not exceptional. Customers that are looking for newer, flashier, Web 2.0-style user interfaces tend to select other products such as Magnolia. OpenCms straddles the line between a community and commercial open source product. Whereas Alkacon once subtly positioned itself as a premier provider of OpenCms services and software, it now calls the product Alkacon OpenCms and has taken more overt ownership of it. Since they own the copyright on the source code and the OpenCms name, they are well within their rights. And given the influx of commercial open source products in this category, the move has served OpenCms well.
Project Overview
Table 3.8. OpenCms Project Overview
Web site: http://www.OpenCms.org
Project Inception: 1999
Current Version: 7.0.x since July 2007
Project Type: Community. Commercially supported by Alkacon.
Licensing Options: LGPL with commercial extensions available.
Geography: Global, with a concentration in Europe.
Common Uses: Brochure sites, informational intranets, news sites.
Sample Customers: The public facing web site of The North Face [http://www.thenorthface.com]. The public facing web site of Virgin Money Australia [http://virginmoney.com.au].
Frameworks and Components: Apache Lucene, Digester, EHCache, JTidy, PDFBox
Integration Standards: WebDAV, EJB, XML
Java Support: 1.4, 1.5, 1.6
Application Servers: Tomcat (default), JBoss, Websphere, WebLogic
Databases: MySQL, Oracle, Microsoft SQL Server, Sybase
History
OpenCms was originally developed in 1999 by the interactive agency BKM Online Medien GmbH as a proprietary product called MhtCms. In 2000, the product, then at version 4, was released as OpenCms under the LGPL. The OpenCms core team chose the LGPL because it allows developers to use OpenCms within other applications without the viral effect of the GPL, but prevents the commercial sale of enhancements made to the OpenCms core without contribution back to the community. Third party vendors, such as QBizm, have legally created proprietary extensions to OpenCms that they sell under a commercial license.
In 2002, several of the core developers from the original MhtCms team formed the company Alkacon Software, which has done most of the development on the product and hosts the OpenCms.org web site. Still, OpenCms is an open project and there are external contributors and committers. In particular, the database connectors for PostgreSQL and Microsoft SQL Server are managed by external developers. Alkacon practices a software based business model with most of its revenue coming from consulting, training, and support contracts on the OpenCms platform. Nearly 100 customers currently buy support contracts from Alkacon.
The relationship between Alkacon and the community is interesting. Unlike many community open source projects that form neutral non-profit holding organizations to own the code and govern direction, Alkacon owns the code and runs the project. Increasingly, Alkacon and the OpenCms web site (which Alkacon runs) refer to the product as Alkacon OpenCms. This appears to have the positive effect of reducing ambiguity around the project, both externally and internally, and clearly positioning Alkacon as the OpenCms supplier. The general feeling within the community is one of loyalty and gratitude, and there has been backlash when other companies have tried to wrest control of the community from Alkacon.
Because they own the code, Alkacon could create a dual licensing scheme or change the license entirely.
Architecture
OpenCms is probably the most mature of the open source Java WCM products. On the downside, however, the architecture tends to show its age. While newer Java WCM products are taking full advantage of the latest frameworks and components such as Spring, Hibernate, and the various Ajax libraries, OpenCms is essentially built from the ground up with the exception of a number of third party XML libraries and the Lucene search engine. This is understandable given that most of these frameworks did not exist or, at least, were not mature when OpenCms was originally developed. That said, OpenCms is current with the latest versions of Java and releases of the major application servers and servlet containers. Some customers have reported success integrating OpenCms into applications using these modern frameworks, but this is far from the mainstream. The technical leadership of the project has prioritized adding features and fixing bugs over refactoring the architecture to leverage these frameworks. In hindsight, this may have been the right choice given the shifting popularity of the various frameworks and the fact that some of these features are being incorporated into the core Java platform. Unless you are already familiar with these third party frameworks, their absence makes the application easier to understand by eliminating layers of abstraction and indirection. OpenCms is divided into three major components: the OpenCms core, where all the business logic is executed; the delivery tier, which executes JSP templates to render the site and also runs the "Workplace" (a web based client for contributing content and administering the system); and a database adaptor layer that manages persistence in a SQL compliant database. Alkacon's support packages are available for OpenCms installations running on Tomcat or JBoss application servers, although customers have also reported success on WebSphere and WebLogic. 
Content, template code, and configuration files are stored in a "virtual file system" (VFS) that is backed by a relational database. The introduction of WebDAV in version 7 has finally made the VFS useful as a file system. Prior to WebDAV support, there was an awkward synchronization strategy where a physical file system directory served as a proxy for the VFS, which synchronized with the directory on a periodic basis. This was most problematic for developers who wanted to develop their templates on a file system so they could use their favorite IDE, then push them to the VFS where they could be executed and tested. There is also a third party Eclipse plugin that facilitates editing code in Eclipse.
Behind the VFS is a relational database. The database schema is fully UTF-8 compliant and makes use of blob fields to store both binary and XML based assets. Metadata properties are stored in a property table. Alkacon will support OpenCms on a number of databases including MySQL, Microsoft SQL Server, PostgreSQL, and Oracle. Alkacon's commercial OpenCms Enterprise Extensions (OCEE) module for repairing a corrupted VFS (VFS Doctor) is compatible with MySQL 4.1 or 5.0; Oracle 8.1.7, 9.x, 10.x, or 11.x; PostgreSQL 8.x (only with OpenCms 7.0.2 or newer); and MS SQL Server 2000 and 2005.
The "Workplace," which is used to administer the site as well as edit content, is an ambitious Javascript application that works well on the recent versions of Internet Explorer and the Mozilla Gecko engine (Firefox and Mozilla browsers); using Opera or Safari browsers is not advised. Even though one could legitimately argue that the Workplace was AJAX from the beginning (before the term "AJAX" existed), some AJAX oriented enhancements introduced in version 7 have made the client more responsive and stable. The property dialogs that pop up to accept user input are extensible to a degree, but you do not want to go tinkering with the user interface code.
OpenCms also ships with a command line interface that is suitable for scripting tasks against the repository and executing business logic.
The OpenCms content model is broken down into two high level classes: XML Page for mainly unstructured content, and XML Content for structured content classes. Like several other products evaluated in this report, an XML Content type is defined by an XML Schema (described in an XSD file) and stored in the repository as XML documents. The schema defines the content structure (as in the fields and their data types) and also the form elements used to edit them. OpenCms comes with a pre-defined set of data types (OpenCmsBoolean, OpenCmsColor, OpenCmsDateTime, OpenCmsHtml, OpenCmsLocale, OpenCmsString, and OpenCmsVfsFile) that come with their own basic validation logic. Additional custom validation logic can be defined within the XSD using regex syntax. There are 14 form widgets to choose from. These include default widgets for the basic data types and some extended widgets for compound elements like an image gallery.
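The idea of a schema that defines both structure and regex based validation can be illustrated with the standard Java XML validation API. The sketch below is deliberately generic: the "event" schema, its element names, and the use of a plain `xs:pattern` restriction are assumptions made for the example, and it does not use the OpenCms data types or reproduce how OpenCms itself declares validation rules in its XSD files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Validates a hypothetical "event" document against an XSD that includes a regex rule. */
class EventSchemaCheck {
    // Illustrative schema only; element names and the five digit rule are made up.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='event'><xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='postalCode'><xs:simpleType>"
        + "<xs:restriction base='xs:string'><xs:pattern value='[0-9]{5}'/></xs:restriction>"
        + "</xs:simpleType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Returns true when the XML conforms to the schema, false when validation fails. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // structure or pattern violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A document whose `postalCode` is five digits passes, while one containing letters is rejected by the same schema that describes the content structure. That is the appeal of schema-defined content types: structure and validation live in one place.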
Structured XML Content is edited through a form.
Content is stored in a hierarchical folder structure within the VFS. Version 6 introduced the concept of "siblings," which are like symbolic links except that, rather than a target and a reference, siblings behave like true peers. Siblings can also have different values for their metadata attributes. Siblings are distinguished by a small arrow on the asset type icon, and there is no way to tell which one is the original and which is the reference.
An Apache Lucene based search service provides full text and metadata search of XML Content and XML Documents. There are also extractors to full text index binary formats such as Microsoft Office and PDF. Through the Workplace, an administrator can configure multiple search indexes by specifying the section of the directory structure to include, filters, and sort order. Only the default metadata fields are available for inclusion in the search index. Custom attributes within structured content are only indexed as part of the full text index.
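The sibling concept can be modeled as several directory entries that share one underlying resource while keeping their own metadata. The Java sketch below illustrates that idea only; `VfsResource` and `VfsEntry` are hypothetical names and not classes from the OpenCms API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical shared content body; every sibling points at the same instance. */
class VfsResource {
    private String content;

    VfsResource(String content) {
        this.content = content;
    }

    String getContent() {
        return content;
    }

    void setContent(String content) {
        this.content = content;
    }
}

/** Hypothetical directory entry: its own path and metadata, but shared content. */
class VfsEntry {
    final String path;
    final Map<String, String> properties = new HashMap<>();
    private final VfsResource resource;

    VfsEntry(String path, VfsResource resource) {
        this.path = path;
        this.resource = resource;
    }

    String read() {
        return resource.getContent();
    }

    void write(String content) {
        resource.setContent(content);
    }
}
```

Because neither entry owns the resource, an edit made through one path is immediately visible through the other, and there is no "original" to distinguish from a "reference."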
OpenCms supports the definition of different search indexes through the Workplace graphical user interface. A search index is defined by the folders to include, filters, what fields to search, and instructions for ordering.
There are extensive and well documented APIs for the OpenCms core, the Workplace, and front-end modules. Surprisingly, however, there is no Web Services or REST style API that ships with the product, and there do not appear to be any modules that provide this interface. Most developers write their own XML over HTTP interfaces using the JSP delivery tier or in modules.
Extending OpenCms is done through the addition of modules that are implemented as Java packages and registered through the administrative user interface. Modules can be built to extend both the Workplace and the front end web site. Modules can be exported through the UI. Doing so creates a zip file with the necessary code and configuration information that is read into the system configuration when the module is imported.
One of the areas that OpenCms excels in is hosting multiple web sites on the same instance of the platform. At the root of the VFS is a node called "sites." Out of the box, OpenCms comes with a default site, but more can be added by editing the OpenCms system configuration file. After that, the site can be configured in the Workplace. URL management is done through Apache Virtual Hosts. Within this configuration, editors see the entire content tree in the Workplace. Through project roles, users can be prevented from editing sites that they should not have access to.
OpenCms supports clustering for fail-over and load balancing, but not a multi-tiered architecture or separate instances for staging code or content. Instead, OpenCms creates sandboxes called "projects" that serve as virtual environments to edit and preview content and code.
For a multiple tier configuration, Alkacon's commercial Cluster Package provides functionality to manage a cluster and replicate content and code between multiple OpenCms instances. The cluster package also provides database transaction and LDAP support.
The Database Replication Module allows an administrator to replicate the repository to a remote server. This is useful for multi-tiered architectures with a content production instance and a delivery instance.
Content Contribution
The primary power user interface for managing both content and the OpenCms application is the "Workplace," which works in two modes: Explorer and Administration. Content editors work in the Explorer mode, which is modeled after Windows Explorer. On the left side is a tree based view of the folder structure; on the right side is the detail pane. Clicking on the icons launches context sensitive menus that list actions available to the user. While Workplace is an impressive display of Javascript coding, there are some usability issues that have the potential to frustrate some users. First off, the application is entirely modal. That is, the user can either be editing an asset or exploring the repository, but not both at the same time. Unfamiliar users clicking to edit an asset assume the window they are taken to is a popup dialog, and close the entire application when trying to exit the edit form. It takes them a while to remember to click on the "X" button within the button bar of the application. Second, navigation of the content tree is based on file names rather than titles, so it is not always easy to know what the content asset is about.
Power users and administrators use the Workplace to edit content and administer the site. Creating new content follows a wizard-like process of first determining the type of content, then editing the metadata values, then saving the asset. After that point, the user can edit the asset and the Workplace shows the appropriate interface: either a Microsoft Word-like window for XML Pages or a forms based editor for XML Content.
XML Pages, or unstructured content, are edited in a simple MS Word style dialog.
OpenCms supports advanced WCM concepts such as contributor sandboxes, strong versioning, access control, dependency management, and localization.
For contributor sandboxes, OpenCms uses a "projects" metaphor. In other WCM systems, projects would be called "workspaces," "sandboxes," or "stages." Content in the "online" project is what external visitors to the site see. To safely edit an asset, a user checks the content into an offline project. This locks the asset so that it cannot be edited in another project. Depending on the user's privileges, he can either check the asset back to the "online" or live project, or submit it for review so that someone else can check it back in. The configuration of a project controls what content can be checked into the project, who can view and edit content that has been checked into the project, and who can approve content to be checked back in. Behind the scenes, OpenCms creates a collection of tables for every project. Users with sufficient privileges can also lock assets in the central staging project. New with version 7 is the ability to break a lock.
Unpublished modifications are marked in the Workplace with a flag icon. Assets can be published individually, recursively through a branch of the directory structure, or the whole project can be published. As of version 6.0, a link checker automatically runs whenever items are published and the user is shown a report of issues if any occurred. Version 7 improved link management with the introduction of a "content relationship engine." This allows OpenCms to warn a user if the asset that he is about to delete will create a broken link, and automatically updates links if content is moved or renamed. The relationship engine also publishes dependent assets, such as images, when an asset is published. From within the Workplace there is also a feature to check for broken external links.
This process runs through the content repository to collect external links and then verifies that those external pages are accessible. The relationship engine is exposed through the API.
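The behavior described for the relationship engine (warning before a delete, re-pointing links after a move) can be illustrated with a minimal sketch. The class `LinkRegistry` and its methods are hypothetical names invented for this illustration; they are not the OpenCms relationship API, and the sketch only tracks incoming links by path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical link registry for illustration; not the OpenCms relationship API. */
class LinkRegistry {
    // target path -> paths of the assets that link to it
    private final Map<String, Set<String>> incoming = new HashMap<>();

    /** Records that the asset at 'source' links to the asset at 'target'. */
    void addLink(String source, String target) {
        incoming.computeIfAbsent(target, k -> new LinkedHashSet<>()).add(source);
    }

    /** Returns the assets whose links would break if 'target' were deleted. */
    List<String> brokenBy(String target) {
        return new ArrayList<>(incoming.getOrDefault(target, Collections.<String>emptySet()));
    }

    /** Re-points every recorded incoming link from 'oldPath' to 'newPath' after a move or rename. */
    void move(String oldPath, String newPath) {
        Set<String> sources = incoming.remove(oldPath);
        if (sources != null) {
            incoming.computeIfAbsent(newPath, k -> new LinkedHashSet<>()).addAll(sources);
        }
    }
}
```

A delete action would call `brokenBy` first and show the returned list as a warning; a non-empty result is what a user would see as "this will create a broken link."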
The OpenCms content relationship engine checks for dependencies and warns a user if deleting an asset will break a link. The OpenCms localization strategy is based on the use of "siblings" mentioned earlier in the architecture section. Siblings behave like symbolic links but they can have different metadata values. With this technique a single asset has multiple values for each of its elements (such as "body" and "title") - one for each language. The display template looks at metadata attributes on the content asset or on the enclosing folder to determine which localized value of the attribute to display. This strategy keeps the different localized versions of content in sync and allows fall-back logic to display the asset in another language if the requested translation does not exist.
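The fall-back logic described above can be sketched in a few lines. This is only an illustration under assumptions: `LocalizedAsset` is a hypothetical class, and a real template would read the requested language from properties on the asset or its folder rather than take it as a parameter.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical asset holding one value per element per language. */
class LocalizedAsset {
    // element name (for example "title") -> (language code -> value)
    private final Map<String, Map<String, String>> elements = new HashMap<>();

    /** Stores the value of an element for the language of the given locale. */
    void put(String element, Locale locale, String value) {
        elements.computeIfAbsent(element, k -> new HashMap<>())
                .put(locale.getLanguage(), value);
    }

    /** Returns the value in the requested language, else in the fallback language, else null. */
    String get(String element, Locale requested, Locale fallback) {
        Map<String, String> values = elements.get(element);
        if (values == null) {
            return null;
        }
        String value = values.get(requested.getLanguage());
        return value != null ? value : values.get(fallback.getLanguage());
    }
}
```

Because all translations live on one asset, a missing French "title" can fall back to English without a second lookup elsewhere in the repository.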
The OpenCms localization strategy involves overloading elements with different language versions and then using siblings to place the same content asset in multiple folders.
Up until version 6.0, content authors and editors needed to work in the Workplace. Version 6 introduced an in-context editing feature called "Direct Edit." Direct Edit allows a user to navigate through the rendered site (as a regular user would) and click on icons to edit regions of the page in a fashion similar to Hummingbird's (now OpenText's) RedDot product (OpenCms uses bull's eyes, not red dots). The icons are visible both in detail and list views of content and pull up the full asset editing form. Direct Edit can only be activated on "offline" projects. Direct Edit has had a huge impact on the perceived usability of OpenCms and made it competitive again with other products like Magnolia. Like Magnolia, however, in order to use the browse to edit interface, the user must log in through the Workplace and then launch a page as a starting point.
The Direct Edit interface provides browse to edit functionality and allows casual users to spend less time in the Workplace.
Depending on the browser, OpenCms has a number of WYSIWYG editors it can present to the user. All of the editors have the option to browse the server to create an intra-site link. Linking pages in this way, as opposed to by URL, allows OpenCms to register these links with the relationship engine so that they can be managed. There is no embedded spell check but there is a convenient button to strip the extraneous formatting from text copied from Microsoft Word. Presumably many OpenCms authors do most of their writing in Word and then copy in their text when it is ready for the web. This is not unlike other products, both commercial and open source.
Image handling is much better with version 7. In addition to WebDAV support that allows a user to drag images into the VFS using Windows Explorer or another WebDAV client, there is also automated image scaling and manipulation functionality built into the product.
From the WYSIWYG editor image button, a user is able to select an image from the repository and set sizing parameters that will control the automated image scaling functionality.
OpenCms has a feature to import a zip file containing a static HTML site. The OpenCms import tool is somewhat better than its peers because it allows you to use regular expressions to extract the body of the page (stripping out all the layout and branding embedded in the static HTML page) and apply a presentation template. The importer also parses through the HTML and corrects links. Content imported in this way can be managed as actual content, not unstructured HTML files.
In the version 6 series, the workflow feature of OpenCms was just a generic task list that was totally separate from the explorer view. Workflow items were not associated with content or publishing events and were visible only on a different view of the Workplace (the Workflow view). Content approval uses the "project" system - not workflow. Version 7 was supposed to have a major upgrade to the workflow capabilities by integrating a legitimate workflow engine but the sponsor for that effort backed out. As a result, version 7 was released without a workflow capability. However, the consultancy BearingPoint has published an early version of an add-on module (Workflow2) that provides workflow functionality.
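To make the regular-expression step of the static HTML import more concrete, here is a minimal sketch of extracting a page body between two marker patterns. The class name `BodyExtractor` and the start and end markers are assumptions for illustration; the report does not say how the OpenCms importer is configured, so this shows the general technique only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper; not part of OpenCms. */
class BodyExtractor {
    /**
     * Returns the text between the first match of startRegex and the next match
     * of endRegex, trimmed, or null when the markers are not found.
     */
    static String extract(String html, String startRegex, String endRegex) {
        // DOTALL lets '.' span line breaks; the reluctant '.*?' stops at the first end marker.
        Pattern p = Pattern.compile(startRegex + "(?<body>.*?)" + endRegex,
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        return m.find() ? m.group("body").trim() : null;
    }
}
```

Layout and branding outside the markers are discarded, which is what allows the imported body to be re-rendered through a presentation template. Regular expressions are brittle against inconsistent markup, so a site with irregular pages may need several marker pairs.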
Alkacon's OCEE LDAP Connector provides LDAP support.
Being a mature product, OpenCms is a stable platform with a rich set of administrative features and utilities such as a Content Tools section that allows an administrator to make global changes to content in the repository. There is a diagnostic to check the validity of content in pages and there are tools to rename elements of existing content for when the content model changes.
OpenCms provides a number of tools to make global changes on content in the repository.
Presentation
Presentation templates for OpenCms are written in standard JSP using the JSTL plus a custom tag library provided by OpenCms under the "cms" name space. There are a couple of limitations such as lack of support for the newer XML style JSP syntax and certain styles of includes. However, template developers with a familiarity with Java and JSP will feel right at home.
As with all tag libraries, there is a constant tension between simplicity of the API and simplicity of the code. Providing too many functions makes the library hard to manage; too few means that developers have fewer helper functions and need to write more verbose template code. OpenCms has kept its tag library small and developers frequently resort to using inline Java print statements rather than sticking to the tags. Version 7 has improved matters somewhat by upgrading to the 2.4 servlet engine and JSP 2.0 and exposing more objects to the JSP expression language.
JSPs are stored in the VFS and can reside in the same folder structure as the content or be packaged as modules in the system directory. Like content, templates are versioned and deployed from offline projects to the online project. Managing display templates in content folders is a little messy. The only advantage is that if you name a template "index.html" it will automatically be used as the index page of the directory.
Since version 6, OpenCms ships with a module that provides a presentation framework called TemplateOne, which is used in the demo site. TemplateOne is flexible and may be a useful starting point for building new sites, but developers can get away with writing simple but less elegant JSPs. If nothing more, TemplateOne is a good introduction to advanced concepts and clever ways to use the OpenCms presentation tier.
For performance reasons, JSP templates are written out to the file system on first request and then served directly by the servlet container. During this process, OpenCms updates paths, renames the files, and stores them in a directory mapped to the project that the content is being rendered out of (for example, online, or offline).
Content is associated with display templates through metadata properties of the content assets or the enclosing folder. This is an awkward system because the property value field is free text so the user needs to correctly name the path of the template. The modal user interface does not help matters. The TemplateOne framework makes it a little easier by providing a pick-list for the template attribute.
Visitor-facing functionality can be developed by using the Module API. Alkacon has an LGPL licensed set of modules called OAMP (OpenCms Add-on Module Package) available for download. OAMP includes modules for syndicating content over RSS, managing and delivering email newsletters, and a basic web form module. There are also a couple of community contributed modules on the site. Probably the most interesting is the KonaKart module that integrates the popular free Java shopping cart KonaKart with the OpenCms platform. This integration uses OpenCms JSP templates to call the KonaKart SOAP API and display products.
For high traffic sites, there are two main options to increase performance. First, OpenCms ships with a feature called FlexCache that is based on the open source caching framework ehcache. FlexCache lets you configure cache parameters on each asset, such as whether to cache a different copy for each user (good for personalized sites), whether to cache different copies if the query string parameters are different, and a cache timeout.
Cache settings are made in metadata properties of each asset, which can get tedious to manage but gives a lot of control. The Administration section of the Workplace provides an interface to invalidate and manage the cache. For higher traffic sites, there is a Static Export feature that stores and serves generated pages on the file system rather than dynamically generating them every time. This setup requires some configuration with Apache mod_rewrite and mod_proxy and is not quite as high performance as a true "baking" style presentation architecture because it cannot publish to a farm of simple web servers.
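The per-asset cache parameters described for FlexCache (a copy per user, a copy per query string, and a timeout) amount to rules for building a cache key and expiring entries. The sketch below illustrates that idea only. `CachePolicy` and its field names are hypothetical and do not reflect actual FlexCache property syntax.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-asset cache settings; names are illustrative, not FlexCache directives. */
class CachePolicy {
    final boolean varyByUser;
    final boolean varyByQuery;
    final long timeoutMillis;

    CachePolicy(boolean varyByUser, boolean varyByQuery, long timeoutMillis) {
        this.varyByUser = varyByUser;
        this.varyByQuery = varyByQuery;
        this.timeoutMillis = timeoutMillis;
    }

    /** Builds the key under which a rendered copy of the asset is stored. */
    String keyFor(String path, String user, Map<String, String> query) {
        StringBuilder key = new StringBuilder(path);
        if (varyByUser) {
            key.append("|user=").append(user);
        }
        if (varyByQuery) {
            // Sorting the parameters makes the key independent of their order in the URL.
            key.append("|query=").append(new TreeMap<>(query));
        }
        return key.toString();
    }

    /** True when an entry stored at 'storedAt' is too old to serve at 'now' (both in milliseconds). */
    boolean isExpired(long storedAt, long now) {
        return now - storedAt >= timeoutMillis;
    }
}
```

Varying by user multiplies the number of cached copies, which is why such a setting suits personalized pages but is wasteful for content that looks the same to everyone.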
deployed as modules. This is convenient if you are working with OpenCms without internet connectivity, although needing to install a module to read the documentation is a bit of a hassle. Documentation coverage took a while to catch up after the release of 6.0 and is still somewhat spotty. Version 7 put documentation another step back. The OpenCms team has established more user focus as a primary goal for the next few milestones of the project. They have already made significant progress in versions 6 and 7. The book Managing And Customizing OpenCms 6 Web sites: Java/jsp Xml Content Management, by Matthew Butcher, is probably the best resource for a business user; more technical users, trying to do technical tasks, will find it somewhat light. A book on version 7 is upcoming. Alkacon has roughly 10 full time employees (all software engineers) and uses freelancers for graphical and user interface design work. Development on OpenCms has largely been driven by the needs of the Alkacon customer base. Usually features can be tied back to a sponsoring customer's requirements. Since OpenCms was first publicly released in 2000, there has been a major release of the platform roughly every two years. The stable release of version 7 came out in July 2007. While external systems integrators could potentially develop a feature and contribute it to the core, few actually do. This is probably because Alkacon owns the source code and would obtain ownership of the contribution. The one exception is a third party systems integrator that donates database adaptors for different databases. There are several add-on modules available for OpenCms, but they are not organized in one location like other open source projects. The OpenCms.org web site has a module sandbox, but there is not much there. 
You would have more success doing an internet search for "OpenCms" and the capability that you need, which would probably turn up results from SourceForge and third party developer sites that sell commercial modules.
Conclusion
Table 3.9. OpenCms 7.0.3 Summary
Category / Score / Explanation
Contributor Navigation: The main "Workplace" packs a lot of functionality and looks complicated to non-technical users. Pages are listed by filename which can be less than descriptive. However, the Direct Edit interface has introduced significant improvement.
Content Entry: OpenCms supports structured and unstructured content assets well. A spell check feature would be useful but Microsoft Word compatibility functionality is a good substitute that authors may prefer. Introduction of WebDAV support is a big win.
The new relationship engine tracks dependencies and helps prevent broken links.
Full versioning support with new version created with every save.
Siblings, while difficult for non-technical users to understand, enable content to be used in multiple locations of the site.
The sibling framework supports related translations of assets but it makes the user interface complex. Most companies use a naming convention to distinguish between different translations of an asset.
Content Integration: Straightforward JSP templates make it easy for any Java developer to interact with any data source. WebDAV support has made the repository accessible to other technologies. However, a SOAP or REST based API would be helpful.
While workflow has been stripped out of version 7, the "project" editing model supports basic approvals.
The JSP based delivery templates are easy for a Java developer to work with and the paragraph model can be easily translated into page components for building flexible pages.
Interactivity: The standard Java orientation makes building interactive applications straightforward for Java developers and the module framework facilitates the deployment of the applications. However, the delivery tier (with all of its caching) is optimized for information display.
SEO: XML orientation tends to output clean XHTML. User friendly, human readable URLs are supported.
Books: Managing And Customizing OpenCms 6 Web sites: Java/jsp Xml Content Management (Paperback), by Matthew Butcher. A new edition that covers version 7 is in the works.
Much of the documentation is deployed as modules that can be installed on an instance of OpenCms.
The mailing list is very active and helpful.
Score scale: Below Average; Average; Above Average; Exceptional.
OpenCms is a mature and stable platform with many of the features that one would see in commercial products. OpenCms is particularly strong in the basics (check-in, check-out, versioning, and organizing content), but is not designed for building interactive, Web 2.0 style applications. Usability, once a real weakness for OpenCms, is being steadily refined after a big improvement with the Direct Edit user interface. Reasonable support prices make OpenCms one of the least expensive platforms to operate. OpenCms has a strong user and developer community anchored by Alkacon Software. Most of the active community is in Europe, where Alkacon is headquartered and where several agency style consultancies have built practices on delivering OpenCms powered web sites; there are fewer OpenCms integrators in the United States (28 North/South American solution providers listed on the OpenCms.org web site). Since the bar is low when it comes to being listed, be sure to qualify a prospective OpenCms solution provider.
using open source to freely share code across corporate boundaries and change developers' mindsets, companies are finally experiencing the levels of re-use that object orientation promised. Companies like these are typically very demanding of the architecture and want full control over every aspect of the system. They have strong technical capabilities and are constantly on the brink of abandoning the third party framework in favor of a custom solution. Companies that hold back from this temptation and strike the right balance between customization and compromise are usually the most successful. These companies are able to assemble, rather than build, custom applications to get the technology they want with less risk and cost than traditional custom software.
Content Contribution
Typically the platform is only used for its repository and editorial interfaces and not for its delivery functionality. In order to support the high degrees of dynamism and content reuse that are typical for this category of web application, the content managed needs to be highly structured and to have high quality metadata. Of course, there is a trade-off. More structure generally means more user interface complexity and less similarity to the grand-daddy of unstructured authoring tools, Microsoft Word. A good rule of thumb is to impose as much structure as your users can stand. If you exceed the amount of complexity that users are willing to tolerate, they will undermine your best attempts at enforcing high quality data standards. For example, if you insist on turning what a user perceives as a large text area (such as an article body) into a collection of repeating paragraph elements, the average content contributor will probably paste the whole article out of Word into the first paragraph element. All your ideas around pagination and inserting promotional items between paragraphs will be undermined.
Balancing usability and structure is a negotiation between the competing interests of the content contributors who want something like Microsoft Word, and the content consumers who want a dynamic presentation tier that needs structured content to deliver the right information in the right format at the right time. The CMS Framework can help by being generally easy to use (if the contributor is already swearing by the time he opens the content asset for editing, you have lost the battle), by being flexible (so you can quickly make adjustments by adding or combining fields), and by having a good input validation framework (to help you be firm when you have to).
XML based technologies have the advantage of having inherently flexible content models, although similar behavior can be achieved through highly abstracted and normalized relational database schemas.
All of the technologies reviewed in this section allow a system administrator to change the structure of a content type through the user interface or by editing configuration files without having to worry about restructuring the database or updating the content entry forms.
There are a number of techniques that a CMS can use to enforce input validation, and the better platforms work on multiple levels. The first level is data typing support. If the delivery tier needs a date field for its display logic (to do things like list articles in reverse chronological order or search for event assets within a date range), the CMS should be able to designate a date attribute to be of datatype "date." The same goes for numeric fields like "price." If the CMS fully understands the content model, it can automatically perform some input validation.
Custom validation can be done at the client and server side - preferably both. Client side validation is useful because it saves a trip to the server and gives the user more immediate feedback. Server side validation is more reliable and can be more sophisticated (for example, checking to see if a zip code is valid). Of course, AJAX is blurring the line between server and client side logic because it allows the client to call server methods without submitting the whole page.
Input form controls such as radio buttons, check boxes, pick lists, and drop-down lists are helpful because they prevent the user from even trying to enter invalid values. The CMS should have a rich library of form controls that can be configured with the appropriate validation logic. Advanced functionality like dependent select lists (where, for example, the values in the "state" field change based on the country that has been selected) is also useful.
These systems frequently need to handle large volumes of content and a business user needs to be able to find the appropriate assets to edit. The abilities to organize content within the repository and search are critical.
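The first level of validation, data typing, can be illustrated with a short server side sketch. This is a minimal example under assumptions: `ContentTypeValidator`, its `FieldType` enum, and the field names are hypothetical and are not taken from any of the products reviewed. Dates are assumed to arrive in ISO-8601 form.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical typed content model used to validate submitted form values. */
class ContentTypeValidator {
    enum FieldType { TEXT, DATE, NUMBER }

    private final Map<String, FieldType> fields = new LinkedHashMap<>();

    /** Declares a required field of the given type; returns this for chaining. */
    ContentTypeValidator field(String name, FieldType type) {
        fields.put(name, type);
        return this;
    }

    /** Returns one message per missing or invalid field; an empty list means the input is valid. */
    List<String> validate(Map<String, String> input) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, FieldType> f : fields.entrySet()) {
            String raw = input.get(f.getKey());
            if (raw == null || raw.trim().isEmpty()) {
                errors.add(f.getKey() + ": required");
                continue;
            }
            try {
                switch (f.getValue()) {
                    case DATE:
                        LocalDate.parse(raw.trim()); // ISO-8601, for example 2008-05-01
                        break;
                    case NUMBER:
                        new BigDecimal(raw.trim());
                        break;
                    default:
                        break; // TEXT needs no further checking here
                }
            } catch (DateTimeParseException | NumberFormatException e) {
                errors.add(f.getKey() + ": expected " + f.getValue());
            }
        }
        return errors;
    }
}
```

Because the content model declares the types, the check needs no per-form code; the same declarations could also drive client side checks so that both tiers agree.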
Presentation
Many products in this category publish content into a separate presentation tier that is potentially not even on the Java stack. In order to be useful in this way, a platform needs to have functionality to deploy structured content into another application's repository. The most common way of doing this is through XML. An adaptor is built onto the presentation tier application to read in XML and store it in its local repository. This is also a good time to execute logic to clear caches and update search indexes. Getting structured content into an XML format should be easy work for any WCM: deployment is usually the more challenging requirement. Getting the files (XML and associated binaries: images, Flash, PDF, audio and video) onto the server is not so much the problem as knowing which files to push. Ideally, you do not want to re-publish content that hasn't changed. Breaking dependencies by failing to publish linked pages and images is equally problematic.
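The "which files to push" problem can be sketched as a walk over a dependency graph: start from the changed assets, follow their dependencies, and skip anything the delivery tier already holds unchanged. `PublishPlanner` and its inputs are hypothetical; a real platform would derive the three collections from its versioning and dependency data.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative publish planning; not taken from any product reviewed here. */
class PublishPlanner {
    /**
     * @param changed     assets modified since the last publish
     * @param dependsOn   asset -> assets it links to or embeds (pages, images)
     * @param alreadyLive assets the delivery tier already holds
     * @return the assets to push: every changed asset plus any dependency that is
     *         itself changed or missing from the delivery tier
     */
    static Set<String> plan(Set<String> changed,
                            Map<String, Set<String>> dependsOn,
                            Set<String> alreadyLive) {
        Set<String> toPush = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(changed);
        while (!queue.isEmpty()) {
            String asset = queue.poll();
            if (!toPush.add(asset)) {
                continue; // already planned; also guards against dependency cycles
            }
            for (String dep : dependsOn.getOrDefault(asset, Collections.<String>emptySet())) {
                if (changed.contains(dep) || !alreadyLive.contains(dep)) {
                    queue.add(dep);
                }
            }
        }
        return toPush;
    }
}
```

An unchanged image that is already live is left alone, while a new image referenced by a changed article is pushed with it, which addresses both failure modes named above.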
Publishing directly to an external database is another alternative, and several commercial products have database adaptors (Interwoven has DataDeploy, Percussion Rhythmyx has Database Publisher). In the absence of these adaptors, typical approaches include using a workflow event or template code to call functions that write to an external database. However, there needs to be some way of notifying the de-coupled presentation tier that data has changed so that it can clear its display caches.
In the structured publishing pattern, the CMS publishes structured data into a de-coupled delivery tier.
The two use cases to consider when designing these de-coupled architectures are preview and linking. To achieve an accurate preview, the CMS will have to push the new version of the asset to a content staging instance of the delivery tier. How much content and when depends on the requirements. Single page preview is relatively easy to achieve, whereas full site preview (where a user can browse around the site to see where the asset appears) is more challenging. A common short cut is to build "low fidelity" preview templates in the presentation tier that come with the CMS. The risk of this approach is that the CMS preview templates may fall out of sync with the production templates as the site is updated and re-branded.
As for linking, the issue is that, since the presentation tier owns the URLs, the WYSIWYG editor will have a difficult time constructing link tags (<a>) because the URL of the target is unknown at the time of editing. There needs to be some process that transforms a link target to a URL that will be recognized by the presentation tier for re-pointing. Technologies that have dependency management functionality that scans rich text areas for intra-site links have an advantage because they have the hooks to invoke link re-writing logic.
If the presentation tier of the CMS Framework is used to render the web site, it should be flexible, easy to manage, and leverage widely known and/or easy to learn and manage technologies. Look for standards based (like JSP and XSLT) and widely used templating languages (like Velocity and Freemarker). Even more important, however, is the controller. To support a transactional web application, a CMS Framework should leverage a capable MVC based framework (like Struts, Struts2, Spring MVC, Tapestry, or Wicket) or be able to work in conjunction with one.
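The link re-writing step can be sketched as a pass over the rich text at publish time. The sketch assumes a convention invented for this example, in which the editor stores intra-site links as `href="cms://<asset-id>"`; the `LinkRewriter` class and the lookup map are likewise hypothetical stand-ins for whatever the presentation tier uses to assign URLs.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative publish-time link rewriting; the cms:// scheme is an assumed convention. */
class LinkRewriter {
    private static final Pattern INTERNAL = Pattern.compile("href=\"cms://([^\"]+)\"");

    /** Replaces each internal link target with the URL the presentation tier assigned to it. */
    static String rewrite(String html, Map<String, String> urlByAssetId) {
        Matcher m = INTERNAL.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = urlByAssetId.get(m.group(1));
            // Unknown targets are left untouched so they can be reported rather than silently broken.
            String replacement = (url == null) ? m.group() : "href=\"" + url + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Storing an asset identifier rather than a URL keeps the authored content valid when the presentation tier changes its URL scheme; only the lookup map has to change.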
LifeRay [http://www.liferay.com/web/guest/home] portal. Dotmarketing has deployed dotCMS for a number of customers. Interestingly, dotCMS also includes CRM and eCommerce.
InfoGlue (www.infoglue.org). InfoGlue has a nice looking web site but very little is happening with the project. However, there are some systems integrators in Asia doing InfoGlue deployments for non-profits and NGOs like the United Nations Viet Nam site.
mmBase (www.mmbase.org). mmBase was originally developed in 1995 by the Dutch Public Broadcasting organization VPRO (www.vpro.nl); the project was open sourced in 2000. VPRO did all the right things to start an open source project: they created a nonprofit foundation to own the code and worked to build a community. There was a time when some large multi-national companies like IBM were building sites on mmBase, but the project has been on the decline. Most of the activity is from a few freelance developers and small systems integrators building various web sites. The project is still actively being developed and release 1.9 is targeted for May 2008. Content Here will continue to watch this one.
Project Summary
Table 3.12. Alfresco Enterprise Project Overview
Web site: http://www.alfresco.com
Project Inception: 2005. WCM launched in 2007.
Current Version: 2.2 since February 2008.
Project Type: Commercial: tiered product model.
Licensing Options: Alfresco Community is licensed under the GPL with a FLOSS exception. The Enterprise Bundle has a commercial license.
Geography: Alfresco Software Inc. is headquartered in the UK with some staff distributed across North America. The user community is global with concentrations across Europe and North America.
Repository services for custom web applications.
Electronic Arts runs their EASports site [http://www.easportsbig.com/] on Alfresco WCM. The Harvard Business School Publishing [http://www.hbsp.harvard.edu] site runs on Alfresco WCM.
Frameworks and Components: Apache MyFaces, ehcache, FreeMarker, Hibernate, jBPM, Lucene, OpenOffice, Rhino, Spring, Velocity
Integration Standards: JSR 168, JSR 170, WebDAV, Common Internet File System (CIFS)
Java Support: 1.4 and 1.5
Application Servers: Tomcat, JBoss, Websphere
Databases: MySQL, Oracle, MS SQL Server
History
Alfresco is a generously funded software company with a commercial enterprise software pedigree. The company was founded by John Newton (co-founder of Documentum) and John Powell (former CEO of Business Objects) and they have rounded out their team with senior people from Novell and Interwoven. The fact that the early team came from Documentum is clearly visible in the product with its early focus on document management, repository services, and access control. Since those early days, the Alfresco team has worked hard to layer in web content management functionality and support for structured content. Development of Alfresco started in January 2005 and the team has made tremendous progress in both building the software and raising the visibility of the company.
Alfresco describes itself as the first and leading open source ECM product - a claim that frustrates companies like Nuxeo whose ECM products pre-date Alfresco. While Nuxeo was there first, few can argue with the fact that Alfresco has put open source on the map as a viable alternative to commercial ECM products. Commercial vendors and open source projects alike have adjusted to Alfresco's market disruption. Nuxeo ported its ECM product from Zope to a more familiar Java platform. Commercial vendors are reconsidering their pricing and value propositions.
Alfresco's drive into the WCM space got serious when they recruited Kevin Cochrane and other thought leaders from Interwoven and, in the process, accepted that the Documentum view of WCM was just not good enough. WCM was officially launched with release 2.0 of the Alfresco Community Edition in July 2007; first customers launched in August. As of January 2008, there are roughly 43 paying Enterprise Edition customers using Alfresco WCM (as opposed to 300 customers using the basic ECM platform) and roughly half of these sites are live.
Architecture
Alfresco gets the attention of software architects and Java developers for its standards support and its use of popular open source components and frameworks. The first thing you notice when you download Alfresco is that it is a lot of software. The lib folder is packed with 108 JARs totalling nearly 40 megabytes; that is a lot even by Java standards. What that provides is some of the most modern and elegant open source components and frameworks around. In some ways, you can think of Alfresco as one big supported bundle of best-of-breed open source software projects. Reusing these components is what has enabled Alfresco to develop their product so quickly and stay current with the latest technology and standards.
[Figure: Alfresco has a very open service-based architecture that supports a number of standards. Source: Alfresco documentation site.]
Alfresco's standards support and openness make it very effective for integration with other systems and use in service oriented architectures. When Alfresco first hit the market, it was positioned as a framework for building any kind of content centric application and the web client was merely an example of what you can do with the platform. Today, many architects still look at Alfresco as an ideal building block for larger architectures. Java, PHP, and Web Services APIs expose most of Alfresco's functionality. The repository is accessible over WebDAV, Common Internet File System (CIFS), and FTP. CIFS support, which allows a Windows user to map a letter drive to the repository as if it were a Windows file server, is one of the Alfresco team's biggest achievements. Long time Unix users will remember what an impact Samba [http://www.samba.org] had by allowing Windows and Unix to share files over Microsoft's proprietary standard. Alfresco has the only Java implementation of a CIFS server. One could say that CIFS is the user interface that engenders the most pride from the Alfresco team and the most adoration from business users (see commentary on the web client later).
JSR 168 (the Java Portlet Standard; see Glossary for JSR168) and JSR 170 (the Java Content Repository Standard, level 2; see Glossary for JCR) are supported. Business Process Execution Language (BPEL; see Glossary for BPEL) support is provided by Alfresco's inclusion of JBoss's jBPM workflow engine.
The key to the Alfresco architecture is the repository, whose node based hierarchy is similar to the Java Content Repository. Indeed, the Alfresco JCR interface complies with level two of the JCR specification. There are a couple of places where it is difficult to use the JCR calls and you need to resort to the native Alfresco repository API, such as for the observation feature (where you can monitor a set of assets and then be notified if there is a change). This is more a function of the JCR's newness than Alfresco's recalcitrance, but internally the Alfresco team is critical of the JCR. Hopefully, they will use their position on the JSR 283 team, and their ideas, to improve the JCR specification.
Early in 2007, Alfresco created publicity around their JCR benchmark tests and claimed to be the fastest open source JCR implementation (faster than the other one, Apache Jackrabbit). They had a platinum partner certify the results. However, the Jackrabbit configuration was using the default file system persistence rather than the much faster relational database persistence that most non-demo implementations of Jackrabbit are configured with. Still, the definition of a benchmark was a great contribution to the community.
The Alfresco Repository is composed of three core services: the Node Service, the Content Service, and the Search Service. Together, these three are called the "Foundation Services."
The Node Service manages the metadata of content objects or "Nodes." Alfresco's definition of Node maps directly to the JCR definition. Every content asset is a node placed in a hierarchical tree. Node metadata is stored in a relational database (MySQL by default, although most database platforms are supported thanks to a Hibernate object-relational mapping layer). The Node Service is used for organizing and browsing content. Every content object in Alfresco is stored in a file: XML for structured content, HTML or native binary formats for everything else. These files are managed by the Content Service, which takes care of things like retrieving the proper version of the asset and encapsulates the mechanics of persistence. Currently, metadata is not versioned within Alfresco, only the actual files. The Search Service uses Lucene search indexes that are also stored on the file system. It powers both the on-board search functionality and the listing operations in display templates.
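The division of labor among the three Foundation Services can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `ToyRepository` and all of its methods are hypothetical names invented for illustration and are not Alfresco's API, the "search" is a naive scan rather than a Lucene index, and storage is in memory rather than in a database and file system. It does mirror one detail from the text: file content is versioned while metadata is simply overwritten.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of the Node, Content, and Search services.
// None of these names come from Alfresco's real API.
class ToyRepository {
    // "Node Service" role: path -> current metadata (not versioned).
    private final Map<String, Map<String, String>> metadata = new TreeMap<>();
    // "Content Service" role: path -> every stored version of the file.
    private final Map<String, List<String>> versions = new HashMap<>();

    // Stores a new file version and replaces the node's metadata.
    void write(String path, String content, Map<String, String> props) {
        metadata.put(path, new HashMap<>(props));
        versions.computeIfAbsent(path, p -> new ArrayList<>()).add(content);
    }

    // Returns a specific file version; version numbers start at 1.
    String read(String path, int version) {
        return versions.get(path).get(version - 1);
    }

    // Returns the most recent file version.
    String readLatest(String path) {
        List<String> all = versions.get(path);
        return all.get(all.size() - 1);
    }

    // Returns the current (only) metadata for a node.
    Map<String, String> properties(String path) {
        return metadata.get(path);
    }

    // "Search Service" role: naive case-insensitive scan of latest content.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        String needle = term.toLowerCase();
        for (String path : metadata.keySet()) {
            if (readLatest(path).toLowerCase().contains(needle)) {
                hits.add(path);
            }
        }
        return hits;
    }
}
```

Writing the same path twice leaves two retrievable file versions but only the latest metadata, which is the asymmetry the text describes.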
[Figure: Alfresco's repository architecture is based on three core services: Node, Content, and Search. Source: Alfresco documentation site.]
Additional services may be added to the Alfresco repository by registering them with the Registry Service. All the other repository functionality is built on top of these three services. This includes content transformation and image manipulation, metadata extraction, templating, classification, versioning, locking, workflow, and permissions.
Alfresco has a modular architecture to allow for plugins called AMPs (Alfresco Module Packages). Modules are encapsulated and kept separate from the core execution logic to ensure system stability and clean upgrades. A command line management tool called MMT (Module Management Tool) installs, removes, enables, and disables modules on the system.
The introduction of WCM to the Alfresco architecture forced the company to make some major enhancements to the repository. A key change was the introduction of the Alfresco Versioning Model (AVM). The AVM supports functionality like file-level branching, snapshots, and directory level versioning. There is also the construct of "transparencies" that allow one collection of assets to be "overlaid" over another collection to create a view that is the union of the two collections. Where both collections have the same file, the overlaid version is shown; when the overlaid collection has deleted a file, the file is removed from the view. It is this architecture that enables the sandboxes and snapshots that are explained in the content contribution segment of this evaluation.
The new AVM supports a distributed repository model where multiple repositories can run virtually on a single instance of Alfresco or on multiple Alfresco instances. Content can be replicated between repositories and the process is identical for repositories running on the same instance or for repositories distributed across the network.
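The "transparencies" overlay rule described above (union of two collections, overlaid version wins, deletions in the overlay hide the file) can be sketched in a few lines. This is a minimal illustration, not the AVM implementation: `OverlayView`, `resolve`, and the use of plain maps from path to content are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of an overlaid view; not AVM code.
class OverlayView {
    // base:    path -> content in the underlying collection
    // overlay: path -> content added or changed in the overlaid collection
    // deleted: paths the overlaid collection has deleted
    static Map<String, String> resolve(Map<String, String> base,
                                       Map<String, String> overlay,
                                       Set<String> deleted) {
        Map<String, String> view = new TreeMap<>(base); // start from the base
        view.putAll(overlay);                           // overlaid version wins
        view.keySet().removeAll(deleted);               // deletions hide files
        return view;
    }
}
```

Under this reading, a user sandbox behaves like an overlay on top of staging: untouched files show through from the base, while edits and deletions stay local to the overlay until they are submitted.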
Replication is based on snapshots that are automatically taken every time content is pushed to the content staging workspace. When replication is initialized, the source repository asks the target repository for a hash of its latest snapshot. The source server then sends over the files that have changed along with the hash of the replicated snapshot (to save the target server the work of computing the hash of its snapshot). This model reduces the amount of traffic over the network and the workload on the target server, which is expected to be busy servicing web traffic. A similar architecture is available for simple file system deployments. In this case, a lightweight daemon is installed on the target server rather than a full-blown Alfresco instance. While the repository-to-repository replication functionality was available in version 2.1, version 2.2 provides file system replication and a GUI to manage target servers.
Web projects can be accessed through CIFS (as opposed to the ECM standard repository that is accessible over CIFS, WebDAV, and FTP) under a mount point separate from that of the general ECM repository. Under the WCM mount point, the user will see two directories: data and versions. Under versions there will be directories for v0 through vn - one for each snapshot taken of the repository. This structure allows you to "time travel" to different read-only views of the web project's repository through Windows Explorer. Other mount points can also be defined based on filtered views of the repository. There are some pre-defined ones that restrict what a user can see based on role. Other mount points can be defined through XML configuration files. Despite the fact that web projects are accessible through CIFS, Alfresco does not generally recommend that business users access the web projects in this way. It is considered safer to have them work in standard ECM project spaces and use rules to push content into web projects.
The Foundation Services are exposed through a Java API to create yet more functionality. Developers can also leverage these services through two external interfaces: the Web Services API and the JCR Interface. One of the more exciting integration features is Alfresco's Web Scripts, which allow the creation of a custom REST based API by writing server side JavaScript code.
Web Scripts are a significant part of Alfresco's strategic move away from their SOAP based API to a simpler REST API. Using the REST API that comes out of the box and extending it with custom methods using Web Scripts is a very powerful way of extending Alfresco. It is particularly useful for supporting AJAX calls to provide some additional data driven, client side interactivity. Some systems integrators are using the native REST API extended with Web Scripts to build custom management and delivery applications on top of the repository, eliminating the use of Alfresco's clunky administration interface. As one systems integrator put it: "Alfresco is the ideal development platform for a customer that has mock-ups showing a clear idea of what they want the user interface to look like. Rather than fighting with the UI that comes with the CMS to transform it into their design, we can give them an API and let their web developers build what they want."
Alfresco is a great alternative to building a custom CMS from the ground up. They take care of all the content management specific functionality that most developers are not familiar with building, and leave the rest to a custom software development team. Many architects see the REST API and the introduction of Web Scripts as positioning Alfresco as a content service in a service based architecture, or the back end of any number of Web 2.0 style applications. Best practices are still emerging as to how much application code should be written in this layer. No doubt there will be a healthy debate similar to the one over programming in the database with stored procedures versus in the application tier; the more programming done in this tier, the greater the lock-in.
Architects looking for standards support may consider integrating through the JCR level two compliant API. There is also a rich set of functionality that is not covered in the JCR spec.
For example, Alfresco supports a construct called "aspects" to add attributes and functions across different asset types. Adding the "versionable" aspect to an asset makes that asset support versioning; a "searchable" aspect causes the asset to be indexed. This is different from object-oriented class inheritance because it is done at the object instance level - the class or type stays the same. Aspects are defined through XML files and manually applied through the user interface or by business rules triggered by Alfresco's event model. For example, a business user can set a rule to add the "categorizable" aspect to content when it is added to a specific folder. Although no compiling is needed for defining most aspects, you need to restart the application server for them to be recognized. Alfresco ships with several native aspects that can be added to assets in the repository through the user interface or set by default in the XML repository configuration file. Defining new aspects is a convenient way to add functionality to the system. A developer could add a "synchronization" aspect to push updates to an asset to another system, for example. The Alfresco repository also has an event model that can trigger the execution of code on events such as update, move, or a change in workflow state.
While the release of the AVM in version 2.1 introduced many new capabilities to the platform, the user interface of web projects took a step back and is just starting to catch up. For instance, although web content was indexed, and the API supports search, there was no search functionality in the web client until the 2.2 release (late January 2008). Also, content within a Web Project cannot be "made multi-lingual" like the rest of Alfresco content.
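To make the aspect construct described earlier in this section more concrete, the sketch below models aspects as labels attached to individual asset instances, with a folder rule that applies an aspect when content is added. All names here (`Asset`, `FolderRules`, `addRule`, `onContentAdded`) are hypothetical and chosen for illustration; in Alfresco itself, aspects are defined in XML and applied through the user interface or rules, not through code like this.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical asset: the type is fixed, aspects vary per instance.
class Asset {
    final String type;
    private final Set<String> aspects = new LinkedHashSet<>();

    Asset(String type) { this.type = type; }

    void addAspect(String aspect) { aspects.add(aspect); }

    boolean hasAspect(String aspect) { return aspects.contains(aspect); }
}

// Hypothetical rule registry: "when content is added to folder X,
// apply aspect Y", standing in for a rule fired by an event model.
class FolderRules {
    private final Map<String, String> aspectByFolder = new HashMap<>();

    void addRule(String folder, String aspect) {
        aspectByFolder.put(folder, aspect);
    }

    // Called when an asset lands in a folder; applies the matching aspect.
    void onContentAdded(String folder, Asset asset) {
        String aspect = aspectByFolder.get(folder);
        if (aspect != null) {
            asset.addAspect(aspect);
        }
    }
}
```

Two assets of the same type can end up with different aspects depending on where they were added, which is the instance-level behavior that distinguishes aspects from class inheritance.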
[Figure: Web Projects expose a small fraction of the repository functionality supported in the rest of the application.]
To manage structured content, Alfresco uses the open source Chiba XForms engine to automatically generate web forms. Defining a new content type is done through a wizard interface that involves uploading a schema definition (.xsd) file and associating workflows and display templates. Content types can be shared between web projects.
One of the more ambitious concepts that came over from the Interwoven engineers is "virtualization." Unlike TeamSite, which proxies over to a web server running the presentation tier, Alfresco provides a container for any "well behaved" (Alfresco's words) Java web application to run in. This allows both code and content to be tested in safe virtual instances running on one instance of Alfresco. Alfresco's virtualization architecture is strictly designed for previewing and staging content - not running a production web site. Still, Alfresco does some clever optimization to reduce the memory footprint of multiple virtual environments. So, for example, JARs used on the delivery application are only loaded once and shared across virtual instances of that application.
For complex de-coupled web applications (such as a site running ATG Commerce), version 2.2 supports remote test servers that are not virtualized within Alfresco. Alfresco can deploy content to these servers and then proxy requests for preview and content staging in essentially the same way that Interwoven TeamSite works. To make this happen, test server instances need to be set up beforehand and registered with Alfresco. Since there is a finite number of test servers, only a finite number of people can preview at the same time.
Although Alfresco does not have many high traffic WCM sites live today, they certainly have thought about how to do it. The recommended high availability, high load configuration is a three tiered architecture with a cluster of application servers running the delivery application. This delivery tier just needs Alfresco's deployment module to receive content from a cluster of Alfresco servers. For dynamic requests against the repository (for Web 2.0-style applications where end users submit content), the presentation tier can call back to the Alfresco cluster over the REST API. Behind the Alfresco repository would be a cluster of MySQL servers.
Content Contribution
As impressed as technologists are with the Alfresco architecture, users are often less enthusiastic about the user interface, which tries to split its attention between web content management and document management. It seems that the Alfresco team is a bit confused about the role of the "Web Client." Originally, it was positioned as a reference application to show what one could build on top of the Alfresco platform and it seemed to get less attention from the engineering team than the programming interfaces and modularity of the system. When Alfresco positions itself as a business application rather than a development framework, it is more likely to defend the web client. Still, when speaking to Alfresco staff, it is easy to tell from their relative enthusiasm for the UI and the architecture that they see the UI as a necessary evil. It is not surprising that many integrators do as little with the web client as they can. At least one of the early WCM implementations had contributors edit content in Dreamweaver and XML editors against a CIFS drive rather than use the Web Client, and then use Alfresco to deploy these files to the delivery environment. When going this far to work around a CMS, one should consider just using a source code control system to manage HTML files.
As marginal as the web client is for document management and collaboration, it is even less suited for web content management. Web sites are created in special folders called "web projects." With no tree based navigation or in-context editing, the web project user interface is way behind pure WCM products in terms of usability. Content assets are listed by their file name, so a user must guess from the file name what the content is about and then figure out which enigmatic icon will execute the desired action on the content. The UI has a "paging style" design where the contents of a folder are shown in pages of 10 assets at a time.
The sort columns are limited to very basic attributes: file name, size, modified date, created date, and modifier, so it can be hard to find an asset. The newly enabled search functionality should help here.
[Figure: The browse view of a sandbox allows a user to navigate through the folder structure of web assets.]
While the problem of browsing and finding content was not a primary design concern for Alfresco's initial WCM release, handling concurrency between multiple content contributors clearly was. Alfresco followed Interwoven TeamSite's approach of creating user "sandboxes" where users can edit and preview content without interfering with other user sandboxes or the production site. Changes made in a sandbox are only visible in that sandbox until checked back into the staging sandbox. Unedited content in a user's sandbox is automatically updated to reflect changes other users submit to staging. Depending on their permissions, users can either check an update directly in to the staging sandbox or initiate a workflow that will collect the necessary approvals.
[Figure: Alfresco's sandbox model provides contributors with their own work areas to edit and preview their changes prior to checking back in.]
The other Java open source WCM project to employ the sandbox model is OpenCms with its notion of "projects." In the PHP world, TYPO3 has introduced a similar work area concept. However, Alfresco's implementation is more sophisticated thanks to its "virtualization" technology that allows a user to browse through the site as it would appear after the modifications are checked in.
Like most Alfresco interfaces, the content editing experience is wizard based with control buttons (back, next, finish) on the upper right corner of the page. Although awkward, users get into a rhythm of working their way down a form and then scrolling to the top to continue on or finish the wizard. Input validation (based on Apache Commons Validator) is done at submit time and presents the user with a list of validation errors at the top of the form.
As mentioned earlier, the editing forms for structured content types are automatically generated by the Chiba XForms implementation. Although standards based, Alfresco's implementation is less powerful and flexible than Hippo's forms engine. For example, Hippo gives you more control over the layout of the form. Still, there is adequate support to model complex content types, including items with repeater and nested elements, and there is a basic set of form controls including a calendar date selector and a browser widget. Other controls can be added.
TinyMCE is shipped as the default WYSIWYG editor, but developers report success with other editors as well. The TinyMCE configuration comes with custom browse dialogs for adding links and image references, but a surprisingly limited set of formatting buttons is enabled. More can be added by editing a section of the web client configuration file that sends parameters over to the TinyMCE control.
Some buttons, like spell check, require the addition of plugins that can be easily installed (see the TinyMCE web site [http://wiki.moxiecode.com/index.php/TinyMCE:Control_reference] for a full list of configuration options). Formatting buttons can be turned on per content type and field. For example, the summary element of a content asset can get fewer buttons than a body element.
[Figure: Alfresco ships with a stingy number of formatting buttons enabled.]
The image and linking browse controls allow a user to browse and add new targets. The image control provides fields to set the dimensions, position, and alt text of the image.
[Figure: The image dialog has fields for sizing and positioning the image and for adding alt text.]
In order to use Alfresco's rendering engine, content rules are set to process the presentation templates when the content is saved, creating rendered versions of the source XML content. For example, if you have an article123.xml source file, and rendering templates for a detailed view and a summary view, you may get the files article123.html and articlesummary123.html. The best practice is to store rendered content in a different directory structure than the XML sources. This makes sense because rendered content should be stored as it will be navigated on the external site - not as it is managed. This also enables content re-use because content can be rendered to multiple places. The paths and file names that the content is rendered to are configurable via rules that can use variables and information about the content to determine where to put it. One could use a taxonomy to render content into various folders; alternatively, Alfresco can be configured not to render the content and instead save it as XML and have a dynamic delivery tier do the rendering when the assets are requested.
Alfresco also touts its "site import" technology that can import an entire static site in a zip archive. This is useful to quickly deploy a site on Alfresco and does enable library services (check-in, check-out, versioning, access control), but it doesn't provide much in the way of the high value content management benefits such as separation of layout and content, content reuse, and business user empowerment. At best, this approach may be considered a way to incrementally replace a static web site with a managed one.
With release 2.1, Alfresco introduced some basic link checking functionality. Users can click a "check links" button from within their workspace and link checking can also be added as an automated step in the default workflow that comes with the product. This is helpful because the complexity of the UI makes visually checking links and images a bit unreliable.
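The idea that rendition paths are computed by rules from variables and information about the content, as described in the rendering discussion above, can be sketched as simple pattern expansion. This is an illustration only: the `{name}` and `{section}` placeholders, the `RenditionPaths` class, and the example directory layout are all assumptions for this sketch and do not reflect Alfresco's actual rule syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: compute where each rendition of a source file goes.
class RenditionPaths {
    // Replaces each {key} placeholder in the pattern with its value.
    static String expand(String pattern, Map<String, String> vars) {
        String out = pattern;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            out = out.replace("{" + e.getKey() + "}", e.getValue());
        }
        return out;
    }

    // sourcePath: where the XML is managed, e.g. /content/articles/article123.xml
    // section:    a piece of content information, e.g. a taxonomy term
    // patterns:   template name -> output path pattern
    static Map<String, String> targets(String sourcePath, String section,
                                       Map<String, String> patterns) {
        String file = sourcePath.substring(sourcePath.lastIndexOf('/') + 1);
        String name = file.endsWith(".xml")
                ? file.substring(0, file.length() - 4)
                : file;
        Map<String, String> vars = Map.of("name", name, "section", section);
        Map<String, String> result = new LinkedHashMap<>();
        patterns.forEach((template, pattern) ->
                result.put(template, expand(pattern, vars)));
        return result;
    }
}
```

Keeping the output patterns separate from the source location is what lets the rendered tree follow the site's navigation while the XML stays organized for management, and lets one source file be rendered to several places.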
A user that is familiar with the "make multi-lingual" feature supported in the rest of Alfresco will be disappointed in the lack of localization support within web projects. Companies tend to use primitive work-arounds, like appending a locale code (en, es, fr) at the end of the file name to create localized web sites. The only other alternative is to manage different locale web sites in different web projects. While earlier releases of Alfresco had a simplistic folder based workflow model, Alfresco now uses JBoss's jBPM for workflow services. jBPM is a popular workflow component but it is primarily used for choreographing services across applications in service oriented architectures. Still, jBPM has the name recognition and tool support to justify the choice even if it is a little over the top. Workflow processes are defined in a powerful but proprietary XML language called jPDL (JBoss Process Definition Language). The JBoss jBPM Process Designer Eclipse plugin provides a graphical interface for designing workflows. While designing workflows is very point-and-click, it takes a little more effort to wire these workflows into Alfresco application logic. JBoss jBPM also supports the standards-based BPEL for cross system choreography. When a new content type is defined, through the "web form wizard," it is associated with a workflow. Depending on the workflow, there will be different configuration options that can be selected by the user that submits the content. Workflows can initiate business logic like checking links, and create manual tasks that are emailed to the user and show up on the user's dashboard.
[Figure: The workflow task form combines some task management plus a workflow state machine.]
Alfresco allows a content type to be defined with more than one workflow option. When more than one workflow is enabled for a content type, the user can select which workflow to use.
pushing code from development instances through QA and to production. It is also for pushing snapshots of production content back to staging and development environments for testing purposes. The deployment user interface can configure directories to include or exclude, or file name patterns to exclude. One thing that is missing in the deployment GUI is the ability to save deployment definitions for later reuse.
Access control within web projects is limited. Alfresco comes with some pre-packaged roles that should look familiar to a user of its document management functionality: Content Manager, Content Contributor, Content Reviewer, and Content Publisher. A Content Manager has full permissions on the workspace; a Content Contributor can edit and add but not publish; a Publisher can approve content but not edit; and a Reviewer can only read content. More roles can be created by editing configuration files. The real shortcoming of the access control model is that roles are applied at the web project level - not at the sandbox or folder level. This makes it difficult to do things like restrict access to edit a portion of a web project. One workaround would be to add custom roles, but a more practical approach is to use workflow to prevent users from publishing content that they shouldn't be editing. Unless they are approved, their edits will linger harmlessly within their own personal sandboxes. Another strategy would be to separate the web site into multiple web projects; however, this would hinder sharing content across site sections.
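The four pre-packaged roles and the project-level limitation described above can be summarized as a small permission matrix. This is a sketch under assumptions, not Alfresco's permission model: the `Permission` and `WebProjectRole` names are hypothetical, and granting READ to Contributors and Publishers is an assumption (the text only says what they can edit, add, or approve).

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical permission names for illustration.
enum Permission { READ, EDIT, ADD, APPROVE }

// Hypothetical encoding of the four pre-packaged roles.
enum WebProjectRole {
    CONTENT_MANAGER(EnumSet.allOf(Permission.class)),
    // Assumption: contributors and publishers can also read.
    CONTENT_CONTRIBUTOR(EnumSet.of(Permission.READ, Permission.EDIT, Permission.ADD)),
    CONTENT_PUBLISHER(EnumSet.of(Permission.READ, Permission.APPROVE)),
    CONTENT_REVIEWER(EnumSet.of(Permission.READ));

    private final Set<Permission> granted;

    WebProjectRole(Set<Permission> granted) { this.granted = granted; }

    // The path argument is deliberately ignored: roles apply to the whole
    // web project, so the answer is the same for every folder.
    boolean can(Permission permission, String path) {
        return granted.contains(permission);
    }
}
```

Because `can` never consults the path, a Contributor who may edit one section may edit every section, which is why the text suggests workflow approval or separate web projects as workarounds.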
[Figure: Managing permissions is done by inviting users and groups and assigning them roles.]
Alfresco's LDAP support is based on a replication model. Alfresco periodically gets updates from an external LDAP repository. This implementation is problematic if you want to edit group memberships through the Alfresco UI because they will just be overwritten by the next update. If you want to integrate with LDAP, it is best to do all the group assignment directly in the LDAP directory or customize Alfresco to only consult the LDAP directory for authentication.
There is currently no special back-up functionality other than to shut down the system and run standard MySQL and file system back-up. This does not pose a problem for most companies since the de-coupled delivery tier would remain operational. However, for global companies working on the same Alfresco instance, having a daily maintenance outage would not be acceptable. There is talk within the Alfresco team about implementing a live backup mechanism, perhaps using the replication functionality. No doubt some customers are experimenting with this approach right now.
Presentation
Alfresco gives you several options when it comes to presentation. Presentation templates written in FreeMarker or XSL are registered with a content type and are executed whenever an asset of that type is saved. Each content type can have multiple presentation templates that each make a "rendition." For example, one template could make a detailed view while another template could make a view to be used in a list of assets; this is good for static delivery. For dynamic delivery, Alfresco allows you to build a Java web application in the web project (WEB-INF directory and all) and Alfresco will serve as a container for that web application to run in. For preview, Alfresco advertises that it can "virtualize" any "well behaved" Java web application using any web application framework. Production environments are not "virtualized." They are real application servers running on production hardware. Using Alfresco's new deployment user interface, code and content deployment can be separated.
The pioneers that built the first Alfresco powered web sites went with a static HTML deploy model where rendered HTML files were deployed to a simple web server. This model is particularly appropriate for sites that have been statically imported into Alfresco. Other models include structured publishing of XML files or publishing a whole web application: content, code and all. The jury is still out as to whether to use Alfresco as a source control system for a delivery web application.
[Figure: The simplest delivery model is static deployment where static HTML files are pushed over to a simple web server.]
In terms of presentation tier functionality, Alfresco doesn't offer much out of the box - especially in the world of Web 2.0. Despite the company's obvious interest in Web 2.0 (John Newton is an excellent blogger and frequently speaks at conferences about Web 2.0), and marketing rhetoric, a quick browse through the Alfresco corporate web site reveals a strategy of relying on "best of breed" technologies rather than the ECM vision of one tool to consolidate different content management functionality. Alfresco uses phpBB for forums, GForge for community collaboration, MediaWiki for documentation, Baynote for search, and executives use WordPress for blogs. Interestingly, there is a recipe on the Wiki for creating a Facebook application that reads from an Alfresco repository. While the marketing brochure web site may be managed in Alfresco WCM, the only appearance of the Alfresco product on the alfresco.com domain is for downloading PDF documentation and for the partner extranet that contains PowerPoint presentations and other documents in Microsoft Office formats.
Most Alfresco WCM customers tend to use Alfresco to publish into custom presentation tiers. Integrations with technologies like Liferay Portal [http://www.liferay.com] and even OpenCms have been successful. Alfresco, with its robust repository and open architecture, fits nicely behind presentation tiers. Now, with the deployment options available in version 2.2 and Web Scripts, these architectures are even more promising.
and half that for each CPU on the delivery tier. This is roughly comparable to MS SharePoint. Customers not wishing to renew their support and maintenance contracts must downgrade to the Community Edition. Support is not included in this fee and runs an extra $12,000.
The documentation and support forums are hit or miss and Enterprise customers report that paid support is not much better. Munwar Shariff's book, Alfresco Enterprise Content Management Implementation, is a useful introduction to the user interface, the architecture, and its customization points. However, it does not cover advanced topics and was written before the WCM and the AVM were available. The best resource is the wiki, but the articles are not as thorough as more formal product documentation would be. There are a few articles on best practices, but not nearly enough. Alfresco delivers training at its offices near London and through partners elsewhere. Customers report that the training is useful.
At a recent user group, there was a general sentiment of frustration that Alfresco was being too aggressive adding new sophisticated features rather than refining and documenting the pre-existing basic features. The project moves fast and although the wiki changes daily, it does not keep up with the new ideas and initiatives that the Alfresco team is working on. For example, one customer built the equivalent of web scripts only to find that it was being added to the product. Perhaps a more significant example is the Web Client, which could really use some sustained re-design and re-factoring to be a useful business application. These rapid and unpredictable movements may be for competitive reasons, but to the outsider it looks like attention deficit disorder.
With some digging, you will find a lot of information on the wiki. However, there is a poor signal to noise ratio. The best bet for getting the most out of Alfresco is to go through a systems integrator who may have an inside line on the product.
Alfresco operates a network of SI partners. The network is tiered (Platinum, Gold, and so on) and based on company size and financial (and other) commitments made by the systems integrators to Alfresco - not by the amount of Alfresco work that the systems integrators do. The best way to evaluate SI partners is to look on the forums. The good SIs are the ones that are answering the questions and publishing modules on the Forge.
Copyright 2007 Content Here, Inc. All Rights Reserved. Not for Redistribution.
Conclusion
Table 3.13. Alfresco 2.2 Summary
Columns: Category, Score, Explanation
Contributor Navigation: While the hierarchically organized content repository can handle large volumes of content, the Web Client is not optimized for managing web content. It is so weak that customers prefer using the CIFS interface to navigate the repository as a simple file system.
Structured Content: Structured content types can be edited through forms generated by the Chiba XForms engine. There is less control over the form generation than in Hippo.
Configuration Management: Developers have the option of using an external source code control system or using the repository for version management. Virtualization is useful for spot testing code. New deployment functionality makes it easier to deploy code and content to the delivery tier.
Alfresco is very clear that its customization layer and its licensing prevent users from re-compiling any of the core code. Customization of Alfresco is done through writing presentation templates, developing modules (AMPs), and adding jars that override default behavior. The Spring IoC framework allows you to wire in code.
Alfresco does not come with its own delivery tier. Developers can use its Freemarker or XSLT engine to transform XML content when it is saved for static content delivery. Most customers build their own dynamic delivery tiers that read XML either deployed to a file system or from the Alfresco repository. The PHP and REST APIs are also useful for building dynamic delivery tiers. Alfresco also allows you to virtualize and deploy your presentation tier code.
Widely Used Technologies: Alfresco is a Java programmer's dream. It uses all the technologies that a developer either knows or wants to learn.
Project Transparency: Sometimes communication is open and candid, other times it is more marketing hype. To get the straight dope, work with a good systems integrator, read the wiki, and get to know other customers.
Books: Alfresco Enterprise Content Management by Munwar Shariff. Useful as an introduction to the platform but does not cover WCM or many advanced configuration and extension topics.
Online Doc: The formal documentation has adequate coverage of some of the primary topics. The wiki has a lot of information but it is not particularly well organized. Better search would tie everything together.
User Forums: Online forum is monitored by Alfresco staff. Still, responsiveness is just average.
Key: Nonexistent; Below Average; Average; Above Average; Exceptional.
Alfresco is an ideal platform if you really want to build your own CMS but don't trust yourself to get versioning, deployment, and workflow right the first time. The strength of this product is in the architecture, the APIs, and the repository; certainly not in the user interface. It provides a higher starting point than a naked JCR implementation like JackRabbit, but comes at the cost of some lock-in. One could think of Alfresco as the Zope [http://zope.org] of the Java world: an elegant technology waiting for a nice UI (although Alfresco is more aesthetically pleasing than the Zope Management Interface). In the case of Zope, many companies built elaborate custom systems on the platform; later, Plone came along and became the Zope application for mainstream content management. Because of Alfresco's licensing and release model, it is doubtful that a third party Plone-like application will appear, and the Alfresco team has not shown the interest or commitment to build one of its own. However, since Java is a much more mainstream technology than Python, it is not clear that Alfresco needs to be more than a great development framework. Alfresco is seeing an impressive amount of traction: just six months after releasing what many describe as an Alpha or Beta quality WCM product, it has several big-name customers going live. In content management circles, Alfresco has built name recognition that preexisting open source products - and many commercial products - will never achieve. As an open source product company, Alfresco is continuing to evolve. It is becoming more actively engaged in the community and has some real evangelists among customers and systems integrators. Alfresco could probably stand to hire more people with open source backgrounds to check its commercial software company instincts, which tend toward closed internal communication and processed marketing messaging.
Project Overview
Table 3.14. Hippo CMS Project Overview
Web site: http://www.hippocms.org
Project Inception: 2000
Current Version: 6.05.02 since December 2007.
Project Type: Commercial: support based.
Licensing Options: Apache 2.0
Geography: Hippo B.V. is headquartered in The Netherlands. The install base is currently limited to Europe.
Common Uses: Managing large web sites such as media and publishing sites.
Sample Customers: ABN Amro. VNU/Incisive [http://www.vnunet.com/] uses Hippo for all of its publications. The Dutch Ministry of Finance [http://www.minfin.nl/nl/home] web site is running on Hippo.
Frameworks and Components: Apache Avalon, Apache Batik, Apache Cocoon, Apache FOP, Apache Geronimo, Apache Lucene, Apache Slide, Excalibur Fortress, Hypersonic DB, Jetty, Jgroups, OpenJMS, OSWorkflow, Spring
WebDAV, DASL, LDAP
Jetty (default), Tomcat (also commonly used), JBoss, Weblogic, Websphere
Hypersonic (default), MySQL, Postgres, Oracle, and Microsoft SQL Server
History
The Hippo CMS project and its Apache 2.0 licensed code repository received visibility with the initiation of the HippoCMS.org community web site in 2005, after five years of being used for custom implementations by the Dutch content management software company Hippo B.V. During that time, the technology was known within the Apache Cocoon community but had very little visibility with the general public. This changed with the release of version 6.02.00 of the platform and the 1.0 release of Hippo Repository, which is based on Jakarta Slide; prior to that release, Hippo sat on top of the XML database X-Hive (recently acquired by EMC). Since moving to a public community model, Hippo B.V. has shifted from a consulting/software company to a pure software company offering Hippo CMS along with closely aligned Portal and Document Management products. Hippo has an impressive client list in Europe. One of the premier customers is Incisive Media, which publishes several web sites on the platform including VNUNet.com and CRN UK. Hippo has not been sighted in North America, but the company is exploring relationships with North American integration partners and leveraging its relationships within the Apache community.
Architecture
Like Daisy, Hippo has a componentized architecture where the repository server is separate from the management application. Hippo takes the de-coupling one step further by pulling
delivery services into a separate tier that can be easily replaced with another delivery application. This separation of concerns makes Hippo attractive to large "architecturally pure" applications, such as large scale digital publishing. The management tier is built on top of Apache Cocoon and most Hippo implementations use Cocoon in the delivery tier, although that is not a requirement. Hippo offers a generic Java repository client that would allow any Java web application framework to access Hippo Repository. The standards-based repository would also be open to other technology platforms such as PHP or Ruby, although no known sites are configured this way. The use of Cocoon on the front end does make sense, though, because the Hippo architecture is so XML centric and an affinity toward Cocoon is what attracts many developers to the Hippo platform in the first place. However, those who are intimidated by Cocoon will be interested to know that Hippo is exploring alternative web application frameworks for the next major release (called Hippo ECM 1.0), which bundles a new repository based on Apache JackRabbit and a rewrite of the Apache Cocoon-based management application in Apache Wicket.
Hippo has a three-tiered architecture consisting of a management tier, an open repository, and a de-coupled delivery tier. Today, Hippo Repository is based on Jakarta Slide [http://jakarta.apache.org/slide/] and therefore supports the WebDAV [http://www.webdav.org/] standard: an extension of the HTTP standard that allows users to collaboratively edit and manage files on remote web servers (Slide is the reference implementation of WebDAV; see Glossary for WebDAV). The Hippo team contributed a considerable number of improvements to the Slide project and also built on top of it to make a scalable and full featured repository. In addition to the WebDAV standard, the Hippo repository supports some custom methods such as "Replace" (used for a search and replace feature) and "Facets" (used for a faceted browsing feature that is supported by the query syntax, but not exposed through the user interface). While Hippo Repository provides versioning, search, and access control services, it is less functionally rich than Daisy's and Alfresco's repositories. At the end of the day, every piece of content in the Hippo Repository is just a file. Slide's WebDAV implementation does support the notion of "properties" that can be used for metadata. To use this feature, you need to write custom "extractors" that parse the file, grab the appropriate data, and store it as properties. Doing so allows you to query the repository using the DAV Searching and Locating (DASL) standard [http://www.webdav.org/dasl/]. Out of the box, Slide comes with extractors to grab the text out of MS Word, PowerPoint, Excel, and PDF files. Hippo comes with some useful base extractors that can easily be configured to meet most basic needs. For example, the XMLDatePropertyExtractor can be configured to grab a date element out of an XML document.
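The extractor idea described above can be illustrated with a minimal sketch. This is not the Slide or Hippo extractor API: the class name DateElementExtractorSketch, the property name, and the sample document are all hypothetical. It only shows the general technique of parsing an XML file, pulling one element's text out with XPath, and returning it as a name/value property that a repository could then index.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration only; this is not the Slide or Hippo extractor API.
class DateElementExtractorSketch {
    private final String xpath;        // location of the value inside the XML document
    private final String propertyName; // name of the metadata property to produce

    DateElementExtractorSketch(String xpath, String propertyName) {
        this.xpath = xpath;
        this.propertyName = propertyName;
    }

    // Parses the XML content and returns the extracted value as a property map.
    // The map is empty when the element is missing or has no text.
    Map<String, String> extract(String xmlContent) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlContent)));
            String value = XPathFactory.newInstance().newXPath().evaluate(xpath, doc).trim();
            Map<String, String> properties = new LinkedHashMap<>();
            if (!value.isEmpty()) {
                properties.put(propertyName, value);
            }
            return properties;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract " + propertyName, e);
        }
    }

    public static void main(String[] args) {
        String article = "<article><title>Budget</title><date>2007-12-01</date></article>";
        DateElementExtractorSketch extractor =
                new DateElementExtractorSketch("/article/date", "publicationDate");
        System.out.println(extractor.extract(article));
    }
}
```

In a real repository the returned map would be written to the resource's WebDAV properties so that queries can filter on it without opening the file.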
DASL is an XML syntax that follows the basic form of a SQL query with clauses that define what fields to return ("select"), what types to look in ("from"), the filter conditions ("where"), ordering rules ("order by"), and the number of records to return ("limit"). As query languages go, DASL is somewhat esoteric and little known when compared to SQL, XQuery, and the new JCR Query syntax. Still, it is a standard and that counts for something. Slide implements DASL natively, but Hippo uses a more powerful Lucene based implementation that is faster and can search a broader collection of content because it searches an index rather than opening each XML asset. Hippo Repository uses Slide's pluggable persistence model. The default configuration uses a simple file system. Hippo's developer documentation describes MySQL, Oracle, Microsoft SQL Server, and PostgreSQL configurations. Hippo Repository supports replication that mirrors content to other repository instances. Based on mailing list traffic, this configuration is fairly common in the field. In particular, replication is often used to push content from the published area of the repository to a collection of read-only Hippo Repository instances on the delivery tier. There is also an option to create a cluster of repositories reading from the same database. The next major release of the Hippo Repository (2.0, which will be part of the ECM 1.0 product) is going to be a total rewrite based on Apache JackRabbit [http://jackrabbit.apache.org]. JackRabbit is a more capable content repository and is able to represent content as hierarchical structured nodes. The Hippo team is openly discussing interesting ways to leverage and extend the platform effectively. One area of deep inquiry is how the repository will be organized. The JCR spec is inherently hierarchical, but the Hippo team wants to enable faceted organization of content where assets can appear under multiple different collections.
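To make the five DASL clauses described at the start of this passage concrete, here is a small sketch that assembles a query body as a string. The element names follow the basicsearch grammar of the WebDAV SEARCH specification as best understood here; the class name, the scope path, and the choice of properties are assumptions for illustration, and whether Hippo's Lucene based implementation accepts exactly this form has not been verified.

```java
// Hypothetical helper: builds a DASL basicsearch request body as a string.
class DaslQuerySketch {

    // Escapes the characters that would break XML element content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a query with select, from, where, orderby, and limit clauses.
    // All property names are assumed to live in the DAV: namespace to keep the sketch short.
    static String build(String selectProp, String scopeHref, String whereProp,
                        String literal, String orderProp, int limit) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<D:searchrequest xmlns:D=\"DAV:\">\n"
             + "  <D:basicsearch>\n"
             + "    <D:select><D:prop><D:" + selectProp + "/></D:prop></D:select>\n"
             + "    <D:from><D:scope><D:href>" + escape(scopeHref) + "</D:href>"
             + "<D:depth>infinity</D:depth></D:scope></D:from>\n"
             + "    <D:where><D:eq><D:prop><D:" + whereProp + "/></D:prop>"
             + "<D:literal>" + escape(literal) + "</D:literal></D:eq></D:where>\n"
             + "    <D:orderby><D:order><D:prop><D:" + orderProp + "/></D:prop>"
             + "<D:ascending/></D:order></D:orderby>\n"
             + "    <D:limit><D:nresults>" + limit + "</D:nresults></D:limit>\n"
             + "  </D:basicsearch>\n"
             + "</D:searchrequest>\n";
    }

    public static void main(String[] args) {
        // The scope path /files/articles is a made-up example.
        System.out.println(build("displayname", "/files/articles",
                "getcontenttype", "text/xml", "getlastmodified", 10));
    }
}
```

A client would send this body to the repository using the HTTP SEARCH method; the sketch stops short of the network call so that it stays self-contained.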
The DASL query syntax will be replaced by the JCR's own query syntax, which is already starting to enjoy adoption by companies that don't even support the rest of the JCR standard. For example, the commercial WCM product Percussion Rhythmyx uses JCR query syntax to retrieve content from its non-JCR repository. Hippo has very good support for structured content types. Edit forms are auto-generated using Cocoon's CForms technology (see Glossary for CForms) based on three files: a content definition described in an .XSD file; a layout.xml file that describes the selection and organization of form controls (Hippo provides a comprehensive list of form field widgets) and rules for showing them; and a business_logic.xml file that contains validation rules that can be defined as assert statements or regular expressions. The business logic syntax also has some built-in rules that can be applied, such as that the field is required, that the entered value needs to be a valid email address, or upper and lower character limits. Each content type can have its own style sheet (CSS) to control layout and styling. There is also a properties file that defines properties stored outside of the asset, which can be used in various queries in the repository and shown in the user interface. By editing these files, a developer can build complex and attractive content entry forms without needing to understand Cocoon's CForms, which is generally considered difficult to master. The one thing missing is AJAX-based form controls. Another benefit of this abstraction layer is that it will help customers migrate to ECM 1.0 without needing to port logic written in XSL or other Cocoon code. AJAX controls are slowly working their way into the platform, but will not be fully exploited until the ECM 1.0 release, which will also benefit from the JCR's fine grain editing model. The ability to update individual nodes within an XML document will enable features like micro-edits (where you can edit a document one field at a time) for a richer, more dynamic user interface.
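The built-in validation rules mentioned above (required field, valid email address, upper and lower character limits, regular expressions) can be pictured with a short sketch. This is plain Java written for illustration and does not reproduce the business_logic.xml syntax, which is not shown in this report; the class name, the messages, and the deliberately loose email pattern are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of per-field validation rules; not Hippo's rule syntax.
class FieldRuleSketch {
    // Deliberately loose email check, good enough for a demonstration only.
    static final Pattern SIMPLE_EMAIL = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");

    private final String field;
    private final boolean required;
    private final int minLength;
    private final int maxLength;
    private final Pattern pattern; // may be null when no pattern rule applies

    FieldRuleSketch(String field, boolean required, int minLength, int maxLength, Pattern pattern) {
        this.field = field;
        this.required = required;
        this.minLength = minLength;
        this.maxLength = maxLength;
        this.pattern = pattern;
    }

    // Returns a list of error messages; an empty list means the value passed every rule.
    List<String> validate(String value) {
        List<String> errors = new ArrayList<>();
        if (value == null || value.trim().isEmpty()) {
            if (required) {
                errors.add(field + " is required");
            }
            return errors; // nothing else to check on an empty value
        }
        if (value.length() < minLength || value.length() > maxLength) {
            errors.add(field + " must be between " + minLength + " and " + maxLength + " characters");
        }
        if (pattern != null && !pattern.matcher(value).matches()) {
            errors.add(field + " has an invalid format");
        }
        return errors;
    }

    public static void main(String[] args) {
        FieldRuleSketch email = new FieldRuleSketch("email", true, 5, 50, SIMPLE_EMAIL);
        System.out.println(email.validate("editor@example.com"));
        System.out.println(email.validate("not-an-email"));
    }
}
```

Keeping rules declarative like this, rather than coding them into each form, is what lets a forms engine generate both the widgets and the validation from configuration files.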
Diagram: Hippo forms generation. Source: Hippo
Typical Hippo implementations publish to multiple front end instances (usually based on Cocoon). Many Hippo customers use a single instance of Hippo to publish content to several different web sites that may share content. Hippo provides a skeleton and a sample web site based on Cocoon. There is also a code sample of a simple Java class reading from the Hippo repository over WebDAV. A Java client library encapsulates communication with the repository for integration with other Java platforms. Like most Cocoon applications, Hippo web sites use elaborate caching techniques to achieve performance. The caching system, called eventcache, uses JMS to receive notifications of which cached objects to invalidate. Hippo Repository bundles the open source JMS server OpenJMS [http://openjms.sourceforge.net/]. Cocoon sites subscribe to this service to listen for invalidation events (delete, add, change, or move). This mechanism has been implemented in the Java adaptor as well so that non-Cocoon Java presentation tiers can also benefit from Hippo's caching system. The binary distributions of Hippo CMS and Hippo Repository come bundled with the Jetty servlet container that is executed from within an Excalibur Fortress container. This is a common pattern among Cocoon technology projects. Fortress is part of the Apache Avalon Framework project. Avalon as a parent project is officially closed after a couple of years of drift. The sub projects, like Excalibur, live on and are used by other Apache technologies like Cocoon, although not happily. Projects that have the resources and initiative are migrating to trendier technologies like Spring. Nevertheless, Fortress does what it needs to do for most Hippo
implementations. Others have successfully deployed the framework on top of Tomcat, but this configuration requires the addition of a JMS provider like ActiveMQ.
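The event-driven cache invalidation described above can be sketched in a few lines. This is not the eventcache implementation or the Java adaptor API; the class and method names are hypothetical, and a direct method call stands in for the JMS message that a real delivery tier would receive from the repository.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, single-threaded illustration of invalidation by repository event.
class EventCacheSketch {
    enum EventType { ADD, CHANGE, DELETE, MOVE }

    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> loader; // stands in for a repository read
    private int loads = 0;                         // counts how often the loader ran

    EventCacheSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    // Returns the cached value for a path, loading it on a cache miss.
    String get(String path) {
        String cached = cache.get(path);
        if (cached == null) {
            cached = loader.apply(path);
            loads++;
            cache.put(path, cached);
        }
        return cached;
    }

    // In a real deployment this would be called from a JMS message listener.
    // This sketch treats all four event types the same way: it evicts the
    // affected path and everything cached beneath it.
    void onRepositoryEvent(EventType type, String path) {
        cache.keySet().removeIf(key -> key.equals(path) || key.startsWith(path + "/"));
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        EventCacheSketch cache = new EventCacheSketch(path -> "rendered:" + path);
        cache.get("/news/a.xml");
        cache.get("/news/a.xml"); // served from cache
        cache.onRepositoryEvent(EventType.CHANGE, "/news/a.xml");
        cache.get("/news/a.xml"); // reloaded after invalidation
        System.out.println("loads = " + cache.loadCount());
    }
}
```

The appeal of this design is that cached pages can live indefinitely, because staleness is handled by notification rather than by timeouts.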
Content Contribution
The Hippo content model recognizes two high level content types: Assets and Documents. Documents are structured content types (XML documents, really) and Assets are binary files. Assets are uploaded into the system using the web based interface. Documents are edited with web forms. It is possible (although not advised) to upload content directly into the repository through WebDAV. Doing so will prevent Hippo CMS from executing business rules, and other operations. Primitive WebDAV clients, like Windows Explorer, tend to trample properties and other advanced WebDAV data structures. The Hippo team recommends using the WebDAVPilot plugin for Eclipse when administering the repository, even though it is no longer being developed or supported. The Hippo Repository is organized in a hierarchical directory structure that can be used to drive the navigation of the site or for internal purposes only. Many Hippo sites use keywords and taxonomy to drive navigation. The "list item" form widget can read an XML document that describes a node tree representing a hierarchical taxonomy managed by either Hippo or externally. However, customizing the search and browsing features in the management interface to traverse the taxonomy rather than the folder structure is less trivial.
The "list item preview" widget is useful for selecting hierarchical taxonomy terms for a document. Hippo's decoupled architecture creates a clear distinction between the back end and the front end. All of the content contribution and management is done in the back end management interface. However, some customers have implemented a "surf to edit" feature by placing an edit button on every page of the delivery tier that points to the appropriate page of the management interface. The management interface makes heavy use of frames and IFrames rather than AJAX technology for an interactive feel. Browser support is limited to Firefox and IE, but most business users find the interface to be clean and simple. It is organized into tabs called "perspectives," which most techies will recognize from the Eclipse IDE. Business
users tend to understand the concept of tabs well enough so they don't have to worry about understanding what a "perspective" is. What makes the Hippo perspectives interesting is that they are stateful - that is, when you click on another tab, the UI remembers what you were doing on the tab that you left. This is especially useful in the case of the "Editing" perspective that is used when editing a piece of content (Document or Asset). It doesn't lose the user's changes when he navigates to other perspectives in the UI. The other perspectives are: Dashboard, Search, Documents, and Assets. More perspectives can be added to manage other functions or view data from other systems. The Dashboard perspective is used for various administrative tasks as well as for displaying custom reports. Developing reports requires a considerable amount of skill and experience. Out of the box comes a "to do" list that lists all content submitted for publishing. The temptation to add notifications and summaries to this perspective should be balanced against the efficiency of a lightweight start-up page. The Dashboard is also where users edit their profile and, if authorized, manage users, groups, and permissions. The on-board search engine can be accessed through a persistent search box in the upper right corner of the page and through the Search perspective, which provides an advanced search interface and the results. Out of the box, the Search interface presents options to search by boolean expressions, workflow state, date, location, and user. There is no out-of-the-box ability to do fielded searches (as in the keyword field contains "food") or restrict by content type (you can only choose among folders, documents, or assets). There is, however, a handy search and replace feature that allows a user to replace text inside documents that were returned by the search. Clear warnings - and the fact that it only changes returned documents - make it adequately safe for non-technical users. Search and replace queries cannot be restricted to specific elements within the XML documents and do not work on metadata properties outside of the XML documents.
The Hippo search perspective exposes advanced search functionality. The search functionality can be extended by adding extractors to the repository that pull content attributes into indexed properties, and by editing the appropriate XML files that control the search form and the layout and behavior of the results.
Content contributors spend most of their time working in the documents tab where they can navigate through folder structures of content and select items to edit. The Documents perspective has a context sensitive right column that shows functions and actions available to the user based on the selected asset (folder or document), its publishing state, and the user's permissions. A properties box on the lower right can be configured to edit metadata properties on assets. By default, this only controls the "caption" attribute which is usually identical to the file name (without the extension).
The "Documents" perspective allows users to navigate through folders of structured content and surfaces functions and actions that the user can execute on the selected documents. Documents are listed with basic information including the name of the document (also called the "caption"), the size, document type, modified date, and a workflow status that is communicated through icons. Although it is not obvious, the grid can be sorted by clicking on the column headings. Clicking on the "order" column enables sorting arrows to move items up and down on the list and control the order in which the assets are listed on the presentation tier. Developers can customize the browsing interface by editing XML files, but these configurations are not well documented and a working knowledge of Cocoon is required. One of Hippo's more advanced features is its link management functionality. When a user links to another document or asset, even within a WYSIWYG text area, the reference is stored and managed. When a Document or Asset is highlighted in the browsing interfaces, a panel on the right indicates what documents are linked to it. When a user attempts to delete or move the document or asset, he is warned. By default, there is no functionality to update references when assets are moved. However, the fact that the relationships are managed is a good start for building re-referencing options into the warning dialog. Logic could also be added and triggered during workflow transitions to help prevent broken links. Links are discovered by a process that is run periodically on the repository - but not when the asset is saved. On the
negative side, the link management is not instantaneous so references can be broken before Hippo is aware of them. However, this approach has the advantage of evaluating content imported into the system and edited by automated processes such as workflow actions. When a user selects to edit a document, he is automatically placed into the "Edit" perspective where he is presented with the appropriate edit form. As mentioned earlier, Hippo's form building functionality is particularly powerful, with a wide selection of form controls and widgets, including pop-ups for calendars and various browsers and pickers. The default WYSIWYG editor is the basic but stable Xinha [http://xinha.webfactional.com/] Javascript text area control. The spell check feature is disabled by default; turning on spell check requires a simple change to one of the Javascript files and installing GNU Aspell on the server (Aspell can be easily installed by most Linux package managers). Some Hippo installations use the Xopus XML WYSIWYG editor to edit the whole XML asset at once. Xopus is known to be quirky and some users complain of its performance - especially on underpowered workstations. However, some Hippo implementations use Xopus for its ability to semantically tag text within text areas. Doing so enables the presentation tier to execute advanced display rules on the content. This means that, for example, a business periodical may tag companies mentioned in the article and have the presentation tier list all mentioned companies at the bottom of the article, or link the company name to a search of all the articles that mention that company. Some Hippo implementations use locally installed XML editors like XMetaL and Arbortext, and then upload the files.
The Xopus Editor enables users to leverage advanced XML operations. Hippo's versioning support comes from Slide (the "V" in WebDAV stands for "versioning"). By default, versions are only created when a user clicks the "save draft" link. However, many Hippo implementations are configured to save a draft with every workflow transition. Prior versions are accessible from the "History" link on the actions column that launches a pop-up that shows all the saved versions and allows a user to revert to a prior version. Reversion is
done by creating a new current version identical to the one reverted to. This reversion strategy keeps the full history of all versions and is easily reversed. There is no auditing functionality that records every access or update of a document. Only the last modified date and modifying user are shown. Assets, or binary files, can be added in the Assets perspective, where they are available for re-use, or within the context of a document, where they can be uploaded directly from the WYSIWYG editor or through the image picker control. Like Documents, links to Assets are managed by Hippo and users are warned when they try to move or delete them. Linked documents are also shown on the right side "action" column when an image is selected. This view answers a common need to find out where on the site an image is used in order to manage licensing and copyrights on images. Assets are not versioned and have no workflow capabilities. While not enabled by default, Hippo has a "Trash Bin" functionality that moves "deleted" content to a different area of the repository, rather than truly deleting it. If this feature is enabled, there should be some archiving system to periodically clean out the trash bin.
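The reversion strategy described above, where reverting appends a new version instead of discarding later ones, can be sketched as follows. The class and method names are hypothetical and the in-memory list stands in for Slide's version store; the point is only that history never shrinks, so a revert can itself be undone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of append-only reversion; not Slide's versioning API.
class VersionHistorySketch {
    private final List<String> versions = new ArrayList<>();

    // Saves new content and returns its 1-based version number.
    int save(String content) {
        versions.add(content);
        return versions.size();
    }

    // Returns the content of the newest version.
    String current() {
        if (versions.isEmpty()) {
            throw new IllegalStateException("no versions saved yet");
        }
        return versions.get(versions.size() - 1);
    }

    // Reverts by copying an old version to the top of the history.
    // Nothing is deleted, so the revert can be reversed the same way.
    int revertTo(int versionNumber) {
        if (versionNumber < 1 || versionNumber > versions.size()) {
            throw new IllegalArgumentException("unknown version " + versionNumber);
        }
        return save(versions.get(versionNumber - 1));
    }

    int size() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionHistorySketch history = new VersionHistorySketch();
        history.save("first draft");
        history.save("second draft");
        int newVersion = history.revertTo(1);
        System.out.println("version " + newVersion + ": " + history.current());
    }
}
```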
Hippo's access control system builds off of the basic permission set provided by Slide. Permissions are closely tied to workflow. New permissions can be created to restrict specific workflow actions to a certain sub tree of the content folder hierarchy. Out of the box, Hippo supports a basic single approval workflow model. Workflow states are communicated through a simple set of icons (X is unpublished, checkmark is published, checkmark with a star means the asset has been updated since publication). Users can request publication or, if granted sufficient permissions, can publish the document directly. The publication interface gives the user the option of publishing immediately or at some later date. The same interface also gives an option of setting an archival date when the document will be removed from the public web site. When a user requests publication, a user with review permissions gets a review task on his "to do" list.
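The single approval model and its three icon states described above can be expressed as a small state machine. This sketch is an illustration only: the names are hypothetical, scheduled publication and archival dates are left out, and it is not how Hippo's workflow is actually configured.

```java
// Hypothetical illustration of the single approval workflow states.
class PublicationWorkflowSketch {
    // Mirrors the three icons: X, checkmark, and checkmark with a star.
    enum State { UNPUBLISHED, PUBLISHED, PUBLISHED_WITH_CHANGES }

    private State state = State.UNPUBLISHED;
    private boolean reviewRequested = false;

    State state() {
        return state;
    }

    // True while a reviewer has an open "to do" item for this document.
    boolean reviewRequested() {
        return reviewRequested;
    }

    // An author without publish permission asks for a review.
    void requestPublication() {
        reviewRequested = true;
    }

    // A user with sufficient permissions publishes and clears any pending request.
    void publish(boolean hasPublishPermission) {
        if (!hasPublishPermission) {
            throw new IllegalStateException("publish permission required");
        }
        state = State.PUBLISHED;
        reviewRequested = false;
    }

    // Editing a published document marks it as updated since publication.
    void edit() {
        if (state == State.PUBLISHED) {
            state = State.PUBLISHED_WITH_CHANGES;
        }
    }

    // Removes the document from the public site, as an archival date would.
    void unpublish() {
        state = State.UNPUBLISHED;
    }

    public static void main(String[] args) {
        PublicationWorkflowSketch doc = new PublicationWorkflowSketch();
        doc.requestPublication();
        doc.publish(true);
        doc.edit();
        System.out.println(doc.state());
    }
}
```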
Users with approver permissions get "to do" tasks when a user requests publication. Permissions are managed within the Dashboard perspective. Other than that, very little can be configured through the web interface; just about everything else requires a developer. On the plus side, this has the positive effect of "locking down" the system and closely managing change. However, it also means that most configuration tasks require developer intervention. More complex workflows can be configured through Hippo's embedded workflow engine: OSWorkflow [http://www.opensymphony.com/osworkflow/]. OSWorkflow is driven by XML based "workflow descriptors." Open Symphony does offer a graphical workflow designer like JBoss's process designer, but it is not considered production ready and most developers write the XML descriptors by hand. The workflow component uses a database to manage state information and the Quartz scheduler to schedule tasks and jobs. New workflows are developed as "workflow projects" that contain XML descriptors and Java code. Workflows are assigned by content type in the content type definition. The publishing event is handled internally by copying content from the working area of the repository to a region that the delivery tier reads from. In a distributed, high availability model, this region of the repository is replicated to other repositories that the presentation tier reads from. Staging presentation environments used for preview read directly from the working area of the repository. Unlike Documents, Assets are not subject to Hippo's workflow and are perpetually in a "published" state. This is convenient as it eliminates the common issue of deciding what to do when publishing a Document with links to unpublished images. For back-up, there are two options. For a hot back-up, the repository replication service can mirror the live repository to a read only copy.
There is also the option to do a dump of the underlying MySQL database. This may be more reliable, but it requires taking the system offline. If the delivery tier reads directly off of this repository, taking it down may not be an option. However, high availability sites will run the delivery tier off of a replica of the master repository, not the master repository itself.
Presentation
Hippo provides some best practices and tools for building delivery tiers, but this is generally considered outside of the core product. One of the more useful tools is the Cocoon Project
Wizard, which builds a generic Cocoon site that can be used as a starting place. In addition to building out the general directory structure and configuration files for the Cocoon site, the Cocoon Project Wizard also builds out the navigational menus based on the folder structure in your repository. After using the Cocoon Project Wizard and reviewing some basic samples in the Hippo documentation, a developer is best served by moving over to the Apache Cocoon web site and third party technology books to get up to speed on Cocoon. The Apache Cocoon site was just re-launched and is better organized. The professionally published books are good if somewhat behind the latest version (Cocoon 2.2) and the version that Hippo uses (2.1); the latest English language books are on Cocoon 2.0. Hippo's other product, Hippo Portal, is also a viable option for the delivery tier. Based on Apache Jetspeed-2 [http://portals.apache.org/jetspeed-2/], Hippo Portal talks to the repository via the Java client adaptor. Jetspeed-2 is generally regarded as having higher performance than the other big open source portal product, Liferay [http://www.liferay.com/], but it doesn't have all the AJAX bells and whistles and comes with fewer portlets out-of-the-box. The Jahia project (evaluated next) also uses Jetspeed-2. The Hippo team is also actively working with other Java frameworks. For example, there is a JSF (see Glossary for JSF) based repository browser prototype available for download. Hippo also encourages customers to use the Java client library to leverage Hippo's content services in any Java application. One of Hippo's larger customers is exploring the idea of breaking out of the Java stack and building a delivery tier on Ruby on Rails.
Conclusion
Table 3.15. Hippo 6.05.02 Summary
Contributor Navigation: The tree based navigation is used by several customers to manage very large news oriented web sites. The search engine and interface are also effective. Bulk search and replace is another useful feature; so is dependency management.
Structured Content: The forms engine framework gives powerful control over form layout and user input validation.
Configuration Management: All code is managed on the file system. Hippo offers plugins to copy content from one repository instance to another.
Hippo's model of wrapping the Hippo core in a custom application effectively keeps Hippo and customer code separate. The Java client library allows other Java applications to easily interact with the repository. Extending the management application requires getting into Cocoon pipelines. ECM 1.0 is supposed to have a pluggable architecture.
Hippo offers complete freedom on the delivery tier. Delivery tiers written in Java have the advantage of a Java client library that encapsulates connecting to the repository and handles cache invalidation notifications.
Widely Used Technologies: Cocoon is no longer popular as a general purpose web application framework. The Hippo team is moving off of it. Everything else is XML and HTTP, which are ubiquitous.
Project Transparency: Hippo B.V. is very open about its vision and progress.
Books: None
Online Documentation: The wiki has a lot of useful information.
User Forums: The mailing list is very responsive. Most replies come from Hippo B.V. staff. Hippo recently started to archive its mailing list on Nabble. It will take a while for the archive to be a useful search tool. In the meantime, use Google against the Mailman archive page (site:http://lists.hippo.nl/pipermail/hippocms-dev/).
Key: Nonexistent; Below Average; Average; Above Average; Exceptional.
As a platform for managing structured content, Hippo has a lot to offer. The product has a highly configurable, feature-rich user interface that has demonstrated its effectiveness in large content volume scenarios. Versioning, dependency management, and workflow are all nicely handled. The flexibility on the delivery tier allows architects to select the most appropriate technology to support desired visitor facing functionality.
As a technology stack, Hippo's move from Slide and Cocoon to JackRabbit and Wicket introduces a certain degree of risk. It is too early to start using ECM 1.0 and there is no way to tell how hard it will be to migrate to ECM 1.0 when it is ready. While JackRabbit is now a top level project and has proven itself to be stable, reliable, and fast, Wicket is still pretty new. Not many Java developers have used it although there are books on the framework and, like many new things, Wicket has generated a lot of buzz. Mitigating the risk of the migration is that Hippo B.V. has several high profile clients on the platform and Hippo's success as a company depends on keeping them happy. However, one never knows how these major ports will turn out and there are plenty of examples of success and failure. Moving off of the Cocoon framework was a good decision and will bring the many benefits of a modern web application framework and allow Hippo to offer a more modular and more accessible architecture.
Project Overview
Table 3.16. Jahia Enterprise Project Overview
Web site: http://www.jahia.org
Project Inception: 1998
Current Version: 5.0 since September 2006
Project Type: Commercial: tiered product model.
Licensing Options: Community Edition: Jahia has a limited functionality community edition licensed under a derivative of the Mozilla Public License with a "badgeware" addition. Commercial Editions: The more full-featured Standard, Professional, and Enterprise Editions have what Jahia calls a "Sustainable Source" license that is essentially a visible source commercial software license that provides rebates for code contributions that meet specific requirements.
Geography: Jahia Ltd. is headquartered in Switzerland with a U.S. regional office in Washington D.C. The install base is concentrated in Europe.
Large media sites, corporate intranets, corporate web sites
Vodafone Live (Germany) [http://www.vodafone.de/vodafonelive.html] runs on Jahia. The Polytechnic School of Lausanne [http://www.unil.ch/] runs 500 web sites on Jahia. Generali Proximité [http://www.generali-proximite.fr/] runs on Jahia.
Frameworks and Components: Apache Pluto, Apache Jetspeed-2, Apache Lucene, Apache POI, EHCache, FCKeditor, Hibernate, OpenJMS, Spring Framework, Struts, Zimbra AJAX libraries
Integration Standards: JSR 168, JSR 170 (partial: only for importing and exporting content via XML), LDAP, SOAP style API
Java Support: 1.4, 1.5
Application Servers: Tomcat, JBoss, WebSphere (6.1), Weblogic (8 SP5)
Databases: HyperSonic, MySQL, MS SQL Server, Oracle, PostgreSQL
History
The Jahia product was originally built in 1998 by a venture funded, Swiss-based company (called Xo3) and sold as a closed-source proprietary product. Jahia was designed to address the overlap and integration between portals and web content management. While the two product categories (Portal and WCM) have been converging from their relative starting points, Jahia approached the problem from the middle by building off of components from both sides of the spectrum. After a management buyout in 2002, the product was re-released under an open source strategy. Since then, Jahia has benefited from increased visibility and also the rapid distribution effect of open source. Free downloads lower the hurdle for companies to try the product and reduce the cost of sales.
Jahia Ltd., which owns and maintains the product, has kept its focus on being a software company and leaves the services business to integration partners. All development of the core platform is done by the Jahia team of 30 full time staff members, 25 of whom are engineers or architects. The Jahia project has no outside committers, although Jahia Ltd. regularly hires programmers (either on a contract or permanent basis) who are committers on some of the projects that Jahia builds off of. Jahia International staff are also actively involved in other open source projects. For example, Jahia's co-CTOs (Serge Huber and Thomas Draier) are committers on the Apache JackRabbit and Apache Slide projects, respectively.
Jahia Solutions Group has been successful in working with integration consultancies in Europe and has a number of partners in its alliance program, including Cap Gemini and Fujitsu. Jahia has been implemented for some large, high profile web sites. One of its leading clients is Vodafone Live in Germany. Jahia has recently started to build momentum in North America after establishing a regional sales office and a couple of R&D centers. Recent U.S. wins include United Nations, Abercrombie and Fitch, Virgin America, and Garmin.
There has been considerable discussion as to just how "open source" Jahia actually is, given that its flagship products (Jahia Standard, Professional, and Enterprise Editions) carry an essentially commercial software license and the Community Edition is released under a non-OSI certified Jahia Common Development and Distribution License (JCDDL). The JCDDL is based on Sun's Common Development and Distribution License (CDDL), which is derived from the Mozilla Public License and was approved by the OSI in January 2005. The Mozilla license is fairly permissive about re-distributing bundled works and Sun's version has some modifications to make it more patent friendly.
However, the JCDDL adds a requirement to display a "Powered by Jahia" badge on every page of the sites. Software distributed with this description is often derisively called "badgeware" and may not be acceptable for external web sites that do not want to advertise for Jahia. The "powered by" logo may not be a problem for less visible sites like a corporate intranet. However, unlike Magnolia and Alfresco, the commercial support packages are available for the Jahia Community Edition. With its tarnished open source pedigree, why is Jahia covered in this report? There are a few reasons for inclusion. First, the licensing may change. Second, two of the other commercial open source Java WCM platforms covered in this report (Alfresco and Magnolia) require (or at least strongly encourage) the use of the commercially licensed versions of their platform. Third, customers use open source for different reasons. While this licensing model may turn away an open source purist, using Jahia still provides a couple of open source benefits such as its use of open source libraries and components and shipping the source code with the commercial versions of the product. The "Jahia Sustainable Source License" (JSSL) that the commercial versions are sold under is a commercial license with some interesting nuances. First of all, the source is viewable by anyone, not just Jahia customers. This is more transparent than the typical small commercial independent software vendor and consultingware practice of making source code available to customers to reduce the risk of adopting a proprietary technology. Larger software companies maintain code escrow programs where a customer can access source code (for a price) in the event that the software company folds or ceases to support the product. What is most interesting about Jahia's commercial license is that they give credit off the license fees to customers that underwrite extensions or enhancement of the product. 
The contributions program is closely controlled. In order to qualify for a credit, the enhancement has to be on the Jahia roadmap and it has to be implemented by the Jahia team itself or by another certified Jahia partner working closely with the Jahia engineering team. These policies help Jahia set the direction of the application, control the quality of the contributions, and avoid non-compatible forking of the software. The benefit to the underwriting company (beyond the licensing discount) is getting more of a say in the details of the design and implementation. Having the desired feature integrated in the software also reduces maintenance risk for the customer.
Architecture
Jahia is built on an open source software stack that includes Hibernate and a large collection of Apache projects: Struts, Slide, Jetspeed-2, Pluto, and Lucene. Struts provides a strong MVC framework for the presentation tier that combines content management, presentation, and system administration in one user interface for an in-context editing and management experience. Presentation templates are written in standard JSP with support for JSTL and for the JEXL and Struts EL expression languages.
Portal functionality and support for the Java portlet specification (JSR 168; see Glossary) come from the Apache Portals projects Jetspeed-2 and Pluto. Pluto gives Jahia the ability to embed any third party portlet that meets the JSR 168 specification. The portal layouts and profile management functionality come from Jetspeed-2.
The overall platform is organized and managed in a collection of sub-projects: Enterprise Content Management Server, Document Management Server, Search and Indexing Server, Corporate Portal Server, Collaborative Suite, Business Process Management Server, and Cache Proxy Server. With the exception of the Cache Proxy Server, which is not available in the Standard Edition, all of these components come with every edition of the platform. The distinctions between the editions are mainly at the discrete feature level.
Jahia's use of Hibernate for database abstraction makes it compatible with most relational database management systems. Although the product ships with an embedded Hypersonic database, it is highly recommended to swap this database out for a more robust RDBMS in any production instance because Jahia is a database intensive application. The content repository leverages Apache Slide for WebDAV support (especially useful for Jahia's document management functionality) and an event model that can fire 30 different types of events when content is moved or edited.
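The event model mentioned above is what lets custom code hook into repository activity. The following is only a minimal sketch of that listener pattern, not Jahia's API: the names ContentEventType, ContentEvent, ContentEventListener, and ContentEventBus are hypothetical, and only two illustrative event types stand in for the 30 the product is said to fire.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical event types; the report mentions 30 without naming them.
enum ContentEventType { CONTENT_EDITED, CONTENT_MOVED }

// Hypothetical event payload: what happened, and to which content object.
class ContentEvent {
    final ContentEventType type;
    final String contentId;

    ContentEvent(ContentEventType type, String contentId) {
        this.type = type;
        this.contentId = contentId;
    }
}

// Custom code would implement this interface to react to repository events.
interface ContentEventListener {
    void onEvent(ContentEvent event);
}

// Minimal dispatcher: listeners subscribe per event type.
class ContentEventBus {
    private final Map<ContentEventType, List<ContentEventListener>> listeners =
            new EnumMap<>(ContentEventType.class);

    void subscribe(ContentEventType type, ContentEventListener listener) {
        listeners.computeIfAbsent(type, t -> new ArrayList<>()).add(listener);
    }

    // Delivers the event only to listeners registered for its type.
    void fire(ContentEvent event) {
        List<ContentEventListener> targets =
                listeners.getOrDefault(event.type, Collections.emptyList());
        for (ContentEventListener listener : targets) {
            listener.onEvent(event);
        }
    }
}
```

A listener registered for CONTENT_MOVED in this sketch is never called for edits, which is the property that keeps custom code narrowly scoped to the events it cares about.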
In an upcoming release, Jahia will replace Apache Slide with Apache JackRabbit which, in addition to maintaining WebDAV support, will make Jahia fully JCR compliant (JSR 170 and eventually JSR 283). For now, the JCR API is only supported for importing and exporting content in XML format.
Architects familiar with upper tier WCM products will appreciate Jahia's multi-stage architecture: it can run on a single stand-alone server or on a clustered and tiered architecture. In the tiered architecture, different instances are designated for code development, content production and preview, and live publishing. A publishing framework pushes content from the staging environment to the production environment.
For high-traffic sites and maximum availability, Jahia recommends a clustered configuration with one node dedicated to activities like publishing and indexing. In this configuration, Jahia servers broadcast cache update requests to all the other clustered servers. Nodes can be added to the cluster without a system restart for rapid scaling to high traffic volumes. Communication between nodes in a cluster, along with other notifications, uses multicast/UDP (user datagram protocol) managed by the embedded JGroups framework. The clustered deployment model is only available on the Enterprise Edition.
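The cache-update broadcast in a clustered configuration can be pictured with a small in-memory sketch. This is not Jahia code and does not use JGroups: ClusterNode and InMemoryCluster are hypothetical names, and a plain list stands in for the multicast group purely to show the invalidation flow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for one server in the cluster; not a Jahia class.
class ClusterNode {
    final String name;
    private final Map<String, String> pageCache = new ConcurrentHashMap<>();
    private InMemoryCluster cluster;

    ClusterNode(String name) { this.name = name; }

    // Nodes may join at any time; no restart of the other nodes is modeled.
    void join(InMemoryCluster cluster) {
        this.cluster = cluster;
        cluster.register(this);
    }

    // Serves from the local cache, rendering and caching on a miss.
    String render(String pageId, String currentContent) {
        return pageCache.computeIfAbsent(pageId, id -> "<html>" + currentContent + "</html>");
    }

    // Called on the node where a change is published.
    void publish(String pageId) {
        pageCache.remove(pageId);
        cluster.broadcastInvalidation(this, pageId);
    }

    // Called when another node announces a change.
    void onInvalidate(String pageId) { pageCache.remove(pageId); }

    boolean isCached(String pageId) { return pageCache.containsKey(pageId); }
}

// A list of nodes stands in for the multicast group.
class InMemoryCluster {
    private final List<ClusterNode> nodes = new ArrayList<>();

    void register(ClusterNode node) { nodes.add(node); }

    // Tells every node except the sender to drop its cached copy.
    void broadcastInvalidation(ClusterNode sender, String pageId) {
        for (ClusterNode node : nodes) {
            if (node != sender) node.onInvalidate(pageId);
        }
    }
}
```

In this sketch a stale page on one node is dropped as soon as any other node publishes a change, so the next request re-renders it from current content.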
Jahia has a distributed architecture consisting of multiple environments for developing code, creating and previewing content, and production publishing. Source: Jahia documentation.
Interestingly, content types are defined in the same JSP templates used to display the content. While this breaks the conventional wisdom of having lean view code that is easily managed by HTML developers, there are some practical aspects of this design. First, it means that all aspects of defining and displaying content are handled in one place. When you add an attribute to a content type, you usually want to edit it and display it; in a more traditional CMS, you go into the content definition system (either a config file, a database table, an administration interface, or all of the above), build a form to edit the content, then go into the view templates and add code to display the attribute. Here you do it in one place (or two places very near each other). Another reason why this is appealing is that, because the work is done in a JSP, nothing needs to be manually recompiled or restarted even though this is a Java based CMS. The changes are activated the next time you load the page.
Content types are defined in the JSP code, creating an unconventional coupling of structure and display while providing some convenience.
In larger sites, putting so much control in the JSP templates, which should be the domain of front end programmers, could lead to chaos. HTML is busy enough without the help of additional JSP tags, and content definition code could accidentally be changed. Adding new attributes may become so easy that they proliferate, leading to legacy fields that no one knows how to use. This should be addressed with strong change control and governance practices: adding an attribute should be a considered decision, not an off-the-cuff one. The trend towards CSS driven layouts may mitigate the risk of this poor separation between code and HTML. HTML is getting simpler, and the focus of HTML designers is shifting away from the JSP to the CSS file.
Content Contribution
A Jahia instance can support multiple sites that can be independent or share content with each other. When defining a site, the administrator can control which page templates, portlets, and languages will be available to content editors. Once created, a site is organized into a hierarchical structure of "container lists" (ordered collections of containers). A container (also called a "content object") is a structured content type using a content definition within the JSP as described earlier. For example, a "container list" showing a collection of Links (the "containers") to sub-pages defines a navigation bar that will structure the navigation of the web site. When a user creates a new page in this navigation bar, he chooses a template that defines a set of content containers that can contain other editable content types. Containers are made up of editable attributes. Jahia has a full set of data types including various length text fields, multi-value list, date, color, file, and portlet (for JSR 168 compliant portlets). While the administrative user interface does not have controls to set validation rules (because all form handling is done by Struts and the Apache Commons Validator), configuring validation rules is a relatively simple development task. Also, Jahia content types support field level access control.
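The relationship between container lists, containers, and their typed fields can be sketched as a small data model. This is an illustration only: in Jahia the definitions live in JSP templates and validation runs through Struts and the Apache Commons Validator, neither of which is used here, and the names FieldType, FieldDefinition, ContainerDefinition, and ContainerList are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// A few of the data types the text lists; the constant names are illustrative.
enum FieldType { TEXT, DATE, COLOR, FILE, PORTLET }

class FieldDefinition {
    final String name;
    final FieldType type;
    final boolean required;

    FieldDefinition(String name, FieldType type, boolean required) {
        this.name = name;
        this.type = type;
        this.required = required;
    }
}

// A container definition is a named, ordered set of editable fields.
class ContainerDefinition {
    final String name;
    final List<FieldDefinition> fields = new ArrayList<>();

    ContainerDefinition(String name) { this.name = name; }

    ContainerDefinition field(String fieldName, FieldType type, boolean required) {
        fields.add(new FieldDefinition(fieldName, type, required));
        return this;
    }

    // Returns the names of required fields that are missing or blank.
    List<String> validate(Map<String, String> values) {
        List<String> errors = new ArrayList<>();
        for (FieldDefinition f : fields) {
            String value = values.get(f.name);
            if (f.required && (value == null || value.trim().isEmpty())) {
                errors.add(f.name);
            }
        }
        return errors;
    }
}

// A container list is an ordered collection of containers of one definition.
class ContainerList {
    final ContainerDefinition definition;
    final List<Map<String, String>> containers = new ArrayList<>();

    ContainerList(ContainerDefinition definition) { this.definition = definition; }

    // Adds the container only if it passes validation.
    boolean add(Map<String, String> values) {
        if (!definition.validate(values).isEmpty()) return false;
        containers.add(values);
        return true;
    }
}
```

A navigation bar in this model would be a ContainerList of "link" containers, each with a required title and target.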
Jahia practices an in-context management model.
Jahia practices an in-context content management model. This decision is consistent with Jahia's focus on ease of use since most casual users find the in-context model more intuitive thanks to familiarity with tools such as Microsoft Word. While the browsing aspect uses the in-context model, the act of editing is done through a pop-up form. Forms are auto-generated from the content container definition. Where the in-context model tends to suffer is in the area of content reuse. Jahia addressed that limitation by adding content picker and filter modules that allow a user to reuse content either explicitly (using the content picker to create an alias or a linked copy of an asset or a whole branch of assets) or by query (using a filter module to define the filter that retrieves assets meeting certain conditions such as the 10 most recent press releases). The content picker module can be used to create linked satellite sub sites that share content with each other.
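Reuse by query, such as the "10 most recent press releases" filter described above, amounts to a filter, a sort, and a limit. The sketch below shows that logic over an in-memory list; it is not Jahia's filter module, and the Asset and ContentFilter names and the "pressRelease" type label are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical asset record; real content objects carry far more metadata.
class Asset {
    final String title;
    final String type;
    final LocalDate published;

    Asset(String title, String type, LocalDate published) {
        this.title = title;
        this.type = type;
        this.published = published;
    }
}

class ContentFilter {
    // Returns up to `limit` assets of the given type, newest first.
    static List<Asset> mostRecent(List<Asset> assets, String type, int limit) {
        return assets.stream()
                .filter(a -> a.type.equals(type))
                .sorted(Comparator.comparing((Asset a) -> a.published).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Calling `ContentFilter.mostRecent(allAssets, "pressRelease", 10)` on each page view keeps the list current without anyone copying content by hand, which is the advantage of query-based reuse over explicit aliases.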
Users use pop-up edit forms to edit content components.
Contextual menus show where a piece of content is reused and warn a user if deleting an asset would have unintended consequences on other pages. Content is locked at the container level and lock status is indicated by color coded dots (yellow for locked), which are visible to authenticated contributors as they browse the site. Assets are also locked when they are in a review state of a workflow.
By default, Jahia uses FCKeditor to edit rich text fields but other editors can be plugged in and made available to users. FCKeditor is one of the more full featured WYSIWYG editors and is well maintained for cross platform support. FCKeditor is particularly well integrated into the Jahia editing interface with good browsing functionality for links and image references. When content is saved, Jahia records relationships created through the WYSIWYG editor and manages dependencies.
Jahia has simple form builder functionality that allows content contributors to create interactive forms as content. Data collected by the forms is available through a reporting interface. The user can select which form controls to display, whether a field is required, and the available values. This functionality is useful for contact forms and simple data collection, and is pretty good for an end user tool.
For managing binary assets such as documents and images, Jahia's integrated Document Management services support a WebDAV interface. Individual assets can be dropped in using Windows Explorer or another WebDAV client. Jahia can automatically explode the contents of an uploaded zip file so these assets can be individually managed. Indexable assets are indexed when they are added or modified on a page, but metadata cannot be added until the file is wrapped in a container. Each virtual site has a folder structure that includes a common shared folder and private areas for individuals and groups. Each individual or group folder has private and public folders to determine access for other users. This folder based access control model is adequate for simple uses, but having to move an asset in order to change its permissions is limiting.
All text based content is indexed by Jahia's Lucene-based, on-board search engine. The base product comes with file extractors such as Apache POI for indexing binary formats such as PDF or the Microsoft Office formats. Simple and advanced search interfaces are powerful enough for most common uses. Although there is a saved search feature, Jahia's out-of-the-box search functionality is not suitable as an ad hoc content reporting tool because there is no field level searching. However, the individual fields are indexed and exposed through Jahia's search API so this is possible with some systems integration effort.
Jahia's advanced search form allows users to define complex full text searches without the need to know a special query syntax. Search definitions can be saved for later use. However, a user cannot restrict the search to match within a single custom field.
Because of its common use in intranets, Jahia emphasizes its document management functionality and supports WebDAV management of binary files and containers that can store metadata attributes. Additional collaboration features like embedded chat and email notifications round out Jahia as a collaboration platform. More can be added by downloading them from the community site. Examples include a discussion forum, a calendaring server, and some RSS feed readers.
Jahia has decent page level versioning that allows an authorized user to revert back to a previous version of an asset or restore a deleted asset. The versioning capability shows differences between two versions and also the consequences of restoring a deleted version.
Jahia's versioning system provides a "difference" view showing the changes between two versions of an asset.
Jahia's localization system is built on the parallel model where each asset can have multiple translations. When you define a site, you select which languages the site will support and therefore the languages that assets can be translated into. When an asset is not translated, Jahia can optionally display the asset in its default language. The management interface of Jahia is maintained in six languages (English, French, German, Italian, Portuguese, and Spanish) and customers are able to translate the UI into other languages by adding Java resource bundles.
The workflow model is simple but well integrated with the localization functionality. Assets are approved or rejected on a per language basis. Workflow state is shown with color-coded dots. Red indicates an editing state; yellow means that the asset has been submitted for approval; green means the asset has been approved and published. Assets cannot be edited when under approval. If a reviewer wants to edit an asset that has been assigned to him for review, he must reject the asset, edit it, and then approve it again.
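The parallel localization model with an optional fallback to the default language can be sketched in a few lines. This is an assumption-laden illustration rather than Jahia's implementation: LocalizedAsset and its methods are hypothetical names, and the fallback flag stands in for whatever site-level setting controls the behavior.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// One asset holding parallel translations, keyed by language.
class LocalizedAsset {
    private final Locale defaultLocale;
    private final Map<Locale, String> translations = new HashMap<>();

    LocalizedAsset(Locale defaultLocale, String defaultText) {
        this.defaultLocale = defaultLocale;
        translations.put(defaultLocale, defaultText);
    }

    void translate(Locale locale, String text) {
        translations.put(locale, text);
    }

    // Returns the requested translation, or the default-language text when
    // the translation is missing and fallback is enabled.
    Optional<String> resolve(Locale requested, boolean fallbackToDefault) {
        String text = translations.get(requested);
        if (text == null && fallbackToDefault) {
            text = translations.get(defaultLocale);
        }
        return Optional.ofNullable(text);
    }
}
```

With fallback disabled, an untranslated asset simply does not appear in that language, which some sites prefer to showing mixed-language pages.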
The Jahia workflow approval interface allows a user to approve or reject multiple assets in multiple languages. If this example site had more languages, there would be more approval columns like the British flag shown here. Workflows can be assigned by section or by individual assets and determine what users or groups get notified when the asset has been submitted for approval.
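The per-language approval cycle can be modeled as a small state machine. The sketch below is hypothetical (WorkflowState and AssetWorkflow are not Jahia classes) and makes two assumptions beyond the text: a rejected asset must be resubmitted before it can be approved, and editing a published asset returns that language to the editing state.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// The three states and dot colors described in the text.
enum WorkflowState {
    EDITING("red"), AWAITING_APPROVAL("yellow"), PUBLISHED("green");

    final String dotColor;

    WorkflowState(String dotColor) { this.dotColor = dotColor; }
}

// Tracks workflow state separately for each language of one asset.
class AssetWorkflow {
    private final Map<Locale, WorkflowState> states = new HashMap<>();

    WorkflowState state(Locale language) {
        return states.getOrDefault(language, WorkflowState.EDITING);
    }

    // Editing is blocked while the language version is under approval.
    void edit(Locale language) {
        if (state(language) == WorkflowState.AWAITING_APPROVAL) {
            throw new IllegalStateException("locked while under approval");
        }
        states.put(language, WorkflowState.EDITING);
    }

    void submit(Locale language) {
        require(language, WorkflowState.EDITING);
        states.put(language, WorkflowState.AWAITING_APPROVAL);
    }

    void approve(Locale language) {
        require(language, WorkflowState.AWAITING_APPROVAL);
        states.put(language, WorkflowState.PUBLISHED);
    }

    void reject(Locale language) {
        require(language, WorkflowState.AWAITING_APPROVAL);
        states.put(language, WorkflowState.EDITING);
    }

    private void require(Locale language, WorkflowState expected) {
        if (state(language) != expected) {
            throw new IllegalStateException(
                    "expected " + expected + " but was " + state(language));
        }
    }
}
```

Because state is keyed by language, submitting the English version for approval leaves the French version editable, matching the per-language approval columns in the interface.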
Jahia supports field level access control. While the feature is powerful and useful, the administrative interface to make these settings will confuse anyone who has not served as a Unix or Linux system administrator. A common configuration for corporate customers is LDAP integration that allows Jahia to authenticate users against any LDAP compliant directory. Jahia really only uses LDAP in read only mode for authenticating login credentials. Users only edit their Jahia managed profile information from within the Jahia UI but not their LDAP directory profile information.
Presentation
A Jahia instance can support multiple sites. Each site is defined by a set of templates, users, groups, portlets, a site key, a set of languages, and a host name. Sites can share content with one another and can also be created as derived copies. Because each site gets its own branding and access control, Jahia's multi-site functionality makes it useful for multiple departmental sites on a corporate intranet.
Jahia's presentation functionality is based on the portal model. Each page is a collection of "containers," which are actual portlets or behave like portlets. Using the Apache Pluto portlet container, a Jahia page can host any JSR 168 compliant portlet complete with edit and public views. Jahia also ships with a few native portlets including a web-clipping portlet that can be used to embed any remote web based content or application. This technology works as a proxy to grab pages from the target application and return them within the Jahia page. Transactional applications can also be integrated in this way although the additional layer of rendering may slow performance. The Jahia team is currently working on an AJAX-based, Netvibes/iGoogle-inspired personal portal interface using the Google Web Toolkit libraries, but this feature is still in Beta.
Users can create their own portal pages and organize components by dragging and dropping them into position.
Jahia is working on a Netvibes-inspired personal portal framework (currently in Beta).
The use of different engines and portlets makes Jahia a powerful and flexible web application development platform. Jahia includes engines like Search, Advanced Search, Sitemap, XML Import and Export, and Workflow. Developers can add their own engines as well as develop applications that can run as JSR 168 compliant portlets.
The URLs of a Jahia site are based on its underlying MVC (Struts) architecture. At the start of the URL path is the engine name (such as jahia) that corresponds to a Java class named jahia_Engine.java. Then, as with most portals, URLs get ugly. A typical URL may look like /jahia/Jahia/site/mySite/pid/10. Jahia would interpret this URL as using the Jahia engine, the "mySite" site, and page number 10. Of course, what page 10 is about is anyone's guess. While human readable, search engine friendly URLs are not supported, Apache mod_rewrite or a Java URL rewrite filter can turn this URL into something like /Jahia/mySite/page_10.html, which is better but not necessarily descriptive.
Jahia templates are written in standard JSP with tag libraries and scriptlets (when necessary). This lowers the learning curve for most Java developers and provides lots of flexibility, but does not enforce the practice of keeping Java logic out of display code (as technologies like Velocity, FreeMarker, and XSLT do). Because of this, the separation of logic and display must be enforced by voluntarily accepted coding standards and code reviews. Otherwise, Jahia presentation templates could quickly become complex and unwieldy. Jahia uses Java's standard resource bundle framework for localizing labels, messages, image references, and other localized strings so they are not hardcoded into the template. JSP code is packaged into a .jar file and deployed to the Jahia environment.
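The URL rewriting mentioned above is a mechanical mapping between the two path shapes. As a rough sketch of what a Java rewrite filter would compute, the class below translates in both directions with regular expressions. FriendlyUrls is a hypothetical name, and the code assumes the internal form is /jahia/Jahia/site/<siteKey>/pid/<pageId>, which should be checked against the URLs a given installation actually generates.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FriendlyUrls {
    // Assumed internal form: /jahia/Jahia/site/<siteKey>/pid/<pageId>
    private static final Pattern INTERNAL =
            Pattern.compile("^/jahia/Jahia/site/([^/]+)/pid/(\\d+)$");
    // Friendly form from the text: /Jahia/<siteKey>/page_<pageId>.html
    private static final Pattern FRIENDLY =
            Pattern.compile("^/Jahia/([^/]+)/page_(\\d+)\\.html$");

    // Used when writing links into outgoing pages; unknown paths pass through.
    static String toFriendly(String internalPath) {
        Matcher m = INTERNAL.matcher(internalPath);
        return m.matches()
                ? "/Jahia/" + m.group(1) + "/page_" + m.group(2) + ".html"
                : internalPath;
    }

    // Used on incoming requests before they reach the engine.
    static String toInternal(String friendlyPath) {
        Matcher m = FRIENDLY.matcher(friendlyPath);
        return m.matches()
                ? "/jahia/Jahia/site/" + m.group(1) + "/pid/" + m.group(2)
                : friendlyPath;
    }
}
```

The mapping carries only the site key and page number, which is why the result stays non-descriptive: making URLs meaningful would need a lookup from page number to a title or slug.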
Of all of the products evaluated in this report, Jahia has the richest set of community and collaboration functionality. Part of this is because of its common use as a platform for building intranets. Another advantage is its portal based architecture that provides a framework for profile management and a container for building and deploying interactive applications. The portlet exchange could turn into a valuable resource for Jahia customers and provide energy similar to what is enjoyed by the Drupal and Plone projects. However, the non-open source business model will hinder Jahia's popular appeal with non-profits and community oriented sites. Jahia's large corporate customers may be less able to share their intellectual property with potential competitors.
Jahia hosts a developer exchange site where the community can submit portlets, templates, and other code for the community to share. Many submissions are from Jahia Ltd. although there are some third party contributors represented in the catalog, as well. The portlets submitted by Jahia are published under a true open source license (Sun's Common Development and Distribution License [http://www.sun.com/cddl/]). A status rating indicates the contributor's assessment of whether the contribution is stable, unstable, or Beta quality.
Jahia's multi-layered caching system is quite powerful. Pages can be cached for different users and groups and there is support for the ESI (JSR 128) [http://www.akamai.com/html/support/esi.html] standard for caching page fragments. Underneath the page layer, data caching services are provided by the Hibernate object relational mapping layer. The largest Jahia site gets 500 hits per second on each of its three clustered nodes.
Conclusion
Table 3.17. Jahia 5.0.3 Summary
Contributor Navigation: In-context editing is intuitive for most users and Jahia has addressed content reuse issues typically associated with this model.
Structured Content: Content objects, called containers, are defined by templates that specify fields and their data types. The rich text editor is particularly well integrated. Access control can be set at the field level. Creating validation logic is more difficult than in other products in this category. Overloading the display templates to define content types may be convenient for smaller sites but may become unwieldy on large complex sites.
Configuration Management: Jahia's robust replication model makes it possible to push settings made through the web administration interfaces to QA and production environments.
Jahia's event listener framework is a powerful mechanism to wire in custom code. Portlet support provides a container to build and integrate custom applications.
Jahia's use of portal technology is both an advantage and a disadvantage. The portal architecture is flexible and conducive to building new functionality and integrating existing applications, but it is just one way to solve the problem. Jahia customers are somewhat locked into the portal approach. Luckily, it is not a bad approach.
Widely Used Technologies: Jahia has done an admirable job of selecting popular technologies and keeping the architecture current.
Project Transparency: Jahia's non-standard licensing approach and the vagueness of the "sustainable source" model are less transparent than your average open source project. However, Jahia is not a full open source product, and the fact that non-customers can see the source code base and that Jahia makes all its documentation public (even in draft form) makes information much more available than with commercial products.
Books: None
Online Documentation: Wiki and PDF Guides for users, administrators and developers.
User Forums: There is a mailing list but most of the support appears to be delivered directly to paying customers.
Key: Nonexistent; Below Average; Average; Above Average; Exceptional.
In many respects, Jahia is a unique product in this space. Most interesting is the approach of combining portal and content management functionality from the beginning rather than starting on one side and then working over to the other. The result is a well designed, well executed hybrid of these two types of applications. As a development platform, the portal model provides great flexibility to build functionality and integrate with other systems. The tight integration with content management services allows Jahia to avoid many of the traps that other content-oriented portals fall into: poor link management and simplistic cache management, for example. However, the portal model is not ideal for every problem and those looking to use different frameworks on the delivery tier will find Jahia limiting.

The other way in which Jahia is unique is in its licensing model. For practical purposes, Jahia is more of a "visible source" commercial software application than a true open source one. Unless you do not mind having a "powered by Jahia" logo on every page, expect to purchase one of the commercial products. However, those companies that use Jahia for their intranets can enjoy using the software for free and still have the option to buy support if they need it.

For most North American customers, Jahia is probably still a new name. The establishment of a U.S.-based office and some recent (although not yet public) client wins may change that soon.
Round Up
In the last five years, the open source Java WCM market has grown from a disappointing collection of small niche projects to a set of legitimate options for enterprise buyers. General interest in open source, as well as the marketing efforts of Alfresco and other commercial open source vendors, has brought attention to this sector of the market. The selection is broadest for customers looking to build basic informational and moderately interactive web sites and for architects trying to plug content services into more elaborate web sites and web applications.

Community-oriented functionality is generally lagging in terms of what is provided out of the box. It will probably continue to be a weakness on the Java stack because many community-oriented sites are being built on technologies like PHP and Ruby on Rails. Still, Java WCM technologies may have a role in these lighter weight architectures by providing back-end content services and other basic infrastructure. For example, customers are building highly interactive applications in PHP using Alfresco's PHP API and Web Scripts, and one of the primary Hippo customers is thinking of building its delivery tier on Ruby on Rails.

Despite the temptation, treating open source products as a single category of software is a mistake because these products are so different. Of the seven platforms evaluated in this report, six (all but Apache Lenya) can be supported by commercial-style support and maintenance agreements. Four (Alfresco, Jahia, Magnolia, and OpenCms) can be purchased as commercial software applications. Alfresco and Jahia operate the most like commercial software companies, but Magnolia and OpenCms also sell commercial Enterprise Editions and can deliver a commercial software customer experience. These commercial open source companies encourage their customers to engage in their open source communities, but it is not a requirement and many of their customers do not.
Daisy, Hippo, Apache Lenya, Magnolia, OpenCms, and (if you don't mind the badge) Jahia all have free versions of the software that can realistically be used in mission critical production environments. Customers that have the knowledge and bandwidth can potentially save money by self-supporting the software or buying consulting support when needed. Daisy, Hippo, Jahia, and OpenCms offer support for their free versions. Magnolia users must convert to the Enterprise Edition to qualify for a support package. However, since the code base is the same, this does not require migrating their implementation to another version of the software.

If your company is able to execute a well run software implementation project and self-support the solution, a product like Hippo CMS or Jahia Community offers potential savings on the order of $150,000 in up-front licensing and $30,000 per year for maintenance and support. Alfresco is the costliest of the products in this report, but it typically competes against the most expensive commercial products. Products in the informational brochure category can save between $30,000 and $60,000 in licensing plus $6,000 to $12,000 in annual maintenance. The savings are smaller because commercial competitors at this level are cheaper. Companies with less aptitude and appetite for owning the technology can still save money, as the support and licensing costs are considerably lower than those of the commercial software analogs.

For customers that would like to delegate more responsibility for maintaining the application, success usually hinges on connecting with the right systems integrator. In the U.S., Alfresco has the most options for qualified integration partners. In Europe, OpenCms has the largest community of independent integrators that bring the product into accounts that they win. Magnolia's market presence is growing rapidly. Daisy and Hippo can field capable implementation teams, but their networks are much smaller. The Lenya community is dwindling to a few independent contractors and small systems integrators that still build sites on the platform.
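The savings figures quoted above can be turned into a rough multi-year comparison. The sketch below is illustrative only: the three-year horizon, the class name `SavingsEstimate`, and the method `totalSavings` are assumptions made for this example, while the dollar amounts are the estimates given in the text.

```java
// Rough total-savings arithmetic using the estimates quoted in the text.
// The three-year horizon is an assumption chosen only for illustration.
class SavingsEstimate {

    // Savings = avoided up-front license fee + avoided annual maintenance over the horizon.
    static long totalSavings(long upFrontLicense, long annualMaintenance, int years) {
        return upFrontLicense + annualMaintenance * years;
    }

    public static void main(String[] args) {
        int years = 3; // assumed planning horizon

        // Self-supported product such as Hippo CMS or Jahia Community.
        long selfSupported = totalSavings(150_000, 30_000, years);

        // Informational brochure category: low and high ends of the quoted ranges.
        long brochureLow = totalSavings(30_000, 6_000, years);
        long brochureHigh = totalSavings(60_000, 12_000, years);

        System.out.println("Self-supported, " + years + " years: $" + selfSupported);
        System.out.println("Brochure category, " + years + " years: $"
                + brochureLow + " to $" + brochureHigh);
    }
}
```

Under these assumptions the self-supported case comes to $240,000 over three years, and the informational brochure category to between $48,000 and $96,000. Implementation and internal staffing costs are deliberately left out, so the numbers are an upper bound on the licensing side of the comparison rather than a full cost model.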
opportunity. Every company, non-profit and institution needs at least one online informational brochure to explain what they do. Other than Lenya, which has been in a state of limbo for a couple of years, Daisy has the smallest market presence of the products reviewed in this category. A major reason is its foundation on the complex Cocoon platform, which limits the number of SIs capable of integrating the software. Still, through its affiliation with Schaubroeck, the install base is stable and growing. What is most interesting about Daisy, however, is its reach into wiki and knowledge base applications, which makes it a somewhat unique offering. In this capacity, Daisy competes very favorably with commercial wiki products like Confluence [http://www.atlassian.com/software/confluence/], Traction Software's TeamPage [http://traction.tractionsoftware.com], and MindTouch's Deki Wiki [http://wiki.mindtouch.com/]. With sophisticated access control, faceted navigation, and support for structured content types, Daisy has better potential for managing persistent knowledge resources than traditional wikis, which tend to excel as temporal collaboration spaces.
Glossary
Baking and Frying
"Baking vs. frying" refers to when presentation templates are applied to render pages out of structured content. Baking-style rendering systems generate pages when content is published. Frying systems generate pages on the fly when they are requested by the end user. Whether a system bakes or fries content tells a lot about its architecture and what it is good at. Baking systems are great for high volume sites that do not need to personalize content. Frying systems excel when requirements include personalization, access control, and other presentation logic that uses information about the user in order to decide what to show and how.
BPEL
BPEL (or Business Process Execution Language) is an XML language for defining workflows. Workflow engines read in BPEL definitions and use them to drive workflow logic. BPEL is most commonly used in service oriented architectures to orchestrate processes across different de-coupled services. BPEL can also be used within an application to coordinate workflow states, events, and transitions.
CForms
CForms (or Cocoon Forms) is a form handling system for the Cocoon web application framework. Although very powerful, CForms is more complex than the form handling systems of the other general purpose web application frameworks. The core design is that the programmer defines a "model" that describes a form as a set of form control widgets that will be presented to the user. Then the developer writes a template that controls how the form is displayed to the user. While much functionality can be achieved by writing minimal amounts of Java code, the XML that one does have to write can be very complex.
Committer
In most open source projects, only a few trusted developers have rights to check in (or "commit") code updates to the source code repository. The people with this "commit" status are called committers. Non-committers can submit patches to the code base and their submissions are reviewed by committers who either accept or reject them. Depending on the size of the project, the committer team can be small or large. Different governance structures have different ways of selecting committers.
FreeMarker
FreeMarker is a templating language that tries to do a better job of separating business logic from layout than JSP. Unlike JSP, FreeMarker prevents a developer from writing scriptlets or other procedural code in the template. The developer is forced to call Java classes or use an expression language like JEXL. The value of FreeMarker has diminished somewhat with improvements to JSP such as improved tag libraries like the JavaServer Pages Standard Tag Library (JSTL).
Maven
Maven is displacing Apache Ant as the industry standard for scripting automated builds. More sophisticated than Ant, Maven also checks against remote code repositories to pull down the appropriate libraries. When Maven works, it works like magic; when it doesn't work, it is a programmer's worst enemy. If you have spent any time on a mailing list or IRC channel of any Java project, you probably have heard complaints about Maven.
JCR
The JCR is a relatively new Java standard (JSR 170) that defines a repository for managing content. The JCR is well suited for semi-structured content that is hierarchical in nature. Unlike relational databases, JCR repositories natively support content management specific functions like versioning, workspaces, and content deployment. There are various levels of JCR support. The important distinctions are that level one is read-only access, level two is read and write and specifies an access control model, and there are some optional features like observation (where you can monitor a set of assets and then be notified if there is a change). The JCR specification has not yet enjoyed widespread adoption. The biggest proponent is Day Software, whose CTO David Nuescheler is the specification lead. Day also has a number of its own developers working on the reference implementation, Apache Jackrabbit. Day also sells a commercial JCR implementation called CRX as well as JCR adaptors for other repositories like Documentum, FileNet, Lotus Notes, TeamSite, SharePoint, OpenText Livelink, and Vignette. Outside of Day, however, use of the JCR has been limited. The big news for the JCR community is that Oracle now supports the JCR standard with the Oracle 11g XML DB product. There is more JCR interest within internal corporate software engineering departments that are building custom systems and want to reduce risk by sticking to standards. If the JCR is to become a truly successful standard, it will require these corporate architects to put pressure on software vendors like Mark Logic to support the specification.
JSF
JSF, or JavaServer Faces, is a Java standard for a web programming model that is similar to .NET. The basic idea is that there is an event model that triggers "code behind" at the server. JSF implementations (such as the popular Apache MyFaces [http://myfaces.apache.org/]) generate large amounts of HTML to build data-bound HTML controls that use JavaScript to trigger server side methods through an HTTP POST. While all this code generation greatly increases developer productivity, it adds complexity under the covers. Tool vendors like the idea of JSF because it allows them to provide a WYSIWYG programming environment similar to Visual Basic, where a developer can drag controls onto a panel and set bindings and properties.
JX Template
JX Template is the official templating language for the Cocoon framework. Easier to understand than its predecessor, XSP (eXtensible Server Pages), JX Templates use an XML-based syntax for basic conditional logic and control flow but try to limit the amount of business logic written in the template.
JSR 168
JSR 168 defines a standard interface that a Java portlet can implement in order to be able to run within different portal products. JSR 168 is supported by all of the major Java portal products, and the Apache Pluto project provides a portlet container that can run a JSR 168 compliant portlet in any Java web application. The portlet standard is now being improved by JSR 286.
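As an illustration of the "Baking and Frying" entry in this glossary, the following sketch contrasts the two rendering strategies. It is not taken from any of the products reviewed; the class names (`BakeVsFry`, `BakedSite`, `FriedSite`) and the trivial `render` method standing in for a presentation template are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the two rendering strategies; all names are hypothetical.
class BakeVsFry {

    // Stand-in for a presentation template applied to structured content.
    static String render(String title, String user) {
        String greeting = (user == null) ? "" : " (for " + user + ")";
        return "<h1>" + title + "</h1>" + greeting;
    }

    // Baking: the template is applied once, at publish time.
    static class BakedSite {
        private final Map<String, String> pages = new HashMap<>();
        int renderCount = 0;

        void publish(String id, String title) {
            renderCount++;
            pages.put(id, render(title, null)); // no user is known at publish time
        }

        String request(String id, String user) {
            return pages.get(id); // every visitor receives the same stored page
        }
    }

    // Frying: content is stored raw and the template is applied on each request.
    static class FriedSite {
        private final Map<String, String> content = new HashMap<>();
        int renderCount = 0;

        void publish(String id, String title) {
            content.put(id, title);
        }

        String request(String id, String user) {
            renderCount++;
            return render(content.get(id), user); // can use information about the user
        }
    }

    public static void main(String[] args) {
        BakedSite baked = new BakedSite();
        baked.publish("home", "Welcome");
        baked.request("home", "alice");
        baked.request("home", "bob");

        FriedSite fried = new FriedSite();
        fried.publish("home", "Welcome");
        System.out.println(fried.request("home", "alice"));
        fried.request("home", "bob");

        System.out.println("baked renders: " + baked.renderCount);
        System.out.println("fried renders: " + fried.renderCount);
    }
}
```

The baked site does its rendering work once per publish regardless of traffic, which is why the approach suits high volume sites. The fried site renders on every request, which costs more but lets the output depend on who is asking.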
REST
REST, Representational State Transfer, is a slimmed down web services style architecture. Unlike classical web services that communicate over the SOAP protocol, a REST interface consists of a URL-based API accessible via HTTP GET or POST actions. These methods return XML documents or execute other functions on the services. You could think of the web itself as being REST based, with the response being simple HTML (or preferably XHTML) documents. Without all the overhead of building SOAP packages, REST APIs are easier to develop and are generally preferred by pragmatic programmers. Most of the major web service providers (like Amazon) are seeing much greater adoption of their REST interfaces than their SOAP interfaces.
Slide
Jakarta Slide is an Apache project to implement the WebDAV protocol. Slide is actually the reference implementation for the WebDAV standard. Slide is a mature and robust platform capable of high performance and large volumes of content.
Sprint
A sprint is an event when a group of open source developers get together to do some major work on the platform. The duration is anywhere from a day to three or four days. Sprints usually have an established theme that describes the scope of work that will be attempted. Often a company that has an interest in building this functionality will sponsor and host the sprint. The term "sprint" is shared with various Agile development methodologies and there is a considerable amount of overlap in process. At the start of a sprint, the leaders communicate a game plan and organize the team to work on various tasks. The duration is a fixed amount of time. Scope is variable. Open source sprints also usually follow the Agile practice of pair programming, where developers work in teams of two on a single computer. In addition to all the work that gets done, there are important side benefits of sprinting. Information sharing, identifying leaders, and creative problem solving all result when people with different experiences and backgrounds work closely together. Many developers feel they have learned most of what they know from participating in sprints. There are also positive social aspects of sprints, and developers travel from all over the world to participate in exotic locations. The deal is that they (or their employer) pay for their travel but, once they are there, their food and lodging are taken care of by the sprint organizers.
Velocity
Like FreeMarker, Velocity is a templating language that tries to do a better job than JSP of separating business logic from layout. Unlike JSP, Velocity prevents a developer from writing scriptlets or other procedural code in the template. The developer is forced to call Java classes or use an expression language like JEXL. Velocity is an Apache project and enjoys a wider install base than FreeMarker. Velocity is used in some commercial CMS applications such as Rhythmyx or Clickability's cmPublish product. The value of Velocity has diminished somewhat with improvements to JSP such as improved tag libraries like the JavaServer Pages Standard Tag Library (JSTL).
WebDAV
WebDAV (Web Distributed Authoring and Versioning) is an extension to the HTTP standard that allows documents to be updated over the web. WebDAV is a very important standard because it is used by many technologies. If you are a Microsoft Windows user and you are connecting to "Web Folders," you are connecting to a WebDAV server. WebDAV is well supported by many desktop applications including Microsoft Office. Many CMS vendors use WebDAV as a way for users to easily drag files from their hard drive into the repository.
WSRP
WSRP (or Web Services for Remote Portlets) is an OASIS XML standard for how portlets can communicate with back end services through web services. Unlike JSR 168, WSRP is technology agnostic because it is a communication protocol rather than a programmatic interface. A JSR 168 portlet can talk to a web service over WSRP.
XPDL
XPDL (or eXtensible Process Definition Language) is an XML format that allows graphical modeling tools to store and exchange workflow process definitions.
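To make the "REST" entry in this glossary more concrete, the sketch below builds the two kinds of requests it describes: a GET against a resource URL and a POST of an XML document. It assumes Java 11 or later for the `java.net.http` package, and the host `cms.example.com`, the `/api/articles` paths, and the class name `RestRequestSketch` are hypothetical; no product reviewed in this report is known to expose this exact API. The requests are only constructed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch of a URL-based REST API call. The host and paths are hypothetical.
class RestRequestSketch {

    static final String BASE = "http://cms.example.com/api"; // assumed endpoint

    // Reading a resource: the URL names the resource and GET retrieves it.
    static HttpRequest readArticle(String id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/articles/" + id))
                .header("Accept", "application/xml")
                .GET()
                .build();
    }

    // Creating a resource: POST an XML document to the collection URL.
    static HttpRequest createArticle(String xml) {
        return HttpRequest.newBuilder(URI.create(BASE + "/articles"))
                .header("Content-Type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(xml))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest get = readArticle("42");
        HttpRequest post = createArticle("<article><title>Hello</title></article>");
        System.out.println(get.method() + " " + get.uri());
        System.out.println(post.method() + " " + post.uri());
    }
}
```

Nothing beyond a URL, an HTTP method, and an optional XML body is needed to describe each call, which is the contrast with SOAP envelopes that the entry draws.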
Colophon
This book was written in DocBook format using Oxygen XML Editor. The content was transformed into PDF using XSLT based on Norm Walsh's DocBook XSL examples.