Web Information Management: Content management systems
11.3 Content management systems
All organizations that have a website (and that is nearly all organizations) must have a content management system, whether it be a manual process enacted by a single web manager or a decentralized system for producing and publishing web content supported by sophisticated technologies. CMS is primarily a process, not a product, and we define it as 'an organizational process, aided by software tools, for the management of heterogeneous content on the web, encompassing a lifecycle that runs from creation to destruction' (Goodwin & Vidgen, 2002).
As with many new IT trends, web content management systems (CMS) are in part a practical response to a pressing business problem – how to organize and manage large-scale web sites – and in part a technology push on the part of software suppliers. Early-to-market software suppliers, such as Interwoven (www.interwoven.com) and Vignette (www.vignette.com), are finding many others jumping on the CMS bandwagon. Some of these offerings are existing software products that are being re-positioned (or possibly just re-branded) in the CMS domain. Some of the 'new' CMS products are rooted in document management, while others have developed from customer relationship management, e-commerce, and software configuration management. In the crowded CMS marketplace there is some confusion about what constitutes CMS – many suppliers define it in terms of the product they sell – hence our definition above.
11.3.1 Issues in web content management
Many organizations have created a website and most have established some infrastructural support for their website, such as a web manager or a web services department. There has been an explosion of content on web sites as the potential of the web for internal and external communication is recognized. For a web site to 'live and breathe' it must be fed with new content and out of date content must be removed. Organizations therefore need to encourage the activities of content providers. However, increased activity in content generation has raised a number of issues, all of which can cause problems if not properly managed:
• Bottlenecks. The web management function can become a bottleneck for content revision. Content arrives in different forms and has to be edited – usually manually – into a form suitable for publishing on the web. Funnelling content through a web manager resource can lead to delays in publishing on the web
• Consistency. Where web editing is devolved to departments there can be inconsistencies in the look and feel of the site and variable quality of layout and content
• Navigation. If structure and content are not closely controlled, there is a danger that navigation and search capabilities will suffer. This is of major importance, as without these, it becomes hard for the user to find the required information thus degrading the value of the entire Intranet
• Data duplication. In many cases, the content on the web is a copy of data held in a departmental or institutional system; changes to one system are manually replicated in the other systems. Ideally, data will not be stored redundantly in the organization. There will be one source accessed by all business applications, whether internal or external. Where data needs to be copied then replication should be automated and controlled
• Content audit and control. Unauthorized content may appear on the website. Material published on the web should be subject to a review and authorization process to ensure that it is acceptable from a marketing and legal viewpoint. Procedures and controls need to be defined to manage the web publishing process as well as user roles, such as creator, editor, and approver
• Tracking. To use content effectively it is necessary to know things about the content, such as who created it, when it was created, and when it was last updated. The ability to track and reconstruct the changes that have occurred to content is an important part of content management
• Business processes. Content is often tied tightly to business processes. For example, the production of a market intelligence report is a complex business process, involving data collection, data analysis, and the generation of commentaries and forecasts. Not only is the 'final' report published on the web, but also updates and revisions are likely to be needed on a regular basis. The business process and web content management need to be integrated, allowing content to be published internally for inspection and review and only released once it has been approved. Furthermore, the process itself may need to be redesigned to take account of differences between paper and web publishing.
Although many CMS packages are aimed at large organizations with tens or hundreds of thousands of pages, smaller organizations are also running into the problems caused by a lack of control over the content management process.
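The content audit and tracking issues above can be illustrated with a minimal sketch. It assumes three roles (creator, editor, approver, as named in the list) and a hypothetical mapping of roles to permitted actions; the class name, status values, and user names are illustrative only and are not taken from any particular CMS product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Assumed mapping of roles to permitted actions (illustrative only).
PERMISSIONS = {
    "creator": {"submit"},
    "editor": {"edit"},
    "approver": {"approve", "reject"},
}


@dataclass
class ContentItem:
    title: str
    body: str
    status: str = "draft"  # draft -> submitted -> approved or rejected
    # Audit trail of (timestamp, user, role, action), kept for tracking.
    history: List[Tuple[datetime, str, str, str]] = field(default_factory=list)

    def act(self, user: str, role: str, action: str,
            new_body: Optional[str] = None) -> None:
        """Apply an action if the role permits it, and record it."""
        if action not in PERMISSIONS.get(role, set()):
            raise PermissionError(f"role '{role}' may not '{action}'")
        if action == "submit":
            self.status = "submitted"
        elif action == "edit" and new_body is not None:
            self.body = new_body
        elif action in ("approve", "reject"):
            if self.status != "submitted":
                raise ValueError("only submitted content can be reviewed")
            self.status = "approved" if action == "approve" else "rejected"
        self.history.append((datetime.now(timezone.utc), user, role, action))

    def is_publishable(self) -> bool:
        # Only approved content should reach the public site.
        return self.status == "approved"


item = ContentItem("Press release", "Draft text")
item.act("ann", "creator", "submit")
item.act("raj", "editor", "edit", new_body="Edited text")
item.act("lee", "approver", "approve")
print(item.is_publishable(), [entry[3] for entry in item.history])
# prints: True ['submit', 'edit', 'approve']
```

Keeping the audit trail inside the content item is only one possible design; a real system would more likely store it in the repository alongside other metadata.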
11.3.2 Genesis of web content management systems
CMS solutions must be capable of dealing with data with different degrees of structure. Some content will have a high degree of structure, such as employee records, and be amenable to being broken down into tables and stored in a traditional relational database system. Other content will be low in structure, such as a video clip of an Internet consultant talking about security issues. In between will be a range of content, such as a health and safety manual or an in-house magazine, which will display greater or lesser degrees of structure. A CMS must be capable of managing content in different media and with different degrees of structure, which suggests a combination of traditional data management and document (hypermedia) management technologies.
Many e-commerce products have specialized CMS facilities to address the maintenance of product and customer data. There is also much to learn from software configuration management, which is concerned with aspects of change such as versioning and audit trails. From an enabling perspective, technologies that support semantic interoperability, such as XML (eXtensible Markup Language) and RDF (Resource Description Framework – Decker et al., 2000), provide a basis for sharing and the automated interchange of content between partners, i.e., humans are not the only users of web sites.
In figure 11.7 we show how the three antecedents of CMS – document management, customer relationship management, and software configuration management – together with enabling technologies and web semantics can be combined to provide a platform for CMS systems.
The antecedents also point to the range of disciplines that are encompassed by CMS. Research areas in hypermedia and document management, software engineering, marketing, and business process design are all relevant, underpinned by data management and web semantics. This diversity of interest suggests that although the idea of CMS is intuitively simple to grasp, the different emphases and permeable boundaries will make it a difficult area to tackle both in research and in practice.
11.3.3 A web content management systems framework
The integrated view of web content management developed above provides a basis for developing a framework that brings together a range of content management related issues and requirements (figure 11.8).
Content life-cycle
At the core are the content life-cycle and a logically unified repository. The content life-cycle covers all stages, from creation to destruction of content components. New content will arrive from a number of sources, including electronic documents (e.g., from word-processors, such as MS Word), paper documents, web page templates (e.g., press releases, new product descriptions), web design tools (e.g., MS FrontPage, Macromedia Dreamweaver), and direct edit on the web into the repository. In many cases, the source will require some sort of review. This may be a review in terms of acceptable content, or it may be a review to determine the optimum place for the data within the structure of the system. The original data may require storage prior to publication and this may need to be continued after publication if it is published in a different form. With regard to publication, apart from the obvious requirement of making the content available, this should include:
• Authentication – which is concerned with identifying the user through a mechanism such as a user id and password or biometrics
• Personalization – which relates to the ability to present different users with different views and different data depending on preferences, access profiles, role, previous accesses, etc.
• Transformation – which is concerned with constructing (e.g., combining subcomponents into new documents) and transforming content at the time of delivery.
There is often a requirement to archive the data. This could happen automatically after a given date, or it could be a manual process, with the archives stored online or offline. Finally, at some stage, there may be a need to remove (destroy) the content permanently. The repository is a collection of data stores that cater for components with more or less structure, including relational and object-oriented databases, document stores, file systems, etc. The CMS must give seamless access to the content components regardless of where and how they are stored.
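The life-cycle described above, from creation through review, publication, and archiving to destruction, can be sketched as a small state machine. The stage names follow the text, but the set of permitted transitions is an assumption made for illustration; an organization would define its own.

```python
from enum import Enum


class Stage(Enum):
    CREATED = "created"
    REVIEWED = "reviewed"
    PUBLISHED = "published"
    ARCHIVED = "archived"
    DESTROYED = "destroyed"


# Assumed transitions: content may be sent back for rework, restored
# from the archive, or destroyed at any stage.
TRANSITIONS = {
    Stage.CREATED: {Stage.REVIEWED, Stage.DESTROYED},
    Stage.REVIEWED: {Stage.CREATED, Stage.PUBLISHED, Stage.DESTROYED},
    Stage.PUBLISHED: {Stage.ARCHIVED, Stage.DESTROYED},
    Stage.ARCHIVED: {Stage.PUBLISHED, Stage.DESTROYED},
    Stage.DESTROYED: set(),  # terminal: nothing follows destruction
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the move is not permitted."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


stage = Stage.CREATED
for nxt in (Stage.REVIEWED, Stage.PUBLISHED, Stage.ARCHIVED):
    stage = advance(stage, nxt)
print(stage.value)  # prints: archived
```

Making the transitions explicit is what prevents, for example, unreviewed content from being published directly.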
Organizational integration
Content is generated by and in support of business processes, which indicates a workflow and application perspective. To take advantage of the web effectively it is likely that some of the workflows will need to be redesigned, possibly to incorporate computer mediated communication and collaboration. This suggests that content management may go beyond routine automation and require, be constrained by, or initiate organizational change. As with any information technology, the consequences are likely to be unpredictable and are as much social as they are technical. The CMS core (content life-cycle and content component repository) must therefore be integrated with business processes and workflows and the existing business information systems that support those processes.
CMS process management
Management of the CMS process suggests a range of activities that need to be catered for, including data management, metadata management, and site management.
Data management is essential if content is to be exchanged, reused, shared, and searched intelligently. It is increasingly likely that web content will be marked up using XML and presentation will be defined using the style language XSL. To define and manage the XML data structures a modeling notation is required, such as the Unified Modeling Language (UML). If UML class diagrams are used to model the data structures (possibly represented in RDF, the Resource Description Framework), then XML documents can be thought of as physical implementations of an underlying and consistent enterprise class diagram. Conallen (www.rosearchitect.com) has shown how UML diagrams can be mapped into the hierarchical structure of XML documents. These hierarchies need to be shown to be consistent with the class diagram, i.e., that the XML schema can be derived by navigating through the class diagram. The combination of UML/RDF/XML is a promising approach to enterprise-wide data management.
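As a rough illustration of how a class model can be mapped into a hierarchical XML document, the sketch below uses two hypothetical classes (Report and Section, with a one-to-many association) and serializes an instance by navigating from the parent class to its children. The class and element names are assumptions made for the example and do not reproduce any published mapping.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    heading: str
    text: str


@dataclass
class Report:
    # A Report is associated with many Sections in the (assumed) class model.
    title: str
    sections: List[Section] = field(default_factory=list)


def report_to_xml(report: Report) -> ET.Element:
    """Walk the class structure and emit the matching XML hierarchy."""
    root = ET.Element("report")
    ET.SubElement(root, "title").text = report.title
    for section in report.sections:
        node = ET.SubElement(root, "section")
        ET.SubElement(node, "heading").text = section.heading
        ET.SubElement(node, "text").text = section.text
    return root


report = Report("Market intelligence", [Section("Forecast", "Demand is rising.")])
xml_text = ET.tostring(report_to_xml(report), encoding="unicode")
print(xml_text)
```

The nesting of section inside report mirrors the association in the class model, which is the sense in which the XML hierarchy is derived by navigating the class diagram.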
Metadata is the information that needs to be stored about a data item for an agent to use it. For example: expiry date, source, revision history, title, keywords, date created, time to archive, and version number. The metadata might also define how content can be combined to make new/virtual documents; for example, can this section be combined with that one? In what order? Is there a pre-requisite/co-requisite? 'Bursting' documents into subcomponents has great potential for data reuse but is going to be difficult to implement and control. Consequently, content management systems should be expected to have sophisticated metadata capabilities.
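A minimal sketch of such a metadata record follows, using a subset of the attributes listed above. The field names and the archive rule (archive once the time-to-archive date has passed) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Metadata:
    title: str
    source: str
    date_created: date
    archive_after: date  # assumed reading of "time to archive"
    version: int = 1
    keywords: List[str] = field(default_factory=list)


def due_for_archive(items: List[Metadata], today: date) -> List[str]:
    """Return titles of items whose archive date has been reached."""
    return [m.title for m in items if m.archive_after <= today]


catalogue = [
    Metadata("Safety manual", "HR", date(2001, 1, 10), date(2003, 1, 10),
             keywords=["health", "safety"]),
    Metadata("Press release", "Marketing", date(2001, 6, 1), date(2001, 9, 1)),
]
print(due_for_archive(catalogue, date(2002, 1, 1)))
# prints: ['Press release']
```

With metadata held in a structured form like this, an automated agent can decide when to archive content without a person inspecting each page.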
Site management is concerned with web site design and structure. Content must be separate from style and be device-independent. It should be possible to change the look and feel of the site by changing a style sheet. It should be possible to add a new device, such as a personal digital assistant (PDA), without affecting the content structure.
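The separation of content from style can be sketched as follows: the content is held once as plain data, and each device has its own renderer. The device names, renderer functions, and sample content are hypothetical; in practice this role is more likely to be played by style sheets such as XSL than by hand-written functions.

```python
from html import escape
from typing import Callable, Dict

# Content is stored once, with no presentation details.
content = {"title": "Opening hours", "body": "Mon-Fri 9-5"}


def render_desktop(item: Dict[str, str]) -> str:
    # Full HTML markup for a desktop browser.
    return f"<h1>{escape(item['title'])}</h1><p>{escape(item['body'])}</p>"


def render_pda(item: Dict[str, str]) -> str:
    # Compact plain text for a small-screen device.
    return f"{item['title'].upper()}\n{item['body']}"


# Adding a device means adding a renderer; the content is untouched.
RENDERERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "desktop": render_desktop,
    "pda": render_pda,
}


def render(item: Dict[str, str], device: str) -> str:
    return RENDERERS[device](item)


print(render(content, "desktop"))
print(render(content, "pda"))
```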
The content life-cycle suggests that there will be a number of roles to be considered: content contributor, content publisher, content consumer, web designer, web developer, content manager, design/navigation manager. This list is not necessarily exhaustive and neither does it imply that in all cases, all these roles will require separate individuals. However, the list does indicate the type of roles that will need to be considered in managing the CMS process.
11.3.4 CMS illustration: Intranet development
We will use our model to critically examine an Intranet development programme that has been ongoing for some time. The company concerned is an autonomous division of a major multi-national organization. Early in 2000, senior executives within the division decided to 'reinvent' the Intranet and take it forward in a more proactive manner. Views tended to be short term – 'Let's just get something up and running' was a common comment. However, despite this short termism, a top-level structure was formulated with the agreement of all of the relevant departmental and business process managers. Furthermore, each of the managers agreed to take responsibility for the content of their areas.
The project started well. A prototype site was built and agreed and templates were provided to allow various groups to contribute content in a consistent fashion. However, despite advice to the contrary, further progress was poor and that can be explained through the following factors:
• There was little or no investment
• Executive interest and attention drifted away
• There was no project champion
• There was little or no central control.
Some sections of the Intranet, typically those with proactive owners or resident 'techies' who liked to play with web pages, expanded greatly and at the end of 2000 were up-to-date and perceived to be useful. Other sections of the Intranet had, after 12 months, no content whatsoever.
In terms of content management, the CMS process is manual, localized, and has little overall control. A common site look and feel and consistent navigation are achieved in part through web page templates, although this still requires content providers to be adept at using a web-authoring tool. Groups that happen to have web-authoring skills do quite well in terms of content, but frequently they rely on a web enthusiast whose actual duties are often very different. This can lead to a loss of activity in the areas of work that the content provider should be involved in and an under-recording of the time actually spent in generating and maintaining content. Groups without web-authoring skills have to rely on the very small central 'corporate' resource. This underlines the problems of the 'web manager bottleneck' mentioned above.
The Intranet is providing value in localized areas, but has not matured into a corporate resource and first point of call for help and information.
The CMS framework as a diagnostic tool
In our case study, it is clear that the organization has given little thought to many of the areas of figure 11.8. There has been almost no thought of business processes and workflows other than in the initial choice of top-level divisions of the Intranet. The Intranet has been considered almost entirely as just another means of distributing information rather than as a possible tool to streamline and redesign processes.
There have been culture clashes between the decentralized tradition of some departments and the more centralized needs of a corporate Intranet structure. In a recent paper, Rosabeth Moss Kanter (2001) describes one of the ways to fail in moving your company onto the Internet, which is equally true in this Intranet case:
Under the banner of decentralization and business unit autonomy, reward each unit for its own performance, and offer no extra incentives to cooperate in cyberspace. Keep reminding divisions that they are separate businesses because they are different, and that's that.
Only in one or two small areas has there been an alignment between technology, structure and needs.
If we now look at the content life-cycle from figure 11.8 we again find discrepancies. First and foremost, there is no central strategy for, or control of, content. It is created randomly, rarely reviewed and published in an ad hoc manner. No one has thought about archiving or what happens to out of date data. Old items are rarely removed. There is no concept of data or metadata management other than a basic, free-text search engine. Site management is poor due to lack of skills, resources and experience.
Finally, we see little definition (or even understanding) of roles within the company. There are, by definition, significant numbers of real or would-be content contributors and of content consumers. However, the roles of content publisher, web designer, content manager and design/navigation manager are combined into a very small part of one person's job, and that person has little understanding of, and has had no training in, any of them.
Summary
• XML is a basic building block for the separation of meaning and presentation of data.
• RDF goes beyond XML in allowing semantics to be expressed unambiguously as graphs rather than hierarchically-structured documents.
• Content management systems (CMS) provide an infrastructure to help organizations manage large-scale web sites and address issues such as the web manager bottleneck.
• CMS build on the XML family of technologies.
• CMS is primarily a process, not a product. We define it as 'an organizational process, aided by software tools, for the management of heterogeneous content on the web, encompassing a life-cycle that runs from creation to destruction'.
Exercises
1. For an industry in which you have worked or have experience of, what XML and RDF standards exist for that industry? How might the industry be affected or transformed by the uptake of common data standards?
2. What problems does your organization face in managing the content of its web site today?
3. What are the formal procedures for web publishing? Are these procedures followed (if not, why not)?
4. What benefits would your organization obtain from implementing a CMS? If your organization already has a CMS what implementation issues have had to be faced?
Further reading
Chappell, D., (2002). Understanding .NET: a Tutorial and Analysis. Addison Wesley.
Davenport, T. H. and Prusak, L., (1998). Working Knowledge. Harvard Business School Press.
Decker, S., Melnik, S., van Harmelen, F., Fensel, D., Klein, M., Broekstra, J., Erdmann, M. and Horrocks, I., (2000). The Semantic Web: the Roles of XML and RDF. IEEE Internet Computing, September–October: 63–74.
Moss Kanter, R., (2001). The Ten Deadly Mistakes of Wanna-Dots. Harvard Business Review, 79(1): 91.
Nakano, R., (2002). Web Content Management: a Collaborative Approach. Addison Wesley, Boston, MA.
Goodwin, S. and Vidgen, R., (2002). Content, Content, Everywhere ... Time to Stop and Think? The Process of Web Content Management. IEE Computing & Control Engineering Journal. 13(2): 66–70.