Web Information Management: eXtensible Markup Language (XML)

11.1 Introduction

Organizations collect and organize data to support the process of management decision-making. Marketing managers have to make decisions about pricing, which requires data on competitor prices, market data (e.g., size, segmentation, historical trends), and an appreciation of the dynamics of the market and the intensity of competition. Data includes sound, pictures, and video as well as numbers and text. By using data in a context, managers are enacting a process of creating information. The technology part of an information system can only ever hold data; it takes a human being in a specific context to interpret that data and make sense of it, i.e., to turn it into information. Knowledge goes still further and is often represented as competency – the ability to do something with that information. Creating and sharing knowledge within the firm and between firms was a major concern for the corporations of the late 20th century and continues to be high on the agenda in the 21st. Although we do not address knowledge management directly in this book, in this chapter we look at the technologies that can support the firm in developing its knowledge management processes.

We will look first at a technology that helps to codify data, the eXtensible Markup Language (XML) (this material is taken from Vidgen & Goodwin, 2000). Secondly, we will look at a broader class of emerging technology, content management systems (CMS), which help organizations manage large amounts of data in a web environment, whether in support of e-commerce and external customers or as an Intranet implementation of a knowledge base (Goodwin & Vidgen, 2002).

Knowledge, however, is an elusive concept and difficult to locate in practice, as Davenport and Prusak (1998) illustrate:

Knowledge is a fluid mix of framed experience, values, contextual information, and expert insight that provides a framework for evaluating and incorporating new experiences and information. It originates and is applied in the minds of knowers. In organizations, it often becomes embedded not only in documents or repositories but also in organizational routines, processes, practices, and norms. (p. 5)

Using data standards defined in XML we can support the process of knowledge codification and with a CMS we can manage the process of creating and publishing data. These technologies support knowledge management, but they are not by themselves a knowledge management system. A knowledge management system is always a combination of people and technology in some context and must cater for not only formalized knowledge that might in some limited sense be represented in software, but also the informal, tacit knowledge that cannot.

11.2 eXtensible Markup Language (XML)

The eXtensible Markup Language (XML) provides a notation for defining the content and presentation of data. It derives from the Standard Generalized Markup Language (SGML). SGML has been used for many years in document and specification-intensive industries such as pharmaceuticals and aerospace. However, although SGML is a powerful and general standard it is complex – XML is a simplified form of SGML that has been adapted to make it suitable for Internet applications.

XML is similar to HTML insofar as both are markup languages and both use angle-bracket tags. Where HTML is concerned mainly with the presentation of data and not with what that data means, XML provides a blueprint of the data structure, just like an engineering drawing. For example, consider a web site that sells personal computers. There is a special offer on one of the Dill laptop computers. With HTML it is easy to emphasise a particular laptop, possibly by putting the price in a bold font, <b>£999</b>, making the font colour red, and increasing the font size so that it stands out to the human web user. There is no meaning associated with the price of £999; it is just text that is formatted for presentation in a browser. A search engine will struggle to find all the laptop computers on special offer on the web. It will be even harder to find a model with a given specification, e.g., an 800 MHz chip and 256 Mb of RAM.

Figure 11.1 shows how the HTML in figure 11.2 is rendered in an Internet browser. The HTML representation (figure 11.2) specifies how the data is formatted using tags such as <h2> for a heading and <font> for the font face. By inspection of the HTML source in figure 11.2 we can see that the code relates to PCs, but the data is not structured; it is just free-form text surrounded by formatting instructions. Each PC supplier will format their pages in different ways using different conventions, different phrases and abbreviations, and different levels of detail of content.


With XML the laptop can be coded with meta, or user-defined, tags such as <price>£999</price> and <ram>128 Mb</ram>. Using XML we can define a standard set of tags for describing personal computers (or any other class of object) that will enable engines to search more intelligently (figure 11.3).
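To make the idea of more intelligent searching concrete, here is a minimal sketch in Python using the standard library XML parser. The catalogue, the element names, and the function find_computers are all assumptions made for illustration; they are not an agreed industry vocabulary, and RAM and price are held as plain numbers to keep the comparison simple.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: the tag names are illustrative, not a published standard.
# RAM is assumed to be in megabytes and price in pounds.
CATALOGUE = """
<computerSale>
  <computerSpec>
    <manufacturer>Dill</manufacturer>
    <computerType>laptop</computerType>
    <ram>256</ram>
    <price>999</price>
  </computerSpec>
  <computerSpec>
    <manufacturer>Acme</manufacturer>
    <computerType>desktop</computerType>
    <ram>128</ram>
    <price>749</price>
  </computerSpec>
</computerSale>
"""


def find_computers(xml_text, computer_type, min_ram, max_price):
    """Return the manufacturers of machines matching type, RAM and price limits."""
    root = ET.fromstring(xml_text)
    matches = []
    for spec in root.findall("computerSpec"):
        if (spec.findtext("computerType") == computer_type
                and int(spec.findtext("ram")) >= min_ram
                and float(spec.findtext("price")) <= max_price):
            matches.append(spec.findtext("manufacturer"))
    return matches


print(find_computers(CATALOGUE, "laptop", 256, 1000))  # prints ['Dill']
```

Because the price and the amount of RAM are labelled by their tags, the query can be expressed directly against the data. The same question asked of the HTML version would mean guessing which piece of bold text is a price.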


Inspection of the XML source code shows that the tags do not refer to presentation, as did the HTML tags, but to data content. With XML users can make up their own tags. This raises the question of what happens if they make up different tags. For example, maybe supplier A uses the tag <ram> and supplier B uses the tag <randomAccessMemory> (XML tag names cannot contain spaces). Therefore, standards are needed to get the best from XML. If computer suppliers – or engineering organizations – adopt a common standard for describing computers then search engines will be able to make meaningful comparisons. Standards are also important if organizations are to automate the interchange of data, such as purchase orders. Fortunately, there are organizations working on industry-specific standards. RosettaNet is developing an XML specification for supply chain management. They have developed 5000 words and have 3500 words specific to the IT supply chain, e.g., 'mouse left button'. This is great for data interchange within industries, but there are also going to have to be standards that cross industries and communities.
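The cost of suppliers inventing different tags can be sketched in a few lines of Python. In the absence of a shared standard, anyone comparing the two suppliers has to maintain a translation table such as the hypothetical TAG_ALIASES below. The tag names and the normalise function are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from each supplier's private tag to an agreed common tag.
TAG_ALIASES = {"randomAccessMemory": "ram", "cost": "price"}


def normalise(xml_text):
    """Rename supplier-specific tags to the shared vocabulary; return a dict."""
    root = ET.fromstring(xml_text)
    return {TAG_ALIASES.get(child.tag, child.tag): child.text for child in root}


supplier_a = "<computerSpec><ram>128</ram><price>999</price></computerSpec>"
supplier_b = ("<computerSpec><randomAccessMemory>128</randomAccessMemory>"
              "<cost>999</cost></computerSpec>")

# Only after translation do the two descriptions compare as equal.
print(normalise(supplier_a) == normalise(supplier_b))  # prints True
```

Every new supplier with its own vocabulary means another entry in the table. An agreed standard removes the need for the table altogether.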

Coordination of the production of standards means that bodies are needed to develop and act as custodians. There are many standards being developed in a wide range of industries, including Accounting, Banking, Advertising, Automotive, Financial, Insurance, Computer graphics, and Legal. Repositories of XML data standards (or, more properly, XML schemas) are being established by BizTalk and XML.org. BizTalk is led by Microsoft and has an advisory panel that includes RosettaNet, the US Department of Defense (DoD), and Commerce One. BizTalk has around 250 schemas covering 11 industry categories. XML.org is an OASIS (Organization for the Advancement of Structured Information Standards) initiative with sponsors that include IBM, Oracle, Sun, and Commerce One. XML.org has links to more than 100 schema-producing organizations listed in 45 categories. The Microsoft initiative differs from XML.org in having a commercial product, the BizTalk Server, which will provide a platform for XML-enabled business data interchange.

11.2.1 XML data definitions

The definition of a set of tags is called an XML schema. These were originally defined as document type definitions (DTDs), but DTDs are written in their own syntax rather than in XML. A schema that is written in XML itself is replacing the DTD. This is a purer and ultimately more flexible approach, since an XML schema is just another XML document, albeit one that happens to describe other XML documents. In the example in figure 11.3, the xmlns attribute tells the browser to check the document against its schema (figure 11.4) – if it does not conform then an error message will appear in the browser. It is not a requirement that all XML documents are checked against a schema, but clearly this makes sense when processing documents such as purchase orders.

The XML schema can be stored anywhere on the web and accessed as needed (this is known as a namespace).


Figure 11.4: XML schema (pcspecschema.xml)

The XML schema in figure 11.4 shows how the basic XML building block, the element, is defined and used to create higher-order elements. At the lowest level are elements such as manufacturer and computerType. These cannot be broken down into further detail but they can be combined to create higher-order XML elements, in this case the element computerSpec. The element computerSale can comprise many (zero, one, or more) instances of the element computerSpec. Notice that some of the elements have attributes defined, such as price, which has an attribute called currencyCode. This is analogous to the idea of entities and attributes that we used in information modeling and database design. Although there are similarities between XML schemas and UML class modeling, note that the XML schema is a hierarchical (tree-type) structure and a single UML data structure can be implemented in many different ways using XML.
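The tree structure described above can be sketched by building a conforming document in Python. The element and attribute names are those mentioned in the text. The exact nesting, the build_sale helper, and the input dictionary keys are assumptions, since the schema itself is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_sale(specs):
    """Build a computerSale element holding zero or more computerSpec children."""
    sale = ET.Element("computerSale")
    for spec in specs:
        node = ET.SubElement(sale, "computerSpec")
        # Lowest-level elements: plain text, no further breakdown.
        ET.SubElement(node, "manufacturer").text = spec["manufacturer"]
        ET.SubElement(node, "computerType").text = spec["computerType"]
        # price carries an attribute as well as a value.
        price = ET.SubElement(node, "price", currencyCode=spec["currency"])
        price.text = str(spec["price"])
    return sale


sale = build_sale([
    {"manufacturer": "Dill", "computerType": "laptop",
     "currency": "GBP", "price": 999},
])
print(ET.tostring(sale, encoding="unicode"))
```

An empty list gives a computerSale with no children, which matches the zero, one, or more rule for computerSpec.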

11.2.2 Displaying XML in a browser

The XML tags relate solely to the meaning of the data as described by the XML schema. Therefore we need a way of formatting the data for display on different devices, such as an Internet browser. Data can be presented in a browser using the latest versions of Internet Explorer and Netscape. There are two main ways of doing this on the client (browser) side – cascading style sheets (CSS) and the Extensible Stylesheet Language (XSL). CSS version 1 supplemented HTML's existing formatting capabilities and provided a powerful facility for managing the look and feel of a web site. With CSS version 2 the W3C tackled the requirements of a style language more seriously, allowing all the formatting of a document to be described. An XSL Transformations (XSLT) engine, such as the one supplied with IE5, provides style sheet capabilities and a whole lot more besides. Using XSLT it is possible to transform the data, for example to exclude some data or to sort the data before display in the browser (figure 11.5). Currently, XSL is only supported by IE5, which rather limits its use on public access web sites.
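The kind of transformation described above (exclude some data, sort the rest, then format it) can be sketched as follows. This is plain Python standing in for an XSLT style sheet, not XSLT itself, because the Python standard library has no XSLT engine. The sample data and the to_html function are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data in the spirit of the PC specification example.
PCSPEC = """<computerSale>
  <computerSpec><manufacturer>Dill</manufacturer><price>999</price></computerSpec>
  <computerSpec><manufacturer>Acme</manufacturer><price>749</price></computerSpec>
  <computerSpec><manufacturer>Zenith</manufacturer><price>1499</price></computerSpec>
</computerSale>"""


def to_html(xml_text, max_price):
    """Exclude machines above max_price, sort by price, and emit an HTML list."""
    root = ET.fromstring(xml_text)
    specs = [s for s in root.findall("computerSpec")
             if float(s.findtext("price")) <= max_price]
    specs.sort(key=lambda s: float(s.findtext("price")))
    items = "".join(
        "<li>{}: <b>{}</b></li>".format(escape(s.findtext("manufacturer")),
                                        escape(s.findtext("price")))
        for s in specs)
    return "<ul>" + items + "</ul>"


print(to_html(PCSPEC, 1000))
# prints <ul><li>Acme: <b>749</b></li><li>Dill: <b>999</b></li></ul>
```

All presentation decisions (bold prices, list layout, sort order) live in to_html, and the data file carries none of them.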


The XML data file (pcspec.xml) contains no information about how the data should be presented in the browser. This is supplied by the style sheet, pcspec.xsl (figure 11.5). Don't worry about the details of the XSL language – the main message is that all the formatting, including sorting if needed, is specified here. The XML implementation produces the same results when viewed in a browser as the HTML in figure 11.1 – superficially, the user will see no difference in the way the data is displayed. For the transformation engine to check whether the XML data in pcspec.xml is valid, an XML schema needs to be specified (pcspecschema.xml). This is not a requirement; the XML parser in the IE5 browser will check that the document pcspec.xml is well-formed (e.g., matching opening and closing tags are present), but will only check that the document is valid if a schema is specified.
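The difference between a well-formed document and a valid one can be sketched in Python. The well-formedness check uses the standard library parser. The validity check is a hand-written stand-in for real schema validation, and the rule in REQUIRED_CHILDREN is an assumption, not the content of pcspecschema.xml.

```python
import xml.etree.ElementTree as ET

# Assumed rule standing in for a schema: each computerSpec needs these children.
REQUIRED_CHILDREN = ("manufacturer", "computerType", "price")


def is_well_formed(xml_text):
    """True if the text parses as XML (e.g., tags open and close correctly)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def is_valid(xml_text):
    """True if well-formed AND every computerSpec has the required children."""
    if not is_well_formed(xml_text):
        return False
    root = ET.fromstring(xml_text)
    return all(spec.find(name) is not None
               for spec in root.iter("computerSpec")
               for name in REQUIRED_CHILDREN)


broken = "<computerSpec><price>999</computerSpec>"  # price is never closed
incomplete = ("<computerSale><computerSpec><price>999</price>"
              "</computerSpec></computerSale>")
complete = ("<computerSale><computerSpec><manufacturer>Dill</manufacturer>"
            "<computerType>laptop</computerType><price>999</price>"
            "</computerSpec></computerSale>")

for doc in (broken, incomplete, complete):
    print(is_well_formed(doc), is_valid(doc))
```

The second document is the interesting case: it parses without complaint, yet a purchasing system relying on the manufacturer element would fail on it.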

11.2.3 Time to ditch HTML?

In the long term it is likely that XML will replace HTML, but it will be an evolutionary change. There is a massive investment in HTML and too many people are comfortable with HTML to throw it out straight away. This means that we are going to have to live in a hybrid world where it won't be safe to build XML-only sites, unless you can control the browser software that people use as might be possible on an Intranet. So for now, this means that server-side processing should be used to detect the browser type and serve up XML or convert the XML to HTML server-side before sending it to the browser.
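A minimal sketch of this server-side decision follows, written in Python rather than any particular web server's scripting language. The list of XML-capable browsers, the function names, and the very plain HTML conversion are all assumptions. A real site would more likely apply a style sheet on the server.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumption: only browsers whose user-agent string contains one of these
# tokens are trusted to display XML themselves.
XML_CAPABLE_TOKENS = ("MSIE 5",)


def supports_xml(user_agent):
    return any(token in user_agent for token in XML_CAPABLE_TOKENS)


def respond(user_agent, xml_text):
    """Return (content type, body): raw XML if the browser copes, else HTML."""
    if supports_xml(user_agent):
        return "text/xml", xml_text
    # Fallback: convert to simple HTML on the server before sending.
    root = ET.fromstring(xml_text)
    rows = "".join(
        "<p>{}: {}</p>".format(escape(child.tag), escape(child.text or ""))
        for child in root)
    return "text/html", "<html><body>" + rows + "</body></html>"


DOC = ("<computerSpec><manufacturer>Dill</manufacturer>"
       "<price>999</price></computerSpec>")

print(respond("Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)", DOC)[0])
print(respond("Mozilla/4.7 [en] (Win98; I)", DOC)[0])
```

Either way the site keeps a single XML source, and only the last step of delivery varies by browser.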

11.2.4 Business applications of XML

XML is an interesting technology because it allows the content of a web page to be separated out from its presentation, but what are the business applications of XML?

Intelligent searching

Consumer e-commerce: e.g., find the cheapest price for a PC anywhere in the world. This would need PC manufacturers to agree a PC markup language that they could all use to describe their products. IBM have launched the first XML search engine to search for XML tags and schemas. A similar approach is also relevant to purchasing decisions, sourcing suppliers, finding acceptable cost/service combinations, etc. Because there is agreement on the data formats it will be possible for intelligent agents to go out and negotiate on our behalf.

Web automation for business to business transactions

Once a standard is agreed for the data format of orders then the processing of orders can be automated, as long as the order documents conform to the order XML schema (this is a use of XML for traditional EDI – electronic data interchange). Different industries and interest groups need to develop their own XML standards to facilitate interchange. For example, the Inland Revenue will have an XML standard for tax returns so they can be filled in online and if they conform to the tax return XML schema then they can be processed automatically by the Inland Revenue's computer systems. Different software suppliers could develop different tax return packages with varying degrees of sophistication but as long as they all produce documents that conform to the Inland Revenue schema then they can all be processed by the Inland Revenue.

Intranet development and knowledge management

Content is separated from style, so we can change the formatting at a stroke (we could achieve this with cascading style sheets – CSS). With XML an organization can go a step further and build standard definitions of its data and begin to manage the organization's knowledge by structuring it so it can more easily be retrieved and reused. When coupled with a content management system (CMS) the power of the web can begin to be harnessed by organizations.

Delivery to different platforms

Again, separating content and presentation means the same content can be directed toward different display devices. It can also be used to integrate legacy systems, using XML structures to get data out of and into legacy systems that need to be integrated with the Internet (one of the key issues in e-business).

The XML schema specifies the data content, but not the way in which it is displayed. The display can be handled by the client device; for example, the client might be a desktop computer with Internet Explorer, but it might be a personal digital assistant or a mobile phone running the wireless application protocol (WAP). As an aside, you will see that web pages for serving up to a mobile telephone via WAP are written in WML – wireless markup language. WML documents use an XML schema to define the tags that constitute valid WML. Unfortunately, today, WML does not separate data and presentation any better than does HTML, but over time this will change.

11.2.5 XML and databases, XML and Java

We discussed earlier the merits of object oriented (OO) DBMS. Many thought that relational DBMS would go away and be replaced by object technology. In practice, RDBMS have survived and gone from strength to strength. OODBMS are used in more specialist applications, such as computer aided design (CAD), while RDBMS have been extended with many of the features of an OO environment.

The situation is not dissimilar with XML. Some applications may warrant an XML document server that can store persistent XML objects. Many applications are likely to store the data in an RDBMS and to convert it to XML on output, and to convert XML to SQL on input. Today you could write this easily enough in a scripting language such as ASP or a heavier duty programming language such as C++, but tools are being added into products to make this easier for the developer and seamless for the user.
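The round trip between a relational table and XML can be sketched with Python's built-in sqlite3 module. The table name, column names, and the two helper functions are assumptions for illustration, and an in-memory database stands in for the RDBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_to_rows(conn, xml_text):
    """Input side: insert each computerSpec in the XML document as a table row."""
    for spec in ET.fromstring(xml_text).findall("computerSpec"):
        conn.execute(
            "INSERT INTO computer (manufacturer, computer_type, price) "
            "VALUES (?, ?, ?)",
            (spec.findtext("manufacturer"),
             spec.findtext("computerType"),
             int(spec.findtext("price"))))


def rows_to_xml(conn):
    """Output side: convert every row of the computer table to XML."""
    sale = ET.Element("computerSale")
    cursor = conn.execute(
        "SELECT manufacturer, computer_type, price FROM computer ORDER BY id")
    for manufacturer, computer_type, price in cursor:
        spec = ET.SubElement(sale, "computerSpec")
        ET.SubElement(spec, "manufacturer").text = manufacturer
        ET.SubElement(spec, "computerType").text = computer_type
        ET.SubElement(spec, "price").text = str(price)
    return ET.tostring(sale, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computer (id INTEGER PRIMARY KEY, "
             "manufacturer TEXT, computer_type TEXT, price INTEGER)")

incoming = ("<computerSale><computerSpec><manufacturer>Dill</manufacturer>"
            "<computerType>laptop</computerType><price>999</price>"
            "</computerSpec></computerSale>")
xml_to_rows(conn, incoming)
print(rows_to_xml(conn) == incoming)  # prints True
```

The parameterised INSERT keeps the incoming XML values out of the SQL text itself, which matters when the documents come from outside the organization.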

XML is basically a way of defining data structures. In object oriented terms it is less powerful when it comes to defining the behaviour of objects. Java, by contrast, is strong at defining and encapsulating behaviour, but less useful for defining and exposing data structures (the key thing about OO is that you don't need to know the data structure to use an object – this is internal and hidden from the client). By combining XML and Java it is possible to have a document object whose behaviour is given by a URL that indicates a Java applet that knows how to display, manipulate, process, and interact with the XML document (this is analogous to the namespace where the schema is defined). Unlike the namespace, where there needs to be a definitive master version, there can be lots of Java applets that can add behaviour to an XML schema. For example, the user could use a range of different Java applications to complete their tax return – as long as all the applications conform to the definitive tax return schema. One way of exposing an interface on the web to allow others to make use of existing services is Microsoft's .NET framework.

11.2.6 XML, web services, and .NET

Microsoft's .NET initiative is a framework and vision for the future of Internet computing. One of the most important elements of .NET is web services. Using web services organizations can interface their software applications with those of their trading partners to conduct business-to-business (B2B) e-commerce, or to link the front office and back office applications within their own organization (Enterprise Application Integration – EAI). For web services to work there needs to be a standard way of defining an interface to a web service – in .NET this is accomplished using WSDL (Web Services Description Language). WSDL interfaces are defined with XML. The interface needs a way of invoking operations in the application and a way to return results to the client. In web services this is achieved using SOAP (Simple Object Access Protocol). SOAP is also defined in XML. The final part of the picture is a directory to help clients find available web services – UDDI (Universal Description, Discovery and Integration). SOAP is used to read and write the contents of the UDDI database, so once again we see that XML, via SOAP, is a core .NET technology.
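To show what it means for SOAP to be defined in XML, the sketch below builds a request message and reads it back using only the Python standard library. The envelope namespace is the one used by SOAP 1.1. The application namespace, the operation name findPerformances, and its parameters are hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "urn:example:ticketing"  # hypothetical application namespace


def build_request(operation, **params):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}%s" % (APP_NS, operation))
    for name, value in params.items():
        ET.SubElement(call, "{%s}%s" % (APP_NS, name)).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(xml_text):
    """Service side: recover the operation name and parameters from a message."""
    body = ET.fromstring(xml_text).find("{%s}Body" % SOAP_NS)
    call = body[0]
    operation = call.tag.split("}", 1)[1]  # strip the namespace part
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_request("findPerformances", production="Hamlet",
                        date="2002-06-01")
print(message)
print(read_request(message))
```

Because the message is ordinary XML, the client and the service need share nothing beyond the agreed element names, which is what a WSDL description publishes.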


Let us see how web services could be used in the case of a theatre booking system. Rather than implement the business logic layer as a closed and proprietary software interface we could open it by making it into a web service that can be exposed on the Internet for access by client software applications (figure 11.6). Assume that a theatre-goer wants to book a theatre performance using a personal digital assistant (PDA) and a wireless connection to the Internet. The user enters details of the type of production they want to see and when (step 1). The Ticket Finder application is a web service that can access many theatre booking systems to find the user a suitable production and performance dates. The Ticket Finder application uses the Internet to access a ticket booking application, such as the one operated by the Barchester Playhouse. For this to work, the Barchester Playhouse needs to have implemented its ticket booking system as a web service and make it available via the Internet to the Ticket Finder application (step 2). In step 3 the Ticket Booking application interacts with Seat Availability and Seat Pricing applications to price and reserve the seats. These applications are run internally and are accessed via an Intranet. It is possible that they are pre-Internet legacy systems that have been extended to support a web service interface, thus integrating them as part of a larger, Internet-based ticket booking system. Finally, the Ticket Finder application takes payment using a further web service (step 4).
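The four steps can be sketched as a chain of function calls. This is a deliberately simplified illustration: each Python function stands in for a separately hosted web service, the calls run in one process rather than over the Internet, and the seat data, the pricing rule, and all function names are assumptions.

```python
# Hypothetical data held by the theatre's internal systems.
SEATS = {("Hamlet", "2002-06-01"): 3}  # seats left per production and date
PRICE_PER_SEAT = 25                     # flat price, assumed for illustration


def seat_availability(production, date):
    """Internal service: how many seats remain for this performance."""
    return SEATS.get((production, date), 0)


def seat_pricing(seats):
    """Internal service: total price for the requested number of seats."""
    return seats * PRICE_PER_SEAT


def ticket_booking(production, date, seats):
    """Step 3: the theatre's booking service reserves and prices the seats."""
    if seat_availability(production, date) < seats:
        return None
    SEATS[(production, date)] -= seats
    return {"production": production, "date": date,
            "seats": seats, "total": seat_pricing(seats)}


def payment(amount):
    """Step 4: a further service takes payment (always succeeds in this sketch)."""
    return {"paid": amount}


def ticket_finder(production, date, seats):
    """Steps 1 and 2: accept the user's request and call a booking service."""
    booking = ticket_booking(production, date, seats)
    if booking is None:
        return None
    booking["receipt"] = payment(booking["total"])
    return booking
```

Calling ticket_finder("Hamlet", "2002-06-01", 2) returns a booking with a total of 50 and a payment receipt. A second identical request returns None because only one seat is left. In a real deployment each call would cross an organizational boundary, which is why the interfaces need to be described and published.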

The web services component of the .NET framework allows organizations to collaborate in providing end user functionality (such as finding performances and making payment). The .NET framework also provides organizations with a platform for enterprise application integration that can integrate pre-Internet legacy systems as part of a larger and publicly available Internet application.

We introduced ColdFusion components as a way of structuring the business logic of the Ticket Manager application in chapters 9 (technical design) and 10 (software construction). The good news is that a ColdFusion component can be made available to the outside world as a web service simply by changing the parameter ACCESS="public" to ACCESS="remote" in the <CFFunction> tag. The ColdFusion MX server will generate the necessary WSDL automatically on the fly, thus allowing the Ticket Manager application to be incorporated by partner organizations into their own applications. Web services can be created in many technical environments, but ColdFusion makes it about as simple as one can imagine. In summary, component-based, web service compatible architectures have great promise for B2B collaboration and should be the starting point for web-based information systems development.
