Technical Design:Logical database design

9.4 Logical database design

The business data will be stored in a relational database. Most organizations will have implemented a relational database, such as Microsoft Access, for end-user computing and an industrial-strength database server for enterprise applications, such as Microsoft SQL Server, Oracle, or Sybase. There are also Open Source database management systems (DBMS) available, such as MySQL. It is quite likely that the organization already has software licences for a relational database and some expertise in this technology. One of the benefits of a three-tier architecture design is that the database management system can be replaced with a minimum of disruption to the business logic layer and with no changes to the presentation layer.

Before looking at how the theatre booking system might be implemented in a relational database we will look briefly at the underlying principles of the relational model (see Date (1999) for a thorough introduction to database theory).

9.4.1 Relational databases

The relational model requires data to be stored as a series of two-dimensional tables known as relations. A relational table consists of columns and rows (figure 9.5). The tables (or relations) in a database must conform to the following rules:

• Entries in a relational table are single-valued. In figure 9.5 each of the cells of the table is atomic – a cell cannot contain a collection. For example, if it is a requirement to store multiple telephone numbers for employees then extra columns or tables must be created

• Each column has a distinct name (the attribute name). For example, empName, salary

• All of the values in a column are values of the same attribute, i.e., they share the same domain. All of the values in the salary column are from the salary domain – there won't be a telephone number or a bonus mixed in for one of the rows in the table

• The order of the columns is immaterial. The relational table will mean the same whatever the order of the columns. The table in figure 9.5 would not

be affected if the columns were reordered, for example, as telNo, salary, dateOfBirth, empID, empName (although the column(s) constituting the primary key are shown first for ease of reading)

• Each row is distinct. There must not be two rows with the same attribute values in each column. The attribute, or combination of attributes, that ensure that each row in the table is unique are known as the primary key. In the Employee table the primary key has been defined as a single column – empID. We have shaded the column header in figure 9.5 to make its role as primary key stand out clearly

• The order of the rows is immaterial. There is no implied order to the rows. To retrieve records in a given order a table must be sorted by one or more of the attributes of that table. For example, to present a list of employees in descending order of salary it is necessary to sort the rows first by the salary column.

Primary keys

The distinctness, or uniqueness, of the rows in a table is achieved by the definition of a primary key. Clearly, a primary key cannot have empty (or null) values, which means that the attribute telNo is not a candidate for the role of primary key since it is possible for an employee to exist without a telephone number (e.g., Gupta in figure 9.5).

Developing Web Information Systems-0094

It is also possible for two employees to share the same telephone number. It might be tempting to use the field emp Name, but this has problems since there are two employees called Jones. This might lead us into a compound key, such as emp Name + date Of Birth, but does this guarantee uniqueness? It is possible, although unlikely, that there will be two employees in an organization with the same name and same date of birth and therefore we could not reliably use this as a key. The typical response is to assign an artificial identifier, such as empID, that is guaranteed to be unique (because it is system generated). Look around any organization and you will see the attempts of organizations to  impose artificial identifiers on its members, whether it is a student number, an employee number, or a customer number. You will see these on library cards, employee identification cards, and supermarket loyalty cards respectively. Although these identifiers are somewhat artificial, they often become engrained in the real world, as is the case with National Insurance numbers.

Relationships

The distinctness of the rows in the table, and hence the primary key, is an essential aspect of the relational model, since it is through the primary key that one table is related to another. Relationships are implemented through matching the columns in one table with the columns in another table (figure 9.6).

Developing Web Information Systems-0095

In a many-to-one relationship, one occurrence of a table (such as Department) can be associated with many occurrences of another table (such as Employee). The relationship between Department and Employee is implemented by adding another column to the Employee table to house the key of the Department table. Thus, we say that the key of Department is aforeign key in Employee. The column deptID in table Employee does not represent an attribute of Employee – it represents and implements the relationship between Department and Employee and reflects the business requirement that each employee is allocated to one department (and the inverse, each department may have allocated many employees). Intuitively, we can imagine scanning down the Employee table, making a note of the value of the deptID and then going off to look up the details of the department in the Department table using the deptID as our direct access point. This is, in effect, what a database management system (DBMS) does using the structured query language (SQL).

The table layout of figures 9.5 and 9.6 is useful because it can show occurrences of the tables and is therefore helpful in visualizing the data structures. The design of a relational table can also be shown as a list, which is more useful for generating a database design. In the list presentation, the name of the table begins with an upper case letter and is followed by an indented list of attributes (columns), which begin with a lower case letter:

Department deptID deptName deptLocation Employee empID empName salary dateOfBirth (o) telNo deptID Foreign: deptID → Department The primary key of the table is underlined and optional attributes are marked by '(o)'. The relationship between Department and Employee is represented by the Foreign clause in the Employee table, which indicates that the attribute deptID is from the table Department (i.e., it is foreign to Employee). Graphically the tables can be represented as boxes and lines with multiplicity in similar fashion to the way the UML class model is drawn (figure 9.7). The boxes represent entities (tables) and the lines represent relationships and the diagram is known as an entity-relationship model.

Developing Web Information Systems-0096

Normalization

Relational databases have a strong theoretical foundation in set theory and implement a relational algebra that specifies the operations that can be carried out on relations in mathematical terms. The aim of breaking a data structure down into a set of two-dimensional tables is to remove data redundancy and to ensure data integrity and consistency. The fact that an employee has a salary is held only once, and if the salary of an employee changes then the changed salary is reflected automatically in all the reports and applications that use that database.

The process for ensuring that a set of tables in a database design conforms to the relational model is called normalization. This is a process for taking a complex data structure and breaking it down into a set of tables that conform to the rules of normal form. The database designer moves in stepwise fashion through different stages of normalization, from first normal form to third normal form (at least) and on to fifth normal form (ideally). It is outside the scope of this book to include details of the normalization process, which we leave to specialist books on database design (Date, 1999). However, an experienced data modeler will intuitively develop normalized models.

9.4.2 Moving from UML class model to relational schema

The design of a relational database using an entity-relationship model that shows primary and foreign keys is a logical design activity. Although it is often used as an information modeling technique (e.g., in SSADM) for analysis it is more rightly seen as a design activity since it is predicated on a specific database model, namely the relational model. The UML class diagram is less implementation dependent and is therefore more suitable for expressing information requirements of the problem situation.

Developing Web Information Systems-0097

The move from analysis design has involved the introduction of new terminology and it will probably help to clarify the terms used and show how they are loosely allied (table 9.1). Table 9.1 is not a direct comparison of terms. An entity is NOT directly equivalent to a class. One of the appeals of an object-oriented approach is that objects are not flat and two-dimensional as they are in a relational database. An object can hide complex data structures behind an interface of publicly declared operations. As we will see, it is possible to break complex objects down into relational tables, but those tables will need to be recombined to reconstruct them as equivalent objects. There will be an overhead involved in doing this and for that reason an object-oriented database will often be greatly more efficient at dealing with applications that require complex objects, such as the three-dimensional graphical design software packages used in the aerospace industry. However, for maintaining business data in the theatre setting and for satisfying management reporting requirements from that data a relational database is a suitable implementation environment. It is also relatively cheap, tried and tested, and has a large skill pool available.

A relational database does not typically support operations, although these can be approximated using database triggers, which are rules that fire whenever a given event occurs. For example a database trigger could be written to check that when an employee is added the salary is within the range stipulated by the grade of the employee. The business logic of the application will, most probably, be implemented using a programming language such as Java, COBOL, C++, or a scripting language such as Microsoft's active server pages (ASP) or ColdFusion (and possibly a combination of these approaches).

Other specific aspects of the class model that need to be worked around in the relational database schema are many-to-many associations and inheritance (generalization/specialization structures).

Implementing many-to-many relationships

Many-to-many associations in the class model must be resolved, as these cannot be represented in a relational database (at least, not without violating the rules of the relational theory outlined above). Figure 9.8 shows a class diagram with a many-to-many relationship between stores of a supermarket chain and the items stocked in those stores: a store may stock many items and an item may be stocked by many stores. The relational tables for Store and Item are:

StoreItem storeID storeName storeAddress storeTelephoneNo Item itemID itemDescription itemPrice ifOwnBrand

The many-to-many relationship in the class diagram is resolved by the table StoreItem, which is defined relationally as:

StoreItem storeID itemID Foreign: storeID → Store Foreign: itemID → Item

Notice that there are two Foreign clauses in the StoreItem -one for each of the relationships in figure 9.8. Although the StoreItem table has no attributes of its own, this need not necessarily be the case. An attribute of the resolving table might be qtyStockLevel -this belongs to neither Store nor Item but to them both and is therefore located in StoreItem. The primary key of the resolving entity StoreItem comprises the primary key of the contributing tables Store and Item. This is aprimary foreign key relationship because the keys of Store and Item contributes to the identity of the entity StoreItem. The primary foreign key relationship is shown diagrammatically by the relationship line entering the top of the entity box. Non-identifying relationships, such as that between Dept and Employee (e.g., figure 9.7), are shown entering the side of the box.

Developing Web Information Systems-0098

Implementing supertype and subtype classes

The basic relational model does not support inheritance and generalization/specialization directly. However, a supertype/subtype class structure can be approximated using one-to-one relationships (figure 9.9).

Developing Web Information Systems-0099

The table specification for figure 9.9 is as follows:

Staff employeeNumber name startDate dateOfBirth grade annualSalary GraphicDesigneremployeeNumber Foreign: employeeNumber → Staff TechnicalStaff employeeNumber ifCertified Foreign: employeeNumber → Staff

All three tables have the same key, employeeNumber. This is because they are all members of staff. An occurrence of GraphicDesigner or TechnicalStaff must have a corresponding occurrence in Staff. Assuming that it is acceptable for some members to be neither technical nor graphic designers then it is permissible for occurrences of Staff to exist without corresponding entries in the sub-tables. To access all the graphic designers all we need do is join the Staff and GraphicDesigner tables on their common column (employeeID).

There is no mechanism for aggregation in the relational model, which is modeled using one-to-many and one-to-one relationships.

9.4.3 Theatre booking system database design

We now bring together the threads and show how the theatre class model of chapter 8 can be represented as a logical database design.

Entity-relationship model

The entity relationship model for the Theatre is shown in figure 9.10. Where a relationship is identifying (primary foreign key) then the relationship line is shown entering the top of the entity box. For example, the logical key of a performance is a productionID and a performanceTime; as productionID, which comes from the Production entity, is part of the primary key of Performance it is a primary foreign key relationship. By contrast, theatreClubID is not part of the primary key of Member; it does not contribute to the identity of member and is therefore a foreign key relationship and is shown visually by the relationship line entering the side of the entity box.

Developing Web Information Systems-0100

Extra tables have been introduced to control the values that location and production Type can take on. The Location table is a list of locations that can be used to define where a theatre is allocated, e.g., Birmingham, Manchester, and Penzance. This table will ensure that only a valid location is entered against a theatre. The ProductionType table contains a list of production types, e.g., comedy, drama, and musical, to ensure that all productions can be searched reliably. In the absence of an agreed list of production types productions might be given types, such as comedy, humour, light entertainment, etc., making it difficult for customers to perform accurate searches.

The subtype relationship between Customer and Member has been implemented as a one-to-one relationship.

Entity and attribute list

The natural key of the table Performance is productionID plus performanceTime. However, with one eye on the UML class diagram and the other eye on implementation, each table will be allocated an artificial identifier that will be used to uniquely identify each occurrence of a table. This is analogous to an object identifier and helps provide a pseudo-OO dimension to the design - in a true OO implementation the object identifier, or OID, remains hidden from the user (including the system developer). Assigning a single field to act as an identifier is also a useful tactic for software development since any row in a table can be identified by a single value, thus avoiding the need to pass around long and complicated keys made up of multiple attributes. However, we must maintain the integrity of the database by ensuring that the 'real' key is enforced. Using an artificial identifier, such as an automatically incremented counter, means that it would be possible, for example, to add two rows to the performance table with the same productionID and performanceTime but different performanceIDs. This would allow the performance to be booked twice over, which would clearly be a major problem. To ensure this cannot happen all that needs to be done is for a unique index to be added to the Performance table to make the combination of productionID and performanceTime unique. This will ensure that a duplicate performance cannot be added while allowing us to use the object identifier performanceID for implementation purposes. Thus, performanceID and the combination of productionID plus performanceTime are alternate keys - either could serve as the primary key of the table. The Unique clause is used in the table definitions to show and enforce primary foreign key relationships while retaining the sense of an OO implementation with an object identifier (the ID attribute).

Therefore, each of the relationships that enters the top of the entity box in figure 9.10 is a primary foreign key relationship and will be accompanied by a Unique clause in the dependent table. The Member subtype of Customer is implemented with a one-to-one relationship and has an attribute 'user id' that would make a useful alternate key for the Member entity. The full entity attribute list for the Theatre database is as follows: ProductionType productionTypeID productionType Location locationID location Theatre theatreIDtheatreName webSiteUR locationID Foreign: locationID → Location Production ProductionproductionID productionTitle videoClip productionTypeID theatreID Foreign: productionTypeID →ProductionType Foreign: theatreID →Theatre Performance performanceID performanceTime productionID Foreign: productionID.→Production Unique: productionID, performanceTime PartOfTheatre partOfTheatreID theatreID partOfTheatreDesc maxSeatingCapacity Foreign: theatreID →Theatre Unique: theatreID,partOfTheatreDesc Seat seatID partOfTheatreID rowNumber seatNumber seatStatus Foreign: partOfTheatreID→PartOfTheatre Unique: partOfTheatreID, rowNumber, seatNumber PriceList priceListID partOfTheatreID productionID ticketPrice priceCurrency Foreign: partOfTheatreID →PartOfTheatre Foreign: productionID →Production Unique: partOfTheatreID, productionID TheatreClub theatreClubID clubName membershipFee clubDiscountPercent Customer customerID name address (o) telephoneNumber email Member customerID dateJoined userID password theatreClubID Foreign: theatreClubID→TheatreClub Foreign: customerID →Customer Unique: userID TicketTransaction ticketTransactionID ticketTransactionTime creditCardType creditCardNumber creditCardExpiryDate transactionTotalAmount transactionCurrency customerID Foreign: customerID →Customer SeatAtPerformanceseatAtPerformanceID performanceID seatID (o) ticketTransactionID priceAchieved Unique: seatID, performanceID Foreign: ticketTransactionlD→TicketTransaction Foreign: seatID→Seat Foreign: performanceID→Performance

9.4.4 Object-oriented databases

The technical design has assumed that the implementation will take place in a relational database environment. This is largely because relational database management systems (RDBMS) are the most-widely used form of DBMS today and are therefore the most likely environment that the reader will have ready access. A more consistent approach with the UML-based information modeling might have been to implement the Theatre system using an objectoriented DBMS. An OODBMS, such as GemStone, Ontos and ObjectStore, can store complex objects - objects that contain other objects, encapsulate behaviour, and have explicit support for inheritance (of attributes and behaviours). An OODBMS is often a suitable implementation environment for multi-media intensive applications and applications that rely heavily on complex objects (e.g., computer aided design). However, for business transaction processing and management reporting a relational DBMS is usually an appropriate choice.

Object-relational hybrids

Although at the extremes there are relational and OODBMS, in the middle ground hybrids are common - the 'object-relational' database. These DBMS, such as Oracle 8i, retain the power and simplicity of the relational DBMS but add facilities to store complex objects and to encapsulate behaviour.

9.4.5 Physical design

Figure 9.2 shows that design covers a spectrum of logical to physical design. The normalized database design presented here (figure 9.10) can be implemented as it is into any relational DBMS product. However, there are valid reasons for compromising the logical design through denormalization. The main reason for denormalization is to improve system performance. For example, if it is necessary to achieve sub-one second response times at peak periods of usage then some denormalization might be needed to reduce the number of database accesses and the number of tables that need to be updated when making online updates and queries. Other motivations for denormalization are archiving and security. In the interests of maintenance and simplicity denormalization is best avoided wherever possible - we shall implement the Theatre data model in its logical form.

Database transactions

Imagine that a web page is updating a record in the database for each of the four seats allocated to customer A pending payment. At the same time another customer, B, has been trying to reserve seats and has managed to lock onto one of the seats that is required by customer A. If the database accepts two of the three reservations but then fails on the third seat reservation then the system will be in an inconsistent state. This is likely to happen in a booking system with lots of users trying simultaneously to make ticket bookings and safeguards must be in place to handle it. In database terms a transaction is a logical grouping of database operations that must all succeed if the transaction is to be committed, i.e., written permanently to the database. If any part of the transaction fails, then all of the transaction is abandoned and the database is rolled back to the state it was prior to the transaction beginning to execute. Technical design should allow updates to be grouped into transactions and for the DBMS to instigate remedial action should a transaction fail.

Comments

Popular posts from this blog

The Conversion Cycle:The Traditional Manufacturing Environment

The Revenue Cycle:Manual Systems

Nassi-Shneiderman charts