Distinguishing Features of Structured and Relational Databases

The Hierarchical Database Model

The earliest database management systems (DBMSs) were based on the hierarchical data model. This was a popular approach to data representation because it reflected, more or less faithfully, many aspects of an organization that are hierarchical in relationship. Also, it was an efficient data processing tool for highly structured problems. Figure 9-29 presents a data structure diagram showing a portion of a hierarchical database.

The hierarchical model is constructed of sets of files. Each set contains a parent and a child. Notice that File B, at the second level, is both the child in one set and the parent in another set. Files at the same level with the same parent are called siblings. This structure is also called a tree structure. The file at the most aggregated level in the tree is the root segment, and the file at the most detailed level in a particular branch is called a leaf.

A NAVIGATIONAL DATABASE

The hierarchical data model is called a navigational database because traversing it requires following a predefined path. This is established through pointers (discussed in the appendix to Chapter 2) that create explicit linkages between related records. The only way to access data at lower levels in the tree is from the root and via the pointers down the navigational path to the desired records. For example, consider the partial database in Figure 9-30. To retrieve an invoice line-item record, the DBMS must first access the customer record (the root). That record contains a pointer to the sales invoice record, which points to the invoice line-item record.

LIMITATIONS OF THE HIERARCHICAL MODEL

The hierarchical model presents a limited view of data relationships. Based on the proposition that all business relationships are hierarchical (or can be represented as such), this model does not always reflect reality. The following rules, which govern the hierarchical model, reveal its operating constraints:

1. A parent record may have one or more child records. For example, in Figure 9-30, the customer is the parent of both sales invoice and cash receipts.

2. No child record can have more than one parent.

The second rule is often restrictive and limits the usefulness of the hierarchical model. Many firms need a view of data associations that permit multiple parents like that represented by Figure 9-31(a). In this example, the sales invoice file has two natural parents: the customer file and the salesperson file. A specific sales order is the product of both the customer’s purchase activity and a salesperson’s selling efforts. Management, wishing to keep track of sales orders by customer and by salesperson, will want to view sales order records as the logical child of both parents. This relationship, although logical, violates the single parent rule of the hierarchical model. Figure 9-31(b) shows the most common way of resolving this problem. By duplicating the sales invoice file, two separate hierarchical representations are created. Unfortunately, this improved functionality is achieved at a cost—increased data redundancy. The network model, examined next, deals with this problem more efficiently.

The Network Database Model

The network model is a variation of the hierarchical model. The principal distinguishing feature between the two is that the network model allows a child record to have multiple parents. The multiple ownership

rule is flexible in allowing complex relationships to be represented. Figure 9-31(a) illustrates a simple net- work model in which the sales invoice file has two parent files—Customer and Salesperson.

The navigational DBMS dominated the data processing industry for many years, although the cur-

rent trend is overwhelmingly toward the relational model. Many hierarchical and network systems still exist and their manufacturers continue to maintain and upgrade them.

Data Structures

Data structures are the bricks and mortar of the database. The data structure allows records to be located, stored, and retrieved and enables movement from one record to another. Chapter 2 examined data structures commonly used by flat-file systems. These include sequential, indexed, hashing, and pointer structures. These basic structures form the foundation for the more complex hierarchical and network database structures discussed next.

HIERARCHICAL MODEL DATA STRUCTURE

Figure 9-32 shows the data structures and linkages between files for the partial database in Figure 9-30. Because the purpose is to illustrate the navigational nature of the data structure, the content of the records has been simplified.

Assume that a user of this system is using a query program to retrieve all the data pertinent to a particular sales invoice (Invoice Number 1921) for a customer John Smith (Account Number 1875). The access method used for this situation is the hierarchical indexed direct access method. Under this method, the root segment (customer file) of the database is organized as an indexed file. Lower-level records (for example, sales invoice, invoice line item, and cash receipts records) use pointers in a linked- list arrangement. This allows both efficient processing of the root records for tasks, such as updating AR and billing customers, and direct access of detail records for inquiries.

The access method retrieves the primary key—CUST NUM 1875—entered by the user via the query program and compares this against the index for the root segment. Upon matching the key with the index, it directly accesses John Smith’s customer record. Notice the customer record contains only summary information. The current balance figure represents the total dollar amount John Smith owes ($1,820). This is the difference between the sum of all sales to this customer and all cash received in payment on the account. The supporting details about these transactions are contained in the lower-level sales invoice and cash receipts records. In response to the user’s inquiry for data pertaining to Invoice Number 1921, the access method follows the invoice record pointer to the sales invoice file.

The sales invoice file is a randomly organized file of invoices for all customers arranged in a series of linked lists. The records in each linked list have a common property—they all relate to a specific cus- tomer. The first record in the list—the head record—has a pointer to the next in the list (if one exists), which points to the next, and so on. The pointer in the customer record (the level above) directs the access method to the head record in the appropriate linked list. The access method then compares the key value sought (Invoice Number 1921) against each record in the list until it finds a match. The records in the sales invoice file contain only summary information about sales transactions. Pointers in these records identify the supporting detail records (the specific items sold) in the invoice line-item file. The structure of the line-item file is also a linked-list arrangement. Starting with the head record, the access method retrieves the entire list of line items for Invoice Number 1921. In this list there are two records. The sales invoice and line-item records are returned to the user’s application for processing.

NETWORK MODEL DATA STRUCTURE

As with the hierarchical model, the network model is a navigational database with pointers creating explicit linkages between records. Whereas the hierarchical model allows only one path, the network model supports multiple paths to a particular record. Figure 9-33 shows the linkages for the data structure in Figure 9-31(a).

The structure can be accessed at either of the root-level records (salesperson or customer) by hashing their respective primary keys (SP NUM or CUST NUM) or by reading their addresses from an index.

A pointer field in the parent record explicitly defines the path to the child record, as discussed earlier. Notice the structure of the sales invoice file. In this example, each child now has two parents and contains explicit links to other records that form linked lists related to each parent. For example, Invoice Number 1 is the child of Salesperson Number 1 and Customer Number 5. This record structure has two links to related records. One of these is a salesperson (SP) link to Invoice Number 2. This represents a sale by Salesperson Number 1 to Customer Number 6. The second pointer is the customer (C) link to Invoice Number 3. This represents the second sale to Customer Number 5, which was processed this time by Salesperson Number 2. Under this data structure, management can track and report sales information pertaining to both customers and sales staff.

Normalizing Tables in a Relational Database

The database anomalies (described in the chapter) are symptoms of structural problems within tables called dependencies. Specifically, these are known as repeating groups, partial dependencies, and transitive dependencies. The normalization process involves systematically identifying and removing these dependencies from the table(s) under review. Figure 9-34 graphically illustrates the unnormalized table’s progression toward 3NF as each type of dependency is resolved. Tables in 3NF will be free of anomalies and will meet two conditions:

1. All nonkey attributes will be wholly and uniquely dependent on (defined by) the primary key.

2. None of the nonkey attributes will be dependent on (defined by) other nonkey attributes.

DESIGN THE USER VIEW

As Illustrated in Figure 9-34, the process begins with a user view such as an output report, a source document, or an input screen. Images of user views may be prepared using a word processor, a graphics

package, or simply pencil and paper. At this point, the view is merely a pictorial representation of a set of data the user will eventually have when the project is completed. To demonstrate the normalization process, we will use the customer sales invoice and sample data presented in Figure 9-35. The basic issue here is, can we store all the data needed for this view in a single table that meets the two conditions previously noted?

REPRESENT THE VIEW AS A SINGLE TABLE

The next step is to represent the view as a single table that contains all of the view attributes. Figure 9-36 presents a single-table structure containing the sample data from Figure 9-35. Because the table contains customer invoices, invoice number (INVOICE NUM) will serve as a logical primary key. Notice the attri- butes Ex Price and Total Due have been highlighted in Figure 9-36. The values for these attributes may be either stored or calculated. Because Ex Price is the product of two other attributes (Quantity 3 Unit Price) and Total Due is the sum of all Ex Price values, they both can be calculated from existing stored attributes rather than storing them directly in the database table. To simplify this example, therefore, we will assume that the system will calculate these attributes and they will not be part of this analysis.

Now that we have a base table to work from, the next few steps in the normalization process involve identifying and, if necessary, eliminating structural dependencies that exist. If dependencies exist, correcting them will involve splitting the original single-table structure into two or more smaller and independent 3NF tables. Each of the structural dependencies and the techniques for identifying and removing them is outlined in the following sections.

REMOVE REPEATING GROUP DATA

The first step in correcting structural dependencies is to determine if the table under review contains repeating groups. Repeating group data is the existence of multiple values for a particular attribute in a specific record. For example, the sales invoice in Figure 9-35 contains multiple values for the attributes PROD NUM, DESCRIPTION, QUANTITY, and UNIT PRICE (we ignore EX PRICE). These repeating groups represent the transaction details of the invoice. We see repeating group data in many business user views, such as purchase orders, receiving reports, bills of lading, and so on. Relational database theory prohibits the construction of a table in which a single record (a row in the table) represents multiple values for an attribute (a column in the table). To represent repeating group values in a single table, there- fore, will require multiple rows as illustrated in Figure 9-36. Notice that the invoice attributes, which are common to each occurrence of the repeating group data, will also be represented multiple times. For example, Order Date, Shipped Date, Customer Name, Customer Address, and so on, are recorded along with each unique occurrence of Prod Num, Description, Quantity, and Unit Price. To avoid such data redundancy, the repeating group data need to be removed from the table and placed in a separate table. Figure 9-37 shows the resulting tables. One is called Sales Invoice Table, with INVOICE NUM as the primary key. The second table contains the transaction details for the invoice and is called Line Item Table.

Notice that the primary key of the Line Item Table is a composite key comprising two attributes: INVOICE NUM and PROD NUM. Keep in mind that this table will contain the transaction details for our example invoice as well as the transaction details for the invoices for all customers. Relational

database theory requires that a table’s primary key uniquely identify each record stored in the table. PROD NUM alone cannot do this since a particular product, such as r234 (bolt cutter), may well have been sold to many other customers whose transactions are also in the table. By combining PROD NUM with the INVOICE NUM, however, we can uniquely define each transaction because the table will never contain two occurrences of the same invoice number and product number together.

REMOVE PARTIAL DEPENDENCIES

Next we check to see if the resulting tables contain partial dependencies. A partial dependency occurs when one or more nonkey attributes are dependent on (defined by) only part of the primary key, rather than the whole key. This can occur only in tables that have composite (two or more attribute) primary keys. Because the Sales Invoice Table has a single attribute primary key, we can ignore it in this step of

the analysis. This table is already in 2NF. The Line Item Table, however, needs to be examined further. Figure 9-38 illustrates the partial dependencies in it.

In the Line Item Table, INVOICE NUM and PROD NUM together define the quantity sold attribute (Qunty). If we assume, however, that the price charged for r234 is the same for all customers, then the Unit Price attribute is common to all transactions involving product r234. Similarly, the attribute Description is common to all such transactions. These two attributes are not dependent on the Invoice Num component of the composite key. Instead, they are defined by Prod Num and, therefore, only partially rather than wholly dependent on the primary key.

We resolve this by splitting the table into two, as illustrated in Figure 9-38. The resulting Line Item Table is now left with the single nonkey attribute Qunty. Product description and unit price data are placed in a new table called Inventory. Notice that the Inventory table contains additional attributes that do not pertain to this user view. A typical inventory table may contain attributes such as reorder point, quantity on hand, supplier code, warehouse location, and more. This demonstrates how a single table may be used to support many different user views and reminds us that this normalization example pertains to only a small portion of the entire database. We will return to this issue later.

At this point, both of the tables in Figure 9-38 are in 3NF. The Line Item Table’s primary key (INVOICE NUM PROD NUM) wholly defines the attribute QUNTY. Similarly, in the Inventory Table, the attributes Description and Unit Price are wholly defined by the primary key PROD NUM.

REMOVE TRANSITIVE DEPENDENCIES

The final step in resolving structural dependencies is to remove transitive dependencies. A transitive dependency occurs in a table where nonkey attributes are dependent on another nonkey attribute and independent of the table’s primary key. An example of this is illustrated by the Sales Invoice Table in Figure 9-39. The primary key INVOICE NUM uniquely and wholly defines the economic event that the attributes Order Date, Shpd Date, and Shpd Via represent. The key does not, however, uniquely define the customer attributes. The attributes Cust Name, Street Address, and so on define an entity (Customer) that is independent of the specific transaction captured by a particular invoice record. For example, assume that during the period the firm had sold to a particular customer on ten different occasions. This would result in ten different invoice records stored in the table. Using the current table structure, each of these invoice records would capture the data uniquely related to the respective transaction along with customer data that are common to all ten transactions. Therefore, the primary key does not uniquely define customer attributes in the table. Indeed, they are independent of it.

We resolve this transitive dependency by splitting out the customer data and placing them in a new table called Customer. The logical key for this table is CUST NUM, which was the nonkey attribute in the former table on which the other nonkey customer attributes were dependent. With this dependency resolved, both the revised Sales Invoice Table and the new Customer Table are in 3NF.

LINKING THE NORMALIZED TABLES

At this point the original single-table structure has been reduced to the four normalized but independent tables presented in Figure 9-40. The tables contain the sample data used in the original single-table structure presented in Figure 9-36. Notice how data redundancy in the original single-table structure has been eliminated from the more efficient structure represented here. To work together, however, these tables need to be linked via foreign keys. This requires first determining the cardinality (degree of association) between the tables and then assigning foreign keys.

Determine Cardinality

In our example the cardinality between the four tables is one-to-many (1:M) as explained below.

1. Each customer (Customer Table) may be associated with one or many sales events (Sales Invoice Table), but each invoice is for a single customer.

2. Each Sales Invoice record is associated with one or more Line-Item records, but each Line Item is associated with only one Sales Invoice.

3. Each Inventory record is associated with one or more Line Items (a particular product has been sold many times to many customers) but each Line-Item record represents only one inventory item.

Assign Foreign Keys

Rules assigning foreign keys are explained in detail in the chapter. When linking in a 1:M relation as depicted in Figure 9-40, the rule is to take the primary key from the table on the 1 side of the relation and embed it as a foreign key in the table of the M side. Notice that in the relations between the Line Item Table, the Invoice Table, and the Inventory Table, this is already the case because of the Line Item Table’s composite key. The Sales Invoice Table, however, needs to be modified to include CUST NUM as the FOREIGN KEY, which links it to the Customer Table.

PRODUCING THE USER VIEW FROM THE NORMALIZED TABLES

After these tables have been created within the DBMS, they will be populated with data from several sources. For example, customer services will add customers’ data to the Customer Table, inventory

control will enter product values into the Inventory Table, and the Sales Invoice and Line Item tables will be populated by sales transactions data from the Sales Order process. The following steps describe how the batch process might produce the actual invoices.

1. A computer program reads the Sales Invoice Table. We will assume that the first record read is invoice number 16459 (our sample record). The record attributes are stored in memory.

2. The program then reads the foreign key CUST NUM and searches the Customer Table for a record with the primary key value 1765. The customer record attributes are then stored in memory.

3. The computer program then reads the primary key INVOICE NUM and searches the Line Item Table for all occurrences of records whose INVOICE NUM component of the primary key has a value of 16459. It locates three such records and stores them in memory.

4. The program next reads the PROD NUM component of the three Line-Item records and one by one searches the Inventory File for each occurrence. The DESCRIPTION and UNIT PRICE attributes of each located record is stored in memory.

5. The program then calculates the EX PRICE attribute for each item and sums these to obtain the TOTAL DUE attribute.

6. At this point all the attributes needed to produce the original user view are in memory. They are for- matted and sent to printing.

7. The computer then clears memory, reads the next record from the Invoice Table, and repeats the steps above until all invoices have been processed.

Search This Blog

information systems course