Database Management Systems:The Relational Database Model
The Relational Database Model
E. F. Codd originally proposed the principles of the relational model in the late 1960s.2 The formal model has its foundations in relational algebra and set theory, which provide the theoretical basis for most of the data manipulation operations used.
From a purist’s point of view, a fully relational system is one that conforms to 12 stringent rules that Codd outlined.3 As a practical matter, however, not all of Codd’s rules are equally important. Some are critical, but some are not. Other theorists have, therefore, proposed less rigid requirements for assessing the relational standing of a system.4 Accordingly, a system is relational if it:
1. Represents data in the form of two-dimensional tables such as the database table, called Customer, shown in Figure 9-8.
2. Supports the relational algebra functions of restrict, project, and join.
These three algebra functions are examined in the following section.
Restrict: Extracts specified rows from a specified table. This operation, illustrated in Figure 9-9(a), creates a virtual table (one that does not physically exist) that is a subset of the original table.
Project: Extracts specified attributes (columns) from a table to create a virtual table. This is presented in Figure 9-9(b).
Join: Builds a new physical table from two tables consisting of all concatenated pairs of rows, from each table, as shown in Figure 9-9(c).
Although restrict, project, and join is not the complete set of relational functions, it is a useful subset that satisfies most business information needs.
RELATIONAL DATABASE CONCEPTS
In this section we review basic concepts, terminology, and techniques common to relational database systems. These building blocks are then used later in the chapter to design a small database from scratch.
Entity, Occurrence, and Attributes
An entity is anything about which the organization wishes to capture data. Entities may be physical, such as inventories, customers, or employees. They may also be conceptual, such as sales (to a customer), accounts receivable (AR), or accounts payable (AP). Systems designers identify entities and prepare a model of them like the one presented in Figure 9-10. This data model is the blueprint for ultimately
creating the physical database. The graphical representation used to depict the model is called an entity relationship (ER) diagram. As a matter of convention, each entity in a data model is named in the singu- lar noun form, such as Customer rather than Customers. The term occurrence is used to describe the number of instances or records that pertain to a specific entity. For example, if an organization has 100 employees, the Employee entity is said to consist of 100 occurrences. Attributes are the data elements that define an entity. For example, an Employee entity may be defined by the following partial set of attributes: Name, Address, Job Skill, Years of Service, and Hourly Rate of Pay. Each occurrence in the Employee entity consists of the same types of attributes, but values of each attribute will vary among occurrences. Because attributes are the logical and relevant characteristics of an entity, they are unique to it. In other words, the same attribute should not be used to define two different entities.
Associations and Cardinality
The labeled line connecting two entities in a data model describes the nature of the association between them. This association is represented with a verb, such as ships, requests, or receives. Cardinality is the degree of association between two entities. Simply stated, cardinality describes the number of possible occurrences in one table that are associated with a single occurrence in a related table. Four basic forms of cardinality are possible: zero or one (0,1), one and only one (1,1), zero or many (0,M), and one or many (1,M). These are combined to represent logical associations between entities. The value of the upper cardinalities at each end of the association line defines the association. For example, a (0,1) cardinality at one end and a (1,M) cardinality at the other is a (1:M) association. Figure 9-11 presents several examples of entity associations, which are discussed next.
EXAMPLE 1 (1:1). Assume that a company has 1,000 employees but only 100 of them are sales staff. Assume also that each salesperson is assigned a company car, but nonsales staff are not. Example 1 in Figure 9-11 shows that for every occurrence (record) in the Employee entity, there is a possibility of zero or one occurrence in the Company Car entity.
When determining the cardinality values in an entity association, select a single occurrence (record) of one entity and answer the following question: What are the minimum and maximum number of records that may be associated with the single record that has been selected? For example, selecting an Employee entity record and looking toward the Company Car entity, we see two possible values. If the selected Employee record is that of a salesperson, then he or she is assigned one (and only one) company car. This particular Employee record, therefore, is associated with only one record in the Company Car entity. If, however, the selected Employee record is that of a nonsalesperson, then the individual would be assigned no (zero) car. The Employee record in this case is associated with zero Company Car records. Thus, the minimum cardinality is zero and the maximum cardinality is one. A circle and a short line intersecting the line connecting the two entities depict this degree of cardinality. Notice that from the Employee entity perspective, the cardinality is shown at the Company Car end of the association line. Now select a Company Car record and look back at the Employee entity. Because each company car is assigned to only one employee, both the minimum and maximum number of associated records is one. Two short intersecting lines at the Employee end of the association line signify this cardinality.
EXAMPLE 2 (1:1). Example 2 illustrates a situation in which each record in one entity is always associated with one (and only one) record in the associated entity. In this case, each company laptop computer is assigned to only one manager, and every manager is assigned only one computer. Two short lines intersecting the connecting line at both ends depict this cardinality.
EXAMPLE 3 (1:M). Example 3 presents the relationship between Customer and Sales Order entities. Notice that the minimum number of Sales Order records per Customer record is zero and the maximum is many. This is because in any given period (a year or month) to which the Sales Order entity pertains, a particular customer may have purchased nothing (zero Sales Order records) or purchased several times (many records). From the perspective of the Sales Order entity, however, every record is associated with one and only one customer. The crow’s foot symbol (which gives this form of notation its name) depicts the many cardinalities at the Sales Order end of the association line.
EXAMPLE 4 (1:M). Example 4 represents a situation in which each specific item of inventory is sup- plied by one and only one Vendor, and each Vendor supplies one or many different inventory items to the company. Contrast this (1:M) association with Example 5 next.
EXAMPLE 5 (M:M). To illustrate the many-to-many association, we again use a Vendor and Inventory relationship in Example 5. This time, however, the company has a policy of purchasing the same types of inventory from multiple suppliers. Management may choose to do this to ensure that they get the best prices or avoid becoming dependent on a single supplier. Under such a policy, each Vendor record is associated with one or many Inventory records, and each Inventory record is associated with one or many Vendors.
Examples 4 and 5 demonstrate how cardinality reflects the business rules in place within an organization. The database designer must obtain a thorough understanding of how the client-company and specific users conduct business to properly design the data model. If the data model is wrong, the resulting data- base tables will also be wrong. Examples 4 and 5 are both valid but different options and, as we shall see, require different database designs.
ALTERNATIVE CARDINALITY NOTATIONS. The cardinality notation technique shown in Figure 9-11 is called crow’s foot. An alternative method is to write the cardinality values on each end of the association line connecting the two entities. Some database designers explicitly show both the upper and lower cardinality values. Some choose a shorthand version that notes only the upper cardinality. For homework assignments, your instructor will advise you of the preferred method to use. Refer to Figure 10-7 in the next chapter for examples of the alternative techniques.
The Physical Database Tables
Physical database tables are constructed from the data model with each entity in the model being trans- formed into a separate physical table. Across the top of each table are attributes forming columns. Intersecting the columns to form the rows of the table are tuples. A tuple, which Codd gave a precise definition when he first introduced it, corresponds approximately to a record in a flat-file system. In accordance with convention, we will use the term record or occurrence rather than tuple.
Properly designed tables possess the following four characteristics:
1. The value of at least one attribute in each occurrence (row) must be unique. This attribute is the primary key. The values of the other (nonkey) attributes in the row need not be unique.
2. All attribute values in any column must be of the same class.
3. Each column in a given table must be uniquely named. However, different tables may contain columns with the same name.
4. Tables must conform to the rules of normalization. This means they must be free from structural dependencies including repeating groups, partial dependencies, and transitive dependencies (see this chapter’s appendix for a complete discussion).
Linkages between Relational Tables
Logically related tables need to be physically connected to achieve the associations described in the data model. Using foreign keys accomplishes this, as illustrated in Figure 9-12. In this example the foreign keys are embedded in the related table. For instance, the primary key of the Customer table (CUST NUM) is embedded as a foreign key in both the Sales Invoice and Cash Receipts tables. Similarly, the primary key in the Sales Invoice table (INVOICE NUM) is an embedded foreign key in the Line Item table. Note that the Line Item table uses a composite primary key comprised of INVOICE NUM and ITEM NUM. Both fields are needed to identify each record in the table uniquely, but only the invoice number portion of the key provides the logical link to the Sales Invoice table. Foreign keys are not always embedded like those in Figure 9-12. The nature of the association between the related tables determines the method used for assigning foreign keys. These methods are examined later.
With foreign keys in place, a computer program can be written to navigate among the tables of the database and provide users with the data they need to support their day-to-day tasks and decision-making responsibilities. For example, if a user wants all the invoices for Customer 1875, the program will search the Sales Invoice table for records with a foreign key value of 1875. We see from Figure 9-12 that there is only one such occurrence—invoice number 1921. To obtain the line-item details for this invoice, the program searches the Line Item table for records with a foreign key value of 1921, and two records are retrieved.
User Views
A user view was defined earlier as the set of data that a particular user sees. Examples of user views are computer screens for entering or viewing data, management reports, or source documents such as an invoice. Views may be digital or physical (paper), but in all cases, they derive from underlying database tables. Simple views may be constructed from a single table, while more complex views may require several tables. Furthermore, a single table may contribute data to many different views.
A large organization will have thousands of user views that thousands of individual tables support. The task of identifying all views and translating them into normalized tables is an important responsibility of database designers that has internal control implications. The issues and techniques related to this task are examined next.
ANOMALIES, STRUCTURAL DEPENDENCIES, AND DATA NORMALIZATION
This section deals with why database tables need to be normalized. In other words, why is it necessary for the organization’s database to form an elaborate network of normalized tables linked together like
those illustrated in Figure 9-12? Why, instead, can we not simply consolidate the views of one user (or several) into a single common table from which all data needs may be met?
Database Anomalies
The answer to the above questions is that improperly normalized tables can cause DBMS processing problems that restrict, or even deny, users’ access to the information they need. Such tables exhibit negative operational symptoms called anomalies. Specifically these are the update anomaly, the insertion anomaly, and the deletion anomaly. One or more of these anomalies will exist in tables that are not normalized or are normalized at a low level, such as first normal form (1NF) or second normal form (2NF). To be free of anomalies, tables must be normalized to the third normal form (3NF) level.
We will demonstrate the negative impact of anomalies with the user view in Figure 9-13. This inventory status report would be used to provide a purchasing agent with information about inventory items to be ordered and the various suppliers (vendors) of those items. If this view were to be produced from a
single table, it would look similar to the one presented in Figure 9-14. While such a table could indeed store the data to produce the view, it will exhibit the anomalies mentioned above.
UPDATE ANOMALY. The update anomaly results from data redundancy in an unnormalized table. To illustrate, notice that Supplier Number 22 provides each of the three inventory items (Part Num 1, 2, and 3) shown in Figure 9-14. The data attributes pertaining to Supplier Number 22 (Name, Address, and Tele Num) are thus repeated in every record of every inventory item that Supplier Number 22 provides. Any change in the supplier’s name, address, or telephone number must be made to each of these records in the table. In the example, this means three different updates. To better appreciate the implications of the update anomaly, consider a more realistic situation where the vendor supplies 10,000 different items of inventory. Any update to an attribute must then be made 10,000 times.
INSERTION ANOMALY. To demonstrate the effects of the insertion anomaly, assume that a new vendor has entered the marketplace. The organization does not yet purchase from the vendor, but may wish to do so in the future. In the meantime, the organization wants to add the vendor to the database. This is not possible, however, because the primary key for the Inventory table is PART NUM. Because the vendor does not supply the organization with any inventory items, the supplier data cannot be added to the table.
DELETION ANOMALY. The deletion anomaly involves the unintentional deletion of data from a table. To illustrate, assume that Supplier Number 27 provides the company with only one item: Part Number 1. If the organization discontinues this item of inventory and deletes it from the table, the data pertain- ing to Supplier Number 27 will also be deleted. Although the company may wish to retain the supplier’s information for future use, the current table design prevents it from doing so.
The presence of the deletion anomaly is less conspicuous, but potentially more serious than the update and insertion anomalies. A flawed database design that prevents the insertion of records or requires the user to perform excessive updates attracts attention quickly. The deletion anomaly, however, may go undetected, leaving the user unaware of the loss of important data until it is too late. This can result in the unintentional loss of critical accounting records and the destruction of audit trails. Table design, therefore, is not just an operational efficiency issue; it carries internal control significance that accountants need to recognize.
Normalizing Tables
The database anomalies described above are symptoms of structural problems within tables called dependencies. Specifically, these are known as repeating groups, partial dependencies, and transitive dependencies. The normalization process involves identifying and removing structural dependencies from the table(s) under review. (For a full explanation of this process, please review the appendix to this chapter.) The resulting tables will then meet the two conditions below:
1. All nonkey (data) attributes in the table are dependent on (defined by) the primary key.
2. All nonkey attributes are independent of the other nonkey attributes.
In other words, a 3NF table is one in which the primary key of a table wholly and uniquely defines each attribute in the table. Furthermore, none of the table attributes is defined by an attribute other than the primary key. If any attributes violate these conditions, they need to be removed and placed in a separate ta- ble and assigned an appropriate key.
Upon examination of Figure 9-14, we see that the primary key PART NUM does not uniquely define the nonkey attributes of SUPPLIER NUMBER, NAME, ADDRESS, and TELE NUM. Instead, each unique primary key value is associated with multiple (repeating group) values for these nonkey attributes. To resolve this structural dependency, the repeating group data must be removed from the Inventory table and placed in a separate table, which we will call Supplier. Figure 9-15 shows the two 3NF tables, Inventory and Supplier, along with a third table called Part/Supplier, which links the two. This linking technique will be explained later.
Normalizing the tables has eliminated the three anomalies. First, the update anomaly is resolved because data about each supplier exist in only one location—the Supplier table. Any change in the data about an individual vendor is made only once, regardless of how many items it supplies. Second, the insert anomaly no longer exists because new vendors can be added to the Supplier table even if they are not currently supplying the organization with inventory. For example, Supplier Number 30 in the table does not supply any inventory items. Finally, the deletion anomaly is eliminated. The decision to delete an inventory item from the database will not result in the unintentional deletion of the supplier data because these data reside independently in different tables.
Linking Normalized Tables
When unnormalized tables are split into multiple 3NF tables, they need to be linked together via foreign keys so the data in them can be related and made accessible to users. The degree of association
(joint cardinality) between the resulting tables (that is, 1:1, 1:M, or M:M) determines how the linking occurs. The key-assignment rules for linking tables are discussed next.
KEYS IN 1:1 ASSOCIATIONS. Where a true 1:1 association exists between tables, either (or both) primary key may be embedded as a foreign key in the related table. On the other hand, when the lower cardinality value is zero (1:0,1) a more efficient table structure can be achieved by placing the one-side (1:) table’s primary key in the zero-or-one (:0,1) table as a foreign key. Using the Employee/Company Car example in Figure 9-11, we see the importance of this key-assignment rule. To illustrate, imagine reversing the rule by placing the Company Car (0 side) table’s primary key into the Employee (1 side) table. Because most employees are not assigned a company car, most of the foreign keys in the Employee table will have null (blank) values. While this approach would work, it could cause some technical problems during table searches. Correctly applying the key-assignment rule solves this problem because all Company Car records will have an employee assigned and no null values will occur.
KEYS IN 1:M ASSOCIATIONS. Where a 1:M (or 1:0,M) association exists, the primary key of the 1 side is embedded in the table of the M side. To demonstrate the logic behind this key-assignment rule, consider two alternative business rules for purchasing inventory from suppliers.
Business Rule 1. Each vendor supplies the firm with three (or fewer) different items of inventory, but each item is supplied by only one vendor.
To apply this rule, the designer will need to modify the Inventory table structure to include the Supplier Number as illustrated in Figure 9-16. Under this approach, each record in the Inventory table will now contain the value of the key field of the vendor that supplies that item. By contrast, Figure 9-17 shows what the table structure might look like if the designer reversed the key-assignment rule by embed- ding the PART NUM key in the Supplier table. Notice that the Supplier table now contains three part number fields each linking to an associated record in the Inventory table. Only the links to part numbers 1, 2, and 3 are shown. Although this technique violates the key-assignment rule, it would work. It does so, however, only because the upper limit of the many side of the association is known and is very small (that is, limited to three). How would this table structure look if we assume the following, more realistic business rule?
Business Rule 2. Each vendor supplies the firm with any number of inventory items, but each item is supplied by only one vendor.
This is a true 1:M association in which the upper limit of the many side of the association is unbounded. For instance, the vendor may supply one item of inventory or 10,000 items. How many fields must we add to the Supplier table structure to accommodate all possible links to the Inventory table? Here we can see the logic behind the 1:M key-assignment rule. The structure in Figure 9-16 still works under this business rule, whereas the technique illustrated in Figure 9-17 does not.
KEYS IN M:M ASSOCIATIONS. To represent the M:M association between tables, a link table needs to be created. The link table has a combined (composite) key consisting of the primary keys of two related
tables. Let’s now return to the table relationship depicted in Figure 9-15. These tables illustrate an M:M association that the following business rule describes:
Business Rule 3. Each vendor supplies the firm with any number of inventory items, and each item may be supplied by any number of vendors.
This business rule is evident by examining the contents of the Inventory Status Report (the user view) in Figure 9-13. Each part number shown has multiple suppliers, and each supplier may supply multiple items. For example, Ozment Supply provides items 1, 2, and 3. Harris Manufacturing also provides items 2 and 3.
An M:M association between tables requires the creation of a separate link table because embedding a foreign key within either table is not possible. The logic in the previous 1:M example that prevented us from embedding the primary key from the many side table of an unbounded association into the table of the one side applies here also. Neither table can donate an embedded foreign key to the other because both are on the many side of the association. The solution, therefore, is to create a new link table contain- ing the key fields of the other two tables.
The link table called Part/Supplier in Figure 9-15 is a table of foreign keys. It contains the primary keys for the records in the Inventory table (PART NUM) and the related Supplier table (SUPPLIER NUM- BER). Via the link table, each inventory item can be linked to the corresponding supplier of the item, and each supplier can be linked to the inventory items that it supplies. For example, by searching the Inventory table for PART NUM 1, we see that Suppliers 22, 24, and 27 provide this item. Searching in the opposite direction, SUPPLIER NUMBER 28 provides parts 2 and 3. A separate record in the link table represents each unique occurrence of a supplier/inventory association. For example, if Supplier 24 provides 500 different inventory items, then the link table will contain 500 records to depict these associations.
Accountants and Data Normalization
Database normalization is a technical matter that is usually the responsibility of systems professionals. The subject, however, has implications for internal control that make it the concern of accountants also. For example, the update anomaly can generate conflicting and obsolete database values, the insertion anomaly can result in unrecorded transactions and incomplete audit trails, and the deletion anomaly can cause the loss of accounting records and the destruction of audit trails. Although most accountants will not be responsible for normalizing an organization’s databases, they should have an understanding of the process and be able to determine whether a table is properly normalized.
Comments
Post a Comment