Database design

Database design

45.1 Purpose

45.2 Strengths, weaknesses, and limitations

This chapter discusses several database design principles. Relevant strengths, weaknesses, and limitations will be discussed in context.

45.3 Inputs and related ideas

The primary inputs to database design are the user’s views of the information as defined during the problem definition (Part II) and analysis (Part IV) stages of the system development life cycle. In addition to the existing views, new information requirements may be included in the system design.

Key data concepts are defined in Chapters 25, 43, and 44. Commonly used database design tools and techniques include data dictionaries (Chapter 25), entity-relationship models (Chapter 26), and data normalization (Chapter 28). Additionally, information concerning the application design is also relevant. In particular, database design influences or is influenced by hardware interface design (Chapter 42) and software design (Chapters 62, 66, 67, and 69).

45.4 Concepts

Database design is the process of gathering data requirements for an organization, a business process, or a proposed information system, and transforming these requirements into a set of specifications that can be used to create a database. The goal of database design is not only to provide the ability to capture all necessary information, but to organize that information in an efficient and usable way. To do so requires knowledge about what the information is and how it will be used, as well as information about the computing environment in which the database will be implemented. The computing environment typically includes a database management system, a network, and the computing platforms upon which the information system will be installed.

45.4.1 Database structures

The simplest database structure is called a flat-file database. All the data in a flat-file database are stored in a single, spreadsheet-like table that is not linked with any other files. The advantage is simplicity. Although technically not databases, flat files are fine for many personal computer applications.

In a hierarchical database, the file links form a hierarchy; for example, Figure 45.1 shows a student name and address file with links or pointers to an academic file, a financial file, and an activity file. In this example, the name and address file is the parent and the other files are its children. Database access starts at the top of the hierarchy and flows downward, so it is possible to access a student’s academic file starting from his or her name and address record, but not vice versa. Note that a parent can have many children, but a child can have only one parent.

Figure 45.1 A hierarchical database.

In a network database (Figure 45.2), the links or pointers can describe relationships between any two files in any direction, so a child can have many parents. For example, if the name and address file contains a link to the academic file and the academic file also contains a link to the name and address file, a given student’s grade report might be prepared starting with either file. Because links are relatively easy to add to a data structure, the distinction between hierarchical and network databases has practically disappeared.

The files that form a relational database are best visualized as two-dimensional tables that resemble spreadsheets. In a given file, each column holds values of a single field (or attribute) and each row holds a single record (a single occurrence of the entity). Files are linked by pointers or, more generally, relationships. The relational model has become a de facto standard, and data normalization (Chapter 28) suggests logical data structures that are compatible with a relational database.

Figure 45.2 A network database.

As the name implies, an object-oriented database holds objects (Chapter 6) and defines the links or relationships between them. An object includes both the data and the procedures to manipulate the data. The objects can be traditional data oriented objects, but often are more complex. For example, an object-oriented database can define and hold such objects as sounds, videos, graphics, and spreadsheets.

45.4.2 Database administration

The responsibility for database design often resides in a database administration group. The database administrators must work closely with the application designers throughout the system design process. Maintaining the integrity of the database is another database administration responsibility. See Chapter 82 for additional details about this group.

45.4.3 Database design methodologies

Many organizations have a formal process for designing application databases. Because database design is a key part of information system design, the database design methodology is generally integrated into the design phase of the organization’s application development methodology. Although the specific steps and tools vary considerably from organization to organization (and even across a given organization), the process is generally divided into two primary stages: logical (or conceptual) database design and physical database design.

Changes in the physical database design can affect the logical design. Consequently, these two steps are tightly integrated, with numerous feedback cycles. Note, however, that the application design and the logical database design must be completed before the physical database design can be finalized.

45.4.3.1 Logical database design

Logical or conceptual database design is concerned with defining and documenting the database in user terms. The objective is to formally define the user’s understanding of the data and how the various data elements and composites are interrelated. This phase is geared to a non-technical audience. From the user’s standpoint, it is the database design.

The process starts with a study of the user’s data views and data uses in the context of the application. Following the organization’s standards, data elements and composites are defined, named, and documented in the data dictionary (Chapter 25). Additionally, the relationships between the entities are studied and documented, often using such tools as entity relationship models (Chapter 26) and/or data normalization (Chapter 28). The output of the logical design process is a set of detailed documentation for all the data that will be stored in the database and for the interrelationships between those data.

45.4.3.2 Physical database design

The objective of physical database design is to produce a blueprint for physically implementing the database. (The resulting documentation is intended for a technical audience.) In addition to the logical database design, inputs to this stage also include key elements of the application design. The logical database design defines the data requirements. The application design defines how the data will be used. The goal is to design a physical database that implements the logical design and efficiently provides for all appropriate uses of the data.

The physical database design process starts with transaction analysis, a study of expected usage levels associated with the various application functions. Future users are usually involved in this process because they know how much work they do. The application design identifies points where the application accesses data, and these data interfaces (or transactions) provide a focus for transaction analysis. Generally, statistical information about the level of use is collected for each transaction type.

Based on these usage levels, decisions about how the data will be physically implemented are made. Note that high volume portions of the database receive more attention than those accessed less frequently. One option is to index high volume information for quicker access. Indexes improve performance, but maintaining them adds to system overhead.

Another physical database design requirement is ensuring the integrity of the database. Many database management systems provide features that help ensure database integrity, and integrity enforcement can also be implemented through the application software. The physical database design describes the methods that will be used for enforcing data integrity.

45.4.3.3 Modifying the logical design

Because physical design decisions can affect the logical design, logical and physical database design are not independent. Consequently, logical and physical database design can be viewed collectively as an iterative process with numerous feedback loops.

For example, based on a transaction analysis, it might be necessary (for performance reasons) to store a small percentage of the data redundantly. Altering the logical database design to include the redundant data is called denormalization. Like indexing, denormalization can improve performance, but at the cost of more overhead.

Most applications include information that is calculated from other information in the database; a sales commission (a percentage of sales revenue) is a good example. A choice must be made between physically storing such values in the database or calculating them as needed. This decision is based on how often the computed value is needed and how much information is needed to calculate it. Note that storing the computed value means computing it only once and subsequently reading it on demand. Stored calculated fields must be added to the logical database design.

The application design can also be altered to avoid projected performance problems. For example, it is not unusual for a large, high frequency transaction to provide more information than is needed for most of its invocations. Such transactions might be broken into a primary transaction that accesses only the data needed most of the time and a supplemental transaction that accesses the remaining information only when needed.

45.4.3.4 Schema and subschema

Finally, consider the database’s schema and subschema. The database’s physical contents are defined in a schema, a general description of the entire database that shows all the record types and their relationships. A large database might contain scores of files, but most users will access only a few of those files. To minimize the risk of a user accidentally accessing confidential data or changing data in an unneeded file, custom subschema that include only those records and relationships needed by a particular user or class of users can be defined. Clearly, the subschema must be consistent with the user’s logical view of the data. Consequently, defining the schema and the subschema affects both logical and physical database design.

45.5 Key terms

Child —: A lower-level record in a hierarchical database structure.
Composite —: A set of related data elements.
Conceptual database design —: See logical database design.
Data administration —: The administrative function charged with the overall responsibility for data resources in an organization.
Data dictionary —: A collection of data about the data.
Data element —: An attribute that cannot be logically decomposed.
Data structure —: A set of related data elements.
Database —: A collection of interrelated and shared data of different types organized into a structure that minimizes redundancies and enhances the manipulation of the data; generally, a set of related files.
Database administration —: The technical function charged with physically managing an organization’s databases, including such issues as backup and recovery, performance, and security enforcement.
Database integrity —: The state of a database that is protected against loss or contamination.
Database Management System (DBMS) —: A software package that provides the means to define, maintain, control, and administer a database and its applications; a set of software routines that define the rules for creating, accessing, and maintaining a database.
Denormalization —: Altering the logical database design to include redundant data.
Entity —: An object (a person, group, place, thing, or activity) about which data are stored.
Filter —: A set of logical conditions used to screen records in a query.
Flat-file database —: A database (more accurately, a file) in which all the data are stored in a single, spreadsheet-like table that is not linked with any other files.
Hierarchical database —: A database in which the file links (or relationships) form a hierarchy.
Index —: A list of the keys and physical locations of each record in a file.
Logical database design —: The database design stage concerned with defining and documenting the database in user terms.
Network database —: A network in which the links or pointers can describe relationships between any two files in any direction, so a child can have many parents.
Parent —: A higher-level record in a hierarchical database structure.
Physical database design —: The database design stage during which a blueprint for physically implementing the database is produced.
Query —: A question; usually, a request for data or information.
Relation —: A table (analogous to a file) in a relational database.
Relational database —: A database in which the files (or relations) are visualized as two-dimensional tables with each column holding values of a single field (or attribute) and each row holding a single record (a single occurrence of the entity). The files are linked by pointers or, more generally, relationships.
Relationship —: A link between two data structures or entities.
Schema —: A general description of the entire database that shows all the record types and their relationships.
Subschema —: A subset of the schema that includes only those records and relationships needed by a particular user or class of users.
Transaction —: The sequence of steps required to carry out an event about which data are recorded or processed; also, a single occurrence of a business activity; for example, a single sale or the receipt of a single order from a supplier.
Transaction analysis —: A study of expected usage levels associated with the various application functions.
View —: A subset of the database that includes only selected fields from the records that meet a set of conditions defined in a logical filter.

45.6 Software

Many CASE products (Chapter 5), such as Texas Instrument’s Information Engineering Facility (IEF), include software to provide integrated support for both database and application design. Other tools such as automated data dictionaries (Chapter 25) provide support as well. Access, Paradox, dBase, Filemaker Pro, Approach, 4th Dimension, and Alpha Four are popular microcomputer database management programs. Examples of mainframe database software include DB2, IDMS, and ORACLE.

Most database management systems include a data definition language (DDL) for specifying file structures, relationships, schema, and subschema. Once the database is created, its contents are accessed and maintained using a data manipulation language (DML). Most database management systems incorporate a query language (such as Structured Query Language, or SQL) and a report generator. The database administrator uses a data control language (DCL) to perform such activities as backing up files, keeping track of user names and passwords, and monitoring system performance.

45.7 References

1. Date, C. J., An Introduction to Database Systems, vol. 1, 6th ed., Addison-Wesley, Reading, MA, 1994.

2. Hansen, G. W. and Hansen, J. V., Database Management and Design, Prentice-Hall, Upper Saddle River, NJ, 1996.

3. Kroenke, D. M., Database Processing: Fundamentals, Design, and Implementation, Prentice-Hall, Upper Saddle River, NJ, 1995.

4. McFadden, F. R. and Hoffer, J. A., Modern Database Management, Benjamin/Cummings, Redwood City, CA, 1994.

5. Watson, R. T., Data Management, an Organizational Perspective, John Wiley & Sons, New York, 1996.

Search This Blog

information systems course