Operation and maintenance:Database administration

Database administration

82.1 Purpose

The purpose of database administration is to provide reliable, consistent, secure, and available corporate-wide data. This chapter discusses the roles performed by database administration, distinguishes database administration and data administration, and describes several database operation and maintenance issues.

82.2 Strengths, weaknesses, and limitations

Where appropriate, the strengths and weaknesses of various techniques will be discussed in context.

82.3 Inputs and related ideas

Database administration uses such tools as CASE (Chapter 5), the data dictionary (Chapter 25), entity-relationship models (Chapter 26), and data normalization (Chapter 28). Database design is discussed in Chapter 45. System controls are discussed in Chapter 77. Several key data concepts are introduced in Chapters 43 and44.

82.4 Concepts

The purpose of database administration is to provide reliable, consistent, secure, and available corporate-wide data. This chapter discusses the roles performed by database administration, distinguishes database administration and data administration, and describes several database operation and maintenance issues.

82.4.1 Database administration roles

Although database administration (DBA) means different things to different organizations, the overall objective is to achieve centralization and control of the corporation’s data resource.

82.4.1.1 Planning

Database administration is responsible for developing a strategy for the control and use of corporate data and for developing, implementing and supporting standards, guidelines, and tools that are consistent with the strategy. The planning task includes performing strategic planning for managing and centralizing data; evaluating, selecting, and implementing a database management system (DBMS) and related hardware and tools; and ensuring that priorities for application development are consistent with the long range data strategy.

82.4.1.2 Design

Both conceptual and physical database design (Chapter 45) are database administration (DBA) responsibilities. The DBA group provides technical and design support for application developers and controls the conceptual design to ensure that it is consistent with and integrated into the corporate data strategy. Physical database design requires making effective use of the database management system and related tools. Additionally, the DBA group establishes corporate guidelines and standards to ensure data consistency across applications.

82.4.1.3 Operation and control

The database administration group supports existing operational systems and is responsible for data integrity, security, availability, and the performance of the database management system. Database administration also provides education and support for the use of the data standards, guidelines, and tools.

82.4.1.4 Usage

Database administration provides documentation and support for using data by developing and enforcing standards for database content and use. Data dictionaries (Chapter 25) and CASE (Chapter 5) are used to develop documentation, enforce standards, and support design. Tools such as query languages also fall under DBA control. Knowledge, education, and support for all database-related software are other DBA responsibilities.

82.4.2 Data administration

The management side of the database administration group is sometimes called data administration (DA). The data administration people are the policy setters and decision makers responsible for the corporate data. They provide an executive level vision of the corporate data needs. The broader database administration group includes the technicians who implement and enforce the decisions and standards the data administration people put in place.1 Data administration tends to deal with the conceptual side of the data and database administration deals with the physical side.

Whether or not this break is formalized, database administration calls for a blend of management, technical, and interpersonal skills. Data processing skills are rarely used directly, but the database administration group must have an appreciation for them. Technical skills are required to provide ongoing support for the use of the data model. Interpersonal skills are required because database administration must resolve across functional boundaries any conflicts that concern how the data will be stored and presented. Data administration calls for stronger interpersonal skills to work with the users during logical design.

82.4.3 Operation and maintenance issues

This section covers several database administration operation and maintenance standards, tools, utilities, and methods.

82.4.3.1 Implementation guidelines

Implementation guidelines define how applications will be developed and installed with respect to data. The database administration role is primarily physical database implementation and support, but some of the coding effort may be defined and enforced by the DBA group.

82.4.3.2 Error handling

Database administration is responsible for the error-handling routines that protect the integrity of the database. Techniques for handling errors may require common error routines to ensure databases are rolled back to a point of consistency in the event of an error. Typically, all transactions running in a database management system environment are logged on a transaction log and the database is not permanently altered until the transaction is run to completion or committed. Error handling becomes more complex if more than one database management system is involved in a transaction. Techniques such as two-phase commits may be employed. Two-phase commits synchronize the committing of transaction changes across multiple database management systems to ensure that the transaction is either run to completion or is rolled back on all of the systems.

82.4.3.3 Security

Security is a means of ensuring data integrity and protecting confidential information by controlling who has the authority to view, alter, or delete the data in a database. The ultimate responsibility falls on database administration, but the ability to grant authority can be delegated to other individuals as long as DBA maintains control. Security can be implemented in the database management system itself or controlled at the application level.

82.4.3.4 Backup and recovery

The responsibility for designing and implementing backup and recovery procedures rests with database administration. Data can be lost in many ways, including program errors, hardware failures, incorrect changes, and bad input data. A backup copy is used to restore a table of data to the point in time when the copy was made. Recovery procedures help to ensure that the recovered data are correct.

Backup copies of database tables and transaction log files are the primary inputs for backup and recovery. All changes of any type to the database are stored on the log. With a backup copy of a table and the contents of the log since that copy was made, a table can be recovered to the point where the failure occurred. To avoid recovering “bad” transactions, a well-designed recovery procedure allows the system to be recovered to any point in the log.

Synchronization is an important issue because simply recovering a table does not mean that it is in synch with other related tables in the database. For instance, a given table could contain data that are related to other tables, because deleted data in other tables or the recovered data could contain broken relationships that point to non-existing data. The only way to ensure data integrity is to recover the complete set of relationally connected tables (a referential group) to a consistent or current point.

Points of consistency can be established in two ways. The first is to disable the database and backup the entire referential group. The database must be disabled to ensure that no tables are changed from the time the backup process starts until it is complete. Disabling the database while backups are run is not always acceptable, however, because some applications cannot afford the downtime.

Some database management systems provide a means for establishing a synch point for a set of tables. As transactions are processed against the tables, new transactions are held until the entire set of tables is free from processing. At this time, all the tables are guaranteed to be in synch, and the synch point is recorded on the transaction log. Although the act of establishing a synch point makes the tables unavailable for a time, the duration is very short and, if run during off-peak hours, not noticeable.

Synch points do not eliminate the need for backup; they simply eliminate the need to take the database off-line to back it up. Backup copies should still be taken prior to establishing the synch point. Recovery time will then be short, because only a small piece of the log will be needed.

Another option available on some database management systems is to take incremental backups of tables. An incremental backup copy contains only the data that have changed since the last full or incremental backup. Recovery then starts with the latest full copy, adds any incremental changes, and uses the log from that point.

Note that the log is a finite file with a defined upper limit on its size, so it must periodically be cleared so that changes can continue to be recorded. The existing log information cannot be destroyed unless all the tables in the database have been backed up since the log was previously cleared, but the log may become full at any time. To get around these problems, the existing log file is typically archived or backed up and then reset to empty. Subsequently, the archived log files can be used should recovery become necessary.

The goal is to create a complete backup and recovery plan that . . .minimizes downtime (for taking backups) and also minimizes recovery complexity and time. Backups can be scheduled, perhaps (depending on application requirements) when the application is not running. Recovery however must be performed whenever the need arises (usually at the worst possible time). The issue becomes more complex when the data are centralized and multiple applications access a single, corporate-wide database.

82.4.3.5 Concurrency control

Concurrency issues arise when multiple users can access a database simultaneously. If precautions are not taken, lost updates, uncommitted dependencies, and inconsistent analysis problems can arise, all of which affect database integrity.

82.4.3.5.1 Locking

The solution to these multi-user pitfalls is locking. It is a simple concept. When the data must remain consistent during the entire execution of a transaction, the application routine locks the relevant data, thus denying other transactions the ability to change the data. The transaction can then be processed to completion and concurrency-related problems will be eliminated.1

Locking can be performed at many levels or granularities (e.g., the entire database, individual tables, physical pages of data within tables, or individual rows). Higher granularity (i.e., locking a bigger object) implies lower lock maintenance overhead, but it also implies a lower level of concurrency. In other words, granularity and concurrency are tradeoffs. In general, choose the highest level of granularity that will support the required level of concurrency. Some database management systems provide the ability to control the duration that locks are held by transactions, thus allowing the level of concurrency to be increased in some situations.

82.4.3.5.2 Optimistic concurrency

A transaction that maintains locks while information is displayed for update is considered to be a conversational transaction. On such transactions, there is active communication with the database management system and the appropriate locks are maintained for the duration of the transaction. Locking may be the only viable alternative, but it can cause problems.

Consider, for example, the following scenario. A bank teller has just begun updating a customer’s address when he or she is interrupted for some reason. The appropriate data have already been accessed, and locks are being held against the data displayed on the screen. If the locking level is coarse (e.g., a page or a table), then locks are also being held against other customers and no information can be updated for those other customers. If the teller is distracted for a long period of time, other tellers and customers will be inconvenienced.

An alternative is to break the transaction into two transactions, one that reads the information and displays it (TRANR) and another that interprets data changes and updates the database (TRANU). An exposure exists between the time the data are displayed and subsequently updated because once the locks are released, another transaction might update the information. This could lead to lost updates when TRANU writes old information over a new change.

The best way to avoid the problem is to determine if the data were updated after the locks were released by TRANR. Typically, all update transactions are required to maintain in the database row-level timestamps indicating the time of the most recent change. Copies of the rows that were displayed by TRANR are maintained in memory. TRANU reads updated timestamps from the rows it wants to change and compares the timestamps to the copies in memory. If the timestamps match the ones in memory, then no changes were made and the updates can be performed. If there is a conflict, then the application must resolve it, possibly by redisplaying the changed row(s) and asking the user to resubmit the changes.

This method of concurrency control, known as optimistic concurrency, adds overhead to the system but greatly improves the concurrency potential. Once again, database administration is faced with a tradeoff, and the method chosen will vary from application to application based on requirements.

82.5 Key terms
Backup and recovery —
Logging transactions and database changes, periodically making backup copies of databases, and recovering databases in the event of a failure or loss of data.
Checkpoint —
A point at which the database and the transaction logs are physically synchronized.
Concurrency control —
Ways of preventing data loss or data corruption when multiple transactions can be updating a database at the same time.
Conversational transaction —
A transaction that maintains locks while information is displayed for update.
Data administration (DA) —
An administrative function charged with the overall responsibility for data resources in an organization.
Data integrity —
The state of a database that is protected against loss or contamination; data integrity is ensured by carefully controlling and managing data entry, data maintenance, and data access from the time the data first enter the system until they are of no further use.
Database administration (DBA) —
A technical group charged with the physical functioning of an organization’s databases, including backup and recovery, performance, and security enforcement.
Database management system (DBMS) —
A software package that provides a means to define, maintain, control, and administer a database and its applications.
Deadlock —
A situation that occurs when two or more transactions are waiting for a resource that the other transaction has locked; both transactions will hold their locks waiting for the needed resource to be unlocked.
Granularity —
The level of locking; for example, the entire database, individual tables, physical pages of data within tables, or individual rows.
Incremental backup copy —
A backup copy that contains only the data that have changed since the last full or incremental back up was performed.
Locking —
The process of allocating control over a database resource to a specific transaction to avoid problems associated with concurrent use.
Log —
A file containing a record of all database changes.
Optimistic concurrency —
Breaking a transaction into two transactions, one that reads the information and displays it, and another that interprets data changes and updates the database, and releases locks between those transactions. The system is optimistic that no other users will change information between the read and update transactions. If another user does change the information, the update transaction is lost.
Referential group —
A complete set of relationally connected tables.
Security —
Steps taken to protect a database from unauthorized access.
Synch point —
A point at which a complete set of relationally connected tables (a referential group) is consistent.
Synchronization —
The act of ensuring that a complete set of relationally connected tables (a referential group) is consistent.
Transaction —
The sequence of steps required to carry out an event about which data are recorded or processed.
Two-phase commit —
A process of committing changes across multiple database management systems. In the first phase each DBMS votes on whether they can commit a transaction. If all vote yes, the transaction is committed on all. If any vote no, the transaction is rolled back on all.
82.6 Software

Many CASE products (Chapter 5), such as Texas Instrument’s Information Engineering Facility (IEF), include software to provide integrated support for both database and application design. Other tools such as automated data dictionaries (Chapter 25) provide support as well. Most full functioned database management systems provide a set of utilities or commands that support the creation, maintenance, control, and administration of data-bases, including back up and recovery. Access, Paradox, dBase, Filemaker Pro, Approach, 4th Dimension, and Alpha Four are popular microcomputer database management programs. Examples of mainframe database software include DB2, IDMS, and ORACLE.

82.7 References
1.  Date, C. J., An Introduction to Database Systems, Vol. 1, 6th ed., Addison-Wesley, Reading, MA, 1994.
2.  Hansen, G. W. and Hansen, J. V., Database Management and Design, Prentice-Hall, Upper Saddle River, NJ, 1996.
3.  Kroenke, D. M., Database Processing: Fundamentals, Design, and Implementation, Prentice-Hall, Upper Saddle River, NJ, 1995.
4.  McFadden, F. R. and Hoffer, J. A., Modern Database Management, Benjamin/Cummings, Redwood City, CA, 1994.
5.  Watson, R. T., Data Management, an Organizational Perspective, John Wiley & Sons, New York, 1996.
6.  Weldon, J. L., Data Base Administration, Plenum Press, New York, 1981.

Comments

Popular posts from this blog

The Conversion Cycle:The Traditional Manufacturing Environment

The Revenue Cycle:Manual Systems

HIPO (hierarchy plus input-process-output)