Database Management Systems:Databases in a Distributed Environment.
Databases in a Distributed Environment
Chapter 1 introduced the concept of distributed data processing (DDP) as an alternative to the centralized approach. Most modern organizations use some form of distributed processing and networking to process their transactions. Some companies process all of their transactions in this way. An important consideration in planning a distributed system is the location of the organization’s database. In addressing this issue, the planner has two basic options: databases can be centralized, or they can be distributed. Distributed databases fall into two categories: partitioned and replicated databases. This section examines issues, features, and trade-offs that should be carefully evaluated in deciding how databases should be distributed.
CENTRALIZED DATABASES
Under the centralized database approach, remote users send requests via terminals for data to the central site, which processes the requests and transmits the data back to the user. The central site performs the functions of a file manager that services the data needs of the remote users. The centralized database approach is illustrated in Figure 9-24.
Earlier in the chapter, three primary advantages of the database approach were presented: the reduction of data storage costs, the elimination of multiple update procedures, and the establishment of data cur- rency (that is, the firm’s data files reflect accurately the effects of its transactions). Achieving data currency is critical to database integrity and reliability. However, in the DDP environment, this can be a challenging task.
Data Currency in a DDP Environment
During data processing, account balances pass through a state of temporary inconsistency, in which their values are incorrectly stated. This occurs during the execution of any accounting transaction. To illustrate, consider the computer logic for recording the credit sale of $2,000 to customer Jones.
Immediately after the execution of Instruction Number 3, and before the execution of Instruction Number 4, the AR-Control account value is temporarily inconsistent by the sum of $2,000. This inconsistency is resolved only after the completion of the entire transaction. In a DDP environment, such temporary inconsistencies can result in the permanent corruption of the database. To illustrate the potential for damage, look at a slightly more complicated example. Using the same computer logic as before, consider the processing of two separate transactions from two remote sites: Transaction 1 (T1) is the sale of $2,000 on account to customer Jones from Site A; Transaction 2 (T2) is the sale of $1,000 on account to customer Smith from Site B. The following logic shows the possible interweaving of the two processing tasks and the effect on data currency.
Notice that Site B seized the AR-Control data value of $10,000 when it was in an inconsistent state. By using this value to process its transaction, Site B effectively destroyed the record of Transaction T1 that had been processed by Site A. Therefore, instead of $13,000, the new AR-Control balance is misstated at $11,000.
Database Lockout
To achieve data currency, simultaneous access to individual data elements by multiple sites needs to be prevented. The solution to this problem is to use a database lockout, which is a software control (usually a function of the DBMS) that prevents multiple simultaneous accesses to data. The previous example can be used to illustrate this technique: immediately upon receiving the access request from Site A for AR- Control (T1, Instruction Number 2), the central site DBMS places a lock on AR-Control to prevent access from other sites until Transaction T1 is complete. Thus, when Site B requests AR-Control (T2, Instruction Number 2), it is placed on wait status until the lock is removed. Only then can Site B access AR-Control and complete Transaction T2.
DISTRIBUTED DATABASES
Distributed databases can be distributed using either the partitioned or replicated technique.
Partitioned Databases
The partitioned database approach splits the central database into segments or partitions that are distributed to their primary users. The advantages of this approach are:
• Storing data at local sites increases users’ control.
• Permitting local access to data and reducing the volume of data that must be transmitted between sites improves transaction processing response time.
• Partitioned databases can reduce the potential for disaster. By having data located at several sites, the loss of a single site cannot terminate all data processing by the organization.
The partitioned approach, which is illustrated in Figure 9-25, works best for organizations that require minimal data sharing among users at remote sites. To the extent that remote users share common data, the
problems associated with the centralized approach still apply. The primary user must now manage requests for data from other sites. Selecting the optimum host location for the partitions will minimize data access problems. This requires an in-depth analysis of end-user data needs.
THE DEADLOCK PHENOMENON. In a distributed environment, it is possible that multiple sites will lock out each other, thus preventing each from processing its transactions. For example, Figure 9-26 illustrates three sites and their mutual data needs. Notice that Site 1 has requested (and locked) Data A and is waiting for the removal of the lock on Data C to complete its transaction. Site 2 has a lock on C and is waiting for E. Finally, Site 3 has a lock on E and is waiting for A. A deadlock occurs here because there is mutual exclusion to data, and the transactions are in a wait state until the locks are removed. This can result in transactions being incompletely processed and corruption of the database. A deadlock is a permanent condition that must be resolved by special software that analyzes each deadlock condition to determine the best solution. Because of the implications for transaction processing, accountants should be aware of the issues pertaining to deadlock resolutions.
DEADLOCK RESOLUTION. Resolving a deadlock usually involves sacrificing one or more transactions. These must be terminated to complete the processing of the other transactions in the deadlock. The preempted transactions must then be reinitiated. In preempting transactions, the deadlock resolution soft- ware attempts to minimize the total cost of breaking the deadlock. Although not an easy task to automate, some of the factors that influence this decision are as follows:
1. The resources currently invested in the transaction. This may be measured by the number of updates that the transaction has already performed and that must be repeated if the transaction is terminated.
2. The transaction’s stage of completion. In general, deadlock resolution software will avoid terminating transactions that are close to completion.
3. The number of deadlocks associated with the transaction. Because terminating the transaction breaks all deadlock involvement, the software should attempt to terminate transactions that are part of more than one deadlock.
Replicated Databases
In some organizations, the entire database is replicated at each site. Replicated databases are effective in companies in which there exists a high degree of data sharing but no primary user. Because common data are replicated at each site, the data traffic between sites is reduced considerably. Figure 9-27 illustrates the replicated database model.
The primary justification for a replicated database is to support read-only queries. With data replicated at every site, data access for query purposes is ensured, and lockouts and delays because of network traffic are minimized. A problem arises, however, when local sites also need to update the replicated database with transactions.
Because each site processes only its local transactions, different transactions will update the common data attributes that are replicated at each site and, thus, each site will possess uniquely different values af- ter the respective updates. Using the data from the earlier example, Figure 9-28 illustrates the effect of processing credit sales for Jones at Site A and Smith at Site B. After the transactions are processed, the value shown for the common AR-Control account is inconsistent ($12,000 at Site A and $11,000 at Site B) and incorrect at both sites.
Concurrency Control
Database concurrency is the presence of complete and accurate data at all remote sites. System designers need to employ methods to ensure that transactions processed at each site are accurately reflected in the databases at all other sites. This task, while problematic, has implications for accounting records and is a matter of concern for accountants.
A commonly used method for concurrency control is to serialize transactions. This involves labeling each transaction by two criteria. First, special software groups transactions into classes to identify poten- tial conflicts. For example, read-only (query) transactions do not conflict with other classes of transactions. Similarly, AP and AR transactions are not likely to use the same data and do not conflict. However, multiple sales order transactions involving both read and write operations will potentially conflict.
The second part of the control process is to time stamp each transaction. A system-wide clock is used to keep all sites, some of which may be in different time zones, on the same logical time. Each time stamp is made unique by incorporating the site’s identification number. When transactions are received at each site, they are examined first for potential conflicts. If conflicts exist, the transactions are entered into a serialization schedule. An algorithm is used to schedule updates to the database based on the transaction time stamp and class. This method permits multiple interleaved transactions to be processed at each site as if they were serial events.
Distributed Databases and the Accountant
The decision to distribute databases is one that should be entered into thoughtfully. There are many issues and trade-offs to consider. Some of the most basic questions to be addressed are:
• Should the organization’s data be centralized or distributed?
• If data distribution is desirable, should the databases be replicated or partitioned?
• If replicated, should the databases be totally replicated or partially replicated?
• If the database is to be partitioned, how should the data segments be allocated among the sites?
The choices involved in each of these questions impact the organization’s ability to maintain database integrity. The preservation of audit trails and the accuracy of accounting records are key concerns. Clearly, these are decisions that the modern accountant should understand and influence intelligently.
Comments
Post a Comment