Wednesday, July 30, 2008

Recovery of Data Dependencies

Today, many companies struggle to maintain legacy database applications that were built on old database technology. These applications are becoming steadily harder to maintain. Re-engineering is an important way to address these problems and to move the applications to newer technology (Hainaut, Englebert, Henrard, Hick, & Roland, 1995). However, much of the design of a legacy database, including its data dependencies, is buried in the transactions that update the database and is not stated explicitly anywhere else. Recovering the data dependencies designed into a database from its transactions is therefore essential both to re-engineering and to frequently encountered maintenance tasks. Without an automated approach, this recovery is difficult and time-consuming. The issue also matters in data mining, since it entails mining the relationships between data items from program source code. Until recently, however, no such approach had been proposed in the literature.

Recently, Hee Beng Kuan Tan proposed an approach based on program path patterns. For each common data dependency, the approach identifies the path patterns that appear in transactions when the most commonly used methods of enforcing that dependency are implemented. Because these patterns can be identified through program analysis (Muchnick & Jones, 1981; Wilhelm & Maurer, 1995), the approach lends itself to automation: the data dependencies designed into a database are inferred from the patterns found.
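To make the idea of a path pattern concrete, here is a minimal sketch of a transaction that enforces a referential constraint in code rather than in the schema. The table names (customer, orders), the function names add_order and make_db, and the use of SQLite are all assumptions made for illustration; the specific patterns catalogued in Tan's approach are not reproduced here.

```python
import sqlite3


def make_db():
    """Build a hypothetical legacy-style schema with no declared foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
    conn.commit()
    return conn


def add_order(conn, order_id, customer_id):
    """Hypothetical transaction: the referential constraint
    orders.customer_id -> customer.customer_id lives only in this code."""
    found = conn.execute(
        "SELECT 1 FROM customer WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if found is None:
        # Rejecting path: the lookup failed, so no insert is performed.
        return False
    # Accepting path: the insert is reached only after a successful lookup.
    conn.execute(
        "INSERT INTO orders (order_id, customer_id) VALUES (?, ?)",
        (order_id, customer_id),
    )
    conn.commit()
    return True
```

Every path through add_order that reaches the INSERT into orders first passes a successful lookup in customer on the same value. A recurring shape like this is the kind of evidence from which a program analysis could plausibly infer that a referential constraint was intended, even though the schema never declares one.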

Data dependencies play an important role in database design (Maiser, 1986; Piatetsky-Shapiro & Frawley, 1991). Many legacy database applications were developed on old-generation database management systems and conventional file systems. As a result, most of the data dependencies in legacy databases are not enforced by the database management system. They are not explicitly defined in the database schema; instead, they are enforced in the transactions that update the database. Finding these dependencies manually during maintenance and re-engineering is very difficult and time-consuming. In software engineering, program analysis has long been established as a useful aid in many areas. This article reports research on using program analysis to recover the common data dependencies designed into a database (functional dependencies, key constraints, inclusion dependencies, referential constraints, and sum dependencies) from the behavior of its transactions.
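For readers less familiar with these terms, the sketch below states three of the dependencies as checks over in-memory rows, using their standard textbook definitions. The function names and the sample orders and customers data are hypothetical. Note that this checks whether a dependency holds in a given set of data, which is a different task from recovering the intended dependencies from transaction code; sum dependencies are left out because the text above does not define them.

```python
def holds_fd(rows, lhs, rhs):
    """Functional dependency lhs -> rhs: rows that agree on lhs agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for each lhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True


def holds_key(rows, attrs):
    """Key constraint: no two rows share the same values on attrs."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(keys) == len(set(keys))


def holds_inclusion(child_rows, child_attrs, parent_rows, parent_attrs):
    """Inclusion dependency: every child value combination appears in the parent."""
    parent_values = {tuple(r[a] for a in parent_attrs) for r in parent_rows}
    return all(
        tuple(r[a] for a in child_attrs) in parent_values for r in child_rows
    )


# Hypothetical sample data for illustration only.
orders = [
    {"order_id": 1, "customer_id": 10, "city": "Oslo"},
    {"order_id": 2, "customer_id": 10, "city": "Oslo"},
    {"order_id": 3, "customer_id": 11, "city": "Lima"},
]
customers = [{"customer_id": 10}, {"customer_id": 11}]
```

A referential constraint is the special case of an inclusion dependency in which the referenced attributes form a key of the parent, so here it would amount to holds_inclusion(orders, ["customer_id"], customers, ["customer_id"]) together with holds_key(customers, ["customer_id"]).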

