Abstract
Matching dependencies (MDs) are used to declaratively specify the identification (or matching) of certain attribute values in pairs of database tuples when some similarity conditions on other values are satisfied. Their enforcement can be seen as a natural generalization of entity resolution. In what we call the pure case of MD enforcement, an arbitrary value from the underlying data domain can be used for the value in common that is used for a matching. However, the overall number of changes of attribute values is expected to be kept to a minimum. We investigate this case in terms of semantics and the properties of data cleaning through the enforcement of MDs. We characterize the intended clean instances, and also the clean answers to queries, as those that are invariant under the cleaning process. The complexity of computing clean instances and clean query answering is investigated. Tractable and intractable cases depending on the MDs are identified and characterized.