Die Einführung des VIVO-Systems an der HTWD befindet sich derzeit in der Testphase. Daher kann es noch zu anwendungsseitigen Fehlern kommen. Sollten Sie solche Fehler bemerken, können Sie diese gerne >>hier<< melden.
Sollten Sie dieses Fenster schließen, können Sie über die Schaltfläche "Feedback" in der Fußleiste weiterhin Meldungen abgeben.
Vielen Dank für Ihre Unterstützung!
Top-k Entity Augmentation using Consistent Set Covering
Entity augmentation is a query type in which, given a set of entities and a large corpus of possible data sources, the values of a missing attribute are to be retrieved. Slate of the art methods return a single result that, to cover all queried entities, is fused from a potentially large set of data sources. We argue that queries on large corpora of heterogeneous sources using information retrieval and automatic schema matching methods call not easily return a single result that the user can trust, especially if the result is composed from a large number of sources that user has to verify manually. We therefore propose to process these queries in a Top-k fashion, in which the system produces multiple minimal consistent solutions front which the user can choose to resolve the uncertainty of the data sources and methods used. In this paper, we introduce and formalize the problem of consistent, multi-solution set covering, and present algorithms based on a greedy and a genetic optimization approach. We then apply these algorithms to Web table-based entity augmentation. The publication further includes a Web table corpus with 100M tables, and a Web table retrieval and matching system in which these algorithms are implemented. Our experiments show that the consistency and minimality of the augmentation results can be improved using our set covering approach, without loss of precision or coverage and while producing multiple alternative query results.