Knowledge integration in machine reading

Kim, Doo Soon

Date

2011-08

Authors

Kim, Doo Soon

Abstract

Machine reading is the artificial-intelligence task of automatically reading a corpus of texts and, from the contents, building a knowledge base that supports automated reasoning and question answering. Success at this task could fundamentally solve the knowledge acquisition bottleneck – the widely recognized problem that knowledge-based AI systems are difficult and expensive to build because of the difficulty of acquiring knowledge from authoritative sources and building useful knowledge bases. One challenge inherent in machine reading is knowledge integration – the task of correctly and coherently combining knowledge snippets extracted from texts. This dissertation shows that knowledge integration can be automated and that it can significantly improve the performance of machine reading.

We specifically focus on two contributions of knowledge integration. The first contribution is for improving the coherence of learned knowledge bases to better support automated reasoning and question answering. Knowledge integration achieves this benefit by aligning knowledge snippets that contain overlapping content. The alignment is difficult because the snippets can use significantly different surface forms. In one common type of variation, two snippets might contain overlapping content that is expressed at different levels of granularity or detail. Our matcher can “see past” this difference to align knowledge snippets drawn from a single document, from multiple documents, or from a document and a background knowledge base.

The second contribution improves text interpretation. Our approach delays ambiguity resolution so that a machine-reading system can maintain multiple candidate interpretations. This is useful because, as the system reads through texts, evidence typically accumulates that helps the knowledge integration system resolve ambiguities correctly. To avoid a combinatorial explosion in the number of candidate interpretations, we propose a packed representation that compactly encodes all the candidates. We also present an algorithm that prunes interpretations from the packed representation as evidence accumulates.
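
The idea of packing candidates and pruning them as evidence arrives can be illustrated with a small Python sketch. This is not the dissertation's data structure or algorithm; it is a simplified illustration that assumes each ambiguity is an independent "slot" with its own set of candidate readings. The class name `PackedInterpretations`, its methods, and the example slots are all hypothetical.

```python
from itertools import product
from math import prod


class PackedInterpretations:
    """Hypothetical packed store: one candidate set per ambiguous slot.

    Simplifying assumption: slots are independent, so every combination
    of per-slot candidates counts as one full interpretation.
    """

    def __init__(self, candidates):
        # candidates: dict mapping slot name -> iterable of candidate readings
        self.candidates = {slot: set(opts) for slot, opts in candidates.items()}

    def count(self):
        # Number of full interpretations encoded, computed without enumeration.
        return prod(len(opts) for opts in self.candidates.values())

    def size(self):
        # Storage actually used: the sum, not the product, of the set sizes.
        return sum(len(opts) for opts in self.candidates.values())

    def prune(self, slot, rejected):
        # Drop candidates that newly read evidence rules out.
        remaining = self.candidates[slot] - set(rejected)
        if not remaining:
            raise ValueError(f"evidence eliminates every candidate for {slot!r}")
        self.candidates[slot] = remaining

    def expand(self):
        # Enumerate full interpretations only when explicitly requested.
        slots = sorted(self.candidates)
        for combo in product(*(sorted(self.candidates[s]) for s in slots)):
            yield dict(zip(slots, combo))


packed = PackedInterpretations({
    "sense:bank": {"river-bank", "financial-institution"},
    "attach:with-pp": {"verb", "noun"},
    "referent:it": {"engine", "piston", "car"},
})
before = packed.count()   # 2 * 2 * 3 candidate interpretations
stored = packed.size()    # 2 + 2 + 3 stored candidates
packed.prune("sense:bank", {"river-bank"})
after = packed.count()    # 1 * 2 * 3 candidate interpretations
```

In this sketch the number of encoded interpretations grows multiplicatively with each new ambiguity while storage grows additively, and each `prune` call shrinks the encoded set without ever enumerating it. Real interpretations have dependencies between ambiguities, which this independent-slot simplification ignores.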

We evaluate our work by building and testing two prototype machine reading systems and measuring the quality of the knowledge bases they construct. The evaluation shows that our knowledge integration algorithms improve the cohesiveness of the knowledge bases, indicating their improved ability to support automated reasoning and question answering. The evaluation also shows that our approach to postponing ambiguity resolution improves the system’s accuracy at text interpretation.