Learning for semantic parsing with kernels under various forms of supervision
Semantic parsing involves deep semantic analysis that maps natural language sentences to their formal executable meaning representations. This is a challenging problem and is critical for developing computing systems that understand natural language input. This thesis presents a new machine learning approach to semantic parsing based on string-kernel-based classification. It takes natural language sentences paired with their formal meaning representations as training data. For every production in the formal language grammar, a Support Vector Machine (SVM) classifier is trained using string similarity as the kernel. Meaning representations for novel natural language sentences are obtained by finding the most probable semantic parse using these classifiers. This method does not use any hard-matching rules and, unlike previous and other recent methods, does not use grammar rules for natural language, probabilistic or otherwise, which makes it more robust to noisy input. Besides being robust, this approach is also flexible and able to learn under a wide range of supervision, from richer to weaker forms. It can easily utilize extra supervision given in the form of syntactic parse trees for natural language sentences by using a syntactic tree kernel instead of a string kernel. Its learning algorithm can also take advantage of detailed supervision provided in the form of semantically augmented parse trees. A simple extension using transductive SVMs enables the system to do semi-supervised learning and improve its performance by utilizing unannotated sentences, which are usually easily available. Another extension involving EM-like retraining makes the system capable of learning under ambiguous supervision, in which the correct meaning representation for each sentence is not explicitly given; instead, a set of possible meaning representations is given.
This weaker and more general form of supervision is more representative of a natural training environment for a language-learning system requiring minimal human supervision. For a semantic parser to work well, conformity between the natural language and the meaning representation grammar is necessary. However, meaning representation grammars are typically designed to best suit the application that will use the meaning representations, with little consideration for how well they correspond to natural language semantics. We present approaches to automatically transform meaning representation grammars to make them more compatible with natural language semantics and hence more suitable for learning semantic parsers. Finally, we also show that ensembles of different semantic parser learning systems can obtain the best overall performance.
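As an illustration of what using string similarity as a kernel can look like, here is a minimal sketch in Python. It is not necessarily the kernel used in this thesis: it assumes a simplified, unweighted word-subsequence kernel that counts the matching pairs of non-empty word subsequences shared by two sentences. The function names `subsequence_kernel` and `normalized_kernel` are hypothetical, chosen only for this example.

```python
import math


def subsequence_kernel(s: str, t: str) -> int:
    """Count pairs of non-empty, matching word subsequences of s and t.

    Assumption: an unweighted all-subsequences kernel over
    whitespace-separated words, with no gap penalty.
    """
    a, b = s.split(), t.split()
    # c[i][j] = number of matching subsequence pairs in a[:i] and b[:j]
    c = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            # Pairs that leave out a[i-1] or b[j-1] (inclusion-exclusion).
            c[i][j] = c[i - 1][j] + c[i][j - 1] - c[i - 1][j - 1]
            if a[i - 1] == b[j - 1]:
                # Pairs ending in this matching word: extend any earlier
                # pair, or start a new pair with this word alone.
                c[i][j] += c[i - 1][j - 1] + 1
    return c[len(a)][len(b)]


def normalized_kernel(s: str, t: str) -> float:
    """Scale the kernel into [0, 1] so sentence length matters less."""
    denom = math.sqrt(subsequence_kernel(s, s) * subsequence_kernel(t, t))
    return subsequence_kernel(s, t) / denom if denom else 0.0


if __name__ == "__main__":
    x = "which rivers run through texas"
    y = "what rivers flow through texas"
    print(subsequence_kernel(x, y), normalized_kernel(x, y))
```

A matrix of such kernel values over the training sentences could then be passed to any SVM implementation that accepts a precomputed kernel, with one classifier trained per grammar production as described above.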