Unsupervised partial parsing

dc.contributor.advisor: Baldridge, Jason [en]
dc.contributor.committeeMember: Bannard, Colin [en]
dc.contributor.committeeMember: Beaver, David I. [en]
dc.contributor.committeeMember: Erk, Katrin E. [en]
dc.contributor.committeeMember: Mooney, Raymond J. [en]
dc.creator: Ponvert, Elias Franchot [en]
dc.date.accessioned: 2011-10-25T14:13:53Z [en]
dc.date.accessioned: 2017-05-11T22:23:36Z
dc.date.available: 2011-10-25T14:13:53Z [en]
dc.date.available: 2017-05-11T22:23:36Z
dc.date.issued: 2011-08 [en]
dc.date.submitted: August 2011 [en]
dc.date.updated: 2011-10-25T14:14:02Z [en]
dc.description: text [en]
dc.description.abstract: The subject matter of this thesis is the problem of learning to discover grammatical structure from raw text alone, without access to explicit instruction or annotation, in particular by a computer or computational process; in other words, unsupervised parser induction, or simply unsupervised parsing. This work presents a method for raw-text unsupervised parsing that is simple but nevertheless achieves state-of-the-art results on treebank-based direct evaluation. The approach to unsupervised parsing presented in this dissertation constrains learned models in a different way than previous work has. Specifically, I focus on a sub-task of full unsupervised parsing called unsupervised partial parsing. In essence, the strategy is to learn to segment a string of tokens into a set of non-overlapping constituents, or chunks, each of which may be one or more tokens in length. This strategy has a number of advantages: it is fast and scalable, it is based on well-understood and extensible natural language processing techniques, and it produces predictions about human language structure that are useful for human language technologies. The models developed for unsupervised partial parsing recover base noun phrases and local constituent structure with high accuracy compared to strong baselines. Finally, these models may be applied in a cascaded fashion to predict full constituent trees: first segmenting a string of tokens into local phrases, then re-segmenting to predict higher-level constituent structure. This simple strategy leads to an unsupervised parsing model that produces state-of-the-art results for constituent parsing of English, German and Chinese. This thesis presents, evaluates and explores these models and strategies. [en]
dc.description.department: Linguistics [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.slug: 2152/ETD-UT-2011-08-3991 [en]
dc.identifier.uri: http://hdl.handle.net/2152/ETD-UT-2011-08-3991 [en]
dc.language.iso: eng [en]
dc.subject: Computational linguistics [en]
dc.subject: Natural language processing [en]
dc.subject: Unsupervised [en]
dc.subject: Parsing [en]
dc.subject: Chunking [en]
dc.subject: Text processing [en]
dc.title: Unsupervised partial parsing [en]
dc.type.genre: thesis [en]
