Unsupervised partial parsing

Ponvert, Elias Franchot

dc.contributor.advisor	Baldridge, Jason	en
dc.contributor.committeeMember	Bannard, Colin	en
dc.contributor.committeeMember	Beaver, David I.	en
dc.contributor.committeeMember	Erk, Katrin E.	en
dc.contributor.committeeMember	Mooney, Raymond J.	en
dc.creator	Ponvert, Elias Franchot	en
dc.date.accessioned	2011-10-25T14:13:53Z	en
dc.date.accessioned	2017-05-11T22:23:36Z
dc.date.available	2011-10-25T14:13:53Z	en
dc.date.available	2017-05-11T22:23:36Z
dc.date.issued	2011-08	en
dc.date.submitted	August 2011	en
dc.date.updated	2011-10-25T14:14:02Z	en
dc.description	text	en
dc.description.abstract	This thesis addresses the problem of learning to discover grammatical structure from raw text alone, by a computer or computational process and without access to explicit instruction or annotation: in other words, unsupervised parser induction, or simply unsupervised parsing. This work presents a method for unsupervised parsing of raw text that is simple but nevertheless achieves state-of-the-art results on treebank-based direct evaluation. The approach presented in this dissertation constrains learned models in a different way than previous work has. Specifically, I focus on a sub-task of full unsupervised parsing called unsupervised partial parsing. In essence, the strategy is to learn to segment a string of tokens into a set of non-overlapping constituents, or chunks, each of which may be one or more tokens in length. This strategy has a number of advantages: it is fast and scalable, it is based on well-understood and extensible natural language processing techniques, and it produces predictions about human language structure that are useful for human language technologies. The models developed for unsupervised partial parsing recover base noun phrases and local constituent structure with high accuracy compared to strong baselines. Finally, these models may be applied in a cascaded fashion to predict full constituent trees: first segmenting a string of tokens into local phrases, then re-segmenting to predict higher-level constituent structure. This simple strategy leads to an unsupervised parsing model that produces state-of-the-art results for constituent parsing of English, German and Chinese. This thesis presents, evaluates and explores these models and strategies.	en
dc.description.department	Linguistics	en
dc.format.mimetype	application/pdf	en
dc.identifier.slug	2152/ETD-UT-2011-08-3991	en
dc.identifier.uri	http://hdl.handle.net/2152/ETD-UT-2011-08-3991	en
dc.language.iso	eng	en
dc.subject	Computational linguistics	en
dc.subject	Natural language processing	en
dc.subject	Unsupervised	en
dc.subject	Parsing	en
dc.subject	Chunking	en
dc.subject	Text processing	en
dc.title	Unsupervised partial parsing	en
dc.type.genre	thesis	en
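
The cascaded strategy described in the abstract (segment tokens into local chunks, then re-segment the result to predict higher-level structure) can be illustrated with a minimal sketch. This is not the thesis's implementation: the learned chunking model is replaced here by a hypothetical hand-written rule (`toy_tagger` with a made-up `STARTERS` word list), the B/I/O tag encoding is an assumption, and representing each chunk by its last token (`head`) is an arbitrary choice made only for illustration.

```python
from typing import Callable, List, Sequence, Tuple, Union

# A tree is either a token or a tuple of sub-trees (a constituent).
Tree = Union[str, Tuple["Tree", ...]]
# Hypothetical tagger interface: one "B", "I" or "O" tag per input item.
Tagger = Callable[[List[str]], List[str]]

# Assumption: a toy word list standing in for a learned model.
STARTERS = {"the", "a", "saw"}


def toy_tagger(items: List[str]) -> List[str]:
    """Stand-in for a learned chunker: a starter word opens a chunk ("B")
    and the item right after it continues that chunk ("I")."""
    tags, after_starter = [], False
    for word in items:
        if word in STARTERS:
            tags.append("B")
            after_starter = True
        elif after_starter:
            tags.append("I")
            after_starter = False
        else:
            tags.append("O")
    return tags


def group_chunks(items: Sequence[Tree], tags: Sequence[str]) -> List[Tree]:
    """Group items into non-overlapping chunks according to B/I/O tags.
    Chunks of length one are left as the bare item."""
    out: List[Tree] = []
    current: List[Tree] = []

    def flush() -> None:
        if current:
            out.append(tuple(current) if len(current) > 1 else current[0])
            current.clear()

    for item, tag in zip(items, tags):
        if tag == "I" and current:
            current.append(item)
            continue
        flush()
        if tag == "O":
            out.append(item)
        else:  # "B", or a stray "I" with no open chunk
            current.append(item)
    flush()
    return out


def head(tree: Tree) -> str:
    """Assumption: represent a chunk by its last token."""
    while not isinstance(tree, str):
        tree = tree[-1]
    return tree


def cascade(tokens: Sequence[str], tagger: Tagger, max_levels: int = 10) -> List[Tree]:
    """Re-apply the chunker to its own output until nothing more is merged."""
    level: List[Tree] = list(tokens)
    for _ in range(max_levels):
        grouped = group_chunks(level, tagger([head(t) for t in level]))
        if len(grouped) == len(level):  # no items were merged at this level
            break
        level = grouped
    return level


result = cascade("the dog saw a cat".split(), toy_tagger)
print(result)
```

On the toy sentence, the first pass groups "the dog" and "a cat" as local chunks, and the second pass groups "saw" with the chunk headed by "cat", giving a small bracketing built entirely from repeated segmentation.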

Collections

University of Texas at Austin