Browsing by Subject "Language documentation"
Now showing 1 - 5 of 5
Item: A descriptive grammar of Yongning Na (Mosuo) (2010-12)
Lidz, Liberty A.; Woodbury, Anthony C.; England, Nora; Epps, Patience; Zhang, Qing; Thurgood, Graham; Crowhurst, Megan
This dissertation is a descriptive grammar of Yongning Na (Mosuo), a Tibeto-Burman language spoken in southwestern China. The theoretical approaches taken are functional syntax and the discourse-based approach to language description and documentation. The aim of this dissertation is to describe the ways that the language's features and subsystems intersect to make Na a unique entity: analyticity; zero anaphora; OV word order; topic/comment information structure; a five-part evidential system; a conjunct/disjunct-like system that intersects with evidentiality and verbal semantics; prolific grammaticalization; overlap between nominalization and relativization and associated structures; representation of time through aspect, Aktionsarten, adverbials, and discourse context; and the Daba shamanic register.

Item: The phonology and inflectional morphology of Cháʔknyá, Tataltepec de Valdés Chatino, a Zapotecan language (2015-05)
Sullivant, John Ryan; Woodbury, Anthony C.; England, Nora; Epps, Patience; Myers, Scott; DiCanio, Christian; Rasch, Jeffrey
This dissertation is a description of the phonology and inflectional morphology of an endangered indigenous language of Mexico stemming from a collaborative research project that places an emphasis on natural language and on describing a language on its own terms. The language described is Tataltepec Chatino (ISO 639-3: cta), a Zapotecan language spoken by fewer than 500 people only in the community of Tataltepec de Valdés in Mexico's Oaxaca state. The language has a complex system of tone in which tone sequences are the crucial morphological element rather than the constituent tones of the tone sequences.
The tone system has a slightly peculiar inventory, with the level tones Low, High, and Superhigh rather than Low, Mid, and High in addition to a High-Low contour tone. The tonal system is also notable given the unlinked tone in two tone sequences which only surfaces in particular phonological contexts, but is never displaced from the word it is associated with, unlike canonical floating tones. The segmental phonology shows a language that permits a large number of often very complex onset clusters many of which violate the Sonority Sequencing Principle, but maintains tight restrictions on codas, allowing only a simple coda which can only be filled by one of two consonants in the language. Tataltepec Chatino also has interesting morphological features in its complex systems of verb aspect and person inflection which are instantiated by a system of prefixes and a system of complex paradigmatic alternations which only partially intersect. The language also has an unusual word I analyze as a "pseudoclassifier" which appears to serve some pragmatic functions of numeral classifiers while failing to do any lexical classification.

Item: Semi-automated annotation and active learning for language documentation (2009-12)
Palmer, Alexis Mary; Baldridge, Jason; Erk, Katrin; England, Nora; Mooney, Raymond; Woodbury, Anthony
By the end of this century, half of the approximately 6000 extant languages will cease to be transmitted from one generation to the next. The field of language documentation seeks to make a record of endangered languages before they reach the point of extinction, while they are still in use. The work of documenting and describing a language is difficult and extremely time-consuming, and resources are extremely limited. Developing efficient methods for making lasting records of languages may increase the amount of documentation achieved within budget restrictions.
This thesis approaches the problem from the perspective of computational linguistics, asking whether and how automated language processing can reduce human annotation effort when very little labeled data is available for model training. The task addressed is morpheme labeling for the Mayan language Uspanteko, and we test the effectiveness of two complementary types of machine support: (a) learner-guided selection of examples for annotation (active learning); and (b) annotator access to the predictions of the learned model (semi-automated annotation). Active learning (AL) has been shown to increase efficacy of annotation effort for many different tasks. Most of the reported results, however, are from studies which simulate annotation, often assuming a single, infallible oracle. In our studies, crucially, annotation is not simulated but rather performed by human annotators. We measure and record the time spent on each annotation, which in turn allows us to evaluate the effectiveness of machine support in terms of actual annotation effort. We report three main findings with respect to active learning. First, in order for efficiency gains reported from active learning to be meaningful for realistic annotation scenarios, the type of cost measurement used to gauge those gains must faithfully reflect the actual annotation cost. Second, the relative effectiveness of different selection strategies in AL seems to depend in part on the characteristics of the annotator, so it is important to model the individual oracle or annotator when choosing a selection strategy. And third, the cost of labeling a given instance from a sample is not a static value but rather depends on the context in which it is labeled. We report two main findings with respect to semi-automated annotation. First, machine label suggestions have the potential to increase annotator efficacy, but the degree of their impact varies by annotator, with annotator expertise a likely contributing factor. 
At the same time, we find that implementation and interface must be handled very carefully if we are to accurately measure gains from semi-automated annotation. Together these findings suggest that simulated annotation studies fail to model crucial human factors inherent to applying machine learning strategies in real annotation settings.

Item: The social life and sound patterns of Nanti ways of speaking (2010-05)
Beier, Christine Marie; England, Nora C.; Sherzer, Joel; Woodbury, Anthony C.; Keating, Elizabeth L.; Crowhurst, Megan J.; Hanks, William F.
This dissertation explores the phenomenon of ways of speaking in the Nanti speech community of Montetoni, in southeastern Peruvian Amazonia, between 1999 and 2009. In the context of this study, a 'way of speaking' is a socially meaningful, conventionalized sound pattern, manifest at the level of the utterance, that expresses the speaker's orientation toward some aspect of the interaction. This study closely examines both the sound patterns and patterns of use of three Nanti ways of speaking — matter-of-fact talk, scolding talk, and hunting talk — and describes each one in relation to a broader set of linguistic, social, and cultural practices characteristic of the speech community at the time. The data for this study is naturally-occurring discourse recorded during multi-party, face-to-face interactions in Montetoni. Bringing together methods developed by linguists, linguistic anthropologists, conversation analysts, and interactional sociologists, this study explores the communicative relations among participants, interactions, situations of interaction, and the utterances that link them all, attending to both the individual-level cognitive (subjective) facets of interpersonal communication and the necessarily intersubjective environment in which communication takes place.
In order to disaggregate the multiple levels of signification evidenced in specific utterances, tokens are examined at four levels of organization: the sound form, the sentence, the turn, and the move. The data are presented via audio files; acoustic analyses; sequentially-organized and temporally-anchored interlinearized transcripts; and composite visual representations, all of which are framed by detailed ethnographic description. Nantis' ways of speaking are shown to consistently and systematically convey social aspects of 'meaning' that are crucial to utterance interpretation and, therefore, to successful interpersonal communication. Based on the robust correspondences between sound form and communicative function identified in the Nanti communicative system, this study proposes that ways of speaking are a cross-linguistically viable level of organization in language use that awaits discovery and description in other speech communities. The research project itself is framed in terms of the practical issues that emerged through the author's own experiences in learning to communicate appropriately in monolingual Nanti society, and the ethical issues that motivate community-oriented documentation of endangered language practices.

Item: Texas Alsatian : Henri Castro's legacy (2009-12)
Roesch, Karen A.; Boas, Hans Christian, 1971-; Pierce, Marc; King, Robert D; Epps, Patience; Hinrichs, Lars
This study constitutes the first in-depth description and analysis of Texas Alsatian as spoken in Medina County, Texas, in the twenty-first century. The Alsatian dialect was transported to Texas in 1842, when the entrepreneur Henri Castro recruited colonists from the Alsace to fulfill the Texas Republic’s stipulations for populating his land grant located to the west of San Antonio. Texas Alsatian (TxAls) is a dialect distinct from other varieties of Texas German (Gilbert 1972: 1, Salmons 1983: 191) and is mainly spoken in Eastern Medina County in and around the city of Castroville.
With a small and aging speaker population, it has not been transmitted to the next generation and will likely survive for only another two to three decades. Despite this endangered status, TxAls is a language undergoing death with minimal change. This study provides both a descriptive account of TxAls and discussions on extralinguistic factors linked to ethnic identity and language loyalty, which have enabled the maintenance of this distinctive Texas German dialect for 150 years. To investigate the extent of the maintenance of lexical, phonological, and morphological features, this study identifies the main donor dialect(s), Upper Rhine Alsatian, and compares its linguistic features to those presently maintained in the community, based on current data collected between 2007 and 2009 and Gilbert’s (1972) data collected in the 1960s. This discussion of TxAls is three-fold: (1) an analysis of social, historical, political, and economic factors affecting the maintenance and decline of TxAls, (2) a detailed structural analysis of the grammatical features of TxAls, supported by a description of its European donor dialect and substantiated by Gilbert’s (1972) data, and (3) a discussion of the participants’ attitudes toward their ancestral language, which have either contributed to the maintenance of TxAls, or are now accelerating its decline, based on responses to a survey developed for the TxAls community, the Alsatian Questionnaire.