Crowdsourcing construction of information retrieval test collections for conversational speech

dc.contributor.advisor: Lease, Matthew A. (en)
dc.contributor.committeeMember: Wallace, Byron (en)
dc.creator: Zhou, Haofeng (en)
dc.date.accessioned: 2015-10-23T19:00:09Z (en)
dc.date.accessioned: 2018-01-22T22:28:36Z
dc.date.available: 2015-10-23T19:00:09Z (en)
dc.date.available: 2018-01-22T22:28:36Z
dc.date.issued: 2015-05 (en)
dc.date.submitted: May 2015 (en)
dc.date.updated: 2015-10-23T19:00:09Z (en)
dc.description: text (en)
dc.description.abstract: Building a test collection for ad hoc information retrieval on conversational speech raises new challenges for researchers. Traditional methods for building test collections are costly and therefore infeasible to apply to large-scale conversational speech data; the challenge is to construct a large, high-quality test collection at low cost. Crowdsourcing may offer a promising approach: crowd workers tend to be less expensive than professional assessors, and they can work in parallel on large-scale jobs. However, despite these benefits of scale and cost, the quality of the results delivered by crowd workers may suffer. This thesis focuses on relevance judging, one of the key components of a test collection. We use two crowdsourcing platforms, oDesk and Amazon Mechanical Turk (MTurk), present workers with audio clips and various versions of transcripts, conduct multiple experiments under diverse settings, and analyze the results qualitatively and quantitatively. We examine which factors influence the quality of relevance judgments on conversational speech and investigate differences between the relevance judgments of experts and those of crowd workers. The thesis also describes best practices for designing crowdsourcing tasks to improve crowd workers' performance. Ultimately, these findings may help researchers build high-quality test collections on conversational speech at low cost and large scale through crowdsourcing. (en)
dc.description.department: Information (en)
dc.format.mimetype: application/pdf (en)
dc.identifier: doi:10.15781/T26032 (en)
dc.identifier.uri: http://hdl.handle.net/2152/31916 (en)
dc.language.iso: en (en)
dc.subject: Crowdsourcing (en)
dc.subject: Information retrieval (en)
dc.subject: Test collection (en)
dc.subject: Conversational speech (en)
dc.subject: Relevance judgment (en)
dc.title: Crowdsourcing construction of information retrieval test collections for conversational speech (en)
dc.type: Thesis (en)