Temporal modeling of crowd work quality for quality assurance in crowdsourcing



While crowdsourcing offers a way to collect data at scale, it also raises new and significant quality concerns. Beyond the obvious issue that any new methodology is untested and often suffers initial growing pains, crowdsourcing has faced a very particular criticism since its inception: given the anonymity of crowd workers, it is questionable whether their contributions can be trusted as much as work completed by trusted workers. To address this concern, recent studies have proposed a variety of methods. However, although temporal behavioral patterns can be discerned in real crowd work, prior studies have typically modeled worker performance under the assumption that the sequence of model variables is independent and identically distributed (i.i.d.). This dissertation focuses on the measurement and prediction of crowd work quality by considering its temporal properties. To better model such temporal worker behavior, we present a time-series prediction model for crowd work quality. This model captures and summarizes each worker's past label quality, enabling us to better predict the quality of that worker's next label. Furthermore, we propose a crowd assessor model for predicting crowd work quality more accurately. By taking into account multi-dimensional features of a crowd assessor, we aim to build a better quality prediction model for crowd work. Finally, this dissertation explores how the proposed prediction models perform under realistic scenarios. In particular, we consider a realistic use case in which only a limited number of gold labels are available for training the proposed model. For this problem, we leverage instance weighting with soft labels, which takes into account the uncertainty of each training instance. Our empirical evaluation with synthetic datasets and a public crowdsourcing dataset shows that the proposed models significantly improve the prediction of crowd work quality and lead to the acquisition of better-quality labels in crowdsourcing.
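The abstract does not spell out how instance weighting with soft labels is carried out, so the following is only a minimal sketch of one common way to do it, not the dissertation's actual model. It assumes a binary label, treats each soft label as an estimated probability that the true label is 1, and expands every training instance into weighted hard-label copies before fitting a weighted logistic regression. The names `expand_soft_labels`, `fit_weighted_logistic`, and `predict_proba`, as well as the learning rate and epoch count, are hypothetical choices made for illustration.

```python
import math


def expand_soft_labels(features, soft_labels):
    """Turn each (x, p) pair into weighted hard-label instances.

    p is assumed to be the estimated probability that the true label is 1.
    An instance with p = 0.8 becomes (x, 1, 0.8) and (x, 0, 0.2), so
    uncertain instances pull the model less strongly in either direction.
    """
    expanded = []
    for x, p in zip(features, soft_labels):
        if not 0.0 <= p <= 1.0:
            raise ValueError("soft label must lie in [0, 1]")
        if p > 0.0:
            expanded.append((x, 1, p))
        if p < 1.0:
            expanded.append((x, 0, 1.0 - p))
    return expanded


def predict_proba(w, b, x):
    """Logistic model: probability that the label of x is 1."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_weighted_logistic(instances, n_features, lr=0.5, epochs=2000):
    """Fit logistic regression on (x, y, weight) triples by batch gradient descent."""
    w = [0.0] * n_features
    b = 0.0
    total = sum(weight for _, _, weight in instances)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y, weight in instances:
            # Each instance contributes to the gradient in proportion to its weight.
            err = weight * (predict_proba(w, b, x) - y)
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / total for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


# Toy data (made up for illustration): one feature, four instances with soft labels.
features = [[0.0], [1.0], [2.0], [3.0]]
soft_labels = [0.1, 0.2, 0.8, 0.9]
instances = expand_soft_labels(features, soft_labels)
w, b = fit_weighted_logistic(instances, n_features=1)
p_low = predict_proba(w, b, [0.0])
p_high = predict_proba(w, b, [3.0])
```

In this sketch, an instance whose soft label is close to 0.5 contributes two nearly equal and opposing weighted copies, so it has little net influence on the fit, which is the intuition behind weighting training instances by their uncertainty.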