Adversarial Machine Learning for Social Good
Abstract
The deployment of machine learning (ML) techniques to automate critical decision-making in healthcare, employment, finance, and crime prevention has substantially improved these systems. However, an incorrect decision can have life-changing consequences, and deployed ML models are also highly vulnerable to test-time adversarial attacks. These models therefore need to be made trustworthy and reliable. This dissertation addresses these problems and presents work that uses adversarial machine learning techniques to make ML models more robust, fair, and reliable.

We start with the question of whether ML-based deception detection systems generalize to real-life scenarios. We perform experiments to examine whether multimodal aspects such as facial expressions, eye movements, and video cues can be used as deception detection features. We develop three datasets based on real-life lying scenarios (e.g., lying for a reward, lying under duress, and telling white lies) and use them to extract deception detection features. We also study state-of-the-art deception detection systems and algorithms and attempt to extend them to our deception scenarios. We show that deception detection does not generalize to real-life scenarios, and that more subject-matter knowledge and better models are needed before such a claim can be made.

We also address the use of adversarial examples to protect user security and privacy and to make black-box models more advantageous to the end user, specifically in the vision and tabular data domains. In the vision domain, we consider the problem of protecting a sensitive attribute (e.g., gender) by placing an adversarial artifact, such as adversarial glasses, on the image. In the tabular data domain, we provide adversarial recommendations on a user's loan application data that move a black-box loan application model's decision from bad credit to good credit. We thus demonstrate that adversarial examples can be crafted to help end users protect their privacy and avoid unfair treatment.

Finally, we tackle the problem of multi-concept adversarial examples. We show that a state-of-the-art adversarial attack on one classifier (e.g., gender) also reduces the accuracy of a different classifier (e.g., age) trained independently on the same data pool. By combining the loss functions of the attacked model and the protected model in our loss formulation, we show that we can create targeted adversarial attacks. These custom attacks not only successfully attack the target classifiers but also cause no drop in the accuracy of the other classifiers in the protected set.
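To make the combined-loss idea concrete, below is a minimal PyTorch-style sketch of a PGD-like perturbation that raises the loss of the attacked classifier while holding down the loss of a protected classifier trained on the same data. All names and parameters here (f_attack, f_protect, lam, eps, alpha, steps) are illustrative assumptions rather than the dissertation's actual formulation; a targeted variant would instead drive the attacked classifier's loss toward a chosen target label.

```python
# Sketch of a multi-concept adversarial perturbation: fool the attacked
# classifier (e.g., gender) while preserving the protected classifier's
# (e.g., age) predictions. Names and hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def multi_concept_attack(x, y_attack, y_protect, f_attack, f_protect,
                         eps=8/255, alpha=2/255, steps=40, lam=1.0):
    """PGD-style attack with a combined loss over two classifiers."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        logits_a = f_attack(x + delta)    # classifier we want to fool
        logits_p = f_protect(x + delta)   # classifier we want to preserve
        # Combined objective: increase the attacked loss, keep the
        # protected loss low (weighted by lam).
        loss = F.cross_entropy(logits_a, y_attack) \
               - lam * F.cross_entropy(logits_p, y_protect)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()        # gradient ascent step
            delta.clamp_(-eps, eps)                   # stay in the L-inf ball
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep valid pixel range
        delta.grad.zero_()
    return (x + delta).detach()
```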