Browsing by Subject "spam"
Combating Crowdsourced Manipulation of Social Media (2013-08-01)
Tamilarasan, Prithivi

Crowdsourcing systems - like Ushahidi (for crisis mapping), Foldit (for protein folding), and Duolingo (for foreign language learning and translation) - have shown the effectiveness of intelligently organizing large numbers of people to solve traditionally vexing problems. Unfortunately, new crowdsourcing platforms are emerging to support the coordinated dissemination of spam, misinformation, and propaganda. These "crowdturfing" systems are a sinister counterpart to the enormous positive opportunities of crowdsourcing: they combine the organizational capabilities of crowdsourcing with the ability to widely spread artificial grassroots support (so-called "astroturfing"). This thesis begins a study of crowdturfing that targets social media and proposes a framework for "pulling back the curtain" on crowdturfers to reveal their underlying ecosystem. Concretely, this thesis (i) analyzes the types of campaigns hosted on multiple crowdsourcing sites; (ii) links campaigns and their workers on crowdsourcing sites to social media; (iii) analyzes the relationship structure connecting these workers, along with their profile, activity, and linguistic characteristics, in comparison with a random sample of regular social media users; and (iv) proposes and develops statistical user models to automatically identify crowdturfers in social media. Since many crowdturfing campaigns are hidden, it is important to understand whether models learned from known campaigns can detect these unknown campaigns. Our experimental results show that the statistical user models can predict crowdturfers with very high accuracy.

Identifying Search Engine Spam Using DNS (2012-02-14)
Mathiharan, Siddhartha Sankaran

Web crawlers encounter both finite and infinite elements during a crawl. Pages and hosts can be generated without bound using automated scripts and DNS wildcard entries.
It is a challenge to rank such resources, as an entire web of pages and hosts could be created to manipulate the rank of a target resource. It is crucial to differentiate genuine content from spam in real time in order to allocate crawl budgets. In this study, ranking algorithms for hosts are designed that use the finite sets of Pay-Level Domains (PLDs) and IPv4 addresses. Heterogeneous graphs derived from the webgraph of IRLbot are used to achieve this. The first algorithm studied is PLD Supporters (PSUPP), which counts the number of level-2 PLD supporters for each host on the host-host-PLD graph. This is further improved by True PLD Supporters (TSUPP), which uses true egalitarian level-2 PLD supporters on the host-IP-PLD graph together with DNS blacklists. It was found that support from content farms and stolen links could be eliminated by computing TSUPP. When TSUPP was applied to the host graph of IRLbot, there was less than 1% spam in the top 100,000 hosts.

Playing Hide-and-Seek with Spammers: Detecting Evasive Adversaries in the Online Social Network Domain (2012-10-19)
Harkreader, Robert Chandler

Online Social Networks (OSNs) have seen an enormous boost in popularity in recent years. Along with this popularity have come tribulations such as privacy concerns, spam, phishing, and malware. Many recent works have focused on automatically detecting these unwanted behaviors in OSNs so that they may be removed. These works have developed state-of-the-art detection schemes that use machine learning techniques to automatically classify OSN accounts as spam or non-spam. In this work, those detection schemes are recreated and tested on new data. Through this analysis, it is clear that spammers are beginning to evade even these detectors. The evasion tactics used by spammers are identified and analyzed, and a new detection scheme is built upon the previous ones that is robust against these evasion tactics.
Next, the difficulty of evading the existing detectors and the new detector is formalized and compared. This work builds a foundation for future researchers, so that those who would like to protect innocent Internet users from spam and malicious content can overcome the advances of those who would prey on these users for a meager dollar.
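The supporter-counting idea behind PSUPP in the second abstract - ranking a host by how many distinct pay-level domains link to it within two in-link hops, so that a link farm built from many subdomains of one PLD collapses to a single supporter - can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual algorithm: the in-link graph, host names, and the `pld` helper are all hypothetical, and a real system would use the Public Suffix List rather than "last two labels" for PLD extraction.

```python
# Hypothetical host-level link graph: host -> set of hosts linking to it.
# All domain names below are invented for illustration.
in_links = {
    "target.com":     {"a.blogfarm.com", "b.blogfarm.com", "c.example.org"},
    "a.blogfarm.com": {"hub.blogfarm.com"},
    "b.blogfarm.com": {"hub.blogfarm.com"},
    "c.example.org":  {"d.example.net"},
}

def pld(host: str) -> str:
    """Crude pay-level-domain extraction: keep the last two labels.
    Real crawlers use the Public Suffix List; this is illustrative only."""
    return ".".join(host.split(".")[-2:])

def level2_pld_supporters(host: str) -> int:
    """Count distinct PLDs appearing within two in-link hops of `host`.
    Many subdomains of one spam PLD contribute only one supporter,
    which is the intuition behind PSUPP-style host ranking."""
    supporters = set()
    for h1 in in_links.get(host, ()):
        supporters.add(pld(h1))
        for h2 in in_links.get(h1, ()):
            supporters.add(pld(h2))
    supporters.discard(pld(host))  # a host does not support itself
    return len(supporters)

# The two blogfarm.com subdomains plus their hub count as one PLD,
# so target.com ends up with supporters from only three distinct PLDs.
print(level2_pld_supporters("target.com"))
```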
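The first and third abstracts both describe statistical user models that classify accounts as spammers from profile and activity features. A self-contained sketch of that general approach, using a tiny hand-rolled logistic regression: the two features (URL density per post and follower/following ratio), the toy training examples, and the decision threshold are all invented for illustration and do not come from either thesis.

```python
import math

# Toy labeled accounts: ((urls_per_post, follower_ratio), label)
# where label 1 = spammer, 0 = regular user. Fabricated example data.
train = [
    ((0.90, 0.05), 1), ((0.80, 0.10), 1), ((0.95, 0.02), 1),
    ((0.10, 1.20), 0), ((0.05, 0.90), 0), ((0.20, 1.50), 0),
]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(data, lr=0.5, epochs=2000):
    """Stochastic gradient descent on the log-loss of a 2-feature model."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            g = sigmoid(w[0] * x1 + w[1] * x2 + b) - y  # dLoss/dLogit
            w[0] -= lr * g * x1
            w[1] -= lr * g * x2
            b -= lr * g
    return w, b

w, b = train_logreg(train)

def predict(x) -> bool:
    """True if the model scores the account above 0.5, i.e. likely spam."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b) > 0.5

print(predict((0.85, 0.03)))  # URL-heavy account with few followers
print(predict((0.10, 1.30)))  # typical regular account
```

The evasion problem raised in the third abstract shows up directly in such models: a spammer who lowers URL density or inflates follower counts moves across the learned decision boundary, which is why that thesis argues for features that are costly for spammers to manipulate.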