Algorithms for Large-Scale Internet Measurements
As the Internet has grown in size and importance to society, it has become increasingly difficult to generate global metrics of interest that can be used to verify proposed algorithms or monitor performance. This dissertation tackles the problem by proposing several novel algorithms designed to perform Internet-wide measurements using existing or inexpensive resources. We initially address distance estimation in the Internet, which is used by many distributed applications. We propose a new end-to-end measurement framework called Turbo King (T-King) that uses the existing DNS infrastructure and, when compared to its predecessor King, obtains delay samples without bias in the presence of distant authoritative servers and forwarders, consumes half the bandwidth, and reduces the impact on caches at remote servers by several orders of magnitude. Motivated by recent interest in the literature and our need to find remote DNS nameservers, we next address Internet-wide service discovery by developing IRLscanner, whose main design objectives have been to maximize politeness at remote networks, allow scanning rates that achieve coverage of the Internet in minutes/hours (rather than weeks/months), and significantly reduce administrator complaints. Using IRLscanner and 24-hour scan durations, we perform 20 Internet-wide experiments using 6 different protocols (i.e., DNS, HTTP, SMTP, EPMAP, ICMP and UDP ECHO). We analyze the feedback generated and suggest novel approaches for reducing the amount of blowback during similar studies, which should enable researchers to collect valuable experimental data in the future with significantly fewer hurdles. We finally turn our attention to Intrusion Detection Systems (IDS), which are often tasked with detecting scans and preventing them; however, it is currently unknown how likely an IDS is to detect a given Internet-wide scan pattern and whether there exist sufficiently fast stealth techniques that can remain virtually undetectable at large-scale. To address these questions, we propose a novel model for the windowexpiration rules of popular IDS tools (i.e., Snort and Bro), derive the probability that existing scan patterns (i.e., uniform and sequential) are detected by each of these tools, and prove the existence of stealth-optimal patterns.