Robust and Scalable Sampling Algorithms for Network Measurement

Date

2010-10-12

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Recent growth of the Internet in both scale and complexity has imposed a number of difficult challenges on existing measurement techniques and approaches, which are essential for both network management and many ongoing research projects. For any measurement algorithm, achieving both accuracy and scalability is very challenging given hard resource constraints (e.g., bandwidth, delay, physical memory, and CPU speed). My dissertation research tackles this problem by first proposing a novel mechanism called residual sampling, which intentionally introduces a predetermined amount of bias into the measurement process. We show that such biased sampling can be extremely scalable; moreover, we develop residual estimation algorithms that can unbiasedly recover the original information from the sampled data. Utilizing these results, we further develop two versions of the residual sampling mechanism: a continuous version for characterizing the user lifetime distribution in large-scale peer-to-peer networks and a discrete version for monitoring flow statistics (including per-flow counts and the flow size distribution) in high-speed Internet routers. For the former application in P2P networks, this work presents two methods: ResIDual-based Estimator (RIDE), which takes single-point snapshots of the system and assumes systems with stationary arrivals, and Uniform RIDE (U-RIDE), which takes multiple snapshots and adapts to systems with arbitrary (including non-stationary) arrival processes. For the latter application in traffic monitoring, we introduce Discrete RIDE (D-RIDE), which allows one to sample each flow with a geometric random variable. Our numerous simulations and experiments with P2P networks and real Internet traces confirm that these algorithms are able to make accurate estimation about the monitored metrics and simultaneously meet the requirements of hard resource constraints. These results show that residual sampling indeed provides an ideal solution to balancing between accuracy and scalability.

Description

Citation