Browsing by Subject "Scalability"
Now showing 1 - 6 of 6
Item: Large-scale network analytics (2011-08)
Song, Han Hee; Zhang, Yin

Scalable and accurate analysis of networks is essential to a wide variety of existing and emerging network systems. Specifically, network measurement and analysis helps us understand networks, improve existing services, and enable new data-mining applications. To support various services and applications in large-scale networks, network analytics must address the following challenges: (i) how to conduct scalable analysis in networks with a large number of nodes and links, (ii) how to flexibly accommodate the objectives of different administrative tasks, and (iii) how to cope with dynamic changes in the networks. This dissertation presents novel path analysis schemes that effectively address these challenges in analyzing pair-wise relationships among networked entities. In doing so, we make three major contributions, to large-scale IP networks, social networks, and application service networks. For IP networks, we propose an accurate and flexible framework for monitoring path properties. Analyzing the performance of paths between pairs of nodes, our framework incorporates approaches that perform both exact and approximate reconstruction of path properties. It scales to measurement experiments that span thousands of routers and end hosts, and it is flexible enough to accommodate a variety of design requirements. For social networks, we present scalable and accurate graph embedding schemes. Aimed at analyzing the pair-wise relationships of social network users, we present three dimensionality-reduction schemes that leverage matrix factorization, the count-min sketch, and graph clustering paired with spectral graph embedding.
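Of the dimensionality-reduction primitives named above, the count-min sketch is the simplest to illustrate. The following is a generic, minimal sketch of the data structure, not the dissertation's code; the class name, hash construction, and parameters are illustrative.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter over a depth x width array of counters.

    Queries never underestimate the true count; overestimates come only
    from hash collisions, which shrink as width and depth grow.
    """

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent hash per row, derived by salting blake2b.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def update(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def query(self, item):
        # The minimum over rows bounds the collision-induced overestimate.
        return min(self.table[row][col] for row, col in self._cells(item))
```

For pair-wise analysis, an edge such as a user pair could be encoded as a single string key (e.g., "u1->u2") before updating the sketch, trading exactness for constant memory.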
As concrete applications showing the practical value of our schemes, we apply them to the important social-analysis tasks of proximity estimation, missing-link inference, and link prediction. The results clearly demonstrate the accuracy, scalability, and flexibility of our schemes for analyzing social networks with millions of nodes and tens of millions of links. For application service networks, we provide a proactive service quality assessment scheme. Analyzing the relationship between the satisfaction level of subscribers to an IPTV service and network performance indicators, our proposed scheme proactively assesses user-perceived service quality (i.e., it detects issues before IPTV subscribers complain) using performance metrics collected from the network. In our evaluation on network data collected from a commercial IPTV service provider, the scheme predicts 60% of the service problems that customers complain about, with a false-positive rate of only 0.1%.

Item: Populating a Linked Data Entity Name System (2016-05)
Kejriwal, Mayank; Miranker, Daniel P.; Ghosh, Joydeep; Price, Eric; Miikkulainen, Risto; Mooney, Raymond

The Resource Description Framework (RDF) is a graph-based data model used to publish data as a Web of Linked Data. RDF is an emerging foundation for large-scale data integration, the problem of providing a unified view over multiple data sources. An Entity Name System (ENS) is a thesaurus for entities and a crucial component of a data integration architecture. Populating a Linked Data ENS is equivalent to solving an Artificial Intelligence problem called instance matching, which concerns identifying pairs of entities that refer to the same underlying entity. This dissertation presents an instance matcher with four properties: automation, tolerance of heterogeneity, scalability, and domain independence.
Automation is addressed by employing inexpensive but well-performing heuristics to automatically generate a training set, which is then used by the machine learning algorithms in the pipeline. Data-driven alignment algorithms are adapted to deal with structural heterogeneity in RDF graphs. Domain independence is established by actively avoiding prior assumptions about input domains and through evaluations on ten RDF test cases. The full system is scaled by implementing it on cloud infrastructure using MapReduce algorithms.

Item: Retrospect on contemporary Internet organization and its challenges in the future (2011-05)
Gutierrez De Lara, Felipe; Bard, William Carl; Julien, Christine

The intent of this report is to expose the audience to the contemporary organization of the Internet, to highlight the challenges it must deal with in the future, and to survey the current efforts being made to overcome those challenges. The report aims to build a frame of reference for how the Internet is currently structured and how its different layers interact to make the Internet possible as we know it. Additionally, the report explores the challenges the current Internet architecture is facing, the reasons these challenges are arising, and the multiple efforts under way to keep the Internet working. To reach these objectives, I visited the sites of multiple organizations whose sole reason for existence is to support the Internet and keep it functioning. The approach used to write this report was to research the topic in technical papers drawn from the IEEE database and network conference reviews, and to analyze and expose their findings. The report uses this information to elaborate on how network engineers are handling the challenge of keeping the Internet functional while supporting dynamic requirements.
This report exposes the challenges the Internet is facing with scalability, debugging tools, security, mobility, reliability, and quality of service. It briefly explains how each of these challenges affects the Internet and the strategies in place to overcome them. The final objectives are to inform the reader of how the Internet copes with ever-changing and growing requirements, to give an overview of the multiple institutions dedicated to reinforcing the Internet, and to provide a list of current challenges and the actions being taken to address them.

Item: Scalability in analysis of software architectures (2007-08)
Chukwuogo, Bosah I.; Shin, Michael; Andersen, Per H.

This thesis describes an approach to improving the scalability of software architecture analysis for large-scale application systems through model transformation. The software architecture of a large-scale application system is modeled using the Unified Modeling Language (UML) and then, with scalability in mind, transformed into a hierarchical Colored Petri Net (CPN) model. The goal of the UML-to-CPN model transformation is to use CPN analysis, based on well-established formal methods, to analyze the dynamic behavior of the software architecture model. For large-scale software architectures, however, the resulting CPN model can suffer a combinatorial explosion of states during analysis, creating a state space so large that any complete analysis becomes practically infeasible. This thesis presents a methodology showing how abstraction can be applied during transformation to reduce the resulting CPN state space while preserving the expected behavior of the software. Detailed descriptions of the UML-to-CPN transformation process for software architectures are presented, along with two case studies: an Automated Teller Machine (ATM) system and an elevator system.
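The combinatorial blow-up that motivates this abstraction step can be demonstrated with a toy model (hypothetical, not the thesis's ATM or elevator nets): n independent components each cycling through 3 local modes yield 3**n concrete states, but if component identity is irrelevant to the property being checked, collapsing symmetric states shrinks the space dramatically.

```python
def reachable(start, successors):
    """Explicit-state search: returns the set of states reachable from start."""
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


N = 8  # number of independent components in the toy model


def successors(state):
    # Each step advances one component through its 3-mode cycle.
    for i in range(N):
        nxt = list(state)
        nxt[i] = (nxt[i] + 1) % 3
        yield tuple(nxt)


concrete = reachable((0,) * N, successors)  # 3**8 = 6561 concrete states

# Abstraction: track only the multiset of modes (how many components are in
# each mode), not which component is in which mode. For 8 components and
# 3 modes this leaves only C(10, 2) = 45 abstract states.
abstracted = {tuple(sorted(s)) for s in concrete}
```

The same exponential-to-polynomial collapse is what makes abstraction-aware transformation the difference between a tractable and an intractable CPN analysis.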
The state spaces generated during analysis of each case study are presented, and the practical feasibility of the methodology is demonstrated using the results of each case study.

Item: Scaling managed runtime systems for future multicore hardware (2009-12)
Ha, Jung Woo; McKinley, Kathryn S.; Arnold, Matthew; Blackburn, Stephen M.; Keckler, Stephen W.; Witchel, Emmett

The exponential improvement in single-processor performance has recently come to an end, mainly because clock frequency has reached its limit due to power constraints. Processor manufacturers are therefore enhancing computing capability by placing multiple cores on a single chip, which improves performance given parallel software. This paradigm shift to chip multiprocessors (multicore) requires scalable parallel applications that execute tasks on each core; otherwise the additional cores go unused. Making an application scalable requires more than simply parallelizing the application code itself. Modern applications are written in managed languages, which require automatic memory management, type and memory abstractions, dynamic analysis, and just-in-time (JIT) compilation. These managed runtime systems monitor and interact frequently with the executing application. Hence, the managed runtime itself must be scalable, and the instrumentation that monitors the application should not perturb its scalability. While multicore hardware forces a redesign of managed runtimes for scalability, it also provides opportunities when applications do not fully utilize all of the cores. Using available cores for concurrent helper threads that enhance the software with debugging, security, and other support makes the runtime itself more capable and more scalable. This dissertation presents two novel techniques that improve the scalability of managed runtimes by utilizing unused cores.
The first technique is a concurrent dynamic analysis framework built around a low-overhead buffering mechanism, Cache-friendly Asymmetric Buffering (CAB), that quickly offloads data from the application to helper threads that perform specific dynamic analyses. The framework minimizes application instrumentation overhead, prevents microarchitectural side effects, and supports a variety of dynamic analysis clients, ranging from call-graph and path profiling to cache simulation. It ensures that helper threads perturb the performance of the application as little as possible. The second technique is concurrent trace-based just-in-time compilation, which exploits available cores for the JavaScript runtime. The JavaScript language limits applications to a single thread, so extra cores go unused unless the runtime components use them. We redesigned a production trace-based JIT compiler to run concurrently with the interpreter; our technique is the first to improve both responsiveness and throughput in a trace-based JIT compiler. This thesis presents the design and implementation of both techniques and shows that they improve scalability and core utilization when running applications in managed runtimes. Industry is already adopting our approaches, which demonstrates both the urgency of the scalable-runtime problem and the utility of these techniques.

Item: Separating data from metadata for robustness and scalability (2014-12)
Wang, Yang; Alvisi, Lorenzo; Dahlin, Michael

When building storage systems that aim to simultaneously provide robustness, scalability, and efficiency, one faces a fundamental tension: higher robustness typically incurs higher costs and thus hurts both efficiency and scalability. My research shows that an approach to storage system design based on a simple principle, separating data from metadata, can yield systems that elegantly and effectively address that tension in a variety of settings.
One observation motivates our approach: much of the cost paid by many strong protection techniques is incurred to detect errors. This suggests an opportunity: if we can build a low-cost oracle that detects errors and identifies correct data, it may be possible to reduce the cost of protection without weakening its guarantees. This dissertation shows that metadata, if carefully designed, can serve as such an oracle and help a storage system protect its data at minimal cost. The idea is applied effectively in three very different systems: Gnothi, a storage replication protocol that combines the high availability of asynchronous replication and the low cost of synchronous replication for small-scale block storage; Salus, a large-scale block store with unprecedented guarantees of consistency, availability, and durability in the face of a wide range of server failures; and Exalt, a tool to emulate a large storage system with 100 times fewer machines.
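In miniature, the metadata-as-oracle idea can be sketched as follows. This is an illustrative single-block example under assumed names (make_metadata, verify, read_with_oracle), not the protocols of Gnothi, Salus, or Exalt: a small, strongly protected metadata record (length plus cryptographic hash) is enough to pick out a correct replica of cheaply stored bulk data.

```python
import hashlib


def make_metadata(block: bytes) -> dict:
    """Small, strongly protected record describing a data block."""
    return {"length": len(block), "sha256": hashlib.sha256(block).hexdigest()}


def verify(block: bytes, meta: dict) -> bool:
    """The oracle: does this block match the trusted metadata?"""
    return (
        len(block) == meta["length"]
        and hashlib.sha256(block).hexdigest() == meta["sha256"]
    )


def read_with_oracle(replicas, meta):
    """Return the first replica whose contents match the trusted metadata.

    The bulk replicas can be stored with cheap, weak protection; as long as
    the tiny metadata record is kept safe, corruption is detected and a
    correct copy is identified without expensive per-block safeguards.
    """
    for blob in replicas:
        if verify(blob, meta):
            return blob
    raise IOError("no replica matches metadata")
```

The asymmetry is the point: metadata is orders of magnitude smaller than data, so it can be replicated or protected aggressively while the data path stays cheap.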