Browsing by Author "Kim, Eun Jung"


A verilog-hdl implementation of virtual channels in a network-on-chip router
(2009-05-15) Park, Sungho
As feature sizes continue to shrink and integration density increases, interconnections have become a dominating factor in determining the overall quality of a chip. Because of its limited scalability, the system bus can support only a limited number of functional units and cannot meet the requirements of current System-on-Chip (SoC) implementations. Long global wires also cause many design problems, such as routing congestion, noise coupling, and difficult timing closure. Network-on-Chip (NoC) architectures have been proposed as an alternative that solves these problems by using a packet-based communication network. The processing elements (PEs) communicate with each other by exchanging messages over the network, and these messages go through buffers in each router. Buffers are one of the major resources used by routers in virtual channel flow control. In this thesis, we analyze two kinds of buffer allocation approaches, static and dynamic. Both aim to increase throughput and minimize latency by means of virtual channel flow control. In a statically allocated buffer architecture, size and organization are design-time decisions and thus do not perform optimally for all traffic conditions. In addition, statically allocated virtual channels waste area and consume significant leakage power. Dynamic buffer allocation schemes, in contrast, claim that buffer utilization can be increased using dynamic virtual channels. The dynamic virtual channel regulator (ViChaR) has been proposed as a centralized buffer architecture that dynamically allocates virtual channels and buffer slots in real time depending on traffic conditions. ViChaR's dynamic buffer management scheme increases buffer utilization, but it also increases design complexity. In this research, we reexamine the performance, power consumption, and area of ViChaR's buffer architecture through implementation.
We implement a generic router and a ViChaR architecture in Verilog-HDL. The RTL code is verified by dynamic simulation and synthesized with Design Compiler to obtain area and power consumption. In addition, we obtain latency through Static Timing Analysis. The results show that ViChaR's dynamic buffer management scheme increases latency and power consumption significantly even though it can increase buffer utilization. Therefore, a novel design is needed to achieve high buffer utilization without these penalties.
Accelerating Communication in On-Chip Interconnection Networks
(2012-07-16) Ahn, Minseon
Due to the ever-shrinking feature size in CMOS process technology, it is expected that future chip multiprocessors (CMPs) will have hundreds or thousands of processing cores. To support such a large number of cores, packet-switched on-chip interconnection networks have become a de facto communication paradigm in CMPs. However, on-chip networks have several drawbacks, such as limited on-chip resources, increasing communication latency, and insufficient communication bandwidth. In this dissertation, several schemes are proposed to accelerate communication in on-chip interconnection networks within area and cost budgets to overcome these problems. First, an early transition scheme for fully adaptive routing algorithms is proposed to improve network throughput. With a limited number of resources, previously proposed fully adaptive routing algorithms have low utilization in escape channels. To increase the utilization of escape channels, the scheme transfers packets to them before the normal channels are full. Second, a pseudo-circuit scheme is proposed to reduce network latency using communication temporal locality. Reducing per-hop router delay becomes more important for communication latency reduction in larger on-chip interconnection networks. To improve communication latency, the previous arbitration information is reused to bypass switch arbitration. For further acceleration, we also propose two aggressive schemes, pseudo-circuit speculation and buffer bypassing. Third, two handshake schemes are proposed to improve network throughput for nanophotonic interconnects. Nanophotonic interconnects have been proposed to replace metal wires with optical links in on-chip interconnection networks for low latency and power consumption as well as high bandwidth. To minimize the average token waiting time of the nanophotonic interconnects, the traditional credit-based flow control is removed. Thus, the handshake schemes increase link utilization and enhance network throughput.
ActiveSTB: an efficient wireless resource manager in home networks
(2009-05-15) Hall, Varrian Durand
The rapid growth of new wireless and mobile devices accessing the internet has led to an increase in the demand for multimedia streaming services. These home-based wireless connections require efficient distribution of shared network resources, which is a major concern for the transport of stored video. In our study, a set-top box is the access point between the internet and a home network. Our main goal is to design a set-top box capable of performing network flow control in a home network and of adapting the quality of the delivered stream to the available bandwidth. To achieve this goal, estimating the available bandwidth quickly and precisely is the first task in deciding the streaming rates of layered and scalable multimedia services. We present a novel bandwidth estimation method called IdleGap that uses the NAV (Network Allocation Vector) information in the wireless LAN. We design a new set-top box that implements IdleGap and performs buffering and quality adaptation to a wireless network based on IdleGap's bandwidth estimate. We use a network simulation tool called NS-2 to evaluate IdleGap and our ActiveSTB compared to traditional STBs. We performed several tests simulating network conditions over various ranges of cross traffic with different error rates and observation times. Our simulation results reveal how IdleGap accurately estimates the available bandwidth for all ranges of cross traffic (100Kbps ~ 1Mbps) with a very short observation time (10 seconds). Test results also reveal how our novel ActiveSTB outperforms traditional STBs and provides good QoS to the end-user by reducing latency and excess bandwidth consumption.
Adaptive Resource Management Schemes for Web Services
(2011-02-22) Lee, Heung Ki
Web cluster systems provide cost-effective solutions when scalable and reliable web services are required. However, as the number of servers in a web cluster system increases, the system incurs long and unpredictable delays in managing servers. This study presents efficient management schemes for web cluster systems. First, we propose an efficient request distribution scheme for web cluster systems. Distributor-based systems forward user requests to a balanced set of waiting servers in complete transparency to the users. The policy employed in forwarding requests from the frontend distributor to the backend servers plays an important role in the overall system performance. In this study, we present a proactive request distribution (ProRD) scheme to provide intelligent distribution at the distributor. Second, we propose heuristic memory management schemes through a web prefetching scheme. For this study, we design a Double Prediction-by-Partial-Match Scheme (DPS) that can be adapted to modern web frameworks. In addition, we present an Adaptive Rate Controller (ARC) to determine the prefetch rate dynamically depending on the memory status. To evaluate the prefetch gain in a server node, we implement an Apache module. Lastly, we design an adaptive web streaming system for wireless networks. The rapid growth of new wireless and mobile devices accessing the internet has contributed to a whole new level of heterogeneity in web streaming systems. In particular, in-home networks have also increased in heterogeneity through the use of various devices such as laptops, cell phones, and PDAs. In our study, a set-top box (STB) is the access point between the internet and a home network. We design an ActiveSTB capable of buffering and quality adaptation based on an estimate of the available bandwidth in the wireless LAN.
Architectural Support for Efficient Communication in Future Microprocessors
(2010-01-16) Jin, Yu Ho
Traditionally, the microprocessor design has focused on the computational aspects of the problem at hand. However, as the number of components on a single chip continues to increase, the design of communication architecture has become a crucial and dominating factor in defining performance models of the overall system. On-chip networks, also known as Networks-on-Chip (NoC), emerged recently as a promising architecture to coordinate chip-wide communication. Although there are numerous interconnection network studies in an inter-chip environment, an intra-chip network design poses a number of substantial challenges to this well-established interconnection network field. This research investigates designs and applications of on-chip interconnection network in next-generation microprocessors for optimizing performance, power consumption, and area cost. First, we present domain-specific NoC designs targeted to large-scale and wire-delay dominated L2 cache systems. The domain-specifically designed interconnect shows 38% performance improvement and uses only 12% of the mesh-based interconnect. Then, we present a methodology of communication characterization in parallel programs and application of characterization results to long-channel reconfiguration. Reconfigured long channels suited to communication patterns enhance the latency of the mesh network by 16% and 14% in 16-core and 64-core systems, respectively. Finally, we discuss an adaptive data compression technique that builds a network-wide frequent value pattern map and reduces the packet size. In two examined multi-core systems, cache traffic has 69% compressibility and shows high value sharing among flows. Compression-enabled NoC improves the latency by up to 63% and saves energy consumption by up to 12%.
Architectural support for enhancing security in clusters
(2009-05-15) Lee, Man Hee
Cluster computing has emerged as a common approach for providing more computing and data resources in industry as well as in academia. However, since cluster computer developers have paid more attention to performance and cost efficiency than to security, numerous security loopholes in cluster servers come to the forefront. Clusters usually rely on firewalls for their security, but the firewalls cannot prevent all security attacks; therefore, cluster systems should be designed to be robust to security attacks intrinsically. In this research, we propose architectural supports for enhancing security of cluster systems with marginal performance overhead. This research proceeds in a bottom-up fashion starting from enforcing each cluster component's security to building an integrated secure cluster. First, we propose secure cluster interconnects providing confidentiality, authentication, and availability. Second, a security accelerating network interface card architecture is proposed to enable low performance overhead encryption and authentication. Third, to enhance security in an individual cluster node, we propose a secure design for shared-memory multiprocessors (SMP) architecture, which is deployed in many clusters. The secure SMP architecture will provide confidential communication between processors. This will remove the vulnerability of eavesdropping attacks in a cluster node. Finally, to put all proposed schemes together, we propose a security/performance trade-off model which can precisely predict performance of an integrated secure cluster.
Architectural Support for High-Performance, Power-Efficient and Secure Multiprocessor Systems
(2012-10-19) An, Baik Song
High performance systems have been widely adopted in many fields, and the demand for better performance is constantly increasing. The need for powerful yet flexible systems is also increasing to meet varying application requirements from diverse domains. In addition, power efficiency in high performance computing has been one of the major issues to be resolved. The power density of core components has become significantly higher, and the fraction of power supply in total management cost is dominant. Providing dependability is also a main concern in large-scale systems since more hardware resources can be abused by attackers. Therefore, designing high-performance, power-efficient and secure systems is crucial to provide adequate performance as well as reliability to users. Adhering to traditional design methodologies for large-scale computing systems limits the ability to meet this demand under restricted resource budgets. Interconnecting a large number of uniprocessor chips to build parallel processing systems is not an efficient solution in terms of performance and power. A chip multiprocessor (CMP) integrates multiple processing cores and caches on a chip and is thought of as a good alternative to previous design trends. In this dissertation, we deal with various design issues of high performance multiprocessor systems based on CMP to achieve both performance and power efficiency while maintaining security. First, we propose fast and secure off-chip interconnects that minimize network overheads and provide an efficient security mechanism. Second, we propose architectural support for fast and efficient memory protection in CMP systems, making the best use of the characteristics of CMP environments and multi-threaded workloads. Third, we propose a new router design for network-on-chip (NoC) based on a new memory technique. We introduce hybrid input buffers that use both SRAM and STT-MRAM for better performance as well as power efficiency.
Simulation results show that the proposed schemes improve the performance of off-chip networks through reducing the message size by 54% on average. Also, the schemes diminish the overheads of bounds checking operations, thus enhancing the overall performance by 11% on average. Adopting hybrid buffers in NoC routers contributes to increasing the network throughput up to 21%.
Design and Analysis of Dynamic Thermal Management in Chip Multiprocessors (CMPs)
(2011-02-22) Yeo, In Choon
Chip Multiprocessors (CMPs) have been prevailing in the modern microprocessor market. As ever-increasing power density and current leakage generate significant heat, the raised operating temperature in a chip has already threatened the system's reliability and made thermal control one of the most important issues to be addressed in chip designs. Due to the cost and complexity of designing thermal packaging, many Dynamic Thermal Management (DTM) schemes have been widely adopted in modern processors. In this study, we focus on developing a simple and accurate thermal model, which provides a scheduling decision for running tasks, and we show how to design an efficient DTM scheme with negligible performance overhead. First, we propose an efficient DTM scheme for multimedia applications that tackles the thermal control problem in a unified manner. A DTM scheme for multimedia applications makes soft real-time scheduling decisions based on statistical characteristics of multimedia applications. Specifically, we model application execution characteristics as the probability distribution of the number of cycles required to decode frames. Our DTM scheme for multimedia applications has been implemented on Linux on two mobile processors that provide variable clock frequencies, an Intel Pentium-M processor and an Intel Atom processor. To evaluate the performance of the proposed DTM scheme, we use two major codecs, MPEG-4 and H.264/AVC, with various frame resolutions. Our results show that our DTM scheme for multimedia applications lowers the overall temperature by 4 degrees C and the peak temperature by 6 degrees C (up to 10 degrees C), while maintaining frame drop ratio under 5% compared to existing DTM schemes for multimedia applications.
Second, we propose a lightweight online workload estimation using the cumulative distribution function and architectural information via Performance Monitoring Counters (PMC) to observe the processes' dynamic workload behaviors. We also present an accurate thermal model for CMP architectures to analyze the thermal correlation effects by profiling the thermal impacts from neighboring cores under the specific workload. Hence, according to the estimated workload characteristics and thermal correlation effects, we can estimate the future temperature of each core more accurately. We implement a DTM scheme considering workload characteristics and thermal correlation effects on real machines, an Intel Quad-Core Q6600 system and a Dell PowerEdge 2950 (dual Intel Xeon E5310 Quad-Core) system, running applications ranging from multimedia applications to several benchmarks. Experimental results show that our DTM scheme reduces the peak temperature by 8% with 0.54% performance overhead compared to the Linux Standard Scheduler, while existing DTM schemes reduce peak temperature by 4% with up to 50% performance overhead.
Dynamic thermal management in chip multiprocessor systems
(2009-05-15) Liu, Chih-Chun
Recently, processor power density has been increasing at an alarming rate resulting in high on-chip temperature. Higher temperature increases current leakage and causes poor reliability. In our research, we first propose a Predictive Dynamic Thermal Management (PDTM) based on Application-based Thermal Model (ABTM) and Core-based Thermal Model (CBTM) in the multicore systems. Based on predicted temperature from ABTM and CBTM, the proposed PDTM can maintain the system temperature below a desired level by moving the running application from the possible overheated core to the future coolest core (migration) and reducing the processor resources (priority scheduling) within multicore systems. Furthermore, we present the Thermal Correlative Thermal Management (TCDTM), which incorporates three main components: Statistical Workload Estimation (SWE), Future Temperature Estimation Model (FTEM) and Temperature-Aware Thread Controller (TATC), to model the thermal correlation effect and distinguish the thermal contributions from applications with different workload behaviors at run time in the CMP systems. The proposed PDTM and TCDTM enable the exploration of the tradeoff between throughput and fairness in temperature-constrained multicore systems.
Handshake and Circulation Flow Control in Nanaphotonic Interconnects
(2012-10-19) Jayabalan, Jagadish
Nanophotonics has been proposed to design low latency and high bandwidth Network-On-Chip (NOC) for future Chip Multi-Processors (CMPs). Recent nanophotonic NOC designs adopt the token-based arbitration coupled with credit-based flow control, which leads to low bandwidth utilization. This thesis proposes two handshake schemes for nanophotonic interconnects in CMPs, Global Handshake (GHS) and Distributed Handshake (DHS), which get rid of the traditional credit-based flow control, reduce the average token waiting time, and finally improve the network throughput. Furthermore, we enhance the basic handshake schemes with setaside buffer and circulation techniques to overcome the Head-Of-Line (HOL) blocking. The evaluations show that the proposed handshake schemes improve network throughput by up to 11x under synthetic workloads. With the extracted trace traffic from real applications, the handshake schemes can reduce the communication delay by up to 55%. The basic handshake schemes add only 0.4% hardware overhead for optical components and negligible power consumption. In addition, the performance of the handshake schemes is independent of on-chip buffer space, which makes them feasible in a large scale nanophotonic interconnect design.
Health disparity and the built environment: spatial disparity and environmental correlates of health status, obesity, and health disparity
(2009-05-15) Kim, Eun Jung
Increasing evidence suggests that the environment is related to many public health challenges. Unequal distributions of services and resources needed for healthy lifestyles may contribute to increasing levels of health disparity. However, empirical studies are not sufficient to understand the relationship between health disparity and the built environment. This dissertation examines how health disparity is associated with the built environment and whether the environmental conditions that support physical activity and healthy diet are associated with lower health disparity. This research uses a multidisciplinary approach, drawing from urban planning, regional economics and public health. The data came from the Behavioral Risk Factor Surveillance System, together with GIS-derived environmental data and 608-respondent survey data from a larger study conducted in urbanized King County, Washington. Health disparity was measured with the Gini coefficient, and health status and obesity were used as indicators of health. Hot spot analysis was used to identify the spatial aggregations of high health disparity, and multiple regression models identified the environmental correlates of health disparity. The overall trend showed that disparity has increased in most states in the US over the past decade and the southern states showed the highest disparity levels. Strong spatial autocorrelations were found for disparities, indicating that disparity levels are not equally distributed across different geographic areas. From the multivariate analyses estimating disparity levels, spatial regression models significantly improved the overall model fit compared to the ordinary least-squares models. Areas with more supportive built environments for physical activity had lower health disparities, including proximity to downtown (+) and access to parks (+), day care centers (+), offices (+), schools (+), theaters (+), big box shopping centers (-), and libraries (-).
Overall results showed that the built environment, compared to the personal factors, was more strongly correlated with health disparities. This study brings attention to the problem of health disparity in the US, and provides evidence supporting the existence of spatial disparity in the environmental support for a healthy lifestyle. Further research is needed to better understand environmental and socioeconomic conditions associated with health disparity among more diverse population groups and in different environmental settings.
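The abstract above measures health disparity with the Gini coefficient. As a minimal illustrative sketch (not the dissertation's actual procedure), the standard rank-weighted form of the coefficient can be computed as below; the function name `gini` and the sample scores are hypothetical and are not taken from the study's data.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfect equality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with ranks i = 1..n over the values sorted in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical health-status scores for two areas (illustrative only).
equal_area = [3, 3, 3, 3]
unequal_area = [0, 0, 0, 4]
print(gini(equal_area), gini(unequal_area))
```

In this form a population with identical scores yields 0, and a population where one member holds everything yields (n - 1) / n.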
High Performance Interconnect System Design for Future Chip Multiprocessors
(2013-05-02) Wang, Lei
Chip Multi-Processor (CMP) architectures have become mainstream for designing processors. With a large number of cores, Network-On-Chip (NOC) provides a scalable communication method for CMP architectures. NOC must be carefully designed to meet constraints of power and area, and provide ultra low latencies and high throughput. In this research, we explore different techniques to design high performance NOC. First, existing NOCs mostly use Dimension Order Routing (DOR) to determine the route taken by a packet in unicast traffic. However, with the development of diverse applications in CMPs, one-to-many (multicast) and one-to-all (broadcast) traffic are becoming more common. Current unicast routing cannot support multicast and broadcast traffic efficiently. We propose Recursive Partitioning Multicast (RPM) routing and a detailed multicast wormhole router design for NOCs. RPM allows routers to select intermediate replication nodes based on the global distribution of destination nodes. This provides more path diversity, which makes better use of bandwidth and in turn improves the performance of the whole network. Second, as feature size shrinks, wires are becoming abundant resources in NOC. Since NOC can benefit from high wire density, with no limits on the number of pins and with faster signaling rates, it is critical for the NOC router design to fully utilize the wire resources to provide high performance. We propose an Adaptive Physical Channel Regulator (APCR) for NOC routers to exploit huge wiring resources. The flit size in an APCR router is less than the physical channel width (phit size) to provide finer granularity flow control. An APCR router allows flits from different packets or flows to share the same physical channel in a single cycle.
The three regulation schemes (Monopolizing, Fair-sharing and Channel-stealing) intelligently allocate the output channel resources considering not only the availability of physical channels but also the occupancy of input buffers. In an APCR router, each Virtual Channel can forward a dynamic number of flits every cycle depending on the run-time network status. Third, nanophotonics has been proposed to design low latency and high bandwidth NOC for future CMPs. Recent nanophotonic NOC designs adopt token-based arbitration coupled with credit-based flow control, which leads to low bandwidth utilization. We propose two handshake schemes for nanophotonic interconnects in CMPs, Global Handshake (GHS) and Distributed Handshake (DHS), which get rid of the traditional credit-based flow control, reduce the average token waiting time, and finally improve the network throughput. Furthermore, we enhance the basic handshake schemes with setaside buffer and circulation techniques to overcome the Head-Of-Line (HOL) blocking.
Hybrid Nanophotonic NOC Design for GPGPU
(2012-07-16) Yuan, Wen
Due to their massive computational power, Graphics Processing Units (GPUs) have become a popular platform for executing general purpose parallel applications. The majority of on-chip communication in a GPU architecture occurs between memory controllers and compute cores, so memory controllers become hot spots and bottlenecks when conventional mesh interconnection networks are used. Leveraging this observation, we reduce the network latency and improve throughput by providing a nanophotonic ring network which connects all memory controllers. This new interconnection network employs a new routing algorithm that combines Dimension Ordered Routing (DOR) and nanophotonic ring algorithms. With this new topology, we reduce interconnection network latency by 17% on average (up to 32%) and improve IPC by 5% on average (up to 11.5%). We also analyze application characteristics of six CUDA benchmarks on the GPGPU-Sim simulator to gain a better perspective for designing high performance GPU interconnection networks.
Macroscopic and spectroscopic investigation of interactions of arsenic with synthesized pyrite
(2009-05-15) Kim, Eun Jung
Sulfide minerals have been suggested to play an important role in regulating dissolved metal concentrations in anoxic environments. Pyrite is the most common sulfide mineral and it has shown an affinity for arsenic, but little is known about the arsenic retention mechanisms of pyrite. In this study, interactions of arsenic with pyrite were investigated in an anoxic environment to understand geochemical cycling of arsenic better and to predict arsenic fate and transport in the environment better. A procedure using microwaves was studied to develop a fast and reliable method for synthesizing pyrite. Arsenic-pyrite interactions were investigated using macroscopic (solution phase experiments) and microscopic (X-ray photoelectron spectroscopic investigation) approaches. Pyrite was successfully synthesized within a few minutes via reaction of ferric iron and hydrogen sulfide under the influence of irradiation by a conventional microwave oven. The SEM-EDX study revealed that the nucleation and growth of pyrite occurred on the surface of elemental sulfur, where polysulfides are available. Compared to conventional heating, microwave energy results in rapid (< 1 minute) formation of smaller particulates of pyrite. Higher levels of microwave power can form pyrite even faster, but faster reaction can lead to the formation of pyrite with defects. Arsenic removal by pyrite was strongly dependent on pH and arsenic species. Both arsenite (As(III)) and arsenate (As(V)) had a strong affinity for the pyrite surface under acidic conditions, but As(III) was removed more effectively than As(V). Under acidic conditions, arsenic removal continued to occur almost linearly with time until complete removal was achieved. However, under neutral to alkaline conditions, fast removal was followed by slow removal and complete removal was not achieved in our experimental conditions. 
A BET isotherm equation provided the best fit to arsenic removal data, suggesting that surface precipitation occurred at high arsenic/pyrite ratio. The addition of competing ions did not substantially affect the ultimate distribution of arsenic between the pyrite surface and the solution, but changing pH affected arsenic stability on pyrite. X-ray photoelectron spectroscopy revealed that under acidic conditions, arsenic was removed and formed solid phases similar to As2S3 and As4S4 by reaction with pyrite. However, under neutral to alkaline conditions, arsenic was removed and formed As(III)-O and As(V)-O surface complexes, as well as As2S3/As4S4-like precipitates. As pH increases, the amount of arsenic that formed As2S3/As4S4-like precipitates decreased, while the amount that formed As(III)-O and As(V)-O surface complexes increased. Under alkaline conditions, a FeAsS-like phase was also detected.
On improving performance and conserving power in cluster-based web servers
(Texas A&M University, 2007-04-25) Vageesan, Gopinath
Efficiency and power conservation are critical issues in the design of cluster systems because these two parameters have direct implications on the user experience and the global need to conserve power. Widely adopted, distributor-based systems forward client requests to a balanced set of waiting servers in complete transparency to the clients. The policy employed in forwarding requests from the front-end distributor to the backend servers plays an important role in the overall system performance. Existing research separately addresses server performance and power conservation. The locality-aware request distribution (LARD) scheme improves the system response time by having the requests served by web servers which have the data in their cache. The power-aware request distribution aims at reducing the power consumption by turning the web servers OFF and ON according to the load. This research tries to achieve power conservation while preserving the performance of the system. First, we prove that using both power-aware and locality-aware request distribution together provides optimum power conservation, while still maintaining the required QoS of the system. We apply the usage of pinned memory in the backend servers to boost performance along with a request distributor design based on power and locality considerations. Secondly, we employ an intelligent-proactive-distribution policy at the front-end to improve the distribution scheme and complementary pre-fetching at the backend server nodes. The proactive distribution depends on both online and offline analysis of the website log files, which capture user navigation patterns on the website. The prefetching scheme pre-fetches the web pages into the memory based on a confidence value of the web page predicted by backend using the log file analysis. 
Designed to work with the prevailing web technologies, such as HTTP 1.1, our scheme provides reduced response time to the clients and improved power conservation at the backend server cluster. Simulations carried out with traces derived from the log files of real web servers show a performance boost of 15-45% and 10-40% power conservation in comparison to the existing distribution policies.
Priority Based Switch Allocator in Adaptive Physical Channel Regulator for On Chip Interconnects
(2014-08-04) Mahapatra, Sonali
Chip multiprocessors (CMPs), in which a number of relatively simple cores are integrated on a single die, are now a popular design paradigm for microprocessors due to their power, performance, and complexity advantages. The on-chip interconnection network (NoC) is an architectural paradigm that offers a stable and generalized communication platform for large-scale chip multiprocessors. The existing APCR model has three regulation schemes designed at the switch allocation stage of the NoC router pipeline: monopolizing, fair-sharing, and channel-stealing. Its aim is to allocate physical bandwidth fairly in flit-level transmission units while breaking the conventional assumption that the flit size is the same as the phit size. The channel-stealing scheme was implemented with the existing round-robin scheduler, a well-known scheduling algorithm for providing fairness, which is not an optimal solution. In this thesis, we extend the APCR model and propose three efficient scheduling policies for the channel-stealing scheme in order to provide better quality of service (QoS). Our work can be divided into three parts. In the first part, we implement a ratio-based scheduling technique in which we keep track of the average number of flits sent from each input in every cycle. It not only provides fairness among virtual channels (VCs), but also increases the saturation throughput of the network. In the second part, we implement an age-based scheduling technique in which we prioritize VCs based on the age of the requesting flits. The age of each request is calculated as the difference between the current simulation time and the time of injection. The age-based scheduler minimizes packet latency. In the last part, we implement a static-priority-based scheduler, in which we assign random priorities to the packets at the time of their injection into the network. 
In this case, high-priority packets can be forwarded to any of the VCs, whereas low-priority packets can be forwarded to only a limited number of VCs. The static-priority-based scheduler thus limits the number of accessible VCs depending on the packet priority. We study performance metrics such as the average packet latency and saturation throughput produced by all three new scheduling techniques. We present simulation results for all three scheduling policies under bit complement, transpose, and uniform random traffic, at injection rates ranging from very low (no load) to high load. We evaluate the performance improvement due to our proposed scheduling techniques in APCR in comparison with a basic NoC design. The performance is also compared with the results for the monopolizing and fair-sharing schemes and for round-robin channel-stealing in APCR. Simulation results from our detailed cycle-accurate simulator show that our new scheduling policies in the APCR model improve network throughput by 10% under synthetic workloads compared with the existing round-robin scheme. Also, our scheduling policy in the APCR model outperforms the baseline router by 28X under synthetic workloads.
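The age-based policy in this abstract can be sketched as follows, assuming each request carries the injection time of its flit. The `Request` type, the function names, and the tie-breaking rule (lowest VC id wins) are assumptions made for illustration; the abstract does not say how ties are resolved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical switch-allocation request from one virtual channel."""
    vc_id: int
    injection_time: int  # cycle at which the flit entered the network


def flit_age(req: Request, now: int) -> int:
    """Age = current simulation time minus injection time."""
    return now - req.injection_time


def age_based_grant(requests: List[Request], now: int) -> Optional[Request]:
    """Grant the oldest requesting flit; ties go to the lowest VC id (assumed)."""
    if not requests:
        return None
    return max(requests, key=lambda r: (flit_age(r, now), -r.vc_id))
```

For example, at cycle 100 with flits injected at cycles 90, 40, and 40 on VCs 0, 1, and 2, the ages are 10, 60, and 60, and the sketch grants VC 1.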
STT-MRAM Based NoC Buffer Design
(2012-10-19) Vikram Kulkarni, Nikhil
As Chip Multiprocessor (CMP) design moves toward many-core architectures, communication delay in the Network-on-Chip (NoC) is a major bottleneck in CMP design. STT-MRAM (Spin-Torque Transfer Magnetic RAM) is an emerging non-volatile memory that provides substantial power and area savings, near-zero leakage power, and higher memory density compared to conventional SRAM. However, STT-MRAM suffers from inherent drawbacks such as multi-cycle write latency and high write power consumption. These problems have to be addressed in order to obtain an efficient design that uses STT-MRAM for NoC input buffers instead of the traditional SRAM-based input buffer design. Motivated by the short intra-router latency, a previously proposed write latency reduction technique that sacrifices retention time is explored, and a hybrid input buffer design that uses both SRAM and STT-MRAM to "hide" the long write latency efficiently is proposed. Considering that simple data migration in the hybrid buffer consumes more dynamic power than SRAM, a lazy migration scheme that reduces the dynamic power consumption of the hybrid buffer is also proposed.
Throughput-Efficient Network-on-Chip Router Design with STT-MRAM
(2012-11-02) Narayana, Sagar 1986-
As the number of processor cores on a chip increases with the advance of CMOS technology, there has been a growing need for more efficient Network-on-Chip (NoC) design, since communication delay has become a major bottleneck in large-scale multicore systems. In designing efficient input buffers for NoC routers with better performance and power efficiency, Spin-Torque Transfer Magnetic RAM (STT-MRAM) is regarded as a promising solution due to its high density and near-zero leakage power. Previous work that adopts STT-MRAM in the NoC router input buffer is limited in how far it can reduce power consumption overhead, even though it succeeds to some degree in achieving high network throughput by using SRAM to hide the long write latency of STT-MRAM. In this thesis, we propose a novel input buffer design that depends solely on STT-MRAM, without the need for SRAM, to maximize the benefits of low leakage power and area efficiency inherent in STT-MRAM. In addition, we introduce power-efficient buffer refreshing schemes combined with age-based switch arbitration, which gives higher priority to older flits, to remove unnecessary refreshing operations. On average, we observed throughput improvements of 16% on synthetic workloads and benchmarks.
Topology management protocols in ad hoc wireless sensor networks
(2009-05-15) Kim, Hogil
A wireless sensor network (WSN) is comprised of a few hundred or thousand autonomous sensor nodes spatially distributed over a particular region. Each sensor node is equipped with a wireless communication device, a small microprocessor, and a battery-powered energy source. Typically, the applications of WSNs, such as habitat monitoring, fire detection, and military surveillance, require data collection, processing, and transmission among the sensor nodes. Due to their energy constraints and hostile environments, the main challenge in the research of WSN lies in prolonging the lifetime of WSNs. In this dissertation, we present four different topology management protocols for K-coverage and load balancing to prolong the lifetime of WSNs. First, we present a Randomly Ordered Activation and Layering (ROAL) protocol for K-coverage in a stationary WSN. The ROAL suggests a new model of layer coverage that can construct a K-covered WSN using the layer information received from its previously activated nodes in the sensing distance. Second, we enhance the fault tolerance of layer coverage through a Circulation-ROAL (C-ROAL) protocol. Using the layer number, the C-ROAL can activate each node in a round-robin fashion during a predefined period while conserving reconfiguration energy. Next, Mobility Resilient Coverage Control (MRCC) is presented to assure K-coverage in the presence of mobility, in which a more practical and reliable model for K-coverage with nodal mobility is introduced. Finally, we present a Multiple-Connected Dominating Set (MCDS) protocol that can balance the network traffic using an on-demand routing protocol. The MCDS protocol constructs and manages multiple backbone networks, each of which is constructed with a connected dominating set (CDS) to ensure a connected backbone network. 
We describe each protocol and compare the performance of our protocols with Dynamic Source Routing (DSR) and/or existing K-coverage algorithms through extensive simulations. The simulation results obtained by the ROAL protocol show that K-coverage can be guaranteed with more than 95% coverage ratio, and that the protocol significantly extends network lifetime for a given WSN. We also observe that the C-ROAL protocol provides a better reconfiguration method, which consumes less than 1% of the reconfiguration energy of the ROAL protocol, with greatly reduced packet latency. The MRCC protocol, which takes mobility into account, achieves 1.4% better coverage with 22% fewer active sensors than an existing coverage protocol designed for mobility. The results on the MCDS protocol show that the energy depletion ratio of nodes is decreased, while the network throughput is improved by 35%.
Variable length pattern coding for power reduction in off-chip data buses
(2009-05-15) Venkitasubramanian Iyer, Jayakrishnan
Off-chip buses consume a huge fraction (20%-40%) of the system power. Hence, techniques such as increasing bus widths, transition encoding, etc. have been used for power reduction on off-chip data buses. Since capacitances at the I/O pads and interwire capacitances contribute significantly to the increase in power, encoding/decoding schemes have been developed to reduce the switching activity of the off-chip bus lines, thus reducing power. Frequent-Value Encoding (FVE) [1], Frequent Value Encoding with Xor (FVExor) [1], and VALVE [2] are some of the better-known encoding schemes, but they still have scope for improvement. This thesis addresses the problem of power reduction in off-chip data buses by encoding a variable number (1 to 4) of fixed-size (32-bit) data values (variable length patterns) that exhibit temporal locality. This characteristic enables us to cache these patterns using a 64-entry CAM at the encoder and a 64-entry SRAM at the decoder. Whenever a pattern match occurs, a 2-bit code indicating the index of the match is sent. If a variable length pattern match occurs, then the code and the unmatched portion of the data are sent. We implemented our scheme, Variable Length Pattern Coding (VLPC), for various integer and floating point benchmarks and have seen 6% to 49% encodable patterns in these benchmarks. Based on experiments on SimpleScalar and our analysis in MATLAB, we obtained a 4.88% to 40.11% reduction in transition activity for SPEC2000 benchmarks such as crafty, swim, mcf, applu, and ammp over unencoded data. This is 0.3% to 38.9% higher than that obtained using the FVE, FVExor [1], and VALVE [2] encoding schemes. Finally, we have designed a low-power custom CAM and SRAM using 45nm BSIM4 technology models, which have been used to verify the lower latency of data matching and storing.
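The transition-activity metric reported in this abstract can be illustrated with a small sketch that counts bit flips between consecutive words on a bus and compares two traces. The function names are hypothetical, and the assumption that the bus lines start at all zeros is made here for illustration only; the sketch does not implement VLPC itself.

```python
from typing import Iterable


def bus_transitions(words: Iterable[int], width: int = 32) -> int:
    """Count bit transitions on a `width`-bit bus driven with `words` in order.

    Assumes (for illustration) that all bus lines start at 0.
    """
    mask = (1 << width) - 1
    prev = 0
    total = 0
    for word in words:
        word &= mask
        # Each 1 in the XOR is a line that toggles between consecutive words.
        total += bin(prev ^ word).count("1")
        prev = word
    return total


def reduction_percent(baseline: int, encoded: int) -> float:
    """Percentage reduction in transitions relative to the unencoded baseline."""
    if baseline == 0:
        raise ValueError("baseline has no transitions")
    return 100.0 * (baseline - encoded) / baseline
```

Running `bus_transitions` on the unencoded trace and on the encoded trace, then passing both counts to `reduction_percent`, gives a figure of the same kind as the percentages quoted in the abstract.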
