Browsing by Subject "CMP"
Now showing 1 - 7 of 7
Control Techniques for Uncore Power Management in Chip Multiprocessor Designs (2013-08-01) Xu, Zheng
In chip-multiprocessor (CMP) designs, as the number of cores increases, the on-chip communication fabric and data storage grow accordingly, which exacerbates the chip power challenge. This thesis considers power management for the network-on-chip (NoC) and the last-level cache, which together constitute the uncore in CMP designs. NoC is regarded as a scalable approach to cope with the increasing demand for on-chip communication bandwidth. The last-level cache is shared among all cores. The focus of this work is on control techniques for uncore dynamic voltage and frequency scaling. A realistic but not well-studied scenario is investigated: the entire uncore shares a single voltage/frequency domain, as opposed to the separate domains assumed in most previous work. One appealing advantage is that data packets no longer experience interfacing overhead when crossing voltage/frequency domains. The classic PI (proportional-integral) control method is adopted for its simplicity, flexibility, and low implementation overhead. The research outcome has three parts. First, the stability of the PI control is analyzed. Second, a model-assisted PI control scheme is proposed and studied; the model assistance addresses the problem that no universally good reference point exists for the control. Third, the windup issue for the PI control is investigated. Full architecture simulations are performed on public benchmark suites to validate the proposed techniques.
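The PI control and the windup issue named in this abstract can be illustrated with a minimal sketch. It is not the thesis's controller: the `PIController` name, the gains, the normalized output range, and the conditional-integration anti-windup rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete PI controller with a clamped output (hypothetical sketch)."""
    kp: float              # proportional gain (illustrative value, assumed > 0)
    ki: float              # integral gain (illustrative value, assumed > 0)
    out_min: float = 0.0   # lowest normalized voltage/frequency level (assumed)
    out_max: float = 1.0   # highest normalized voltage/frequency level (assumed)
    integral: float = 0.0  # running sum of past errors

    def step(self, error: float) -> float:
        """Return the clamped control output for one control interval."""
        candidate = self.integral + error
        raw = self.kp * error + self.ki * candidate
        out = min(self.out_max, max(self.out_min, raw))
        # Anti-windup by conditional integration: accept the new integral
        # only while the output is unsaturated, or when the error pushes the
        # output back toward the allowed range.
        pushes_back = (raw > self.out_max and error < 0) or (
            raw < self.out_min and error > 0
        )
        if raw == out or pushes_back:
            self.integral = candidate
        return out


def run(errors, controller):
    """Feed a sequence of errors to the controller and collect the outputs."""
    return [controller.step(e) for e in errors]
```

Without the conditional update, a long run of same-sign errors would keep growing `integral` while the output is pinned at its limit, and the controller would stay saturated long after the error changes sign.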
The results show a 76% energy reduction with less than 6% performance degradation compared to a constantly high uncore voltage/frequency.

Core-characteristic-aware off-chip memory management in a multicore system-on-chip (2012-12) Jeong, Min Kyu; Erez, Mattan; John, Lizy K.; Chiou, Derek; Lin, Calvin; Schulte, Michael J.
Future processors will integrate an increasing number of cores because the scaling of single-thread performance is limited and because smaller cores are more power efficient. Off-chip memory bandwidth, which is shared among those many cores, however, scales more slowly than the transistor (and core) count does. As a result, in many future systems, off-chip bandwidth will become the bottleneck under heavy demand from multiple cores. Therefore, optimally managing the limited off-chip bandwidth is critical to achieving high performance and efficiency in future systems. In this dissertation, I develop techniques to optimize the shared use of limited off-chip memory bandwidth in chip multiprocessors. I focus on issues that arise from the sharing and exploit the differences in memory access characteristics, such as locality, bandwidth requirement, and latency sensitivity, between the applications running in parallel and competing for the bandwidth. First, I investigate how the shared use of memory by many cores can result in reduced spatial locality in memory accesses. I propose a technique that partitions the internal memory banks between cores in order to isolate their access streams and eliminate locality interference. The technique compensates for the reduced bank-level parallelism of each thread by employing memory sub-ranking to effectively increase the number of independent banks.
For three workload groups consisting of benchmarks with high spatial locality, low spatial locality, and mixes of the two, the average system efficiency improves by 10%, 7%, and 9% for 2-rank systems, and by 18%, 25%, and 20% for 1-rank systems, respectively, over the baseline shared-bank system. Next, I improve the performance of a heterogeneous system-on-chip (SoC) in which cores have distinct memory access characteristics. I develop a deadline-aware shared memory bandwidth management scheme for SoCs that have both CPU and GPU cores. I show that statically prioritizing the CPU can severely constrict GPU performance, and propose dynamically adapting the priority of CPU and GPU memory requests based on the progress of the GPU workload. The proposed dynamic bandwidth management scheme provides the target GPU performance while prioritizing CPU performance as much as possible, for any CPU-GPU workload combination with different complexities.

Dynamic thermal management in chip multiprocessor systems (2009-05-15) Liu, Chih-Chun
Recently, processor power density has been increasing at an alarming rate, resulting in high on-chip temperatures. Higher temperature increases leakage current and degrades reliability. In our research, we first propose Predictive Dynamic Thermal Management (PDTM) based on an Application-based Thermal Model (ABTM) and a Core-based Thermal Model (CBTM) for multicore systems. Based on the temperature predicted by ABTM and CBTM, the proposed PDTM can keep the system temperature below a desired level by moving the running application from a potentially overheated core to the core predicted to be coolest (migration) and by reducing processor resources (priority scheduling) within multicore systems.
Furthermore, we present Thermal Correlative Thermal Management (TCDTM), which incorporates three main components, Statistical Workload Estimation (SWE), a Future Temperature Estimation Model (FTEM), and a Temperature-Aware Thread Controller (TATC), to model the thermal correlation effect and distinguish the thermal contributions of applications with different workload behaviors at run time in CMP systems. The proposed PDTM and TCDTM enable exploration of the tradeoff between throughput and fairness in temperature-constrained multicore systems.

Interfacial forces in chemical-mechanical polishing (2009-05-15) Ng, Dedy
The demand for microelectronic device miniaturization requires new concepts and technology improvements in integrated circuit fabrication. In the last two decades, Chemical-Mechanical Polishing (CMP) has emerged as the process of choice for planarization. The process takes place at the interface of a substrate, a polishing pad, and an abrasive-containing slurry. This synergetic process involves several forces acting across multiple length scales and through multiple mechanisms. This research contributes fundamental understanding of the surface and interface science of microelectronic materials, with three major objectives. To extend the industrial impact of this research, CMP is used as the model system for the study. The first objective is to investigate the interfacial forces in the CMP system. For the first time, the interfacial forces are discussed systematically and comparatively so that the key forces in CMP can be pinpointed. The second objective is to understand the basic principles of lubrication, i.e., the fluid drag force, which can be used to monitor, evaluate, and optimize CMP processes. New parameters were introduced to account for the change of material properties during CMP.
Using the experimental results, a new equation was developed to describe the principle of lubrication behind CMP. The third objective is to study the synergy of those interfacial forces with electrochemistry. The electro-chemical-mechanical polishing (ECMP) of copper was studied. Experiments were conducted on a tribometer in combination with a potentiostat. The friction coefficient was used to monitor the polishing process and was correlated with the wear behavior of post-CMP samples. Surface characterization was performed using AFM, SEM, and XPS techniques. Experimental results were used to generate a new wear model, which provided insight into CMP mechanisms. ECMP is currently the newest technique used in the semiconductor industry. This research is expected to contribute to CMP technology and improve its process performance. This dissertation consists of six chapters. The first chapter covers the introduction and background information on surface forces and CMP. The motivation and objectives are discussed in the second chapter. The three major objectives, including approaches and expected results, are covered in the next three chapters. Finally, Chapter VI summarizes the major discoveries of this research and provides recommendations for future work.

Mobile Home Node: Improving Directory Cache Coherence Performance in NoCs via Exploitation of Producer-Consumer Relationships (2011-10-21) Soni, Tarun
Advances in process technology have made it possible to implement multiple processors on a single chip. The benefits of having multiple cores on a single chip bring with them a new set of constraints for maintaining fast and consistent memory accesses. Cache coherence protocols are needed to maintain the consistency of shared memory across individual caches.
Current cache coherence protocols are either snoop-based, which provide fast access for a small number of cores but do not scale, or directory-based, which use a directory as the ordering point and provide scalability at the cost of relatively slower access. Our focus is on improving the memory access time of the scalable directory protocol. We have observed that most memory requests follow a pattern wherein one of the processors, which we dub the Producer, repeatedly writes to a particular memory location. A subset of the remaining cores, which we dub the Consumers, repeatedly read the data from that same memory location. In our implementation, we exploit this relationship to provide direct cache-to-cache transfers and minimize the access time by avoiding the indirection through the directory. We move the directory temporarily to the Producer node so that a Consumer can request the cache line directly from the Producer. Our technique improves memory access time by 13 percent and reduces network traffic by 30 percent over a standard directory coherence protocol, with very little area overhead.

Nanoparticles removal in post-CMP (Chemical-Mechanical Polishing) cleaning (Texas A&M University, 2006-10-30) Ng, Dedy
Research was performed to study particle adhesion on the wafer surface after the chemical-mechanical polishing (CMP) process. The embedded particles can be abrasive particles from the slurry, debris from the pad material, and particles of the film being polished. Different particle removal methods were investigated to find the most effective technique. In post-CMP cleaning, surfactant was added to the solution. Results were compared with cleaning without surfactant and showed that cleaning was more effective with the combined interaction of the mechanical effort from brush sweeping and the chemistry of the surfactant in the solution (i.e., tribochemical interaction).
Numerical analysis was also performed to predict the particle removal rate with the addition of surfactants. The van der Waals forces present at the wafer-particle interface were calculated to find the energy required to remove a particle. Finally, the adhesion process was studied by modeling the van der Waals force as a function of the separation distance between the particle and the surface. The successful adaptation of elasticity theory to nanoparticle-surface interaction brought insight into CMP cleaning mechanisms. The model shows that the attractive force does not always increase as the separation distance decreases. The estimated force value can be used for slurry design and CMP process estimation.

Priority Based Switch Allocator in Adaptive Physical Channel Regulator for On Chip Interconnects (2014-08-04) Mahapatra, Sonali
Chip multiprocessors (CMPs), in which a number of relatively simple cores are integrated on a single die, are now a popular design paradigm for microprocessors due to their power, performance, and complexity advantages. The on-chip interconnection network (NoC) is an architectural paradigm that offers a stable and generalized communication platform for large-scale chip multiprocessors. The existing APCR model has three regulation schemes designed for the switch allocation stage of the NoC router pipeline: monopolizing, fair-sharing, and channel-stealing. Its aim is to fairly allocate physical bandwidth in flit-level transmission units while breaking the conventional assumption that a flit is the same size as a phit. The channel-stealing scheme was implemented using the existing round-robin scheduler, a well-known scheduling algorithm for providing fairness, which is not an optimal solution.
In this thesis, we extend the APCR model and propose three efficient scheduling policies for the channel-stealing scheme in order to provide better quality of service (QoS). Our work can be divided into three parts. In the first part, we implemented a ratio-based scheduling technique in which we keep track of the average number of flits sent from each input in every cycle. It not only provides fairness among virtual channels (VCs) but also increases the saturation throughput of the network. In the second part, we implemented an age-based scheduling technique in which we prioritize VCs based on the age of the requesting flits. The age of each request is calculated as the difference between the time of injection and the current simulation time. The age-based scheduler minimizes packet latency. In the last part, we implemented a static-priority-based scheduler. In this case, we assign random priorities to the packets at the time of their injection into the network. High-priority packets can be forwarded to any of the VCs, whereas low-priority packets can be forwarded to only a limited number of VCs. In other words, the static-priority-based scheduler limits the number of accessible VCs depending on the packet priority. We study performance metrics such as average packet latency and saturation throughput for all three new scheduling techniques. We present simulation results for all three scheduling policies under bit-complement, transpose, and uniform-random traffic, at injection rates ranging from very low (no load) to high load. We evaluate the performance improvement due to our proposed scheduling techniques in APCR by comparison with a basic NoC design. The performance is also compared with the results for the monopolizing, fair-sharing, and round-robin channel-stealing schemes of APCR.
Simulation results from our detailed cycle-accurate simulator show that our new scheduling policies implemented in the APCR model improve network throughput by 10% under synthetic workloads compared with the existing round-robin scheme. Our scheduling policy in the APCR model also outperforms the baseline router by 28x under synthetic workloads.
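The age-based policy described in this abstract (age is the current simulation time minus the injection time, and the oldest request wins) can be sketched as follows. The `Request` type, the `pick_oldest` function, and the lowest-VC tie-break are assumptions made for illustration; they are not taken from the thesis or from APCR.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Request:
    """A switch-allocation request from one virtual channel (hypothetical)."""
    vc: int              # index of the requesting virtual channel
    injection_time: int  # cycle at which the flit was injected into the network


def age(request: Request, now: int) -> int:
    """Age of a request: current simulation time minus injection time."""
    return now - request.injection_time


def pick_oldest(requests: Sequence[Request], now: int) -> Optional[Request]:
    """Grant the oldest request; ties go to the lowest VC index (assumed)."""
    if not requests:
        return None
    # max() over (age, -vc) prefers larger age, then the smaller VC index.
    return max(requests, key=lambda r: (age(r, now), -r.vc))
```

Granting the oldest request first bounds how long any flit can wait behind newer traffic, which is consistent with the abstract's claim that the age-based scheduler targets packet latency.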