Browsing by Subject "VLSI"
Now showing 1 - 20 of 22
Results Per Page
Sort Options
Item Algorithms for the scaling toward nanometer VLSI physical synthesis(Texas A&M University, 2007-04-25) Sze, Chin NgaiAlong the history of Very Large Scale Integration (VLSI), we have successfully scaled down the size of transistors, scaled up the speed of integrated circuits (IC) and the number of transistors in a chip - these are just a few examples of our achievement in VLSI scaling. It is projected to enter the nanometer (timing estimation and buffer planning for global routing and other early stages such as floorplanning. A novel path based buffer insertion scheme is also included, which can overcome the weakness of the net based approaches. Part-2 Circuit clustering techniques with the application in Field-Programmable Gate Array (FPGA) technology mapping The problem of timing driven n-way circuit partitioning with application to FPGA technology mapping is studied and a hierarchical clustering approach is presented for the latest multi-level FPGA architectures. Moreover, a more general delay model is included in order to accurately characterize the delay behavior of the clusters and circuit elements.Item An efficient logic fault diagnosis framework based on effect-cause approach(2009-05-15) Wu, LeiFault diagnosis plays an important role in improving the circuit design process and the manufacturing yield. With the increasing number of gates in modern circuits, determining the source of failure in a defective circuit is becoming more and more challenging. In this research, we present an efficient effect-cause diagnosis framework for combinational VLSI circuits. The framework consists of three stages to obtain an accurate and reasonably precise diagnosis. First, an improved critical path tracing algorithm is proposed to identify an initial suspect list by backtracing from faulty primary outputs toward primary inputs. Compared to the traditional critical path tracing approach, our algorithm is faster and exact. Second, a novel probabilistic ranking model is applied to rank the suspects so that the most suspicious one will be ranked at or near the top. Several fast filtering methods are used to prune unrelated suspects. Finally, to refine the diagnosis, fault simulation is performed on the top suspect nets using several common fault models. The difference between the observed faulty behavior and the simulated behavior is used to rank each suspect. Experimental results on ISCAS85 benchmark circuits show that this diagnosis approach is efficient both in terms of memory space and CPU time and the diagnosis results are accurate and reasonably precise.Item An Improved Lagrangian Relaxation Method for VLSI Combinational Circuit Optimization(2012-02-14) Huang, Yi-LeGate sizing and threshold voltage (Vt) assignment are very popular and useful techniques in current very large scale integration (VLSI) design flow for timing and power optimization. Lagrangian relaxation (LR) is a common method for handling multi-objectives and proven to reach optimal solution under continuous solution space. However, it is more complex to use Lagrangian relaxation under discrete solution space. The Lagrangian dual problem is non-convex and previously a sub-gradient method was used to solve it. The sub-gradient method is a greedy approach for substituting gradient method in the deepest descent method, and has room for further improvement. In addition, Lagrangian sub-problem cannot be solved directly by mathematical approaches under discrete solution space. Here we propose a new Lagrangian relaxation-based method for simultaneous gate sizing and Vt assignment under discrete solution space. In this work, some new approaches are provided to solve the Lagrangian dual problem considering not only slack but also the relationship between Lagrangian multipliers and circuit timing. We want to solve the Lagrangian dual problem more precisely than did previous methods, such as the sub-gradient method. In addition, a table-lookup method is provided to replace mathematical approaches for solving the Lagrangian sub-problem under discrete size and Vt options. The experimental results show that our method can lead to about 50 percent and 58 percent power reduction subject to the same timing constraints compared with a Lagrangian relaxation method using sub-gradient method and a state-of-the-art previous work. These two methods are implemented by us for comparison. Our method also results in better circuit timing subject to tight timing constraints.Item Analytical Layer Planning for Nanometer VLSI Designs(2012-10-19) Chang, Chi-YuIn this thesis, we proposed an intermediate sub-process between placement and routing stage in physical design. The algorithm is for generating layer guidance for post-placement optimization technique especially buffer insertion. This issue becomes critical in nowadays VLSI chip design due to the factor of timing, congestion, and increasingly non-uniform parasitic among different metal layers. Besides, as a step before routing, this layer planning algorithm accounts for routability by considering minimized overlap area between different nets. Moreover, layer directive information which is a crucial concern in industrial design is also considered in the algorithm. The core problem is formulated as nonlinear programming problem which is composed of objective function and constraints. The problem is further solved by conjugate gradient method. The whole algorithm is implemented by C++ under Linux operating system and tested on ISPD2008 Global Routing Contest Benchmarks. The experiment results are shown in the end of this thesis and confirm the effectiveness of our approach especially in routability aspect.Item CAD for nanolithography and nanophotonics(2011-08) Ding, Duo; Pan, David Z.; Chen, Ray T.; Ghosh, Joydeep; Orshansky, Michael E.; Torres, J. Andres; Touba, NurAs the semiconductor technology roadmap further extends, the development of next generation silicon systems becomes critically challenged. On the one hand, design and manufacturing closures become much more difficult due to the widening gap between the increasing integration density and the limited manufacturing capability. As a result, manufacturability issues become more and more critically challenged in the design of reliable silicon systems. On the other hand, the continuous scaling of feature size imposes critical issues on traditional interconnect materials (Cu/Low-K dielectrics) due to power, delay and bandwidth concerns. As a result, multiple classes of new materials are under research and development for future generation technologies. In this dissertation, we investigate several critical Computer-Aided Design (CAD) challenges under advanced nanolithography and nanophotonics technologies. In addressing these challenges, we propose systematic CAD methodologies and optimization techniques to assist the design of high-yield and high-performance integrated circuits (IC) with low power consumption. In Very Large Scale Integration (VLSI) CAD for nanolithography, we study the manufacturing variability under resolution enhancement techniques (RETs) and explore two important topics: (1) fast and high fidelity lithography hotspot detection; (2) generic and efficient manufacturability aware physical design. For the first topic, we propose a number of CAD optimization and integration techniques to achieve the following goals in detecting lithography hotspots: (a) high hotspot detection accuracy; (b) low false-positive rate (hotspot false-alarms); (c) good capability to trade-off between detection accuracy and false-alarms; (d) fast CPU run-time; and (e) excellent layout coverage and computation scalability as design gets more complex. For the second topic, we explore the routing stage by incorporating post-RET manufacturability models into the mathematical formulation of a detailed router to achieve: (a) significantly reduced lithography-unfriendly patterns; (b) small CPU run-time overhead; and (c) formulation generality and compatibility to all types of RETs and evoling manufacturing conditions. In VLSI CAD for nanophotonics, we focus on three topics: (1) characterization and evaluation of standard on-chip nanophotonics devices; (2) low power planar routing for on-chip opto-electrically interconnected systems; (3) power-efficient and thermal-reliable design of nanophotonics Wavelength Division Multiplexing for ultra-high bandwidth on-chip communication. With simulations and experiments, we demonstrate the critical role and effectiveness of Computer-Aided Design techniques as the semiconductor industry marches forward in the deeper sub-micron (45nm and below) domain.Item Case studies on lithography-friendly vlsi circuit layout(2009-05-15) Shah, Pratik JitendraMoore?s Law has driven a continuous demand for decreasing feature sizes used in Very Large Scale Integrated (VLSI) technology which has outpaced the solutions offered by lithography hardware. Currently, a light wavelength of 193nm is being used to print sub-65nm features. This introduces process variations which cause mismatches between desired and actual wafer feature sizes. However, the layout which affects the printability of a circuit can be modified in a manner which can make it more lithography-friendly. In this work, we intend to implement these modifications as a series of perturbations on the initial layout generated by the CAD tool for the circuit. To implement these changes we first calculate the feature variations offline on the boundaries of all possible standard cell pairs used in the circuit layout and record them in a Look-Up Table (LUT). After the CAD tool generates the initial placement of the circuit, we use the LUT to estimate the variations on the boundaries of all the standard cells. Depending on the features which may have the highest feature variations we assign a cost to the layout and our aim is now to reduce the cost of the layout after implementing perturbations which could be a simple cell flip or swap with a neighboring cell. The algorithm used to generate a circuit placement with a low cost is Simulated Annealing which allows a high probability for a solution with a higher cost to be selected during the initial iterations and as time goes on it tends closer to the greedy algorithm. The idea here is to avoid a locally optimum solution. It is also essential to minimize the impact of the iterations performed on the initial solution in terms of wirelength, vias and routing congestion. We validate our procedure on ISCAS85 benchmark circuits by simulating dose and defocus variations using the Mentor tool Calibre LFD. We obtain a reduction of greater 20% in the number of instances with the highest cell boundary feature variations. The wirelength and the number of vias showed an increase of roughly 2.2-8.8% and 1.2- 7.8% respectively for different circuits. The routing congestion by and large remains unaffected.Item Data integrity for on-chip interconnects(Texas A&M University, 2007-09-17) Singhal, RohitWith shrinking feature size and growing integration density in the Deep Sub- Micron (DSM) technologies, the global buses are fast becoming the "weakest-links" in VLSI design. They have large delays and are error-prone. Especially, in system-onchip (SoC) designs, where parallel interconnects run over large distances, they pose difficult research and design problems. This work presents an approach for evaluating the data carrying capacity of such wires. The method treats the delay and reliability in interconnects from an information theoretic perspective. The results point to an optimal frequency of operation for a given bus dimension for maximum data transfer rate. Moreover, this optimal frequency is higher than that achieved by present day designs which accommodate the worst case delays. This work also proposes several novel ways to approach this optimal data transfer rate in practical designs.From the analysis of signal propagation delay in long wires, it is seen that the signal delay distribution has a long tail, meaning that most signals arrive at the output much faster than the worst case delay. Using communication theory, these "good" signals arriving early can be used to predict/correct the "few" signals that arrive late. In addition to this correction based on prediction, the approaches use coding techniques to eliminate high delay cases to generate a higher transmission rate. The work also extends communication theoretic approaches to other areas of VLSI design. Parity groups are generated based on low output delay correlation to add redundancy in combinatorial circuits. This redundancy is used to increase the frequency of operation and/or reduce the energy consumption while improving the overall reliability of the circuit.Item Design for manufacturing (DFM) in submicron VLSI design(2009-05-15) Cao, KeAs VLSI technology scales to 65nm and below, traditional communication between design and manufacturing becomes more and more inadequate. Gone are the days when designers simply pass the design GDSII ?le to the foundry and expect very good man?ufacturing and parametric yield. This is largely due to the enormous challenges in the manufacturing stage as the feature size continues to shrink. Thus, the idea of DFM (Design for Manufacturing) is getting very popular. Even though there is no universally accepted de?nition of DFM, in my opinion, one of the major parts of DFM is to bring manufacturing information into the design stage in a way that is understood by designers. Consequently, designers can act on the information to improve both manufacturing and parametric yield. In this dissertation, I will present several attempts to reduce the gap between design and manufacturing communities: Alt-PSM aware standard cell designs, printability improve?ment for detailed routing and the ASIC design ?ow with litho aware static timing analysis. Experiment results show that we can greatly improve the manufacturability of the designs and we can reduce design pessimism signi?cantly for easier design closure.Item Efficient Design and Clocking for a Network-on-Chip(2013-05-02) Mandal, AyanAs VLSI fabrication technology scales, an increasing number of processing elements (cores) on a chip makes on-chip communication a new performance bottleneck. The Network-on-Chip (NoC) paradigm has emerged as an efficient and scalable infrastructure to handle the communication needs for such multi-core systems. In most existing NoCs, design decisions are made assuming that the NoC operates at the same or lower clock speed as the cores, which slows down the communication system. A major challenge in designing a high speed NoC is the difficulty of distributing a high speed, low power clock across the chip. In this dissertation, we first propose several techniques to address the issue of distributing a high-speed, low power, low jitter clock across the IC. We primarily focus our attention on resonant standing wave oscillators (SWOs), which have recently emerged as a promising technique for high-speed, low power clock generation. In addition, we also present a dynamic programming based approach to synthesize a low jitter, low power buffered H-tree for clock distribution. In the second part of this dissertation, we use these efficient clock distribution schemes to present a novel fast NoC design that relies on source synchronous data transfer over a ring. In our source-synchronous design, the clock and data NoC are routed in parallel yielding a fast, robust design. Architectural simulations on synthetic and real traffic show that our source-synchronous NoC designs can provide significantly lower latency while achieving the same or better bandwidth compared to a state of the art mesh, while consuming lower area. The fact that the our ring-based NoC runs significantly faster than the mesh contributes to these improvements. Moreover, since our proposed NoC designs are fully synchronous, they are very amenable to testing as well. In the last part of this dissertation, we explore an alternate scheme of achieving high-speed on-chip data transfer using sinusoidal signals of different frequencies. The key advantage of our method is the ability to superimpose such sinusoids and thereby effectively send multiple logic values along the same wire in a clock cycle. Experimental results show that for the same throughput as that of a traditional scheme, we require significantly fewer wires.Item Electromigration modeling and layout optimization for advanced VLSI(2014-05) Pak, Jiwoo; Pan, David Z.; Lim, Sung K; Touba, Nur A; Orshansky, Michael; Sun, NanElectromigration (EM) is a critical problem for interconnect reliability in advanced VLSI design. Because EM is a strong function of current density, a smaller cross-sectional area of interconnects can degrade the EM-related lifetime of IC, which is expected to become more severe in future technology nodes. Moreover, as EM is governed by various factors such as temperature, material property, geometrical shape, and mechanical stress, different interconnect structures can have distinct EM issues and solutions to mitigate them. For example, one of the most prominent technologies, die stacking technology of three-dimensional (3D) ICs, can have different EM problems from that of planer ICs, due to their unique interconnects such as through-silicon vias (TSVs). This dissertation investigates EM in various interconnect structures, and applies the EM models to optimize IC layout. First, modeling of EM is developed for chip-level interconnects, such as wires, local vias, TSVs, and multi-scale vias (MSVs). Based on the models, fast and accurate EM-prediction methods are proposed for the chip-level designs. After that, by utilizing the EM-prediction methods, the layout optimization methods are suggested, such as EM-aware routing for 3D ICs and EM-aware redundant via insertion for the future technology nodes in VLSI. Experimental results show that the proposed EM modeling approaches enable fast and accurate EM evaluation for chip design, and the EM-aware layout optimization methods improve EM-robustness of advanced VLSI designs.Item High throughput low power decoder architectures for low density parity check codes(Texas A&M University, 2005-11-01) Selvarathinam, Anand ManivannanA high throughput scalable decoder architecture, a tiling approach to reduce the complexity of the scalable architecture, and two low power decoding schemes have been proposed in this research. The proposed scalable design is generated from a serial architecture by scaling the combinational logic; memory partitioning and constructing a novel H matrix to make parallelization possible. The scalable architecture achieves a high throughput for higher values of the parallelization factor M. The switch logic used to route the bit nodes to the appropriate checks is an important constituent of the scalable architecture and its complexity is high with higher M. The proposed tiling approach is applied to the scalable architecture to simplify the switch logic and reduce gate complexity. The tiling approach generates patterns that are used to construct the H matrix by repeating a fixed number of those generated patterns. The advantages of the proposed approach are two-fold. First, the information stored about the H matrix is reduced by onethird. Second, the switch logic of the scalable architecture is simplified. The H matrix information is also embedded in the switch and no external memory is needed to store the H matrix. Scalable architecture and tiling approach are proposed at the architectural level of the LDPC decoder. We propose two low power decoding schemes that take advantage of the distribution of errors in the received packets. Both schemes use a hard iteration after a fixed number of soft iterations. The dynamic scheme performs X soft iterations, then a parity checker cHT that computes the number of parity checks in error. Based on cHT value, the decoder decides on performing either soft iterations or a hard iteration. The advantage of the hard iteration is so significant that the second low power scheme performs a fixed number of iterations followed by a hard iteration. To compensate the bit error rate performance, the number of soft iterations in this case is higher than that of those performed before cHT in the first scheme.Item IMPACT OF DYNAMIC VOLTAGE SCALING (DVS) ON CIRCUIT OPTIMIZATION(2010-01-16) Esquit Hernandez, Carlos A.Circuit designers perform optimization procedures targeting speed and power during the design of a circuit. Gate sizing can be applied to optimize for speed, while Dual-VT and Dynamic Voltage Scaling (DVS) can be applied to optimize for leakage and dynamic power, respectively. Both gate sizing and Dual-VT are design-time techniques, which are applied to the circuit at a fixed voltage. On the other hand, DVS is a run-time technique and implies that the circuit will be operating at a different voltage than that used during the optimization phase at design-time. After some analysis, the risk of non-critical paths becoming critical paths at run-time is detected under these circumstances. The following questions arise: 1) should we take DVS into account during the optimization phase? 2) Does DVS impose any restrictions while performing design-time circuit optimizations?. This thesis is a case study of applying DVS to a circuit that has been optimized for speed and power, and aims at answering the previous two questions. We used a 45-nm CMOS design kit and flow. Synthesis, placement and routing, and timing analysis were applied to the benchmark circuit ISCAS?85 c432. Logical Effort and Dual-VT algorithms were implemented and applied to the circuit to optimize for speed and leakage power, respectively. Optimizations were run for the circuit operating at different voltages. Finally, the impact of DVS on circuit optimization was studied based on HSPICE simulations sweeping the supply voltage for each optimization. The results showed that DVS had no impact on gate sizing optimizations, but it did on Dual-VT optimizations. It is shown that we should not optimize at an arbitrary voltage. Moreover, simulations showed that Dual-VT optimizations should be performed at the lowest voltage that DVS is intended to operate, otherwise non-critical paths will become critical paths at run-time.Item Layout optimization in ultra deep submicron VLSI design(Texas A&M University, 2006-08-16) Wu, DiAs fabrication technology keeps advancing, many deep submicron (DSM) effects have become increasingly evident and can no longer be ignored in Very Large Scale Integration (VLSI) design. In this dissertation, we study several deep submicron problems (eg. coupling capacitance, antenna effect and delay variation) and propose optimization techniques to mitigate these DSM effects in the place-and-route stage of VLSI physical design. The place-and-route stage of physical design can be further divided into several steps: (1) Placement, (2) Global routing, (3) Layer assignment, (4) Track assignment, and (5) Detailed routing. Among them, layer/track assignment assigns major trunks of wire segments to specific layers/tracks in order to guide the underlying detailed router. In this dissertation, we have proposed techniques to handle coupling capacitance at the layer/track assignment stage, antenna effect at the layer assignment, and delay variation at the ECO (Engineering Change Order) placement stage, respectively. More specifically, at layer assignment, we have proposed an improved probabilistic model to quickly estimate the amount of coupling capacitance for timing optimization. Antenna effects are also handled at layer assignment through a linear-time tree partitioning algorithm. At the track assignment stage, timing is further optimized using a graph based technique. In addition, we have proposed a novel gate splitting methodology to reduce delay variation in the ECO placement considering spatial correlations. Experimental results on benchmark circuits showed the effectiveness of our approaches.Item Low power low-density parity-checking (ldpc) codes decoder design using dynamic voltage and frequency scaling(2009-05-15) Wang, WeihuangThis thesis presents a low-power LDPC decoder design based on speculative scheduling of energy necessary to decode dynamically varying data frame in both block-fading channels and general AWGN channels. A model of a memory-efficient low-power high-throughput multi-rate array LDPC decoder as well as its FPGA implementa- tion results is first presented. Then, I propose a decoding scheme that provides the feature of constant-time decoding and thus facilitates real-time applications where guaranteed data rate is required. It pre-analyzes each received data frame to estimate the maximum number of necessary iterations for frame convergence. The results are then used to dynamically adjust decoder frequency and switch between multiple-voltage levels; thereby energy use is minimized. This is in contrast to the conventional fixed-iteration decoding schemes that operate at a fixed voltage level regardless of the quality of data received. Analysis shows that the proposed decoding scheme is widely applicable for both two-phase message-passing (TPMP) decoding algorithm and turbo decoding message passing (TDMP) decoding algorithm in block fading channels, and it is independent of the specific LDPC decoder architecture. A decoder architecture utilizing our recently published multi-rate decoding architecture for general AWGN channels is also presented. The result of this thesis is a decoder design scheme that provides a judicious trade-off between power consumption and coding gain.Item Mechanical stress and circuit aging aware VLSI CAD(2010-12) Chakraborty, Ashutosh; Pan, David Z.; Abraham, Jacob; Orshansky, Michael; Saxena, Prashant; Touba, NurWith the gradual advance of the state-of-the-art VLSI manufacturing technology into the sub-45nm regime, engineering a reliable, high performance VLSI chip with economically attractive yield in accordance with Moore's law of scaling and integration has become extremely difficult. Some of the most serious challenges that make this task difficult are: a) the delay of a transistor is strongly dependent on process induced mechanical stress around it, b) the reliability of devices is affected by several aging mechanisms like Negative Bias Temperature Instability (NBTI), hot carrier injection (HCI), etc and c) the delay and reliability of any device are strongly related to lithographically drawn geometry of various features on wafer. These three challenges are the main focus of this dissertation. High performance fabrication processes routinely use embedded silicon-germanium (eSiGe) technology that imparts compressive mechanical stress to PMOS devices. In this work, cell level timing models considering flexibility to modulate active area to change mechanical stress, were proposed and exploited to perform timing optimization during circuit placement phase. Analysis of key physical synthesis optimization steps such as gate sizing and repeater insertion was done to understand and exploit mechanical stress to significantly improve delay of interconnect and device dominated circuits. Regarding circuit reliability, the proposed work is focused on reducing the clock skew degradation due to NBTI effect specially due to the use of clock gating technique for achieving low power operation. In addition, we also target the detrimental impact of burn-in testing on NBTI. The problem is identified and a runtime technique to reduce clock skew increase was proposed. For designs with predictable clock gating activities, a zero overhead design time technique was proposed to reduce clock skew increase over time. The concept of using minimum degradation input vector during static burn-in testing is proposed to reduce the impact of burn-in testing on parametric yield. Delay and reliability strongly depend on dimension of various features on the wafer such as gate oxide thickness, channel length and contact position. Increased variability of these dimensions can severely restrict ability to analyze or optimize a design considering mechanical stress and circuit reliability. One key technique to control physical variability is to move towards regular fabrics. However, to make implementation on regular fabrics attractive, high quality physical design tools need to be developed. This dissertation proposes a new circuit placement algorithm to place a design on a structured ASIC platform with strict site and clock constraints and excellent overall wirelength. An algorithm for reducing the clock and leakage power dissipation of a structured ASIC by reducing spine usage is then proposed to allow lower power dissipation of designs implemented using structured ASICs.Item Minimizing and exploiting leakage in VLSI(2009-05-15) Jayakumar, NikhilPower consumption of VLSI (Very Large Scale Integrated) circuits has been growing at an alarmingly rapid rate. This increase in power consumption, coupled with the increasing demand for portable/hand-held electronics, has made power consumption a dominant concern in the design of VLSI circuits today. Traditionally dynamic (switching) power has dominated the total power consumption of VLSI circuits. However, due to process scaling trends, leakage power has now become a major component of the total power consumption in VLSI circuits. This dissertation explores techniques to reduce leakage, as well as techniques to exploit leakage currents through the use of sub-threshold circuits. This dissertation consists of two studies. In the first study, techniques to reduce leakage are presented. These include a low leakage ASIC design methodology that uses high VT sleep transistors selectively, a methodology that combines input vector control and circuit modification, and a scheme to find the optimum reverse body bias voltage to minimize leakage. As the minimum feature size of VLSI fabrication processes continues to shrink with each successive process generation (along with the value of supply voltage and therefore the threshold voltage of the devices), leakage currents increase exponentially. Leakage currents are hence seen as a necessary evil in traditional VLSI design methodologies. We present an approach to turn this problem into an opportunity. In the second study in this dissertation, we attempt to exploit leakage currents to perform computation. We use sub-threshold digital circuits and come up with ways to get around some of the pitfalls associated with sub-threshold circuit design. These include a technique that uses body biasing adaptively to compensate for Process, Voltage and Temperature (PVT) variations, a design approach that uses asynchronous micro-pipelined Network of Programmable Logic Arrays (NPLAs) to help improve the throughput of sub-threshold designs, and a method to find the optimum supply voltage that minimizes energy consumption in a circuit.Item Modeling, Design and Optimization of IC Power Delivery with On-Chip Regulation(2014-12-11) Lai, SumingAs IC technology continues to follow the Moore?s Law, IC designers have been constantly challenged with power delivery issues. While useful power must be reliably delivered to the on-die functional circuits to fulfill the desired functionality and performance, additional power overheads arise due to the loss associated with voltage conversion and parasitic resistance in the metal wires. Hence, one of the key IC power delivery design challenges is to develop voltage conversion/regulation circuits and the corresponding design strategies to provide a guaranteed level of power integrity while achieving high power efficiency and low area overhead. On-chip voltage regulation, a significant ongoing design trend, offers appealing active supply noise suppression close to the loads and is well positioned to address many power delivery challenges. However, to realize the full potential of on-chip voltage regulation requires systemic optimization of and tradeoffs among settling time, steady-state error, power supply noise, power efficiency, stability and area overhead, which are the key focuses of this dissertation. First, we develop new low-dropout voltage regulators (LDOs) that are well optimized for low power applications. To this end, dropout voltage, bias current and speed are important competing design objectives. This dissertation presents new flipped voltage follower (FVF) based topologies of on-chip voltage regulators that handle ultra-fast load transients in nanoseconds while achieving significant improvement on bias current consumption. An active frequency compensation is embedded to achieve high area efficiency by employing a smaller amount of compensation capacitors, the major silicon area contributor. Furthermore, in one of the proposed topologies an auxiliary digital feedback loop is employed in order to lower quiescent power consumption further. Second, coping with supply noise is becoming increasingly more difficult as design complexity grows, which leads to increased spatial and temporal load heterogeneity, and hence larger voltage variations in a given power domain. Addressing this challenge through a distributed methodology wherein multiple voltage regulators are placed across the same voltage domain is particularly promising. This distributive nature allows for even faster suppression of multiple hot spots by the nearby regulators within the power domain and can significantly boost power integrity. Nevertheless, reasoning about the stability of such distributively regulated power networks becomes rather complicated as a result of complex interactions between multiple active regulators and the large passive subnetwork. Coping with this stability challenge requires new theory and stability-ensuring design practice, as targeted by this dissertation. For the first time, we adopt and develop a hybrid stability framework for large power delivery networks with distributed voltage regulation. This framework is local in the sense that both the checking and assurance of network stability can be dealt with on the basis of each individual voltage regulator, leading to feasible design of large power delivery networks that would be computationally impossible otherwise. Accordingly, we propose a new hybrid stability margin concept, examine its tradeoffs with power efficiency, supply noise and silicon area, and demonstrate the resulted key design implications pertaining to new stability-ensuring LDO circuit design techniques and circuit topologies. Finally, we develop an automated hybrid stability design flow that is computationally efficient and provides a practical guarantee of network stability.Item Performance and power optimization in VLSI physical design(Texas A&M University, 2008-10-10) Jiang, ZhanyuanAs VLSI technology enters the nanoscale regime, a great amount of efforts have been made to reduce interconnect delay. Among them, buffer insertion stands out as an effective technique for timing optimization. A dramatic rise in on-chip buffer density has been witnessed. For example, in two recent IBM ASIC designs, 25% gates are buffers. In this thesis, three buffer insertion algorithms are presented for the procedure of performance and power optimization. The second chapter focuses on improving circuit performance under inductance effect. The new algorithm works under the dynamic programming framework and runs in provably linear time for multiple buffer types due to two novel techniques: restrictive cost bucketing and efficient delay update. The experimental results demonstrate that our linear time algorithm consistently outperforms all known RLC buffering algorithms in terms of both solution quality and runtime. That is, the new algorithm uses fewer buffers, runs in shorter time and the buffered tree has better timing. The third chapter presents a method to guarantee a high fidelity signal transmission in global bus. It proposes a new redundant via insertion technique to reduce via variation and signal distortion in twisted differential line. In addition, a new buffer insertion technique is proposed to synchronize the transmitted signals, thus further improving the effectiveness of the twisted differential line. Experimental results demonstrate a 6GHz signal can be transmitted with high fidelity using the new approaches. In contrast, only a 100MHz signal can be reliably transmitted using a single-end bus with power/ground shielding. Compared to conventional twisted differential line structure, our new techniques can reduce the magnitude of noise by 45% as witnessed in our simulation. The fourth chapter proposes a buffer insertion and gate sizing algorithm for million plus gates. The algorithm takes a combinational circuit as input instead of individual nets and greatly reduces the buffer and gate cost of the entire circuit. The algorithm has two main features: 1) A circuit partition technique based on the criticality of the primary inputs, which provides the scalability for the algorithm, and 2) A linear programming formulation of non-linear delay versus cost tradeoff, which formulates the simultaneous buffer insertion and gate sizing into linear programming problem. Experimental results on ISCAS85 circuits show that even without the circuit partition technique, the new algorithm achieves 17X speedup compared with path based algorithm. In the meantime, the new algorithm saves 16.0% buffer cost, 4.9% gate cost, 5.8% total cost and results in less circuit delay.Item Performance and power optimization in VLSI physical design(2009-05-15) Jiang, ZhanyuanAs VLSI technology enters the nanoscale regime, a great amount of efforts have been made to reduce interconnect delay. Among them, buffer insertion stands out as an effective technique for timing optimization. A dramatic rise in on-chip buffer density has been witnessed. For example, in two recent IBM ASIC designs, 25% gates are buffers. In this thesis, three buffer insertion algorithms are presented for the procedure of performance and power optimization. The second chapter focuses on improving circuit performance under inductance effect. The new algorithm works under the dynamic programming framework and runs in provably linear time for multiple buffer types due to two novel techniques: restrictive cost bucketing and efficient delay update. The experimental results demonstrate that our linear time algorithm consistently outperforms all known RLC buffering algorithms in terms of both solution quality and runtime. That is, the new algorithm uses fewer buffers, runs in shorter time and the buffered tree has better timing. The third chapter presents a method to guarantee a high fidelity signal transmission in global bus. It proposes a new redundant via insertion technique to reduce via variation and signal distortion in twisted differential line. In addition, a new buffer insertion technique is proposed to synchronize the transmitted signals, thus further improving the effectiveness of the twisted differential line. Experimental results demonstrate a 6GHz signal can be transmitted with high fidelity using the new approaches. In contrast, only a 100MHz signal can be reliably transmitted using a single-end bus with power/ground shielding. Compared to conventional twisted differential line structure, our new techniques can reduce the magnitude of noise by 45% as witnessed in our simulation. The fourth chapter proposes a buffer insertion and gate sizing algorithm for million plus gates. The algorithm takes a combinational circuit as input instead of individual nets and greatly reduces the buffer and gate cost of the entire circuit. The algorithm has two main features: 1) A circuit partition technique based on the criticality of the primary inputs, which provides the scalability for the algorithm, and 2) A linear programming formulation of non-linear delay versus cost tradeoff, which formulates the simultaneous buffer insertion and gate sizing into linear programming problem. Experimental results on ISCAS85 circuits show that even without the circuit partition technique, the new algorithm achieves 17X speedup compared with path based algorithm. In the meantime, the new algorithm saves 16.0% buffer cost, 4.9% gate cost, 5.8% total cost and results in less circuit delay.Item Track Assignment Considering Crosstalk-Induced Performance Degradation(2012-07-16) Zhao, QiongTrack assignment is a critical step between global routing and detailed routing in modern VLSI chip designs. It greatly affects some very important design characteristics, such as routability, via usage and timing performance. Crosstalk, which is largely decided by wire adjacency, has significant impact on interconnect delay and circuit performance. Therefore, the amount of crosstalk should be restrained in order to satisfy timing constraints. In this work, a track assignment approach is proposed to control crosstalk-induced performance degradation. The problem is formulated as a Traveling Salesman Problem (TSP) and solved by a graph-based heuristic. The proposed approach is implemented and tested on benchmark circuits from the ISPD2011 contest and the experimental results are quite promising.