Adaptable and enhanced error correction codes for efficient error and defect tolerance in memories

Date

2011-12

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Ongoing technology improvements and feature size reduction have led to an increase in manufacturing-induced parameter variations. These variations affect various memory cell circuits, making them unreliable at low voltages. Memories are very dense structures that are especially susceptible to defects, and more so at lower voltages. Transient errors due to radiation, power supply noise, etc., can also cause bit-flips in a memory. To protect the data integrity of the memory, an error correcting code (ECC) is generally employed. Present ECC, however, is either single error correcting or corrects multiple errors at the cost of high redundancy or longer correction time. This research addresses the problem of memory reliability under adverse conditions. The goal is to achieve a desired reliability at reduced redundancy while also keeping in check the correction time. Several methods are proposed here including one that makes use of leftover spare columns/rows in memory arrays [Datta 09] and another one that uses memory characterization tests to customize ECC on a chip by chip basis [Datta 10]. The former demonstrates how reusing spare columns leftover from the memory repair process can help increase code reliability while keeping hardware overhead to a minimum. In the latter case customizing ECCs on a chip by chip basis shows considerable reduction in check bit overhead, at the same time providing a desired level of protection for low voltage operations. The customization is done with help from a defect map generated at manufacturing time, which helps identify potentially vulnerable cells at low voltage. An ECC based solution for tackling the wear out problem of phase change memories (PCM) has also been presented here. To handle the problem of gradual wear out and hence increasing defect rates in PCM systems an adaptive error correction scheme is proposed [Datta 11a]. The adaptive scheme, implemented alongside the operating system seeks to increase PCM lifetime by manifold times. Finally the work on memory ECC is extended by proposing a fast burst error correcting code with minimal overhead for handling scenarios where multi-bit failures are common [Datta 11b]. The twofold goal of this work – design a low-cost code capable of handling multi bit errors affecting adjacent cells, and fast multi bit error correction – is achieved by modifying conventional Orthogonal Latin Square codes into burst error codes.

Description

text

Citation