Browsing by Subject "Computer algorithms"

Now showing 1 - 20 of 50

Adaptive CORDIC: using parallel angle recording to accelerate CORDIC rotations
(2007-12) Rodrigues, Terence Keith, 1974-; Swartzlander, Earl E.
This dissertation presents the Parallel Angle Recoding algorithm which is used to accelerate the calculation of CORDIC rotations. The CORDIC algorithm is used in the evaluation of a wide variety of elementary functions such as Sin, Cos, Tan, Log, Exp, etc. It is a simple and versatile algorithm, but its characteristic linear convergence causes it to suffer from long latency. It can be sped up by using the angle recoding algorithm which skips over certain intermediate CORDIC iterations to deliver the same precision while requiring 50% or fewer iterations. However because the selection of the angle constants is quite complex and must be performed off-line, its use has been limited to applications where the rotation angle is static and known a priori. This dissertation extends the low-latency advantage of the angle recoding method to dynamic situations too, where the incoming angle of rotation is allowed to take on any arbitrary value. The proposed method is called Parallel Angle Recoding and it makes use of a much simpler angle selection scheme to identify the angle constants needed by angle recoding. Because of its simplicity, it can be easily implemented in hardware without having to increase the cycle time. All the angle constants for angle recoding can be found in parallel in a single preliminary step by testing just the initial incoming rotation angle using range comparators -- there is no need to perform successive CORDIC iterations in order to identify them. With increasing precision, (N= 8, 16, 24, 32, etc.) the number of comparators which are needed by this scheme increases rapidly. The parallel angle recoding method can be re-formulated to apply to smaller groups of consecutive angle constants known as 'sections.' This limits the number of comparators that are needed, to a reasonable amount. There is an attendant savings in area and power consumption, but at the same time the evaluation of multiple sections introduces additional overhead cycles which reduces some of the gains made in latency by the Parallel Angle Recoding method. By interleaving multiple rotations and making use of a small buffer to store intermediate results, the number of overhead cycles can be reduced drastically. The Parallel Angle Recoding technique is modelled using Verilog, synthesised and mapped to a 65 nm. cell library. The latency and area characteristics that are obtained show that the method can improve the performance of the rotation mode in CORDIC, by delivering a reduced iteration count with no increase in the cycle time, and only a modest increase in power and area.
Algorithms and data structures for cache-efficient computation: theory and experimental evaluation
(2007-08) Chowdhury, Rezaul Alam; Ramachandran, Vijaya
Algorithms for distributed caching and aggregation
(2007-12) Tiwari, Mitul; Plaxton, Greg
In recent years, there has been an explosion in the amount of distributed data due to the ever decreasing cost of both storage and bandwidth. There is a growing need for automatic distributed data management techniques. The three main areas in dealing with distributed data that we address in this dissertation are (1) cooperative caching, (2) compression caching, and (3) aggregation. First, we address cooperative caching, in which caches cooperate to locate and cache data objects. The benefits of cooperative caching have been demonstrated by various studies. We address a hierarchical generalization of cooperative caching in which caches are arranged as leaf nodes in a hierarchical tree network, and we call this variant Hierarchical Cooperative Caching. We present a deterministic hierarchical generalization of LRU that is constantcompetitive when the capacity blowup is linear in d, the depth of the cache hierarchy. Furthermore, we show that any randomized hierarchical cooperative caching algorithm with capacity blowup b has competitive ratio Ω(log d b ) against an oblivious adversary. Thus we establish that there is no resource competitive algorithm for the hierarchical cooperative caching problem. Second, we address a class of compression caching problems in which a file can be cached in multiple formats with varying sizes and encode/decode costs. In this work, we address three problems in this class of compression caching. The first problem assumes that the encode cost and decode cost associated with any format of a file are equal. For this problem we present a resource competitive online algorithm. To explore the existence of resource competitive online algorithms for compression caching with arbitrary encode costs and decode costs, we address two other natural problems in the aforementioned class, and for each of these problems, we show that there exists a non-constant lower bound on the competitive ratio of any online algorithm, even if the algorithm is given an arbitrary factor capacity blowup. Thus, we establish that there is no resource competitive algorithm for compression caching in its full generality. Third, we address the problem of aggregation over trees with the goal of adapting aggregation aggressiveness. Consider a distributed network with nodes arranged in a tree, and each node having a local value. We consider the problem of aggregating values (e.g., summing values) from all nodes to the requesting nodes in the presence of writes. The goal is to minimize the total number of messages exchanged. The key challenges are to define a noix tion of “acceptable” aggregate values, and to design algorithms with good performance that are guaranteed to produce such values. We formalize the acceptability of aggregate values in terms of certain consistency guarantees. We propose a lease-based aggregation mechanism, and evaluate algorithms based on this mechanism in terms of consistency and performance. With regard to consistency, we adapt the definitions of strict and causal consistency to apply to the aggregation problem. We show that any lease-based aggregation algorithm provides strict consistency in sequential executions, and causal consistency in concurrent executions. With regard to performance, we propose an online lease-based aggregation algorithm, and show that, for sequential executions, the algorithm is constant competitive against any offline algorithm that provides strict consistency. Our online lease-based aggregation algorithm is presented in the form of a fully distributed protocol, and the aforementioned consistency and performance results are formally established with respect to this protocol. We also present experimental results to show that the algorithm performs well under various workloads.
An adaptive vector quantization technique with a fuzzy distortion measure for efficient image coding
(Texas Tech University, 1996-08) Pemmaraju, Suryalakshmi V.
Digital image compression techniques are currently experiencing significant growth due to diverse applications demanding efficient storage and transmission of increasing image data contents. These compression techniques involve representation of an image with reduced number of bits pdr pixel by exploiting the redundancy in data present within an image. In lossy^ compression, information theory predicts that the performance of vector quantization (VQ) is superior to that obtained using scalar quantization (SQ) in optimizing the rate distortion fimction. In practice, however, the existing VQ algorithms suffer from a number of serious problems, e.g., long search process, codebook initialization and getting trapped in local minima. This research develops an adaptive vector quantization technique for generating an optimal codebook by employing a neurofuzzy clustering approach to ensure minimum distortion. In addition, a multiresolution decomposition of an image is used as a preprocessing stage for transforming the image into a form that is more suitable for quantization, coding and progressive transmission. The multiresolution wavelet decomposition of an image is performed using Daubechies coefficients prior to vector quantization and a multiresolution codebook scheme is used for quantizing the sub-images at different resolutions. This integrated approach of adaptive vector quantization with wavelet based pyramid image decomposition aids in significant facilitation of the compression and coding processes thereby allowing higher compression ratios with acceptable visual quality. Experimental results of this new approach show significant improvement in performance as compared to a variable block size vector quantization (VBQ). The superior performance of this integrated algorithm has been validated by applying it to several classes of images and comparing the performance in terms of mean-squared error (MSB), peak-signal-to-noise-ratio (PSNR), bit rates and visual fidelity.
An algorithm for hyperbolic geometry
(Texas Tech University, 1998-12) Tinney, Pheobe Alexis Samuels
Not available
An efficient implementation of adaptive Gauss Quadrature rule for parallel environment
(Texas Tech University, 2004-12) Jaiswal, Vivek
Not available
Architectures and algorithms for high performance switching
(2004) Prakash, Amit; Aziz, Adnan
Switches are ubiquitous in modern computing, appearing in wide-area networks, multiprocessor servers, and data storage systems. With the the advent of high-speed link technology, switches have become the bottleneck in moving data in the network. Existing switch architectures either require the interconnection network and packet buffers to work at a very high speed or require complex scheduling problems to be solved quickly. In this dissertation we investigate whether there are switch architectures that can support high-speed links that are simultaneously easy to schedule, and can be built out of inexpensive components. The approach we take is using parallelism to solve complex scheduling problems. We choose switching architectures such that the corresponding scheduling problem can be efficiently solved with a reasonable amount of hardware. In particular, we present two switch architectures for which we have developed efficient scheduling algorithms. The first switch achieves optimum throughput and optimum average latency while the second switch guarantees optimum throughput only but uses considerably less hardware.
Automated object recognition through reinforcement learning
(Texas Tech University, 2002) Ge, Zhanyu; Mitra, Sunanda; Krile, Thomas; Bredeson, Jon G.
Object recognition, a branch of pattern recognition, is to identify and localize one or more objects in a given scene. We have to determine what is present and where it is within the input image. Although great achievements have been made during the last decades, currently existing object recognition techniques have shortcomings like unreliability and inefficiency, general inadaptability, manual template marking heavily influenced by human factors, inability to recognize an object without a model, and so on. Any recognition problem can be formulated as a searching process and has to be guided in a controlled manner. All search problems involve optimization, so object recognition requires optimization and control techniques. Reinforcement learning is learning how to behave given a situation and possible actions to maximize the total expected reward in the long run, and therefore needs to be optimized. Most pattern recognition techniques do not combine reinforcement learning for feature understanding. In this dissertation, reinforcement learning is applied both to automatic template generation from a model image and to template matching within the input image. The newly designed affine parameter estimation algorithm provides reliable results based on information contained at all feature point locations. The points are extracted in the scale-space using isophote curvature extreme points, which are invariant to affine transformations. The affine parameter estimation algorithm is applicable to any kind of translations, rotations, and scales, and moderate occlusions and deformations of the object to be recognized. Experiment results showed that the proposed set of algorithms are fast, efficient, and potentially robust. The automatic template generation algorithm, an efficient contour tracing one in gray-level images, can also be used in object recognition without a model. This is a new research field, and a great amount of future work needs to be done before an intelligent recognition system, as efficient as the human vision system, can be developed.
Automatic segmentation of vertebrae from digitized x-ray images
(Texas Tech University, 2002-12) Zamora-Camarena, Gilberto
The segmentation of vertebrae in x-ray images is of prime importance in the assessment of abnormalities of the spine. Manual segmentation is prone to errors due to inter- and intra-subject variabilities due to the subjective judgement that is employed. The use of computer vision methods is, therefore, an attractive alternative to providing an automatic means for segmenting vertebrae. However, general-purpose algorithms present a number of shortcomings that limit their ability to locate and delineate precise vertebral shapes. Therefore, there is a need for a different approach. This work presents the development of an automatic segmentation methodology that employs a hierarchical approach to segmentation. The unique combination of the Generalized Hough Transform, Active Shape Models, and Deformable Models provides three levels of segmentation firom coarse to fine, respectively. Each algorithm has been customized to address the shortcomings of the other two, thus providing a robust framework. Generalized Hough Transform is used to estimate the pose of the spine within a target image. Then, the technique of Active Shape Models is used to find the boundaries of the vertebrae and to give a global approximation to their shape. Finally, the technique of Deformable Models is used to refine the shape of the vertebrae at key points of interest, such as anterior comers. Experimental results with a data set of 100 lateral views of cervical vertebrae and 100 lateral views of lumbar vertebrae have shown a success rate of 75% in finding boundaries of cervical vertebrae and 50% in lumbar vertebrae. The algorithm developed in this work represents a viable alternative to the currently available segmentation methods in which a unique combination of customized algorithms implements a hierarchical firamework.
Color image compression using wavelet transform
(Texas Tech University, 1997-08) Meadows, Steven Carl
C language coding of image compression algorithms can be a difficult and tedious task. Image compression methods are usually composed of many stages of cascaded algorithms. Each algorithm may be developed independently. This thesis will address the problem of interfacing new image compression algorithms with older and established algorithms such as entropy coding and the discrete wavelet transform. The thesis will describe ANSI C coding procedures and functions involved in implementing two entrop\ coding algorithms including Huffman coding and arithmetic coding. Wavelet theory will be discussed as it applies to the discrete wavelet transform. The thesis will also describe an ANSI C coding implementation of one of the newest wavelet coefficient coding techniques, embedded zerotree wavelets (EZW) developed by Jerome Shapiro. The EZW compression performance will be compared with JPEG which is the standard adapted currently for still images by the Joint Photographic Experts Group.
Comparison of lossless compression models
(Texas Tech University, 1999-08) Hovhannisyan, Anahit
With the development of multimedia and digital imaging there is a need to reduce the cost of storage and transmission of information. The cost reduction translates into reducing the amount of data that represent the information. This thesis investigates the performance of several lossless compression algorithms widely used for image coding. The first three chapters describe these algorithms in detail, as well as give examples of well-known data compression algorithms such as Huffman. Arithmetic. Dynamic Markov Coding, and Run Length Encoding. Finally, the relative performances of the listed algorithms are compared. The thesis also includes C— and ANSI C implementations of the Huffman and RLE algorithms respectively.
Coping with dynamic membership, selfishness, and incomplete information: applications of probabilistic analysis and game theory
(2008-05) Dimitrov, Nedialko B.; Plaxton, C. Greg
The emergence of large scale distributed computing networks has given increased prominence to a number of algorithmic concerns, including the need to handle dynamic membership, selfishness, and incomplete information. In this document, we outline our explorations into these algorithmic issues. We first present our results on the analysis of a graph-based coupon collecvi tor process related to load balancing for networks with dynamic membership. In addition to extending the study of the coupon collector process, our results imply load balancing properties of certain distributed hash tables. Second, we detail our results on worst case payoffs when playing buyersupplier games, against many selfish, collaborating opponents. We study optimization over the set of core vectors. We show both positive and negative results on optimizing over the cores of such games. Furthermore, we introduce and study the concept of focus point price, which answers the question: If we are constrained to play in equilibrium, how much can we lose by playing the wrong equilibrium? Finally, we present our analysis of a revenue management problem with incomplete information, the online weighted transversal matroid matching problem. In specific, we present an algorithm that delivers expected revenue within a constant of optimal in the online setting. Our results use a novel algorithm to generalize several results known for special cases of transversal matroids.
CORDIC-based high-speed direct digital frequency synthesis
(2003) Kang, Chang Yong; Swartzlander, Earl E.
The circular-mode CORDIC (coordinate rotation digital computer) algorithm is analyzed in the context of direct digital frequency synthesis (DDFS). It is shown how the CORDIC parameters should be selected to meet given DDFS parameters, and, through simulations, it is demonstrated that jamming outperforms truncation as a datapath quantization scheme, making it an attractive alternative to rounding. It is also shown that the CORDIC output can be made exact to the digits by an additional rounding process, which is especially useful for DDFS applications where the CORDIC output is truncated to the final digital-to-analog converter (DAC) width. Variations of the basic circular-mode CORDIC algorithm are investigated, and previous works for fast DDFS implementation based on those algorithms are discussed. A new DDFS architecture based on the differential CORDIC (DCORDIC) algorithm is proposed. The proposed architecture allows a bit-level pipelining in the angle path by implementing a two-dimensional systolic array. Unlike other fast DDFS architectures, it incorporates the phase accumulator in the bit-level pipelining framework so that a bottleneck-free datapath throughout the whole system is achieved. The sequential implementation and the hybrid implementation of the architecture for area-sensitive and low-latency designs, respectively, are proposed for further research.
Developing an inference engine for CR-Prolog with preferences
(Texas Tech University, 2004-12) Kolvekar, Loveleen
In recent years, A-Prolog with answer set semantics was shown to be a useful tool for knowledge representation and reasoning. A-Prolog is a declarative language based on stable models of logic programs. It allows the encoding of defaults and various other types of knowledge contained in dynamic domains. It seems however that A-Prolog lacks the ability to gracefully perform the reasoning needed for certain types of conflict resolution, e.g. finding the best explanations of unexpected observations. To solve this problem CR-Prolog - an extension of A-Prolog by consistency restoring rules with preferences was introduced. The most intuitive solutions correspond to those models that best satisfy the preferences expressed, and minimize the application of cr-rules. The goal of this work is to develop an inference engine for computing the answer sets of CR-Prolog program automatically. The inference engine handles preferences efficiently.
Distributed learning using generative models
(2006) Merugu, Srujana; Ghosh, Joydeep
Edge detection in noisy images using directional diffusion with log filters
(Texas Tech University, 1996-08) Liu, Chang
Image noise reduction and edge detection have been important image processing techniques. Traditional isotropic image smoothing reduces noise at the cost of image blurring. Anisotropic smoothing tries to maintain the image features, while reducing noise. This thesis presents an anisotropic smoothing implementation that only uses a 3-by-3 window, and therefore is easy to implement in hardware. Edge detection is also studied in this thesis. A pre-processing technique is proposed to do position dependent brightness correction. This technique makes edge thresholding easier. We also present an algorithm that implements Gaussian filtering more accurately than Gauss-Hermite integration at the cost of an insignificant increase in computing complexity.
Energy repacking for video compression using wavelet transforms
(Texas Tech University, 1997-05) Sreenath, Sreenivas Prasad
The video compression field has gone through a rapid change in the past few years. This change can be attributed to the requirement of a large amount of video data handling and storage. The enormous amount of video data has increased the need for an optimum way of storing and compression. In order to achieve a better video compression, many techniques have been developed. This research exploits the energy repacking capability of wavelet transforms for video compression. A wavelet transform of an original image provides an efficient way to represent the original data with potentially very high compression rate. To evaluate the compression achievable with the wavelet transform, the results obtained from the wavelet algorithm is compared with results obtained from the standard JPEG baseline technique. The evaluation is mainly based on the log mean square error, the signal to noise ratio and the bit per pixel requirement of the data. The results confirm that the wavelet transform yields a better compression and energy repacking than that obtained from DCT transform. Different levels of subband decomposition of wavelets are also evaluated.
High level static analysis of system descriptions for taming verification complexity
(2007-12) Vasudevan, Shobha; Abraham, Jacob A.
The growing complexity of VLSI and System-on-a-chip(SoC) designs has made their verification extremely expensive, time-consuming and resource intensive. Formal verification of system behavior is critical to the design cycle due to its ability to isolate subtle flaws and provide high quality assurance. However, its computational intractability limits applicability in practice, rendering the state-of-the-art insufficient to meet the needs of the industry. In this dissertation, a suite of techniques that are a significant departure from traditional Boolean level approaches to formal verification are presented. The algorithms are based on a top-down, domain-aware perspective of the system by reasoning at the system level and register transfer level (RTL) descriptions. Static analysis of these high level system descriptions using structural information and symbolic reasoning leads to effective decomposition strategies that create tractable portions of the system. These manageable system components can then be verified by deploying efficient Boolean level algorithms. The techniques presented here apply to the actual RTL source code, and are intended to blend seamlessly into the system design cycle. All approaches using high level static analyis follow a three pronged solution- domain aware analysis, high level symbolic simulation and a decision procedure for the verification task. This work advocates a marked difference in the perspective to formal hardware verification as compared to popular paradigms. The techniques shown here are illustrative of a hardware-aware viewpoint to verification, and argue this case for contemporary verification tasks. Antecedent conditioned slicing, an abstraction technique for reducing RTL design space is introduced. The reduced RTL can then be model checked. An open source RTL implementation of the USB 2.0 protocol is verified using this technique. A technique for pipelined processor verification using antecedent conditioned slicing is introduced. An open source Verilog RTL of OR1200, an embedded processor is explained in a detailed case study for verification using this technique. A static analysis technique is proposed to alleviate the complexity of the sequential equivalence checking problem between system level and RTL descriptions, by efficiently decomposing designs using sequential compare points. A satisfiability (SAT) solver is used as the lower level verification engine. In a case study, sequential equivalence checking between a system level description of a Viterbi decoder and two different RTL implementations are detailed.
A hybrid real-time visible surface solution for rays with a common origin and arbitrary directions
(2008-12) Johnson, Gregory Scott, 1971-; Mark, William R.
A fundamental operation in computer graphics is to determine for a given point and direction in a scene, which geometric surface is nearest this point from this direction and thus visible. Conceptually, the point and direction define a "ray". Z-buffer hardware can compute surface visibility for a set of rays with a common origin (i.e. eye point) and a regular pattern of directions in real-time. However, this hardware is much less efficient at performing other visibility computations such as those required to accurately render shadows. A more flexible solution to the visible surface problem is needed. This work introduces the irregular Z-buffer algorithm, which efficiently solves the visible surface problem for rays with a common origin and arbitrary directions. In addition, we identify several changes to classical graphics architectures needed for hardware acceleration of this algorithm. Though these modifications are incremental in nature (i.e. no new functional units are introduced), we show that they enable significant new capability. In tandem with the irregular Z-buffer algorithm, a GPU with these changes has applications in: shadow rendering, indirect illumination, frameless rendering, adaptive anti-aliasing, adaptive textures, and jittered sampling. We explore the performance of hard and soft shadow rendering in particular, by way of a detailed hardware simulator.
Image recovery and segmentation using competitive learning in a computational network
(Texas Tech University, 1992-12) Phoha, Vir Virander
In this study, the principle of competitive learning is used to develop an iterative algorithm for image recovery and segmentation, within the framework of Markov Random Fields (MRF). the image recovery problem is transformed to the problem of minimization of an energy function. A local update rule for each pixel point is then developed in a stepwise fashion and is shown to be a gradient descent rule for an associated global energy function. Relationship of the update rule to Kohonens update rule is shown. Quantitative measures of edge preservation and edge enhancement for synthetic images are introduced. Simulation experiments using this algorithm on real and synthetic images show promising results on smoothing within regions and also on enhancing the boundaries. Restoration results are compared with recently published results using the mean field approximation. Segmentation results using the proposed approach are compared with edge detection using fractal dimension, edge detection using mean field approximation, and edge detection using the Sobel operator. Edge points obtained by using these techniques are combined to produce edge maps which include both hard and soft edges.

Browsing by Subject "Computer algorithms"

Results Per Page

Sort Options