Browsing by Subject "PCA"
Now showing 1 - 11 of 11
Item: Aptamers as cross-reactive receptors: using binding patterns to discriminate biomolecules (2013-05)
Stewart, Sara, 1980-; Anslyn, Eric V., 1960-; Ellington, Andrew D.
Exploration into the use of aptamers as cross-reactive receptors was the focus of this work. Cross-reactivity is of interest for developing assays to identify complex targets and solutions. By exploiting the simple chemistries of aptamers, we hope to introduce a new class of receptors to the science of molecular discrimination. This manuscript first addresses the use of designed aptamers for the identification of variants of HIV-1 reverse transcriptase. In this research, aptamers were immobilized on a platform and used to discriminate four variants of HIV-1 reverse transcriptase. It was found that the array could discriminate not only the HIV-1 reverse transcriptase variants for which aptamers were designed, but also variants for which no aptamers exist. A panel of aptamers was then used to discriminate four separate cell lines, chosen as examples of complex targets, to further explore the use of aptamers as cross-reactive sensors. Forty-six aptamers were selected from the literature that were designed to be specific to cells or to molecules expected to be on the surface of cells. This panel showed differential binding patterns for each of the cell types, displaying cross-reactive behavior. During the course of this research, we also developed a novel ratiometric method that uses aptamer counts derived from next-generation sequencing, rather than the more commonly used fluorescent signals, for discrimination. Finally, the use of multiple signals in pattern recognition routines was further explored by running various models on artificial data. These models replicated situations that might arise when working with macromolecular interactions, with the purpose of advancing the community's understanding of, and ability to interpret, results from the pattern recognition methods of PCA and LDA.
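As a rough illustration of the pattern-recognition step mentioned in this abstract, the sketch below applies PCA and LDA to a synthetic cross-reactive response matrix (samples by aptamer signals). The target counts, replicate counts, and data model are hypothetical and are not taken from the dissertation; scikit-learn stands in for whatever software the author used.

```python
# Sketch: PCA and LDA on a cross-reactive receptor response matrix.
# Hypothetical data: rows = replicate exposures of each target,
# columns = signals from individual aptamers in the array.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_targets, n_reps, n_aptamers = 4, 10, 46
labels = np.repeat(np.arange(n_targets), n_reps)
# Each target gets its own mean binding pattern plus measurement noise.
means = rng.normal(size=(n_targets, n_aptamers))
X = means[labels] + 0.3 * rng.normal(size=(len(labels), n_aptamers))

# Unsupervised view: project samples onto the first two principal components.
pc_scores = PCA(n_components=2).fit_transform(X)

# Supervised view: LDA axes that maximize between-target separation.
lda_scores = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, labels)
print(pc_scores.shape, lda_scores.shape)  # (40, 2) (40, 2)
```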
Item: Data-driven human body morphing (Texas A&M University, 2005-11-01)
Zhang, Xiao
This thesis presents an efficient and biologically informed 3D human body morphing technique based on data-driven alteration of standardized 3D models. The anthropometric data are derived from a large empirical database and processed using principal component analysis (PCA). Although techniques using PCA are relatively commonplace in computer graphics, they are mainly used for scientific visualization and animation. Here we focus on uncovering the underlying mathematical structure of anthropometric data and using it to build an intuitive interface that allows interactive manipulation of body shape within the normal range of human variation. We achieve weight- and gender-based body morphing using PCA. First, we calculate the principal vector space of the original data. The data are then transformed into a new orthogonal multidimensional space. Next, we reduce the dimension of the data by keeping only the components along the most significant principal vectors. We then fit a curve through the original data points and generate a new human body shape by transforming the data from the principal vector space back to the original measurement space. Finally, we sort the original data by body weight, treating males and females separately. This enables us to use weight and gender as two intuitive controls for body morphing. The Deformer program is implemented in C++ with the OpenGL and FLTK APIs; the 3D human body models are created using Alias Maya™.
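The reduce / interpolate / reconstruct loop described above can be sketched compactly. This is a rough illustration in Python (the thesis's Deformer is C++/OpenGL) using invented measurement data; the sizes, the weight-driven data model, and the `morph` helper are all hypothetical.

```python
# Sketch of the morphing pipeline: project anthropometric measurements onto a
# few principal components, interpolate along a weight-ordered path in that
# space, then inverse-transform to synthesize a new set of measurements.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_subjects, n_measurements = 200, 30                 # hypothetical sizes
weight = np.sort(rng.normal(75, 15, n_subjects))     # sorted for interpolation
# Fake measurement matrix loosely driven by weight plus individual variation.
X = 0.02 * np.outer(weight, rng.normal(size=n_measurements)) \
    + rng.normal(size=(n_subjects, n_measurements))

pca = PCA(n_components=5)                            # keep the dominant components
scores = pca.fit_transform(X)                        # subjects in PC space

def morph(target_weight: float) -> np.ndarray:
    """Interpolate PC scores at a requested weight, then reconstruct."""
    interp = np.array([np.interp(target_weight, weight, scores[:, k])
                       for k in range(scores.shape[1])])
    return pca.inverse_transform(interp.reshape(1, -1))[0]

print(morph(80.0).shape)                             # (30,) synthesized measurements
```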
Item: Dimension Reduction and Covariance Structure for Multivariate Data, Beyond Gaussian Assumption (2012-10-19)
Maadooliat, Mehdi
Storage and analysis of high-dimensional datasets are always challenging. Dimension reduction techniques are commonly used to reduce the complexity of the data and obtain the informative aspects of datasets. Principal Component Analysis (PCA) is one of the commonly used dimension reduction techniques. However, PCA does not work well when there are outliers or when the data distribution is skewed. Gene expression index estimation is an important problem in bioinformatics. Some of the popular methods in this area are based on PCA and thus may not work well when there is non-Gaussian structure in the data. To address this issue, a likelihood-based data transformation method with a computationally efficient algorithm is developed. Also, a new multivariate expression index is studied, and its performance is compared with that of the commonly used univariate expression index. As an extension of the gene expression index estimation problem, a general procedure that integrates data transformation with PCA is developed. In particular, this general method can handle missing data and data with functional structure. It is well known that PCA can be obtained by the eigendecomposition of the sample covariance matrix. Another focus of this dissertation is the covariance (or correlation) structure under a non-Gaussian assumption. An important issue in modeling the covariance matrix is the positive definiteness constraint. The modified Cholesky decomposition of the inverse covariance matrix has been considered in the literature to address this issue. An alternative Cholesky decomposition of the covariance matrix is considered and used to construct an estimator of the covariance matrix under a multivariate-t assumption. The advantage of this alternative Cholesky decomposition is that it decouples the correlations from the variances.

Item: Improving process monitoring and modeling of batch-type plasma etching tools (2015-05)
Lu, Bo, active 21st century; Edgar, Thomas F.; Stuber, John D.; Djurdjanovic, Dragan; Ekerdt, John G.; Bonnecaze, Roger T.; Baldea, Michael
Manufacturing equipment in semiconductor factories (fabs) provides abundant data and opportunities for data-driven process monitoring and modeling. In particular, virtual metrology (VM) is an active area of research. Traditional monitoring techniques using univariate statistical process control charts do not provide immediate feedback on quality excursions, hindering the implementation of fab-wide advanced process control initiatives. VM models, or inferential sensors, aim to bridge this gap by predicting quality measurements instantaneously from tool fault detection and classification (FDC) sensor measurements. Existing research on inferential sensors and VM has focused on comparing regression algorithms to demonstrate their feasibility in various applications. However, two important areas, data pretreatment and post-deployment model maintenance, are usually neglected in these discussions. Because industrial data are often of poor quality, and because semiconductor processes undergo drifts and periodic disturbances, these two issues are the main roadblocks to wider adoption of inferential sensors and VM models. In data pretreatment, batch data collected from FDC systems usually contain inconsistent trajectories of varying durations. Most analysis techniques require the data from all batches to be of the same duration, with similar trajectory patterns. These inconsistencies, if unresolved, propagate into the developed model, making the results harder to interpret and degrading model performance. To address this issue, a Constrained selective Derivative Dynamic Time Warping (CsDTW) method was developed to perform automatic alignment of trajectories. CsDTW is designed to preserve the key features that characterize each batch and can be solved efficiently in polynomial time. Variable selection after trajectory alignment is another topic that requires improvement. To this end, the proposed Moving Window Variable Importance in Projection (MW-VIP) method yields a more robust set of variables with demonstrably stronger long-term correlation with the predicted output. In model maintenance, model adaptation has been the standard solution for dealing with drifting processes. However, most case studies preprocess the model update data offline, which implicitly assumes that the adaptation data are free of faults and outliers; this is often not true in practical implementations. To this end, a moving window scheme using Total Projection to Latent Structures (T-PLS) decomposition screens incoming updates to separate harmless process noise from the outliers that negatively affect the model. The integrated approach was demonstrated to be more robust. In addition, model adaptation is inefficient when there are multiplicities in the process; multiplicities can occur due to process nonlinearity, switches in product grade, or different operating conditions. A growing-structure multiple-model system using local PLS and PCA models has been proposed to improve model performance around process conditions with multiplicity. The use of local PLS and PCA models allows the method to handle a much larger set of inputs and overcome several challenges of mixture model systems. In addition, fault detection sensitivity is improved by using the multivariate monitoring statistics of these local PLS/PCA models. The proposed methods were tested on two plasma etch data sets provided by Texas Instruments. In addition, a proof of concept using virtual metrology in a controller performance assessment application was also tested.
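Trajectory alignment is the core pretreatment step in the abstract above. The snippet below is a plain dynamic time warping routine, not the thesis's CsDTW variant, included only to make the alignment idea concrete; the two synthetic trajectories and all settings are invented.

```python
# Plain dynamic time warping (DTW) between two batch sensor trajectories of
# different lengths: build the accumulated cost matrix, then trace back the
# optimal warping path of aligned index pairs.
import numpy as np

def dtw_path(a: np.ndarray, b: np.ndarray):
    """Return the accumulated alignment cost and the optimal warping path."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Trace back from the end to recover the aligned index pairs.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return D[n, m], path[::-1]

ref = np.sin(np.linspace(0, 3, 60))             # reference batch trajectory
query = np.sin(np.linspace(0, 3, 45)) + 0.02    # shorter, shifted batch
cost, path = dtw_path(ref, query)
print(round(cost, 3), len(path))
```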
Item: Multi-state PLS based data-driven predictive modeling for continuous process analytics (2012-05)
Kumar, Vinay; Flake, Robert H.; Edgar, Thomas F.
Today's process control industry, which is extensively automated, generates huge amounts of process data from the sensors used to monitor the processes. These data, if effectively analyzed and interpreted, can give a clearer picture of the performance of the underlying process and can be used for its proactive monitoring. With the great advances in computing systems, a new genre of process monitoring and fault detection systems is being developed that is essentially data-driven. The objectives of this research are to explore a set of data-driven methodologies, with the aim of providing a predictive modeling framework, and to apply it to process control. This project explores some of the data-driven methods being used in the process control industry, compares their performance, and introduces a novel method based on statistical process control techniques. To evaluate the performance of this novel predictive modeling technique, called Multi-state PLS (a patented continuous process analytics technique being developed at Emerson Process Management, Austin), extensive simulations were performed in MATLAB. A MATLAB graphical user interface was developed for implementing the algorithm on data generated from the simulation of a continuously stirred blending tank. The effects of noise, disturbances, and different excitations on the performance of the algorithm were studied through these simulations. The simulations were performed first on a steady-state system and then on a dynamic system. Based on the results obtained for the dynamic system, modifications were made to the algorithm to further improve prediction performance when the system is in a dynamic state. Future work includes applying the MATLAB-based predictive modeling technique to real production data, assessing the performance of the algorithm, and comparing it with the performance obtained for simulated data.

Item: Principal Components Analysis for Binary Data (2010-07-14)
Lee, Seokho
Principal components analysis (PCA) has been widely used as a statistical tool for the dimension reduction of multivariate data in various application areas and has been extensively studied in the long history of statistics. One limitation of the PCA machinery is that it can be applied only to continuous variables. Recent advances in information technology in various applied areas have created numerous large, diverse data sets with high-dimensional feature spaces, including high-dimensional binary data. In spite of such great demand, only a few methodologies tailored to such binary datasets have been suggested. The methodologies developed here take a model-based approach to generalizing PCA to binary data. We developed a statistical model for binary PCA and proposed two stable estimation procedures using an MM algorithm and a variational method. By incorporating a regularization technique, the selection of important variables is achieved automatically. We also proposed an efficient algorithm for model selection, including the choice of the number of principal components and the regularization parameter.
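To make the model-based idea in the binary PCA abstract concrete, here is a bare-bones version in which a low-rank matrix of logits parameterizes Bernoulli observations and is fit by plain gradient ascent. This is an illustrative stand-in: the dissertation's MM and variational estimators, regularization, and model selection are not reproduced, and all data and settings below are synthetic.

```python
# Bare-bones model-based binary PCA: P(x_ij = 1) = sigmoid((U V^T)_ij),
# fit by gradient ascent on the Bernoulli log-likelihood.
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 150, 20, 2                       # samples, binary variables, components
U_true = rng.normal(size=(n, k))
V_true = rng.normal(size=(p, k))
X = (rng.random((n, p)) < 1 / (1 + np.exp(-(U_true @ V_true.T)))).astype(float)

U = 0.01 * rng.normal(size=(n, k))
V = 0.01 * rng.normal(size=(p, k))
lr = 0.05
for _ in range(500):
    P = 1 / (1 + np.exp(-(U @ V.T)))       # Bernoulli success probabilities
    G = X - P                              # gradient of log-likelihood w.r.t. the logits
    U, V = U + lr * (G @ V) / p, V + lr * (G.T @ U) / n

loglik = np.sum(X * np.log(P + 1e-9) + (1 - X) * np.log(1 - P + 1e-9))
print(round(loglik, 1))                    # should improve over a random start
```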
Item: Protection Motivation Theory and Consumer Willingness-to-Pay, in the Case of Post-Harvest Processed Gulf Oysters (2012-10-19)
Blunt, Emily Ann
Gulf oysters are harvested and consumed year-round, with more than 90% consumed in a raw, unprocessed state. A chief concern of policymakers in recent years is the incidence of Vibrio vulnificus infection following raw seafood consumption. V. vulnificus is a halophilic bacterium naturally occurring in brackish coastal waters, which concentrates in filter-feeding oysters. Proposed FDA legislation requiring processing of all raw Gulf oysters sold during the warmer summer months threatens the Gulf oyster industry, as little to no research regarding demand for post-harvest processing (PHP) has preceded the potential mandate. This research examines the relationship between oyster consumers' fear of V. vulnificus infection and their willingness-to-pay (WTP) for processing of an oyster meal. The psychological model of Protection Motivation Theory (PMT) is employed alongside the economic framework of contingent valuation (CV) to analyze oyster processing demand with respect to threat and efficacy. A survey administered to 2,172 oyster consumers in six oyster-producing states elicits projected consumption and PMT data. Principal Component Analysis is used to reduce the number of PMT variables, resulting in five principal components representing the PMT elements of source information, threat appraisal, coping appraisal, maladaptive coping, and protection motivation. Using the survey data, the marginal willingness-to-pay (MWTP) for PHP per oyster meal is also calculated, and the five constructed PMT variables are regressed on this measure using four separate OLS models. Results indicate significant correlation for four of the five PMT variables. In addition, a mean MWTP for PHP of $0.31 per oyster meal is determined, contributing to the demand analysis for processing of Gulf oysters. The findings suggest a strong relationship between the fear elements and the demand for processing, and support arguments in favor of further research on specific PHP treatments and the need for a valid PMT survey instrument.
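The PCA-then-regression step can be sketched as follows. Everything here is synthetic and hypothetical (the item counts, the 0.31 intercept used to seed fake WTP values, and a single OLS fit in place of the study's four models); it only illustrates compressing survey items into component scores and relating a WTP measure to them.

```python
# Sketch: compress many Likert-type PMT survey items into a few principal
# component scores, then run an OLS regression involving a per-meal WTP value.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n_respondents, n_items, n_components = 500, 25, 5
items = rng.integers(1, 6, size=(n_respondents, n_items)).astype(float)  # 1..5 responses

pca = PCA(n_components=n_components)
scores = pca.fit_transform(items)            # five component scores per respondent

# Hypothetical WTP response loosely related to the first component.
wtp = 0.31 + 0.05 * scores[:, 0] + 0.1 * rng.normal(size=n_respondents)

ols = LinearRegression().fit(scores, wtp)
print(round(float(ols.intercept_), 2), np.round(ols.coef_, 3))
```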
Item: Spatial and temporal controls on biogeochemical indicators at the small-scale interface between a contaminated aquifer and wetland surface water (2009-05-15)
Baez-Cazull, Susan Enid
This high-resolution biogeochemical study investigated spatial and temporal variability in the mixing interface zones within a wetland-aquifer system near a municipal landfill in the city of Norman, Oklahoma. Steep biogeochemical gradients indicating zones of enhanced microbial activity (e.g., iron/sulfate reduction and fermentation) were found at centimeter-scale hydrological and lithological interfaces. This fine-scale resolution was achieved by combining passive diffusion samplers with capillary electrophoresis for chemical analysis. The spatial and temporal variability of the biogeochemical processes found at the interfaces was evaluated in a depth profile over a period of three years. Correlations between geochemical parameters were determined using Principal Component Analysis (PCA), and each principal factor obtained was interpreted as a dominant biogeochemical process. Factor scores were mapped by date and depth to determine the spatial-temporal associations of the dominant processes. Fermentation was the process controlling the greatest variability in the dataset, followed by iron/sulfate reduction and methanogenesis. The effect of seasonal and hydrologic changes on biogeochemistry was evaluated from samples collected in a wet/dry period at three locations exhibiting upward, downward, and negligible hydrologic flow between the aquifer and the wetland. PCA was used to identify the principal biogeochemical processes and to obtain factor scores for evaluating significant seasonal and hydrological differences via analysis of variance. Iron and sulfate reduction were dominated by changes in water table levels and water flow paths, whereas methanogenesis and bacterial barite utilization were dominated by season and associated with a site with negligible flow. A preliminary study of the microbial response to changes in geochemical nutrients (e.g., electron acceptors and electron donors) was conducted using in situ microcosms, with the purpose of quantifying iron and sulfate reduction rates. Problems encountered in the experiment, such as leaks in the microcosms, prevented the determination of respiration rates; the experiments will therefore be repeated in the future. The results suggest that iron and sulfate reduction were stimulated by the addition of sulfate and ferrihydrite (electron acceptors) and acetate and lactate (electron donors). This research demonstrates the importance of assessing biogeochemical processes at interface zones at appropriate scales and reveals the seasonal and hydrological controls on system processes.

Item: The effect of censorship on American film adaptations of Shakespearean plays (2009-05-15)
Alfred, Ruth Ann
From July 1, 1934, to November 1, 1968, the Production Code Administration (PCA) oversaw the creation of American motion pictures in order to improve Hollywood's moral standing. To assist in this endeavor, the studios produced film adaptations of classic literature, such as the plays of William Shakespeare. In the first two years of the Code's inception, two Shakespearean films were produced by major studios: A Midsummer Night's Dream (1935) and Romeo and Juliet (1936). But were these classic adaptations able to avoid the censorship that other films endured? With the use of archived collections, film viewings, and an in-depth analysis of the plays, multiple versions of the scripts, and other available surviving documents, I was able to see how these productions were affected by the enforcement of film censorship and what that says about the position of Shakespeare's work in society. A Midsummer Night's Dream tended to use self-regulation so as to avoid censorship by the PCA. However, the film did not escape without some required changes: in spite of the filmmakers' efforts, there were a few textual changes, and the fairy costumes required revisions to meet the PCA's standards. In the case of Romeo and Juliet, the PCA was far more involved in all stages of the film's production. There were many documented text changes, and even a case in which the censors objected to how the actors and director executed a scene on film. The motion picture was created by all involved as if it were of the greatest importance. The existing archives paint a picture of a production that was a battleground in a sociopolitical war between the censors and the filmmakers. As both films arrived on the international stage, this sociopolitical campaigning did not end. During international distribution, the films were each accepted, rejected, or forced to endure further censorship in order to become acceptable for public screening. This censorship often relayed a message about each location's societal views and their contrast with American society.
Item: A thermodynamic definition of protein folds (2008-05-01)
Jason Vertrees; Robert Fox; Wlodek Bujalowski; Vincent Hilser; Montgomery Pettitt; Henry Epstein
Modern techniques in structural biology, such as homology modeling, protein threading, protein fold classification, and homology detection, have proven extremely useful. For example, they have provided evolutionary information about protein homology that has in many cases led directly to therapeutics. Given the importance of these methods, augmenting or improving them may lead to significant advances in understanding proteins. These methods treat the high-resolution structure as a static entity upon which they operate; however, proteins are not static entities: they are polymers that exist in an enormous array of conformational states. We therefore propose to model proteins from a statistical thermodynamic viewpoint, based upon their average energetic properties. We show that this model can be used to (1) better characterize the partial unfolding process of proteins, and (2) reclassify the protein fold space from a new perspective.

Item: Variance reduction and outlier identification for IDDQ testing of integrated chips using principal component analysis (Texas A&M University, 2007-04-25)
Balasubramanian, Vijay
Integrated circuits manufactured in current technology consist of millions of transistors with dimensions shrinking into the nanometer range. These small transistors have quiescent (leakage) currents that are increasingly sensitive to process variations, which have increased the variation in good-chip quiescent current and consequently reduced the effectiveness of IDDQ testing. This research proposes the use of a multivariate statistical technique known as principal component analysis for the purpose of variance reduction. Outlier analysis is applied to the reduced leakage current values, as well as to the good-chip leakage current estimate, to identify defective chips. The proposed idea is evaluated using IDDQ values from multiple wafers of an industrial chip fabricated in 130 nm technology. It is shown that the proposed method achieves significant variance reduction and identifies many outliers that escape identification by other established techniques. For example, it identifies many of the absolute outliers in bad neighborhoods, which are not detected by Nearest Neighbor Residual and Nearest Current Ratio. It also identifies many of the spatial outliers that pass when using Current Ratio. The proposed method also identifies both active and passive defects.
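A rough sketch of the variance-reduction idea described in the last abstract: fit PCA to per-chip IDDQ readings across several test patterns, treat the leading component as the shared process variation, and screen the residuals for outliers. The data, the injected defect, the number of components, and the MAD-style cutoff are all invented for illustration and do not reproduce the dissertation's estimators or its neighborhood-based comparisons.

```python
# Sketch of PCA-based variance reduction for IDDQ data: the leading principal
# component absorbs process variation shared across test patterns, and a simple
# robust outlier rule is applied to what remains.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
n_chips, n_patterns = 1000, 12
process = rng.normal(size=(n_chips, 1))                # shared process shift per chip
iddq = 5.0 + 1.5 * process + 0.2 * rng.normal(size=(n_chips, n_patterns))
iddq[::250, 3] += 3.0                                  # inject defects on one test pattern

pca = PCA(n_components=1).fit(iddq)                    # one component for the shared factor
common = pca.inverse_transform(pca.transform(iddq))    # variation explained by the PC
residual = iddq - common                               # variance-reduced currents

# Robust screen on per-chip residual magnitude (median + 5 * MAD cutoff).
score = np.abs(residual).max(axis=1)
mad = np.median(np.abs(score - np.median(score)))
cutoff = np.median(score) + 5 * mad
print(np.flatnonzero(score > cutoff))                  # should include chips 0, 250, 500, 750
```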