Browsing by Subject "Principal components analysis"


Methods for improving the reliability of semiconductor fault detection and diagnosis with principal component analysis
(2006) Cherry, Gregory Allan; Qin, S. Joe
Plant-wide monitoring of processes under closed-loop control
(2001-08) Valle-Cervantes, Sergio; Qin, S. Joe
Faults in industrial processes produce off-spec products, unsafe conditions, and damage to equipment. According to Misra et al. (1999), poor monitoring and control of such abnormal situations costs the U.S. petrochemical industries alone an estimated $20 billion per year. Developing efficient methods for online fault detection and identification has therefore been one of the main tasks in industry. This dissertation focuses on the development of process monitoring, fault detection, and fault identification methods that are applied to a polyester film process at DuPont. The monitoring methods developed in this dissertation are based on principal component analysis (PCA). The contributions of this dissertation are summarized as follows:
- A new method is presented that selects the number of principal components (PCs) based on the variance of the reconstruction error. This variance exhibits a minimum over the number of PCs, and conditions are given under which the minimum corresponds to the true number of PCs. Ten other methods available in the signal processing and chemometrics literature are reviewed and compared with the proposed method. Three data sets are used to test the different methods for selecting the number of PCs: two are real process data and the third is a batch reactor simulation.
- A new approach is presented that uses a fault identification index to identify faults based on fault directions. These directions are extracted from abnormal data using the singular value decomposition (SVD). The proposed method is demonstrated on an industrial polyester film process that is characterized by frequent set-point changes and multiple grade changes. A comparison between the fault identification index and the contribution plot method is also given.
- It is shown that both the loadings and scores of consensus principal component analysis (PCA) can be calculated directly from those of regular PCA, and that the multi-block partial least squares (PLS) loadings, weights, and scores can be calculated directly from regular PLS. The orthogonal properties of four multi-block PCA (MBPCA) and multi-block PLS (MBPLS) algorithms are explored. The use of MBPCA and MBPLS for decentralized monitoring and diagnosis is derived in terms of the regular PCA and PLS scores and residuals. The multi-block analysis algorithms are basically equivalent to the regular PCA and PLS algorithms. Blocking the process variables of a large-scale plant, based on process knowledge, helps to localize the root cause of a fault in a decentralized manner. New definitions of block and variable contributions to the squared prediction error (SPE) and Hotelling's T² are proposed for decentralized monitoring. This decentralized monitoring method, based on proper variable blocking, is successfully applied to an industrial polyester film process.
- In Chapter 4 a fault identification approach is proposed using regular PCA. With the multi-block analysis presented in Chapter 5, we propose to integrate the fault identification index with MBPCA. First, MBPCA is used to identify the block where the fault occurs. Then fault directions are extracted from the faulty block only, so that only the information in the faulty block, rather than all the sensors, is used to identify a new fault. Next, the original signal is decomposed using wavelets. By keeping the most important wavelet coefficients, we obtain a cleaned and de-noised signal. PCA is applied to this de-noised data to obtain better fault detection. An MBPCA is also conducted on the de-noised signal to improve fault identification. The combined multi-block fault identification method is demonstrated on the polyester film process.
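As a rough illustration of the two monitoring statistics named in the abstract above, the sketch below computes Hotelling's T² and the squared prediction error (SPE) from a regular PCA model. It is not taken from the dissertation: the function names `fit_pca_monitor` and `monitor`, the synthetic six-sensor data, and the choice of two retained components are all assumptions made for the example, and no control limits are computed.

```python
import numpy as np

def fit_pca_monitor(X_train, n_components):
    """Fit a PCA model on normal operating data (rows = samples)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std                      # autoscale each variable
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                         # loadings (variables x PCs)
    eigvals = s[:n_components] ** 2 / (Z.shape[0] - 1)  # score variances
    return mean, std, P, eigvals

def monitor(X, mean, std, P, eigvals):
    """Return Hotelling's T^2 and SPE for each row of X."""
    Z = (np.atleast_2d(X) - mean) / std
    T = Z @ P                                       # scores in the PC subspace
    t2 = np.sum(T ** 2 / eigvals, axis=1)           # variation inside the model
    residual = Z - T @ P.T                          # part the model cannot explain
    spe = np.sum(residual ** 2, axis=1)
    return t2, spe

# Hypothetical data: six sensors driven by two latent factors plus small noise.
rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X_train = rng.normal(size=(500, 2)) @ A + 0.05 * rng.normal(size=(500, 6))

model = fit_pca_monitor(X_train, n_components=2)
t2_train, spe_train = monitor(X_train, *model)

# Simulated sensor fault: a bias on the first sensor breaks its correlation
# with the other sensors, which shows up mainly in the SPE.
x_fault = X_train[0].copy()
x_fault[0] += 3.0
t2_fault, spe_fault = monitor(x_fault, *model)
print("max training SPE:", spe_train.max(), "faulty-sample SPE:", spe_fault[0])
```

In this sketch T² measures variation inside the retained subspace and SPE measures the distance from it, so a fault that violates the normal correlation structure raises the SPE even when every individual sensor stays within its usual range.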
Supervised and unsupervised PRIDIT for active insurance fraud detection
(2008-08) Ai, Jing, 1981-; Brockett, Patrick; Golden, Linda L.
This dissertation develops statistical and data mining based methods for insurance fraud detection. Insurance fraud is very costly and has become a worldwide concern in recent years. Great efforts have been made to develop models that identify potentially fraudulent claims for special investigation. In a broader context, insurance fraud detection is a classification task. Both supervised learning methods (where a dependent variable is available for training the model) and unsupervised learning methods (where no prior information on the dependent variable is available) can potentially be employed to solve this problem. First, an unsupervised method is developed to improve detection effectiveness. Unsupervised methods are especially pertinent to insurance fraud detection because the true nature of an insurance claim (i.e., fraud or not) is very costly to obtain, if it can be identified at all. In addition, available unsupervised methods are limited, some of them are computationally intensive, and their results can be ambiguous to interpret. An empirical demonstration of the proposed method is conducted on a widely used large dataset where labels are known for the dependent variable. The proposed unsupervised method is also empirically evaluated against prevalent supervised methods as a form of external validation. This method can be used in other applications as well. Second, another set of learning methods is developed based on the proposed unsupervised method to further improve performance. These methods are developed in the context of a special class of data mining methods, active learning. The performance of these methods is also empirically evaluated using insurance fraud datasets. Finally, a method is proposed to estimate the fraud rate (i.e., the percentage of fraudulent claims in the entire claims set). Since the true nature of insurance claims (and any level of fraud) is unknown in most cases, there has not been any consensus on the estimated fraud rate. The proposed estimation method is built on the proposed unsupervised method. Implemented using insurance fraud datasets with the known nature of claims (i.e., fraud or not), this estimation method yields accurate estimates that are superior to those generated by a benchmark naïve estimation method.
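The abstract above does not spell out the scoring procedure, so the following is only a sketch of the baseline PRIDIT idea as it is commonly described (RIDIT scoring of ordinal indicators followed by a first principal component), not the dissertation's own method. The function names `ridit_scores` and `pridit`, the toy claims matrix, and the sign convention are assumptions made for the example.

```python
import numpy as np

def ridit_scores(column):
    """RIDIT-style score for one integer-coded ordinal indicator.

    Each category is scored as (share of responses in lower categories)
    minus (share of responses in higher categories), so scores lie in
    [-1, 1] and average to zero over the sample.
    """
    _, inverse, counts = np.unique(column, return_inverse=True,
                                   return_counts=True)
    p = counts / counts.sum()
    cum = np.cumsum(p)
    below = cum - p          # proportion strictly below each category
    above = 1.0 - cum        # proportion strictly above each category
    return (below - above)[inverse]

def pridit(indicators):
    """Combine RIDIT-scored indicators with first-principal-component weights."""
    F = np.column_stack([ridit_scores(indicators[:, j])
                         for j in range(indicators.shape[1])])
    eigvals, eigvecs = np.linalg.eigh(F.T @ F)   # eigenvalues in ascending order
    w = eigvecs[:, -1]                           # leading eigenvector = weights
    if w.sum() < 0:                              # sign is arbitrary; fix it here
        w = -w
    return F @ w, w

# Hypothetical claims: rows are claims, columns are ordinal red-flag indicators.
claims = np.array([[0, 1, 0],
                   [0, 0, 0],
                   [1, 2, 1],
                   [1, 2, 1],
                   [0, 1, 1],
                   [1, 0, 0]])
scores, weights = pridit(claims)
print("indicator weights:", weights)
print("claim scores:", scores)
```

Because no fraud labels are used, the resulting scores only rank claims relative to one another; which end of the scale indicates suspicion depends on how the indicator categories are coded.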
Understanding the variations in fluorescence spectra of gynecologic tissue
(2004) Chang, Sung Keun; Richards-Kortum, Rebecca, 1964-
Optical spectroscopy has shown promise as a diagnostic tool for detecting cervical pre-cancer because spectral variations in optical measurements are closely correlated with the molecular and architectural changes in tissue that accompany dysplastic progression. However, optical measurements from cervical tissue are also affected by other factors, such as the age or menopausal status of the patient. In order to develop robust diagnostic algorithms based on optical measurements, it is important to identify diagnostically significant features and to devise methods to extract them from the spectral variations. Principal component analysis (PCA) is a statistical method of extracting features based on the variance in a dataset. PCA applied to fluorescence measurements from cervical tissue revealed biophysically significant spectral variations during the menstrual cycle. We have also applied PCA in developing a classification algorithm to discriminate a pair of diagnostic classes. Although statistical methods can reveal subtle changes in optical spectra that are diagnostically significant, it is difficult to interpret the biophysical significance of the extracted features. Another approach is to extract the tissue optical parameters that are directly related to precancerous changes. In order to perform model-based parameter estimation, an analytical model was developed to describe fluorescence in two-layered tissue such as the cervix. Briefly, the model uses exponential attenuation and diffusion theory to describe light propagation in the epithelium and the stroma, respectively, and calculates the total detected fluorescence as the sum of the fluorescence signals emitted from the two layers. In the inverse model, the analytical model is iteratively fitted to the measured fluorescence spectra, and the optical parameters are estimated as a result of the fitting process. Validations with Monte Carlo simulations show that optical properties of the epithelium and the stroma can be estimated accurately. The inverse model was subsequently applied to a large-scale clinical dataset, and the estimated parameters show good correlation with changes associated with dysplastic progression as well as age.
