Browsing by Subject "Multivariate analysis"
Now showing 1 - 20 of 30
Item
A machine learning approach to automate classification of literature in a SAM research database (Texas Tech University, 2004-08) Morris, Elizabeth P.

In the mid-eighties, researchers at the University of Miami confronted the problem of information overload while investigating information on worker performance. They required literature sources from various fields, such as engineering, business, and psychology, to name a few. To cope with their information overload, they devised a research methodology to partition information resources into category matrices in order to find patterns, trends, or voids. The approach was termed State-of-the-Art Matrix, or SAM, Analysis. SAM Analysis is a manual process, which restricts the amount of information available for category decisions. During the first phase of the manual process, researchers construct models or categories that best describe the research area. In the next phase, articles from the information sources are read and assigned to the pre-defined categories based on the judgment of assessors. The manual approach presents major challenges to researchers who must identify and utilize the information hidden in a large corpus. The approach is practical only for a small number of articles, and categorization relies on the subjective judgment of assessors. A more scalable and flexible approach is therefore needed for categorizing information, such as using machine learning and data mining techniques to automate the categorization of articles in large volumes of data. In this research, automation is approached through a machine learning technique known as a Learning Classifier System (LCS). The LCS performs the data mining task of categorizing articles under the SAM approach by utilizing training and testing datasets extracted from SAM EndNote bibliographic databases related to a specific area of research.
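Evaluating such automated category assignments against held-out test labels typically reduces to confusion-matrix counts. A minimal sketch with made-up numbers (no connection to the actual SAM datasets):

```python
# Hypothetical confusion-matrix counts for one category:
# true positives, false positives, false negatives, true negatives
tp, fp, fn, tn = 45, 3, 5, 47

sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
ppv = tp / (tp + fp)           # positive predictive value
npv = tn / (tn + fn)           # negative predictive value

print(sensitivity, specificity, ppv, npv)
```

With these illustrative counts, all four metrics land near or above 0.9, the kind of threshold reported for the trained LCS.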
In order to evaluate the ability of the LCS to predict category membership, accuracy-based metrics borrowed from the field of medicine are applied: sensitivity, specificity, positive predictive value, and negative predictive value. After training, the evaluation results indicate that the predictive ability of the LCS is greater than 90%. The results are obtained during the second trial of a five-trial experiment.

Item
A multivariate comparison of dimensions of family functioning for first married and remarried families (Texas Tech University, 1986-05) Waldren, Terry Edward

Not available.

Item
Asymptotic relative efficiencies of the rank-score procedure in the multivariate two-way layout problem without interactions (Texas Tech University, 1986-08) Ren, Luh-yu

Not available.

Item
Bayesian multivariate spatial models and their applications (Texas A&M University, 2004-11-15) Song, Joon Jin

Univariate hierarchical Bayes models are being vigorously researched for use in disease mapping, engineering, geology, and ecology. This dissertation shows how the models can also be used to build model-based risk maps for area-based roadway traffic crashes. County-level vehicle crash records and roadway data from Texas are used to illustrate the method. A potential extension that uses univariate hierarchical models to develop network-based risk maps is also discussed. Several Bayesian multivariate spatial models for simultaneously estimating the crash rates of different types of crashes are then developed. The specific class of spatial models considered is the conditional autoregressive (CAR) model. The univariate CAR model is generalized for several multivariate cases. A general theorem for each case ensures that the posterior distribution is proper under an improper flat prior. The performance of the various multivariate spatial models is compared using a Bayesian information criterion.
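The Bayesian information criterion used in such a comparison trades goodness of fit against model complexity. A minimal sketch of the classic computation, with illustrative numbers only:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Classic BIC: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical fits of two competing spatial models to the same data set
simple_model = bic(log_likelihood=-1040.0, n_params=4, n_obs=254)
richer_model = bic(log_likelihood=-1032.0, n_params=9, n_obs=254)
print(simple_model < richer_model)   # here the extra fit doesn't justify 5 more parameters
```

The log-likelihoods, parameter counts, and sample size above are invented; the point is only that BIC penalizes the richer model unless its fit improves enough.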
Markov chain Monte Carlo (MCMC) computational techniques are used for model parameter estimation and statistical inference. These models are illustrated and compared again with the Texas crash data. There are many directions in which this study can be extended; the dissertation concludes with a short summary of this research and recommends several promising extensions.

Item
Changepoint detection and estimation using nonparametric procedures (Texas Tech University, 1989-05) Balakumar, Sivanandan

Not available.

Item
A comparative study of correlational outlier detection metrics (2008-05) Ritter, Paul Muse, 1961-; Beretvas, Susan Natasha

The present investigation was a Monte Carlo experiment designed to evaluate the performance of several metrics in spotting correlational outliers. Specifically, the metrics compared were the Mahalanobis D², Bacon MLD, Carrig D, MCD, robust PCLOW, and robust PCHIGH. This was the first comparative simulation study to include robust PCLOW and robust PCHIGH. The Mahalanobis D², MCD, robust PCLOW, and robust PCHIGH were each applied using an approximate statistical criterion. The Carrig D and Bacon MLD were applied using a "natural drop" approach that separates scores on the metric into two groups, outlying and non-outlying, using a k-means algorithm from cluster analysis. Both majority and contaminant observations were generated from multivariate normal distributions based on factor-analytic models. Experimental factors included majority versus contaminant communality level, majority-contaminant factor model scenario, number of variables, sample size, and fraction of outliers. Results indicated that the "natural drop" method of application for the Carrig D and Bacon MLD leads to intolerably high false-alarm rates. Overall, PCLOW clearly outperformed PCHIGH.
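Of the metrics compared above, the Mahalanobis D² is the simplest to illustrate: squared distances are screened against an approximate chi-square criterion. A sketch with synthetic data (the study's robust variants and "natural drop" approach are more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 points from a strongly correlated bivariate normal ...
X = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=200)
# ... plus one planted correlational outlier that breaks the correlation pattern
X = np.vstack([X, [3.0, -3.0]])

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # squared Mahalanobis distances

CUTOFF = 13.82   # chi-square(df=2) quantile at 0.999, an approximate criterion
outliers = np.flatnonzero(d2 > CUTOFF)
print(outliers)
```

The planted point (index 200) sits far from the correlation structure even though each of its coordinates alone is unremarkable, which is exactly what a correlational-outlier metric must catch.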
Surprisingly, PCLOW did not distinguish itself from MCD in certain experimental conditions where it was expected to perform better. The conditions in this study were limited; future comparative studies of the metrics could include conditions of non-normality and hybrid types of outliers (i.e., outliers that are both mean-shift and correlational). Despite its poor performance in this study, I theorize that robust PCHIGH could have an advantage over MCD in spotting certain kinds of mean-shift outliers. Research into the distributional properties of the Carrig D is also warranted.

Item
Differential sensing of hydrophobic analytes with serum albumins (2012-05) Ivy, Michelle Adams; Anslyn, Eric V., 1960-

In the last decade, there has been growing interest in the use of differential sensing for molecular recognition. Inspired by the mammalian olfactory system, differential sensing employs an array of non-selective receptors which, through cross-reactive interactions, create a distinct pattern for each analyte tested. The unique fingerprints obtained for each analyte are studied with statistical analysis techniques such as principal component analysis and linear discriminant analysis. It was postulated that serum albumin proteins would be applicable to differential sensing schemes because of significant differences in sequence identity between serum albumins from different species, and because of the wide range of hydrophobic molecules known to bind to these proteins. Consequently, cross-reactive serum albumin arrays were developed, utilizing hydrophobic fluorescent indicators to detect hydrophobic molecules. These arrays were employed to discriminate subtly different hydrophobic analytes, and mixtures of these analytes, in the form of terpenes and perfumes, plasticizers and plastic explosive mixtures, and glycerides and adipocyte extracts.
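The principal component analysis step used to study such fingerprints can be sketched with numpy alone; the six-channel responses below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Rows: repeated measurements of two hypothetical analytes across a
# cross-reactive array of six fluorescence channels.
analyte_a = rng.normal([5, 1, 3, 0, 2, 4], 0.2, size=(10, 6))
analyte_b = rng.normal([1, 4, 0, 3, 5, 2], 0.2, size=(10, 6))
X = np.vstack([analyte_a, analyte_b])

Xc = X - X.mean(axis=0)                  # mean-center the fingerprints
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                   # project onto the first two PCs

# The two analytes separate cleanly along PC1
print(scores[:10, 0].mean(), scores[10:, 0].mean())
```

Because the two response patterns differ systematically across channels, the first principal component captures the between-analyte direction and the replicate measurements form two well-separated clusters in the score plot.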
In this doctoral work, a detailed review of the field of differential sensing and a thorough study of principal component analysis and linear discriminant analysis in various differential sensing scenarios are given. These introductory chapters aid in understanding the methods and techniques applied in the later experimental chapters. In chapter 3, serum albumins, a PRODAN indicator, and an additive are shown to discriminate five terpene analytes and terpene-doped perfumes. Chapter 4 describes an array of serum albumins, two dansyl fluorophores, and an additive that successfully differentiates the plasticizers found within the plastic explosives C4 and Semtex, as well as simulated C4 and Semtex mixtures. Discrimination of these simulated mixtures was also achieved in the presence of soil contaminants, demonstrating the potential real-world applicability of this sensing ensemble. Finally, chapter 5 details an array consisting of serum albumins, several fluorescent indicators, and a Grubbs olefin metathesis reaction to differentiate saturated and unsaturated triglycerides, diglycerides, and monoglycerides. Mixtures of glycerides in adipocyte extracts taken from rats in different health states were then successfully discriminated, showing promise for clinical applications in differentiating adipocytes from pre-diabetic, type 2 diabetic, and non-diabetic individuals.

Item
Discrete multistate coherent systems and the three modules theorem (Texas Tech University, 1989-08) Esparza, Sergio O.

Systems of n components, in which the system and its components either fail or function, and where the system is not degraded by performance improvement of any of its components, are known as binary coherent systems (BCS's). Sometimes there are subsets of components which function collectively as a single unit. In this case such subsets can be replaced by single units, reducing the system without affecting its performance.
These subsets are called modular sets and play a significant role in system reliability analysis. Complex systems are quite common in the present high-technology age; for example, nuclear power systems and space technology systems, among others. These complex systems require a large amount of computation, and interaction among their components can make the exact computation of their reliabilities virtually impossible. The analyst may attempt to attack these problems by incorporating modules into the system. These modules facilitate the analysis of the system, since they reduce its general configuration, and provide the analyst with a good methodology for obtaining bounds for the system reliability. For BCS's there is an important result known as the Three Modules Theorem (Birnbaum and Esary (1965)). This theorem asserts that if two modular sets have at least one common component, then we can obtain a larger modular set (by combining the two original sets) or three smaller modular sets (by decomposing the two original sets into disjoint sets). While BCS's have been extended to multistate coherent systems (MCS's) since the late 1970's, the Three Modules Theorem has undergone no generalization. In our paper this theorem is generalized to a specific class of multistate coherent systems. 
We also provide bounds for the reliability of a specific class of MCS, extending the (binary) results given by Shanthikumar (1986).

Item
Discriminant analysis with proportional covariance structure (Texas Tech University, 1982-05) Li, Eldon Yu-zen

Not available.

Item
Estimating the antecedents and consequences of early marriage: a path analytical approach (Texas Tech University, 1983-05) Witt, David Dean

Not available.

Item
Estimation of the Multivariate Normal Distribution Function (Texas Tech University, 1975-08) Davis, Mary A.

Not available.

Item
Expenditure patterns within an occupational group: teachers and non-teachers (Texas Tech University, 2004-05) Salim, Juma K.

Numerous studies of expenditure patterns have been conducted over aggregated occupational categories. However, few studies specifically compare and contrast expenditure patterns among industry groups within an occupational field. This research examined the hypothesis that teachers' expenditures were lower than those of administrators/managers and professionals, who are grouped together with teachers in the manager/professional occupational field by the Bureau of Labor Statistics (BLS). It was also hypothesized that there were statistically significant differences in expenditure patterns within each industry group while controlling for socio-demographic factors and consumer life cycle variables. The sample of 3,976 consumer units was drawn from the BLS Consumer Expenditure Survey (CES) interview tapes for the years 1995 through 2001 and consisted of 611 teachers, 1,353 administrators/managers, and 2,012 professionals. Multivariate Tobit analysis was used to examine statistical relationships related to expenditures for each of the groups of interest.
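Tobit analysis models expenditures censored at zero through a latent-variable likelihood. A univariate toy sketch fit by numerical optimization (simulated data, not the CES variables):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y_star = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)   # latent spending
y = np.maximum(y_star, 0.0)                              # observed, censored at zero

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)                       # parameterize sigma > 0
    mu = b0 + b1 * x
    ll = np.where(
        y > 0,
        stats.norm.logpdf(y, mu, s),        # density for uncensored observations
        stats.norm.logcdf(-mu / s),         # P(y* <= 0) for censored ones
    )
    return -ll.sum()

res = optimize.minimize(neg_loglik, x0=[0.0, 1.0, 0.0], method="BFGS")
print(res.x[:2])   # estimates of (b0, b1), near the true (1, 2)
```

The key feature is the mixed likelihood: a normal density for positive spending and a normal CDF term for the pile-up at zero, which ordinary least squares ignores.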
Fourteen consumption categories (food at home, food away from home, alcoholic beverages, housing, apparel and services, transportation, health care, entertainment, personal care, reading materials, education, miscellaneous expenditures, cash contributions, and personal insurance and pensions) were treated as dependent variables and regressed against total expenditure (as a proxy for income), life cycle variables, region of residence, race/ethnicity, occupation, gender, and education of the reference person. The descriptive analysis illustrated that differences existed among administrators/managers, teachers, and professionals with respect to their distributions of expenditures. Findings indicated not only how industry group membership influences spending over the life cycle, but also ways in which consumer units make substitutions in consumption to meet their needs. The average total expenditures of teachers were 15.3% and 16.5% lower than those of professionals and administrators/managers, respectively. Total expenditure (as a proxy for income) was a driving force in determining the level of spending for all expenditure categories investigated, and occupation also showed significant effects for most items. Various life cycle stages and other socio-demographic factors, such as region of residence, race of the reference person, educational attainment, and sex of the reference person, were also significant determinants of the pattern of expenditures within each occupational group. Statistically significant differences in spending patterns among teacher, professional, and administrator/manager consumer units were found for nine of the fourteen expenditure categories after controlling for socio-demographic factors. The findings have important implications for government agencies at the federal, state, and local levels, and for the business sector as well.
They can facilitate the development of useful public policy and programs by government or community agencies that may help reduce the recruitment and retention problems facing the education sector in the U.S. Businesses can use the results of this study as a guide for market segmentation in potential areas.

Item
Factor analysis as a data compression technique (Texas Tech University, 1973-08) Pore, Michael David

Not available.

Item
Feminism: the development of a scale and exploration of antecedents (Texas Tech University, 1980-05) Overton, Helen Hawthorne

Not available.

Item
Investigation of bootstrap estimates of the parameters, their standard errors, and associated confidence intervals of structural equation models with ordered categorical variables (Texas Tech University, 1999-12) Fafouti, Elisabeth

This study investigates the performance of the bootstrap when applied to structural equation models with ordered categorical variables, focusing on the parameter estimates, their standard errors, and the coverage rates of the associated bootstrap confidence intervals. Structural equation models are used widely in many disciplines, and the data analyzed often involve ordered categorical variables. The performance of the bootstrap has been investigated through simulation and compared with the maximum likelihood estimator applied to both polychoric correlation matrices and Pearson product-moment correlation matrices. The bootstrap samples are generated randomly and transformed so that they preserve the covariance structure of the model; the polychoric correlation matrix is then computed and analyzed for each sample. The study involves three different models, and for each model different sample sizes have been analyzed.
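Bootstrap confidence intervals of the kind investigated here can be sketched for a simple statistic such as a correlation (synthetic data; the study's procedure additionally transforms each resample to preserve the model's covariance structure and works with polychoric correlations):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 0.6 * x + rng.normal(scale=0.8, size=100)

B = 2000
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, len(x), size=len(x))        # resample rows with replacement
    boot[b] = np.corrcoef(x[idx], y[idx])[0, 1]       # statistic on each resample

lo, hi = np.percentile(boot, [2.5, 97.5])             # 95% percentile interval
print(round(lo, 3), round(hi, 3))
```

The percentile method reads the interval directly off the bootstrap distribution's quantiles; the bootstrap-t method discussed above instead studentizes each resampled statistic, which is one reason its coverage can behave quite differently.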
One of the models analyzed is one that Muthen and Kaplan used in their research on the performance of the Categorical Variable Methodology (CVM) estimator, so direct comparisons between the two methods have been made. The bootstrap compares well with the CVM estimator. The results of this research indicate that the bootstrap provides correct standard errors that are larger than the standard errors obtained from the maximum likelihood estimator applied to Pearson product-moment correlation matrices. The coverage rates of the bootstrap confidence intervals have also been investigated using two methods: the percentile method and the bootstrap-t method. The results are not very encouraging, especially for the bootstrap-t method, since the coverage rates are in some cases far from the prespecified confidence level. The percentile method seems to perform better than the bootstrap-t method with regard to coverage rates, though it presents problems as well. The performance of the bootstrap is affected by the sample size, the complexity of the model, and the parameter values. Overall, the bootstrap performs adequately and could provide a valid alternative to other estimation methods for structural equation models if researchers are cautious in its application.

Item
Maximum likelihood estimation in the random coefficient regression model via the EM algorithm (Texas Tech University, 1995-12) Wu, Jiang-Ming

Maximum likelihood estimates in a random coefficient regression (RCR) model from cross-sectional and time series sample data are presented within the framework of the expectation-maximization (EM) algorithm. Unlike the model considered by Swamy (1970), the full rank assumption of the design matrix is not made in this research. A simulation study compares the computational feasibility, in terms of CPU time, of the EM algorithm versus PROC MIXED in SAS/STAT®.
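The EM algorithm alternates an expectation step (computing responsibilities under the current parameters) and a maximization step (re-estimating parameters from those responsibilities). A minimal sketch for a toy two-component Gaussian mixture with unit variances, not the RCR model itself:

```python
import numpy as np

rng = np.random.default_rng(4)
# Data from two well-separated components
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

mu = np.array([-1.0, 1.0])   # crude starting means
pi = np.array([0.5, 0.5])    # mixing weights; variances fixed at 1 for brevity

for _ in range(50):
    # E-step: responsibility of each component for each point
    # (the 1/sqrt(2*pi) normal constant cancels when variances are equal)
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2) * pi
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted re-estimates of mixing weights and means
    pi = r.mean(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)

print(mu.round(2))   # converges close to the true means (-2, 3)
```

Each iteration increases the observed-data likelihood, which is the property that makes EM attractive for missing-data and random-effects models like the RCR model above.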
Efficiencies of estimators using homoscedastic and heteroscedastic models are also compared. The RCR model is applied to the Ernst & Young/University of Michigan Individual Taxpayer Panel data to obtain the maximum likelihood estimates, and the results are compared with previously existing work. Some advantages and limitations of the EM algorithm are also discussed.

Item
Multivariate fault detection and visualization in the semiconductor industry (2006) Chamness, Kevin Andrew; Edgar, Thomas F.

The semiconductor industry provides vast opportunities for process monitoring and multivariate fault detection. Most of the multivariate methods currently used in the industry are statistically based techniques. These methods are also extended to monitor batch processes such as the process tools used in semiconductor manufacturing. In this dissertation, the existing statistical fault detection methodologies are discussed and compared to non-parametric modeling techniques for multivariate outlier detection. Inspired by these non-parametric techniques, a new k Nearest Neighbor (KNN) multivariate fault detection method is proposed to augment the existing statistical methods. In this technique, instead of pre-computing a model, only a window of historic reference data is retained. The fault detection performance metric used in this algorithm provides universal scaling and confidence limits for the overall metric value, the block contributions, and the individual variable contributions. It also has the flexibility to be tuned for local or global sensitivity when multiple populations are present within the reference data. The new KNN method is also extended to monitor batch processes: two applications are created, one by simply unfolding the batch data and one by selecting only reference data similar in batch time for each individual trace sample.
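The core idea of scoring a new sample by its distance to nearby points in a reference window can be sketched directly (synthetic data; the dissertation's metric adds universal scaling, confidence limits, and variable contributions):

```python
import numpy as np

rng = np.random.default_rng(5)
reference = rng.normal(0, 1, size=(500, 4))   # window of historic "good" runs
normal_run = rng.normal(0, 1, size=4)         # sample consistent with the reference
faulty_run = np.array([4.0, -4.0, 3.5, 0.0])  # simulated fault

def knn_score(sample, reference, k=5):
    """Mean Euclidean distance from a sample to its k nearest reference samples."""
    d = np.linalg.norm(reference - sample, axis=1)
    return np.sort(d)[:k].mean()

print(knn_score(normal_run, reference), knn_score(faulty_run, reference))
```

A normal sample sits close to many historic runs and scores low; the faulty run has no nearby neighbors and scores high, with no model fit in advance — only the reference window is retained, as the abstract describes.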
Both KNN batch methods are compared against other existing batch methods in detecting induced faults in a plasma etch experiment; the trace sample method performs among the best of all investigated batch techniques. This dissertation also introduces additional methods for monitoring systems with multivariate models. A complete software architecture is presented for reporting and visualization of multivariate results. It takes advantage of block and variable contributions to guide users to the process variables with the most extreme and most frequent excursions, and it is applied to monitor final wafer electrical test data. In addition, methods are presented that assist the monitoring of drifting processes. A simple technique is given to recursively adapt the centering and scaling coefficients of a principal component analysis (PCA) model, and movement metrics are introduced to monitor the changes in these coefficients over time. These movement metrics allow visibility into the process changes which caused the model to adapt.

Item
On estimation and testing in the generalized multivariate linear model (Texas Tech University, 1974-05) Tubbs, Jackie Dale

This dissertation is concerned with two areas in statistics: estimation of the unknown parameter matrix and testing the general linear hypothesis in two multivariate linear models. The first model is the generalized multivariate model as proposed by Potthoff and Roy; the second is a special case of the first and is referred to as the usual multivariate model. The discussion is divided into six chapters. In Chapter II, mean-squared error estimators for the parameter matrix are found in the generalized multivariate model when the covariance matrix is possibly singular. In Chapter III, mean-squared error estimators are found when the usual multivariate model is restricted by a system of linear constraints.
The estimators are given when the covariance matrix is either positive definite or positive semidefinite. In Chapter IV, the estimators from Chapter III are used with a normality assumption to develop procedures for testing the general linear hypothesis in the less-than-full-rank model. Examples are given for two multivariate analysis of variance models.