Browsing by Subject "Multivariate analysis."
Now showing 1 - 7 of 7
Item Application of chemometric analysis to UV-visible and diffuse near-infrared reflectance spectra. (2007-08-21)
Davis, Christopher Brent.; Busch, Kenneth W.; Busch, Marianna A.; Chemistry and Biochemistry.; Baylor University. Dept. of Chemistry and Biochemistry.
Multivariate analysis of spectroscopic data has become more commonplace in analytical investigations due to several factors, including diode-array spectrometers, computer-assisted data acquisition systems, and chemometric modeling software. Chemometric regression modeling and classification studies were conducted on spectral data obtained from chili pepper and fabric samples. Multivariate regression models known as partial least squares (PLS-1) models were developed from the spectral data of alcoholic extracts of Habanero peppers. The developed regression models were used to predict the total capsaicinoid concentration of a set of unknown samples. The ability of the regression models to correctly predict the total capsaicinoid concentration of unknown samples was evaluated in terms of the root mean square error of prediction (RMSEP). The prediction ability of the models was found to be robust and stable over time and in the face of instrumental modifications. A near-infrared spectral database was developed from over 800 textile samples. Principal components analysis (PCA) was performed on the diffuse near-infrared reflectance spectra of these commercially available textiles. The PCA models were combined into a soft independent modeling of class analogy (SIMCA) model in order to classify the samples according to fiber type. The samples in the study received no pretreatments. The discriminating power of these models was tested by creating validation sets within a given fiber type as well as by attempting to classify samples into categories to which they do not belong.
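The SIMCA scheme described above (a separate PCA model per fiber class, with a new sample assigned by how well each class subspace reconstructs it) can be sketched in a few lines. The two synthetic "fiber classes" and the one-component models below are illustrative assumptions, not the dissertation's actual spectra or settings.

```python
import numpy as np

def fit_pca(X, n_comp):
    """Per-class PCA model: class mean plus leading principal axes."""
    mu = X.mean(axis=0)
    # SVD of the centered data gives the principal directions in Vt
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_comp]

def residual(x, model):
    """Squared reconstruction error of x against one class model."""
    mu, P = model
    z = x - mu
    return float(np.sum((z - P.T @ (P @ z)) ** 2))

def simca_classify(x, models):
    """Assign x to the class whose PCA subspace reconstructs it best."""
    return min(models, key=lambda k: residual(x, models[k]))

rng = np.random.default_rng(0)
# Two synthetic "fiber types" whose spectra vary along different bands
A = rng.normal([5, 0, 0, 0], [1.0, 0.1, 0.1, 0.1], size=(50, 4))
B = rng.normal([0, 5, 0, 0], [0.1, 1.0, 0.1, 0.1], size=(50, 4))
models = {"class_A": fit_pca(A, 1), "class_B": fit_pca(B, 1)}
print(simca_classify(np.array([4.8, 0.1, 0.0, 0.0]), models))  # class_A
```

A real SIMCA analysis would also set per-class residual thresholds so a sample can be rejected by every class, which is how the validation sets described above probe discriminating power.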
The apparent sub-class groupings within the same fiber class were investigated to determine whether they were caused by chemical processing residues, multipurpose finishes, or dyes.

Item Logistic regression with misclassified response and covariate measurement error: a Bayesian approach. (2007-12-04)
McGlothlin, Anna E.; Stamey, James D.; Seaman, John Weldon, 1956-; Statistical Sciences.; Baylor University. Dept. of Statistical Sciences.
In a variety of regression applications, measurement problems are unavoidable because infallible measurement tools may be expensive or unavailable. When modeling the relationship between a response variable and covariates, we must account for the uncertainty that is inherently introduced when one or both of these variables are measured with error. In this dissertation, we explore the consequences of and remedies for imperfect measurements. We consider a Bayesian analysis for modeling a binary outcome that is subject to misclassification. We investigate the use of informative conditional means priors for the regression coefficients. Additionally, we incorporate random effects into the model to accommodate correlated responses. Markov chain Monte Carlo methods are used to perform the necessary computations, and the deviance information criterion aids in model selection. Next, we consider data where measurements are flawed for both the response and the explanatory variables. Our interest is in the case of a misclassified dichotomous response and a continuous covariate that is unobservable but for which measurements are available on a surrogate. A logistic regression model is developed to incorporate the measurement error in the covariate as well as the misclassification in the response. The methods developed are illustrated through an example, and results from a simulation experiment illustrate the advantages of the approach.
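A minimal sketch of the likelihood adjustment at the core of such a model: when the binary response is recorded by a fallible instrument with sensitivity and specificity below 1, the probability of *observing* a success is a mixture of the true-model probabilities. The fixed values 0.9/0.95 below are illustrative assumptions; a fully Bayesian treatment would place priors on all unknowns rather than fixing them.

```python
import numpy as np

def inv_logit(eta):
    """Standard logistic link."""
    return 1.0 / (1.0 + np.exp(-eta))

def obs_prob(eta, sens, spec):
    """Probability of *observing* y = 1 when the true response may be
    misclassified: P(y_obs = 1) = sens * p + (1 - spec) * (1 - p),
    where p = P(y_true = 1) follows the usual logistic model."""
    p = inv_logit(eta)
    return sens * p + (1.0 - spec) * (1.0 - p)

def log_lik(beta, X, y, sens, spec):
    """Log-likelihood of the observed (fallible) binary responses."""
    q = obs_prob(X @ beta, sens, spec)
    return float(np.sum(y * np.log(q) + (1 - y) * np.log(1.0 - q)))

# Misclassification compresses the observable success probabilities into
# the interval (1 - spec, sens): even a subject with p near 1 is only
# recorded as a success with probability sens.
print(round(inv_logit(2.0), 3))            # true success probability
print(round(obs_prob(2.0, 0.9, 0.95), 3))  # what can actually be observed
```

Ignoring this adjustment and fitting an ordinary logistic model to the fallible responses attenuates the coefficient estimates, which is the bias the dissertation's Bayesian model corrects.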
Finally, we expand this model to incorporate random effects, resulting in a generalized linear mixed model for a misclassified response and covariate measurement error. We demonstrate the use of this model with a simulated data set.

Item Multivariate analyses of near-infrared and UV spectral data. (2009-07-01)
Dogra, Jody A.; Busch, Kenneth W.; Busch, Marianna A.; Chemistry and Biochemistry.; Baylor University. Dept. of Chemistry and Biochemistry.
Various chemometric analyses were applied to spectroscopic data with the goal of developing alternative methods that could be employed in government or industrial settings. With the concerns of these organizations in mind, the described methods are cost-effective and time-efficient. The first method is aimed at establishing time of death from skeletal remains, an issue that continues to be difficult for the forensic community. Following death, skeletal remains undergo changes in chemical composition, including the breakdown of protein and the loss of water. Near-infrared spectroscopy is sensitive to vibrations associated with both protein and water. In the described method, near-infrared reflectance measurements of aging porcine skeletal remains were correlated with postmortem interval (PMI). Initial studies were conducted to determine the optimum sampling orientation (cross-sectional or surface). Several chemometric approaches were investigated, but the best results were obtained with a scheme involving classification by partial least-squares discriminant analysis (PLS-DA) followed by segmented partial least-squares regression (PLSR). The method was evaluated with independent test sets; the optimized method was able to predict PMI with an average deviation of six days. A brief field study was also conducted and yielded similar results.
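The classify-then-regress scheme described above can be sketched with simple stand-ins: nearest-centroid classification in place of PLS-DA, and ordinary least squares in place of the per-segment PLSR models. The toy two-band "spectra" and PMI values below are assumptions for illustration only.

```python
import numpy as np

def classify_segment(x, centroids):
    """Stage 1: assign a spectrum to a PMI segment by nearest class
    centroid (a simple stand-in for the PLS-DA step)."""
    return min(centroids, key=lambda k: float(np.linalg.norm(x - centroids[k])))

def fit_segment_model(X, y):
    """Stage 2: per-segment least-squares fit with an intercept
    (a stand-in for the segmented PLSR models)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_pmi(x, centroids, models):
    """Classify first, then predict PMI with that segment's own model."""
    seg = classify_segment(x, centroids)
    c = models[seg]
    return seg, float(c[0] + c[1:] @ x)

# Toy two-band "spectra": early remains load on band 0, late on band 1.
X_early = np.array([[1.0, 0.1], [1.1, 0.0], [0.9, 0.2], [1.2, 0.1]])
y_early = 5.0 * X_early[:, 0]                 # PMI in days
X_late = np.array([[0.1, 1.0], [0.0, 1.2], [0.2, 0.9], [0.1, 1.1]])
y_late = 30.0 * X_late[:, 1]

centroids = {"early": X_early.mean(axis=0), "late": X_late.mean(axis=0)}
models = {"early": fit_segment_model(X_early, y_early),
          "late": fit_segment_model(X_late, y_late)}
print(predict_pmi(np.array([1.0, 0.1]), centroids, models))
```

Segmenting first lets each regression model specialize on a narrower PMI range, which is the apparent motivation for the two-stage design.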
The second study addresses a current analytical challenge faced by the pharmaceutical industry, namely assuring the enantiomeric purity of chiral active pharmaceutical ingredients (APIs). With the rising number of chiral drugs on the market, the analytical burden continues to increase. Ultraviolet absorption spectral data were correlated with enantiomeric composition by PLSR for solutions containing a chiral analyte and a chiral ionic liquid (IL) as a chiral selector. Test set evaluation gave average deviations of ±4.0–12 units of %D, depending on the analyte and chiral IL involved. Finally, a quality control analysis was demonstrated that follows a classification format in which a sample either meets or does not meet the specified requirement for enantiomeric purity. Test set evaluation gave 97% correct classifications for a threshold of 1% impurity.

Item Multivariate analysis of luminescence spectra as a means of determining postmortem interval. (2011-09-14)
Diamond, Patricia A.; Busch, Kenneth W.; Busch, Marianna A.; Chemistry and Biochemistry.; Baylor University. Dept. of Chemistry and Biochemistry.
Postmortem interval (PMI) is the time elapsed since a person died. Currently there is no accurate method for determining the PMI of skeletal remains. Existing methods are best suited for deciding whether a bone is of forensic interest, meaning less than fifty years old. This is a problem in extreme climates, specifically areas with high heat and high humidity, which accelerate decomposition. The objective of this study was to develop a method to accurately predict the PMI of skeletal remains through luminescence studies of the change in the intensity of the luminol reaction with skeletal remains over time. Previous research in the area demonstrated that a correlation can be found between PMI and the change in intensity over long periods of time.
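The intensity-PMI correlation described above suggests a calibrate-and-invert workflow: fit a curve of luminol intensity against known PMI, then read PMI off the curve for a new sample. The exponential-decay form and every number below are illustrative assumptions, not the study's actual model or data.

```python
import numpy as np

# Hypothetical calibration set: luminol chemiluminescence intensity
# recorded for remains of known PMI (in days).
pmi_days = np.array([1.0, 5.0, 10.0, 20.0, 40.0])
intensity = 100.0 * np.exp(-0.05 * pmi_days)   # assumed decay for illustration

# Linearize I = a * exp(-k * t) as log I = log a - k * t and fit a line.
slope, intercept = np.polyfit(pmi_days, np.log(intensity), 1)

def predict_pmi(i_new):
    """Invert the fitted decay curve: t = (log I - log a) / (-k)."""
    return float((np.log(i_new) - intercept) / slope)

print(round(predict_pmi(100.0 * np.exp(-0.05 * 15.0)), 2))  # 15.0
```

In practice the functional form would be chosen from the calibration data, and multivariate treatment of whole luminescence spectra (rather than a single intensity) is where the chemometric modeling enters.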
This research aims to demonstrate a similar correlation with PMI and to correctly predict the PMI of skeletal remains over much shorter age ranges.

Item A restriction method for the analysis of discrete longitudinal missing data. (2007-02-07)
Moore, Page Casey.; Seaman, John Weldon, 1956-; Statistical Sciences.; Baylor University. Dept. of Statistical Sciences.
Clinical trial endpoints are traditionally either physical or laboratory responses. However, such endpoints fail to reflect how patients feel or function in their daily activities. Missing data are inevitable in nearly every clinical trial, regardless of the amount of effort and pre-planning that originally went into a study. Many researchers resort to ad hoc methods (e.g., case deletion or mean imputation) when faced with missing data, which can lead to biased results. An alternative to these ad hoc methods is multiple imputation. Patient dropout in health-related quality of life (HRQoL) studies most often results from toxicity, disease progression, or therapeutic effectiveness. As a result, nonignorable (not missing at random, NMAR) missing data are the most common type found in HRQoL studies. Studies involving missing data with an NMAR mechanism are the most difficult to analyze, primarily for two reasons: a large number of potential models exist for these data, and the hypothesis of random dropout can be neither confirmed nor refuted. In this dissertation, methods for the analysis of discrete longitudinal clinical trial data considered to have a nonignorable missingness mechanism, under the commonly applied restriction of monotone dropout, were developed and evaluated. Monotone dropout, or attrition, occurs when responses are available for a patient until a certain occasion and missing for all subsequent occasions.
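The monotone pattern just defined is what makes sequential imputation tractable: if a patient is observed at visit j, they are observed at every earlier visit, so each visit can be imputed from the one before it. The sketch below is a single-imputation stand-in for the multiple imputation procedures studied in the dissertation, with hypothetical QoL scores.

```python
import numpy as np

def impute_monotone(Y):
    """Sequential regression imputation under monotone dropout: visit j
    is regressed on visit j-1 among subjects observed at both, and the
    fitted line fills in the dropouts column by column."""
    Y = Y.copy()
    for j in range(1, Y.shape[1]):
        obs = ~np.isnan(Y[:, j])   # monotone: observed at j => observed at j-1
        slope, intercept = np.polyfit(Y[obs, j - 1], Y[obs, j], 1)
        miss = np.isnan(Y[:, j])
        Y[miss, j] = intercept + slope * Y[miss, j - 1]
    return Y

# Four patients, three visits; the last patient drops out after visit 1.
scores = np.array([[1.0, 2.0, 3.0],
                   [2.0, 3.0, 4.0],
                   [4.0, 5.0, 6.0],
                   [3.0, np.nan, np.nan]])
print(impute_monotone(scores)[3])  # dropout's later visits filled in
```

Proper multiple imputation would add a random draw to each fill-in and repeat the process several times so that the analysis reflects imputation uncertainty; a deterministic fill-in like this one understates the variance.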
The purpose of this study is to investigate the performance of different imputation methods available to researchers for handling missing data where the parameters of interest are six QoL assessments scheduled for collection across six equally spaced visits. We evaluate the relative effectiveness of three commonly used imputation methods, along with three restriction methods and a newly developed restriction method, through a simulation study. The new restriction method is a straightforward technique that provides superior overall performance and much higher coverage rates relative to the other methods under investigation.

Item Selected topics in statistical discriminant analysis. (2007-02-07)
Ounpraseuth, Songthip T.; Young, Dean M.; Statistical Sciences.; Baylor University. Dept. of Statistical Sciences.
This dissertation consists of three selected topics in statistical discriminant analysis: dimension reduction, regularization methods, and imputation methods. In Chapter 2 we first derive a new linear dimension-reduction method to determine a low-dimensional hyperplane that preserves or nearly preserves the separation of the individual populations and the Bayes probability of misclassification. Next, we derive a new low-dimensional representation-space approach for multiple high-dimensional multivariate normal populations. Third, we develop a linear dimension-reduction method for quadratic discriminant analysis when the class population parameters must be estimated. Using a Monte Carlo simulation with several different parameter configurations, we compare our new methodology with two competing linear dimension-reduction procedures for statistical discrimination in terms of expected error rates. We find that under certain conditions, our new dimension-reduction method yields superior results for a majority of the configurations we consider.
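A familiar baseline for the kind of linear dimension reduction discussed above is the classical discriminant direction for two normal populations with a common covariance matrix: project onto w = S_pooled^-1 (xbar1 - xbar2) and classify in one dimension by the midpoint rule. The synthetic data below are assumptions for illustration, not the dissertation's configurations.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Classical one-dimensional reduction for two classes with a pooled
    covariance estimate: w = S_pooled^{-1} (xbar1 - xbar2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    Sp = ((n1 - 1) * np.cov(X1, rowvar=False)
          + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    return np.linalg.solve(Sp, m1 - m2)

def classify(x, w, X1, X2):
    """Anderson-style linear rule applied in the projected 1-D space:
    compare the projection of x to the midpoint of the projected means."""
    mid = w @ (X1.mean(axis=0) + X2.mean(axis=0)) / 2.0
    return 1 if w @ x > mid else 2

rng = np.random.default_rng(1)
X1 = rng.normal([2.0, 0.0, 0.0], 1.0, size=(40, 3))
X2 = rng.normal([0.0, 2.0, 0.0], 1.0, size=(40, 3))
w = fisher_direction(X1, X2)
print(classify(X1.mean(axis=0), w, X1, X2))  # 1
```

The interesting regime studied in the dissertation is when the feature dimension is large relative to the sample size, where pooled-covariance estimates degrade and reducing the dimension first can improve classification.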
In addition, we determine that in several configurations, classification performance is actually enhanced by our new feature-reduction method when the sample size is sufficiently small relative to the original feature-space dimension. In Chapter 3 we compare and contrast the efficacy of seven regularization methods for the quadratic discriminant function under a variety of parameter configurations, using the expected error rate to assess each regularized quadratic discriminant function. A two-parameter family of regularized class covariance-matrix estimators derived by Friedman (1989) yields superior classification results relative to its six competitors for the configurations, training-sample sizes, and original feature dimensions examined here. Finally, in Chapter 4 we consider the statistical classification problem for two multivariate normal populations with equal covariance matrices when the training samples contain observations missing at random. That is, we analyze the effect of missing-at-random data on Anderson's linear discriminant function. We use a Monte Carlo simulation to examine the expected probabilities of misclassification under several single and multiple imputation methods. The seven missing-data algorithms are: complete observation, mean substitution, expectation maximization, regression, predictive mean matching, propensity score, and MCMC. The regression, predictive mean matching, and propensity score multiple imputation approaches are, in general, superior to the other methods for the configurations and training-sample sizes we consider.

Item Statistical considerations in the analysis of multivariate Phase II testing. (2009-04-01)
Hetzer, Joel D.; Johnston, Dennis A.; Statistical Sciences.; Baylor University. Dept. of Statistical Sciences.
In medical diagnosis and treatment, many diseases are characterized by multiple measurable deviations in clinical parameters (e.g., physical or radiological differences) and laboratory parameters (biomarkers) from "healthy" levels. Each of these deviations is a symptom of the disease, often correlated with the other symptoms. In Phase II human trials, the level of a single symptom is often used as a surrogate for disease level. In multi-symptom diseases, all relevant symptoms often should be included in the evaluation, so a multivariate approach to Phase II analysis becomes critical. In this dissertation we formulate seven multivariate tests for use in multivariate Phase II studies. Each method is evaluated using metabolic syndrome, with data obtained from the public-domain NHANES III data set, as an example to train the algorithms and the associated tests.
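One natural candidate for this kind of multivariate Phase II comparison is the one-sample Hotelling T-squared statistic, which measures how far the mean symptom vector lies from a set of reference levels while accounting for correlation among symptoms. The sketch below uses hypothetical "healthy" reference levels for three correlated symptoms; it is an illustrative baseline, not necessarily one of the dissertation's seven tests.

```python
import numpy as np

def hotelling_t2(X, mu0):
    """One-sample Hotelling T^2: n times the squared Mahalanobis distance
    of the sample mean symptom vector from reference levels mu0. In a
    full analysis, (n - p) / (p * (n - 1)) * T^2 is referred to an
    F(p, n - p) distribution."""
    n, p = X.shape
    d = X.mean(axis=0) - mu0
    S = np.cov(X, rowvar=False)
    return float(n * d @ np.linalg.solve(S, d))

# Hypothetical trial: three correlated symptom measures per patient
# (e.g., glucose, systolic pressure, a biomarker), compared against
# assumed healthy reference levels.
rng = np.random.default_rng(7)
symptoms = rng.normal([5.4, 130.0, 1.4], [0.5, 10.0, 0.2], size=(30, 3))
healthy = np.array([5.0, 118.0, 1.1])
print(round(hotelling_t2(symptoms, healthy), 1))
```

Testing the whole symptom vector at once avoids the multiplicity and lost-power problems of running a separate univariate test per symptom, which is the core argument the abstract makes for a multivariate Phase II analysis.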