Browsing by Subject "Prediction"
Now showing 1 - 20 of 24
Item: A hydrograph-based prediction of meander migration (Texas A&M University, 2006-08-16). Wang, Wei.
Meander migration is a process in which flowing water erodes soil on one bank and deposits it on the opposite bank, creating a gradual shift of the bank line over time. For bridges crossing such a river, the soil foundation of the abutments may be eroded away before the design lifetime is reached. For highways parallel and close to such a river, the whole road may be eaten away. This problem costs TxDOT millions of dollars in protecting affected bridges and highway embankments. This research aims to develop a methodology that predicts the possible migration of a meander over the design life of the bridges crossing it and the highways parallel to it. The approaches used are experimental tests, numerical simulation, migration modeling, risk analysis, and development of a computer program. Experimental tests simulate river flow in a controlled environment: influential parameters can be chosen, adjusted, and varied systematically to quantify their influence on the problem. The role of numerical simulation is to model the flow field and the stress field at the soil-water interface. Migration modeling integrates the results of the experimental tests and numerical simulations into a model that can make predictions. The Hyperbolic Model is used, and its two major components, the Mmax equation and the τmax equation, are developed. Uncertainties in the parameters used for prediction make a deterministic prediction less meaningful, so risk analysis is used to make the prediction with a probabilistic approach. These procedures are too laborious to apply by hand calculation, so a user-friendly computer program was developed to automate them. Experiments performed show that the Hyperbolic Model matches the test data well and is suitable for the prediction of meander migration. Based on analysis of shear stress data from the numerical simulations, the τmax equation was derived for the Hyperbolic Model. Extensive work on the simplification of river geometry produced a working solution: the geometry of river channels can be automatically simplified into arcs and straight lines. The future hydrograph is critical to the risk analysis; tens of thousands of hydrographs bearing the same statistical characteristics as the historical record can be generated. The final, directly usable product, the MEANDER program, consists of 11,600 lines of C++ code and 2,500 lines of Matlab code, not including the risk analysis part. The computer program is ready for practicing engineers to make predictions based on the findings of this research.
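The abstract does not give the Hyperbolic Model's functional form; a common hyperbolic growth law consistent with an initial migration rate and an asymptotic maximum distance is M(t) = t/(a + b·t), where the initial rate is 1/a and Mmax = 1/b. A minimal curve-fit sketch under that assumption (all data values are invented):

```python
# Illustrative sketch of a hyperbolic migration fit; the functional form and
# data below are assumptions for illustration, not taken from the thesis.
import numpy as np
from scipy.optimize import curve_fit

def hyperbolic(t, a, b):
    # M(t) = t / (a + b*t): initial rate = 1/a, asymptote M_max = 1/b
    return t / (a + b * t)

t = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 50.0])   # years (made up)
m = np.array([0.9, 1.6, 3.1, 4.4, 5.6, 6.7])      # migration, m (made up)

(a, b), _ = curve_fit(hyperbolic, t, m, p0=(1.0, 0.1))
print(f"initial rate ~ {1/a:.2f} m/yr, M_max ~ {1/b:.2f} m")
```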
Item: A methodology for memory chip stress levels prediction (Texas A&M University, 2006-10-30). Sharma, Kartik.
The reliability of electronic components plays an important role in the proper functioning of electronic devices. Manufacturers test electronic components before they reach end users; still, electronic devices sometimes fail due to undue stresses inside microelectronic components such as memory chips, microcontrollers, and resistors. These stresses can be caused by variation in the operating voltage, variation in the usage frequency of a particular chip, and other factors. This variation leads to variation in chip temperature, which is evident in the thermal profiles of these chips.

In this thesis, two different kinds of stress in an electronic board were studied: signal stress, based on variation in the duty cycle/frequency of chip usage, and voltage stress. Memory chips were subjected to these stresses, causing changes in heating rate that were captured with an infrared camera. The data were extracted and plotted to obtain heating-rate curves. The experiment was repeated for a large number of chips, and the resulting average heating rates were used to build a neural network (NN) model. The back-propagation algorithm was used because it converges to a solution faster than the other algorithms considered. To develop a prediction model, the data set was split into two parts: two-thirds was used to build the model and the remaining one-third to evaluate it. The designed model predicts the stress levels in the chips based on their heating rates. The results suggest that:
1. Chips stressed at different stress levels differ in heating rate.
2. The model predicts stress with high accuracy (greater than 90%).
3. The model is robust, i.e., it yields efficient results even in the presence of noise in the data.
4. A generic methodology can be proposed based on the experiments.
This work is a step toward a predictive model for a complete electronic device that can predict the stress level on any component and provide an opportunity to either protect the data or remove defective components before they fail.
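As a rough illustration of the back-propagation classifier and two-thirds/one-third split described above, here is a minimal sketch; the feature layout, class count, and data are invented, not taken from the thesis:

```python
# Sketch of a back-propagation stress-level classifier with a 2/3 - 1/3 split.
# Features (heating-rate samples) and labels are invented for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))      # e.g. 10 heating-rate features per chip
y = rng.integers(0, 3, size=300)    # 3 hypothetical stress levels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```

With real heating-rate curves in place of the random features, the held-out score would play the role of the greater-than-90% accuracy reported above.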
Item: A prediction of meander migration based on large-scale flume tests in clay (2009-05-15). Park, Namgyu.
Meander migration is a complex and dynamic process of the lateral movement of a river due to erosion on one bank and deposition on the opposite bank. As a result, the channel migrates laterally, which can be a major concern for the safety of bridges during their 75-year life span. Although several models exist for predicting the meander migration of a river, none of them is based on physical model tests on a specific soil type. A total of eight flume tests were conducted to develop a prediction equation for meander migration in clay. The measured migration rates follow a hyperbolic function, and the spatial distribution of the maximum migration distance is fitted with the Pearson IV function. The proposed equations for the initial migration rate and the maximum migration distance, obtained by multiple regression, are validated against the laboratory data. A new risk-analysis methodology is developed to process a number of predicted channel locations, each based on a future hydrograph generated such that all hydrographs have the same probability of occurrence. As output from the risk analysis, a CDF map is created for the whole river, representing the general trend of migration along with the probability associated with each new location of the river. In addition, a separate screen is generated with a CDF plot for a given bridge direction, so that bridge engineers can read the migration distance along the bridge corresponding to a target risk level (e.g., 1%).

The components newly developed through this research are incorporated with the other components into the MEANDER program, a stand-alone program that is the final outcome of the research team. A verification study of the MEANDER program was conducted with full-scale field data from the Brazos River at SH 105, Texas. The prediction results matched the measured field data quite well; however, a more extensive verification study at other sites is highly recommended.

Item: A study in the predictive ability of the Draw-A-Person Test (Texas Tech University, 1969-05). Corbet, Tad.
Psychologists are making increasing use of human figure drawings as a diagnostic tool. The Draw-A-Person Test (DAP), according to Sundberg (1961), ranks second only to the Rorschach in use as a tool for psychological assessment. While there has been much research on some of the projective tests, such as the Rorschach, there has been comparatively little research with the DAP. What is more discouraging is the fact that, in the research that has been done, there is little significant data upon which the clinician can base his judgments.

Item: CPU performance in the age of big data: a case study with Hive (2016-12). Shulyak, Alexander Cole; John, Lizy Kurian.
Distributed SQL Query Engines (DSQEs), like Hive, Shark, and Impala, have become the de facto database setup for decision support systems with large database sizes. Unlike single-threaded counterparts like MySQL, DSQEs experience inefficiencies related to the algorithm, code base, OS, and CPU microarchitecture that limit throughput despite the speedup from distributed execution. In my thesis, I present a detailed performance analysis of a DSQE called Hive, comparing it to MySQL, a single-threaded database application. Hive has difficulty converting queries into a set of MapReduce jobs for distributed execution, and it also experiences a startup phase that is a significant overhead for short-running queries. Additionally, both Hive and MySQL, like other server applications, experience high L1I miss rates due to a large code footprint. However, because MySQL is algorithmically efficient and traverses the database at a faster rate, it incurs a larger back-end bottleneck from LLC misses, which hides the front-end bottleneck; in contrast, Hive does not hide its high L1I cache miss rate with back-end stalls. The higher context-switch rates experienced by multi-process Hive setups also thrash the first-level caches, further inflating the L1I cache miss rate. To address this microarchitectural inefficiency, I propose an instruction prefetch mechanism called Runahead Prefetch. It is similar to previously proposed branch-prediction-based prefetchers [19], but designed to extend modern Intel microarchitectures easily. Despite newer instruction prefetch mechanisms that discount the potential of branch-prediction-based prefetching [8] [9] [12], I show Runahead Prefetch can eliminate 92% of L1I misses and 96% of icache stalls on average, given modern branch misprediction rates and sufficient runahead.
Item: Development of beliefs about chance and luck (2011-12). Cornelius, Chelsea Ann; Woolley, Jacqueline D.; Bigler, Rebecca S.; Legare, Cristine H.
Children ages 5 and 8 dropped a marble into a box and made predictions about which of two doors the marble would exit. Participants provided explanations and certainty ratings for each of their predictions. A lucky charm was used in a second round of the game, in which half of the participants experienced an increase in success and half did not.

Results indicated that older children were more cognizant of the chance nature of the game; however, both age groups exhibited misconceptions about the predictability of chance outcomes. When asked to explain their overall success in Round 2, only 8-year-olds who experienced an increase in success and a perfect success rate reliably endorsed the lucky charm. Results are discussed with reference to the literature on children's and adults' understanding of chance. We also discuss developmental patterns in the use of luck as an explanatory tool.

Item: Error analysis for randomized uniaxial stretch test on high strain materials and tissues (Texas A&M University, 2006-08-16). Jhun, Choon-Sik.
Many different types of hyperelastic models for high strain materials and biotissues have been suggested since the 1940s without being validated. There is no agreement on these models, and no model is better than another, because of ambiguity; the ambiguity exists because the error analysis has not yet been done (Criscione, 2003). The error analysis is motivated by the fact that no physical quantity can be measured without some degree of uncertainty. Inelastic behavior is inevitable for high strain materials and biotissues, and the validity of a model should be justified by understanding the uncertainty due to it. We applied fundamental statistical theory to data obtained from randomized uniaxial stretch-controlled tests, employing the goodness-of-fit statistic (R²) and a test of significance (t-test). We initially presumed that the factors giving rise to the inelastic deviation are time spent testing, stretch rate, and stretch history, and we found that these factors characterize the inelastic deviation in a systematic way. A large inelastic deviation was found at a stretch ratio of 1.1 for both specimens. The significance of this finding is that the inelastic uncertainties in the low stretch ranges of rubber-like materials and biotissues are primarily related to entropy. This is why the strain energy can hardly be determined by experimentation at low strain ranges, and why there has been a deficiency in understanding the exclusive nature of the strain energy function at low strain ranges of rubber-like materials and biotissues (Criscione, 2003). We also answered questions about the significance, effectiveness, and differences of the presumed factors. Lastly, we checked the predictive capability by comparing unused deviation data to the predicted deviation. To check whether any variables had been missed, we defined the prediction deviation as the difference between the observed deviation and the point-forecast deviation. We found that the prediction deviation is off in a random way: what we missed is random, meaning no factors needed to predict the degree of inelastic deviation were omitted from our fitting.
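The abstract names a goodness-of-fit statistic (R²) and a t-test; a minimal sketch of both computations, on invented observed/predicted deviation data:

```python
# Sketch of the R^2 and t-test checks named above; the observed/predicted
# deviation values are invented for illustration.
import numpy as np
from scipy import stats

observed  = np.array([0.12, 0.18, 0.25, 0.31, 0.40, 0.47])   # made up
predicted = np.array([0.10, 0.20, 0.24, 0.33, 0.38, 0.49])   # made up

ss_res = np.sum((observed - predicted) ** 2)
ss_tot = np.sum((observed - observed.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

t_stat, p_val = stats.ttest_rel(observed, predicted)  # paired t-test
print(f"R^2 = {r2:.3f}, t = {t_stat:.3f}, p = {p_val:.3f}")
```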
Item: Observation Method to Predict Meander Migration and Vertical Degradation of Rivers (2014-03-05). Montalvo Bartolomei, Axel M.
Meander migration and vertical degradation of a river bed are processes that have been studied for years. These two erosion-controlled processes consist of the gradual change of the geometry of the river as flowing water erodes the soil. This erosion may cause a shift that threatens existing bridges, highways, and useful land. Different methods have been proposed to predict the behavior of rivers with respect to these processes.

Many of these methods predict the migration rate and the final position of the bankline or centerline of a river, assuming that the erosion rate is constant over a certain time. However, most of them ignore one of the three general factors in meander migration and vertical degradation: geometry, flow, and soil. There is therefore a need for a method that can accurately predict the amount of erosion that may occur in rivers. Six sites in Texas were selected for this project: four of the rivers have meander migration problems, two have vertical degradation problems, and each has shown erosion that threatens bridges, roads, or farmland. A new method, called the Observation Method, was developed to predict meander migration and vertical degradation using geometry, water flow, and soil erodibility. Aerial photos and maps from different years were obtained to study the change in the geometry of the rivers. River hydrographs were obtained from the U.S. Geological Survey to estimate the river velocity from daily flow. Soil samples from each site were tested in the laboratory using the Erosion Function Apparatus. Code was written in MATLAB and Excel to estimate the critical velocity, using a model based on the erosion function obtained from the erosion tests. Knowing where the river was, and its history, is essential for predicting where the river will be. The erosion at each of the six sites was estimated using the model, and predictions were made for 10 years after the last observation in each case. This method proved to be a simple and quick way to obtain results for the movement of one point of the river.
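A toy sketch of the bookkeeping the Observation Method implies, comparing hydrograph velocities against a critical velocity from erosion tests; the erosion function is assumed linear above the threshold, and all constants and flows are invented:

```python
# Toy velocity-threshold erosion bookkeeping in the spirit of the Observation
# Method; the erosion function, critical velocity, and hydrograph are invented.
import numpy as np

v_crit = 0.6      # critical velocity, m/s (assumed)
k = 2.0           # erosion-rate coefficient, mm/hr per (m/s) (assumed)

daily_velocity = np.array([0.3, 0.5, 0.9, 1.4, 0.8, 0.4])   # m/s (made up)
hours_per_day = 24.0

# Erode only on days when velocity exceeds the critical velocity.
rate_mm_per_hr = k * np.clip(daily_velocity - v_crit, 0.0, None)
total_erosion_mm = np.sum(rate_mm_per_hr * hours_per_day)
print(f"cumulative erosion ~ {total_erosion_mm:.0f} mm")
```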
Item: Performance Projections of HPC Applications on Chip Multiprocessor (CMP) Based Systems (2012-07-16). Shawky Sharkawi, Sameh Sh.
Performance projections of High Performance Computing (HPC) applications onto various hardware platforms are important for hardware vendors and HPC users. The projections aid hardware vendors in the design of future systems and help HPC users with system procurement and application refinements. In this dissertation, we present an efficient method to project the performance of HPC applications onto Chip Multiprocessor (CMP) based systems using widely available standard benchmark data. The main advantage of this method is its use of published data about the target machine; the target machine need not be available. With the current trend in HPC platforms shifting toward cluster systems with chip multiprocessors (CMPs), efficient and accurate performance projection becomes a challenging task; CMP-based systems are typically configured hierarchically, which significantly impacts the performance of HPC applications. The goal of this research is to develop an efficient method to project the performance of HPC applications onto systems that utilize CMPs. For efficiency, the projection methodology is automated (projections are done with a tool) and fast (with small overhead). Our method, the surrogate-based workload application projection method, utilizes surrogate benchmarks to project HPC application performance on target systems, with the computation component of an application projected separately from the communication component. The methodology was validated on a variety of systems utilizing different processor and interconnect architectures with high accuracy and efficiency: the average projection error on three target systems was 11.22 percent, with a standard deviation of 1.18 percent, for twelve HPC workloads.

Item: Predicting required maintenance and repair funding based on standard facility data elements (Texas Tech University, 2007-05). Tolk, Janice N.; Collins, Terry R.; Simonton, James L.; Smith, Milton L.; Matis, Timothy I.
Government entities and educational institutions have billions of dollars invested in facility portfolios designed to supply services to those they support. Maintaining these portfolios requires continuous investment to keep them viable and able to meet their intended mission. In the past fifteen years, owners of these portfolios have realized that the facilities have degraded to the point that they may not be usable, that they may require a significant investment to return to full service, and that they require a continuous financial commitment to maintain. Both government and educational institution managers have realized that they allowed this situation to occur through chronic underinvestment in annual maintenance; now they face a large backlog of deferred maintenance and potential loss of mission. This research investigates the underlying cause of chronic underfunding of the annual maintenance and repair of large facility portfolios, reviews the literature for existing methods of estimating annual maintenance and repair funding, and develops a model that a facility portfolio manager can use, based on facility attributes commonly found in a condition assessment program. The research also determines the effect of varying facility portfolio size and facility model types on the developed model, and compares the model to the three models most often cited in the literature. Using multiple regression analysis, a prediction equation was derived for the research portfolio; it correlates well with one of the models cited in the literature but not with the other two. Further, the research found that "fine tuning" a prediction equation to a specific facility portfolio yields the best results, although a more generic model is useful for an order-of-magnitude estimate.
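A minimal sketch of a multiple-regression funding model of the kind described above; the choice of facility attributes and all figures are invented for illustration:

```python
# Sketch of a multiple-regression maintenance-and-repair funding model; the
# facility attributes and funding figures below are invented for illustration.
import numpy as np

# Columns: intercept, age (yr), area (k sq ft), condition index (assumed set)
X = np.array([
    [1, 10,  50, 85],
    [1, 25, 120, 70],
    [1, 40,  80, 55],
    [1, 15, 200, 90],
    [1, 55,  60, 40],
], dtype=float)
y = np.array([120, 480, 510, 390, 620], dtype=float)  # annual M&R, $k (made up)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)          # least-squares fit
new_facility = np.array([1, 30, 100, 65], dtype=float)
print(f"predicted annual M&R ~ ${coef @ new_facility:.0f}k")
```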
Item: Prediction of Asphalt Mixture Compactability from Mixture, Asphalt, and Aggregate Properties (2010-07-14). Muras, Andrew J.
The underlying purpose of any pavement is to provide a safe, smooth, and reliable surface for its intended users. For hot mix asphalt (HMA) pavements, this includes producing a surface that resists the principal HMA distress types: permanent deformation (rutting) and fatigue damage (cracking). To better protect against these distress types, there have recently been changes in HMA mixture design practice. These changes have had the positive effect of producing more damage-resistant mixtures, but they have also produced mixtures that require more compaction effort to reach required densities. It is therefore important to understand which properties of an HMA mixture contribute to its compactability. This study presents an analysis of the correlation between HMA mixture properties and laboratory compaction parameters for the purpose of predicting compactability. Mixture property data were measured for a variety of mixtures; these mixtures were compacted in the laboratory and compaction parameters were collected.

A statistical analysis was implemented to correlate the mixture data to the compaction data for the purpose of predicting compactability. The resulting model performs well at predicting compactability for mixtures similar to those used to build it, and it reveals some of the mixture properties that influence compaction. The analysis showed that the binder content of an HMA mixture and the slope of the aggregate gradation curve are important in determining the compactability of a mixture.

Item: Prediction of automotive turbocharger nonlinear dynamic forced response with engine-induced housing excitations: comparisons to test data (2009-05-15). Maruyama, Ashley Katsumi.
The trend in passenger vehicle internal combustion (IC) engines is toward smaller, more fuel-efficient engines with power outputs comparable to those of large displacement engines. One way to accomplish this is with turbochargers (TCs) supported on semi-floating ring bearings (SFRBs). This thesis presents progress on the nonlinear modeling of rotor-bearing systems (RBSs) by including engine-induced TC housing excitations. Test data collected from an engine-mounted TC unit operating to a top speed of 160 krpm (engine speed = 3,600 rpm) validate the nonlinear predictions of shaft motion. Engine-induced housing excitations are input into the nonlinear time-transient rotor model as Fourier coefficients (and corresponding phase angles) derived from measured TC center housing accelerations. Analysis of the recorded housing accelerations shows the IC engine induces TC motions with a broad range of subsynchronous frequencies, rich in engine (e) superharmonics; in particular, the 2e and 4e vibration frequencies contribute greatly to housing motion. Most importantly, the analysis reveals that the TC center and compressor housings do not vibrate as a rigid body. Eigenvalue analysis of the TC system evidences four damped natural frequencies within the TC operating speed range; however, only the highest (first elastic mode, f = 2,025 Hz, ζ = 0.14) is lightly damped (critical speed = 150 krpm). Predicted linear and nonlinear imbalance response amplitudes increase with TC shaft speed, with linear predictions agreeing with test data at high shaft speeds. The differences between predictions and test data are attributed to inexact knowledge of the actual TC rotor imbalance distribution. For the nonlinear analysis, predicted shaft motions that do not account for housing accelerations show the TC is stable (i.e., no subsynchronous whirl) at all but the lowest shaft speeds (<70 krpm). However, predicted shaft motions accounting for housing accelerations, like the test data, reveal TC motions rich in subsynchronous activity: engine-induced housing accelerations clearly have a significant impact on TC shaft motions. Predicted total shaft motions show good agreement with test data, as do predicted nonlinear subsynchronous amplitudes and peak shaft amplitudes. However, the nonlinear predictions only show TC shaft vibrations attributed to engine order frequencies below 6e, whereas the test data evidence TC vibrations due to order frequencies greater than 6e. Overall, the nonlinear predictions and test data illustrate the importance of accounting for engine-induced housing vibrations in the design and operation of TC systems, and the good agreement between predictions and test data serves to validate the rotor model. The tools developed will aid a TC manufacturer in reducing development time and expenditures.
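The housing excitations above enter the rotor model as Fourier coefficients and phase angles derived from measured accelerations; a minimal sketch of extracting such coefficients at the 2e and 4e engine orders with an FFT (the acceleration signal here is synthetic, not test data):

```python
# Sketch: extract Fourier coefficients/phases of engine-order harmonics from a
# housing acceleration signal; signal content and sample rate are invented.
import numpy as np

fs = 10_000.0                 # sample rate, Hz (assumed)
f_engine = 3600.0 / 60.0      # engine speed 3,600 rpm -> 60 Hz
t = np.arange(0, 1.0, 1.0 / fs)
accel = (0.8 * np.sin(2 * np.pi * 2 * f_engine * t + 0.4)    # 2e component
         + 0.3 * np.sin(2 * np.pi * 4 * f_engine * t - 1.1)  # 4e component
         + 0.05 * np.random.default_rng(0).normal(size=t.size))

spec = np.fft.rfft(accel) / (t.size / 2)     # scale bins to amplitude
freqs = np.fft.rfftfreq(t.size, 1.0 / fs)
for order in (2, 4):                         # engine-order harmonics
    i = np.argmin(np.abs(freqs - order * f_engine))
    print(f"{order}e: amplitude {abs(spec[i]):.3f}, "
          f"phase {np.angle(spec[i]):.2f} rad")
```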
Item: Prediction of end-to-end single flow characteristics in best-effort networks (Texas A&M University, 2005-08-29). Shukla, Yashkumar Dipakkumar.
The nature of user traffic in coming years will become increasingly multimedia-oriented, with much more stringent Quality of Service (QoS) requirements. The current generation of the public Internet does not provide strict QoS guarantees, and providing QoS for multimedia applications has been a difficult and challenging problem. Developing predictive models for best-effort networks like the Internet would be beneficial for a number of technical issues, such as network bandwidth provisioning and congestion avoidance/control, to name a few. The immediate motivation for creating predictive models is to improve the QoS perceived by end users in real-time applications such as audio and video. This research develops models for single-step-ahead and multi-step-ahead prediction of end-to-end single flow characteristics in best-effort networks, and also studies the performance of path-independent predictors. Empirical predictors are developed using simulated traffic data obtained from ns-2 as well as actual traffic data collected from PlanetLab. The linear system identification models Auto-Regressive (AR) and Auto-Regressive Moving Average (ARMA) and the non-linear Feed-forward Multi-layer Perceptron (FMLP) are used to build the predictive models. Accumulation is chosen as the signal with which to model the end-to-end single flow characteristics; because the raw accumulation signal is extremely noisy, the moving average of the accumulation is used for prediction. The developed predictors perform accurate single-step-ahead predictions. However, as the multi-step-ahead prediction horizon is increased, the models do not perform as accurately as in the single-step-ahead case; acceptable multi-step-ahead predictors for horizons up to 240 msec have been obtained using actual traffic data.
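A minimal sketch of a single-step-ahead autoregressive (AR) predictor of the kind named above, fit by least squares; the series and the model order are invented:

```python
# Sketch of a single-step-ahead AR predictor on a smoothed signal, in the
# spirit of the AR models named above; the series and order are invented.
import numpy as np

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=500))   # stand-in for the smoothed accumulation
p = 4                                 # AR order (assumed)

# Least-squares fit of x[t] ~ a1*x[t-1] + ... + ap*x[t-p]
rows = np.array([x[t - p:t][::-1] for t in range(p, len(x))])
coeffs, *_ = np.linalg.lstsq(rows, x[p:], rcond=None)

one_step_ahead = coeffs @ x[-1:-p - 1:-1]   # predict the next sample
print(f"predicted next value: {one_step_ahead:.3f}")
```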
Item: Prediction of gas-hydrate formation conditions in production and surface facilities (Texas A&M University, 2006-10-30). Ameripour, Sharareh.
Gas hydrates are a well-known problem in the oil and gas industry and cost millions of dollars in production and transmission pipelines. To prevent this problem, it is important to predict the temperature and pressure at which gas hydrates will form. Of the thermodynamic models in the literature, only a couple can predict the hydrate-formation temperature or pressure for complex systems that include inhibitors. I developed two simple correlations for calculating the hydrate-formation pressure or temperature for single components or gas mixtures. These correlations are based on over 1,100 published data points of gas-hydrate formation temperatures and pressures with and without inhibitors. The data include samples ranging from pure hydrate formers such as methane, ethane, propane, carbon dioxide, and hydrogen sulfide to binary, ternary, and natural gas mixtures. I used the Statistical Analysis Software (SAS) to find the best correlations among variables such as the specific gravity and pseudoreduced pressure and temperature of the gas mixture, the vapor pressure and liquid viscosity of water, and the concentrations of electrolytes and thermodynamic inhibitors.

These correlations are applicable to temperatures up to 90°F and pressures up to 12,000 psi. I tested the capability of the correlations for aqueous solutions containing electrolytes such as sodium, potassium, and calcium chlorides at less than 20 wt%, and inhibitors such as methanol at less than 20 wt% and ethylene glycol, triethylene glycol, and glycerol at less than 40 wt%. The results show an average absolute percentage deviation of 15.93 in pressure and an average absolute temperature difference of 2.97°F. Portability and simplicity are further advantages of these correlations, since they can be applied with a simple calculator. The results are in excellent agreement with the experimental data in most cases, and in some cases better than the results from commercial simulators. These correlations provide guidelines to help users forecast gas-hydrate-forming conditions for most systems of hydrate formers, with and without inhibitors, and to design remediation schemes such as:
- increasing the operating temperature by insulating the pipelines or applying heat;
- decreasing the operating pressure when possible;
- adding a required amount of an appropriate inhibitor to reduce the hydrate-formation temperature and/or increase the hydrate-formation pressure.

Item: A predictive model for sand production in poorly consolidated sands (2010-12). Kim, Sung Hyun, 1983-; Sharma, Mukul M.; Prodanovic, Masa.
This thesis presents a model of the sand production process that allows us to predict the stability of wellbores and perforation tunnels as well as the mass of sand produced. Past analytical, numerical, and empirical models of material failure and erosion mechanisms were analyzed. The sand production model incorporates shear and tensile failure mechanisms, and a criterion for sand erosion in failed sand was proposed based on a force-balance calculation at the sand face. It is shown that failure, post-failure sand mechanics, and flow-dominated erosion mechanisms are all important in the sand production process. The model has a small number of required input parameters that can be measured directly in the lab and does not require empirical correlations for determining sand erosion. The model was implemented in a numerical simulator, and three experiments using different materials were simulated to test it. The model-generated results successfully matched the sand production profiles of the experiments; when the post-failure behavior of the materials was well known, the match between simulation and experiment was excellent. Sensitivity studies on the effects of mechanical stresses, flow rates, cohesion, and permeability show qualitative agreement with experimental observations. In addition, the effect of two-phase flow was presented to emphasize the importance of the water-weakening of the sand; these results show that catastrophic sand production can occur following water breakthrough. Finally, increasing sand cohesion through the use of sand consolidation chemicals was shown to be an effective strategy for preventing sand production.
Item: Single station Doppler tracking for satellite orbit prediction and propagation (2015-05). Dykstra, Matthew C.; Fowler, Wallace T.; Lightsey, E. Glenn.
Presently, there are two main methods of launching a cube satellite into Earth orbit. The first is to purchase a secondary payload slot on a major launch vehicle. In the second, the satellite is first transported via a major launch vehicle to the International Space Station, where it is loaded into one of two deployment mechanisms and deployed at a specified time. In either case, the satellite's initial orbit is not accurately known, which poses a position-uncertainty problem for ground operators. To solve this problem, a satellite tracking algorithm was developed that uses an initial two-line element set for coarse orbit prediction, followed by Doppler measurements for continuous processing and updating. The system was tested using simulated data. The analysis showed that this low-cost, scalable system will satisfy the tracking requirements of many cube satellite missions, including current missions at the University of Texas.
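The physics behind Doppler tracking is the narrowband relation f_rx ≈ f_tx(1 − v_r/c) between range rate and received frequency; a small sketch (the beacon frequency and range rates are assumptions, not mission values):

```python
# Sketch of the narrowband Doppler relation f_rx ~ f_tx * (1 - v_r / c) used
# in Doppler tracking; beacon frequency and range rates are invented.
C = 299_792_458.0    # speed of light, m/s
F_TX = 437.5e6       # hypothetical UHF beacon frequency, Hz

def doppler_shift(range_rate_mps: float) -> float:
    """Received-minus-transmitted frequency for a given range rate (m/s).
    Positive range rate (receding) gives a negative shift."""
    return -F_TX * range_rate_mps / C

for v_r in (-7000.0, 0.0, 7000.0):   # approach, closest approach, recede
    print(f"range rate {v_r:+7.0f} m/s -> shift {doppler_shift(v_r):+8.0f} Hz")
```

Matching a sequence of such measured shifts against shifts predicted from the propagated two-line element set is what lets the ground station refine the coarse orbit.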
Item: Software defect data: predictability and exploration (2006-12). Kulkarni, Aniruddha P.; Hewett, Rattikorn; Shin, Michael; Denton, Jason.
Software defect reports have been prominently used in reliability modeling. Data about the defects found during software testing are recorded in software defect reports, or bug reports. The data include the number of defects found at various testing stages, the complexity and severity of each defect, the component to which it belongs, the tester, and the person fixing it. Reliability models mainly use the number of defects and the corresponding times to predict the remaining number of defects. This thesis proposes an empirical approach to systematically elucidate useful information from software defect reports by (1) employing a data exploration technique that analyzes relationships between the software quality of different releases using appropriate statistics, and (2) constructing predictive models for forecasting the time to fix defects using existing machine learning and data mining techniques. This work differs from traditional software reliability in two ways. First, it aims to predict the time to fix defects, as opposed to the remaining number of defects. While the latter gives a useful measure of software quality, in practice it cannot be used directly for development planning, since defect count is not linear with respect to the time and resources required; prediction of the time to fix defects, by contrast, can be used directly to help schedule and manage software activities. Second, while reliability models are mainly based on a small number of defect attributes with numerical values, the proposed approach extends the use of defect data to more of the relevant attributes, whose values can be both quantitative and qualitative. To illustrate the approach, we present an empirical study of a software defect report collected during the testing of a large medical software system. For data exploration, we use defects found per component and investigate relationships between defects in modules before and after release. For building predictive models, we apply several well-established machine learning and data mining algorithms, including the decision tree learner, the Naive Bayes learner, and neural networks with back-propagation learning. The average results obtained from these algorithms are compared to illustrate the robustness of the proposed approach for predicting the time to fix defects. The results are promising, with the top-performing model having an average accuracy of 93.5%.

Item: Teacher Certification Exams: Predicting Failure on the TExES History (8-12) Content Exam (A Nonparametric Approach using Classification Trees) (2011-05). Gard, Dwight R.; Simpson, Douglas J.; Murray, John P.; Wang, Eugene W.; Tipton, Pamela E.
Previous research on teacher certification in Texas focused primarily on the Pedagogy and Professional Responsibilities exam, an exam that all teacher candidates must pass regardless of their content area. Few studies have explored which variables are useful for predicting the outcome of the TExES content-area certification exams, a major gap in the literature. Because of its high failure rate, this study focused on identifying factors influential in predicting failure on the TExES History (8-12) certification exam. A convenience sample was used, restricted to those who had taken the TExES History (8-12) exam from 2002 to 2008 (n = 181). The study uses an exploratory data design with classification trees, a nonparametric statistical technique often associated with data mining. It differs from previous studies in two important respects: (a) it includes a much wider range of variables, and (b) it builds predictive models with nonparametric classification tree methodology. Using the proportional chance criterion and Press' Q to assess significance, the models were statistically significant (p < .05), indicating that they predict outcomes well beyond what would be expected by chance. Because classification trees produce a set of decision rules that can be depicted graphically, a model based on a decision tree paradigm is more intuitive and more easily interpreted and implemented than regression methods. Although classification trees are not widely used in social science research, their success in the current study suggests that they can be an effective nonparametric alternative to the more traditional multiple regression and logistic regression methods, and the study gives researchers a glimpse of their capabilities.
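Press' Q, used above to assess significance, compares the observed number of correct classifications against chance: Q = (N − nK)²/(N(K − 1)), referred to a chi-square distribution with one degree of freedom. A small sketch (N = 181 and K = 2 follow the study's setup; the correct-classification count is invented):

```python
# Press' Q significance check for a classification model; the number of
# correctly classified cases below is invented for illustration.
from scipy.stats import chi2

N = 181   # total cases (the study's sample size)
n = 140   # correctly classified cases (made up)
K = 2     # groups: pass / fail

q = (N - n * K) ** 2 / (N * (K - 1))
critical = chi2.ppf(0.95, df=1)
print(f"Press' Q = {q:.2f}, chi-square(1) critical at .05 = {critical:.3f}")
print("better than chance" if q > critical else "not better than chance")
```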
Item: Temporal modeling of crowd work quality for quality assurance in crowdsourcing (2015-12). Jung, Hyun Joon; Lease, Matthew A.; Mooney, Raymond; Bennett, Paul; Fleischmann, Kenneth; Wallace, Byron C.
While crowdsourcing offers potential traction on data collection at scale, it also poses new and significant quality concerns. Beyond the obvious issue of any new methodology being untested and often suffering initial growing pains, crowdsourcing has faced a very particular criticism since its inception: given the anonymity of crowd workers, it is questionable whether we can trust their contributions as much as work completed by trusted workers. Recent studies have proposed a variety of methods to relieve this concern. However, while temporal behavioral patterns can be discerned underlying real crowd work, prior studies have typically modeled worker performance under the assumption that a sequence of model variables is independent and identically distributed (i.i.d.). This dissertation focuses on the measurement and prediction of crowd work quality in light of its temporal properties. To better model such temporal worker behavior, we present a time-series prediction model for crowd work quality. This model captures and summarizes past worker label quality, enabling us to better predict the quality of each worker's next label. Furthermore, we propose a crowd assessor model for predicting crowd work quality more accurately: by taking account of the multi-dimensional features of a crowd assessor, we aim to build a better quality-prediction model of crowd work. Finally, this dissertation explores how the proposed prediction models work under realistic scenarios, in particular a realistic use case in which limited gold labels are provided for learning the proposed model. For this problem, we leverage instance weighting with soft labels, which accounts for the uncertainty of each training instance. Our empirical evaluation with synthetic datasets and a public crowdsourcing dataset shows that our proposed models significantly improve the prediction quality of crowd work and lead to the acquisition of better quality labels in crowdsourcing.
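One simple way to "capture and summarize past worker label quality" is an exponentially weighted running accuracy; the sketch below is a toy stand-in for the idea, not the dissertation's actual estimator:

```python
# Toy exponentially weighted summary of a worker's past label quality, in the
# spirit of the temporal model above (not the dissertation's actual model).
def running_quality(labels_correct, alpha=0.3, prior=0.5):
    """labels_correct: sequence of 0/1 outcomes (1 = label matched gold).
    Returns the smoothed quality estimate after each label."""
    q, history = prior, []
    for correct in labels_correct:
        q = alpha * correct + (1 - alpha) * q   # recent work weighs more
        history.append(q)
    return history

# A worker who starts poorly, then improves (invented outcomes):
print([round(q, 2) for q in running_quality([0, 0, 1, 1, 1, 1, 0, 1])])
```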
Item: Understanding and Predicting Changes in Precipitation and Water Availability Under the Influence of Large-Scale Circulation Patterns: Rio Grande and Texas (2012-12-11). Khedun, Chundun, 1977-.
Large-scale circulation patterns have a significant modulating influence on local hydro-meteorological variables, and consequently on water availability. An understanding of the influence of these patterns on the hydrological cycle, and the ability to predict their impacts in a timely manner, is crucial for water resources planning and management. This dissertation focuses on the influence of two major large-scale circulation patterns, the El Niño-Southern Oscillation (ENSO) and the Pacific Decadal Oscillation (PDO), on the Rio Grande basin and the state of Texas, US. Both study areas are subject to a varying climate and are extremely vulnerable to droughts, which can have devastating socio-economic impacts. The strength and spatial correlation structure of the climate indices with gauged precipitation were first established. Precipitation is not linearly related to water availability, so a land surface model (LSM), with land use and land cover held constant, was used to create naturalized flow, as it incorporates all the necessary hydro-meteorological factors. Since not all ENSO events are created equal, the influence on water availability of individual El Niño and La Niña events, classified using four different metrics, was examined. A general increase (decrease) in runoff during El Niños (La Niñas) was noted, but some individual events actually caused a decrease (increase) in water availability. Long-duration El Niños have more influence on water availability than short-duration, high-intensity events. A positive PDO enhances the effect of El Niño and dampens the negative effect of La Niña, but when the PDO is in its neutral or transition phase, La Niña tends to dominate climatic conditions and reduce water availability. LSM-derived runoff was converted into 3-month Standardized Runoff Indices (SRI-3), from which water-deficit durations and severities were extracted. Conditional probability models of duration and severity were developed and compared with those based on observed precipitation; model-derived information can be used in regions with limited ground observation data, or in tandem with observation-driven conditional probabilities, for more efficient water resources planning and management. Finally, a multidimensional model was developed, using copulas, to predict precipitation based on the phases of ENSO and PDO. A bivariate model, with ENSO and precipitation, was compared to a trivariate model that incorporates the PDO, and it was found that information on the state of the PDO is important for efficient precipitation prediction.
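A standardized runoff index such as the SRI-3 above aggregates runoff over 3-month windows and standardizes each calendar window; the sketch below uses simple per-window z-scores (a proper SRI fits a probability distribution, e.g., a gamma, before standardizing), with an invented runoff series:

```python
# Simplified SRI-3 sketch: 3-month aggregated runoff standardized per calendar
# window. A proper SRI fits a distribution (e.g., gamma) before standardizing;
# the monthly runoff series below is invented.
import numpy as np

rng = np.random.default_rng(2)
monthly_runoff = rng.gamma(shape=2.0, scale=10.0, size=12 * 30)   # 30 years

# 3-month rolling sums (full windows only), one value per month.
agg = np.convolve(monthly_runoff, np.ones(3), mode="valid")

# Standardize each calendar month's aggregates across the years.
sri3 = np.empty_like(agg)
for m in range(12):
    vals = agg[m::12]
    sri3[m::12] = (vals - vals.mean()) / vals.std()

print(f"droughts (SRI-3 < -1): {np.sum(sri3 < -1)} of {sri3.size} months")
```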