Browsing by Subject "data mining"

Analysis of the HSEES Chemical Incident Database Using Data and Text Mining Methodologies
(2012-07-16) Mahdiyati, -
Chemical incidents can be prevented or mitigated by improving safety performance and implementing the lessons learned from past incidents. Despite some limitations in the range of information they provide, chemical incident databases can be utilized as sources of lessons learned from incidents by evaluating patterns and relationships that exist between the data variables. Much of the previous research focused on the causal factors of incidents; this research instead analyzes chemical incidents from both the causal and the consequence elements of the incidents. A subset of incident data reported to the Hazardous Substance Emergency Events Surveillance (HSEES) chemical incident database from 2002-2006 was analyzed using data mining and text mining methodologies. Both methodologies were performed with the aid of STATISTICA software. The analysis studied 12,737 chemical process related incidents and extracted descriptions of incidents in free-text data format from 3,316 incident reports. The structured data was analyzed using data mining tools such as classification and regression trees, association rules, and cluster analysis. The unstructured data (textual data) was transformed into structured data using text mining, and subsequently analyzed further using data mining tools such as feature selection and cluster analysis. The data mining analysis demonstrated that this technique can be used to estimate incident severity based on the input variables of release quantity and distance between victims and the source of release. Using the subset of data on ammonia releases, the classification and regression tree produced 23 final nodes. Each final node corresponded to a range of release quantity and a range of distance between victims and the source of release. For each node, the severity of injury was estimated as the average of the observed severity scores.
The association rule analysis identified conditional probabilities of 0.19, 0.04, 0.12, and 0.04 for incidents involving piping, chlorine, ammonia, and benzene, respectively. Text mining was used successfully to generate elements of incidents that can be used in developing incident scenarios. The research also identified information gaps in the HSEES database that can be addressed to enhance future data analysis. The findings from data mining and text mining should then be used to modify or revise design, operation, emergency response planning, or other management strategies.
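As a rough illustration of what a conditional probability from an association rule means, the sketch below computes the confidence of a rule, P(consequent | antecedent), over a handful of made-up records. The record contents, the item names, and the `rule_confidence` helper are all hypothetical; they are not taken from the HSEES data or from STATISTICA, and the result bears no relation to the values reported in the abstract.

```python
from typing import FrozenSet, List


def rule_confidence(
    records: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """Estimate P(consequent | antecedent) as a fraction of matching records."""
    # Keep only the records that contain every item in the antecedent.
    with_antecedent = [r for r in records if antecedent <= r]
    if not with_antecedent:
        return 0.0  # the rule never fires, so report zero confidence
    # Among those, count the records that also contain the consequent.
    with_both = [r for r in with_antecedent if consequent <= r]
    return len(with_both) / len(with_antecedent)


# Hypothetical toy records (NOT real incident data).
records = [
    frozenset({"piping", "ammonia", "injury"}),
    frozenset({"piping", "chlorine"}),
    frozenset({"piping", "ammonia"}),
    frozenset({"valve", "benzene", "injury"}),
    frozenset({"piping", "benzene"}),
]

confidence = rule_confidence(records, frozenset({"piping"}), frozenset({"injury"}))
print(confidence)  # 1 of the 4 piping records also lists an injury: 0.25
```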
Developing intelligent agents for training systems that learn their strategies from expert players
(Texas A&M University, 2005-11-01) Whetzel, Jonathan Hunt
Computer-based training systems have become a mainstay in military and private institutions for training people to perform certain complex tasks. As these tasks expand in difficulty, intelligent agents will appear as virtual teammates or tutors assisting a trainee in performing and learning the task. To develop these agents, we must obtain the strategies from expert players and emulate their behavior within the agent. Past researchers have shown the challenges in acquiring this information from expert human players and translating it into the agent. A solution to this problem involves using computer systems that assist in the human expert knowledge elicitation process. In this thesis, we present an approach for developing an agent for the game Revised Space Fortress, a game representative of the complex tasks found in training systems. Using machine learning techniques, the agent learns the strategy for the game by observing how a human expert plays. We highlight the challenges encountered while designing and training the agent in this real-time game environment, and our solutions for handling these problems. Afterward, we discuss our experiment that examines whether trainees experience a difference in performance when training with a human or virtual partner, and how expert agents that express distinctive behaviors affect the learning of a human trainee. Our results show that a partner agent that learns its strategy from an expert player provides the same benefit as a training partner as do a programmed expert-level agent and a human partner of equal intelligence to the trainee.
Diagnosing spatial variation patterns in manufacturing processes
(Texas A&M University, 2004-09-30) Lee, Ho Young
This dissertation discusses a method that will aid in diagnosing the root causes of product and process variability in complex manufacturing processes when large quantities of multivariate in-process measurement data are available. As in any data mining application, this dissertation has as its objective the extraction of useful information from the data. A linear structured model, similar to the standard factor analysis model, is used to generically represent the variation patterns that result from the root causes. Blind source separation methods are investigated to identify spatial variation patterns in manufacturing data. Further, the existing blind source separation methods are extended, enhanced and improved to be a more effective, accurate and widely applicable method for manufacturing variation diagnosis. An overall strategy is offered to guide the use of the presented methods in conjunction with alternative methods.
Domestic Surveillance and Government's Loss of Legitimacy
(2014-06-09) Inks, Christopher Scott; Phelps, James R
The terrorist attacks against the United States on the morning of September 11, 2001 created an environment ripe for the abuse of power. With a fearful nation clamoring for greater protection against future attacks, the National Security Agency (NSA) took the opportunity to create and implement a secret domestic spying and data mining program, the size of which had never before been imagined. Because information is the ultimate form of power in today’s world, unmitigated access to so much personal data has the potential to aggregate power into this one agency, leaving the rest of government and the populace unable to defend themselves against those who would use it to advance their own agendas. Once obtained, there is no way to check this power. Since government is only as legitimate as the populace believes it to be, such aggregations of power are likely to increase dissent among the citizenry and ultimately result in a belief that the government has become illegitimate. Such a government is ineffective and puts the entirety of the populace in harm’s way, not only from terrorists outside its borders, but from potential domestic abuses of this power. In the rush to protect the country against terrorism, one must be careful that the actions he or she takes do not inadvertently create a homeland security threat from within.
GPR Method for the Detection and Characterization of Fractures and Karst Features: Polarimetry, Attribute Extraction, Inverse Modeling and Data Mining Techniques
(2011-02-22) Sassen, Douglas Spencer
The presence of fractures, joints and karst features within rock strongly influences the hydraulic and mechanical behavior of a rock mass, and there is a strong desire to characterize these features in a noninvasive manner, such as by using ground penetrating radar (GPR). These features can alter the incident waveform and polarization of the GPR signal depending on the aperture, fill and orientation of the features. The GPR methods developed here focus on changes in waveform, polarization or texture that can improve the detection and discrimination of these features within rock bodies. These new methods are utilized to better understand the interaction of an invasive shrub, Juniperus ashei, with subsurface flow conduits at an ecohydrologic experimentation plot situated on the limestone of the Edwards Aquifer, central Texas. First, a coherency algorithm is developed for polarimetric GPR that uses the largest eigenvalue of a scattering matrix in the calculation of coherence. This coherency is sensitive to waveshape and unbiased by the polarization of the GPR antennas, and it shows improvement over scalar coherency in detection of possible conduits in the plot data. Second, a method is described for full-waveform inversion of transmission data to quantitatively determine fracture aperture and electromagnetic properties of the fill, based on a thin-layer model. This inversion method is validated on synthetic data, and the results from field data at the experimentation plot show consistency with the reflection data. Finally, growing hierarchical self-organizing maps (GHSOM) are applied to the GPR data to discover new patterns indicative of subsurface features, without representative examples. The GHSOMs are able to distinguish patterns indicating soil-filled cavities within the limestone. Using these methods, locations of soil-filled cavities and the dominant flow conduits were identified.
This information helps to reconcile previous hydrologic experiments conducted at the site. Additionally, the GPR and hydrologic experiments suggest that Juniperus ashei significantly impacts infiltration by redirecting flow towards its roots occupying conduits and soil bodies within the rock. This research demonstrates that GPR provides a noninvasive tool that can improve future subsurface experimentation.
Identifying nonlinear variation patterns in multivariate manufacturing processes
(Texas A&M University, 2005-02-17) Zhang, Feng
This dissertation develops a set of nonlinear variation pattern identification methods that are intended to aid in diagnosing the root causes of product variability in complex manufacturing processes, in which large amounts of high-dimensional in-process measurement data are collected for quality control purposes. First, a nonlinear variation pattern model is presented to generically represent a single nonlinear variation pattern that results from a single underlying root cause, the nature of which is unknown a priori. We propose a modified version of a principal curve estimation algorithm for identifying the variation pattern. Principal curve analysis is a nonlinear generalization of principal components analysis (PCA) that lends itself well to interpretation and also has theoretically rich underpinnings. The principal curve modification involves a dimensionality reduction step that is intended to improve estimation accuracy by reducing noise and improving the robustness of the algorithm with the high-dimensional data typically encountered in manufacturing. An effective visualization technique is also developed to help interpret the identified nonlinear variation pattern and aid in root cause identification and elimination. To further improve estimation robustness and accuracy and reduce computational expense, we propose a local PCA-based polygonal line algorithm to identify the nonlinear patterns. We also develop an approach for separating and identifying the effects of multiple nonlinear variation patterns that are present simultaneously in the measurement data. This approach utilizes higher-order cumulants and pairwise distance-based clustering to separate the patterns and borrows from techniques that are used in linear blind source separation. With the groundwork laid for a versatile, flexible, and powerful nonlinear variation pattern modeling and identification framework, applications in autobody assembly and stamping processes are investigated.
The pattern identification algorithms, together with the proposed visualization approach, provide an effective tool to aid in understanding the nature of the root causes of variation that affect a manufacturing process.
Incident Data Analysis Using Data Mining Techniques
(2010-01-16) Veltman, Lisa M.
There are several databases collecting information on various types of incidents, and most analyses performed on these databases do not go beyond basic trend analysis or counting occurrences. This research uses the more robust methods of data mining and text mining to analyze the Hazardous Substances Emergency Events Surveillance (HSEES) system data by identifying relationships among variables, predicting the occurrence of injuries, and assessing the value added by the text data. The benefits of performing a thorough analysis of past incidents include a better understanding of safety performance, of how to focus efforts to reduce incidents, and of how people are affected by these incidents. The results of this research showed that visually exploring the data via bar graphs did not yield any noticeable patterns. Clustering the data identified groupings of categories across the variable inputs, such as manufacturing events resulting from intentional acts like system startup and shutdown, performing maintenance, and improper dumping. Text mining the data allowed for clustering the events and further description of the data; however, these events were not noticeably distinct, and drawing conclusions based on these clusters was limited. Inclusion of the text comments in the overall analysis of HSEES data greatly improved the predictive power of the models. Interpretation of the textual data's contribution was limited; however, the qualitative conclusions drawn were similar to those of the model without textual data input. Although HSEES data is collected to describe the effects hazardous substance releases/threatened releases have on people, a fairly good predictive model was still obtained from the few variables identified as cause related.
Location Prediction in Social Media Based on Tie Strength
(2013-04-29) McGee, Jeffrey A
We propose a novel network-based approach for location estimation in social media that integrates evidence of the social tie strength between users for improved location estimation. Concretely, we propose a location estimator, FriendlyLocation, that leverages the relationship between the strength of the tie between a pair of users, and the distance between the pair. Based on an examination of over 100 million geo-encoded tweets and 73 million Twitter user profiles, we identify several factors such as the number of followers and how the users interact that can strongly reveal the distance between a pair of users. We use these factors to train a decision tree to distinguish between pairs of users who are likely to live nearby and pairs of users who are likely to live in different areas. We use the results of this decision tree as the input to a maximum likelihood estimator to predict a user's location. We find that this proposed method significantly improves the results of location estimation relative to a state-of-the-art technique. Our system reduces the average error distance for 80% of Twitter users from 40 miles to 21 miles using only information from the user's friends and friends-of-friends, which has great significance for augmenting traditional social media and enriching location-based services with more refined and accurate location estimates.
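To make the "average error distance" metric concrete, here is a minimal sketch of how such a figure could be computed from predicted and true coordinates. The abstract does not say how distance was measured; the haversine great-circle formula, the Earth radius constant, and the helper names `haversine_miles` and `mean_error_miles` are assumptions made purely for illustration.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, in miles

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def mean_error_miles(predicted: List[Coordinate], actual: List[Coordinate]) -> float:
    """Average distance between each predicted location and the true one."""
    distances = [haversine_miles(*p, *a) for p, a in zip(predicted, actual)]
    return sum(distances) / len(distances)
```

With this helper, a perfect prediction contributes an error of zero, and the mean is taken over whatever subset of users is being evaluated.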
Power System Online Stability Assessment using Synchrophasor Data Mining
(2013-04-30) Zheng, Ce
Traditional power system stability assessment based on full model computation shows its drawbacks in real-time applications where fast variations are present on both the demand side and the supply side. This work presents the use of data mining techniques, in particular Decision Trees (DTs), for fast evaluation of power system oscillatory stability and voltage stability from synchrophasor measurements. A regression tree-based approach is proposed to predict the stability margins. Modal analysis and continuation power flow are the tools used to build the knowledge base for off-line DT training. Corresponding metrics include the damping ratio of the critical electromechanical oscillation mode and the MW-distance to the voltage instability region. Classification trees are used to assign an operating point to a predefined stability state based on the value of the corresponding stability indicator. A novel methodology for knowledge base creation has been elaborated to assure practical and sufficient training data. Encouraging results are obtained through performance examination. The robustness of the proposed predictor to measurement errors and system topological variations is analyzed. A scheme has been proposed to tackle the problem of when and how to update the data mining tool for seamless online stability monitoring. The optimal placement of the phasor measurement units (PMUs) based on the importance of DT variables is suggested. A measurement-based voltage stability index is proposed and evaluated using field PMU measurements. It is later revised to evaluate the impact of wind generation on distribution system voltage stability. Next, a new data mining tool, the Probabilistic Collocation Method (PCM), is presented as a computationally efficient method to conduct the uncertainty analysis. Compared with the traditional Monte Carlo simulation method, the collocation method can provide a quite accurate approximation with fewer simulation runs.
Finally, we show how to overcome the disadvantages of mode meters and ringdown analyzers by using DTs to directly map synchrophasor measurements to predefined oscillatory stability states. The proposed measurement-based approach is examined using synthetic data from simulations on IEEE test systems, and PMU measurements collected from field substations. Results indicate that the proposed method complements the traditional model-based approach, enhancing situational awareness of control center operators in real-time stability monitoring and control.
