Browsing by Subject "Data Mining"


Automated Inclusive Design Heuristics Generation with Graph Mining
(2013-08-01) Sangelkar, Shraddha Chandrakant
Inclusive design is a concept intended to promote the development of products and environments equally usable by all users, irrespective of their age or ability. This research focuses on developing a method to derive heuristics for inclusive design. The research applies the action-function diagram to model the interaction between a user and a product, design difference classification to compare a typical product with its inclusive counterpart, graph theory to mathematically represent the comparison relations, and graph data mining to extract the design heuristics. The goal of this research is to formalize and automate the inclusive-design heuristics generation process. The rule generation allows statistical mining of the design guidelines from existing inclusive products. Formalization results show that the rate of rule generation decreases as more products are added to the dataset. The automated method is particularly helpful in the developmental stages of graph mining applications for product design. The graph mining technique is capable of graph grammar induction, which is extended here to automate the generation of engineering grammars. In general, graph mining can be applied to extract design heuristics from any discrete and relational design data that can be represented as graphs. Concept generation studies are conducted to validate the heuristics derived in this research for inclusive product design. In addition, an inclusivity rating is created and verified to evaluate the inclusiveness of the conceptual ideas. Finally, because appreciation and awareness of inclusive design are important in an engineering design course, a module is compiled to teach inclusive design methods in a capstone design course. The results of the exploratory study and validation show that there is problem dependency in the application of the representation scheme. 
It cannot be stated with certainty at this point whether the representation scheme is helpful for designing consumer products, where only the activities related to the upper body are involved. However, self-reported feedback indicates that the teaching module is effective in increasing awareness of and confidence in inclusive design.
Data mining of market information to assess at-home pork demand
(Texas A&M University, 2004-09-30) Asatryan, Armen A.
This study analyzes the economic and demographic patterns of at-home pork consumption for representative individuals over 18 years of age in the United States. Three data sets purchased by the National Pork Board (NPB) are mined for this purpose: (1) National Eating Trends (NET) data from National Panel Diary (NPD) on individuals' intake and their demographic characteristics; (2) weekly retail prices for fresh meats and fresh pork cuts from FreshLook; and (3) weekly retail prices for processed pork products from A.C. Nielsen. Heckman sample selection models are used to find demographic, health, and attitudinal/lifestyle patterns of consumption of twelve fresh and processed pork products as well as beef, chicken, and seafood. In the fall, individuals have a higher probability of eating beef, chicken, pork tenderloin, and bacon, but a lower probability of eating fresh seafood, canned ham, and smoked ham relative to the spring. The New England region has the highest likelihood of eating fresh pork, beef, chicken, seafood, pork roasts, pork tenderloin, and pork hotdogs. Blacks, on average, eat more fresh and processed pork, chicken, pork sausage, bacon, and canned ham, but less beef relative to whites. Concern about serving food with fat is negatively related to the likelihood of eating processed pork, lunchmeat, ham, and bacon, but it is positively related to the likelihood of eating pork hotdogs. A three-stage selectivity-adjusted censored LA/AIDS model is developed and estimated to find demand-price relationships for: (1) fresh meats (pork, beef, chicken, and seafood) and (2) nine fresh and processed pork cuts. Aggregate fresh meats are substitutes for each other in the at-home market, but there are substantial complementarities between pork cuts. Pork sausage is the major competitor for the processed products, pork roasts and pork tenderloin, but a major complement for pork ribs. 
There is relatively weak substitutability between pork and beef, and relatively strong substitutability between pork and chicken and between beef and chicken. This could suggest opportunities for some joint marketing efforts between pork and beef commodity interests. This information can be used as a guide for marketing strategists for targeting and promotion as well as for category management of the disaggregated pork products.
Defining Social Network Structure Through Text Similarity Analysis: A Model for Promoting Collaboration and Examining Conditions Impacting the Success of Collaborative Endeavors Within a Research Community
(2007-05-22) Moser, Courtney Joy; Krumwiede, Kimberly Hoggatt
Given the breadth and sheer volume of accumulated scientific knowledge, individual researchers often lack the requisite knowledge and resources to adequately address increasingly complex problems; therefore, many researchers are realizing the advantages afforded by collaborative research practices. The application of text data mining technologies to social networking strategies provides a novel approach to identifying opportunities for scientific collaboration through text similarity analysis, provided by the computer program eTSNAP. The data set submitted to eTSNAP comprised 137 research abstracts representing individual scientists affiliated with the Regional Centers of Excellence in Biodefense and Emerging Infectious Diseases. Examination of the data in the form of tables, matrices, and interactive similarity network maps revealed the presence of eight discrete clusters of individuals, linked by the similarity of their abstracts. Further analysis of structural and functional characteristics of each cluster permitted the selection of a single cluster with the highest probability of collaborative success to serve as the pilot cluster. Members of this pilot cluster, renamed the "anthrax cluster" in reference to the common theme of research, received an introductory packet of information explaining the design of the project and soliciting participation in a preliminary survey, developed with intentions of assessing collaborative readiness and garnering practical information to assist in the preparation of a future teleconference. When multiple requests failed to elicit an adequate response, further attempts at establishing collaborative relationships between these researchers merely represented an exercise in futility. 
Evaluation of this project ultimately consisted of a secondary telephone interview with cluster members along with an in-depth literature review; both components of the final evaluation endeavored to isolate and examine factors that facilitate or inhibit collaboration within a research environment. Results suggest that similar interests alone cannot sustain successful collaboration; rather, complex interactions between a multitude of interconnected variables essentially determine collaborative outcomes.
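As a rough illustration of the text similarity analysis described in this abstract, the sketch below links abstracts whose term vectors are similar and reports the resulting clusters. It is not eTSNAP's method, which the abstract does not specify: the bag-of-words vectors, cosine similarity, the 0.3 threshold, the function names, and the toy abstracts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words counts for one abstract (assumed representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_clusters(abstracts, threshold=0.3):
    """Link abstracts at or above the threshold; return connected components."""
    names = list(abstracts)
    vectors = {n: term_vector(abstracts[n]) for n in names}
    neighbors = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                neighbors[a].add(b)
                neighbors[b].add(a)
    clusters, seen = [], set()
    for n in names:
        if n in seen:
            continue
        # Depth-first walk over the similarity links starting at n.
        stack, component = [n], set()
        while stack:
            cur = stack.pop()
            if cur in component:
                continue
            component.add(cur)
            stack.extend(neighbors[cur] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy abstracts, not taken from the study's data set.
toy_abstracts = {
    "A": "anthrax toxin protein structure",
    "B": "anthrax toxin receptor binding",
    "C": "influenza vaccine trial design",
}
clusters = similarity_clusters(toy_abstracts)
print(clusters)
```

In this toy input, researchers A and B share two of their four terms and fall into one cluster, while C stands alone. A real analysis would likely need stop-word removal and term weighting before the similarities become meaningful.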
The IRIDESCENT System: An Automated Data-Mining Method to Identify, Evaluate, and Analyze Sets of Relationships Within Textual Databases
(2003-02-01) Wren, Jonathan Daniel; Garner, Harold R.
Individuals are limited in their ability to read, remember and compare relationships within the vast amount of scientific literature available. This is not only because the amount of literature is increasing exponentially, but also because the number of things being researched within it is increasing as well. Adding to the scale of analysis are new technologies that increase the rate at which data are gathered from scientific experiments. For most areas of research interest, the scale of analysis exceeds an individual's ability to be aware of all the relationships contained within. Thus, an informatics approach is necessary to identify large-scale trends, shared relationships and novel relationships that are not contained within the literature, but are the logical consequence of the relationships that are. A system has been designed to establish a network of relationships between "objects" of research interest (e.g. genes, chemical compounds, drugs, diseases and clinical phenotypes) by extracting information from scientific text in an automated manner. This system, called IRIDESCENT (Implicit Relationship IDEntification by in-Silico Construction of an Entity-based Network from Text), enables the discovery of novel relationships by identifying and scoring objects sharing large sets of relationships with an object of interest. IRIDESCENT also allows sets of objects to be analyzed for shared relationships, such as responding genes from a microarray experiment. Herein are described the development and workings of IRIDESCENT as well as several well-developed applications of the system.
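The idea of scoring objects that share many relationships with an object of interest can be sketched as follows. This is only a minimal illustration: the abstract does not describe IRIDESCENT's actual scoring, so the simple count of shared intermediates, the function names, and the toy object pairs below are all assumptions.

```python
from collections import defaultdict


def build_network(cooccurrences):
    """Undirected network from (object, object) pairs assumed to co-occur in text."""
    network = defaultdict(set)
    for a, b in cooccurrences:
        network[a].add(b)
        network[b].add(a)
    return network


def implicit_relationships(network, target):
    """Rank objects not directly linked to target by the number of shared intermediates."""
    direct = network.get(target, set())
    shared = defaultdict(set)
    for mid in direct:
        for other in network[mid]:
            if other != target and other not in direct:
                shared[other].add(mid)
    # Highest shared count first; ties broken alphabetically for stable output.
    return sorted(((o, len(m)) for o, m in shared.items()),
                  key=lambda pair: (-pair[1], pair[0]))


# Hypothetical co-occurrence pairs, invented for illustration only.
pairs = [
    ("gene_x", "drug_a"),
    ("gene_x", "drug_b"),
    ("disease_y", "drug_a"),
    ("disease_y", "drug_b"),
    ("disease_z", "drug_a"),
]
ranked = implicit_relationships(build_network(pairs), "gene_x")
print(ranked)
```

Here disease_y never co-occurs with gene_x directly but shares two intermediates with it, so it ranks above disease_z, which shares one. A raw count favors highly connected objects, so a practical system would presumably normalize against what is expected by chance.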
Novel applications of data mining methodologies to incident databases
(Texas A&M University, 2006-08-16) Anand, Sumit
Incident databases provide an excellent opportunity to study the repeated situations of incidents in the process industry. The databases give an insight into the situation which led to an incident, and if studied properly can help monitor the process, equipment and chemical involved more closely, and reduce the number of incidents in the future. This study examined a subset of incidents from the National Response Center's incident database, focusing mainly on fixed facility incidents in Harris County, Texas from 1990 to 2002. Data mining has been used in the financial and marketing arena for many decades to analyze and find patterns in large amounts of data. Realizing the limited capabilities of traditional methods of statistics, more robust techniques of data mining were applied to the subset of data and interesting patterns of chemical involved, equipment failed, component involved, etc. were found. Further, patterns obtained by data mining on the subset of data were used in modifying probabilities of failure of equipment and developing a decision support system.
Open source software development and maintenance: an exploratory analysis
(2009-06-02) Raja, Uzma
The purpose of this research was to create measures and models for the evaluation of Open Source Software (OSS) projects. An exploratory analysis of the development and maintenance processes in OSS was conducted for this purpose. Data mining and text mining techniques were used to discover knowledge from transactional datasets maintained on OSS projects. Large and comprehensive datasets were used to formulate, test and validate the models. A new multidimensional measure of OSS project performance, called project viability, was defined and validated. A theoretical and empirical measurement framework was used to evaluate the new measure. OSS project data from SourceForge.net was used to validate the new measure. Results indicated that project viability is a measure of the performance of OSS projects. Three models were then created for each dimension of project viability. Multiple data mining techniques were used to create the models. Variables identified from process, product, resource and end-user characteristics of the project were used. The use of new variables created through text mining improved the performance of the models. The first model was created for OSS projects in the development phase. The results indicated that end-user involvement could play a significant role in the development of OSS projects. It was also discovered that certain types of projects are more suitable for development in OSS communities. The second model was developed for OSS projects in their maintenance phase. A two-stage model for maintenance performance was selected. The results indicated that high project usage and usefulness could improve the maintenance performance of OSS projects. The third model was developed to investigate the effects of maintenance activities on the project internal structure. Maintenance data for the Linux project was used to develop a new taxonomy for OSS maintenance patches. 
These results were then used to study the effects of various types of patches on the internal structure of the software. It was found that performing proactive maintenance on the software moderates its internal structure.
Privacy-preserving data mining
(2009-05-15) Zhang, Nan
In the research of privacy-preserving data mining, we address issues related to extracting knowledge from large amounts of data without violating the privacy of the data owners. In this study, we first introduce an integrated baseline architecture, design principles, and implementation techniques for privacy-preserving data mining systems. We then discuss the key components of privacy-preserving data mining systems which include three protocols: data collection, inference control, and information sharing. We present and compare strategies for realizing these protocols. Theoretical analysis and experimental evaluation show that our protocols can generate accurate data mining models while protecting the privacy of the data being mined.
