Data cleaning and knowledge discovery in process data

Xu, Ph. D., Shu

dc.contributor.advisor	Edgar, Thomas F.	en
dc.contributor.committeeMember	Wojsznis, Willy	en
dc.contributor.committeeMember	Djurdjanovic, Dragan	en
dc.contributor.committeeMember	Rochelle, Gary T.	en
dc.contributor.committeeMember	Baldea, Michael	en
dc.contributor.committeeMember	Daniels, Michael J.	en
dc.creator	Xu, Ph. D., Shu	en
dc.date.accessioned	2016-02-09T16:35:17Z	en
dc.date.accessioned	2018-01-22T22:29:27Z
dc.date.available	2016-02-09T16:35:17Z	en
dc.date.available	2018-01-22T22:29:27Z
dc.date.issued	2015-12	en
dc.date.submitted	December 2015	en
dc.date.updated	2016-02-09T16:35:17Z	en
dc.description.abstract	This dissertation presents several methods for addressing Big Data challenges, with an emphasis on data cleaning and knowledge discovery in process data. Data cleaning and knowledge discovery were chosen as the main research area because of their importance from both theoretical and practical points of view. The theoretical background and recent developments of data cleaning methods are reviewed from four aspects: missing data imputation, outlier detection, noise removal, and time delay estimation. Moreover, the impact of contaminated data on model performance, and the corresponding improvement obtained by data cleaning methods, are analyzed through both simulated and industrial case studies. The results provide a starting point for the development of more advanced methodology. It is hard to find a universally applicable data cleaning method, because every data set may have its own distinctive features; available methods therefore have to be customized to ensure the quality of each data set. An integrated data cleaning scheme that incorporates model building and performance evaluation is proposed to guide the tuning of data cleaning parameters and to prevent over-cleaning. A case study based on industrial data was used to verify the feasibility and effectiveness of the proposed method; in it, a partial least squares (PLS) model was built and three univariate data cleaning procedures were tested. A time series Kalman filter (TSKF) is proposed that successfully handles outlier detection in dynamic systems, where normal process changes often mask the existence of outliers. The TSKF method combines a time series model fitting procedure with a modified Kalman filter to detect additive outliers (AO) and innovational outliers (IO) in dynamic process data sets. A comparative analysis of TSKF and available methods is performed on simulated and real chemical plant data.
As a concrete example of data cleaning and knowledge discovery in process data, root cause diagnosis of plant-wide oscillations is presented. Plant-wide oscillations can degrade the overall control performance of a process, and their detection is often affected by noise in different frequency ranges. To address this problem, an information transfer method that combines the spectral envelope algorithm with spectral transfer entropy is proposed to detect and diagnose such oscillations within a specific frequency range, mitigating the effects of measurement noise. The feasibility and effectiveness of the proposed method are verified and compared with available methods through both simulated and industrial case studies.	en
dc.description.department	Chemical Engineering	en
dc.format.mimetype	application/pdf	en
dc.identifier	doi:10.15781/T2XT13	en
dc.identifier.uri	http://hdl.handle.net/2152/32920	en
dc.language.iso	en	en
dc.subject	Data cleaning	en
dc.subject	Knowledge discovery	en
dc.title	Data cleaning and knowledge discovery in process data	en
dc.type	Thesis	en
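The abstract describes the TSKF only at a high level. As a rough illustration of the general idea (a simplified stand-in, not the dissertation's TSKF), the Python sketch below fits an AR(1) model with robust center and variance estimates. It then runs a scalar Kalman filter and flags samples whose normalized innovation exceeds a threshold. Flagged samples are excluded from the measurement update, so an additive outlier does not pull later state estimates away from the normal trajectory. Several choices here are hypothetical: the AR(1) order, the function names `fit_ar1` and `kalman_outlier_flags`, the measurement-noise variance `r`, and the threshold of 3. The sketch handles only additive outliers, not innovational outliers.

```python
import numpy as np


def fit_ar1(y):
    """Fit y[t] - mu = phi * (y[t-1] - mu) + e[t] by least squares.

    Assumption: an AR(1) model is adequate. The center uses the median and the
    innovation variance uses the MAD of the residuals, so a few outliers do
    not inflate them much.
    """
    mu = float(np.median(y))
    z = np.asarray(y, dtype=float) - mu
    phi = float(np.dot(z[1:], z[:-1]) / np.dot(z[:-1], z[:-1]))
    resid = z[1:] - phi * z[:-1]
    mad = np.median(np.abs(resid - np.median(resid)))
    q = float((1.4826 * mad) ** 2)  # robust innovation variance
    return mu, phi, q


def kalman_outlier_flags(y, mu, phi, q, r=1e-6, threshold=3.0):
    """Flag samples whose normalized Kalman innovation exceeds `threshold`.

    State:       x[t] = phi * x[t-1] + w[t],  w ~ N(0, q)
    Measurement: y[t] - mu = x[t] + v[t],     v ~ N(0, r)  (r is hypothetical)
    """
    x = 0.0
    p = q / (1.0 - phi ** 2)  # stationary state variance (requires |phi| < 1)
    flags = []
    for t, yt in enumerate(y):
        # Time update (prediction)
        x_pred = phi * x
        p_pred = phi ** 2 * p + q
        # Innovation and its variance
        e = (yt - mu) - x_pred
        s = p_pred + r
        if abs(e) / np.sqrt(s) > threshold:
            flags.append(t)
            x, p = x_pred, p_pred  # skip the update for a suspected outlier
        else:
            k = p_pred / s
            x = x_pred + k * e
            p = (1.0 - k) * p_pred
    return flags


# Usage on synthetic data: an AR(1) series with two injected additive outliers
rng = np.random.default_rng(0)
n = 500
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + rng.normal()
y[150] += 12.0
y[350] -= 12.0
mu, phi, q = fit_ar1(y)
flags = kalman_outlier_flags(y, mu, phi, q)
print(flags)
```

Skipping the update for a flagged sample keeps the sample after an additive outlier from also being flagged, which is one reason a state-space filter can be preferable to a plain residual threshold. With a threshold of 3, a few ordinary samples may still be flagged by chance.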
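The spectral envelope step used to locate a common oscillation frequency can be illustrated with its standard definition. For each frequency ω, it is the largest eigenvalue of V^{-1/2} Re{f(ω)} V^{-1/2}, where f(ω) is the spectral density matrix of the process variables and V is their covariance matrix. The Python sketch below estimates f(ω) with a moving-average (Daniell) smoothed periodogram. The function name `spectral_envelope`, the smoothing `span`, and the synthetic data are assumptions. The sketch omits the spectral transfer entropy step that the dissertation uses for root cause diagnosis.

```python
import numpy as np


def spectral_envelope(X, span=5):
    """Spectral envelope of a multivariate series X (n samples x p variables).

    Returns the Fourier frequencies (cycles/sample) and, for each one, the
    largest eigenvalue of V^{-1/2} Re{f(w)} V^{-1/2}.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    V = np.cov(Xc, rowvar=False)
    # Inverse square root of the covariance matrix via eigen-decomposition
    evals, evecs = np.linalg.eigh(V)
    v_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    d = np.fft.rfft(Xc, axis=0) / np.sqrt(n)  # DFT of each variable
    # Raw periodogram matrices: pgram[k] = d_k d_k^H (Hermitian)
    pgram = np.einsum("ki,kj->kij", d, d.conj())
    # Daniell smoothing: moving average over neighboring frequencies
    kernel = np.ones(span) / span
    f = np.empty_like(pgram)
    for i in range(p):
        for j in range(p):
            f[:, i, j] = np.convolve(pgram[:, i, j], kernel, mode="same")
    freqs = np.fft.rfftfreq(n)
    # Largest eigenvalue of the normalized real part at each frequency
    env = np.array(
        [np.linalg.eigvalsh(v_inv_sqrt @ fk.real @ v_inv_sqrt)[-1] for fk in f]
    )
    return freqs, env


# Usage: two variables share an oscillation at 0.05 cycles/sample; one is noise
rng = np.random.default_rng(1)
n = 1000
t = np.arange(n)
osc = np.sin(2 * np.pi * 0.05 * t)
X = np.column_stack([
    osc + 0.5 * rng.normal(size=n),
    0.8 * np.roll(osc, 3) + 0.5 * rng.normal(size=n),
    rng.normal(size=n),
])
freqs, env = spectral_envelope(X)
peak = freqs[np.argmax(env[1:]) + 1]  # skip the zero frequency
print(f"dominant common frequency: {peak:.3f} cycles/sample")
```

Because the envelope is scale-invariant and pools all variables, a peak points to a frequency shared by several measurements. In the oscillation-diagnosis setting described above, that peak would suggest which frequency band to analyze further.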

Collections

University of Texas at Austin
