Tensor generalizations of the singular value decomposition for integrative analysis of large-scale molecular biological data

Date

2007-12

Abstract

The structure of large-scale molecular biological data is often of an order higher than that of a matrix, especially when integrating data from different studies. When such data are flattened into a matrix format, much of the information in them is lost. I describe the use of higher-order generalizations of the singular value decomposition (SVD) - both the higher-order singular value decomposition (HOSVD) and Parallel Factorization (PARAFAC) - in transforming tensors into simplified spaces. I apply these transformations to a series of DNA microarray datasets from different studies tabulated in a tensor of genes × time × conditions, specifically an integration of genome-scale mRNA expression data from three yeast cell-cycle time courses. One time course was run under exposure to the oxidative stress agent hydrogen peroxide (HP), another under exposure to menadione (MD), and the third was unstressed [45]. The HOSVD transforms the tensor to a “core tensor” of “eigenarrays” × “time-eigengenes” × “condition-eigengenes,” where the eigenarrays, time-eigengenes and condition-eigengenes are unique orthonormal superpositions of the genes, times and conditions, respectively. This HOSVD, also known as N-mode SVD, formulates the tensor as a linear superposition of all possible outer products of an eigenarray, a time-eigengene and a condition-eigengene, i.e., rank-1 “subtensors,” the superposition coefficients of which are tabulated in the core tensor. Each coefficient indicates the significance of the corresponding subtensor in terms of the overall information it captures in the data. PARAFAC reformulates the same data tensor as a sum of F rank-1 tensors that best approximates the data tensor in a least-squares sense. I show that significant rank-1 subtensors can be associated with independent biological processes, which are manifested in the data tensor. Subtensors of the HOSVD capture the subprocesses of stress response, pheromone response and developmental stage.
The data suggest that the conserved genes YKU70, MRE11, AIF1 and ZWF1, as well as the genes involved in the processes of retrotransposition, apoptosis and the oxidative pentose phosphate cycle, may play significant, yet previously unrecognized, roles in the differential effects of HP and MD on cell cycle progression. Subtensors of PARAFAC capture the same biological processes as the two most significant HOSVD subtensors. A genome-wide correlation between DNA replication and initiation of RNA transcription, which is equivalent to a recently discovered correlation and might be due to a previously unknown mechanism of regulation, is independently uncovered.
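
The HOSVD described in the abstract can be illustrated with a minimal NumPy sketch. This is not the author's code: the tensor below is random data with a hypothetical genes × time × conditions shape, the helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`) are made up for illustration, and taking each coefficient's share of the squared core entries as its "significance" is an assumption about how the information captured by a subtensor might be measured.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten a tensor into a matrix whose rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `matrix` into `tensor` along the chosen mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor):
    """Return the core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding give that mode's basis
        # (eigenarrays, time-eigengenes, condition-eigengenes).
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Superpose all rank-1 subtensors weighted by the core coefficients."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Hypothetical data: 50 genes x 6 time points x 3 conditions (random values).
rng = np.random.default_rng(0)
data = rng.standard_normal((50, 6, 3))

core, factors = hosvd(data)

# Assumed significance measure: each coefficient's share of the squared core.
fractions = core**2 / np.sum(core**2)

# One rank-1 subtensor: outer product of the leading vector of each mode.
a, b, c = 0, 0, 0
subtensor = core[a, b, c] * np.einsum(
    "i,j,k->ijk", factors[0][:, a], factors[1][:, b], factors[2][:, c]
)
```

Because each factor matrix is orthonormal and spans the column space of its unfolding, `reconstruct(core, factors)` recovers `data` up to floating-point error, and the entries of `fractions` sum to one.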
