Combinatorial motif analysis in yeast gene promoters: the benefits of a biological consideration of motifs

Childs, Kevin

dc.contributor	Ioerger, Thomas
dc.creator	Childs, Kevin
dc.date.accessioned	2005-02-17T20:59:26Z
dc.date.accessioned	2017-04-07T19:49:21Z
dc.date.available	2005-02-17T20:59:26Z
dc.date.available	2017-04-07T19:49:21Z
dc.date.created	2004-12
dc.date.issued	2005-02-17
dc.description.abstract	There are three main categories of algorithms for identifying small transcription regulatory sequences in the promoters of genes: phylogenetic comparison, expectation maximization, and combinatorial. For convenience, the combinatorial methods typically define a motif in terms of a canonical sequence and a set of sequences that differ from the canonical sequence at a small number of positions. Such motifs are referred to as (l, d)-motifs, where l is the length of the motif and d is the number of mismatches allowed between an instance of the motif and the canonical motif sequence. There are limits to the complexity of the motif patterns that can be found by combinatorial methods. For some values of l and d, there will exist many sets of random words in a cluster of gene promoters that appear to form an (l, d)-motif. For these values, it will be impossible to distinguish biological motifs from randomly generated motifs. A better formalization of motifs is the (l, f, d)-motif, which is derived from a biological consideration of motifs. The motivation for (l, f, d)-motifs comes from an examination of known transcription factor binding sites, where typically a few positions in the motif are invariant. It is shown that there exist (l, f, d)-motifs that can be found in the promoters of gene clusters but that would not be distinguishable from random sequences if they were described as (l, d)-motifs. The inclusion of the f-value in the definition of motifs suggests that the sequence space occupied by a motif will consist of several clusters of closely related sequences. An algorithm, CM, has been developed that identifies small sets of overabundant sequences in the promoters from a cluster of genes and then combines these simple sets of sequences to form complex (l, f, d)-motif models. A dataset from a yeast gene expression experiment is analyzed with CM. Known biological motifs and novel motifs are identified by CM. The performance of CM is compared to that of a popular expectation maximization algorithm, AlignACE, and to that of a simple combinatorial motif-finding program.
dc.identifier.uri	http://hdl.handle.net/1969.1/1351
dc.language.iso	en_US
dc.publisher	Texas A&M University
dc.subject	d4bgene
dc.subject	promoter
dc.subject	motif
dc.subject	yeast
dc.title	Combinatorial motif analysis in yeast gene promoters: the benefits of a biological consideration of motifs
dc.type	Book
dc.type	Thesis
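
The motif definitions in the abstract can be illustrated with a minimal Python sketch. It is not the CM algorithm and is not taken from the thesis. The (l, d) test follows the abstract directly: a word of length l is an instance if it differs from the canonical sequence at no more than d positions. The (l, f, d) test rests on an assumption, since the abstract does not define f precisely: here f is read as the number of invariant positions that must match the canonical sequence exactly, with up to d mismatches allowed elsewhere. All function names and the fixed_positions parameter are hypothetical.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_ld_instance(word, canonical, d):
    """(l, d)-motif test: same length l and at most d mismatches."""
    return len(word) == len(canonical) and hamming(word, canonical) <= d


def is_lfd_instance(word, canonical, fixed_positions, d):
    """(l, f, d)-motif test under an ASSUMED reading of f.

    Assumption: f = len(fixed_positions) positions are invariant and must
    match the canonical sequence exactly; up to d mismatches are allowed
    at the remaining positions. The thesis may define f differently.
    """
    if len(word) != len(canonical):
        return False
    if any(word[i] != canonical[i] for i in fixed_positions):
        return False
    return hamming(word, canonical) <= d


def find_instances(promoter, canonical, d, fixed_positions=()):
    """Slide a window of length l over a promoter and collect matches."""
    l = len(canonical)
    return [
        (i, promoter[i:i + l])
        for i in range(len(promoter) - l + 1)
        if is_lfd_instance(promoter[i:i + l], canonical, fixed_positions, d)
    ]
```

With an empty fixed_positions, find_instances reduces to a plain (l, d) search. Supplying invariant positions rejects words that are within d mismatches overall but vary at a position assumed to be conserved, which is the kind of restriction the abstract describes as separating biological motifs from random ones.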

Collections

Texas A&M University at College Station
