Browsing by Subject "yeast"
Combinatorial motif analysis in yeast gene promoters: the benefits of a biological consideration of motifs
(Texas A&M University, 2005-02-17) Childs, Kevin
There are three main categories of algorithms for identifying short transcription regulatory sequences in gene promoters: phylogenetic comparison, expectation maximization, and combinatorial methods. For convenience, combinatorial methods typically define a motif in terms of a canonical sequence together with the set of sequences that differ from it at a small number of positions. Such motifs are called (l, d)-motifs, where l is the length of the motif and d is the maximum number of mismatches allowed between an instance of the motif and the canonical motif sequence. The complexity of the motif patterns that combinatorial methods can find is limited. For some values of l and d, a cluster of gene promoters will contain many sets of random words that appear to form an (l, d)-motif, and for such motifs it is impossible to distinguish biological motifs from randomly generated ones. A better formalization is the (l, f, d)-motif, which derives from a biological consideration of motifs. It is motivated by an examination of known transcription factor binding sites, in which a few positions of the motif are typically invariant. It is shown that some (l, f, d)-motifs can be found in the promoters of gene clusters even though they would be indistinguishable from random sequences if they were described as (l, d)-motifs. Including the f-value in the motif definition suggests that the sequence space occupied by a motif consists of several clusters of closely related sequences. An algorithm, CM, has been developed that identifies small sets of overabundant sequences in the promoters of a gene cluster and then combines these simple sets to form complex (l, f, d)-motif models.
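To make the (l, d)-motif definition and the random-word problem more concrete, here is a minimal Python sketch; it is not the CM algorithm. The function names are hypothetical, and the background model is an assumption: promoters are scanned on one strand only, and each base is drawn independently and uniformly from A, C, G and T. Under that model, the probability that a random l-mer lies within d mismatches of a fixed canonical word is the sum over i = 0..d of C(l, i)·3^i / 4^l, so the expected number of chance instances grows with the number and length of the promoters scanned.

```python
from math import comb

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))

def is_ld_instance(word: str, canonical: str, d: int) -> bool:
    """True if `word` is an instance of the (l, d)-motif with this canonical sequence."""
    return len(word) == len(canonical) and hamming(word, canonical) <= d

def find_instances(seq: str, canonical: str, d: int) -> list[int]:
    """Start positions of every l-mer in `seq` (one strand only) within d mismatches."""
    l = len(canonical)
    return [i for i in range(len(seq) - l + 1)
            if is_ld_instance(seq[i:i + l], canonical, d)]

def p_random_match(l: int, d: int) -> float:
    """Assumed uniform, independent bases: probability that a random l-mer is
    within d mismatches of a fixed word, sum_{i=0..d} C(l, i) * 3**i / 4**l."""
    return sum(comb(l, i) * 3 ** i for i in range(d + 1)) / 4 ** l

def expected_random_hits(l: int, d: int, n_seqs: int, seq_len: int) -> float:
    """Expected chance instances across n_seqs promoters of length seq_len."""
    return n_seqs * (seq_len - l + 1) * p_random_match(l, d)

if __name__ == "__main__":
    print(find_instances("ACGAACGT", "ACGT", 1))  # positions 0 and 4
    # Hypothetical setting: 20 promoters of 600 bp scanned for a (15, 4)-motif.
    print(expected_random_hits(15, 4, 20, 600))
```

When the expected count for a given (l, d) pair is not small, chance matches alone can supply apparent instances. This only gives a rough sense of the issue, since the abstract's concern is with whole sets of random words that mimic a motif, not with a single fixed canonical word.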
A dataset from a yeast gene expression experiment is analyzed with CM, which identifies both known biological motifs and novel motifs. The performance of CM is compared with that of a popular expectation maximization algorithm, AlignACE, and with that of a simple combinatorial motif-finding program.

The DNA “Jigsaw Puzzle” structure model: the case studies of the rice and yeast genomes
(2009-05-15) Liu, Yun-Hua
How does DNA give rise to the abundant and diverse living world? To address this question, a DNA “Jigsaw Puzzle” structure model was proposed and first tested by comprehensively analyzing the genome of the model dicot plant Arabidopsis thaliana. However, it is unknown whether this model holds in other species. Here we report studies of the model in the monocot model species rice (Oryza sativa) and the single-celled model species yeast (Saccharomyces cerevisiae). Analyses of the genomes sequenced so far have revealed that the genome of an organism consists of a limited number of sequence-specialized elements, termed fundamental function elements. In higher organisms, these elements often include genes (GEN), retrotransposable elements (RTE), DNA transposable elements (DTE), simple sequence repeats (SSR) and low-complexity repeats (LCR). Datasets for RTE, DTE, SSR, LCR and GEN, as well as for genes grouped into functional categories, were developed from the rice and yeast genome sequences using appropriate window sizes. These datasets were analyzed statistically to test the DNA “Jigsaw Puzzle” structure model in terms of the unambiguousness, correlation, uniqueness and selection of the genome-constituting element arrays. The analyses used a series of window sizes at both the whole-genome and individual-chromosome levels, with the centromeric regions both included and excluded.
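The abstract does not spell out its statistical procedure, so the following is only a hedged sketch of one common way to build windowed element arrays and compare them. For each element type, it computes the fraction of every fixed-size window covered by that element, then measures the Pearson correlation between two such arrays. The function names, the coverage measure, and the choice of Pearson correlation are assumptions for illustration, not the method used in the thesis.

```python
from math import sqrt

def window_coverage(intervals, chrom_len, window):
    """Fraction of each consecutive window of a chromosome covered by one element type.

    `intervals` are half-open (start, end) coordinates, assumed non-overlapping
    within one element type; overlapping intervals would be double-counted.
    """
    n_windows = (chrom_len + window - 1) // window
    covered = [0] * n_windows
    for start, end in intervals:
        pos, end = max(0, start), min(chrom_len, end)
        while pos < end:
            w = pos // window
            step_end = min((w + 1) * window, end)  # stop at the window boundary
            covered[w] += step_end - pos
            pos = step_end
    # The final window may be shorter than `window`.
    sizes = [min(window, chrom_len - i * window) for i in range(n_windows)]
    return [c / s for c, s in zip(covered, sizes)]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length arrays."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length arrays with at least 2 values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant array")
    return sxy / sqrt(sxx * syy)

if __name__ == "__main__":
    # Toy coordinates, not real rice or yeast annotations.
    genes = [(0, 400), (1200, 1500), (2100, 2600)]
    retro = [(450, 900), (1600, 1900)]
    g = window_coverage(genes, 3000, 500)
    r = window_coverage(retro, 3000, 500)
    print(g, r, pearson(g, r))
```

Rerunning `window_coverage` with several values of `window` mirrors the idea of repeating the analysis over a series of window sizes; excluding centromeric regions would amount to dropping the corresponding windows from both arrays before correlating them.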
The results showed that all fundamental function elements of the genomes, as well as the genes grouped into functional categories, were arrayed in the genomes in an unambiguous manner resembling linear “Jigsaw Puzzles” at the whole-genome and/or individual-chromosome levels, whether or not the centromeric regions were included. The analyses revealed that the arraying of the genomic elements was significantly correlated and unique for each chromosome and each species. This further confirmed the non-random arraying of genomic elements posited by the DNA “Jigsaw Puzzle” structure model and suggested that the DNA “Jigsaw Puzzle” structure of an organism is unique, probably as a result of natural selection. These results unambiguously support the DNA “Jigsaw Puzzle” structure model hypothesis. Since the content, arraying and interaction pattern of the fundamental function elements were shown to be unique for each organism, variations in an organism's DNA “Jigsaw Puzzle” array would lead to phenotypic variations and thus to different organisms. Moreover, like the four nucleotides (A, T, G and C) of DNA, the fundamental function elements constituting a genome could be arrayed into an infinite number of DNA molecules, thus giving rise to different forms of organisms. Therefore, the DNA “Jigsaw Puzzle” structure model would provide a novel but convincing explanation for the abundance, diversity and complexity of living organisms in the world.