A DNA sequence (sometimes genetic sequence) is a succession of letters representing the primary structure of a real or hypothetical DNA molecule or strand. The possible letters are A, C, G, and T, representing the four nucleotide subunits of a DNA strand (adenine, cytosine, guanine, thymine), and they are typically printed abutting one another without gaps, as in the sequence AAAGTCTGAC. This coded sequence is sometimes referred to as genetic information. Any succession of more than four nucleotides may be called a sequence. With regard to its biological function, which may depend on context, a sequence may be sense or anti-sense (see DNA) and either coding or noncoding. DNA sequences can also contain "junk DNA".
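As an illustration of the string representation described above, the following minimal Python sketch checks that a string uses only the letters A, C, G, and T and derives the complementary strand (the reverse complement, read 5' to 3'). The function names are hypothetical, and the sketch assumes uppercase letters with no ambiguity codes such as N. Whether the complementary strand counts as sense or anti-sense depends on which strand is transcribed, which the string alone does not tell us.

```python
# Watson-Crick pairing: each base maps to the base it pairs with.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_dna(seq: str) -> bool:
    """Return True if seq is non-empty and uses only the uppercase letters A, C, G, T."""
    return len(seq) > 0 and set(seq) <= set(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, written 5' to 3' (hypothetical helper)."""
    if not is_dna(seq):
        raise ValueError(f"not a DNA sequence: {seq!r}")
    # Reverse the order, then replace each base with its pairing partner.
    return "".join(COMPLEMENT[base] for base in reversed(seq))


if __name__ == "__main__":
    print(is_dna("AAAGTCTGAC"))              # True
    print(reverse_complement("AAAGTCTGAC"))  # GTCAGACTTT
```

Reversing before complementing keeps the result in the conventional 5'-to-3' reading direction; applying the function twice returns the original sequence.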
DNA may be imagined to be a highly ordered, purposeful molecule, so one might reasonably expect statistical models of its string representation to yield entropy estimates well below those of a uniformly random string over A, C, G, and T, confirming our intuition that it is far from random.
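To make "entropy estimate" concrete: with four letters, a uniformly random sequence has log2(4) = 2 bits of entropy per base, so any structure shows up as a value below 2. The sketch below uses the simple plug-in (empirical frequency) Shannon entropy of overlapping k-mers, divided by k to give bits per base. This is an assumed, illustrative estimator, not necessarily the method referred to in the text, and `block_entropy` is a hypothetical name. Plug-in estimates are biased downward when the sequence is short relative to the number of possible k-mers (4^k), so large k needs long sequences.

```python
import math
from collections import Counter


def block_entropy(seq: str, k: int = 1) -> float:
    """Plug-in Shannon entropy of the overlapping k-mers of seq, in bits per base.

    k = 1 gives the zeroth-order estimate from single-nucleotide frequencies.
    """
    if k < 1 or len(seq) < k:
        raise ValueError("need 1 <= k <= len(seq)")
    # Count every overlapping substring of length k.
    kmers = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(kmers.values())
    # H = sum over k-mers of p * log2(1/p), with p = n / total.
    h = sum((n / total) * math.log2(total / n) for n in kmers.values())
    return h / k  # normalise to a per-base figure


if __name__ == "__main__":
    periodic = "ACGT" * 25
    print(block_entropy(periodic))       # 2.0: base composition alone looks random
    print(block_entropy(periodic, k=2))  # close to 1.0: pairs expose the repetition
    print(block_entropy("AAAA"))         # 0.0: no uncertainty at all
```

The periodic example shows why the choice of model matters: its single-base frequencies are perfectly uniform, so the zeroth-order estimate reports the 2-bit maximum, while even two-letter blocks reveal much of the order.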
The DNA sequences were chosen to meet the following criteria: sufficient length to support this type of entropy estimation method, inclusion of a wide variety of species and sequence types to test the generality of the method, and inclusion of sequences used to benchmark other published methods.
Nair reported the presence of an intrinsic DNA bend with a broad locus of curvature in the human cdc2 promoter and concluded that this intrinsic bend might influence the protein-induced bending of the promoter region and, in turn, the subsequent interaction of regulatory factors with the transcription machinery [Nair, 1998].
Sequences of human exons and introns were taken from the Non-Redundant Functionally Equivalent Sequences Database (NRFES) of Konopka [Konopka, 1993].
Furthermore, recent studies of sequence patterns in nucleosomal DNA indicate that only certain DNA sequences bind to the most stable nucleosome subset, and that a significant number of these sequences occur in centromeres, suggesting a functional role for these stable nucleosomes [Widlund et al., 1997].