Gene finding typically refers to the area of computational biology that is concerned with algorithmically identifying stretches of sequence, usually genomic DNA, that are biologically functional. This especially includes protein-coding genes, but may also include other functional elements such as RNA genes and regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced.
In its earliest days, "gene finding" was based on painstaking experimentation on living cells and organisms. Statistical analysis of the rates of homologous recombination of several different genes could determine their order on a certain chromosome, and information from many such experiments could be combined to create a genetic map specifying the rough location of known genes relative to each other. Today, with comprehensive genome sequences and powerful computational resources at the disposal of the research community, gene finding has been redefined as a largely computational problem.
Determining that a sequence is functional should be distinguished from determining the function of the gene or its product. The latter still demands in vivo experimentation through gene knockout and other assays, although frontiers of bioinformatics research are making it increasingly possible to predict the function of a gene based on its sequence alone.
Extrinsic Approaches
In extrinsic gene finding systems, the target genome is searched for sequences that are similar to extrinsic evidence in the form of the known sequence of a messenger RNA (mRNA) or protein product. Given an mRNA sequence, it is trivial to derive a unique genomic DNA sequence from which it had to have been transcribed. Given a protein sequence, a family of possible coding DNA sequences can be derived by reverse translation of the genetic code. Once candidate DNA sequences have been determined, it is a relatively straightforward algorithmic problem to efficiently search a target genome for matches, complete or partial, and exact or inexact. BLAST is a widely used system designed for this purpose.
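The reverse-translation step can be illustrated with a short sketch. It assumes the standard genetic code, and the names `codons_for`, `count_coding_sequences` and `reverse_translate` are hypothetical helpers introduced here for illustration; they do not belong to BLAST or any other tool.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position;
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return all codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_coding_sequences(protein):
    """Count the DNA sequences that could encode the protein."""
    total = 1
    for aa in protein:
        total *= len(codons_for(aa))
    return total


def reverse_translate(protein):
    """Yield every DNA sequence encoding the protein (short peptides only)."""
    for combo in product(*(codons_for(aa) for aa in protein)):
        yield "".join(combo)
```

The family of candidates grows multiplicatively with protein length, which is why practical search tools generally compare sequences at the protein level rather than enumerating every DNA candidate.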
A high degree of similarity to a known messenger RNA or protein product is strong evidence that a region of a target genome is a protein-coding gene. However, to apply this approach systematically requires extensive sequencing of mRNA and protein products. Not only is this expensive, but in complex organisms, only a subset of all genes in the organism's genome is expressed at any given time, meaning that extrinsic evidence for many genes is not readily accessible in any single cell culture. Thus, in order to collect extrinsic evidence for most or all of the genes in a complex organism, many hundreds or thousands of different cell types must be studied, which itself presents further difficulties. For example, some human genes may be expressed only during development as an embryo or fetus, which might be difficult to study for ethical reasons. Despite these difficulties, extensive transcript and protein sequence databases have been generated for human as well as other important model organisms in biology, such as mice and yeast. For example, the RefSeq database contains transcript and protein sequences from many different species, and the Ensembl system comprehensively maps this evidence to human and several other genomes. It is, however, likely that these databases are both incomplete and contain small but significant amounts of erroneous data.
Ab Initio Approaches
Because of the inherent expense and difficulty in obtaining extrinsic evidence for many genes, it is also necessary to resort to ab initio gene finding, in which genomic DNA sequence alone is systematically searched for certain tell-tale signs of protein-coding genes. These signs can be broadly categorized as either signals, specific sequences that indicate the presence of a gene nearby, or content, statistical properties of protein-coding sequence itself. Ab initio gene finding might be more accurately characterized as gene prediction, since extrinsic evidence is generally required to conclusively establish that a putative gene is functional.
In the genomes of prokaryotes, genes have specific and relatively well-understood promoter sequences (signals), such as the Pribnow box and transcription factor binding sites, which are easy to systematically identify. Also, the sequence coding for a protein occurs as one contiguous open reading frame (content), which is typically many hundreds or thousands of base pairs long. The statistics of stop codons are such that even finding an open reading frame of this length is a fairly informative sign. (Since 3 of the 64 possible codons in the genetic code are stop codons, one would expect a stop codon approximately every 20-25 codons, or 60-75 base pairs, in a random sequence.) Furthermore, protein-coding DNA has certain periodicities and other statistical properties that are easy to detect in sequence of this length. These characteristics make prokaryotic gene finding relatively straightforward, and well-designed systems are able to achieve high levels of accuracy.
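The stop-codon argument can be checked with a few lines of code. The sketch below assumes a random sequence in which all 64 codons are equally likely, and it scans only the forward strand; the function names and the `min_codons` parameter are illustrative choices, not part of any published gene finder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def expected_codons_between_stops():
    """Mean spacing of stop codons if all 64 codons are equally likely."""
    return 64 / 3


def prob_stop_free(n_codons):
    """Chance that n_codons consecutive random codons contain no stop."""
    return (61 / 64) ** n_codons


def find_orfs(seq, min_codons=100):
    """Return (start, end, frame) for ATG...stop runs on the forward strand.

    Only runs with at least min_codons codons before the stop are reported;
    the reverse strand is not examined in this simplified sketch.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

Under these assumptions a stop codon is expected about every 21 codons (64 base pairs), so a stop-free run of several hundred codons is very unlikely to arise by chance, which is what makes long open reading frames informative.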
Ab initio gene finding in eukaryotes, especially complex organisms like humans, is considerably more challenging for several reasons. First, the promoter and other regulatory signals in these genomes are more complex and less well-understood than in prokaryotes, making them more difficult to reliably recognize. Two classic examples of signals identified by eukaryotic gene finders are CpG islands and binding sites for a poly(A) tail.
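As a rough illustration of how a signal such as a CpG island might be scored, the sketch below computes GC content and the observed/expected CpG ratio in sliding windows. The thresholds (GC content above 0.5, ratio above 0.6, 200-base windows) follow commonly quoted criteria and are treated here as adjustable assumptions; the function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed CpG count divided by the count expected from C and G usage."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def cpg_windows(seq, window=200, step=50, min_gc=0.5, min_ratio=0.6):
    """Return start positions of windows that look CpG-island-like."""
    hits = []
    for start in range(0, max(len(seq) - window + 1, 0), step):
        chunk = seq[start:start + window]
        if gc_content(chunk) > min_gc and cpg_obs_exp(chunk) > min_ratio:
            hits.append(start)
    return hits
```

A real gene finder would combine such a score with other evidence rather than rely on it alone.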
Second, splicing mechanisms employed by eukaryotic cells mean that a particular protein-coding sequence in the genome is divided into several parts (exons), separated by non-coding sequences (introns). (Splice sites are themselves another signal that eukaryotic gene finders are often designed to identify.) A typical protein-coding gene in humans might be divided into a dozen exons, each less than two hundred base pairs in length, and some as short as twenty to thirty. It is therefore much more difficult to detect periodicities and other known content properties of protein-coding DNA in eukaryotes.
Advanced gene finders for both prokaryotic and eukaryotic genomes typically use complex probabilistic models, such as hidden Markov models, in order to combine information from a variety of different signal and content measurements. The Glimmer system is a widely used and highly accurate gene finder for prokaryotes. Eukaryotic ab initio gene finders, by comparison, have achieved only limited success; a notable example is the GENSCAN program.
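To make the hidden Markov model idea concrete, the following is a minimal two-state sketch decoded with the Viterbi algorithm. The states, transition probabilities and emission probabilities are invented toy values chosen for illustration; they are not the parameters or architecture of Glimmer, GENSCAN or any other real gene finder.

```python
import math

STATES = ("noncoding", "coding")
# Toy parameters (assumptions): the coding state is GC-rich, the noncoding
# state is AT-rich, and both states tend to persist.
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def viterbi(seq):
    """Return the most probable state path for a sequence of A, C, G, T."""
    seq = seq.upper()
    if not seq:
        return []
    # Work in log space to avoid numerical underflow on long sequences.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Real systems use many more states (for example separate states for exons, introns and splice sites) and train their parameters on annotated sequence; the decoding principle is the same.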
Other Signals
It has been suggested that signals other than those directly detectable in sequences may improve gene prediction. For example, the role of secondary structure in the identification of regulatory motifs has been reported. [1] In addition, it has been suggested that RNA secondary structure prediction helps splice site prediction. [2], [3], [4], [5]
Comparative Genomics Approaches
As the entire genomes of many different species are sequenced, a promising direction in current research on gene finding is a comparative genomics approach. This is based on the principle that the forces of natural selection cause genes and other functional elements to undergo mutation at a slower rate than the rest of the genome, since mutations in functional elements are more likely to negatively impact the organism than mutations elsewhere. Genes can thus be detected by comparing the genomes of related species to detect this evolutionary pressure for conservation. This approach was first applied to the mouse and human genomes, using programs such as SLAM, SGP and Twinscan.
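The conservation principle can be sketched as a sliding-window identity score over two sequences that have already been aligned. This is a deliberately simple stand-in, not the method used by the programs named above; the function names, window size and threshold are all assumptions made for illustration.

```python
def window_identity(aligned_a, aligned_b, window=50, step=10):
    """Return (start, identity) for each window of a pairwise alignment.

    Both inputs must be the same length, with '-' marking alignment gaps.
    Gap columns never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aligned_a) - window + 1, step):
        a = aligned_a[start:start + window].upper()
        b = aligned_b[start:start + window].upper()
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        results.append((start, matches / window))
    return results


def conserved_windows(aligned_a, aligned_b, threshold=0.8, **kwargs):
    """Return start positions of windows at or above the identity threshold."""
    return [
        start
        for start, identity in window_identity(aligned_a, aligned_b, **kwargs)
        if identity >= threshold
    ]
```

Windows that stay highly similar between related species are candidates for functional elements, although conservation alone does not distinguish protein-coding genes from other conserved regions.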
Comparative gene finding can also be used to project high quality annotations from one genome to another. Notable examples include Projector, GeneWise and GeneMapper. Such techniques now play a central role in the annotation of all genomes.
References
[1] Hiller M, Pudimat R, Busch A, Backofen R. Using RNA secondary structures to guide sequence motif finding towards single-stranded regions. Nucleic Acids Res. 2006;34(17):e117. Entrez PubMed 16987907
[2] Patterson DJ, Yasuhara K, Ruzzo WL. Pre-mRNA secondary structure prediction aids splice site prediction. Pac Symp Biocomput. 2002:223-234. Entrez PubMed 11928478
[3] Marashi SA, Goodarzi H, Sadeghi M, Eslahchi C, Pezeshk H. Importance of RNA secondary structure information for yeast donor and acceptor splice site predictions by neural networks. Comput Biol Chem. 2006;30(1):50-57. Entrez PubMed 16386465
[4] Marashi SA, Eslahchi C, Pezeshk H, Sadeghi M. Impact of RNA structure on the prediction of donor and acceptor splice sites. BMC Bioinformatics. 2006;7:297. Entrez PubMed 16772025
[5] Rogic S. The role of pre-mRNA secondary structure in gene splicing in Saccharomyces cerevisiae. PhD dissertation, University of British Columbia, 2006. http://www.cs.ubc.ca/grads/resources/thesis/Nov06/Rogic_Sanja.pdf