|
The Gene Ontology project, or GO, provides a controlled vocabulary to describe gene and gene product attributes in any organism. It can be broadly split into two parts. The first is the ontology itself, or rather three ontologies, each representing a key concept in molecular biology: the molecular function of gene products; their role in multi-step biological processes; and their localization to cellular components. The ontologies are continuously updated, and new versions are made available on a monthly basis.
The second part is annotation, the characterization of gene products using terms from the ontology. The members of the GO Consortium submit their data, which are made publicly available through the GO website. The GO is also part of a larger classification effort, the Open Biomedical Ontologies (OBO).
History
The Gene Ontology was originally constructed in 1998 by a consortium of researchers studying the genomes of three model organisms: Drosophila melanogaster (fruit fly), Mus musculus (mouse), and Saccharomyces cerevisiae (brewer's or baker's yeast). Many other model organism databases have since joined the Gene Ontology Consortium, contributing annotations for the genes of one or more organisms as well as contributing to the development of the ontologies. As of January 2008, GO contains over 24,500 terms applicable to a wide variety of biological organisms. There is a significant body of literature on the development and use of GO, and it has become a standard tool in the bioinformatics arsenal.
Gene Ontology terms
Each GO term consists of a unique alphanumerical identifier, a common name, synonyms (if applicable), and a definition. When a term has multiple meanings depending on species, the GO uses a "sensu" tag to differentiate among them. Each term belongs to exactly one of the three ontologies, and each ontology is structured as a directed acyclic graph: a directed graph in which no path starts and ends on the same term.
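The term structure and graph organization described above can be sketched in a few lines of Python. This is only an illustration, not the GO project's own data model: the `Term` class, the `ancestors` function, and the `EX:` identifiers and names are all hypothetical, and the link from a term to its parents is an assumed way of representing the edges of the directed acyclic graph.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """Hypothetical container for the fields a GO term is described as having."""
    term_id: str                 # unique alphanumerical identifier
    name: str                    # common name
    ontology: str                # exactly one of the three ontologies
    definition: str = ""
    synonyms: list[str] = field(default_factory=list)
    parents: list[str] = field(default_factory=list)  # assumed edge representation


def ancestors(terms: dict[str, Term], term_id: str) -> set[str]:
    """Collect every term reachable by following parent links upward."""
    seen: set[str] = set()
    stack = list(terms[term_id].parents)
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a term can be reached along several paths in a DAG
        seen.add(current)
        stack.extend(terms[current].parents)
    return seen


# Made-up terms: the last one has two parents, which a tree would not allow.
TERMS = {
    t.term_id: t
    for t in [
        Term("EX:0001", "root", "biological_process"),
        Term("EX:0002", "child A", "biological_process", parents=["EX:0001"]),
        Term("EX:0003", "child B", "biological_process", parents=["EX:0001"]),
        Term("EX:0004", "grandchild", "biological_process",
             parents=["EX:0002", "EX:0003"]),
    ]
}

print(sorted(ancestors(TERMS, "EX:0004")))
```

Because a term may have more than one parent, the traversal keeps a set of visited terms so that a shared ancestor is counted only once.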
New terms and annotations are suggested by members of the research and annotation communities. Once submitted, they are reviewed by members of the GO consortium to determine their applicability. If it is decided that a term in the ontology is not appropriate, it is deprecated, or marked as "obsolete". This can happen for a number of reasons, such as being outside the scope of the ontology or being misleadingly named or defined. The ontology file is freely available from the GO website; the terms can be searched and browsed online using the GO browser AmiGO. The Gene Ontology project also provides mappings of its terms to other classification systems covering the same areas of biology.
Gene Ontology Associations
A number of organizations, including model organism databases and large multispecies protein databases, perform analyses of protein sequences and issue tables of associations between putative gene products and GO terms. These are freely available from the GO website and can be downloaded individually or viewed online using AmiGO. In many older genetic sequence databases, annotations bear little or no indication of their provenance, so a user cannot readily ascertain the nature and strength of the evidence behind them. This leads to what is known in the field as the 'transitive annotation problem.' A gene is characterized by actual wet-lab experiments, and its sequence is deposited in a major public database with annotation from those experiments. Other sequences that have not been characterized in the lab are annotated based on their sequence similarity to this one, and these other sequences in turn form the basis for yet more annotations, and so forth. Thus a user cannot know how many steps of sequence similarity stand between the annotation for some genetic sequence and any actual wet-lab data.
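The transitive annotation problem described above can be made concrete with a small Python sketch. It assumes a hypothetical provenance table, `derived_from`, in which each sequence either points to the sequence its annotation was copied from or to `None` when it was characterized experimentally; the sequence names and the function `similarity_steps` are invented for illustration. In the older databases the text refers to, this table is exactly the information that is missing.

```python
from typing import Dict, Optional


def similarity_steps(derived_from: Dict[str, Optional[str]],
                     seq_id: str) -> Optional[int]:
    """Count the similarity-based steps between a sequence and wet-lab data.

    Returns None when the chain breaks (no provenance recorded) or loops.
    """
    steps = 0
    seen = set()
    current = seq_id
    while True:
        if current not in derived_from or current in seen:
            return None  # provenance unknown or circular
        seen.add(current)
        source = derived_from[current]
        if source is None:
            return steps  # reached an experimentally characterized sequence
        steps += 1
        current = source


# Hypothetical provenance records.
PROVENANCE = {
    "seqA": None,     # annotated from wet-lab experiments
    "seqB": "seqA",   # annotated by similarity to seqA
    "seqC": "seqB",   # annotated by similarity to seqB
    "seqD": "seqX",   # source sequence has no provenance record
}

for name in ("seqA", "seqC", "seqD"):
    print(name, similarity_steps(PROVENANCE, name))
```

When provenance is recorded, the distance from experimental evidence is a simple chain walk; when a single link is missing, as for `seqD`, the distance cannot be recovered at all.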
A GO association has metadata indicating:
- Who made the assertion that this GO term applies to the putative product of this protein sequence
- When this assertion was made
- One or more three-letter evidence code(s) denoting the type of evidence on which this assertion is based.
Any automatic program output that has not been curated by a human being gets the evidence code IEA, meaning Inferred from Electronic Annotation. The use of a code other than IEA implies that a human curator has checked this annotation. For instance, TAS (Traceable Author Statement) means that a curator has read a published scientific paper and that the metadata for that association bears a citation to that paper. ISS (Inferred from Sequence Similarity), on the other hand, means that a human curator has reviewed the output from a sequence similarity search and verified that it is biologically meaningful.
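As a rough sketch of how this metadata might be represented, the Python below models one association record and applies the rule that any code other than IEA implies human review. The `Association` class, its field names, and the sample values are hypothetical and do not reproduce the GO Consortium's file format; only the three evidence codes IEA, TAS, and ISS come from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Association:
    """Hypothetical record linking a gene product to a GO term."""
    gene_product: str
    term_id: str
    assigned_by: str                 # who made the assertion
    assigned_on: date                # when the assertion was made
    evidence_codes: tuple[str, ...]  # one or more three-letter codes
    reference: str = ""              # e.g. the cited paper for a TAS code


def is_curated(assoc: Association) -> bool:
    """True if at least one evidence code is something other than IEA."""
    return any(code != "IEA" for code in assoc.evidence_codes)


# Made-up example records.
automatic = Association("geneP1", "EX:0002", "pipeline", date(2008, 1, 15),
                        ("IEA",))
reviewed = Association("geneP2", "EX:0003", "curator", date(2008, 1, 20),
                       ("TAS",), reference="placeholder citation")

print(is_curated(automatic), is_curated(reviewed))
```

Treating a record as curated when any one of its codes differs from IEA is an assumption made for this sketch; a stricter filter could require that no IEA code is present at all.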
See also
- GOCat - An Automatic GO Categorizer/Browser to help Functional Annotation from Biomedical Texts [1]
- GoPubMed - Explore PubMed/MEDLINE with Gene Ontology
- Comparative Toxicogenomics Database - CTD integrates Gene Ontology terms with toxicogenomic and disease data
- Protein Ontology Project - Provides access to the Protein Ontology (PO) and reference documents describing the PO and its uses.
- EAGLi - A Terminology-powered (Gene Ontology, Swiss-Prot keywords...) biomedical question answering engine for MEDLINE [2]
- PAMGO, the Plant-Associated Microbe Gene Ontology
- DAVID bioinformatics - A free online bioinformatics resource that provides functional interpretation of large lists of genes derived from genomic studies
External links
- Gene Ontology Consortium - Provides access to the ontologies, software tools, annotated gene product lists, and reference documents describing the GO and its uses.
- Open Biomedical Ontologies (OBO)
|