Help: Batch Extract Interface and Results
The batch extract interface for SOURCE allows retrieval of a subset of the data available in SOURCE for many entries at once. This function will be useful to users who are interested in large sets of genes or clones (such as those present on DNA microarrays).
You must input a list of identifiers for which you wish to extract data. These identifiers can be GenBank Accessions, dbEST cloneIDs, UniGene ClusterIDs, UniGene gene names, or UniGene gene symbols. To upload your data, select a file on your local computer using the "Input File" field or the "Browse..." button. The file must contain a single column, with one identifier per line. Alternatively, you can type your list of identifiers into the text field, pressing the "Return" or "Enter" key after each one.
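As an illustration of the file layout described above, here is a minimal Python sketch that writes a list of identifiers to a single-column text file. The function name `write_identifier_file`, the file name, and the sample identifiers are hypothetical and are not part of SOURCE; the tab check is an assumption about what "one column" means.

```python
def write_identifier_file(identifiers, path):
    """Write identifiers to `path`, one per line, in a single column.

    Blank entries are skipped. An identifier containing a tab is rejected,
    on the assumption that a tab would be read as a second column.
    """
    cleaned = []
    for raw in identifiers:
        ident = raw.strip()
        if not ident:
            continue
        if "\t" in ident:
            raise ValueError(f"identifier contains a tab: {ident!r}")
        cleaned.append(ident)
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for ident in cleaned:
            handle.write(ident + "\n")
    return cleaned


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    write_identifier_file(["Hs.1", "Mm.2", "Rn.3"], "identifiers.txt")
```

The resulting file can then be selected with the "Input File" field.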
After supplying the identifiers, you must choose which type of identifier you are using and the species from which they come. The species designation is ignored for CloneIDs, GenBank Accessions, and UniGene ClusterIDs, since for these the organism of origin is intrinsic to the identifier. CloneIDs consisting solely of digits (i.e. lacking a prefix that identifies the source of the clone, such as 'IMAGE:' or 'ATCC:') that are not found in the SOURCE database are assumed to be IMAGE clones and are also searched for with the 'IMAGE:' prefix.
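The CloneID fallback described above can be sketched as follows. This is only an illustration of the stated rule, not the actual SOURCE implementation; the function names and the `known_ids` set (standing in for the SOURCE database) are hypothetical.

```python
def candidate_clone_ids(clone_id):
    """Return the lookup keys to try, in order, for a CloneID.

    The identifier is tried as given first. If it consists solely of
    digits, the same number with an 'IMAGE:' prefix is tried second.
    """
    clone_id = clone_id.strip()
    candidates = [clone_id]
    if clone_id.isascii() and clone_id.isdigit():
        candidates.append("IMAGE:" + clone_id)
    return candidates


def resolve_clone_id(clone_id, known_ids):
    """Return the first candidate present in `known_ids`, or None.

    `known_ids` is a hypothetical stand-in for the set of CloneIDs
    stored in the database.
    """
    for candidate in candidate_clone_ids(clone_id):
        if candidate in known_ids:
            return candidate
    return None
```

With a toy set such as `{"IMAGE:12345", "ATCC:999"}`, the input `"12345"` resolves to `"IMAGE:12345"`, while `"999"` resolves to nothing because only the 'IMAGE:' prefix is tried.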
You can choose to extract one or more of the types of data described below.
- UniGene Cluster ID
This is the unique identifier for a cluster as defined by UniGene. Cluster IDs have the form 'Hs.#' for H. sapiens, 'Mm.#' for M. musculus, and 'Rn.#' for R. norvegicus.
- UniGene Name
This is the official UniGene name for a given cluster.
- UniGene Symbol
This is the official UniGene symbol for a given cluster.
- Gene Aliases
This field is a double pipe-delimited list (i.e. entries are separated by "||") of synonyms for the given cluster. The list of synonyms is compiled from a number of sources, including LocusLink, SwissProt, OMIM, and MGD.
- LocusLink ID
This is the LocusLink identifier that is associated with a given
cluster. If there are multiple IDs associated with the same cluster
these are all listed, separated by a semicolon.
- Enzymatic Function
This is the "Enzymatic Function" field from SwissProt which provides information on the enzymatic activity of the protein product of a given gene. Please be aware of and adhere to the SwissProt copyright statement that is found on the batch extract page.
- Subcellular Location
This is the "Subcellular Location" field from SwissProt which provides information regarding the localization of the protein product of a given gene. Please be aware of and adhere to the SwissProt copyright statement that is found on the batch extract page.
- Chromosome Location
This is the number of the chromosome on which a gene resides as
curated by UniGene.
- Cytoband
This is the cytoband at which a gene is located as curated by UniGene.
- Gene Ontology and Other Annotations
These fields hold annotations provided by groups such as LocusLink and
Proteome and controlled vocabularies such as Gene Ontology (GO). For controlled vocabularies, the fields contain:
- the type of ontology
- the term
- a coded representation of the evidence for this annotation
- the source of the annotation
The meaning of the evidence codes can
be found here.
The separate sections of a given annotation are separated by single
pipes (i.e. the | character) and different annotations are separated
by two forward slashes (i.e. //).
- Representative mRNA Accession
This is the GenBank accession number for the mRNA sequence that is the "best"
representative of a given UniGene cluster.
- Representative Uniprot Accession
This is the UniProt accession number for the protein sequence that is the "best"
representative of a given UniGene cluster. This information is
obtained from the EBI.
The UniProt accession is useful for mapping human genes to GO terms and can
be used in GO::TermFinder at Princeton, a tool for finding significant GO terms shared among a list of genes from your organism, which helps you discover what these genes may have in common.
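Several of the fields above use delimiters: "||" between gene aliases, a semicolon between LocusLink IDs, and "|" within and "//" between annotations. The following Python sketch shows one way to split them after extraction. It assumes that "#" in a cluster ID stands for a whole number, that the delimiters never occur inside a value, and that the annotation parts appear in the order listed above; all function names and sample values are hypothetical.

```python
import re

# Cluster IDs such as 'Hs.#', 'Mm.#', 'Rn.#', assuming '#' is a whole number.
CLUSTER_ID = re.compile(r"(Hs|Mm|Rn)\.\d+")


def is_cluster_id(value):
    """Return True if `value` looks like a UniGene cluster ID."""
    return CLUSTER_ID.fullmatch(value.strip()) is not None


def split_aliases(field):
    """Split a '||'-delimited Gene Aliases field into a list."""
    return [part.strip() for part in field.split("||") if part.strip()]


def split_locuslink_ids(field):
    """Split a semicolon-delimited LocusLink ID field into a list."""
    return [part.strip() for part in field.split(";") if part.strip()]


def parse_annotations(field):
    """Split an annotation field into a list of annotations.

    Annotations are separated by '//'; the sections of one annotation
    are separated by '|'. Each annotation is returned as a list of its
    sections (for controlled vocabularies: ontology type, term,
    evidence code, source).
    """
    annotations = []
    for entry in field.split("//"):
        entry = entry.strip()
        if entry:
            annotations.append([part.strip() for part in entry.split("|")])
    return annotations
```

For example, `parse_annotations("process|cell cycle|TAS|LocusLink//function|kinase activity|IEA|Proteome")` yields two four-part lists, one per annotation. The sample string is made up for illustration and is not real SOURCE output.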
Since SOURCE currently uses UniGene as the central database to
which all other databases are linked, a gene of interest must be in
UniGene in order for data to be available for it. If an identifier is
not found in UniGene, the batch extract script will return a warning
stating that it was not found. Similarly, if a cloneID or accession
number maps to multiple UniGene clusters, the script will not extract
data but will instead return a warning stating that the identifier does not
map to a single UniGene cluster. To suppress these warnings (i.e. to leave those
identifiers out of the final results file), check the
appropriate box in the "Error Conditions" section of the form. Note
that you can also choose to see
the cluster IDs for identifiers in multiple clusters by selecting the "Show all
Cluster IDs if in multiple Clusters" choice. If you also
choose to suppress the entries that map to multiple UniGene clusters, the suppression takes
precedence.
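The interaction between these options can be summarized in a short Python sketch. It only illustrates the rules stated above and is not the batch extract script itself; the function name, the parameter names, the warning wording, and the tab and semicolon layout of the output line are all assumptions.

```python
def report_line(identifier, cluster_ids,
                suppress_not_found=False,
                suppress_multiple=False,
                show_all_clusters=False):
    """Return the output line for one identifier, or None if suppressed.

    `cluster_ids` is the list of UniGene clusters the identifier maps to
    (empty if it was not found).
    """
    if not cluster_ids:
        if suppress_not_found:
            return None
        return f"{identifier}\tWARNING: not found in UniGene"
    if len(cluster_ids) == 1:
        return f"{identifier}\t{cluster_ids[0]}"
    # Multiple clusters: suppression takes precedence over showing all IDs.
    if suppress_multiple:
        return None
    if show_all_clusters:
        return f"{identifier}\t{';'.join(cluster_ids)}"
    return f"{identifier}\tWARNING: does not map to a single UniGene cluster"
```

Calling `report_line("x", ["Hs.2", "Hs.3"], suppress_multiple=True, show_all_clusters=True)` returns `None`, which mirrors the precedence rule.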
Once your request has been processed, you will see a link to the results file. To save it to your computer, right-click on the word "results" and choose the "Save target as" (Internet Explorer) or "Save link as" (Netscape) option.
External Sites
- UniGene
- dbEST
- SwissProt
- LocusLink
- GeneCards
Please send comments or questions to: array@princeton.edu