Burkholderia Genome Database - Overview

Overview of the Burkholderia Genome Database

The Burkholderia Genome database contains many features that facilitate within-genome and multiple-genome comparison of annotations.

In brief, this new database includes the following major features, all centered on aiding comparative analysis of Burkholderia with other species.

  • Search annotations associated with any Burkholderia genome sequence whose annotations have been made available.
  • View and compare annotations within a Burkholderia species (e.g. within a gene family) or between species.
  • Perform multiple sequence alignments (DNA and protein) and BLAST-based searches using sequences specified during a comparison of multiple genes.
  • Navigate between the genomes of different species and view alternate annotations using GBrowse.
  • Customize GBrowse further to suit your needs (e.g. select and reorder annotation tracks).
  • Under an individual gene view, you can view a graphic containing subcellular localization information for the protein associated with that gene, and there is an expanded list of links to other resources (more functional category information, links to other genome databases when suitable, etc.).
  • Perform DNA Motif Searches.
  • Search for genes belonging to an orthologous group.
  • View genome summaries of each Burkholderia genome.

For information on other Burkholderia species, we encourage you to examine the other excellent Burkholderia genome databases available (see our links). In each individual gene view, we also provide a link to the associated annotation at other genome database sites, when possible.

We will be expanding the database further over the next year to include more information relevant to cystic fibrosis research and increased database flexibility. We encourage you to contact us with any features you would like to see in this database.




More information and examples:

 

Searching the Database of Genome Annotations

Note: The search engine returns a maximum of 40000 hits. If your search exceeds this number, please refine it by specifying more search terms or reducing the number of genomes, and by using the 'AND' operator to the left of the drop-down boxes that prompt you to select from a list of search fields. If you still need to work with more than 40000 hits, you can retrieve all annotations from our downloads page. If you require assistance with this feature, including retrieving a single large search result, please feel free to contact us.

Simple Keyword Search

Located at the top of the search page, the simple keyword search lets you search all genomes in the database by the most commonly used field groups (gene & protein name, locusID and all fields). By default, a simple keyword search matches anything that contains the text entered. To perform an exact text search instead, check the "exact" check box next to the text box. An exact search finds only entries that consist of that word or phrase with no surrounding text (e.g. protein names that are exactly "outer membrane protein", as opposed to the many others named "outer membrane protein X", "outer membrane protein Y", etc.).
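
The difference between the default and "exact" modes can be sketched in a few lines of Python. This is only an illustration of the matching behaviour described above, not the database's own code: the function name `matches` and the case-insensitive comparison are assumptions.

```python
def matches(field_value: str, query: str, exact: bool = False) -> bool:
    """Return True if field_value satisfies the query.

    Assumption: matching is case-insensitive and ignores leading and
    trailing whitespace. The real search engine may behave differently.
    """
    value = field_value.strip().lower()
    term = query.strip().lower()
    if exact:
        # Exact mode: the whole field must equal the query.
        return value == term
    # Default mode: the field only needs to contain the query.
    return term in value


# Made-up product names, used purely for illustration.
product_names = [
    "outer membrane protein",
    "outer membrane protein X",
    "outer membrane protein Y",
    "hypothetical protein",
]

contains_hits = [n for n in product_names if matches(n, "outer membrane protein")]
exact_hits = [n for n in product_names if matches(n, "outer membrane protein", exact=True)]
```

Here `contains_hits` holds the three "outer membrane protein" names, while `exact_hits` holds only the one entry with no surrounding text.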

Advanced Boolean Search

This search feature lets you specify which genomes and fields are searched, and lets you search more than one field at once by combining terms with the AND, OR, or NOT operators. As with the simple keyword search, an "exact" search can be specified by checking the box next to the text box.
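
As a rough sketch of how field clauses might combine, the Python below applies AND, OR, and NOT clauses to toy records. The field names, the locus IDs, and the left-to-right evaluation order are all assumptions made for this example; the search form itself may combine clauses differently.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Clause = Tuple[str, str, str]  # (operator, field, term)


def clause_matches(record: Record, field: str, term: str) -> bool:
    """Default 'contains' match on one field (case-insensitive by assumption)."""
    return term.lower() in record.get(field, "").lower()


def boolean_search(records: List[Record],
                   first: Tuple[str, str],
                   rest: List[Clause]) -> List[Record]:
    """Apply the first (field, term) clause, then each operator clause in order."""
    hits = []
    for record in records:
        keep = clause_matches(record, *first)
        for operator, field, term in rest:
            found = clause_matches(record, field, term)
            if operator == "AND":
                keep = keep and found
            elif operator == "OR":
                keep = keep or found
            elif operator == "NOT":
                keep = keep and not found
            else:
                raise ValueError(f"unknown operator: {operator}")
        if keep:
            hits.append(record)
    return hits


# Hypothetical records; the IDs and values are invented for illustration.
records = [
    {"locus_id": "demo_0001", "product": "outer membrane protein", "localization": "Outer Membrane"},
    {"locus_id": "demo_0002", "product": "outer membrane protein X", "localization": "Unknown"},
    {"locus_id": "demo_0003", "product": "hypothetical protein", "localization": "Cytoplasmic"},
]

hits = boolean_search(records, ("product", "membrane"), [("NOT", "localization", "unknown")])
hit_ids = [r["locus_id"] for r in hits]
```

With these toy records, `hit_ids` contains only `demo_0001`: the second record is excluded by the NOT clause and the third never matches the first clause.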

Note: As a complement to searching the annotations, we also recommend using BLAST to search for homologs of a protein of interest. This can find additional genes/proteins of interest whose annotations do not show up in your annotation search (e.g. if you want all homologs of a particular membrane protein, but that protein has been named differently by different Burkholderia Genome Projects). From a BLAST search you can also select genes/proteins of interest for further study, and compare them using the compare feature with other genes/proteins you hold on your clipboard, as described below.

Browse by Function or Subcellular Localization

In addition to the searches listed above, you can browse all genomes by function classification (including Burkholderia Genome project, TIGRFAM, TIGR roles and sub roles, and COG classifications) or subcellular localization.

Viewing Results

Search results can be hierarchically sorted by two fields to ease interpretation of hits. By default, results are sorted first by product name and then by species name, but other options are also available. The drop-down boxes can be used to select alternative sort orders; for example, to view results by orthologous cluster, use the 'COG Value' option.
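
A two-level sort of this kind can be illustrated with a short Python sketch. The rows and the field names `product` and `species` are invented for the example; they only mirror the default "product name, then species name" ordering described above.

```python
from operator import itemgetter

# Made-up result rows, for illustration only.
hits = [
    {"product": "porin", "species": "B. multivorans"},
    {"product": "flagellin", "species": "B. pseudomallei"},
    {"product": "porin", "species": "B. cenocepacia"},
    {"product": "flagellin", "species": "B. cenocepacia"},
]

# The primary key is the product name; ties are broken by species name.
sorted_hits = sorted(hits, key=itemgetter("product", "species"))

ordered = [(row["product"], row["species"]) for row in sorted_hits]
```

Swapping the two field names in `itemgetter` gives the opposite hierarchy (species first, then product), which corresponds to choosing different fields in the two drop-down boxes.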

Searching the Log File of Annotation Updates

The Burkholderia Genome Database contains a Boolean-searchable log of all updates that have been made to the genome annotation. Fields include names of the participants who have made the submission along with structured details and the dates of the submission and locus ID (e.g. PA number). As in the case of simple and advanced searches of the genome annotations above, the log of updates can be browsed and ordered by any of the above parameters or searched using the Boolean search interface.


Comparison of multiple annotations

1) Adding annotations to clipboard

It is now possible to perform multiple comparisons of annotations returned from a keyword search of annotation fields or a BLAST-based search of protein sequence databases. Click the checkboxes (Fig. 1: blue arrow) to the left of the locus ID and then click on the 'Add to clipboard' button (Fig. 1: red arrow) to add annotations to your clipboard. The clipboard will automatically appear after you make your first selection.

Figure 1: Screen shot of the Search results page demonstrating how to add annotations to clipboard.

2) Managing annotations on clipboard

When you are finished selecting annotations, click on the "Compare" button (Fig. 2: blue arrow) to view them all or click on the "Clear all" button to the right of the clipboard (Fig. 2: red arrow) to erase its contents. Individual entries can be removed by clicking on "drop" (Fig. 2: black arrow).

Note: A maximum of 15 annotations can be added to the clipboard at one time.

Figure 2: Screen shot of the clipboard used to store annotations selected by the user.

3) Comparing and performing analyses on selected annotations

After you click the "Compare" button, a display of all the selected annotations and images of surrounding genes appears. The text can be downloaded to a tab-delimited file by clicking the "download clipboard annotations to text file" link. You can view details about specific surrounding genes by clicking on the image map (Fig. 3: red arrow), which links to individual gene annotations, or view less-detailed gene/product/coordinate information by placing the cursor over a gene on the map (Fig. 3: blue arrow). Clicking the "flip orientation" link (Fig. 3: pink arrow) above the graphical image of a gene reverses the orientation of the genome to facilitate comparison of multiple annotations. The left-to-right sorting of the annotations can be changed by selecting primary and secondary fields and sort orders from the drop-down boxes at the top of the page (Fig. 3: orange arrow) and then clicking the "sort" button.

 

Multiple Alignment using ClustalW

Clicking on the "Align" button (Fig. 3: green arrow) takes all nucleotide or amino acid sequences currently in the sequence row and formats them for multiple alignment using ClustalW (1.83). The slow/accurate alignment setting is used in this analysis, with the default CLUSTALW output format. Note: This analysis works best when aligning 12 or fewer sequences. When aligning 13-15 sequences, please expect a significantly longer wait, since the number of sequences greatly affects the time an alignment takes.
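
Part of this slowdown follows from how ClustalW works: before building the multiple alignment it compares every pair of input sequences, so the number of pairwise comparisons grows quadratically with the number of sequences. The short Python sketch below only illustrates that arithmetic. It is not a timing model for this site's server, and the function name is ours.

```python
def pairwise_comparisons(n_sequences: int) -> int:
    """Number of distinct sequence pairs among n_sequences inputs."""
    return n_sequences * (n_sequences - 1) // 2


for n in (12, 13, 15):
    print(n, pairwise_comparisons(n))
# prints:
# 12 66
# 13 78
# 15 105
```

Going from 12 to 15 sequences therefore raises the number of pairwise comparisons from 66 to 105, before the cost of the progressive alignment step itself is counted.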


BLAST search against other nucleotide and protein sequences in the database

A BLAST interface can be accessed through a link on the 'Tools' page. This tool allows nucleotide and protein BLAST searches to be performed on user-defined sequences. Query sequences can either be pasted into the tool's text box or filled in automatically by clicking on a 'BLAST' button. These buttons can be found on a gene's reference page (e.g. Bcen_4451) or on an annotation comparison page. Clicking on the "BLAST Search" link (Fig. 3: black arrow) below any nucleotide sequence being compared will direct you to a pre-formatted BLAST search page containing the specific sequence. Once there, you can select the genome databases you wish to search and fine-tune the search parameters. You can also specify your own sequences and genomes to search by going to the BLAST search page.

Other improvements include a new interface for viewing and sorting results returned by the BLAST search tool. Output from a protein database search can now be viewed as a list of proteins (with gene name and locus ID) along with the details of each alignment (expect-value, bit score, % identity, etc.). In addition, the results can be added to the clipboard utility for further analysis/viewing. Output from a nucleotide sequence database search is viewable in a list format with links to the GBrowse view of each aligned region.

Figure 3: Comparison of multiple annotations.

GBrowse: How to move between different species

Burkholderia GBrowse (based on software developed by Stein et al., 2002) provides an alternate presentation of all the bacterial genomes in the Burkholderia Genome Database. Using checkboxes, the user can select annotation information to view including alternate gene names, protein names, motifs/structures as well as orthologs and knockout data, and perform a search based on criteria they specify. GBrowse then fetches the region of genome specified by the user’s search criteria and presents the specified landmarks to the user in a detailed view containing one or more horizontal tracks representing individual sequence features for that area (Fig. 4). The user is free to zoom in and out according to the level of magnification/resolution desired (Fig. 4: blue arrow). Landmarks on each track usually contain a link to detailed information on additional web sites.

Navigation between species

One can navigate between species via computationally predicted, putative orthologs based on reciprocal best BLAST hits (RBBHs). When viewing the annotation for a specific gene, follow the link to its associated GBrowse entry. Once there, scroll down through the various tracks until you find the tracks containing putative orthologs (if available) in any of the other species in the database (Fig. 4: red arrows). Clicking on any of the glyphs in the track containing putative orthologs will bring you to its main entry in the database.

Figure 4: Navigation between closely related Burkholderia genomes via putative ortholog tracks in GBrowse.

How to Reorder and Display Different GBrowse Tracks

Using the main GBrowse portal, various views and categories of tracks can be turned on or off. For a complete yet slow-to-load view, the 'All tracks' option can be selected. Alternatively, any of the other options can be selected to load a tailored view more quickly. For full control over what is displayed, the customization panel can be accessed by selecting the 'Custom' option or by scrolling down to it on a GBrowse page.

The GBrowse page lists several different options that allow you to specify what tracks to view, their order, data source, etc. In order to select or remove a track, scroll to the bottom of the page until you see the list of available tracks (Fig. 5: blue arrow). Those currently being viewed will contain a check mark in the box next to the name of the track. After selecting or removing tracks, click on the "Update Image" button (Fig. 5: green arrow) and the updated image of all the tracks will appear. One can also specify the order by which tracks appear by clicking on the "Configure tracks..." button (Fig. 5: pink arrow).

Figure 5: GBrowse options panel for configuring display of annotation tracks.


 

Stacked View of Orthologs and Surrounding Genes

An ortholog view available from each gene card allows researchers to view all orthologs of a given gene (with surrounding genes also viewable for context) in a stacked view (Fig. 6), making it easier to compare local gene order changes around a given gene. The top image (Fig. 6: red arrow) represents the reference sequence or gene card that you just linked from. The ortholog view contains all putative orthologs, with links to their gene cards containing more detailed information (Fig. 6: blue arrows), plus an additional, more precise assessment of orthology based on Ortholuge (Fig. 6: green arrows). You can also click on the image links (Fig. 6: pink arrows) to recenter the stacked ortholog view on the gene you clicked.

Figure 6: Stacked overview of putative orthologs showing genomic context.


DNA Motif Search

The DNA Motif Search Tool can be accessed through a link on the Tools page. This search tool accepts an IUPAC-formatted variable length DNA sequence and converts it into a regular expression used to search Burkholderia genome sequences. Upon search completion, an online report or downloadable tab-delimited file is produced containing information on all regions the motif is found in.

For example, entering:

AAGS{3,8}TTN{3,20}TTGAC

will return a results page with all matching sequences that begin with AAG, followed by a C or a G appearing 3 to 8 times, then by TT, then by any nucleotide 3 to 20 times, and finally ending with TTGAC.
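
The conversion from an IUPAC motif to a regular expression can be sketched in Python as follows. This is a minimal illustration of the idea, not the tool's actual implementation: the function names are ours, only the forward strand is scanned, and overlapping matches are not reported. The ambiguity codes in the table are the standard IUPAC nucleotide codes.

```python
import re
from typing import List, Tuple

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif with optional {m,n} repeats into a regex."""
    parts = []
    i = 0
    while i < len(motif):
        if motif[i] == "{":
            end = motif.index("}", i)          # ValueError if unclosed
            quantifier = motif[i:end + 1]
            if not re.fullmatch(r"\{\d+(,\d+)?\}", quantifier):
                raise ValueError(f"bad repeat: {quantifier}")
            parts.append(quantifier)           # regex repeat syntax is identical
            i = end + 1
        else:
            parts.append(IUPAC[motif[i].upper()])  # KeyError on unknown code
            i += 1
    return "".join(parts)


def find_motif(sequence: str, motif: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched text) for each non-overlapping match."""
    pattern = re.compile(iupac_to_regex(motif))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(sequence.upper())]


motif = "AAGS{3,8}TTN{3,20}TTGAC"
pattern_text = iupac_to_regex(motif)  # 'AAG[CG]{3,8}TT[ACGT]{3,20}TTGAC'

# Invented test sequence: 2 flanking bases, one motif instance, 2 flanking bases.
sequence = "CC" + "AAG" + "CGC" + "TT" + "ACGA" + "TTGAC" + "GG"
matches_found = find_motif(sequence, motif)  # [(2, 19, 'AAGCGCTTACGATTGAC')]
```

Coordinates here are 0-based with an exclusive end, which is Python's convention; the coordinates in the tool's report may be numbered differently.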


Viewing Orthologous Groups

Researchers can perform searches based on orthologous groups. A link from the gene card will take you to a search summary page that lists all of the genes that belong to the same orthologous group. The annotations for genes in the same orthologous group can be downloaded and viewed in the same way as the results of a regular gene search.

These orthologous groups allow the generation of more inclusive datasets of Burkholderia genes mapped to their putative orthologs in other Burkholderia strains. The groups are generated starting with pair-wise BLASTp analyses run on all genomes in the database to find reciprocal best BLAST hits (RBBHs) for each gene. These analyses often produced multiple candidate genes for RBBH status; the results were narrowed down by comparing the genes flanking the query with the genes flanking the hit. If the two candidate genes were directly adjacent, then both were accepted as RBBHs involving putative in-paralogy. Intra-genome BLASTp analyses were also performed to find in-paralogs.
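
The core reciprocal-best-hit test can be sketched in Python as below. This is a simplified illustration under stated assumptions: the input is taken to be a dictionary of bit scores keyed by (query, subject), the gene names and scores are invented, and the flanking-gene comparison and in-paralog handling described above are deliberately left out.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> bit score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its single highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented scores for two toy genomes, A and B.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0, ("a2", "b2"): 180.0, ("a3", "b1"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 175.0, ("b2", "a1"): 88.0}

rbbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy data `a3` has `b1` as its best hit, but `b1` prefers `a1`, so only `("a1", "b1")` and `("a2", "b2")` are reported as reciprocal best hits.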

The ortholog groups were built by starting with a seed gene and then adding all genes to which it had an RBBH or in-paralog relationship. Every new gene added was then treated as a seed gene and the addition process was repeated until all qualifying genes were added.
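
The seed-and-expand procedure amounts to collecting everything reachable from the seed through RBBH or in-paralog links. A minimal Python sketch, assuming the relationships are already available as a list of gene pairs (the gene names and the function name are invented for the example):

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def build_ortholog_groups(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group genes linked, directly or indirectly, by the given pairs."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups: List[List[str]] = []
    assigned: Set[str] = set()
    for seed in sorted(neighbours):
        if seed in assigned:
            continue
        group = {seed}
        queue = deque([seed])
        while queue:
            gene = queue.popleft()
            # Every newly added gene is itself treated as a seed.
            for partner in neighbours[gene]:
                if partner not in group:
                    group.add(partner)
                    queue.append(partner)
        assigned |= group
        groups.append(sorted(group))
    return groups


# Invented relationships: RBBH links between genomes a, b, c, plus one
# in-paralog link (a1, a1p) within genome a.
pairs = [("a1", "b1"), ("b1", "c1"), ("a1", "a1p"), ("a2", "b2")]
groups = build_ortholog_groups(pairs)
# groups == [['a1', 'a1p', 'b1', 'c1'], ['a2', 'b2']]
```

Note that `c1` joins the first group even though it has no direct link to `a1`; it is pulled in through `b1`, which is the effect of treating each added gene as a new seed.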




Viewing Genome Summary Pages

Researchers can view summaries of each strain as well as access tools, browse annotations, and download sequences directly from these pages. On the overview, the summary is also broken down by individual chromosomes and plasmids; gene features as well as protein localizations for each chromosome and plasmid can be viewed.

If you have any questions about the database or annotation, please feel free to contact us.

