Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Wissensmanagement in der Bioinformatik

Instructions

What is Ali Baba?

[Installation] - [Submitting queries] - [Graphs resulting from queries] - [Graph-import from KEGG] - [Appending new queries to existing graphs] - [Navigating through the graph] - [Maximum path length] - [Editing graphs] - [Storing results] - [Filtering preferences] - [Removing unsure associations] - [Text view] - [PubMed Related Articles] - [Feedback modus] - [Expert users: Access a local database] - [Known bugs]

Getting started: Installation

Ali Baba is a stand-alone application that can be started from a web browser. This means that it is not displayed as a page in your browser, but launches in a window of its own. Using this technique (as opposed to offering you an executable file to run) ensures that you always start the latest version of Ali Baba, without having to worry about installation (other than the initial setup of Java Web Start, see below). A number of interesting bioinformatics tools are available as Java Web Start applications nowadays, such as the multiple alignment editor Jalview and the comparative genomics tool GOTaxExplorer. Once you have installed Java Web Start on your machine, you can easily run all of them.

Simply click on the "Start Ali Baba" button on the start page. You will first see a splash screen showing the Ali Baba logo. Your browser then downloads a file called "alibaba.jnlp", which lists all libraries necessary to launch the application. Java Web Start automatically retrieves these libraries and then launches Ali Baba. Before the application starts, you will be asked to accept a certificate, which is required to run it.

Windows: After you click on the "Start Ali Baba" button, a window might pop up asking you what to do with the file called "alibaba.jnlp". One option should be "Open with: Java(TM) Web Start Launcher", which you should choose. If this option is not available, see the system requirements below. You might want to check the box "Do this automatically"; from then on, all Java Web Start applications (files downloaded from the internet ending with .jnlp) will be launched automatically without this window being shown again. After clicking "OK", you will get another screen with a security warning. Please accept the certificate/digital signature (press "Run"). To skip this message in the future (for starting Ali Baba only), check the box "Always trust content from this publisher". A few files will now be downloaded, and Ali Baba opens in its own window.

System requirements: Java 1.5 or higher, including Java Web Start (javaws). Java Web Start might need administrator privileges to install on your local computer. See the Java Web Start home page for further instructions.
In case your browser does not already recognize JNLP files, you can add this file type and assign a default application to it. Some browsers will ask you what to do with the file "alibaba.jnlp" (or all other files ending with .jnlp). In this case, select "Open with: Java(TM) Web Start Launcher" from the list. If there is no such item, but you are sure that Java/Java Web Start is installed on your machine, you can select "Other..." from the list and try to locate the Web Start executable (see common directories below).
Most browsers also allow you to set default applications for file types (for instance, Java Web Start for JNLP files) in the Preferences menu. Please consult your web browser's manual on how to do this. In general, there should be a preferences dialog called "Helper applications", "Download actions", or something similar. There, add a file type with the extension "jnlp" (MIME type: "application/x-java-jnlp-file", description: "Java Network Launched Application") and assign Java Web Start as its helper application.
Common Java/Java Web Start directories: On Linux systems, JavaWS often is installed at "/usr/local/Java-2.1_5/jre/javaws/javaws" or similar (the Java-2.1_5 could be something like jdk1.5 or jre1.6). On Windows, this might be something like "C:\Programs\Java\jre\javaws\javaws", sometimes "jre1.6\bin\javaws".

Users behind a proxy: if your computer is located behind a proxy (common in large companies and academic institutions), you might need to change the proxy settings for the Java Web Start environment. Please see here for details on altering the settings and ask your system administrator.

Problems? If you are still experiencing problems after following these instructions, don't give up; send us an e-mail!



Submitting queries

You can submit any query to Ali Baba by entering search terms into the upper input field. These search terms must be in PubMed query syntax. By default, all terms separated by white space are required to appear in the resulting abstracts. An "OR" between two terms requires only one of them to appear in the text. You can restrict terms to fields such as author, title, or journal. For example, to search only the author fields of MEDLINE, write "lastname[au]". Please consult Searching PubMed for full instructions on the query language.
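The query rules above can be tried out with a few lines of Python. This is a minimal sketch and not part of Ali Baba: the helper names `in_field`, `any_of`, and `all_of` are made up for illustration, and the search terms are arbitrary examples. Terms that contain spaces would additionally need double quotes.

```python
def in_field(term: str, tag: str) -> str:
    """Restrict a term to one field, e.g. in_field("lastname", "au")."""
    return f"{term}[{tag}]"


def any_of(*terms: str) -> str:
    """Require at least one of the terms (joined with OR)."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Require every term; white space between terms is enough for this."""
    return " ".join(terms)


# Abstracts mentioning FADD or MORT1, and apoptosis, written by "lastname".
query = all_of(any_of("FADD", "MORT1"), "apoptosis", in_field("lastname", "au"))
print(query)  # (FADD OR MORT1) apoptosis lastname[au]
```

The resulting string can be pasted into Ali Baba's query field as it is.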

You can limit the number of abstracts to retrieve with the input field "Max. results". By default, this is set to 20 abstracts. Limiting the number of abstracts results in higher performance of the retrieval process and smaller graphs. Higher numbers result in slower retrieval and larger graphs. Ali Baba currently needs approximately one second to parse two abstracts. There is an overall limit of 1000 abstracts.
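As a rough aid for choosing a value for "Max. results", the figures above translate into a simple estimate. This is a back-of-the-envelope sketch based only on the numbers in this section (about two abstracts parsed per second, an overall limit of 1000 abstracts). The function name is made up, download time is ignored, and clamping larger requests to 1000 is an assumption about how the limit is applied.

```python
MAX_ABSTRACTS = 1000      # overall limit stated in the text
ABSTRACTS_PER_SECOND = 2  # approximate parsing rate stated in the text


def estimated_parse_seconds(max_results: int = 20) -> float:
    """Estimate the parsing time for a given 'Max. results' setting."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    # Assumption: requests above the overall limit are cut down to the limit.
    abstracts = min(max_results, MAX_ABSTRACTS)
    return abstracts / ABSTRACTS_PER_SECOND


print(estimated_parse_seconds())      # default of 20 abstracts -> 10.0
print(estimated_parse_seconds(1000))  # overall limit -> 500.0
```

With the default setting, parsing should take around ten seconds; at the overall limit, expect a bit more than eight minutes.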

Which abstracts discuss the protein with the UniProt ID P17302?

Ali Baba also allows you to search for UniProt IDs. Ali Baba will then search for PubMed abstracts that mention (all) the protein(s) in your query. This helps you to find abstracts discussing proteins that have many synonyms. You do not need to query PubMed for all synonyms to retrieve all relevant abstracts (and write a complex query), but just search for the UniProt ID. In case you enter multiple UniProt IDs, Ali Baba will search abstracts that mention all these proteins. To launch such a query, please enter the UniProt IDs, separated by white spaces, into the query field. Currently, no additional keywords are allowed for this type of query.
Note that this query is not always species-specific. Ali Baba assigns multiple UniProt IDs to proteins for which it could not resolve the exact species. Queries will thus retrieve abstracts that discuss the protein(s) you searched for in any species, even though your UniProt ID(s) denote a protein in a particular organism.

Automatic query expansion

Ali Baba helps you to search for proteins. When querying for a single UniProt ID or a list of UniProt IDs that starts with '#AUTOEXPAND ', Ali Baba will expand all UniProt IDs to a list of synonyms known for each protein.
For instance, with a query
     P17302
you will actually search PubMed for
     Connexin-43 OR Cx43 OR GJA1 OR GJAL .
The query
     #AUTOEXPAND Q13158 Q14790
will be sent to PubMed as
     ("Protein FADD" OR FADD OR MORT1) AND ("Caspase-8 precursor" OR "EC 3.4.22.61" OR CASP-8 OR MACH OR "FADD-like ICE" OR FLICE OR "Apoptotic protease Mch-5" OR CAP4 OR CASP8 OR MCH5) .
Note that for two or more UniProt IDs without the '#AUTOEXPAND', the auto-expansion will not be active.
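
The expansion rule can be summarized in a few lines of Python. This is only a sketch of the behavior described above, not Ali Baba's actual code: the `SYNONYMS` table is a hand-copied excerpt containing just the three proteins from the examples, and the function name and the quoting rule (quote every name that contains a space) are assumptions inferred from those examples.

```python
PREFIX = "#AUTOEXPAND "

# Excerpt only: synonyms copied from the two example queries above.
SYNONYMS = {
    "P17302": ["Connexin-43", "Cx43", "GJA1", "GJAL"],
    "Q13158": ["Protein FADD", "FADD", "MORT1"],
    "Q14790": ["Caspase-8 precursor", "EC 3.4.22.61", "CASP-8", "MACH",
               "FADD-like ICE", "FLICE", "Apoptotic protease Mch-5",
               "CAP4", "CASP8", "MCH5"],
}


def quote(name: str) -> str:
    """Put names containing a space into double quotes."""
    return f'"{name}"' if " " in name else name


def expand_query(query: str) -> str:
    """Expand UniProt IDs to synonym lists, following the rules above."""
    if query.startswith(PREFIX):
        ids = query[len(PREFIX):].split()
    else:
        ids = query.split()
        if len(ids) > 1:
            return query  # several IDs without the prefix: no expansion
    if not ids:
        return query
    # Synonyms of one protein are alternatives (OR) ...
    groups = [" OR ".join(quote(s) for s in SYNONYMS.get(uid, [uid]))
              for uid in ids]
    if len(groups) == 1:
        return groups[0]
    # ... while every protein in the list has to be mentioned (AND).
    return " AND ".join(f"({group})" for group in groups)


print(expand_query("P17302"))
print(expand_query("#AUTOEXPAND Q13158 Q14790"))
```

The two calls reproduce the PubMed queries shown in the examples above (without the trailing period).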



Graphs resulting from queries

After Ali Baba has successfully retrieved and parsed all abstracts, the resulting graph will be shown. Each node represents an entity (protein, cell, drug, etc.), with different colors for each entity class:

  • cells,
  • compounds,
  • diseases,
  • drugs,
  • enzymes,
  • nutrients,
  • proteins,
  • reactions,
  • species, and
  • tissues.

A node in grey indicates that Ali Baba has found a biomedical entity, but is not sure about the actual type (in most cases, it will be either a protein or a drug, though).

An edge between two entities represents an association between them. This might be either a protein-protein interaction extracted via a pattern matching technique, or a co-occurrence of both entities in the same sentence. Arrows indicate whether the association is directed (for instance, active and passive parts of the association, agent-target dependency). The gray value of each edge corresponds to the confidence score assigned to the association: black edges represent sure associations, light gray edges unsure ones.
You can switch to an alternate graph with a radial layout view by selecting [View|Layout|Radial] from the menu.
Note that information on nutrients is only available in a customized version of Ali Baba.
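
To make the edge shading described above concrete, the following sketch maps a confidence score to a gray value. It is an illustration only: the text states just that sure associations are black and unsure ones light gray, so the linear scale, the function name, and the lightest gray level of 200 are all assumptions.

```python
def edge_gray(confidence: float, lightest: int = 200) -> tuple:
    """Map a confidence score in [0, 1] to an RGB gray value.

    1.0 gives black (0, 0, 0); 0.0 gives the lightest gray. The linear
    interpolation in between is an assumption.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie between 0 and 1")
    level = round((1.0 - confidence) * lightest)
    return (level, level, level)


print(edge_gray(1.0))  # (0, 0, 0), a sure association
print(edge_gray(0.0))  # (200, 200, 200), a very unsure association
```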



Graph-import from KEGG

Ali Baba can also import pathways from the KEGG database. This way, you can, for instance, compare KEGG graphs to recent findings extracted from the literature. There are three ways to import KEGG pathways.

Import KEGG pathways directly from the KEGG server: Ali Baba can retrieve KEGG pathways directly from the KEGG server (ftp.genome.jp). First, retrieve recent KEGG data (names of reactions, enzymes, compounds) using 'File|Update KEGG Synonyms' in Ali Baba's menu bar. This step is necessary only once (or after major updates of KEGG) and takes 1-2 minutes; a progress bar informs you about the current status of the download. After you have updated the list, use 'File|Open KEGG file'. You will see a list of all current pathways and can pick the one you want to have displayed in Ali Baba. After you click 'OK', loading the graph takes up to 20 seconds, depending on its size. You can search for a specific pathway by typing a keyword into the 'Search' field. You can also restrict the list to a particular species: activate the checkbox 'Specify organism', click on the 'Help' button, choose a species from the list, and click on 'Select'. An abbreviation (for instance, 'hsa' for 'human') will appear in the corresponding field.

Load from existing Ali Baba-XML files: For a quick look into combining Ali Baba with KEGG, we provide some example KEGG pathways in Ali Baba's XML format. After you have downloaded these files to your computer, you can load them as described in the section on storing.

Load from local database: see section on expert users.



Appending new queries to existing graphs

Whenever you submit a new query to Ali Baba, the existing graph is removed. If you want to retain old results, check the box "Append queries" in the "Preferences" menu. This will combine the results of all queries into a single graph. This method works for combining multiple PubMed queries, KEGG pathways with PubMed queries, and multiple KEGG pathways.



Navigating through the graph

The graph represents a simple overview of the entities and relations contained in the literature that was retrieved as a result of your query. To get more information, you can use the graph as a means of access to this underlying literature. Initially, the left-hand window contains the graph, and the right-hand window ("Information" panel) contains a list of all entities found in the texts, sorted by their respective categories and displayed as a tree view. A double-click on an entity class will open the tree further and list all entities belonging to this class. You can access all facts described in the literature using the graph and/or the tree view.

Clicking on a node in the graph (or an entry in the tree view) displays the information for the corresponding entity in the right-hand window. First, this consists of all synonyms for the selected entity, as encountered in the abstracts. Each synonym is a link to a database that provides more information on this entry. For proteins, an entry page from UniProt will open in your browser; cells, drugs, and diseases link to the MeSH tree, and so on. Clicking on "(show all)" will open a page in the browser containing information on all entries, not only one single synonym.

Underneath the list of names and synonyms, you can find all sentences that contain the selected entity, with this and all other entities highlighted in the corresponding colors (called "Textual evidence"). Ali Baba links each sentence to the source abstract from PubMed; simply click on the PubMed ID, "PMID", to open the corresponding abstract in your web browser.

To show all associations for the selected entity, expand the tree further by double-clicking the entry in the upper right-hand panel. This will first show all entity classes for which an associated partner could be found. Each class expands to a list of all instances. Selecting an instance displays information on the association. First, all names and synonyms of the agent and target are shown, again as links to the respective databases. The type of the relation and the confidence score are shown as well. Below, you will find all sentences that serve as evidence for the selected association.

Screenshot of AliBaba

The above image shows a graph as it results from our sample query. The encircled entity "caspase-8" was selected and detailed information on this object is shown in the upper right tree view. One of its interaction partners is the protein "cFlip". The lower part shows more information on this association including its type and confidence score.

You can alter the view (zoom factor, parts to show) using either the zoom buttons in the lower left-hand corner (zoom in, zoom out, fit graph into window) or the mouse. Alter the zoom factor with the mouse wheel. Clicking on white space in the graph and dragging the mouse moves the graph around.

You will note that the layout of the graph is constantly rearranged, especially right after you have sent a query. To toggle this constant rearrangement on or off, use the play/pause button in the bottom panel, F9, or 'Animate layout' from the 'Preferences' menu.

The slider for the 'Edge length' (bottom panel) influences the length of the edges between associated nodes. If you want nodes to be farther apart, move the slider to the right. This provides a clearer overview of nodes and edges, especially for large and dense graphs. Move to the part of the graph you are interested in, zoom into this area, and increase the edge length.



Maximum path length

When browsing through a graph, you might be interested only in single entities and their immediate association partners. Ali Baba contains a Path length view that reduces the graph to a selected node and those nodes that are less than a specified number of hops away. Switch to this mode by selecting Menu:View|Layout|Path length. A selector for the maximum path length will appear below the graph view (in the Zoom bar). After you click on a node, Ali Baba shows only those nodes that are close enough to the selected one. Selecting another node then does the same for the new node: some nodes are removed, others appear. This way, you can click along a path that interests you and see only a small part of the graph; this small part contains only nodes close to the node you currently have selected on that path.
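
The Path length view described above amounts to a breadth-first search that stops after a fixed number of hops. The following is a minimal sketch of that idea, not Ali Baba's implementation: the function name and the toy graph are made up, edges are treated as undirected, and the selector value is treated as an inclusive maximum, all of which are assumptions.

```python
from collections import deque


def nodes_within(adjacency: dict, start: str, max_hops: int) -> set:
    """Return all nodes at most max_hops edges away from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not explore beyond the maximum path length
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


# Hypothetical toy graph: C - A - B - D - E
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

print(sorted(nodes_within(graph, "A", 1)))  # ['A', 'B', 'C']
print(sorted(nodes_within(graph, "D", 1)))  # ['B', 'D', 'E']
```

Selecting "A" and then "D" in this toy graph shows the effect described in the text: "A" and "C" disappear, while "E" appears.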



Editing graphs

Once you have retrieved a graph, you can edit it yourself by adding or deleting nodes and edges. Access this functionality using the right mouse button in the graph view. Right-clicking on white space offers the option to add a single node. Right-clicking on an existing node offers to edit this node (that is, change its label or type), delete it, or add a new edge starting from this node. You can also add a new edge by clicking on the prospective source node while holding the Ctrl key, and then on the target node, still holding the Ctrl key. Right-clicking on an existing edge enables you to edit or delete this edge. When you add a new node or edit an existing one, a small window appears that allows you to set the following options: 'Node name' refers to the label as shown in the graph; 'Entity type' lets you choose among protein, drug, etc.; 'Coordinates' lets you place the node in the window (if you do not change this setting, a new node will appear at the position you clicked). Inserting or editing an edge allows you to specify the source and target nodes, as well as a direction (for example, from source to target or bidirectional). You can also set or alter the label for the edge that is displayed in the graph.



Storing results

You can store your current results on your local hard disk for later access using [File|Save as]. [File|Open] opens such stored result files. The format of all result files is XML. Note that the actual positions of nodes and edges are not stored, only the information needed to reconstruct the graph: entities, associations, names, types, entity classes, and source texts. The style sheet that defines colors etc. is not stored locally; thus, for later access, your computer has to be online, or all nodes will be black.



Filtering preferences

In the [Preferences] menu, you will find an entry called [Filter preferences]. Using this dialog, you can toggle the types of entities (proteins, cells, etc.) you see in the graph; simply add or remove the check mark in front of the wanted/unwanted entity types.
Min. degree filter enables a slider on the bottom panel. This slider allows you to successively reduce the graph by removing nodes that have no neighbors (unconnected nodes), only one neighbor, two, and so on. Move the slider to the left to see all nodes (even unconnected ones) and to the right to see only the nodes with the highest connectivity.
Aggregate cells leads to protein-cell associations being visualized as bubbles (cells) containing their associated partners (proteins). A bubble called "lysosome" thus might contain the two proteins "FADD" and "Fad", instead of having three interconnected nodes.
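
The effect of the Min. degree filter described above can be sketched as follows. This is an illustration under stated assumptions, not Ali Baba's code: the function name and the toy graph are made up, and the number of neighbors is counted once on the full graph rather than recomputed after each removal.

```python
def min_degree_filter(nodes, edges, min_degree):
    """Keep only nodes with at least min_degree distinct neighbors."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    kept = {node for node, adj in neighbors.items() if len(adj) >= min_degree}
    # An edge survives only if both of its end points survive.
    kept_edges = [(a, b) for a, b in edges if a in kept and b in kept]
    return kept, kept_edges


# Hypothetical toy graph: B - A - C - D, plus the unconnected node E.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("C", "D")]

print(sorted(min_degree_filter(nodes, edges, 0)[0]))  # all five nodes
print(sorted(min_degree_filter(nodes, edges, 1)[0]))  # E is removed
print(sorted(min_degree_filter(nodes, edges, 2)[0]))  # only A and C remain
```

A slider position of 0 corresponds to the far left (all nodes visible); each step to the right raises `min_degree` by one.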

Removing unsure associations

Ali Baba assigns a confidence score to every single association between two objects. This score states how confident Ali Baba is that both objects actually relate to each other. The score is a value between zero and one, with one meaning a 100% sure relation. Various aspects influence the score for a pair of objects. The more often the objects are discussed together, the higher the score. The different techniques we use for extracting the relation (pattern matching or simple co-occurrence of the two objects) also yield certainty measures: they reflect how well a sentence fits a pattern, or how far apart the objects are mentioned in the text.
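
Removing unsure associations then comes down to dropping every association whose score falls below a chosen cut-off. The sketch below illustrates this with made-up data; the function name, the tuple layout, and the idea of a single user-chosen threshold are assumptions, since the text does not say how the cut-off is selected.

```python
def remove_unsure(associations, min_confidence):
    """Keep associations whose confidence score reaches the cut-off.

    Each association is a (source, target, score) tuple with a score
    between 0 and 1.
    """
    return [assoc for assoc in associations if assoc[2] >= min_confidence]


# Hypothetical associations with invented scores.
associations = [("A", "B", 0.95), ("A", "C", 0.40), ("C", "D", 0.10)]

print(remove_unsure(associations, 0.5))  # [('A', 'B', 0.95)]
```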



Text view

Besides navigating through the graph to obtain information, you can also start with the abstracts. In the information panel on the right side, select [Texts] at the top. You will see a list of all publications (their titles) in PubMed order (most recent first). Clicking on a title will show the full abstract, including all annotations, in the lower right panel.



PubMed Related Articles

When you are viewing a full abstract in the lower right panel (you can get there either by clicking on [more] at the end of each evidence text, or by selecting [Texts] at the top of the upper right panel), you will find a link at the bottom of each abstract saying [Show PubMed Related Articles as new graph]. Clicking on this link initiates a new query to PubMed that retrieves articles related to the current one. For this, Ali Baba uses PubMed's own 'Related Articles' functionality. All related articles will then be shown as a new graph, replacing the old one. Note that at the moment, we restrict the list of related articles to the first 20, as returned by PubMed.



Feedback modus

The recognition of entities is not always correct, as you might notice. Sometimes, Ali Baba predicts a wrong entity type for a given word; sometimes, Ali Baba fails to mark a word at all. If you find such an error, we would appreciate your help. Switch to the Feedback modus by checking the box in the lower right corner (right below the text box). Now, if you mark a word or multiple words, a dialog will pop up where you can suggest and submit your correction. Mark a word by double-clicking on it, or by dragging the mouse with the left button pressed, just like highlighting text for copy and paste. All suggestions will be collected and included in our software from time to time. Thanks a lot for your help!



Expert users: Access a local database

Set up a local database

Ali Baba can also access graphs using a local relational database. This means that you can store (and later retrieve) graphs not only in the XML format described previously. We describe the appropriate database schema here. Note that it is designed for a MySQL database, which is free for academic usage. Ali Baba needs a small configuration file that contains the access parameters for your local database (user name, password, database name); please download this file and alter it accordingly.
Once you have retrieved a graph via a PubMed query or imported a graph from KEGG, you can store the graph (for instance, after editing it) via [File|Store as..] and after pasting a description of the current graph. You can access this and other graphs using [File|Retrieve..].

Files necessary for installation of the Ali Baba local database:


Files necessary for installation of the database interface to KEGG:

Store and retrieve KEGG data into/from the local database

Load from your local copy of KEGG in a relational database: For the local copy, you need a MySQL database with KEGG data. We describe the database schema here; you can use this file directly as SQL source statements. Note that it is designed for a MySQL database, which is free for academic usage. Ali Baba needs a small configuration file that contains the access parameters for your local database (user name, password, database name); please download this file and alter it accordingly. A summary of these files can also be found in the database section.
KEGG offers easy access to its data via FTP; please see here (academic users only). If you experience problems importing KEGG data into the described schema, please contact us. You can use the local relational database not only to retrieve, but also to store KEGG graphs (for instance, edited ones); you can also store Ali Baba graphs in the database, please see the database section.



Known bugs and problems

Linux:
  • Installation: On some Linux systems, there are multiple copies of an executable called 'javaws' installed. If you select the wrong one during installation, Ali Baba will not start up.

Objects/Texts panel:
  • Sometimes, immediately after a query has returned results, the [Objects] panel (upper right corner, where you find the list of proteins, drugs, etc. that were found) also contains titles of abstracts. Clicking on the [Texts] panel and switching back to [Objects] solves this problem.
MacOS:
  • On some MacOS systems, if you want to use external links in Ali Baba (to PubMed, UniProt, etc.), a web browser has to be running in the background already. On other systems, clicking a link will automatically launch your default web browser.