Gene and Pathways Interactions Graph User's Guide

Contents

Introduction
Configuring the Pathway and Gene Interaction Display
Data Sources and Methods
References

Introduction

The Pathways and Gene Interactions graph accompanies the "Protein Interactions" track and displays a detailed graph of gene interactions and pathways. The graph is based on data from two sources: curated pathway and protein-interaction databases, and interactions found through text mining of PubMed abstracts.

The curated data was imported from 23 pathway or protein-interaction databases (see the Data Sources and Methods section below). Curators at these databases typically read research articles, collect the protein interactions the articles describe, and store them in a web-accessible database. Pathway databases such as Reactome or WikiPathways describe a whole set of interactions, for example the WNT pathway, together with the type of effect. They sometimes annotate indirect or inferred effects as physical interactions, and they often work from review articles. In contrast, protein-interaction databases focus more on the original literature that reports the results of biochemical protein-interaction experiments, and less on the effect or direction of the interaction.

The text-mining data was generated in collaboration with the Microsoft Research Project Hanover team using Literome machine reading. Literome is a natural-language processing (NLP) system that analyzes sentences and tries to extract the proteins involved and the type of interaction. For example, the sentence "PTEN negatively regulates AKT3" is transformed into the pair "PTEN-AKT3" with the type "regulation: negative". The text-mining system was run at the end of 2014 on all 20 million PubMed abstracts, and its results can also be queried through the Literome website.
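To make the shape of a text-mined record concrete, the toy sketch below turns a sentence into a gene pair plus an interaction type and polarity. It is only an illustration of the output format: Literome uses syntactic parsing, not regular expressions, and the trigger table (`TRIGGERS`), the gene-symbol pattern (`GENE`), and the function `extract` are all hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Interaction(NamedTuple):
    pair: str      # e.g. "PTEN-AKT3"
    kind: str      # interaction type, e.g. "regulation"
    polarity: str  # "positive", "negative" or "neutral"


# Hypothetical trigger phrases mapped to (interaction type, polarity).
TRIGGERS = {
    "negatively regulates": ("regulation", "negative"),
    "positively regulates": ("regulation", "positive"),
    "regulates": ("regulation", "neutral"),
    "binds": ("binding", "neutral"),
}

# Hypothetical gene-symbol pattern: an upper-case letter, then letters/digits.
GENE = r"([A-Z][A-Z0-9]+)"


def extract(sentence: str) -> Optional[Interaction]:
    """Return the first 'GENE trigger GENE' match in the sentence, or None."""
    # Try longer triggers first so "negatively regulates" wins over "regulates".
    for trigger in sorted(TRIGGERS, key=len, reverse=True):
        pattern = GENE + r"\s+" + re.escape(trigger) + r"\s+" + GENE
        m = re.search(pattern, sentence)
        if m:
            kind, polarity = TRIGGERS[trigger]
            return Interaction(f"{m.group(1)}-{m.group(2)}", kind, polarity)
    return None
```

For instance, `extract("PTEN negatively regulates AKT3")` yields the pair `PTEN-AKT3` with type `regulation` and polarity `negative`, matching the example above.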

Configuring the Pathway and Gene Interaction Display

Clicking an item in the track display takes you to a page with a gene interaction graph that gives detailed information on the directionality and support of the interactions displayed. The graph is initially centered on the gene you clicked in the track display, and that gene is highlighted in yellow. By default, only the 25 most supported interactions are displayed; you can increase or decrease this number using the controls above the image. You can also filter the interactions using the drop-down menu to display subsets by their support:
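Conceptually, the default view keeps the interactions that involve the center gene and shows only the most supported ones. The sketch below illustrates that selection under an assumed ranking (number of databases first, then number of articles); the guide does not say how support is actually scored, and the `Edge` type and `top_interactions` function are hypothetical.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    a: str
    b: str
    articles: int   # number of supporting PubMed abstracts
    databases: int  # number of curated databases reporting the pair


def top_interactions(edges: List[Edge], center: str, n: int = 25) -> List[Edge]:
    """Return up to n edges touching `center`, most-supported first.

    The ranking key (databases, then articles) is an assumption for
    illustration only.
    """
    touching = [e for e in edges if center in (e.a, e.b)]
    touching.sort(key=lambda e: (e.databases, e.articles), reverse=True)
    return touching[:n]
```

Recentering the graph on another gene then amounts to calling the same function with a different `center`.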

Genes in the interaction graph are connected by several different types of lines; the line type and its properties indicate different levels of support from text mining and databases.

Lines may include arrows showing the direction of the interaction. In these cases, the direction is determined by majority support. For example, consider an interaction between protein A and protein B where two articles support A acting on B, while a single article supports the opposite, B acting on A. Because more articles support A acting on B, the arrow is drawn starting at A and pointing to B.
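The majority rule can be written as a short function. This is a sketch, not the browser's code: `arrow_direction` is a hypothetical name, each article is assumed to contribute one `(source, target)` tuple, and returning no arrow on a tie is an assumption, because the guide does not say how ties are handled.

```python
from typing import Iterable, Optional, Tuple


def arrow_direction(
    a: str, b: str, directed_mentions: Iterable[Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    """Pick the arrow for the pair (a, b) by majority vote.

    directed_mentions holds one (source, target) tuple per article.
    Returns the winning (source, target), or None on a tie or when no
    article states a direction (assumed: no arrow is drawn).
    """
    mentions = list(directed_mentions)
    forward = mentions.count((a, b))   # articles saying a acts on b
    backward = mentions.count((b, a))  # articles saying b acts on a
    if forward > backward:
        return (a, b)
    if backward > forward:
        return (b, a)
    return None
```

With two articles for A to B and one for B to A, the function returns `("A", "B")`, the arrow described in the example above.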

From the "Annotate Genes" drop-down, you can annotate genes by GNF2 average expression, druggability from DrugBank entries, cancer type in the COSMIC Cancer Gene Census, or the number of non-silent mutations identified by the PanCancer analysis project. For GNF2 expression and PanCancer mutation coloring, genes are colored on a sliding scale from light grey to black: genes with the highest expression or the most non-silent mutations are the darkest, and genes with lower expression or fewer mutations are lighter grey. Genes are colored dark blue if the database has no information about them.
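A sliding grey scale like this can be sketched as a linear interpolation from light grey to black. The exact colors and scale used by the browser are not documented here, so the sketch assumes a linear scale, CSS "lightgray" (`#d3d3d3`) as the light end, and CSS "darkblue" (`#00008b`) for missing data; `gene_color` is a hypothetical name.

```python
from typing import Optional

LIGHT_GREY = 211        # channel value of CSS "lightgray" (#d3d3d3), assumed light end
DARK_BLUE = "#00008b"   # CSS "darkblue", assumed color for genes without data


def gene_color(value: Optional[float], lo: float, hi: float) -> str:
    """Map value in [lo, hi] linearly from light grey (lo) to black (hi)."""
    if value is None:
        return DARK_BLUE  # no information in the database
    if hi <= lo:
        frac = 1.0  # degenerate range: treat every value as the maximum
    else:
        # Clamp so values outside [lo, hi] stay on the scale.
        frac = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    level = round(LIGHT_GREY * (1.0 - frac))
    return f"#{level:02x}{level:02x}{level:02x}"
```

A linear scale is the simplest choice; a log scale may suit skewed data such as expression values better, and would only change how `frac` is computed.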

You can mouse over a gene in the display to see more details about it, such as its product. If you have chosen to annotate genes with one of the databases, that information is shown as well. You can mouse over the lines connecting genes to see details about the evidence supporting each connection. If you click on the line connecting two proteins, you can see a snippet of text from a PubMed abstract selected by SumBasic and, if it is a curated interaction, the supporting information from the pathway or interaction databases.
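SumBasic is a frequency-based summarization heuristic: words that occur often across the input are treated as important, the best sentence containing the most probable word is chosen, and the probabilities of the chosen words are then squared so later picks favor new content. The sketch below is a minimal version of that idea, assuming the input is the list of sentences already extracted for one gene pair; it is not the browser's implementation, and `tokenize` and `sumbasic` are hypothetical names.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sumbasic(sentences: List[str], k: int = 1) -> List[str]:
    """Select up to k sentences with the SumBasic heuristic, in selection order."""
    words = [tokenize(s) for s in sentences]
    counts = Counter(w for ws in words for w in ws)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}  # unigram probabilities

    def score(i: int) -> float:
        # Average word probability of sentence i.
        return sum(prob[w] for w in words[i]) / len(words[i])

    chosen: List[int] = []
    remaining = [i for i in range(len(sentences)) if words[i]]
    while remaining and len(chosen) < k:
        # Most probable word still present in an unselected sentence.
        top_word = max((w for i in remaining for w in words[i]), key=prob.get)
        best = max((i for i in remaining if top_word in words[i]), key=score)
        chosen.append(best)
        remaining.remove(best)
        # Down-weight words already covered to reduce redundancy.
        for w in set(words[best]):
            prob[w] = prob[w] ** 2
    return [sentences[i] for i in chosen]
```

For a single snippet, `k=1` is enough: the function returns the sentence that best represents the most frequent content.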

Below the graph, there is a table of less-supported interactions: interactions that were mentioned only a few times each in the literature. The numbers shown on mouse-over for each interaction represent the number of articles and the number of databases that support it.
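The two numbers shown for an interaction can be thought of as counts of distinct supporting sources per gene pair. The sketch below assumes a hypothetical flat record format of `(gene_a, gene_b, source_kind, source_id)`, treats A-B and B-A as the same pair, and counts each PMID or database name once; none of these details are specified by the guide.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def evidence_counts(
    records: Iterable[Tuple[str, str, str, str]]
) -> Dict[Pair, Tuple[int, int]]:
    """Return {(gene1, gene2): (n_articles, n_databases)}.

    source_kind is "pubmed" (source_id is a PMID) or "db" (source_id is a
    database name). Pairs are sorted so A-B and B-A are counted together.
    """
    articles: Dict[Pair, Set[str]] = defaultdict(set)
    dbs: Dict[Pair, Set[str]] = defaultdict(set)
    for a, b, kind, sid in records:
        pair = tuple(sorted((a, b)))
        if kind == "pubmed":
            articles[pair].add(sid)
        elif kind == "db":
            dbs[pair].add(sid)
    pairs = set(articles) | set(dbs)
    return {p: (len(articles[p]), len(dbs[p])) for p in pairs}
```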

You can export the currently displayed gene interaction graph in a variety of formats, including PDF, SVG, Cytoscape, and JSON.

The gene interaction graph can be recentered on a new gene in several ways: (1) clicking a gene in the existing interaction graph, (2) clicking the triangle next to a gene in the table of less-supported interactions below the graph, or (3) searching for a gene name in the search box above the graph.

Data Sources and Methods

Human protein interactions from the following databases were imported:

The quantitative contribution of each database in terms of number of gene-pairs is available here.

PubMed abstracts were downloaded from the National Library of Medicine (NLM) website. The abstracts were then tokenized and parsed syntactically using the SPLAT toolkit. Protein and gene names were identified and normalized, after which potential interactions were extracted using the Microsoft Research (MSR) "Protein and Pathway Extractors". The results were then mapped to the genome using their HGNC gene symbols.
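Name normalization maps the many ways a gene is written in text onto one HGNC symbol, which is what makes genome mapping possible. The sketch below shows the idea with a tiny hand-written synonym table; a real pipeline would load the full HGNC alias and previous-symbol lists, and the table, `normalize`, and `normalize_pair` are hypothetical illustrations, not the MSR extractors.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical, illustrative synonym table keyed by a simplified form of the name.
SYNONYMS: Dict[str, str] = {
    "p53": "TP53",
    "tp53": "TP53",
    "pten": "PTEN",
    "mmac1": "PTEN",
    "akt3": "AKT3",
    "pkbgamma": "AKT3",
}


def normalize(mention: str) -> Optional[str]:
    """Map a gene mention to an HGNC symbol, or None if unknown."""
    # Lower-case and drop punctuation so "MMAC-1" and "mmac1" match.
    key = re.sub(r"[^a-z0-9]", "", mention.lower())
    return SYNONYMS.get(key)


def normalize_pair(a: str, b: str) -> Optional[Tuple[str, str]]:
    """Normalize both sides of an extracted pair; drop unresolved or self pairs."""
    sa, sb = normalize(a), normalize(b)
    if sa is None or sb is None or sa == sb:
        return None
    return (sa, sb)
```

Dropping pairs that cannot be resolved to a symbol keeps only interactions that can be placed on the genome.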

References

Poon H, Quirk C, DeZiel C, Heckerman D. Literome: PubMed-scale genomic knowledge base in the cloud. Bioinformatics. 2014 Oct;30(19):2840-2. PMID: 24939151