Daniel Clarke, Icahn School of Medicine at Mount Sinai, New York

Title: Integrating Bioinformatics Tools for Knowledge Exploration Workflows

To form novel hypotheses, biomedical researchers rely increasingly on information stored in public databases and on bioinformatics tools that can query these databases. We have integrated several widely used bioinformatics tools and databases developed by the Ma’ayan Laboratory into a directed multigraph, with nodes representing fundamental data objects (for example, gene sets, signatures, or disease and drug terms) and edges representing the transformations performed by various tools (for example, enrichment analysis, principal component analysis, or a PubMed search). We then use this graph to direct and facilitate user-driven exploration of the landscape of available knowledge stemming from an initial query or from a given dataset. As a case study, we use this system to investigate the role of under-studied protein kinases in diabetic nephropathy.
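The graph described above can be sketched in a few lines. This is a minimal illustration only: the node types, tool names, and registration API below are invented placeholders, not the actual Ma’ayan Laboratory catalog or implementation.

```python
# Minimal sketch of a directed multigraph linking data-object types
# (nodes) via tool transformations (edges). All names are illustrative
# placeholders, not the lab's actual tools or data types.

from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # multigraph: source type -> list of (tool, target type);
        # parallel edges (several tools between the same types) are allowed
        self.edges = defaultdict(list)

    def register(self, source, tool, target):
        """Register a tool that transforms one data-object type into another."""
        self.edges[source].append((tool, target))

    def next_steps(self, data_type):
        """Tools applicable to a given data-object type."""
        return self.edges[data_type]

kg = KnowledgeGraph()
kg.register("gene_set", "enrichment_analysis", "term_list")
kg.register("gene_set", "pubmed_search", "publication_list")
kg.register("expression_matrix", "pca", "signature")
kg.register("signature", "enrichment_analysis", "term_list")

# Starting from a gene set, enumerate the transformations a user
# could apply next in an exploration workflow.
for tool, target in kg.next_steps("gene_set"):
    print(f"{tool} -> {target}")
```

Walking `next_steps` repeatedly from an initial query yields the chains of tool applications that constitute an exploration workflow.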

Dexter Pratt, UC San Diego School of Medicine, San Diego

Title: Investigation of Proteomic Datasets using Biological Network Analysis Tools in the Cytoscape Ecosystem

Analyses based on molecular interaction networks and pathway mechanism models have a long history of use in the investigation of high-throughput gene expression and proteomic data. The widely used Cytoscape desktop application (cytoscape.org) is one of the preeminent tools in this field and is evolving into an ecosystem of desktop software, cloud services, and web applications. NDEx, the Network Data Exchange (ndexbio.org), is a central element of the Cytoscape cloud, serving both as a resource for network content and as a framework for storing, sharing, publishing, and computing with networks. This presentation will review Cytoscape tools relevant to the analysis of proteomic datasets, including NDEx, Cytoscape desktop apps, and new web applications. In the following workshop, participants will apply the tools described in the presentation to the investigation of an example dataset.

Igor Jurisica, University Health Network, Toronto, Ontario

Title: Data-driven (precision) medicine: from data to models to insights and treatments

To fathom complex disease development processes, we need to systematically
integrate diverse types of information and link them using relevant
annotations and relationships, leading to meaningful modeling. This ranges
from multiple high-throughput datasets, functional annotations and
orthology data to expert knowledge about biochemical reactions and
biological pathways. Such integrative systems are used to develop new
hypotheses and answer complex questions: what type of system perturbation
may result in a desired change in cellular function? What factors cause
disease? Will patients respond to a given treatment?

Precision medicine needs to be data-driven, and the corresponding analyses
comprehensive and systematic. We will not find new treatments if we test
only known targets and study only characterized pathways. Thousands of
potentially important proteins remain pathway or interactome “orphans”.
Computational biology methods can help fill this gap with accurate
predictions, but biological validation and further experiments remain
essential.

These computational predictions have improved human interactome coverage
relevant to both basic and translational research and, importantly, have
helped us to identify, validate, and characterize prognostic signatures.
Combined, these results may lead to unraveling the mechanism of action of
therapeutics, repositioning existing drugs for novel uses, prioritizing
multiple candidates based on predicted toxicity, and identifying groups of
patients that may benefit from a treatment and those for whom a given drug
would be ineffective.

Application of graph theory, data mining, machine learning, and advanced
visualization enables data-driven precision medicine. Intertwining
computational prediction and modeling with biological experiments will
lead to more useful findings, faster and more economically.

João Carlos Setubal, Chemistry Institute, University of São Paulo, SP

Title: A transcriptome-based signature of pathological angiogenesis predicts breast cancer patient survival

Thermophilic composting is a rich source of enzymes related to biomass degradation. In the metazoo project we study the composting carried out at the Parque Zoológico de São Paulo using next-generation sequencing. Based on shotgun sequencing of total DNA from dozens of samples, we assembled a catalog of more than 10 million protein-coding sequences. This catalog was mined in search of thermostable enzymes with good technological potential. Using a machine-learning methodology, we arrived at a subset of 231 promising candidate enzymes. For four of these candidates we performed experimental assays confirming thermostability and enzymatic activity.
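The abstract does not specify the machine-learning pipeline used to narrow the catalog, so the following is a rough illustration only: scoring each protein by a composition feature that has been correlated with thermostability in the literature (the IVYWREL residue fraction) and keeping sequences above a cutoff. The function names, threshold, and toy sequences are invented for illustration.

```python
# Illustrative sketch of mining a protein catalog for thermostability
# candidates. The composition score and threshold are placeholders,
# not the project's actual methodology.

def ivywrel_fraction(seq):
    """Fraction of I, V, Y, W, R, E, L residues -- a composition
    feature correlated with thermostability in the literature."""
    seq = seq.upper()
    return sum(seq.count(a) for a in "IVYWREL") / len(seq)

def select_candidates(catalog, threshold=0.45):
    """Keep sequences whose composition score meets the threshold."""
    return [name for name, seq in catalog.items()
            if ivywrel_fraction(seq) >= threshold]

catalog = {
    "enzymeA": "MIVLWRELIVYW",   # toy sequences, not real proteins
    "enzymeB": "MGGSSGGTPNAQ",
}
print(select_candidates(catalog))  # → ['enzymeA']
```

A real pipeline would replace this single heuristic with a trained classifier over many sequence-derived features, but the filter-and-rank structure is the same.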

Lydia Y. Liu, University of Toronto

Title: Proteomics Data Integration in Cancer: The Value of Multimodality

Increasingly, translational cancer studies quantify many types of molecular information in specific model systems or patient samples. These most frequently include germline and somatic mutation profiles (including point mutations, copy number aberrations, and genomic rearrangements), the transcriptome, immune infiltrates, the epigenome, and the evolutionary timing of variants at each of these levels. Proteomic data analyses thus need to leverage these data to better understand information flow in cancer cells, develop robust biomarkers, and understand the molecular origins of complex phenotypes. We will discuss broad data-analytic strategies for these large datasets and the challenges of integrating proteomic data with other data types. This will include practical examples of recent work performing such integrative analyses in primary cancer cohorts, and the value of statistical, machine-learning, information-theoretic, and network strategies. Overall, we show that data integration across multiple levels of the central dogma improves our understanding of cancer phenotypes. Indeed, biomarkers comprising multiple classes of biomolecules systematically outperform those that include only one: despite being an analytic challenge, multimodality is a key opportunity for the future development of oncoproteomics.
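As a schematic illustration of the integration step described above (the abstract names no specific tools), combining per-sample features from two molecular data types into one matrix for downstream biomarker modeling might be sketched as follows. The sample names, feature values, and modality labels are invented placeholders.

```python
# Schematic sketch of multimodal feature integration: concatenate
# per-sample features from several data types (e.g. mutation status
# and protein abundance) into one matrix for biomarker modeling.
# All sample and feature values are invented placeholders.

def integrate(modalities, samples):
    """Concatenate features across modalities in a fixed modality order,
    keeping only samples measured in every modality."""
    shared = [s for s in samples
              if all(s in m for m in modalities.values())]
    matrix = {}
    for s in shared:
        row = []
        for name in sorted(modalities):   # deterministic feature order
            row.extend(modalities[name][s])
        matrix[s] = row
    return matrix

mutations = {"patient1": [0, 1], "patient2": [1, 1]}
proteins  = {"patient1": [2.3, 0.7], "patient2": [1.1, 3.4]}
combined = integrate({"mut": mutations, "prot": proteins},
                     ["patient1", "patient2", "patient3"])
print(combined["patient1"])  # → [0, 1, 2.3, 0.7]
```

Dropping samples missing from any modality is the simplest joining rule; real analyses must also weigh imputation or modality-specific models, since complete-case filtering can discard most of a cohort.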

Mariana Boroni, National Cancer Institute, RJ

Title: Identifying new therapeutic strategies for colorectal cancer in the Big Data era

Colorectal cancer (CRC) is one of the carcinomas with the highest incidence and mortality worldwide; its risk factors include low vegetable consumption, high consumption of red and/or processed meat, overweight, and a sedentary lifestyle. CRC is a highly heterogeneous disease, presenting four molecular subtypes that differ in anatomical location, tumor microenvironment, and altered molecular pathways. This great heterogeneity significantly affects patients' response to different treatments and their prognosis. Accordingly, new therapeutic strategies must be developed with the most relevant molecular alterations in the tumor molecular subtypes in mind. Analyses for the identification of therapeutic targets are based on the concept of the “druggable genome”, that is, the identification of genes that encode specific protein families that interact with drugs and that are directly related to the establishment of the disease. Based on this information, the goal of our study is to suggest the repositioning of drugs currently used to treat other tumor types, as well as the identification of new therapeutic targets, through the analysis of gene-expression patterns and protein–protein interaction profiles in the molecular subtypes of CRC.

Nina Hirata, Institute of Mathematics and Statistics, University of São Paulo, SP

Title: Machine Learning and Computational Thinking

Machine learning techniques are tools often used to automate certain
types of data processing needed for data analysis. They are
particularly useful for analyzing multidimensional data of a complex
nature, or large amounts of data. In this talk we will start by relating
computational algorithms to machine learning and discussing how
computational thinking is essential for the effective use of
computational tools, including machine learning techniques. We will
then introduce basic concepts and methods of machine learning.
Finally, the discussed concepts and methods will be explored
through practical hands-on application examples.