Heterogeneous Graph Learning

Knowledge graphs are visualization of information with multi-type relations (edges) among some multi-type entities (nodes) within an environment of interest. AThe heterogeneous aspect comes from the graph having two or more types of nodes and two or more types of edges.

Usage

In recent studies on Alzheimer's drug repurposing (Reference 1) Graph Neural Networks (GNN) were used to identify biological interactions from prior knowledge databases containing information on effective treatments and genes associated with high risk. The nodes of the graph included drugs, genes, pathways and gene ontology (GO) connected by interactions including drug-target interaction, drug-drug structural similarity, gene-gene interaction, gene-pathway association, gene-GO association and drug-GO association. A comprehensive graph can be created combining existing information from multiple sources on the biological interactions of complex drug-gene relationships, and from there we can use machine learning to train a model and address the incomplete knowledge in the graph. Note that in this paper the genes are encoded/embedded, simply meaning they are represented as numerical vectors/matrices to capture their function. To give a broad overview of the study's workflow:

A knowledge graph is built to describe the interaction between drugs, genes, gene ontology and pathways
Node ~~emeddings~~embeddings are derived using a multi-relational Variational Graph Auto-Encoder (VGAE)
Machine learning model ranks drug-~~canidates~~candidates based on multi-level evidence
Drug combinations are searched for complementary exposure patterns, using previously ranked drug ~~canidates~~candidates
Validate drug combinations with "oxidative stress responses"

By keeping the focus previously approved drugs, the graph can identify synergy between medications that treat complex diseases. The "Complementary Exposure Pattern" of analysis used suggests drug combinations are effective when the target of the drugs hit the disease module without overlap.

While this study breaks ground in applying knowledge graphs with multi-task learning to fragmented multi-modal data, it is subject to the limitations of each dataset it combines. The knowledge graph alone extracts data from:

Universal protein-protein interactions from STRING
Interaction between genes, drugs, GO, and pathways from Comparative ~~Toxicogenomics~~Toxicogenomic Database (CTD)
Drug-Drug associations based on structural similarities using scores from the RDkit package
Gene IDs from National Center for Biotechnology information (NCBI)
High confidence Alzheimer's associated genes from Agora's nominated gene list

If any of the data-driven drug efficacy is biased, then so is the model. Some of the datasets contained in vitro studies (lab based ~~experiments,~~experiments usually in a test tube), which does not guarantee identical treatment outcomes. However, given the lack of clinical evidence from a huge list of ~~compounds~~compounds, this research could still provide insight on drug combinations in future research opportunities.

References

[1] Synthesize heterogeneous biological knowledge via representation learning for Alzheimer’s disease drug repurposing

[2] Pytorch-Geometric Docs