Semantic representation of neurochemical molecules — An unsupervised approach to predict drug effectiveness
This project was carried out as part of the TechLabs “Digital Shaper Program” in Düsseldorf (Winter Term 2021/22).
Abstract
The blood-brain barrier (BBB) is one of the key protective elements of the brain. It separates the central nervous system (CNS) from the circulatory system and protects the brain against intrusive chemicals and foreign particles, including some therapeutic agents. One reason for the relatively low success rate of neuropharmaceuticals is that the BBB blocks the drug’s entry into the brain, resulting in insufficient CNS exposure. As a result, most of these drugs fail to reach the market. Traditional experimental approaches to evaluating the BBB permeability of a drug are expensive and time-consuming. We therefore aimed to estimate the propensity of compounds to penetrate the BBB. Using mol2vec, an unsupervised machine learning approach that learns vector representations of molecular substructures, we derived a vector representation for each drug in the blood-brain barrier penetration (BBBP) dataset. We calculated cosine similarities to measure how close each drug is to all other drugs. Moreover, by drawing the molecular structures from their SMILES strings, we checked whether drugs with similar vectors also have similar structures. For any ineffective neurochemical drug (one unable to cross the blood-brain barrier), we can use our vector representation to find the most similar drugs that are effective. The approach can also be extended to non-neurochemical drugs.
Introduction
The blood-brain barrier (BBB) is one of the key protective elements in our brain. It separates the central nervous system (CNS) from the circulatory system and protects the brain against intrusive chemicals or foreign particles including some therapeutic agents.
Neuropharmaceuticals are one of the most demanding areas of the global pharmaceutical market. Their success rate is much lower than that of other therapeutic areas. One reason for this is that the BBB blocks the drug’s entry into the brain, resulting in insufficient CNS exposure. As a result, most of these drugs fail to reach the market. BBB permeability is therefore a major challenge in CNS pharmacokinetics and pharmacodynamics.
Problems and goals
We aimed to cluster compounds with respect to their blood-brain barrier penetration.
Methods
Using mol2vec, an unsupervised machine learning approach that learns vector representations of molecular substructures, we derived a vector representation for each drug in the blood-brain barrier penetration (BBBP) dataset.
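To make the idea of a molecule vector more concrete, here is a minimal toy sketch. In mol2vec, a molecule is treated as a "sentence" of substructure identifiers, and its vector is built by summing the vectors of those substructures. The sketch does not call mol2vec or RDKit: the identifiers (`sub_a`, `sub_b`, ...), the three-dimensional vectors and the zero fallback for unknown substructures are all assumptions made for illustration, not values from our project.

```python
# Toy illustration: a molecule vector as the sum of its substructure vectors.
# All identifiers and numbers below are made up for this example.

# Hypothetical lookup table: substructure identifier -> learned embedding.
substructure_vectors = {
    "sub_a": [0.5, -1.0, 0.25],
    "sub_b": [1.0, 0.5, -0.25],
    "sub_c": [-0.5, 0.0, 1.0],
}

# Fallback for substructures missing from the table (assumed to be zero here).
UNSEEN = [0.0, 0.0, 0.0]


def molecule_vector(substructures, table, unseen=UNSEEN):
    """Sum the embedding of every substructure found in one molecule."""
    total = [0.0] * len(unseen)
    for identifier in substructures:
        vector = table.get(identifier, unseen)
        total = [t + v for t, v in zip(total, vector)]
    return total


# A molecule described as a "sentence" of substructure identifiers.
sentence = ["sub_a", "sub_b", "sub_b", "sub_x"]
print(molecule_vector(sentence, substructure_vectors))  # [2.5, 0.0, -0.25]
```

Because the vector is a plain sum, a substructure that occurs twice (like `sub_b` above) contributes twice, so larger molecules tend to have longer vectors. This is one reason to compare molecules by the angle between their vectors rather than by raw distance.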
Datasets and clustering tools used for this project:
The blood-brain barrier penetration (BBBP) dataset contains:
- “name” — Name of the compound
- “smiles” — SMILES representation of the molecular structure
- “p_np” — Binary labels for penetration/non-penetration
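The three columns listed above can be read with Python's standard `csv` module. The following is only a sketch: the rows are placeholders that are not taken from the BBBP dataset, and we assume here that `p_np` is stored as `1` for penetrating and `0` for non-penetrating compounds.

```python
import csv
import io

# Placeholder rows that only mimic the three columns described above.
# The names, SMILES strings and labels are NOT taken from the real dataset.
SAMPLE_CSV = """name,smiles,p_np
compound_a,CCO,1
compound_b,CC(=O)O,0
compound_c,c1ccccc1,1
"""


def load_compounds(handle):
    """Read rows into dicts and convert the p_np label to an int."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                "name": row["name"],
                "smiles": row["smiles"],
                "p_np": int(row["p_np"]),
            }
        )
    return rows


compounds = load_compounds(io.StringIO(SAMPLE_CSV))

# Assumption: 1 marks a compound that penetrates the BBB.
penetrating = [c["name"] for c in compounds if c["p_np"] == 1]
print(len(compounds), penetrating)
```

To read a file on disk instead, the same function could be called with an opened file handle, for example `load_compounds(open(path, newline=""))`, where `path` points to a local copy of the dataset.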
Experimental steps:
Results
We calculated cosine similarities to measure how close each drug is to all other drugs. Moreover, by drawing the molecular structures from their SMILES strings, we checked whether drugs with similar vectors also have similar structures.
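For readers unfamiliar with cosine similarity, here is a minimal pure-Python sketch. It divides the dot product of two vectors by the product of their lengths, so the result depends only on the angle between them. The drug names and two-dimensional vectors are hypothetical, and returning `0.0` for a zero vector is a convention chosen for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical drug vectors; real embeddings have many more dimensions.
vectors = {
    "drug_a": [1.0, 0.0],
    "drug_b": [2.0, 0.0],
    "drug_c": [0.0, 3.0],
}


def most_similar(name, vectors):
    """Rank all other drugs by cosine similarity to the named drug."""
    query = vectors[name]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(most_similar("drug_a", vectors))  # [('drug_b', 1.0), ('drug_c', 0.0)]
```

Note that `drug_a` and `drug_b` reach the maximum similarity of 1.0 even though one vector is twice as long as the other: only the direction matters, not the magnitude.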
Outlook and conclusion
For any ineffective neurochemical drug (one unable to cross the blood-brain barrier), we can use our vector representation to find the most similar drugs that are effective. The approach can also be applied in the other direction: for a non-neurochemical drug that is able to penetrate the BBB, it can suggest the most similar drugs that would not have effects on the brain.
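The lookup described in this section could be sketched as follows: take a query drug, keep only the candidates that carry the desired label, and rank them by cosine similarity. The record layout, the function name `similar_with_label`, the drug names and the vectors are all hypothetical, and we again assume that `p_np` equal to `1` means "penetrates the BBB".

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical records: a name, a penetration label and a toy embedding.
records = [
    {"name": "drug_a", "p_np": 0, "vector": [1.0, 0.0]},
    {"name": "drug_b", "p_np": 1, "vector": [1.0, 1.0]},
    {"name": "drug_c", "p_np": 1, "vector": [0.0, 1.0]},
    {"name": "drug_d", "p_np": 0, "vector": [2.0, 0.0]},
]


def similar_with_label(query_name, records, wanted_label, top_k=3):
    """Return up to top_k drugs with the wanted label, most similar first."""
    query = next(r for r in records if r["name"] == query_name)
    candidates = [
        r for r in records
        if r["p_np"] == wanted_label and r["name"] != query_name
    ]
    scored = [
        (r["name"], cosine_similarity(query["vector"], r["vector"]))
        for r in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# drug_a does not penetrate the BBB: which penetrating drugs are closest to it?
suggestions = similar_with_label("drug_a", records, wanted_label=1)
print([name for name, _ in suggestions])  # ['drug_b', 'drug_c']
```

Calling the same function with `wanted_label=0` for a penetrating drug gives the reverse lookup: the most similar drugs that are labelled as non-penetrating.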
GitHub repository: https://github.com/ppcodelearn/healthai
The Team:
Sumanta Barman: Artificial Intelligence
Philipus Putra: Artificial Intelligence
Dora Petrella: Mentor