Advancing Protein-Ligand Co-Folding through Physics-Informed Machine Learning
Research Project | 28.10.2023 - 31.05.2025
Current machine learning approaches for protein-ligand complexes often depend on limited and potentially biased data. It has recently been demonstrated that integrating physical aspects, such as hydration thermodynamics, can yield more robust models while requiring only a small portion of the available protein-ligand complex data. This strategy has been validated for rigid docking, and recent advances in generative modeling now make it possible to extend it to more versatile settings that accommodate flexibility and blind docking. We aim to develop a robust protein-ligand co-folding framework by integrating physics-based elements with cutting-edge techniques in generative modeling and reinforcement learning, such as diffusion models and GFlowNets. The main objectives are: (1) creating a cascading-resolution model that generates ensembles of protein-ligand complex structures by modeling their probability distributions, a hybrid data- and physics-driven model that leverages informative priors and physics-based projection; (2) enhancing robustness against skewed data by incorporating physical principles, particularly hydration data and thermodynamics, which can be predicted efficiently using machine learning; and (3) exploring the generative potential of GFlowNets, whose explorative nature fosters the sampling of unvisited states. The integration of accurate physics into generative modeling is expected to open numerous opportunities in structure-based drug design.
Funding
Advancing Protein-Ligand Co-Folding through Physics-Informed Machine Learning