UNIverse - Public Research Portal

Bioinformatics (Schwede)

Projects & Collaborations

34 found

Evolutionary-scale interpretation of protein functions in the human gut microbiome

Research Project  | 3 Project Members

Thanks to over a decade of metagenomics efforts, we have a catalogue of over 170 million unique putative proteins from the human gut microbiome. About 40% of these have no assigned function (dark proteins), which limits our understanding of the well-established relationship between the gut microbiome and human health. Current homology-based methods for functional assignment have reached their limits because of the limited availability of reference data. But we are now in a new era of computational biology, where deep-learning-based approaches allow us to predict protein structures (e.g. AlphaFold2), functions (e.g. DeepFRI), and molecular mechanisms at extremely high levels of detail. This opens the door to modelling protein interactions and remote evolutionary relationships (e.g. the Protein Universe Atlas) at unprecedented scales. Building on these developments, we aim to construct an integrative view of the putative molecular functions and biological roles of proteins from the human gut microbiome by combining deep learning-driven, structure-based function prediction with a large-scale view of the protein universe.

Open Research Data (ORD) best practices for computational macromolecular models. (Short title: ModelArchive)

Research Project  | 7 Project Members

Biological macromolecules such as proteins, DNA or RNA are essential for almost all biological processes. To gain insights into their function, life science research relies on accurate information on their 3D structure. Typically, such structures are determined experimentally at atomic resolution via X-ray crystallography, NMR, and increasingly single-particle cryo-EM techniques. In recent years, computational methods for structure prediction have made impressive progress, achieving near-experimental accuracy in predicting 3D structures of proteins. This breakthrough has large implications for structure-based approaches in different research fields, including life sciences, biomedical research, ecology, protein engineering, biotechnology and green chemistry. Not surprisingly, the journal Nature Methods named protein structure prediction "Method of the Year 2021". Since the creation of the Protein Data Bank in 1971, the structural biology community has pioneered open research data principles. The PDB (https://www.wwpdb.org) is the single global archive of 3D structures of biological macromolecules determined by experimental techniques, but it does not accept structures obtained through computational modelling. As a consequence, computational models are often stored in undefined locations in a variety of incompatible formats, and lack essential metadata indicating their usability (e.g. model quality estimates or licence information). Following a recommendation from an international community workshop, we have developed an archive for computed macromolecular structures (https://modelarchive.org) and an extension of the mmCIF data format to store metadata. With the technical infrastructure of ModelArchive now established, we are in a good position to further develop the corresponding ORD practices in our community. This includes promoting best practices for data and metadata interoperability standards, collaborating with scientific journals and funding agencies on establishing deposition policies, improving the reusability of protein models by promoting accuracy estimates, and interlinking with other ORD resources to make models easily findable and accessible.

The Protein Universe Atlas

Research Project  | 3 Project Members

The term "protein universe" refers to the collection of all possible proteins that can be constructed from the small alphabet of 22 proteinogenic amino acids [1,2]. In this representation, functionally characterised proteins correspond to stars, protein families to galaxies, and protein superfamilies to clusters of galaxies, surrounded by all those sequences which are evolutionarily related but have not yet been functionally characterised or sampled by nature. In this project, we will develop a new web service, the "Protein Universe Atlas", to navigate the part of this universe that is currently covered by all catalogued natural proteins. We will apply deep learning protein language models (pLMs) and abstract protein structure representations to model this landscape in three dimensions (3D), providing users with an interactive and integrative platform that will facilitate the annotation, biocuration and further study of a protein, a set of proteins, or all proteins catalogued so far.

LIGATE - Ligand Generator and portable drug discovery platform AT Exascale

Research Project  | 6 Project Members

LIGATE is an EU-funded project that aims to integrate and co-design best-in-class European components for Computer-Aided Drug Design (CADD) solutions, exploiting today's high-end supercomputers and tomorrow's exascale resources. The implementation of machine learning, extreme-scale computer simulations, and big data analytics in the drug design and development process offers an excellent opportunity to lower the risk of investment and reduce the time to patient. The availability of powerful computing resources, new numerical models for simulations, and artificial intelligence increases the accuracy and predictability of CADD, reducing the cost and time needed to design and produce novel drugs.