Faculty of Science
UNIverse - Public Research Portal

Biomedical Data Analysis

Projects & Collaborations

Video for Scientific Outreach of the Research Network Responsible Digital Society

Research Networks of the University of Basel  | 8 Project Members

The research network "Responsible Digital Society" works in several ways to promote interdisciplinary exchange and cooperative research in the field of digital transformation.

In the area of research, the network creates forums for regular scientific exchange and supports the coordination of interdisciplinary research proposals. In the area of promoting young researchers, the network organizes summer and winter schools for them. In the area of networking, the network promotes regular exchanges with industrial partners in the region. In the area of outreach, the network strengthens the public dialogue by organizing colloquia and panel discussions on digitization with guests from various disciplines.

Exploiting the CXCR4-CD44 axis for cancer treatment

Research Project  | 1 Project Member

In the tumor microenvironment (TME), cancerous cells, together with stromal and immune cells, are organized within a specific extracellular matrix (ECM) that provides cues orchestrating cell behaviour. Therapies which harness our inherent ability to destroy tumors, such as antibodies that activate cytotoxic lymphocytes (CTL) to re-engage tumor cell killing (immune checkpoint therapy, ICT), are successful in treating some but not all cancer patients. Emerging evidence suggests a key role of the ECM in regulating the crosstalk between cancer and immune cells. Therefore, a more complete understanding of how cells and the ECM in the TME interact will be important for designing new and complementary approaches to treat cancer. We propose that a signaling axis comprising ECM and CXCR4, involving tenascin-C (TNC), hyaluronan (HA), CXCL12 and CD44, modulates tumor immunity, thereby compromising ICT. We aim to understand how CXCR4 signaling complexes form and how this impacts CTL behaviour by using proteomics, in vivo targeting and computational modelling. As CXCR4 plays an important role in CTL reactivation, more insight into its regulation in space and time may provide novel information with therapeutic potential for improving ICT.

Entropy and Synchrony Markers for Modeling Cognitive Decline in Patients with Parkinson's Disease

Research Project  | 2 Project Members

Parkinson's disease dementia (PDD) is a complication in the course of Parkinson's disease (PD). The pathophysiological process, however, is not completely understood, and it is of high practical importance to develop new methods for detecting cognitive decline in PD at a very early stage. Recent studies have shown that quantitative EEG (QEEG) measurements are among the most promising methods to predict and monitor cognitive decline. While QEEG is not affected by repetitive examination artifacts, its limitations include that conventional analysis by power spectra does not sufficiently reflect the complexity of the underlying neurophysiological process. Therefore, we aim to establish an analytical AI-based tool operating on entropy and synchrony measures to capture more of the complex mechanisms underlying cognitive decline in some patients with PD.
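
As an illustration of what an entropy measure on a signal can look like, the following minimal sketch computes normalized permutation entropy. The project description does not say which entropy measures are used, so the choice of permutation entropy, the function name `permutation_entropy`, its parameters and the synthetic signals are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def permutation_entropy(signal, order=3, delay=1):
    """Return the normalized permutation entropy (0 to 1) of a 1-D sequence."""
    n_windows = len(signal) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("signal is too short for this order and delay")
    patterns = Counter()
    for start in range(n_windows):
        window = [signal[start + k * delay] for k in range(order)]
        # Ordinal pattern: the index order that sorts the window.
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        patterns[pattern] += 1
    probabilities = [count / n_windows for count in patterns.values()]
    entropy = sum(p * math.log(1.0 / p) for p in probabilities)
    # Divide by log(order!) so that 1.0 means all patterns are equally likely.
    return entropy / math.log(math.factorial(order))


# Synthetic stand-ins for a signal channel (assumption: not real EEG data).
rng = random.Random(0)
regular = [math.sin(0.05 * t) for t in range(2000)]
irregular = [rng.gauss(0.0, 1.0) for _ in range(2000)]

print(permutation_entropy(regular))    # lower: few ordinal patterns dominate
print(permutation_entropy(irregular))  # higher: patterns are nearly uniform
```

Unlike a power spectrum, this measure depends only on the ordering of neighbouring samples, which is one way to capture signal complexity that spectral power alone may miss.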

weObserve: Integrating Citizen Observers and High Throughput Sensing Devices for Big Data Collection, Integration, and Analysis

Research Project  | 6 Project Members

Even though hypothesis-driven research is the fundamental core of scientific advance, scientific progress by monitoring and subsequent analysis is crucial for certain real-world phenomena. Specialized high-throughput sensor devices can be used in controlled lab environments, where they provide very large data collections. These collections can be analysed to arrive at new findings by verifying or falsifying concrete hypotheses. However, the majority of scientific domains are more complex and cannot rely on such rather simple data gathering and processing pipelines: first, the phenomena to be monitored in the real world are complex in their spatial and temporal dynamics, and confounding factors are typically of multi-causal origin. Thus, the phenomena cannot be isolated, nor can natural environments be rebuilt in controlled lab environments, which was one of the lessons learned from the Biosphere II programme. Monitoring in the field is essential, but it can hardly be done at landscape scale with conventional sensor technology alone, since there are always technical trade-offs between spatial resolution, coverage, temporal resolution and interpretability. Furthermore, even if the resolution and coverage of sensors are satisfactory, it is not obvious where and when to deploy these high-precision/high-throughput sensors to capture relevant phenomena. In the weObserve project, we will rely on citizen observers to provide semantically rich information directly from the field to complement and enrich existing sensor data. In addition, monitoring data from citizen observers will be used to anticipate where interesting phenomena are likely to take place, to draw on societal knowledge and judgement about the relevance of phenomena, and to deploy high-resolution sensing devices in these areas.
From a technical point of view, weObserve will address the collection, integration, and processing of heterogeneous data in applications which cannot rely on off-the-shelf sensing devices for monitoring purposes. Heterogeneity includes the volume of data provided via different channels, the precision, the coverage in time and location, and also the predictability. Data collection will therefore seamlessly combine several types of data sources: (i) high-throughput sensing devices which produce very large volumes of data and cover large areas, but with rather low resolution, (ii) citizen observers who provide semantically rich data, but with varying levels of precision and substantial sampling bias, and (iii) specific high-resolution sensing devices that need to be manually deployed. Data integration will deal with such heterogeneous data. For subsequent analysis, the origin (provenance) and uncertainty of individual data items need to be kept in an integrated data set. Data analysis will detect hidden patterns and the main explanatory factors in data collections. Integration of domain knowledge into the analysis process will be essential for detecting sampling biases and confounding factors. Specific emphasis needs to be put on analysis models that can deal with multiple data queues varying in size, reliability, representation and resolution. A further important aspect concerns visualization of detected patterns as a means of improving communication between those using the data for scientific purposes and those collecting it. WeObserve will design, implement, integrate, and evaluate the individual parts of the data collection/integration/analysis pipeline in two selected applications, namely (i) monitoring of soil degradation and landslides, and (ii) monitoring of bird migration, with complementary requirements and different ways to gather data.
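
The requirement that provenance and uncertainty stay attached to each data item can be sketched as a small record type. Everything here is hypothetical: the `Observation` class, its field names, the source labels and the inverse-variance weighting used in `fuse` are assumptions chosen for illustration, not the project's actual data model or integration method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    value: float   # measured quantity
    std: float     # uncertainty as a standard deviation
    source: str    # provenance label, e.g. which channel produced the item


def fuse(observations):
    """Combine observations of one quantity by inverse-variance weighting.

    Assumption: errors are independent and roughly Gaussian. The result keeps
    a combined provenance label and a combined uncertainty.
    """
    if not observations:
        raise ValueError("need at least one observation")
    weights = [1.0 / (o.std ** 2) for o in observations]
    total = sum(weights)
    value = sum(w * o.value for w, o in zip(weights, observations)) / total
    std = (1.0 / total) ** 0.5
    sources = sorted({o.source for o in observations})
    return Observation(value=value, std=std, source="+".join(sources))


# Hypothetical readings of the same quantity from two kinds of sources.
readings = [
    Observation(value=10.0, std=1.0, source="sensor"),
    Observation(value=14.0, std=1.0, source="citizen"),
]
merged = fuse(readings)
print(merged)
```

Because the fused record still names its sources and carries a standard deviation, later analysis steps can weight or filter items by origin and reliability.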

Causality and Physics-informed Machine Learning

Research Project  | 4 Project Members

Describing and discovering causal relationships is a concept at the crossroads of philosophy, mathematics and statistics. It has recently gained popularity in the machine learning community. While statistical dependence tells us when two variables tend to change simultaneously, causal models aim at making statements about the direction behind this co-variation. Models of causal relationships thus allow questions such as "Does smoking increase the probability of developing an illness?", "What happens to a phenotype if a gene is knocked out?" or "What happens to a system if one of its variables is set to a given value?" to be tackled. A number of approaches to modelling causality have been proposed, ranging from ones based purely on observational data (i.e. passive observation of a system) through mixed data to interventional (experimental) data, where an experiment controlling for a particular variable has been performed. Possible application areas include, among others, genetic and biomedical data, social network analysis or financial data. We focus on the approach where the relationship between observational and interventional distributions is a measure of causality and is quantified with information-theoretic tools. Building on that, we propose methods of time series modelling, causal graph recovery and causal segmentation. We apply the approach to genetic and EEG data.
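
The difference between an observational and an interventional distribution can be made concrete with a toy example. The sketch below assumes a hypothetical binary model with a confounder Z (Z influences both X and Y, and X influences Y); all probabilities are invented. It compares P(Y=1 | X=1) with P(Y=1 | do(X=1)), the latter obtained by adjusting for Z, and uses the Kullback-Leibler divergence as one possible information-theoretic comparison. This is not necessarily the quantity used in the project.

```python
import math

# Hypothetical model parameters (assumptions for illustration only).
P_Z1 = 0.5                                   # P(Z = 1)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}              # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5,   # P(Y = 1 | X = x, Z = z)
                 (1, 0): 0.3, (1, 1): 0.7}


def p_z(z):
    return P_Z1 if z == 1 else 1.0 - P_Z1


def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1.0 - P_X1_GIVEN_Z[z]


def observational(x):
    """P(Y = 1 | X = x): conditioning on a passively observed X."""
    joint = sum(p_z(z) * p_x_given_z(x, z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))
    marginal = sum(p_z(z) * p_x_given_z(x, z) for z in (0, 1))
    return joint / marginal


def interventional(x):
    """P(Y = 1 | do(X = x)): X is set externally, so Z keeps its own distribution."""
    return sum(p_z(z) * P_Y1_GIVEN_XZ[(x, z)] for z in (0, 1))


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


obs = observational(1)
intv = interventional(1)
print(obs, intv, kl_bernoulli(obs, intv))
```

In this toy model, conditioning on X=1 overstates the effect of X on Y because Z drives both variables; setting X by intervention removes that contribution.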

Computer aided Methods for Diagnosis and Early Risk Assessment for Parkinson's Disease Dementia

Research Project  | 4 Project Members

Neurodegenerative disorders begin insidiously in midlife and are relentlessly progressive. Currently, there exists no established curative or protective treatment, and they constitute a major and increasing health problem and, in consequence, an economic burden in aging populations globally. Parkinson's disease (PD), following Alzheimer's disease (AD), is the second most common neurodegenerative disorder worldwide, estimated to occur in approximately 1% of the population above 60 and in at least 3% of individuals above 80 years of age. In Switzerland, about 15'000 persons are diagnosed with PD. In addition to motor signs, which due to recent medical progress can be treated satisfactorily in most cases, non-motor symptoms and signs severely affect the well-being of patients. They include mood disorders, psychosis, cognitive decline, disorders of circadian rhythms, as well as vegetative and cardiovascular dysregulation. Neurodegeneration in PD progresses for years before clinical diagnosis is possible, at which time e.g. 80% of dopaminergic neurons in the substantia nigra are already lost. Therefore, any clinical targeting of disease modification, prognosis and personalized treatment, including guiding the indication for deep brain stimulation (DBS), requires reliable and valid biomarkers. The main goal of this research project is the identification of a pertinent set of genetic and neurophysiological markers for diagnosis and early risk assessment of PD-dementia. Our approach has a distinct interdisciplinary basis, in that it fosters close collaborations between physicians, neuroscientists, psychiatrists, psychologists, computer scientists and statisticians. Based on current research findings we postulate that a combination of (1) quantitative electroencephalographic measures (QEEG, e.g. frequency power, connectivity patterns and network analysis), (2) genetic biomarkers (e.g. MAPT, COMT, GBA, APOE) and (3) neuropsychological assessment improves early recognition and monitoring of cognitive decline in PD. To test this hypothesis, this project proposes an interdisciplinary long-term study of patients diagnosed with PD without signs of dementia, among them a subgroup of patients undergoing DBS. The workup of the proposed study includes collection of clinical, neuropsychological, neurophysiological and genotyping data at baseline, as well as at 3-, 4- and 5-year follow-ups. Sophisticated statistical models that can deal with noisy measurements, missing values and heterogeneous data types will be used to extract the best combination of biomarkers and neuropsychological variables for diagnosis and prediction of prognosis of PD-dementia. Besides this clinical perspective, this project further aims at deciphering the unknown disease mechanisms in PD on both a genetic and a neurophysiological level, with particular emphasis on the interplay of genetic markers and temporal changes in the functional connectivity of the brain.

Bayesian Neighbourhood Estimation

Research Project  | 4 Project Members

In this project, we take a Bayesian perspective on estimating the neighbourhood of a set of p query variables in an undirected network of dependencies. Gaussian Graphical Models (GGM) are a tool for representing such relationships in an interpretable way. In a classical GGM setting, the sparsity pattern of the inverse covariance matrix W encodes conditional independence between variables of the graph. Consequently, various estimators have been proposed that reduce the number of parameters by imposing sparsity constraints on W, e.g. the graphical lasso procedure and its Bayesian extensions. We consider a sub-network corresponding to the neighbourhood of a set of query variables, where the set of potential neighbours is large. We aim at developing an efficient inference scheme such that the estimation of the sub-network is possible without inferring the entire network. In real-world situations it is often the case that we have to estimate a full network but interpret only part of it. An example of such a situation is modelling the dependence between clinical variables and a potentially large set of genetic explanatory variables. Here, we would be more interested in establishing the links between these portions, rather than examining the links within the portions themselves. The proposed idea averts prohibitive computations on the whole network and makes it possible to estimate only the parts of interest. An additional challenge is the ability to handle missing values and heterogeneous data, i.e. continuous and discrete random variables at the same time. We plan to achieve this by a copula extension.
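
The statement that zeros in the inverse covariance matrix W encode conditional independence can be checked on a small example. The sketch below assumes a hypothetical three-variable Gaussian chain X1 -> X2 -> X3 with coefficient 0.8 and unit noise variance, writes down its exact covariance matrix, inverts it, and reads the neighbourhood of a query variable from the non-zero entries of W. The helper names `invert` and `neighbourhood` are invented for this illustration.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def neighbourhood(precision, query, tol=1e-9):
    """Indices whose entry in the query row of W is non-zero (beyond tol)."""
    return [j for j in range(len(precision))
            if j != query and abs(precision[query][j]) > tol]


# Exact covariance of the assumed chain X1 -> X2 -> X3 (coefficient 0.8).
sigma = [
    [1.00, 0.800, 0.6400],
    [0.80, 1.640, 1.3120],
    [0.64, 1.312, 2.0496],
]
W = invert(sigma)

print(sigma[0][2])           # X1 and X3 are marginally correlated
print(neighbourhood(W, 0))   # yet X1's only neighbour in the graph is X2
print(neighbourhood(W, 1))   # X2 is linked to both X1 and X3
```

Note that this toy version inverts the full covariance matrix, which is exactly the whole-network computation the project aims to avoid when the set of potential neighbours is large.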

Copula Distributions in Machine Learning: Models, Inference and Applications

Research Project  | 1 Project Member

In recent years, copula models have become popular tools for modeling multivariate data. The underlying idea is to separate the "pure" dependency between random variables from the influence of the marginals. The main focus of research, however, has been on parametric (and most often bivariate) copulas in econometrics applications, and only recently have "truly" multivariate copula constructions been considered. Finding principled ways of building these constructions, however, is commonly considered a hard problem. The machine learning field was largely unaffected by these developments, despite the fact that inferring the dependency structure in high-dimensional data is one of the most fundamental problems in machine learning. On the other hand, machine learners have developed a rich repertoire of methods for structure learning, and exactly these methods have the potential to make copula constructions useful in real-world settings with noisy and partially missing observations. It is thus not surprising that there is a constantly increasing number of machine learning publications which aim at using structure learning methods for copula-based inference. In general, however, the use of copulas in machine learning has been restricted to density estimation problems based either on Gaussian copulas or on aggregating standard bivariate pair copulas, while other directions such as clustering, multi-view learning, compression and dynamical models have not been explored in this context. In this proposal we will try to close this gap by focusing on clustering, on the connection to information theory (which will also include a connection to dynamical systems), and on finding new ways of using non-parametric pair copulas.
From an application point of view, the study of such models is interesting because inferring hidden structure is presumably one of the most successful applications of machine learning methods in domains generating massive and noise-affected data volumes, such as molecular biology. The precise separation of "dependency" and "marginals" in copula models has the potential to overcome limitations of current techniques, be it overly restrictive distributional assumptions or model selection problems. Our proposal is divided into four work packages which address the following questions: (i) How can copulas be linked to information theory, and what consequences will this link have on modeling dynamical systems? On the application side, these questions are motivated by studying time-resolved gene expression data and node-ranking problems in gene networks. (ii) How can copulas be used for deriving flexible cluster models that can model arbitrary marginals and are robust to noise, missing values and outliers? The motivation comes from a mixed continuous/discrete dataset containing multi-channel EEG recordings and clinical measurements. (iii) How can we use copulas for simultaneously learning network structures and detecting "key modules" in these networks given external relevance information? Our interest in this question comes from analyzing gene expression data and the aim to detect subnetworks based on clinical variables. (iv) How can we use empirical pair copulas in network learning? The motivation stems from our experience that the problem of selecting "suitable" pair copulas has no obvious solution in practice. In the proposed project, we plan to answer these questions, thereby pushing the state of the art in machine learning problems involving copula distributions.
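
The separation of "dependency" from "marginals" can be illustrated with rank-based pseudo-observations, the usual empirical starting point for copula estimation. The sketch below is an assumption-laden toy example, not the project's method: the helper names, the synthetic data and the use of Spearman's rank correlation as a dependence summary are all choices made here. It shows that strictly increasing transformations of the marginals leave the rank-based quantities unchanged, while the ordinary Pearson correlation changes.

```python
import math
import random


def pseudo_observations(values):
    """Map a sample into (0, 1) via scaled ranks; assumes there are no ties."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return [r / (n + 1) for r in ranks]


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rank_correlation(a, b):
    """Spearman's rho: Pearson correlation of the pseudo-observations."""
    return pearson(pseudo_observations(a), pseudo_observations(b))


# Synthetic dependent pair (assumption: invented data for illustration).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.7 * xi + rng.gauss(0.0, 0.5) for xi in x]

# Strictly increasing transformations change the marginals only.
x_skewed = [math.exp(v) for v in x]
y_cubed = [v ** 3 for v in y]

print(rank_correlation(x, y), rank_correlation(x_skewed, y_cubed))  # equal
print(pearson(x, y), pearson(x_skewed, y_cubed))                    # differ
```

Since the pseudo-observations depend only on ranks, any model fitted to them describes the dependence structure independently of how each variable happens to be scaled or distributed.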