Recent publications

Improved cardiovascular risk prediction using targeted plasma proteomics in primary prevention

In the present study, we compared a protein-based risk model with a model using traditional risk factors in predicting CV events in the primary prevention setting.

Graph Space Embedding

We propose the Graph Space Embedding (GSE), a technique that maps the input into a space where interactions are implicitly encoded, with little computation required. We provide theoretical results on an optimal regime for the GSE, namely a feasibility region for its parameters, and demonstrate the experimental relevance of our findings.

Depicting the composition of gut microbiota in a population with varied ethnic origins but shared geography

Using fecal 16S ribosomal RNA gene sequencing in 2,084 participants of the Healthy Life in an Urban Setting (HELIUS) study, we show that individuals living in the same city tend to share similar gut microbiota characteristics with others of their ethnic background.

Domain Intelligible Models

Here, we describe algorithms that incorporate auxiliary information in terms of groups of predictors and the relationships between them into the metagenome learning task to build intelligible models.

Manifold Mixing for Stacked Regularization

In this paper, we propose a novel algorithm that takes multiple data sources, constructs corresponding manifolds, and “mixes” information across them to find the common denominators in the observable outcomes.

Improvement of Insulin Sensitivity after Lean Donor Feces in Metabolic Syndrome Is Driven by Baseline Intestinal Microbiota Composition

The beneficial effects of lean donor FMT on glucose metabolism are associated with changes in intestinal microbiota and plasma metabolites and can be predicted based on baseline fecal microbiota composition.

Unsupervised Multi-View Feature Selection for Tumor Subtype Identification

In this paper, an Unsupervised Multi-View Feature Selection algorithm is used to simultaneously extract a relevant subset of features and perform clustering that is consistent across different views.

Impact drugs targeting cardiometabolic risk on the gut microbiota

Improving the understanding of the interaction between the gut microbiome and drugs can provide clinical directions for therapy by optimizing drug efficacy or providing new targets for drug development.

Comparing bioinformatic pipelines for microbial 16S rRNA amplicon sequencing

Here, we compared six bioinformatic pipelines for the analysis of amplicon sequence data: three OTU-level flows (QIIME-uclust, MOTHUR, and USEARCH-UPARSE) and three ASV-level (DADA2, Qiime2-Deblur, and USEARCH-UNOISE3).

Effect of Vegan Fecal Microbiota Transplantation on Carnitine‐ and Choline‐Derived Trimethylamine‐N‐Oxide Production and Vascular Inflammation in Patients With Metabolic Syndrome

Single lean vegan‐donor fecal microbiota transplantation in metabolic syndrome patients resulted in detectable changes in intestinal microbiota composition but failed to elicit changes in TMAO production capacity or parameters related to vascular inflammation.

Differential DNA methylation in familial hypercholesterolemia

Multiple linear regression analyses were used to explore DNA methylation differences between the two groups in genes related to lipid metabolism. A gradient boosting machine learning model was applied to investigate accumulated genome-wide differences between these groups.

Discovery of Salivary Gland Tumors’ Biomarkers via Co-Regularized Sparse-Group Lasso

In this study, we discovered a panel of discriminative microRNAs in salivary gland tumors by application of statistical machine learning methods.

Protein Space Embedding Kernel for Plaque Volume Prediction

In this work we aim to address prediction of plaque volume and ischaemia associated with CAD using state-of-the-art statistical machine learning models and targeted proteomics data.

Intestinal Fungal Dysbiosis Is Associated With Visceral Hypersensitivity in Patients With Irritable Bowel Syndrome and Rats

In an analysis of patients with IBS and controls, we associated fungal dysbiosis with IBS. The intestinal fungi might therefore be manipulated for treatment of IBS-related visceral hypersensitivity.

Microbiome: Does disease start in the mouth, the gut or both?

The gut microbiome is one of the largest microbiomes and it is generally regarded as a friend: it helps train the immune system, keeps dangerous colonizers away, and produces small molecules that nurture the cells that line the colon.

Feature Selection via Co-regularized Sparse-Group Lasso

We propose the co-regularized sparse-group lasso algorithm: a technique that allows the incorporation of auxiliary information into the learning task in terms of “groups” and “distances” among the predictors.


Interpretable Models via Pairwise Permutations Algorithm. Large-scale, high-dimensional datasets are frequently analysed with non-interpretable, black-box models. Explainability can then be obtained afterwards by applying a model-agnostic approach that estimates the importance of each feature. A popular technique for estimating these importances is the permutation method. However, a notable drawback is its bias towards correlated features, which can leave the truly relevant ones undetected. We propose the novel Pairwise Permutations Algorithm (PPA) with the aim of reducing the correlation bias in feature importance values. We also provide a theoretical foundation that builds upon previous work on permutation importance.

Joint work with Diogo Bastos and Manon Balvers. Download the code and an example dataset.
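For reference, the standard permutation method mentioned above can be sketched as follows. This is a minimal illustration of the baseline technique only, not of the PPA itself, which is not reproduced here. The names `mse` and `permutation_importance`, the toy data, and the stand-in model are all assumptions made for the example.

```python
import random

def mse(model, X, y):
    """Mean squared error of `model` (a callable row -> prediction) on (X, y)."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Standard permutation importance: average increase in error after
    shuffling one feature column at a time. This is the baseline method,
    not the Pairwise Permutations Algorithm."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increase += mse(model, X_perm, y) - baseline
        importances.append(increase / n_repeats)
    return importances

# Toy data (hypothetical): y depends on feature 0 only; feature 1 is noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [3.0 * row[0] for row in X]
model = lambda row: 3.0 * row[0]  # stands in for any fitted black-box model

importances = permutation_importance(model, X, y)
```

In this toy setting the features are independent, so shuffling feature 1 leaves the error unchanged while shuffling feature 0 increases it. The correlation bias described above arises when features are not independent, a case this sketch does not cover.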

Graph Space Embedding: Bridging Efficiency with Model Transparency. We introduce the Graph Space Embedding (GSE) kernel, a technique that maps the input into a "random walk-based" space where interactions are implicitly encoded, with little computation required. Our model incorporates contextual information about the features and, unlike standard black-box models, is also interpretable.

Domain Intelligible Model. We adapt the sparse Generalized Additive Model (sparse GAM) to the task of variable selection in high-dimensional microbiome (-omics) datasets. GAMs are less general than "fully" nonparametric models, but have the notable advantage of being readily interpretable and easier to estimate, using a simple backfitting algorithm. Recently, standard additive models have been successfully applied in the biomedical domain, and they can be naturally extended to include various interactions among predictors.
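The backfitting idea referred to above can be sketched in a few lines. This is a generic illustration of backfitting for a plain additive model, assuming a simple running-mean smoother; it does not include the sparsity penalty of a sparse GAM and is not the code distributed with this work. The function names and the toy data are hypothetical.

```python
import math
import random

def running_mean_smoother(x, r, window=15):
    """Smooth residuals r against x with a running mean over neighbouring
    points in sorted order (a deliberately simple smoother)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = window // 2
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - half), min(len(x), pos + half + 1)
        fitted[i] = sum(r[order[k]] for k in range(lo, hi)) / (hi - lo)
    return fitted

def backfit(X, y, n_iter=20):
    """Fit y ~ alpha + sum_j f_j(x_j) by backfitting; returns alpha and the
    fitted values of each component function at the training points."""
    n, p = len(X), len(X[0])
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove the intercept and all other components.
            partial = [y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                       for i in range(n)]
            fj = running_mean_smoother([row[j] for row in X], partial)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]  # center for identifiability
    return alpha, f

# Toy data (hypothetical): an additive signal plus a little noise.
rng = random.Random(0)
X = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(300)]
y = [math.sin(a) + 0.5 * b * b + rng.gauss(0, 0.1) for a, b in X]

alpha, f = backfit(X, y)
fitted = [alpha + f[0][i] + f[1][i] for i in range(len(y))]
residual_var = sum((t - h) ** 2 for t, h in zip(y, fitted)) / len(y)
total_var = sum((t - alpha) ** 2 for t in y) / len(y)
```

Each component `f[j]` can be plotted against its own feature, which is what makes additive models readily interpretable.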

Co-regularized sparse-group lasso. We introduce the co-regularized sparse-group lasso algorithm: a technique that allows the incorporation of auxiliary information into the learning task in terms of groups of predictors and the relationships between those groups. The proposed cost function requires related groups of predictors to provide similar contributions to the final response, and thus guides the feature selection process using auxiliary information. Our algorithm is particularly suitable for a wide range of biological applications where good predictive performance is required and where it is also important to retrieve all relevant predictors in order to deepen the understanding of the underlying biological process.
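To make the shape of such a cost function concrete, the sketch below combines a squared loss, the standard sparse-group lasso penalty, and a co-regularization term. The exact form of the co-regularizer is an assumption made for illustration (a squared difference between the contributions of related groups); it is not taken from the paper or the released code. All function names, parameters, and the toy data are hypothetical.

```python
import math

def sparse_group_lasso_penalty(w, groups, lam, alpha):
    """Standard sparse-group lasso penalty:
    lam * (alpha * ||w||_1 + (1 - alpha) * sum_g sqrt(|g|) * ||w_g||_2)."""
    l1 = sum(abs(v) for v in w)
    group_l2 = sum(math.sqrt(len(g)) * math.sqrt(sum(w[j] ** 2 for j in g))
                   for g in groups)
    return lam * (alpha * l1 + (1 - alpha) * group_l2)

def co_regularization_term(X, w, groups, related, gamma):
    """ASSUMED form of the co-regularizer: for each related pair of groups,
    penalize the squared difference between their contributions X_g w_g."""
    def contribution(g):
        return [sum(row[j] * w[j] for j in g) for row in X]
    total = 0.0
    for a, b in related:
        ca, cb = contribution(groups[a]), contribution(groups[b])
        total += sum((u - v) ** 2 for u, v in zip(ca, cb))
    return gamma * total / len(X)

def objective(X, y, w, groups, related, lam=0.1, alpha=0.5, gamma=1.0):
    """Squared loss + sparse-group lasso penalty + assumed co-regularizer."""
    loss = sum((sum(r[j] * w[j] for j in range(len(w))) - t) ** 2
               for r, t in zip(X, y)) / (2 * len(y))
    return (loss + sparse_group_lasso_penalty(w, groups, lam, alpha)
            + co_regularization_term(X, w, groups, related, gamma))

# Toy example (hypothetical): two related groups of two predictors each.
X = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
y = [2.0, 2.0]
groups = [[0, 1], [2, 3]]
related = [(0, 1)]

w_shared = [1.0, 1.0, 1.0, 1.0]    # both groups contribute equally
w_one_group = [2.0, 2.0, 0.0, 0.0]  # same predictions, one group only

obj_shared = objective(X, y, w_shared, groups, related)
obj_one_group = objective(X, y, w_one_group, groups, related)
```

Both weight vectors fit the toy data perfectly and incur the same sparse-group lasso penalty, so the difference in objective value comes entirely from the assumed co-regularization term, which favours the solution in which the related groups contribute similarly.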

Unsupervised multi-view feature selection via co-regularization. Existing unsupervised feature selection algorithms are designed to extract the most relevant subset of features that can facilitate clustering and interpretation of the obtained results. However, these techniques are not applicable in many real-world scenarios where one has access to datasets consisting of multiple views/representations (e.g. various omics profiles, medical text records coupled with fMRI images, etc.). The proposed method can leverage information from these different views and produce more robust and accurate results than traditional methods.

KeCo: kernel-based online co-agreement algorithm. This online algorithm uses a co-agreement strategy to take unlabelled data into account and improve classification performance. Unlike standard online methods, it is naturally applicable to many real-world situations where data is available in multiple representations. In addition, our online algorithm allows learning non-linear relations in the data via kernel functions.

Personalized microbial network inference via co-regularized spectral clustering. Based on the results of co-regularized spectral clustering, this code visualizes two groups of individuals whose microbial interaction networks have different topologies. The results of microbial network inference suggest that niche-wise interactions are different in these two groups. The network visualization is implemented in Python and in Matlab.

Online co-regularized algorithm. The proposed algorithm is particularly applicable to learning tasks where large amounts of (unlabeled) data are available for training. The algorithm co-regularizes prediction functions on unlabeled data points and leads to improved performance in comparison to several baseline methods on UCI benchmarks and a real-world natural language processing dataset.

Probabilistic preference learner/ranker - ProbRank. The algorithm learns a ranking function from pairwise comparison data, that is, information about the ranking function values is provided as pairwise comparisons at given locations. This is accomplished in two ways: a) Approximating the marginal likelihood using expectation propagation and carrying out a maximum likelihood procedure on the hyper-parameters. In this case the squared exponential covariance function is used. b) Treating ranking as regression with Gaussian noise and a Gaussian process prior, given the score differences.

Joint work with Botond Cseke. Download the Matlab implementation of the probabilistic preference learning models described in "Kernel principal component ranking: Robust ranking on noisy data".
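Variant b) above can be illustrated with a small worked example. If the latent scores have a Gaussian process prior with covariance K and each observation is a noisy score difference d = f(x_i) - f(x_j) + noise, the posterior mean of the scores is K A^T (A K A^T + s^2 I)^{-1} d, where each row of A has +1 at i and -1 at j. The sketch below is written in plain Python rather than Matlab and is not the released implementation; the kernel length scale, noise variance, function names, and toy comparisons are assumptions.

```python
import math

def rbf_kernel(xs, length_scale=1.0):
    """Squared exponential covariance matrix for 1-D inputs."""
    return [[math.exp(-0.5 * ((a - b) / length_scale) ** 2) for b in xs]
            for a in xs]

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def posterior_scores(xs, comparisons, noise_var=0.1):
    """Posterior mean of latent scores f ~ GP(0, K) given noisy observed
    differences d = f(x_i) - f(x_j) + eps, eps ~ N(0, noise_var)."""
    K = rbf_kernel(xs)
    n, m = len(xs), len(comparisons)
    # Each comparison (i, j, d) contributes a row of A with +1 at i, -1 at j.
    A = [[0.0] * n for _ in range(m)]
    for row, (i, j, _) in zip(A, comparisons):
        row[i], row[j] = 1.0, -1.0
    d = [c[2] for c in comparisons]
    # K A^T, an n x m matrix.
    KAt = [[sum(K[p][q] * A[r][q] for q in range(n)) for r in range(m)]
           for p in range(n)]
    # A K A^T + noise_var * I, an m x m matrix.
    S = [[sum(A[r][p] * KAt[p][s] for p in range(n))
          + (noise_var if r == s else 0.0)
          for s in range(m)] for r in range(m)]
    coeffs = solve(S, d)
    return [sum(KAt[p][r] * coeffs[r] for r in range(m)) for p in range(n)]

# Hypothetical items at locations 0, 1, 2 with observed score differences.
xs = [0.0, 1.0, 2.0]
comparisons = [(1, 0, 1.0), (2, 1, 1.0), (2, 0, 2.0)]  # (i, j, f_i - f_j)
scores = posterior_scores(xs, comparisons)
ranking = sorted(range(len(xs)), key=lambda i: scores[i], reverse=True)
```

Sorting the items by posterior mean score gives the ranking. Because only differences are observed, the scores are determined relative to each other, and the zero-mean prior fixes the overall offset.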

E-MaLeS 1.0 (3rd place in the FOF division of the CADE ATP System Competition). E-MaLeS 1.0 is an automated theorem prover based on the E prover. E-MaLeS 1.0 runs E with strategies that differ from the standard auto mode. Furthermore, it employs strategy splitting, i.e. it runs several strategies. Note that this version is very CASC-focused.

Joint work with Daniel Kuehlwein. Download the code.

Multi-output ranker for automated reasoning. Joint work with Daniel Kuehlwein. Download the code and the data used in the experiments of the paper "Multi-Output Ranking for Automated Reasoning".