My research is at the interface of AI, machine learning, and bioinformatics, primarily with applications in the cardio-metabolic domain. I am an associate professor at AmsterdamUMC and founder of HORAIZON. Together with my team, I develop novel learning models and explainable algorithms for big biomedical data (microbiome, proteome, epigenome, etc.).


Targeted proteomics improves cardiovascular risk prediction in secondary prevention

Depicting the composition of gut microbiota in a population with varied ethnic origins

Covered Information Disentanglement: Model Transparency

Introducing the Continuous Glucose Data Analysis (CGDA) R Package

Improvement of Insulin Sensitivity after Lean Donor Feces in Metabolic Syndrome

Peripheral blood DNA methylation biomarkers accurately predict response

Interpretable Models via Pairwise Permutations Algorithm

Improved cardiovascular risk prediction using targeted plasma proteomics

Graph Space Embedding: International Joint Conference on Artificial Intelligence

Microbiome: Does disease start in the mouth, the gut or both?

Effects of fecal microbiota transplant on DNA methylation

Impact drugs targeting cardiometabolic risk on the gut microbiota

Differential DNA methylation in familial hypercholesterolemia

Domain intelligible models: looking for microbial biomarkers

Interpretable Models via Pairwise Permutations Algorithm.

Large-scale, high-dimensional datasets are frequently analysed with non-interpretable, black-box models. Explainability can then be obtained afterwards by applying a model-agnostic approach that estimates the importance of each feature. A popular technique for estimating these importances is the permutation method. A notable drawback, however, is its bias towards correlated features, while the truly relevant ones are left in the dark. We propose the novel Pairwise Permutations Algorithm (PPA) with the aim of reducing the correlation bias in feature importance values. We also provide a theoretical foundation that builds upon previous work on permutation importance.

Joint work with Diogo Bastos and Manon Balvers. Download the code and an example dataset.
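
For readers unfamiliar with the permutation method mentioned above, here is a minimal sketch of the standard baseline, not of PPA itself: shuffle one feature column at a time and record how much the model's error increases. The toy model, the data, and all names (`permutation_importance`, `toy_model`) are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List

Row = List[float]


def mse(model: Callable[[Row], float], X: List[Row], y: List[float]) -> float:
    """Mean squared error of `model` on the rows of X."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in MSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy setup: the target depends on feature 0 only.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [3.0 * row[0] for row in X]
toy_model = lambda row: 3.0 * row[0]  # stands in for a fitted black-box model

scores = permutation_importance(toy_model, X, y)
print(scores)  # feature 0 gets a large score, feature 1 gets zero
```

In this toy case the two features are independent, so the baseline behaves well; the correlation bias discussed above arises when informative and uninformative features move together.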

Graph Space Embedding: Bridging Efficiency with Model Transparency.

We introduce the Graph Space Embedding (GSE) kernel, a technique that maps the input into a "random walk-based" space where interactions are implicitly encoded, with little computation required. Our model incorporates contextual information about the features and, unlike standard black-box models, is also interpretable.

Joint work with João Belo. Download the code and an example dataset.

Domain Intelligible Model.

We adapt the sparse Generalized Additive Model (sparse GAM) to the task of variable selection in high-dimensional microbiome (-omics) datasets. GAMs are less general than "fully" nonparametric models, but have the notable advantage of being readily interpretable and easier to estimate using a simple backfitting algorithm. Recently, standard additive models have been successfully applied in the biomedical domain, and they can be naturally extended to include various interactions among predictors.

Joint work with Sultan Imangaliyev. Download the code and an example dataset.
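
The backfitting idea mentioned above can be sketched in a few lines. This is plain backfitting for an additive model, without the sparsity penalty of the sparse GAM, and the piecewise-constant `bin_smoother` is an assumption made to keep the example dependency-free; any univariate smoother could take its place.

```python
import math
import random


def bin_smoother(x, r, n_bins=10):
    """Piecewise-constant smoother: average of r within equal-width bins of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    idx = [min(int((v - lo) / width), n_bins - 1) for v in x]
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for b, v in zip(idx, r):
        sums[b] += v
        counts[b] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [means[b] for b in idx]


def backfit(X, y, n_iter=20):
    """Fit y ~ alpha + sum_j f_j(x_j) by cycling over the features."""
    n, p = len(y), len(X[0])
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: what the other components leave unexplained.
            partial = [y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                       for i in range(n)]
            fj = bin_smoother([row[j] for row in X], partial)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]  # centre each component
    return alpha, f


# Hypothetical toy data with two nonlinear additive effects.
data_rng = random.Random(0)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(300)]
y = [math.sin(3 * a) + b * b + data_rng.gauss(0, 0.05) for a, b in X]

alpha, f = backfit(X, y)
fitted = [alpha + f[0][i] + f[1][i] for i in range(len(y))]
mse_fit = sum((t - p) ** 2 for t, p in zip(y, fitted)) / len(y)
var_y = sum((t - alpha) ** 2 for t in y) / len(y)
print(mse_fit, var_y)
```

Each fitted component `f[j]` can be plotted against its feature, which is where the interpretability of additive models comes from.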

Co-regularized sparse-group lasso.

We introduce the co-regularized sparse-group lasso algorithm: a technique that allows the incorporation of auxiliary information into the learning task in terms of groups of predictors and the relationship between those groups. The proposed cost function requires related groups of predictors to provide similar contributions to the final response, and thus, guides the feature selection process using auxiliary information. Our algorithm is particularly suitable for a wide range of biological applications where good predictive performance is required and, in addition to that, it is also important to retrieve all relevant predictors so as to deepen the understanding of the underlying biological process.

Joint work with Paula L. Amaral Santos. Download the code and an example dataset.
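
To make the idea of "related groups providing similar contributions" concrete, the sketch below evaluates one plausible form of such an objective: a squared-error fit term, the usual sparse-group lasso penalties, and a co-regularization term that penalizes disagreement between the contributions of related groups. The exact cost function and its weighting are assumptions made for illustration and may differ from the published formulation.

```python
import math


def coreg_sgl_objective(X, y, beta, groups, related, lam_l1, lam_group, gamma):
    """Assumed objective: fit + L1 + group penalty + co-regularization.

    groups  : list of lists of column indices
    related : list of (g, h) index pairs of groups expected to agree
    """
    n = len(y)
    pred = [sum(x * b for x, b in zip(row, beta)) for row in X]
    fit = sum((t - p) ** 2 for t, p in zip(y, pred)) / (2 * n)
    l1 = sum(abs(b) for b in beta)
    group_pen = sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g))
        for g in groups
    )

    def contribution(g):
        # Per-sample contribution of one group of predictors to the response.
        return [sum(row[j] * beta[j] for j in g) for row in X]

    co = 0.0
    for a, b in related:
        ca, cb = contribution(groups[a]), contribution(groups[b])
        co += sum((u - v) ** 2 for u, v in zip(ca, cb)) / (2 * n)
    return fit + lam_l1 * l1 + lam_group * group_pen + gamma * co


# Hypothetical toy problem: two groups of two predictors, declared as related.
X = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
y = [2.0, 2.0]
groups = [[0, 1], [2, 3]]
related = [(0, 1)]

obj_agree = coreg_sgl_objective(X, y, [1, 1, 1, 1], groups, related, 0.1, 0.1, 1.0)
obj_disagree = coreg_sgl_objective(X, y, [1, 1, -1, -1], groups, related, 0.1, 0.1, 1.0)
print(obj_agree, obj_disagree)
```

Coefficients for which the two related groups contribute equally incur no co-regularization cost, while opposing contributions are penalized in proportion to `gamma`.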

Unsupervised multi-view feature selection via co-regularization.

Existing unsupervised feature selection algorithms are designed to extract the most relevant subset of features that can facilitate clustering and interpretation of the obtained results. However, these techniques are not applicable in many real-world scenarios where one has access to datasets consisting of multiple views/representations (e.g. various omics profiles, medical text records coupled with fMRI images, etc.). The proposed method can leverage information from these different views and produce more robust and accurate results than traditional methods.

Joint work with Sultan Imangaliyev. Download the code and an example dataset.

KeCo: kernel-based online co-agreement algorithm.

This online algorithm uses a co-agreement strategy to take unlabelled data into account and improve classification performance. Unlike standard online methods, it is naturally applicable to many real-world situations where data is available in multiple representations. In addition, our online algorithm allows learning non-linear relations in the data via kernel functions.

Joint work with Laurens van de Wiel. Download the code and an example dataset.

Personalized microbial network inference via co-regularized spectral clustering.

Based on the results of co-regularized spectral clustering, this code visualizes two groups of individuals whose microbial interaction networks differ in topology. The results of microbial network inference suggest that niche-wise interactions are different in these two groups. The network visualization is implemented in Python and in Matlab.

Joint work with Sultan Imangaliyev. Download the code and an example dataset.

Online co-regularized algorithm.

The proposed algorithm is particularly applicable to learning tasks where large amounts of (unlabeled) data are available for training. The algorithm co-regularizes prediction functions on unlabeled data points and leads to improved performance in comparison to several baseline methods on UCI benchmarks and on real-world natural language processing data.

Joint work with Tom de Ruijter. Download the code and an example dataset.
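
A rough sketch of what co-regularizing on unlabeled points can look like in an online setting, assuming two linear predictors (one per view), squared loss, and plain stochastic gradient steps. The update rule, the step size `lr`, the agreement weight `mu`, and the toy two-view stream are illustrative assumptions, not the published algorithm.

```python
import random


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def online_coreg(stream, dim1, dim2, lr=0.02, mu=0.5):
    """One pass over (x1, x2, label) triples; label is None when unlabeled."""
    w1, w2 = [0.0] * dim1, [0.0] * dim2
    for x1, x2, label in stream:
        p1, p2 = dot(w1, x1), dot(w2, x2)
        if label is not None:
            g1, g2 = p1 - label, p2 - label  # squared-loss gradients
        else:
            # Unlabeled point: pull the two predictions towards each other.
            g1, g2 = mu * (p1 - p2), mu * (p2 - p1)
        w1 = [w - lr * g1 * v for w, v in zip(w1, x1)]
        w2 = [w - lr * g2 * v for w, v in zip(w2, x2)]
    return w1, w2


def make_stream(n, labeled_fraction, seed):
    """Hypothetical two-view data: both views are noisy copies of a latent z."""
    rng = random.Random(seed)
    for _ in range(n):
        z = rng.gauss(0, 1)
        x1 = [z + rng.gauss(0, 0.3)]
        x2 = [-z + rng.gauss(0, 0.3)]
        is_labeled = rng.random() < labeled_fraction
        label = (1.0 if z > 0 else -1.0) if is_labeled else None
        yield x1, x2, label


# Train with only about 10% of the points labeled, then evaluate on fresh data.
w1, w2 = online_coreg(make_stream(3000, 0.1, seed=0), dim1=1, dim2=1)
test_points = list(make_stream(500, 1.0, seed=1))
correct = sum(
    1 for x1, x2, label in test_points
    if (dot(w1, x1) + dot(w2, x2) > 0) == (label > 0)
)
accuracy = correct / len(test_points)
print(w1, w2, accuracy)
```

The unlabeled updates never see a label; they only use the fact that both views describe the same underlying example.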

Probabilistic preference learner/ranker - ProbRank.

The algorithm learns a ranking function from pairwise comparison data, that is, information about the ranking function values is provided as pairwise comparisons at given locations. This is accomplished in two ways: a) approximating the marginal likelihood using expectation propagation and carrying out a maximum likelihood procedure on the hyper-parameters, in which case the squared exponential covariance function is used; b) treating ranking as regression with Gaussian noise and a Gaussian process prior, given the score differences.

Joint work with Botond Cseke. Download Matlab implementation of the probabilistic preference learning models described in "Kernel principal component ranking: Robust ranking on noisy data".
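
Option b) above can be illustrated with a small, dependency-free sketch: each observation is a noisy score difference f(x_i) - f(x_j), the prior on f is a zero-mean Gaussian process with a squared exponential covariance, and the posterior mean of the scores follows from standard Gaussian conditioning. The function names, the fixed hyper-parameters, and the one-dimensional toy inputs are assumptions for illustration; this is not the Matlab implementation referenced above.

```python
import math


def sq_exp(a, b, length=1.0):
    """Squared exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= factor * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def posterior_scores(points, comparisons, noise=0.1, length=1.0):
    """Posterior mean of f at `points` given (i, j, d): f(x_i) - f(x_j) ~ d."""
    def k(i, j):
        return sq_exp(points[i], points[j], length)

    # Covariance between difference observations, plus independent noise.
    C = [[k(i, p) - k(i, q) - k(j, p) + k(j, q) + (noise ** 2 if a == b else 0.0)
          for b, (p, q, _) in enumerate(comparisons)]
         for a, (i, j, _) in enumerate(comparisons)]
    alpha = solve(C, [d for _, _, d in comparisons])
    # Cross-covariance between f(x_t) and each observed difference.
    return [sum(a * (k(t, i) - k(t, j)) for a, (i, j, _) in zip(alpha, comparisons))
            for t in range(len(points))]


# Hypothetical toy data: items on a line, later items are preferred.
points = [0.0, 1.0, 2.0, 3.0]
comparisons = [(1, 0, 1.0), (2, 1, 1.0), (3, 2, 1.0), (3, 0, 3.0)]
scores = posterior_scores(points, comparisons)
print(scores)
```

Sorting the items by their posterior mean score gives the induced ranking. Only score differences are observed, so the overall offset of the scores is fixed by the zero-mean prior rather than by the data.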

E-MaLeS 1.0 (3rd place in the FOF division of the CADE ATP System Competition).

E-MaLeS 1.0 is an automated theorem prover based on the E prover. E-MaLeS 1.0 runs E with strategies that differ from those of the standard auto mode. Furthermore, it employs strategy splitting, i.e. it runs several strategies. Note that this version is very CASC focused.

Joint work with Daniel Kuehlwein. Download the code.
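
Strategy splitting, as described above, can be pictured as dividing the time limit among several strategies and trying them in turn. The sketch below uses hypothetical stand-in strategies rather than real calls to E, and the schedule format and time shares are assumptions for illustration only.

```python
from typing import Callable, List, Optional, Tuple

# A strategy takes a problem and a time limit (seconds) and returns a proof or None.
Strategy = Callable[[str, float], Optional[str]]


def run_schedule(
    problem: str,
    schedule: List[Tuple[str, Strategy, float]],
    total_time: float,
) -> Optional[Tuple[str, str]]:
    """Try each strategy with its share of the total time; stop at the first proof."""
    for name, strategy, share in schedule:
        result = strategy(problem, share * total_time)
        if result is not None:
            return name, result
    return None


def make_toy_strategy(seconds_needed: float) -> Strategy:
    """Stand-in for a prover run: succeeds only if given enough time."""
    def strategy(problem: str, time_limit: float) -> Optional[str]:
        return f"proof of {problem}" if time_limit >= seconds_needed else None
    return strategy


# Hypothetical schedule: (name, strategy, fraction of the total time limit).
schedule = [
    ("fast-heuristic", make_toy_strategy(40.0), 0.2),
    ("thorough", make_toy_strategy(30.0), 0.5),
    ("fallback", make_toy_strategy(20.0), 0.3),
]

outcome = run_schedule("toy-problem", schedule, total_time=100.0)
short_outcome = run_schedule("toy-problem", schedule, total_time=10.0)
print(outcome, short_outcome)
```

With a 100-second limit the first strategy runs out of time and the second one succeeds; with a 10-second limit none of them does.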

Multi-output ranker for automated reasoning.

Joint work with Daniel Kuehlwein. Download the code and the data used in the experiments of the paper "Multi-Output Ranking for Automated Reasoning".

Looking for a new challenge?

@HORAIZON we’re always happy to hear from qualified candidates. If you want to be considered for future openings, we’ll review your resumé and keep you in mind if any relevant positions become available. Email us at: info@horaizon.nl