Targeted proteomics improves cardiovascular risk prediction in secondary prevention
Depicting the composition of gut microbiota in a population with varied ethnic origins
Covered Information Disentanglement: Model Transparency
Introducing the Continuous Glucose Data Analysis (CGDA) R Package
Improvement of Insulin Sensitivity after Lean Donor Feces in Metabolic Syndrome
Peripheral blood DNA methylation biomarkers accurately predict response
Interpretable Models via Pairwise Permutations Algorithm
Graph Space Embedding: International Joint Conference on Artificial Intelligence
Microbiome: Does disease start in the mouth, the gut or both?
Effects of fecal microbiota transplant on DNA methylation
Impact of drugs targeting cardiometabolic risk on the gut microbiota
Differential DNA methylation in familial hypercholesterolemia
Improved cardiovascular risk prediction using targeted plasma proteomics
Domain intelligible models: looking for microbial biomarkers
Models and code
Interpretable Models via Pairwise Permutations Algorithm.
Large-scale, high-dimensional datasets are frequently analysed with non-interpretable,
black-box models. Explainability can then be obtained by applying a model-agnostic
approach to identify the importance of each feature. A popular technique for estimating
these importances is the permutation method. However, a notable drawback is its bias
towards correlated features, which can leave the truly relevant ones in the dark. We
propose the novel Pairwise Permutations Algorithm (PPA) with the aim of reducing this
correlation bias in feature importance values. We also provide a theoretical foundation
that builds upon previous work on permutation importance.
Joint work with Diogo Bastos and Manon Balvers. Download the code and an example dataset.
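For illustration, here is a minimal Python sketch of the general idea of permuting a correlated pair of features jointly. It is not the released PPA implementation; the function name, the pairing rule and the use of R^2 as the score are assumptions made for this sketch.

    # Illustrative sketch only: standard permutation importance, except that each
    # feature is permuted together with its most correlated partner, so that
    # shared information does not mask the truly relevant feature.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score

    def pairwise_permutation_importance(model, X, y, n_repeats=20, seed=0):
        rng = np.random.default_rng(seed)
        baseline = r2_score(y, model.predict(X))
        corr = np.abs(np.corrcoef(X, rowvar=False))
        np.fill_diagonal(corr, 0.0)
        importances = np.zeros(X.shape[1])
        for j in range(X.shape[1]):
            partner = int(np.argmax(corr[j]))        # most correlated feature
            drops = []
            for _ in range(n_repeats):
                Xp = X.copy()
                perm = rng.permutation(X.shape[0])
                Xp[:, [j, partner]] = X[perm][:, [j, partner]]   # permute the pair jointly
                drops.append(baseline - r2_score(y, model.predict(Xp)))
            importances[j] = np.mean(drops)
        return importances

    # Quick check on synthetic data with two strongly correlated predictors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=500)   # feature 1 mirrors feature 0
    y = 3 * X[:, 0] + X[:, 2]
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    print(pairwise_permutation_importance(model, X, y))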
Graph Space Embedding: Bridging Efficiency with Model Transparency
We introduce the Graph Space Embedding (GSE) kernel, a technique that maps the input
into a "random walk"-based space where interactions are implicitly encoded, requiring
little computation. Our model incorporates contextual information about the features
and, unlike standard black-box models, is also interpretable.
Joint work with João Belo. Download the code and an example dataset.
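As a rough illustration of the "random walk over a feature graph" idea, the sketch below compares two samples through damped walks over a known feature-feature interaction graph. This is not the GSE kernel from the paper; walk_kernel, the adjacency matrix A, the walk length P and the decay parameter are assumptions made here.

    # Rough sketch: pairwise feature interactions enter the kernel implicitly
    # through walks of length up to P over a feature graph A.
    import numpy as np
    from sklearn.svm import SVC

    def walk_kernel(X, Z, A, P=2, decay=0.5):
        # W = sum_{p=0..P} decay^p A^p accumulates (damped) walks between features
        W = sum((decay ** p) * np.linalg.matrix_power(A, p) for p in range(P + 1))
        return X @ W @ Z.T

    rng = np.random.default_rng(0)
    X = rng.random((100, 6))                         # e.g. relative abundances
    y = (X[:, 0] * X[:, 1] > 0.25).astype(int)       # label driven by an interaction
    A = np.zeros((6, 6)); A[0, 1] = A[1, 0] = 1.0    # known link between features 0 and 1

    clf = SVC(kernel="precomputed").fit(walk_kernel(X, X, A), y)
    print(clf.score(walk_kernel(X, X, A), y))        # training accuracy, as a smoke test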
We adapt the sparse Generalized Additive Model (sparse GAM) to the task of variable
selection in high-dimensional microbiome (-omics) datasets. GAMs are less general than
"fully" nonparametric models, but they have the notable advantage of being readily
interpretable and easy to estimate with a simple backfitting algorithm. Recently,
standard additive models have been successfully applied in the biomedical domain, and
they can be naturally extended to include various interactions among predictors.
Joint work with Sultan Imangaliyev. Download the code and an example
dataset.
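As a reference point, here is a plain backfitting loop for an additive model. The sparsity penalty of the actual sparse GAM is omitted, and the lowess smoother and the variance-based relevance read-out are choices made here only for brevity.

    # Classical backfitting for y = alpha + sum_j f_j(x_j) + noise.
    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    def backfit(X, y, n_iter=10, frac=0.4):
        n, d = X.shape
        f = np.zeros((n, d))
        alpha = y.mean()
        for _ in range(n_iter):
            for j in range(d):
                # partial residual: everything the other components do not explain
                partial = y - alpha - f[:, [k for k in range(d) if k != j]].sum(axis=1)
                f[:, j] = lowess(partial, X[:, j], frac=frac, return_sorted=False)
                f[:, j] -= f[:, j].mean()            # keep components centred
        return alpha, f

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(300, 3))
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.2, size=300)   # feature 2 is noise
    alpha, f = backfit(X, y)
    print([round(np.var(f[:, j]), 3) for j in range(3)])   # component variance ~ relevance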
We introduce the co-regularized sparse-group lasso algorithm: a technique that
allows the incorporation of auxiliary information into the learning task in terms
of groups of predictors and the relationship between those groups. The
proposed cost function requires related groups of predictors to provide similar
contributions to the final response, and thus guides the feature selection process
using auxiliary information. Our algorithm is particularly suitable for a wide range
of biological applications where good predictive performance is required and where it
is also important to retrieve all relevant predictors, so as to deepen the
understanding of the underlying biological process.
Joint work with Paula L. Amaral Santos. Download the code and an example
dataset.
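A back-of-the-envelope version of such a cost function is sketched below. The penalty names lam1, lam2 and gamma and the exact form of the coupling term are assumptions for illustration; the released code also contains the optimizer, which is omitted here.

    # Sketch: squared loss, lasso + group-lasso penalties, plus a coupling term
    # asking related groups of predictors to contribute similarly to the response.
    import numpy as np

    def cost(beta, X, y, groups, related_pairs, lam1, lam2, gamma):
        # groups: list of index arrays; related_pairs: pairs of group indices
        loss = 0.5 * np.mean((y - X @ beta) ** 2)
        sgl = lam1 * np.abs(beta).sum() + lam2 * sum(np.linalg.norm(beta[g]) for g in groups)
        coreg = gamma * sum(
            np.mean((X[:, groups[g]] @ beta[groups[g]] - X[:, groups[h]] @ beta[groups[h]]) ** 2)
            for g, h in related_pairs
        )
        return loss + sgl + coreg

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 6))
    y = X @ np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=100)
    groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
    print(cost(np.zeros(6), X, y, groups, related_pairs=[(0, 1)], lam1=0.1, lam2=0.1, gamma=0.5))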
Unsupervised multi-view feature selection via co-regularization.
Existing unsupervised feature selection algorithms are designed to extract the most
relevant subset of features that can facilitate clustering and interpretation of the
obtained results. However, these techniques are not applicable in many real-world
scenarios where one has access to datasets consisting of multiple views/representations
(e.g. various omics profiles, medical text records coupled with fMRI images, etc.).
The proposed method can leverage information from these different views and produce
more robust and accurate results in comparison to traditional methods.
Joint work with Sultan Imangaliyev. Download the code and an example
dataset.
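A loose Python sketch of the general idea follows: co-regularized spectral embeddings per view (in the style of Kumar et al.'s alternation), combined into a consensus embedding, with features ranked by a simple least-squares fit to that embedding. The released method differs, and the RBF affinity, lam and the scoring rule are assumptions made here.

    # Sketch of multi-view feature selection via co-regularized spectral embeddings.
    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    def spectral_embedding(M, k):
        # top-k eigenvectors of a symmetric affinity-like matrix M
        vals, vecs = np.linalg.eigh(M)
        return vecs[:, -k:]

    def coreg_multiview_embedding(views, k=2, lam=0.5, n_iter=10):
        Ks = [rbf_kernel(V) for V in views]
        Us = [spectral_embedding(K, k) for K in Ks]
        for _ in range(n_iter):
            for v in range(len(views)):
                # pull view v's embedding towards the other views' embeddings
                other = sum(Us[w] @ Us[w].T for w in range(len(views)) if w != v)
                Us[v] = spectral_embedding(Ks[v] + lam * other, k)
        return sum(Us) / len(Us)                 # rough consensus embedding

    def feature_scores(V, U):
        # score each feature by its least-squares fit to the consensus embedding
        coef, *_ = np.linalg.lstsq(V - V.mean(0), U, rcond=None)
        return np.linalg.norm(coef, axis=1)

    rng = np.random.default_rng(0)
    view1 = rng.normal(size=(80, 10)); view2 = rng.normal(size=(80, 15))
    U = coreg_multiview_embedding([view1, view2])
    print(np.argsort(feature_scores(view1, U))[::-1][:5])   # top-5 features of view 1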
This online algorithm uses a co-agreement strategy to take unlabelled data into account
and improve classification performance. Unlike standard online methods, it is naturally
applicable to many real-world situations where data is available in multiple
representations. In addition, our online algorithm allows learning non-linear relations
in the data via kernel functions.
Joint work with Laurens van de Wiel. Download the code and an example
dataset.
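To make the idea concrete, here is a toy online update with two linear views and a co-agreement penalty. The released algorithm works with kernel functions and uses different update rules; the learning rate eta and trade-off lam below are placeholders.

    # Toy online two-view learner: labeled examples drive a squared-loss step,
    # and every example (labeled or not) pushes the two views to agree.
    import numpy as np

    def online_coagreement(stream, d1, d2, eta=0.1, lam=1.0):
        w1, w2 = np.zeros(d1), np.zeros(d2)
        for x1, x2, y in stream:                     # y is None for unlabeled examples
            p1, p2 = w1 @ x1, w2 @ x2
            if y is not None:
                w1 -= eta * (p1 - y) * x1
                w2 -= eta * (p2 - y) * x2
            # co-agreement: reduce the disagreement between the two views
            w1 -= eta * lam * (p1 - p2) * x1
            w2 -= eta * lam * (p2 - p1) * x2
        return w1, w2

    rng = np.random.default_rng(0)
    def make_stream(n=2000, labeled_frac=0.1):
        for _ in range(n):
            z = rng.normal()
            x1 = np.array([z, rng.normal()])
            x2 = np.array([z + 0.1 * rng.normal(), rng.normal(), 1.0])
            y = 2 * z if rng.random() < labeled_frac else None
            yield x1, x2, y

    w1, w2 = online_coagreement(make_stream(), d1=2, d2=3)
    print(w1, w2)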
Personalized microbial network inference via co-regularized spectral clustering.
Based on the results of co-regularized spectral clustering, this code visualizes two
groups of individuals with different topologies of their microbial interaction
networks. The results of microbial network inference suggest that niche-wise
interactions differ between these two groups. The network visualization is implemented
in both Python and Matlab.
Joint work with Sultan Imangaliyev. Download the code and an example
dataset.
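A minimal Python/networkx sketch of this kind of per-group visualization is shown below. The edges here come from a simple correlation threshold on simulated abundances; the released code derives the two groups from the co-regularized spectral clustering output instead.

    # Draw one correlation-based microbial network per group, side by side.
    import numpy as np
    import networkx as nx
    import matplotlib.pyplot as plt

    def correlation_network(abundances, taxa, threshold=0.4):
        corr = np.corrcoef(abundances, rowvar=False)
        G = nx.Graph()
        G.add_nodes_from(taxa)
        for i in range(len(taxa)):
            for j in range(i + 1, len(taxa)):
                if abs(corr[i, j]) > threshold:
                    G.add_edge(taxa[i], taxa[j], weight=corr[i, j])
        return G

    rng = np.random.default_rng(0)
    taxa = [f"taxon_{i}" for i in range(8)]
    group_a = rng.normal(size=(40, 8)); group_b = rng.normal(size=(40, 8))
    group_b[:, 1] = group_b[:, 0] + 0.1 * rng.normal(size=40)   # extra interaction in group B

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, (name, data) in zip(axes, [("group A", group_a), ("group B", group_b)]):
        G = correlation_network(data, taxa)
        nx.draw_networkx(G, ax=ax, node_size=300, font_size=7)
        ax.set_title(name)
    plt.tight_layout(); plt.show()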
The proposed algorithm is particularly applicable to learning tasks where large
amounts of (unlabeled) data are available for training. The algorithm co-regularizes
prediction functions on unlabeled data points and leads to improved performance in
comparison to several baseline methods on UCI benchmarks and a real-world natural
language processing dataset.
Joint work with Tom de Ruijter. Download the code and an example dataset.
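A compact sketch of this kind of co-regularization with two linear views: alternating ridge solves in which each view is additionally fit to the other view's predictions on the unlabeled points. The weights lam and mu are illustrative, and this is not the released implementation.

    # Alternating minimization of
    #   ||X1l w1 - y||^2 + ||X2l w2 - y||^2 + mu ||X1u w1 - X2u w2||^2 + lam (||w1||^2 + ||w2||^2)
    import numpy as np

    def coregularized_ls(X1l, X2l, y, X1u, X2u, lam=0.1, mu=1.0, n_iter=20):
        w1 = np.zeros(X1l.shape[1]); w2 = np.zeros(X2l.shape[1])
        for _ in range(n_iter):
            t2 = X2u @ w2                             # other view's unlabeled predictions
            A1 = X1l.T @ X1l + mu * X1u.T @ X1u + lam * np.eye(X1l.shape[1])
            w1 = np.linalg.solve(A1, X1l.T @ y + mu * X1u.T @ t2)
            t1 = X1u @ w1
            A2 = X2l.T @ X2l + mu * X2u.T @ X2u + lam * np.eye(X2l.shape[1])
            w2 = np.linalg.solve(A2, X2l.T @ y + mu * X2u.T @ t1)
        return w1, w2

    rng = np.random.default_rng(0)
    z = rng.normal(size=300); y_all = 2 * z
    X1 = np.c_[z, rng.normal(size=300)]
    X2 = np.c_[z + 0.1 * rng.normal(size=300), rng.normal(size=300)]
    w1, w2 = coregularized_ls(X1[:30], X2[:30], y_all[:30], X1[30:], X2[30:])
    print(w1, w2)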
The algorithm can learn a ranking function from pairwise comparison data, that is,
information about the ranking function values is provided in terms of pairwise
comparisons at the given locations. This is accomplished in two ways: (a) approximating
the marginal likelihood using expectation propagation and carrying out a maximum
likelihood procedure on the hyper-parameters, in which case the squared exponential
covariance function is used; (b) treating ranking as regression with Gaussian noise
under a Gaussian process prior, given the score differences.
Joint work with Botond Cseke. Download the Matlab implementation of the probabilistic
preference learning models described in "Kernel principal component ranking: Robust
ranking on noisy data".
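A small numpy sketch of variant (b) above: each pairwise comparison is treated as a noisy observation of the score difference f(x_i) - f(x_j) under a GP prior with a squared exponential covariance. The Matlab code linked above implements the full models, including the EP variant; the noise level and kernel parameters here are placeholders.

    # GP regression on score differences: posterior mean of the latent scores.
    import numpy as np

    def se_kernel(A, B, ell=1.0, sf=1.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sf ** 2 * np.exp(-0.5 * d2 / ell ** 2)

    def gp_preference_scores(X, pairs, y_diff, noise=0.1):
        # pairs: list of (i, j); y_diff: observed noisy differences f_i - f_j
        K = se_kernel(X, X)
        D = np.array([K[i] - K[j] for i, j in pairs])          # cov(f(x_m), f_i - f_j)
        KDD = np.array([[K[i, k] - K[i, l] - K[j, k] + K[j, l] for k, l in pairs]
                        for i, j in pairs])                    # cov between differences
        alpha = np.linalg.solve(KDD + noise ** 2 * np.eye(len(pairs)), np.asarray(y_diff))
        return D.T @ alpha                                     # posterior mean of the scores

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(15, 1))
    f_true = np.sin(X[:, 0])
    pairs = [(i, j) for i in range(15) for j in range(15) if i < j and rng.random() < 0.3]
    y_diff = [f_true[i] - f_true[j] + 0.05 * rng.normal() for i, j in pairs]
    scores = gp_preference_scores(X, pairs, y_diff)
    print(np.argsort(scores)[::-1][:3], np.argsort(f_true)[::-1][:3])   # rankings roughly agree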
E-MaLeS 1.0 (3rd place in the FOF division of the CADE ATP System Competition).
E-MaLeS 1.0 is an automated theorem prover based on the E prover. E-MaLeS 1.0 runs E
with different strategies than the standard auto mode. Furthermore, it employs strategy
splitting, i.e. it runs several strategies in sequence. Note that this version is very
CASC-focused.
Joint work with Daniel Kuehlwein.
Download the code.
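For intuition, here is a toy illustration of the strategy-scheduling idea around the eprover binary: run E with several different option sets, each for a slice of the time budget, and stop at the first proof. The STRATEGIES entries are placeholders rather than E-MaLeS's actual strategies, and the success check is deliberately simplified.

    # Toy strategy scheduler around the E prover command line.
    import subprocess

    STRATEGIES = [
        ["--auto"],          # placeholder: E's automatic mode
        # further entries would set E's heuristic/ordering options explicitly
    ]

    def prove(problem_file, total_seconds=300):
        slice_seconds = max(1, total_seconds // max(1, len(STRATEGIES)))
        for opts in STRATEGIES:
            cmd = ["eprover", f"--cpu-limit={slice_seconds}", *opts, problem_file]
            out = subprocess.run(cmd, capture_output=True, text=True).stdout
            if "Proof found!" in out:        # E reports "# Proof found!" on success
                return opts
        return None

    # print(prove("PUZ001+1.p"))             # needs a local E installation and a TPTP problem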
At HORAIZON we’re always happy to hear from qualified candidates. If you want to be
considered for future openings, we’ll review your résumé and keep you in mind if any
relevant positions become available.