# Machine learning

Below I showcase some of my projects in machine learning research. For an up-to-date list of my papers, please refer to the publications page.

## Conditional sampling with MGANs

Conditional sampling is a fundamental problem in statistics and machine learning. Consider the supervised learning problem of predicting an output $y$ at an input $x$. We cast this problem as that of identifying the conditional measure $y|x$ from a training set of input-output samples $(x_i, y_i)$.

Monotone Generative Adversarial Networks (MGANs) are a variant of standard GANs that, by imposing appropriate structural and monotonicity conditions, are able to sample the desired conditionals $y|x$. More precisely, MGANs produce a map $T(x,y)$ such that, for any new input $x^\ast$, the map $T(x^\ast, \cdot)$ pushes a standard Gaussian on $y$ forward to the desired conditional $y|x^\ast$.
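The pushforward mechanism can be illustrated on a toy Gaussian example where the transport map is available in closed form; the map `T` below is hand-constructed for illustration, not learned by an adversarial procedure as in MGANs.

```python
import numpy as np

# Toy illustration of conditional sampling via a transport map, in the
# spirit of MGANs: here y | x ~ N(a * x, sigma^2), so a monotone map T
# pushing a standard Gaussian to the conditional is known in closed form.
a, sigma = 2.0, 0.5

def T(x, z):
    """Push standard Gaussian noise z to a sample of y | x."""
    return a * x + sigma * z  # monotone (increasing) in z

rng = np.random.default_rng(0)
x_star = 1.5                          # a new input
z = rng.standard_normal(100_000)      # reference Gaussian samples
y_samples = T(x_star, z)              # samples of y | x = x_star

print(y_samples.mean(), y_samples.std())  # close to a * x_star and sigma
```

In a trained MGAN the same recipe applies: fix $x^\ast$, draw reference Gaussian noise, and evaluate the learned map at $(x^\ast, \cdot)$.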

## Spectral clustering

Clustering is an unsupervised learning technique aiming to identify meaningful coarse structures in a point cloud $X$. Spectral clustering is a particularly successful approach to this problem in which a graph is constructed on $X$ along with a graph Laplacian operator $L$. An embedding $F$ is then defined using the eigenvectors of $L$, mapping $X$ to a low-dimensional space in which the clusters of $X$ are more easily revealed.
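The pipeline can be sketched on a synthetic two-cluster point cloud; the Gaussian kernel, its bandwidth, and the use of the second eigenvector are standard choices made here for illustration.

```python
import numpy as np

# Minimal spectral clustering sketch on two Gaussian blobs, a toy
# stand-in for the point cloud X.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 0.5, (50, 2)),
               rng.normal(+3.0, 0.5, (50, 2))])

# Weighted graph on X via a Gaussian kernel, and unnormalized Laplacian.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (2 * 1.0 ** 2))
L = np.diag(W.sum(1)) - W

# Laplacian embedding: the eigenvector of the second-smallest eigenvalue
# (the Fiedler vector); its sign pattern separates the two clusters.
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]
labels = (fiedler > 0).astype(int)
```

With more than two clusters one would embed $X$ using several eigenvectors and run a simple method such as $k$-means in the embedded space.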

Theoretical analysis of spectral clustering is still quite scarce. We studied the discrete and continuum limits of spectral clustering by analyzing the geometry of the Laplacian embedding $F$ when the points in $X$ are i.i.d. samples from a mixture model. We showed that, as the components of this mixture become better separated, spectral clustering identifies the correct clusters in $X$ with high probability.

## Continuum limit of graph Laplacians

It has been shown that, in the limit as the number of points in $X$
goes to infinity, graph Laplacian operators $L$ converge to weighted elliptic operators of the form
$\mathcal L : u \mapsto - \rho^{-p} \text{div} ( \rho^q \nabla (u \rho^{-r}))$
for certain values of the parameters $p,q,r$ depending on the normalization of the graph Laplacian. Here $\rho$ is the probability density function according to which the points in $X$ are distributed. Understanding the spectral properties of $\mathcal L$ is therefore fundamental to the analysis of a large number of unsupervised and semi-supervised learning algorithms in the large data limit.
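The dependence on the normalization can be made concrete: the three standard graph-Laplacian constructions below lead to different values of $p,q,r$ in the continuum limit. This sketch assumes a symmetric weight matrix $W$ has already been built (here a small hand-made example).

```python
import numpy as np

# Three standard graph-Laplacian normalizations; each yields a different
# (p, q, r) in the continuum-limit operator.
W = np.array([[0., 1., 1.],
              [1., 0., 2.],
              [1., 2., 0.]])   # symmetric weights, positive degrees
d = W.sum(1)                   # degree vector
D = np.diag(d)

L_un  = D - W                                    # unnormalized: D - W
L_rw  = np.eye(3) - W / d[:, None]               # random walk: I - D^{-1} W
L_sym = np.eye(3) - W / np.sqrt(np.outer(d, d))  # symmetric: I - D^{-1/2} W D^{-1/2}

# Each annihilates its "constant" mode, mirroring the nullspace of the
# continuum operator: constants for L_un and L_rw, sqrt(degree) for L_sym.
print(L_un @ np.ones(3))     # zero vector
print(L_rw @ np.ones(3))     # zero vector
print(L_sym @ np.sqrt(d))    # zero vector
```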

We analyze the spectrum of $\mathcal L$, focusing on cases where $\rho$ is concentrated on certain clusters. We show that, depending on the values of $p,q,r$, a gap manifests in the spectrum of $\mathcal L$ that reveals the number of clusters in $\rho$, while the geometry of the clusters is apparent in the eigenfunctions of $\mathcal L$ associated with the small eigenvalues.
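The discrete analogue of this spectral gap is easy to observe numerically: for points sampled from a density concentrated on two well-separated clusters, the graph Laplacian has two near-zero eigenvalues followed by a pronounced jump. The sampling density and kernel bandwidth below are illustrative choices.

```python
import numpy as np

# Toy illustration of the spectral gap for a two-cluster density on the
# real line: two eigenvalues of L are near zero, then the spectrum jumps.
rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(-4.0, 0.3, 60),
                    rng.normal(+4.0, 0.3, 60)])

d2 = (X[:, None] - X[None, :]) ** 2
W = np.exp(-d2 / (2 * 0.5 ** 2))
L = np.diag(W.sum(1)) - W

eigvals = np.linalg.eigvalsh(L)   # ascending order
print(eigvals[:4])                # two near-zero eigenvalues, then a gap
```

Counting the eigenvalues below the gap recovers the number of clusters, in line with the continuum analysis.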

• Franca Hoffmann, Bamdad Hosseini, Assad A. Oberai and Andrew M. Stuart “Spectral analysis of weighted Laplacians arising in data clustering” (2019). url:https://arxiv.org/abs/1909.06389

## Semi-supervised learning

Semi-supervised learning (SSL) is the problem of labelling a collection of unlabelled points $X$ from noisily observed labels on a small subset $X' \subset X$. The probit and one-hot methods are two widely used approaches to SSL — probit recovers binary labels while one-hot can handle finitely many labels. Both methods combine the observed labels with geometric information about $X$ to label the rest of the points. Similar to spectral clustering, a graph Laplacian operator $L$ is constructed on $X$. The labels on all of $X$ are then found by solving an optimization problem consisting of a data-fidelity term for the observed labels on $X'$ and a Dirichlet-energy regularization term involving $L$, which injects the geometry of $X$ encoded in the spectrum of $L$.
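The fidelity-plus-Dirichlet-energy structure can be sketched with a quadratic data-fidelity term, a simplified least-squares relative of the probit objective (the probit method itself uses a probit log-likelihood for the fidelity). With quadratic fidelity, the minimizer solves a linear system.

```python
import numpy as np

# Sketch of graph-based SSL with quadratic data fidelity: minimize
#   sum_{j in X'} (u_j - y_j)^2 + tau * u^T L u,
# whose minimizer solves (B + tau L) u = B y, with B masking observed nodes.
# This is a least-squares simplification of the probit objective.
rng = np.random.default_rng(3)
X = np.concatenate([rng.normal(-3.0, 0.4, 40),
                    rng.normal(+3.0, 0.4, 40)])

d2 = (X[:, None] - X[None, :]) ** 2
W = np.exp(-d2 / (2 * 0.7 ** 2))
L = np.diag(W.sum(1)) - W

y = np.zeros(80)
y[0], y[40] = -1.0, +1.0             # one observed label per cluster
B = np.zeros((80, 80))
B[0, 0] = B[40, 40] = 1.0            # mask selecting the labelled nodes

tau = 0.1                            # regularization strength
u = np.linalg.solve(B + tau * L, B @ y)
labels = np.sign(u)                  # +/-1 labels for all of X
```

Even with a single observed label per cluster, the Dirichlet-energy term propagates the labels along the graph, so all points receive the label of their cluster.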

We analyze the consistency of SSL and show that, under appropriate geometric assumptions, the probit and one-hot methods recover the correct labels of all points in $X$ in the limit of small observational noise. Our analysis reveals interesting interactions between the hyperparameters of both methods.

• Franca Hoffmann, Bamdad Hosseini, Zhi Ren and Andrew M. Stuart “Consistency of semi-supervised learning algorithms on graphs: Probit and one-hot methods”. Journal of Machine Learning Research (2020, In press). url:https://arxiv.org/abs/1906.07658