Consistency of semi-supervised learning algorithms on graphs: Probit and one-hot methods

We just uploaded a new manuscript on the consistency of certain semi-supervised learning algorithms on graphs. This is joint work with my wonderful collaborators Dr. Franca Hoffmann, Zhi Ren, and Prof. Andrew Stuart. It is the first of several papers in which we develop a framework for consistency analysis of semi-supervised learning algorithms.

https://arxiv.org/abs/1906.07658

Abstract:

Graph-based semi-supervised learning is the problem of propagating labels from a small number of labelled data points to a larger set of unlabelled data. This paper is concerned with the consistency of optimization-based techniques for such problems, in the limit where the labels have small noise and the underlying unlabelled data is well clustered. We study graph-based probit for binary classification, and a natural generalization of this method to multi-class classification using one-hot encoding. The resulting objective function to be optimized comprises the sum of a quadratic form defined through a rational function of the graph Laplacian, involving only the unlabelled data, and a fidelity term involving only the labelled data. The consistency analysis sheds light on the choice of the rational function defining the optimization.
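For readers who want to experiment, here is a toy sketch of the kind of objective the abstract describes: a quadratic form built from a function of the graph Laplacian on all the data, plus a probit fidelity term on the labelled points. The Gaussian similarity kernel, the choice q(L) = (L + τ²I)² for the regularizing operator, and all parameter values below are illustrative stand-ins, not the paper's exact setup.

```python
# Toy sketch of a graph-probit objective for binary SSL.
# Assumptions (not from the paper): Gaussian similarity weights,
# q(L) = (L + tau^2 I)^2, and illustrative parameter values.
import numpy as np
from scipy.special import ndtr          # standard normal CDF (probit link)
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Two noisy clusters in the plane; one labelled point per cluster.
X = np.vstack([rng.normal(-2, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
n = len(X)
labelled = [0, 20]
y = np.array([-1.0, 1.0])               # labels of the labelled points

# Weighted graph and its unnormalized Laplacian.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (2 * 0.5 ** 2))
L = np.diag(W.sum(1)) - W

# Regularizing operator q(L); a shifted power of L is one simple choice.
tau, gamma = 0.1, 0.1
eigvals, eigvecs = np.linalg.eigh(L)
QL = eigvecs @ np.diag((eigvals + tau ** 2) ** 2) @ eigvecs.T

def objective(u):
    quad = 0.5 * u @ QL @ u                                 # smoothness over all data
    fid = -np.sum(np.log(ndtr(y * u[labelled] / gamma) + 1e-300))  # probit fidelity
    return quad + fid

u_star = minimize(objective, np.zeros(n), method="L-BFGS-B").x
pred = np.sign(u_star)                  # propagated labels for all points
```

Because the two clusters are well separated, the cheap low-frequency modes of q(L) are close to cluster indicators, so the two labels propagate across their respective clusters.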

Geometric structure of graph Laplacian embeddings

We just uploaded a new preprint on the geometry of graph Laplacian embeddings in the continuum limit. This is joint work with Nicolas Garcia Trillos and Franca Hoffmann.

https://arxiv.org/abs/1901.10651

Abstract:

We analyze the spectral clustering procedure for identifying coarse structure in a data set x_1, \dots, x_n , and in particular study the geometry of graph Laplacian embeddings which form the basis for spectral clustering algorithms. More precisely, we assume that the data is sampled from a mixture model supported on a manifold M embedded in \mathbb{R}^d and pick a connectivity length-scale \varepsilon >0 to construct a kernelized graph Laplacian. We introduce a notion of a well-separated mixture model which only depends on the model itself, and prove that when the model is well separated, with high probability the embedded data set concentrates on cones that are centered around orthogonal vectors. Our results are meaningful in the regime where \varepsilon = \varepsilon(n) is allowed to decay to zero at a slow enough rate as the number of data points grows. This rate depends on the intrinsic dimension of the manifold on which the data is supported.
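The concentration-on-orthogonal-cones phenomenon is easy to see numerically. Below is a toy sketch for a two-component Gaussian mixture: the data are embedded via the first two eigenvectors of a symmetrically normalized kernelized Laplacian, and the two embedded clusters end up pointing in nearly orthogonal directions. The kernel, the normalization, and all parameters are illustrative choices, not the precise construction analyzed in the paper.

```python
# Toy sketch of a graph Laplacian embedding for a well-separated
# two-component mixture. Kernel/normalization choices are illustrative.
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 0.4, (50, 2)), rng.normal(3, 0.4, (50, 2))])
eps = 1.0                                # connectivity length-scale

# Kernelized weights and symmetrically normalized Laplacian.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / eps ** 2)
d = W.sum(1)
L_sym = np.eye(len(X)) - (W / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]

# Embed each point by its entries in the first two eigenvectors.
vals, vecs = np.linalg.eigh(L_sym)
F = vecs[:, :2]

# Mean embedded direction of each cluster; they are nearly orthogonal.
mA, mB = F[:50].mean(0), F[50:].mean(0)
cos_angle = abs(mA @ mB) / (np.linalg.norm(mA) * np.linalg.norm(mB))
```

The intuition: for a nearly disconnected graph, the bottom eigenspace is spanned by (degree-weighted) cluster indicators with disjoint supports, and any orthonormal basis of that space sends the two clusters onto orthogonal rays.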

Two Metropolis-Hastings algorithms for non-Gaussian priors

I just updated my preprint (https://arxiv.org/abs/1804.07833) on arXiv after major revisions following reviewer comments. Here’s the new abstract:

We introduce two classes of Metropolis-Hastings algorithms for sampling target measures that are absolutely continuous with respect to underlying non-Gaussian measures on infinite-dimensional Hilbert spaces. We particularly focus on measures that are highly non-Gaussian and utilize certain properties of the underlying prior measures to construct autoregressive-type proposal kernels that are prior reversible and result in algorithms that satisfy detailed balance with respect to the target measures. We then introduce a new class of prior measures, called the Bessel-K priors, as a generalization of the gamma distribution to measures in infinite dimensions. The Bessel-K priors interpolate between well-known priors such as the gamma distribution and Besov priors and can model sparse or compressible parameters. We present concrete instances of our algorithms for the Bessel-K priors and show numerical examples in density estimation, finite-dimensional denoising and deconvolution on the circle.
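The key structural idea, that a prior-reversible autoregressive proposal makes the prior terms cancel in the acceptance ratio, is easiest to see in the familiar Gaussian special case (the pCN algorithm), which the paper generalizes to non-Gaussian priors such as the Bessel-K class. Here is a one-dimensional toy sketch; the datum, noise level, and jump parameter are all illustrative.

```python
# Toy sketch of a prior-reversible autoregressive (pCN-type) proposal
# for a standard Gaussian prior. Illustrates the detailed-balance
# structure the paper extends to non-Gaussian priors; all numbers
# below are illustrative.
import numpy as np

rng = np.random.default_rng(2)
y, sigma = 1.5, 0.5                                  # datum and noise level
phi = lambda u: 0.5 * (y - u) ** 2 / sigma ** 2      # negative log-likelihood

beta, u, samples = 0.5, 0.0, []
for _ in range(50000):
    # Autoregressive proposal: reversible w.r.t. the N(0,1) prior.
    v = np.sqrt(1 - beta ** 2) * u + beta * rng.normal()
    # Acceptance ratio involves only the likelihood; prior terms cancel.
    if rng.random() < min(1.0, np.exp(phi(u) - phi(v))):
        u = v
    samples.append(u)

post_mean = np.mean(samples[5000:])   # exact posterior mean: y / (1 + sigma**2) = 1.2
```

Since prior and likelihood are both Gaussian here, the posterior is available in closed form, which makes it easy to check that the chain targets the right measure.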


New preprint on model calibration and source inversion in atmospheric dispersion

We just uploaded a new preprint “Simultaneous model calibration and source inversion in atmospheric dispersion models” that I co-authored with Juan Garcia and John Stockie. The paper is largely based on Juan’s work for his master’s thesis, and I am very proud of him. Here’s the abstract:

We present a cost-effective method for model calibration and solution of source inversion problems in atmospheric dispersion modelling. We use Gaussian process emulations of atmospheric dispersion models within a Bayesian framework for solution of inverse problems. The model and source parameters are treated as unknowns and we obtain point estimates and approximation of uncertainties for sources while simultaneously calibrating the forward model. The method is validated in the context of an industrial case study involving emissions from a smelting operation for which cumulative monthly measurements of zinc particulate depositions are available.
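To illustrate the idea of emulation inside an inverse problem, here is a toy sketch: an “expensive” forward map is replaced by a Gaussian process interpolant fit to a handful of model runs, and the emulator is then used in the likelihood when inferring a source parameter. The forward map, kernel, and all settings below are toy stand-ins, not the atmospheric dispersion model or the paper's calibration setup.

```python
# Toy sketch of GP emulation inside a Bayesian inverse problem.
# The forward map and all parameters are illustrative stand-ins.
import numpy as np

# "Expensive" forward map: source parameter s -> measured deposition.
forward = lambda s: np.exp(-0.5 * s)

# GP emulator: squared-exponential kernel, interpolating 8 model runs.
S = np.linspace(0, 4, 8)
F = forward(S)
k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / 0.5 ** 2)
Kinv = np.linalg.inv(k(S, S) + 1e-8 * np.eye(8))
emulate = lambda s: k(s, S) @ Kinv @ F               # GP posterior mean

# Bayesian inversion on a grid with the emulator in the likelihood.
s_true, noise = 2.5, 0.05
y = forward(np.array([s_true]))[0] + 0.02            # noisy observation
grid = np.linspace(0, 4, 400)
loglik = -0.5 * (y - emulate(grid)) ** 2 / noise ** 2  # flat prior on [0, 4]
post = np.exp(loglik - loglik.max())
post /= post.sum()
s_map = grid[np.argmax(post)]                        # posterior mode
```

The point of the construction is cost: once the emulator is fit, evaluating the likelihood on a dense grid (or inside an MCMC chain) requires no further runs of the expensive model.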

A day of celebration at SFU

I had the opportunity to celebrate my graduation with some of the loveliest people in my life. My two supervisors, John Stockie and Nilima Nigam, hooded me at the convocation ceremony. I was also awarded the Governor General’s Gold Medal by President Andrew Petter. This is a huge honor, which I credit to my supervisors as well as the wonderful people at SFU who supported me over the past six years.


[Convocation photo]