Sparse signal recovery
A common problem in signal processing is to recover a sparse signal from linear measurements. The traditional sparse signal recovery literature focuses on signals that have a sparse representation in discrete, incoherent dictionaries. However, signals arising in imaging, radar, and localization are typically sparse in continuously indexed dictionaries that are highly coherent. We have developed theory showing that a convex program perfectly recovers signals that are sparse in a large class of continuously indexed dictionaries.
Specifically, in [J8] we showed that a convex program recovers signals that are sparse in the continuous dictionary of time- and frequency-shifted versions of a window function. An implication of this result is that the resolution limit of a traditional delay-Doppler radar, which is inversely proportional to the bandwidth and effective duration of the transmitted probing signal, can be overcome. In [C13], we prove a related result for MIMO radar. Both results are part of a bigger picture: in our recent work [P2], we characterize a large class of continuously indexed dictionaries that allow for sparse signal recovery.
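The basic recovery problem is easiest to illustrate in the simpler discrete, incoherent setting. The sketch below is not the convex program from [P2]; it is a generic lasso (ℓ1-regularized least squares) solved by iterative soft-thresholding (ISTA), recovering a sparse vector from random Gaussian measurements. All dimensions and parameters are illustrative.

```python
import numpy as np

def ista(A, y, lam=0.01, iters=2000):
    """Iterative soft-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L = squared spectral norm
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - step * A.T @ (A @ x - y)                         # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(0)
n, m, s = 200, 80, 5                       # signal length, measurements, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, s, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], s)
y = A @ x_true                             # noiseless linear measurements
x_hat = ista(A, y)
print("recovery error:", np.linalg.norm(x_hat - x_true))
```

With m = 80 random measurements of a 5-sparse length-200 signal, the lasso solution essentially coincides with the true signal (up to a small shrinkage bias controlled by the regularization parameter).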
Papers and material
[P2] Generalized line spectral estimation via convex optimization, R. Heckel and M. Soltanolkotabi, Sep. 2016.
Slides for a talk given at the FTW Telecommunications Forum at TU Vienna and the EPFL-Idiap-ETH Sparsity Workshop at EPFL.
Clustering high-dimensional data
One of the major challenges in modern data analysis is to extract relevant information from large, high-dimensional data sets. Often this amounts to finding low-dimensional structure by clustering the points into groups, each lying near a low-dimensional subspace; this problem is referred to as subspace clustering (illustrated below with points lying near a union of three subspaces).
In practice, it is often desirable to first project the data points into a lower-dimensional space and to perform clustering there; this reduces storage requirements and computational cost.
We have shown, for three popular subspace clustering algorithms, that dimensionality reduction down to the order of the subspace dimensions is possible without notable performance degradation. This translates into significant reductions in the running time of the algorithms [J9, C11].
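The effect of dimensionality reduction can be illustrated with a toy experiment (a sketch, not the algorithms analyzed in [J9, C11]): points on two random low-dimensional subspaces of a high-dimensional space are randomly projected down to a dimension on the order of the subspace dimension, and nearest neighbors by absolute inner product, the quantity thresholding-based subspace clustering relies on, still predominantly come from the same subspace. All dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
ambient, d, proj_dim, n_per = 100, 3, 12, 40  # ambient dim, subspace dim, sketch dim, points per subspace

# two random d-dimensional subspaces of R^ambient, with n_per points on each
U1 = np.linalg.qr(rng.standard_normal((ambient, d)))[0]
U2 = np.linalg.qr(rng.standard_normal((ambient, d)))[0]
X = np.hstack([U1 @ rng.standard_normal((d, n_per)),
               U2 @ rng.standard_normal((d, n_per))])
labels = np.array([0] * n_per + [1] * n_per)

# random Gaussian projection down to proj_dim, on the order of the subspace dimension
P = rng.standard_normal((proj_dim, ambient)) / np.sqrt(proj_dim)
Z = P @ X
Z /= np.linalg.norm(Z, axis=0)

# for each projected point, find its strongest neighbor by absolute inner product
C = np.abs(Z.T @ Z)
np.fill_diagonal(C, 0.0)
nn = C.argmax(axis=1)
same = (labels[nn] == labels).mean()
print(f"fraction of nearest neighbors from the same subspace: {same:.2f}")
```

Even after reducing from 100 to 12 dimensions, the vast majority of nearest neighbors still lie in the correct subspace, which is why clustering in the projected space works.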
Papers and material
[J7] Robust subspace clustering via thresholding, R. Heckel and H. Bölcskei, IEEE Transactions on Information Theory, Nov. 2015.
[C11] Subspace clustering of dimensionality-reduced data, R. Heckel, M. Tschannen, and H. Bölcskei, ISIT, July 2014.
[C9] Neighborhood selection for thresholding-based subspace clustering, R. Heckel, E. Agustsson, and H. Bölcskei, ICASSP, May 2014.
[C8] Noisy subspace clustering via thresholding, R. Heckel and H. Bölcskei, ISIT, July 2013.
[C7] Subspace clustering via thresholding and spectral clustering, R. Heckel and H. Bölcskei, ICASSP, May 2013.
Active ranking from pairwise comparisons
Consider the problem of estimating a ranking based on noisy comparisons between pairs of items. Such rank aggregation problems arise in recommender systems, peer grading for ranking students in massive open online courses, and online surveys for assessing the popularity of proposals in a population.
We propose a simple and computationally efficient active ranking algorithm that provably recovers any desired ranking with a sample complexity that is optimal up to logarithmic factors. In contrast to our work, many prior works assume parametric models such as the Bradley-Terry-Luce model. It has been an open question whether imposing these parametric assumptions allows for improved ranking algorithms. For stochastic comparison models, in which the pairwise probabilities are bounded away from zero, we resolved this question by proving a lower bound for parametric models. This shows, perhaps surprisingly, that these popular parametric modeling choices offer at most logarithmic gains for stochastic comparisons.
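A toy, passive version of score-based ranking can be sketched as follows. This is not the algorithm from the paper, which is active, i.e., it adaptively decides which pairs to compare next; here every item simply receives the same fixed budget of comparisons against random opponents, and items are ranked by their empirical win rate. The pairwise win probabilities below are illustrative and deliberately non-parametric.

```python
import numpy as np

rng = np.random.default_rng(2)

# pairwise win probabilities p[i, j] = P(item i beats item j); no parametric
# (e.g. Bradley-Terry-Luce) structure is assumed, only that each p[i, j] is fixed
p = np.array([[0.5, 0.8, 0.7, 0.9],
              [0.2, 0.5, 0.7, 0.8],
              [0.3, 0.3, 0.5, 0.6],
              [0.1, 0.2, 0.4, 0.5]])
n = p.shape[0]

# rank items by their estimated score: the win rate against random opponents,
# estimated from m noisy comparisons per item
m = 5000
wins = np.zeros(n)
for i in range(n):
    opponents = rng.choice([j for j in range(n) if j != i], size=m)
    wins[i] = (rng.random(m) < p[i, opponents]).mean()

ranking = np.argsort(-wins)
print("estimated ranking (best first):", ranking)
```

An active algorithm improves on this by concentrating comparisons on items whose scores are close and hard to separate, which is where the sample-complexity gains come from.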
Paper and material
Active ranking from pairwise comparisons and when parametric assumptions don’t help
DNA data storage
Due to its longevity and enormous information density, DNA is an attractive storage medium. However, practical constraints on reading and writing DNA require the data to be stored on many short DNA fragments that cannot be spatially ordered. We developed and implemented a coding scheme tailored to this unique channel model. Combining this information-theoretic protection with a physical protection mechanism, we showed, via accelerated aging experiments, that digital information can be stored on DNA and perfectly retrieved after what corresponds to thousands of years of storage.
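The unordered-fragment channel can be illustrated with a minimal sketch: each fragment carries an index so that the shuffled pool of reads can be reassembled. This shows only the indexing idea; the scheme in the paper additionally protects both fragments and indices with error-correcting (Reed-Solomon) codes, and the byte-to-nucleotide mapping below is a simplified 2-bits-per-base illustration.

```python
import random

def encode(data: bytes, frag_len: int = 8):
    """Split data into short fragments and prepend a one-byte index to each,
    so the spatially unordered pool of fragments can be reassembled."""
    chunks = [data[i:i + frag_len] for i in range(0, len(data), frag_len)]
    return [bytes([idx]) + chunk for idx, chunk in enumerate(chunks)]

def decode(fragments):
    """Recover the data from fragments received in arbitrary order."""
    ordered = sorted(fragments, key=lambda f: f[0])
    return b"".join(f[1:] for f in ordered)

BASES = "ACGT"
def to_dna(b: bytes) -> str:
    """Map each byte to 4 nucleotides (2 bits per base); a real scheme would
    also enforce constraints such as avoiding long runs of one base."""
    return "".join(BASES[(x >> s) & 3] for x in b for s in (6, 4, 2, 0))

msg = b"digital information stored on DNA"
frags = encode(msg)
random.shuffle(frags)              # reads come back spatially unordered
assert decode(frags) == msg
print(to_dna(frags[0]))            # nucleotide sequence of one fragment
```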
Papers and material
Robust chemical preservation of digital information on DNA in silica with error-correcting codes
Slides for an invited talk at the Leading Edge Embedded NVM Workshop, 2015.
Interpretable recommender systems
At IBM Research, we designed and implemented a recommender system (currently deployed at IBM) that generates not only accurate but also interpretable recommendations. Interpretability is important in a business-to-business setting, where recommendations are generated for sales staff rather than directly for the client.
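A minimal sketch of how a co-cluster yields interpretable recommendations: given a binary client-by-product purchase matrix and a co-cluster (a group of clients together with a group of products they largely have in common), recommend to each member the co-cluster products it lacks, justified by its peers' purchases. The data and the co-cluster below are hypothetical; the algorithm for actually finding overlapping co-clusters is the subject of the paper.

```python
import numpy as np

# toy binary client-by-product purchase matrix (1 = client bought product)
R = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]])
clients = ["c0", "c1", "c2", "c3"]
products = ["p0", "p1", "p2", "p3"]

# a hypothetical co-cluster: a set of clients and a set of products they
# largely share (co-clusters found by the algorithm may overlap)
co_cluster = ({0, 1}, {0, 1, 2})

def recommend(co_cluster, R):
    """Recommend, with an explanation, the co-cluster products a member lacks."""
    rows, cols = co_cluster
    recs = []
    for c in sorted(rows):
        for p in sorted(cols):
            if R[c, p] == 0:
                peers = [clients[r] for r in rows if r != c and R[r, p]]
                recs.append((clients[c], products[p],
                             f"similar clients {peers} also bought {products[p]}"))
    return recs

for client, product, why in recommend(co_cluster, R):
    print(f"recommend {product} to {client}: {why}")
```

The explanation attached to each recommendation is what makes the output actionable for sales staff: it names the peer clients whose purchases justify the suggestion.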
Papers and material
Interpretable recommendations via overlapping co-clusters, R. Heckel and M. Vlachos, arXiv, Dec. 2015.
Towards interpretable predictive models in B2B recommender systems, M. Vlachos, V.G. Vassiliadis, R. Heckel, A. Labbi, IBM Journal of Research and Development, Sep. 2016.
Method and system for identifying dependent components, R. Heckel, V. Vasileiadis, and M. Vlachos, US Patent 14/935476, filed Nov. 2015.
The obfuscation and protection of data rights, R. Heckel and M. Vlachos, US Patent 14/805514, filed July 2015.