EE Systems Seminar
Abstract:
High-dimensional "big data" is encountered in many fields, such as audio analysis, video analytics and data mining. One of the associated challenges is how to extract meaningful information from such data, which is generally difficult to analyze as is. Usually this is done by manipulating extracted, pre-defined features. However, such an approach requires some prior knowledge or a model in order to identify suitable features. An alternative approach is to use general dimensionality reduction methods, which are purely unsupervised (namely, they do not require prior knowledge) and aim to extract ("learn") a lower-dimensional representation. Dimensionality reduction methodologies reduce the size (dimension) of objects in the dataset while preserving the coherence of the original data, such that clustering, classification, manifold learning and many other data analysis tasks can be applied in the reduced space.
Diffusion maps (Coifman and Lafon 2006), a leading method for dimensionality reduction, is based on the intrinsic geometry of the analyzed dataset. The method utilizes local connectivities to construct a Markov matrix (enforcing an implied random walk process between objects). Dimensionality reduction is achieved through the spectral decomposition of this matrix; furthermore, a new distance is defined which approximates the geodesic distance between data points.
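As a rough illustration of the construction described above, the following is a minimal NumPy sketch, not the speaker's code. It assumes a Gaussian kernel with a user-chosen bandwidth `epsilon`; the function name `diffusion_map` and its parameters are hypothetical and chosen only for this example.

```python
import numpy as np


def diffusion_map(X, epsilon, n_components=2, t=1):
    """Sketch of a diffusion-map embedding (Gaussian kernel is an assumption).

    X is an (n_samples, n_features) array. Returns the embedding and the
    eigenvalues of the Markov matrix, sorted in descending order.
    """
    # Pairwise squared Euclidean distances between all samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Local connectivities: Gaussian affinity kernel.
    K = np.exp(-d2 / epsilon)

    # The Markov matrix is P = D^{-1} K, with D the diagonal of row sums.
    # Decompose its symmetric conjugate A = D^{-1/2} K D^{-1/2} instead,
    # which has the same eigenvalues and is numerically better behaved.
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    w, V = np.linalg.eigh(A)
    order = np.argsort(w)[::-1]
    w, V = w[order], V[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of A.
    psi = V / np.sqrt(d)[:, None]

    # Skip the trivial constant eigenvector (eigenvalue 1) and scale the
    # remaining coordinates by the eigenvalues raised to the diffusion time t.
    idx = slice(1, n_components + 1)
    return psi[:, idx] * (w[idx] ** t), w
```

Euclidean distances between rows of the returned embedding then serve as a truncated diffusion distance. The bandwidth `epsilon` controls how local the connectivities are and has to be chosen per dataset.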
In this study we consider learning a reduced dimensionality representation from datasets obtained under multiple views. Such multiple views of datasets can be obtained, for example, when the same underlying process is observed using several different modalities, or measured with different instrumentation. Our goal is to effectively exploit the availability of such multiple views for various purposes, such as non-linear embedding, manifold learning, spectral clustering, anomaly detection and non-linear system identification.
Our proposed method exploits the intrinsic relation within each view, as well as the mutual relations between views. We do this by defining a cross-view model, in which an implied random walk process between objects is constrained to hop between the various views. Our method is robust to scaling of each dataset, and is insensitive to small structural changes in the data. Within this framework, we define new diffusion distances and analyze the spectra of the new kernels. We examine the applicability of the proposed method to manifold learning and spectral clustering. In addition, our approach enables us to define a new semi-supervised learning paradigm, which consists of sampling one view from a physical system, while generating a second view using some prior knowledge regarding the underlying physical model.
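One way to realize a random walk that is forced to hop between views can be sketched for two views as follows. This is an illustrative assumption rather than a statement of the exact kernel presented in the talk: the within-view blocks of a joint kernel are set to zero and the cross-view blocks are products of the per-view Gaussian kernels. All names (`gaussian_kernel`, `multiview_diffusion_map`, `eps1`, `eps2`) are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, epsilon):
    """Gaussian affinity matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-d2 / epsilon)


def multiview_diffusion_map(X1, X2, eps1, eps2, n_components=2):
    """Two-view sketch: X1 and X2 hold the same n objects, row-aligned."""
    K1 = gaussian_kernel(X1, eps1)
    K2 = gaussian_kernel(X2, eps2)
    n = K1.shape[0]
    Z = np.zeros((n, n))

    # Zero diagonal blocks forbid steps that stay inside one view; the
    # off-diagonal blocks couple the views. The result is symmetric because
    # (K1 @ K2).T == K2 @ K1 for symmetric K1 and K2.
    K = np.block([[Z, K1 @ K2], [K2 @ K1, Z]])

    # Row-normalizing K gives the cross-view Markov matrix; as in the
    # single-view case, decompose its symmetric conjugate.
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    w, V = np.linalg.eigh(A)
    order = np.argsort(w)[::-1]
    w, V = w[order], V[:, order]
    psi = V / np.sqrt(d)[:, None]

    # Drop the trivial eigenvector; the first n rows embed view 1 and the
    # last n rows embed view 2.
    idx = slice(1, n_components + 1)
    emb = psi[:, idx] * w[idx]
    return emb[:n], emb[n:], w
```

Giving each view its own bandwidth (`eps1`, `eps2`) is one simple way to keep the construction from depending on the relative scale of the two datasets.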
In this talk I will describe diffusion maps, with an application to musical key extraction, and then present our proposed method for multi-view diffusion maps.
Bio:
Ofir Lindenbaum received B.Sc. degrees in electrical engineering and in physics (both summa cum laude) from the Technion-Israel Institute of Technology in 2010. He is now pursuing his PhD in electrical engineering at the School of Electrical Engineering at Tel-Aviv University. His areas of interest include machine learning, applied and computational harmonic analysis, and musical signal analysis.