Rigorous Systems Research Group (RSRG) Seminar
High-level image understanding problems are diverse, yet they share some common traits. They are fundamental problems in computer vision including image classification, fine-grained classification, semantic image segmentation, bounding prediction, depth estimation, object detection and so on. In the prevalence of deep learning, computer vision researchers have design different representation learning algorithms to solve these problems. However, the solutions are always a balance of semantics, scale, context, and resolution. It is therefore desirable to understand whether there exists a universal learning framework that can be applied to all the problems effectively. We start the investigation by first drawing connection between image classification and semantic segmentation. We find that resolution of layer output plays an important role in prediction performance and new designs of networks are proposed to produce better high-resolution feature maps. We move further to propose Deep Layer Aggregation (DLA) that can learn how to aggregate scale and context information produced from each layer. We apply DLA to multiple domains and we find that our simple learning procedure can give state-of-the-art results for those problems. I will also talk about our on-going efforts in annotating driving videos to understand challenges of image representation in real-world applications.