Lecturers: Roberta Ferrario, Marco Cristani, Daniele Porello
Period: 04-May-2015 to 08-May-2015
Course objectives and contents:
Image and video understanding is the process of converting elementary visual entities (pixels, voxels) into symbolic forms of knowledge (textual tags, predicates) by means of various kinds of models (statistical classifiers, neural networks, expert systems, etc.). It represents the highest processing level in a computer vision system and usually operates on top of a basic processing layer that extracts intermediate image representations (patches, volumes). Because photographic images and videos are unconstrained and fully reliable low-level features are lacking, image understanding can be helped by grounding it in a prior semantic model that describes the relevant domain knowledge and can operate during both learning and inference. This semantic layer is usually represented by an ontology, understood as a set of primitive concepts and relations expressed by axioms that provide an interpretation of the vocabulary chosen for the visual description of a domain.
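As a minimal illustration of how a prior semantic model can operate at inference time, the sketch below combines the output of a classifier with a toy "is-a" hierarchy. Everything in it is an assumption made for illustration: the concept names, the `IS_A` table, the scores and the threshold are hypothetical and are not taken from the course material. The idea is simply that, when low-level evidence is split between sibling classes, the hierarchy allows the system to fall back on a more general concept that is still well supported.

```python
# Hypothetical toy ontology: child -> parent ("is-a") relations.
IS_A = {
    "cat": "animal",
    "dog": "animal",
    "car": "vehicle",
    "bus": "vehicle",
    "animal": "entity",
    "vehicle": "entity",
}


def ancestors(concept):
    """Return the superconcepts of `concept`, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def propagate(leaf_scores):
    """Sum leaf probabilities upward, so that each superconcept
    receives the total probability mass of its descendants."""
    totals = dict(leaf_scores)
    for leaf, p in leaf_scores.items():
        for parent in ancestors(leaf):
            totals[parent] = totals.get(parent, 0.0) + p
    return totals


def most_specific_confident(leaf_scores, threshold=0.6):
    """Return the deepest concept whose aggregated score reaches
    `threshold`, or None if no concept does."""
    totals = propagate(leaf_scores)
    confident = [c for c, p in totals.items() if p >= threshold]
    return max(
        confident,
        key=lambda c: (len(ancestors(c)), totals[c]),
        default=None,
    )


# Assumed classifier output over the leaf concepts (made-up numbers).
scores = {"cat": 0.40, "dog": 0.35, "car": 0.15, "bus": 0.10}
label = most_specific_confident(scores)
print(label)
```

With these made-up scores neither "cat" nor "dog" reaches the threshold alone, but their common superconcept "animal" does, so the sketch returns the more general label. A real system would use a much richer ontology (axioms, part-of relations, disjointness constraints) rather than a single hierarchy.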
After early steps in the 1980s, research at the intersection of computer vision and formal ontology stagnated, probably limited by the lack of available domain ontologies. More recently, however, with the creation of shared resources such as ImageNet, Tiny Images and LabelMe on the computer vision side, and WordNet on the formal ontology side, the area has expanded rapidly and now attracts a fast-growing scientific community.
The intertwinement of an implicit, automatic and algorithmic level with an explicit, analytic and computational one seems to be constitutive of visual knowledge. How the two levels influence and complement each other in producing such knowledge deserves thorough investigation. Cross-fertilization between computer vision and formal ontology is thus particularly promising, especially for classification tasks.
The aim of the course is to offer graduate students a range of recent techniques, innovative ideas and solutions for exploiting the potential synergies that emerge from integrating the two domains: computer vision and machine learning on one side, and formal ontology on the other. These can be applied to object and event recognition and to scene, image and video understanding. The long-term goals are to promote the development of a proper visual ontology and a better understanding of how such an ontology could be used for visual inference. The course consists of 3/4 standard lessons and 1/4 laboratory sessions in MATLAB and formal logic languages.