Cognitive Vision: On Deep Semantics in Visuo-Spatial Computing
Mehul Bhatt (Örebro University and University of Bremen), and
Jakob Suchan (University of Bremen)
This tutorial at AAAI 2018 presents cognitive vision from the perspectives of language, logic, and artificial intelligence. The tutorial focusses on application areas where explainability and semantic interpretation of dynamic visuo-spatial imagery are central, e.g., commonsense scene understanding; vision for robotics and human-robot interaction (HRI); narrative interpretation from the viewpoints of visuo-auditory perception and digital media; and multimodal sensemaking of data.
We particularly highlight Deep (Visuo-Spatial) Semantics, denoting the existence of systematic formalisation and declarative programming methods (e.g., pertaining to space and motion) that support query answering, relational learning, non-monotonic abductive inference, and embodied simulation. Here, we demonstrate the integration of methods from knowledge representation and computer vision, with a focus on reasoning and learning about space, action, and change. Against the backdrop of areas as diverse as architecture design, cognitive film studies, cognitive robotics, and eye-tracking, this tutorial covers both applications and basic methods concerned with topics such as:
– explainable visual perception,
– semantic video understanding,
– language generation from video,
– declarative spatial reasoning, and
– computational models of narrative.
No special background is needed; participants need only be generally interested in AI, Cognitive Science, or HCI. We especially encourage participation by early-stage doctoral researchers and by educators wanting to learn about general tools for logic-based reasoning about visual imagery.