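After introducing and clarifying some basic concepts and the historical development of Interactive Perception (IP), this survey gives a fairly clear set of criteria for distinguishing how IP is applied across different domains.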
Forceful Interactions:
Any action that exerts a potentially time-varying force upon the environment is a forceful interaction.
Create Novel Signals (CNS): the various sensor signals produced by the interaction, such as tactile and visual data.
Action Perception Regularity (APR):
Forceful interactions reveal regularities in the combined space of sensor information (S) and action parameters (A) over time (t).
This regularity is constituted by the repeatable, multi-modal sensory data that is created when executing the same action in the same environment.
Knowing this regularity corresponds to understanding the causal relationship between action and sensory response given specific environment properties.
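As a toy illustration of such a regularity, the sketch below repeats the same pressing action on a simulated surface and recovers an environment property from the repeatable response. Everything here is an assumption made for illustration: the linear-spring sensor model, the function names `press` and `estimate_stiffness`, and the numeric values do not come from the survey.

```python
import random

def press(force, stiffness, noise=0.001, rng=random):
    """Hypothetical sensor model: indentation of a linear spring under a force, plus noise."""
    return force / stiffness + rng.gauss(0.0, noise)

def estimate_stiffness(forces, displacements):
    """Least-squares fit of displacement = force / k through the origin; returns k."""
    num = sum(f * d for f, d in zip(forces, displacements))
    den = sum(f * f for f in forces)
    compliance = num / den  # estimate of 1 / k
    return 1.0 / compliance

# Repeat the same four actions five times in the same (simulated) environment.
rng = random.Random(0)
forces = [1.0, 2.0, 3.0, 4.0] * 5
displacements = [press(f, stiffness=50.0, rng=rng) for f in forces]

# The regularity in (action, sensor response) pairs reveals the hidden property.
k_hat = estimate_stiffness(forces, displacements)
print(f"estimated stiffness: {k_hat:.2f}")
```

Because the response to a given force is repeatable, the pairs of action and sensor reading are enough to identify the stiffness, which is the kind of causal relationship the definition refers to.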
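The figure below gives a brief summary and introduction of the various perception approaches. In practice, the boundary between Active Perception (AP) and Interactive Perception has become blurred. They are separated here because, before this survey, most AP methods still relied on visual observation alone, so they do not satisfy the condition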


Articulation Model Estimation and Haptic Property Estimation are used here as examples for a brief explanation:

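A. How is the signal in S×A×t leveraged?
Any IP method that uses APR necessarily also uses CNS. As mentioned earlier, APR here is more concretely expressed as prior knowledge: it acts as a constraint and, more importantly, as a way of exploiting internal characteristics through interaction.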
A prior is a source of information that aids in the interpretation of the sensor signal by rejecting noise, possibly by projecting the signal into a lower dimensional space.
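The idea of a prior that projects the signal into a lower-dimensional space can be sketched with a prismatic-joint assumption: if a tracked point is assumed to slide along a straight axis, noisy 2D observations reduce to a single joint coordinate. This is a minimal sketch under that assumption, using closed-form 2D PCA; the function names and the synthetic data are hypothetical and not taken from the survey.

```python
import math
import random

def fit_prismatic_axis(points):
    """Closed-form 2D PCA: return the mean and the unit direction of largest variance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle of the major axis
    return (mx, my), (math.cos(theta), math.sin(theta))

def project_onto_axis(points, mean, direction):
    """Reduce each 2D point to one joint coordinate along the fitted axis."""
    return [(x - mean[0]) * direction[0] + (y - mean[1]) * direction[1]
            for x, y in points]

# Synthetic data (assumed): a handle sliding along direction (0.6, 0.8) with tracking noise.
rng = random.Random(0)
points = [(0.6 * q + rng.gauss(0.0, 0.005), 0.8 * q + rng.gauss(0.0, 0.005))
          for q in [0.1 * i for i in range(10)]]

mean, direction = fit_prismatic_axis(points)
joint_positions = project_onto_axis(points, mean, direction)
print(direction, joint_positions)
```

The component of each observation orthogonal to the axis is discarded as noise, which is the sense in which the prior both constrains and denoises the signal.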
B. What priors are employed?
These fall into two groups: priors based on the dynamics model (Priors on the Dynamics) and priors based on observations (Priors on the Observations).
Priors on the Dynamics
a) Given/Specified/Engineered Priors
b) Learned Priors
Learn a dynamics model of the environment given an action.
For example, GPS (Guided Policy Search):
learn the mapping from current state to next best action in a policy search framework
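To make "learning a dynamics model given an action" concrete, here is a minimal sketch that fits a scalar linear model s_next = a * s + b * u to recorded transitions by least squares. It is not GPS and not any method from the survey: the linear form, the name `fit_linear_dynamics`, and the simulated system (a = 0.9, b = 0.5) are assumptions chosen only to keep the example small.

```python
import random

def fit_linear_dynamics(states, actions, next_states):
    """Least-squares fit of s_next = a * s + b * u for scalar state s and action u."""
    sss = sum(s * s for s in states)
    suu = sum(u * u for u in actions)
    ssu = sum(s * u for s, u in zip(states, actions))
    ssy = sum(s * y for s, y in zip(states, next_states))
    suy = sum(u * y for u, y in zip(actions, next_states))
    det = sss * suu - ssu * ssu  # determinant of the 2x2 normal equations
    a = (ssy * suu - suy * ssu) / det
    b = (suy * sss - ssy * ssu) / det
    return a, b

# Collect transitions by applying random actions to a simulated system (assumed values).
rng = random.Random(0)
states, actions, next_states = [], [], []
s = 0.0
for _ in range(50):
    u = rng.uniform(-1.0, 1.0)
    s_next = 0.9 * s + 0.5 * u
    states.append(s)
    actions.append(u)
    next_states.append(s_next)
    s = s_next

a_hat, b_hat = fit_linear_dynamics(states, actions, next_states)
print(a_hat, b_hat)
```

Once fitted, such a model serves as a learned prior: it predicts how the state should respond to an action, so deviations between prediction and observation become informative.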
Priors on the Observations
Regularities can also be encoded in the observation model that relates the state of the system to the raw sensory signals.
In the case where the mapping between state and observation is hand-designed, the state usually refers to some physical quantity.
In the case where the state representation is learned, it is not so easily interpretable.
In practice, this comes down to how the state is extracted from the raw sensory signal.
a) Given/Specified/Engineered Observation Models
One example is a model of multi-view or perspective geometry for camera sensors. Often, approaches also assume access to an object database (OD) that allows them to predict how the objects will be observed through a given sensor.
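A standard instance of an engineered observation model from perspective geometry is the pinhole camera, which maps a 3D point in the camera frame to pixel coordinates. The sketch below assumes an ideal pinhole with no lens distortion; the function name and the intrinsic values are illustrative assumptions.

```python
def project_pinhole(point_cam, fx, fy, cx, cy):
    """Project a 3D point (x, y, z) in the camera frame to pixel coordinates (u, v).

    fx, fy are focal lengths in pixels and cx, cy is the principal point.
    """
    x, y, z = point_cam
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)

# Assumed intrinsics for a 640x480 image; the point is 2 m in front of the camera.
u, v = project_pinhole((0.1, -0.2, 2.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(u, v)
```

Given a hypothesized object state, a model like this predicts where the object should appear in the image, and the prediction can then be compared with the actual observation.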
b) Learned State Representations
This is comparable to a CNN that takes a raw image as input and extracts features from it (learn a suitable, task-specific state representation directly from observations).
C. Does the approach perform action selection?
This is essentially the exploration-exploitation trade-off in RL:
balance between exploration (performing an action to improve perception as much as possible) and exploitation (performing an action that maximizes progress towards the manipulation goal).
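One simple way to express this balance is to score each candidate action by a weighted sum of an exploration term and an exploitation term. The sketch below is only an illustration of that idea: the weighting scheme, the field names `info_gain` and `task_progress`, and the candidate actions are hypothetical, and real systems would estimate these quantities from a belief over the environment.

```python
def select_action(candidates, explore_weight):
    """Return the candidate with the highest weighted sum of the two objectives.

    explore_weight = 1.0 selects purely for perception (exploration);
    explore_weight = 0.0 selects purely for task progress (exploitation).
    """
    if not 0.0 <= explore_weight <= 1.0:
        raise ValueError("explore_weight must be in [0, 1]")

    def score(candidate):
        return (explore_weight * candidate["info_gain"]
                + (1.0 - explore_weight) * candidate["task_progress"])

    return max(candidates, key=score)

# Hypothetical candidates with made-up scores.
candidates = [
    {"name": "poke_handle", "info_gain": 0.8, "task_progress": 0.1},
    {"name": "pull_drawer", "info_gain": 0.2, "task_progress": 0.9},
]

print(select_action(candidates, explore_weight=1.0)["name"])
print(select_action(candidates, explore_weight=0.0)["name"])
```

A fixed weight is the crudest possible choice; lowering `explore_weight` as uncertainty about the environment shrinks would shift behavior from exploration towards exploitation.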
D, E, and F can be understood directly from their titles and are not elaborated here.
D. What is the objective: Perception, Manipulation or Both?
E. Are multiple sensor modalities exploited?
F. How is uncertainty modeled and used?
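At present, most approaches rely on vision-based sensor information. How can more types of sensory information be exploited? How should features be selected? Explore or exploit?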