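After introducing and clarifying the basic concepts and historical development of Interactive Perception (IP), this survey gives a fairly clear set of criteria for distinguishing how IP is applied across different domains.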

Basic Concepts

Forceful Interactions:

Any action that exerts a potentially time-varying force upon the environment is a forceful interaction.

Create Novel Signals (CNS): the sensor signals that come back as a result of the interaction, such as tactile and visual data.

Action Perception Regularity (APR):
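The internal structure and features of these data or signals (the regularity; in my own understanding, this is similar to how a CNN extracts shared high-level features and structure from a collection of images).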

Forceful interactions reveal regularities in the combined space S×A×t of sensor information (S) and action parameters (A) over time (t).

This regularity is constituted by the repeatable, multi-modal sensory data that is created when executing the same action in the same environment.

Knowing this regularity corresponds to understanding the causal relationship between action and sensory response given specific environment properties.

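The figure below gives a brief summary of the various perception approaches. In practice, the boundary between Active Perception (AP) and Interactive Perception has become blurred. The distinction is drawn this way because most AP methods that predate this survey still relied on visual observation alone, so they do not satisfy the forceful-interaction condition.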

Applications and Classification of IP

Applications

Two examples illustrate this briefly: Articulation Model Estimation and Haptic Property Estimation.

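Articulation model estimation: as shown in the figure above, the robot (indicated by the stereo camera and view frustum) tries to estimate the articulation model of two Lego bricks on a table. The amount of information available to the robot differs from case to case. [Left] The robot can only change its viewpoint to gather more information. [Middle] The robot can observe the rich sensory signals produced when a person lifts a Lego brick. [Right] The robot can interact with the scene itself and observe the resulting sensory signals, so a specified interaction gives it more information. Only in the rightmost case can the articulation model be estimated reliably.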
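The contrast between passive observation and interaction in the Lego example can be sketched in a few lines of Python. This is only an illustrative toy, not the method of any paper covered by the survey: the function names, the 2D point tracks, and the tolerance `tol` are all assumptions made for the example, and a constant distance between two tracked points is only a necessary condition for a rigid connection.

```python
import math

def distance_change(track_a, track_b):
    """Spread of the distance between two tracked points over time."""
    dists = [math.dist(a, b) for a, b in zip(track_a, track_b)]
    return max(dists) - min(dists)

def path_length(track):
    """Total distance travelled by one tracked point."""
    return sum(math.dist(p, q) for p, q in zip(track, track[1:]))

def classify_connection(track_a, track_b, tol=1e-3):
    """Return a coarse hypothesis about how two bricks are connected."""
    if path_length(track_a) < tol and path_length(track_b) < tol:
        return "unknown"  # nothing moved, so there is no evidence either way
    if distance_change(track_a, track_b) < tol:
        return "consistent with rigid"
    return "not rigid"

# Passive observation of a static scene: both bricks stay put (hypothetical data).
static_a = [(0.0, 0.0)] * 4
static_b = [(0.1, 0.0)] * 4

# The robot pushes brick A; brick B either follows (attached) or stays (detached).
pushed_a = [(0.00, 0.0), (0.02, 0.0), (0.04, 0.0), (0.06, 0.0)]
follows_b = [(0.10, 0.0), (0.12, 0.0), (0.14, 0.0), (0.16, 0.0)]
stays_b = [(0.10, 0.0)] * 4

print(classify_connection(static_a, static_b))
print(classify_connection(pushed_a, follows_b))
print(classify_connection(pushed_a, stays_b))
```

Without motion the two hypotheses cannot be told apart, which is the point of the rightmost panel: only the interaction creates a signal that separates them.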

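Haptic property estimation: as shown in the figure above, the robot tries to estimate the weight of a sphere. The amount of information available to the robot differs from case to case. [Left] The robot can only change its viewpoint to gather more information. [Middle] The robot can observe the rich sensory signals produced when a person pushes the sphere. [Right] The robot can push the sphere itself and observe the resulting sensory signal, namely where the sphere comes to rest. In this last case, a specified push force gives it more information. Only in the rightmost case can the mass of the sphere be estimated reliably.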
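A worked toy version of the push experiment shows why a known action makes the mass recoverable. The physics here is a deliberate simplification and an assumption of this sketch, not something taken from the survey: the object is treated as a block sliding with a known Coulomb friction coefficient `mu` after an impulsive push, rather than as a rolling sphere. Under that assumption the initial velocity is v0 = J / m and the sliding distance is d = v0² / (2·mu·g), so m = J / sqrt(2·mu·g·d).

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def slide_distance(mass, impulse, mu):
    """Forward model: distance a block slides after an impulsive push."""
    v0 = impulse / mass              # velocity right after the push
    return v0 ** 2 / (2.0 * mu * G)  # constant friction deceleration mu * G

def estimate_mass(impulse, distance, mu):
    """Inverse model: mass from the commanded impulse and the observed rest distance."""
    v0 = math.sqrt(2.0 * mu * G * distance)
    return impulse / v0

# Hypothetical numbers: a 0.5 kg object, a 1.0 N*s push, friction coefficient 0.3.
true_mass = 0.5
observed = slide_distance(true_mass, impulse=1.0, mu=0.3)
estimate = estimate_mass(impulse=1.0, distance=observed, mu=0.3)
print(round(estimate, 6))
```

If the robot only watches someone else push, the impulse is unknown and the same rest distance is compatible with many masses; knowing the action parameter is what pins the estimate down.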

Classification


The classification here is based on six criteria (A-F):

A. How is the signal in S×A×t leveraged?

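Any IP method that uses APR necessarily also uses CNS. As mentioned earlier, APR is expressed more concretely here as prior knowledge: it is a constraint, and also a way of exploiting internal structure through interaction.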

A prior is a source of information that aids in the interpretation of the sensor signal by rejecting noise, possibly by projecting the signal into a lower dimensional space.
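The idea of a prior that rejects noise by projecting into a lower-dimensional space can be illustrated with a very small example. The scenario is an assumption made for illustration: a drawer is assumed to move along a known axis (a prismatic prior), so noisy 2D tracked positions are reduced to a single coordinate along that axis. The function name and the data are hypothetical.

```python
import math

def project_onto_axis(points, axis):
    """Map 2D points to 1D coordinates along a known motion axis."""
    norm = math.hypot(axis[0], axis[1])
    ux, uy = axis[0] / norm, axis[1] / norm
    return [x * ux + y * uy for x, y in points]

# Noisy tracked positions of a drawer assumed to move along the x axis.
# The y components are pure measurement noise under this prior.
noisy = [(0.00, 0.02), (0.10, -0.01), (0.20, 0.03)]
positions = project_onto_axis(noisy, axis=(1.0, 0.0))
print(positions)
```

Everything perpendicular to the assumed axis is discarded, so the prior both reduces the state from two dimensions to one and removes the noise in the discarded direction. If the prior is wrong (the part actually rotates), the projection discards real signal instead.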

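For example, requiring the object to be rigid (RO), or giving the robot specific actions such as push and grasp, reflects how much of the APR (the regularity) is exploited. In the figure above, the leftmost case simply tracks with vision and does nothing else; it needs neither object constraints nor robot actions. The rightmost case has to satisfy many constraints while also interacting in depth through the gripper and making full use of the information returned by every type of sensor. That is the difference in the degree of exploitation.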

B. What priors are employed?
These divide into priors based on the dynamics model (Priors on the Dynamics) and priors based on observations (Priors on the Observations).

Priors on the Dynamics

a)Given/Specified/Engineered Priors

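This is easy to understand. As noted above, examples include requiring the object to be rigid, or having the manipulator perform push actions on a plane.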

b)Learned Priors

A dynamics model is learned from actions ("learn a dynamics model of the environment given an action").

For example, GPS (Guided Policy Search):

learn the mapping from current state to next best action in a policy search framework
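Learning a dynamics model from executed actions, as described above, can be sketched with ordinary least squares. This is not Guided Policy Search and not an implementation from the survey; it is a minimal stand-in that assumes a scalar state and a linear model x_next = a·x + b·u, with hypothetical function names and synthetic, noise-free data.

```python
def fit_linear_dynamics(transitions):
    """Least-squares fit of x_next = a * x + b * u from (x, u, x_next) tuples."""
    sxx = sum(x * x for x, u, y in transitions)
    sxu = sum(x * u for x, u, y in transitions)
    suu = sum(u * u for x, u, y in transitions)
    sxy = sum(x * y for x, u, y in transitions)
    suy = sum(u * y for x, u, y in transitions)
    # Solve the 2x2 normal equations in closed form.
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("data does not excite state and action independently")
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b

def collect_transitions(true_a, true_b, actions, x0=1.0):
    """Roll out a simulated system and record (state, action, next state)."""
    x, data = x0, []
    for u in actions:
        x_next = true_a * x + true_b * u
        data.append((x, u, x_next))
        x = x_next
    return data

# Synthetic environment with a = 0.9 and b = 0.5 (assumed values).
data = collect_transitions(0.9, 0.5, [1.0, -0.5, 0.2, 0.0, 0.7])
a_hat, b_hat = fit_linear_dynamics(data)
print(round(a_hat, 6), round(b_hat, 6))
```

The learned pair (a, b) plays the role of a learned prior on the dynamics: once fitted, it predicts the sensory consequence of a new action without that action having been tried.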

Priors on the Observations

Regularities can also be encoded in the observation model that relates the state of the system to the raw sensory signals.

In the case where the mapping between state and observation is hand-designed, the state usually refers to some physical quantity.

In the case where the state representation is learned, it is not so easily interpretable.

In practice, this comes down to how the state is extracted from the raw sensory signal.

a)Given/Specified/Engineered Observation Models

Features designed by hand from expert priors, which can be understood as model-based; for example, having a database of object models.

One example is models of multi-view or perspective geometry for camera sensors. Often, approaches also assume access to an object database (OD) that allows them to predict how the objects will be observed through a given sensor.

b)Learned State Representations

This is comparable to a CNN that takes the raw image as input and extracts the features directly ("learn a suitable, task-specific state representation directly from observations").

C. Does the approach perform action selection?

This is essentially the same as the trade-off between exploration and exploitation in RL:

Balance between exploration (performing an action to improve perception as much as possible) and exploitation (performing an action that maximizes progress towards the manipulation goal).
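One simple way to picture this balance is a weighted score over candidate actions. The sketch below is an assumption-laden illustration, not a scheme prescribed by the survey: the action names, the per-action numbers for expected information gain and task progress, and the single `explore_weight` parameter are all hypothetical.

```python
def select_action(candidates, explore_weight):
    """Pick the action with the best weighted mix of the two objectives.

    candidates maps an action name to (expected_info_gain, expected_task_progress).
    explore_weight in [0, 1]: 1.0 is pure exploration, 0.0 is pure exploitation.
    """
    def score(name):
        info_gain, task_progress = candidates[name]
        return explore_weight * info_gain + (1.0 - explore_weight) * task_progress
    return max(candidates, key=score)

# Hypothetical candidate actions with made-up scores.
candidates = {
    "poke_object": (0.9, 0.1),   # informative, little progress on the task
    "grasp_object": (0.2, 0.9),  # less informative, large progress on the task
    "wait": (0.0, 0.0),
}

print(select_action(candidates, explore_weight=0.9))
print(select_action(candidates, explore_weight=0.1))
```

With a high exploration weight the informative poke wins; with a low one the goal-directed grasp wins. A practical system would typically derive both terms from its current belief rather than fix them by hand.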

D, E, and F can be read literally and are not elaborated here.

D. What is the objective: Perception, Manipulation or Both?

E. Are multiple sensor modalities exploited?

F. How is uncertainty modeled and used?
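As one possible reading of criterion F, uncertainty can be kept as a probability distribution over hypotheses and updated after each interaction. The example below is a generic discrete Bayes update, offered as a sketch rather than as the approach of any surveyed work; the three joint-type hypotheses and the likelihood values are assumptions chosen for illustration.

```python
import math

def bayes_update(belief, likelihood):
    """Posterior over hypotheses given the likelihood of one observation."""
    unnormalized = {h: belief[h] * likelihood[h] for h in belief}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

def entropy(belief):
    """Shannon entropy in nats; lower means less uncertainty."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0.0)

# Uniform prior over the joint type connecting two parts.
prior = {"rigid": 1 / 3, "prismatic": 1 / 3, "revolute": 1 / 3}

# Assumed likelihood of the observed motion after a push under each hypothesis.
likelihood = {"rigid": 0.05, "prismatic": 0.80, "revolute": 0.15}

posterior = bayes_update(prior, likelihood)
print(max(posterior, key=posterior.get))
print(entropy(prior) > entropy(posterior))
```

The drop in entropy is one way to quantify how much a given interaction reduced uncertainty, and it can serve as the information-gain term when actions are selected.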

Reflections and Challenges

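At present, most approaches rely on vision-based sensor information. How can more types of sensory information be used? How should features be selected? Explore or exploit?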