Robots need to be aware of human presence and actions, both for safety and to engage in collaborative actions. We leverage the low latency and high temporal resolution of event-cameras to detect and track human beings and infer their actions online, via high-frequency human pose estimation, gaze tracking, blink detection, and face pose estimation.
Fast motions associated with the movements of the subject (and of the robot observing the subject) are challenging for mainstream RGB sensors, as tracking is limited by the sensor output rate itself (typically around 50-60 Hz). In this context, event-cameras offer a potential solution for tracking fast-moving objects, and human poses in particular, thanks to their high temporal resolution, which provides information in the "blind" time between the frames of a standard camera. We aim at continuous, low-latency tracking of moving targets, to overcome the lack of motion information between inherently low-frequency detections, which ultimately limits the speed that can be effectively tracked. We apply this concept to tracking humans by combining low-frequency joint detections from human pose estimation networks with high-speed local joint tracking. This approach yields more accurate real-time estimation, which is crucial for smooth and safe human-robot interaction and for accurate monitoring of motion patterns in rehabilitation therapy assessment.
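To make the detection-plus-tracking fusion concrete, the following Python sketch illustrates one way such a scheme could be structured: a pose network supplies joint positions at a low rate, and in between detections each joint is nudged toward the centroid of nearby events. This is a minimal illustration under assumed update rates and a toy mean-shift-style local tracker; the names (event_stream, detector, local_shift) and all parameter values are hypothetical placeholders, not the actual implementation.

    import numpy as np

    DET_PERIOD = 0.1      # pose network rate ~10 Hz (assumed value)
    TRACK_PERIOD = 0.001  # local event-driven update rate ~1 kHz (assumed value)
    ROI = 10.0            # half-size (pixels) of the local window around each joint

    def local_shift(joint_xy, ev_xy):
        """Shift one joint toward the centroid of nearby events (mean-shift-like step)."""
        d = ev_xy - joint_xy
        near = np.all(np.abs(d) < ROI, axis=1)   # events inside the joint's window
        if not near.any():
            return joint_xy                      # no local activity: keep position
        return joint_xy + d[near].mean(axis=0)   # move toward the local event mass

    def track(event_stream, detector, t_end):
        """Fuse low-rate detections with high-rate event updates.

        event_stream(t0, t1) -> (N, 2) array of event coordinates in [t0, t1)
        detector(t)          -> (J, 2) array of joint positions at time t
        """
        t = 0.0
        joints = detector(t)                     # initial full-body detection
        next_det = DET_PERIOD
        while t < t_end:
            ev = event_stream(t, t + TRACK_PERIOD)
            joints = np.array([local_shift(j, ev) for j in joints])
            t += TRACK_PERIOD
            if t >= next_det:                    # low-rate correction from the network
                joints = 0.5 * joints + 0.5 * detector(t)
                next_det += DET_PERIOD
            yield t, joints

The fixed 0.5/0.5 blend stands in for whatever correction rule is actually used; the key structural point is that event-driven updates fill the gap between detections, so the estimate never goes stale while waiting for the next network output.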
Methods: event-driven machine learning implemented in spiking neural networks (SNNs); event-driven motion estimation and tracking.