Visibility Aware In-Hand Object Pose Tracking in Videos With Transformers

Phan Xuan Tan, Dinh Cuong Hoang, Eiji Kamioka, Anh Nhat Nguyen, Duc Thanh Tran, Van Hiep Duong, Anh Truong Mai, Duc Long Pham, Khanh Toan Phan, Xuan Tung Dinh, Tran Thi Thuy Trang, Xuan Duong Pham, Nhat Linh Nguyen, Thu Uyen Nguyen, Viet Anh Trinh, Khanh Duong Tran, Son Anh Bui

Research output: Contribution to journal › Article › peer-review

Abstract

In-hand object pose estimation is essential in engineering applications such as quality inspection, reverse engineering, and automated manufacturing. However, accurate pose estimation becomes difficult when the object is heavily occluded by the hand or blurred by motion. To address these challenges, we propose a novel framework that uses transformers for spatial-temporal reasoning over video sequences. The transformers capture both spatial relationships within each frame and temporal dependencies across consecutive frames, so the model can aggregate information over time to refine its pose predictions. A key component of the framework is a visibility-aware module that dynamically adjusts pose estimates according to how much of the object is visible. Using the temporally-aware features extracted by the transformers, this module combines pose information from multiple frames, which lets the model maintain high accuracy even when parts of the object are hidden in some frames. This capability is particularly important in dynamic scenes, where the object's appearance can change rapidly due to hand movements or interactions with other objects. Extensive experiments show that our method significantly outperforms state-of-the-art techniques, with a 6% improvement in overall accuracy and more than 11% better performance under occlusion.
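The listing does not describe the architecture in detail, so the following is only a minimal, hypothetical sketch of the general idea of visibility-aware temporal aggregation, not the authors' method. It assumes that per-frame object features, per-frame pose estimates, and a visibility score in [0, 1] for each frame are already available. The function names `visibility_biased_temporal_attention` and `fuse_poses` are illustrative inventions; the attention uses identity query/key/value projections instead of learned weights, quaternions are assumed to be ordered (w, x, y, z), and rotations are averaged with the standard eigenvector method (Markley et al.), which is a swapped-in technique rather than anything stated in the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def visibility_biased_temporal_attention(feats, visibility, eps=1e-6):
    """Hypothetical temporal self-attention over T frames.

    feats: (T, D) per-frame object features (assumed given).
    visibility: (T,) visibility scores in [0, 1] (assumed given).
    Adding log(visibility) to the logits of each key frame means that
    heavily occluded frames receive almost no attention weight.
    Identity Q/K/V projections stand in for learned ones.
    """
    T, D = feats.shape
    logits = feats @ feats.T / np.sqrt(D)               # (T, T) scaled dot products
    logits = logits + np.log(visibility + eps)[None, :]  # bias each key by its visibility
    attn = softmax(logits, axis=-1)                     # each row sums to 1
    return attn @ feats, attn


def fuse_poses(quats, trans, weights):
    """Weighted fusion of per-frame poses (illustrative only).

    quats: (T, 4) unit quaternions (w, x, y, z); q and -q are the same rotation.
    trans: (T, 3) translations.
    weights: (T,) non-negative weights, e.g. visibility or attention scores.
    Rotation: principal eigenvector of sum_i w_i q_i q_i^T, which is
    insensitive to the quaternion sign ambiguity.
    """
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    w = weights / total
    M = (quats * w[:, None]).T @ quats   # (4, 4) weighted outer-product sum
    _, eigvecs = np.linalg.eigh(M)       # eigenvalues returned in ascending order
    q = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
    if q[0] < 0:
        q = -q                           # canonical sign: non-negative scalar part
    t = (trans * w[:, None]).sum(axis=0) # weighted mean translation
    return q, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 8))
    vis = np.array([1.0, 0.9, 0.05, 1.0])   # frame 2 mostly occluded
    quats = np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))
    trans = rng.normal(size=(4, 3))
    _, attn = visibility_biased_temporal_attention(feats, vis)
    # Fuse poses for the latest frame using its attention distribution.
    q, t = fuse_poses(quats, trans, attn[-1])
    print(q, t)
```

Biasing the attention logits with the log of visibility is one simple way to make occluded frames contribute less; a learned visibility head or a gating term would be equally plausible alternatives.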

Original language: English
Pages (from-to): 35733-35749
Number of pages: 17
Journal: IEEE Access
Volume: 13
Publication status: Published 2025

Keywords

  • pose estimation
  • deep learning
  • intelligent systems
  • machine vision
  • robot vision systems
  • supervised learning

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
