BioVL-QR: Egocentric Biochemical
Vision-and-Language Dataset
Using Micro QR Codes

Tomohiro Nishimoto, Taichi Nishimura, Koki Yamamoto, Keisuke Shirai, Hirotaka Kameko,
Yuto Haneji, Tomoya Yoshida, Keiya Kajimura, Taiyu Cui, Chihiro Nishiwaki,
Eriko Daikoku, Natsuko Okuda, Fumihito Ono, Shinsuke Mori
Kyoto University, Osaka Medical and Pharmaceutical University

Overview of BioVL-QR

BioVL-QR is a biochemical vision-and-language dataset consisting of
23 egocentric experiment videos, the corresponding protocols, and vision-and-language alignments.
We label the objects appearing in each video with Micro QR Codes.

Abstract

This paper introduces BioVL-QR, a biochemical vision-and-language dataset comprising 23 egocentric experiment videos, the corresponding protocols, and vision-and-language alignments. A major challenge in understanding biochemical videos is detecting equipment, reagents, and containers, because the environment is cluttered and many objects are visually indistinguishable. Previous studies assumed manual object annotation, which is costly and time-consuming. To address this issue, we focus on Micro QR Codes. However, detecting objects using only Micro QR Codes remains difficult due to blur and occlusion caused by object manipulation. To overcome this, we propose an object labeling method that combines a Micro QR Code detector with an off-the-shelf hand object detector. As an application of the method and BioVL-QR, we tackle the task of localizing procedural steps in an instructional video. The experimental results show that using Micro QR Codes and our method improves biochemical video understanding.
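The combination described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the box format, the IoU threshold, and the matching rule (assign each hand-held object the label of the best-overlapping decoded Micro QR Code) are all assumptions made for clarity.

```python
# Hypothetical sketch: fuse Micro QR Code detections with boxes from an
# off-the-shelf hand object detector. Box format (x1, y1, x2, y2) and the
# IoU threshold are illustrative assumptions, not the paper's parameters.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def label_held_objects(qr_detections, held_boxes, thresh=0.3):
    """Assign each hand-held object box the label of the best-overlapping
    Micro QR Code detection; boxes with no match stay unlabeled (None).
    qr_detections: list of ((x1, y1, x2, y2), label) from a QR decoder.
    held_boxes:    list of (x1, y1, x2, y2) from a hand object detector."""
    labels = []
    for obj in held_boxes:
        best, best_iou = None, thresh
        for box, label in qr_detections:
            score = iou(obj, box)
            if score > best_iou:
                best, best_iou = label, score
        labels.append(best)
    return labels

# Example: one decoded code ("pipette") overlaps the first held object;
# the second held object has no decoded code (e.g. blurred or occluded).
qr = [((10, 10, 30, 30), "pipette")]
held = [(5, 5, 40, 40), (100, 100, 150, 150)]
print(label_held_objects(qr, held))  # -> ['pipette', None]
```

In practice the QR-side label could also be propagated over time (e.g. keep the last decoded label for a tracked object), which is one way such a fusion can cope with frames where the code itself is blurred or occluded.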

Sample Video