Shuguang Dou, Xinyang Jiang, Lu Liu, Lu Ying, Caihua Shan, Yifei Shen, Xuanyi Dong, Yun Wang, Dongsheng Li, Cairong Zhao
Conventional image recognition operates on raster graphics, which can suffer from aliasing and information loss when scaled up or down. In this paper, we propose a novel approach that leverages the benefits of vector graphics for object localization and classification. Our method, called YOLaT (You Only Look at Text), takes the textual document of a vector graphic as input rather than rendering it into pixels. YOLaT builds multi-graphs to model the structural and spatial information in vector graphics and uses a dual-stream graph neural network (GNN) to detect objects from the graph...
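To make the idea of building a graph directly from the textual form of a vector graphic concrete, here is a minimal sketch. It is not the authors' construction, which this abstract does not detail. It assumes a toy SVG containing only `<line>` elements, treats endpoints as nodes, and keeps two edge sets over the same nodes: stroke edges (structural) and proximity edges (spatial). The function name `build_multigraph` and the `radius` parameter are hypothetical.

```python
import math
import xml.etree.ElementTree as ET

# Toy input (assumption): an SVG that uses only straight <line> elements.
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <line x1="0" y1="0" x2="10" y2="0"/>
  <line x1="10" y1="0" x2="10" y2="10"/>
  <line x1="50" y1="50" x2="60" y2="50"/>
</svg>"""


def build_multigraph(svg_text, radius=15.0):
    """Return (nodes, stroke_edges, proximity_edges) for a toy SVG.

    nodes: list of unique (x, y) endpoints, indexed by node id.
    stroke_edges: pairs of node ids joined by a drawn line (structural).
    proximity_edges: pairs of node ids within `radius` (spatial).
    Illustrative assumption only, not the paper's definition.
    """
    ns = "{http://www.w3.org/2000/svg}"
    root = ET.fromstring(svg_text)

    nodes = []
    index = {}  # maps a point to its node id
    stroke_edges = set()

    # Read the geometry straight from the text; nothing is rasterized.
    for line in root.iter(ns + "line"):
        ends = []
        for kx, ky in (("x1", "y1"), ("x2", "y2")):
            point = (float(line.get(kx)), float(line.get(ky)))
            if point not in index:
                index[point] = len(nodes)
                nodes.append(point)
            ends.append(index[point])
        stroke_edges.add(tuple(sorted(ends)))

    # Second edge type over the same nodes: spatial closeness.
    proximity_edges = set()
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if math.dist(nodes[i], nodes[j]) <= radius:
                proximity_edges.add((i, j))

    return nodes, stroke_edges, proximity_edges


nodes, stroke_edges, proximity_edges = build_multigraph(SVG)
print(len(nodes), sorted(stroke_edges), sorted(proximity_edges))
```

In this sketch the two edge sets could feed separate message-passing streams, which is one plausible reading of "dual-stream"; the abstract itself does not specify how the streams are defined.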
April 26, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence