Most recent papers in the journal IEEE Transactions on Pattern Analysis and Machine Intelligence

#1

JOURNAL ARTICLE

OoD-Control: Generalizing Control in Unseen Environments.

Nanyang Ye, Zhaoyu Zeng, Jundong Zhou, Lin Zhu, Yuxiao Duan, Yifei Wu, Junqi Wu, Haoqi Zeng, Qinying Gu, Xinbing Wang, Chenghu Zhou

Generalizing out-of-distribution (OoD) is critical but challenging in real applications such as unmanned aerial vehicle (UAV) flight control. Previous machine learning-based control has shown promise in dealing with complex real-world environments but suffers huge performance degradation when facing OoD scenarios, posing risks to the stability and safety of UAVs. In this paper, we found that random noise introduced during training surprisingly yields theoretically guaranteed performance via a proposed functional optimization framework...

38687660

April 30, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#2

JOURNAL ARTICLE

Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities.

Guihong Li, Duc Hoang, Kartikeya Bhardwaj, Ming Lin, Zhangyang Wang, Radu Marculescu

Recently, zero-shot (or training-free) Neural Architecture Search (NAS) approaches have been proposed to liberate NAS from the expensive training process. The key idea behind zero-shot NAS approaches is to design proxies that can predict the accuracy of some given networks without training the network parameters. The proxies proposed so far are usually inspired by recent progress in theoretical understanding of deep learning and have shown great potential on several datasets and NAS benchmarks. This paper aims to comprehensively review and compare the state-of-the-art (SOTA) zero-shot NAS approaches, with an emphasis on their hardware awareness...

38687659

April 30, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#3

JOURNAL ARTICLE

A Review of State-of-the-Art Mixed-Precision Neural Network Frameworks.

Mariam Rakka, Mohammed E Fouda, Pramod Khargonekar, Fadi Kurdahi

Mixed-precision Deep Neural Networks (DNNs) provide an efficient solution for hardware deployment, especially under resource constraints, while maintaining model accuracy. Identifying the ideal bit precision for each layer, however, remains a challenge given the vast array of models, datasets, and quantization schemes, leading to an expansive search space. Recent literature has addressed this challenge, resulting in several promising frameworks. This paper offers a comprehensive overview of the standard quantization classifications prevalent in existing studies...

38683716

April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#4

JOURNAL ARTICLE

Uncertainty-boosted Robust Video Activity Anticipation.

Zhaobo Qi, Shuhui Wang, Weigang Zhang, Qingming Huang

Video activity anticipation aims to predict what will happen in the future, embracing a broad application prospect ranging from robot vision to autonomous driving. Despite the recent progress, the data uncertainty issue, reflected as the content evolution process and dynamic correlation in event labels, has been somehow ignored. This reduces the model generalization ability and deep understanding of video content, leading to serious error accumulation and degraded performance. In this paper, we address the uncertainty learning problem and propose an uncertainty-boosted robust video activity anticipation framework, which generates uncertainty values to indicate the credibility of the anticipation results...

38683715

April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#5

JOURNAL ARTICLE

Learning to Holistically Detect Bridges From Large-Size VHR Remote Sensing Imagery.

Yansheng Li, Junwei Luo, Yongjun Zhang, Yihua Tan, Jin-Gang Yu, Song Bai

Bridge detection in remote sensing images (RSIs) plays a crucial role in various applications, but it poses unique challenges compared to the detection of other objects. In RSIs, bridges exhibit considerable variations in terms of their spatial scales and aspect ratios. Therefore, to ensure the visibility and integrity of bridges, it is essential to perform holistic bridge detection in large-size very-high-resolution (VHR) RSIs. However, the lack of datasets with large-size VHR RSIs limits the deep learning algorithms' performance on bridge detection...

38683714

April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#6

JOURNAL ARTICLE

Consistency-Aware Anchor Pyramid Network for Crowd Localization.

Xinyan Liu, Guorong Li, Yuankai Qi, Zhenjun Han, Anton van den Hengel, Nicu Sebe, Ming-Hsuan Yang, Qingming Huang

Crowd localization aims to predict the positions of humans in images of crowded scenes. While existing methods have made significant progress, two primary challenges remain: (i) a fixed number of evenly distributed anchors can cause excessive or insufficient predictions across regions in an image with varying crowd densities, and (ii) ranking inconsistency of predictions between the testing and training phases leads to the model being sub-optimal in inference. To address these issues, we propose a Consistency-Aware Anchor Pyramid Network (CAAPN) comprising two key components: an Adaptive Anchor Generator (AAG) and a Localizer with Augmented Matching (LAM)...

38683713

April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#7

JOURNAL ARTICLE

Searching to Exploit Memorization Effect in Deep Learning with Noisy Labels.

Hansi Yang, Quanming Yao, Bo Han, James T Kwok

Sample selection approaches are popular in robust learning from noisy labels. However, how to control the selection process properly so that deep networks can benefit from the memorization effect is a hard problem. In this paper, motivated by the success of automated machine learning (AutoML), we propose to control the selection process by bi-level optimization. Specifically, we parameterize the selection process by exploiting the general patterns of the memorization effect in the upper-level, and then update these parameters using predicting accuracy obtained from model training in the lower-level...

38683712

April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#8

JOURNAL ARTICLE

A Versatile Framework for Multi-Scene Person Re-Identification.

Wei-Shi Zheng, Junkai Yan, Yi-Xing Peng

Person Re-identification (ReID) has been extensively developed for a decade in order to learn the association of images of the same person across non-overlapping camera views. To overcome significant variations between images across camera views, mountains of variants of ReID models were developed for solving a number of challenges, such as resolution change, clothing change, occlusion, modality change, and so on. Despite the impressive performance of many ReID variants, these variants typically function distinctly and cannot be applied to other challenges...

38683711

April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#9

JOURNAL ARTICLE

Hierarchical Recognizing Vector Graphics and A New Chart-based Vector Graphics Dataset.

Shuguang Dou, Xinyang Jiang, Lu Liu, Lu Ying, Caihua Shan, Yifei Shen, Xuanyi Dong, Yun Wang, Dongsheng Li, Cairong Zhao

The conventional approach to image recognition has been based on raster graphics, which can suffer from aliasing and information loss when scaled up or down. In this paper, we propose a novel approach that leverages the benefits of vector graphics for object localization and classification. Our method, called YOLaT (You Only Look at Text), takes the textual document of vector graphics as input, rather than rendering it into pixels. YOLaT builds multi-graphs to model the structural and spatial information in vector graphics and utilizes a dual-stream graph neural network (GNN) to detect objects from the graph...

38669166

April 26, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#10

JOURNAL ARTICLE

Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators.

Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Qing Liu, Sohrab Amirghodsi, Yuqian Zhou, Jiebo Luo

Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. Such a limitation is partially due to the lack of semantic-level constraints inside the hole region as well as the lack of a mechanism to enforce realistic object generation. In this work, we propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects...

38669165

April 26, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#11

JOURNAL ARTICLE

Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment.

Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua, Shuicheng Yan

While pre-training large-scale video-language models (VLMs) has shown remarkable potential for various downstream video-language tasks, existing VLMs can still suffer from certain commonly seen limitations, e.g., coarse-grained cross-modal aligning, under-modeling of temporal dynamics, and a detached video-language view. In this work, we target enhancing VLMs with a fine-grained structural spatio-temporal alignment learning method (namely Finsta). First of all, we represent the input texts and videos with fine-grained scene graph (SG) structures, both of which are further unified into a holistic SG (HSG) for bridging two modalities...

38662568

April 25, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#12

JOURNAL ARTICLE

Appearance-based Gaze Estimation with Deep Learning: A Review and Benchmark.

Yihua Cheng, Haofei Wang, Yiwei Bao, Feng Lu

Human gaze provides valuable information on human focus and intentions, making it a crucial area of research. Recently, deep learning has revolutionized appearance-based gaze estimation. However, due to the unique features of gaze estimation research, such as the unfair comparison between 2D gaze positions and 3D gaze vectors and the different pre-processing and post-processing methods, there is a lack of a definitive guideline for developing deep learning-based gaze estimation algorithms. In this paper, we present a systematic review of the appearance-based gaze estimation methods using deep learning...

38662567

April 25, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#13

JOURNAL ARTICLE

Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling.

Qi Zhang, Shanshe Wang, Xinfeng Zhang, Chuanmin Jia, Zhao Wang, Siwei Ma, Wen Gao

Video Coding for Machines (VCM) aims to compress visual signals for machine analysis. However, existing methods only consider a few machines, neglecting the majority. Moreover, the machine's perceptual characteristics are not leveraged effectively, resulting in suboptimal compression efficiency. To overcome these limitations, this paper introduces Satisfied Machine Ratio (SMR), a metric that statistically evaluates the perceptual quality of compressed images and videos for machines by aggregating satisfaction scores from them...

38662566

April 25, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#14

JOURNAL ARTICLE

Latency-aware Unified Dynamic Networks for Efficient Image Recognition.

Yizeng Han, Zeyu Liu, Zhihang Yuan, Yifan Pu, Chaofei Wang, Shiji Song, Gao Huang

Dynamic computation has emerged as a promising strategy to improve the inference efficiency of deep networks. It allows selective activation of various computing units, such as layers or convolution channels, or adaptive allocation of computation to highly informative spatial regions in image features, thus significantly reducing unnecessary computations conditioned on each input sample. However, the practical efficiency of dynamic models does not always correspond to theoretical outcomes. This discrepancy stems from three key challenges: 1) The absence of a unified formulation for various dynamic inference paradigms, owing to the fragmented research landscape; 2) The undue emphasis on algorithm design while neglecting scheduling strategies, which are critical for optimizing computational performance and resource utilization in CUDA-enabled GPU settings; and 3) The cumbersome process of evaluating practical latency, as most existing libraries are tailored for static operators...

38662565

April 25, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#15

JOURNAL ARTICLE

What Makes Deviant Places?

Jin-Hwi Park, Young-Jae Park, Ilyung Cheong, Junoh Lee, Young Eun Huh, Hae-Gon Jeon

Urban safety plays an essential role in the quality of citizens' lives and in the sustainable development of cities. In recent years, researchers have attempted to apply machine learning techniques to identify the role of location-specific attributes in the development of urban safety. However, existing studies have mainly relied on limited images (e.g., map images, single- or four-directional images) of areas based on a relatively large geographical unit and have narrowly focused on severe crime rates, which limits their predictive performance and implications for urban safety...

38656859

April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#16

JOURNAL ARTICLE

Novel Uncertainty Quantification through Perturbation-Assisted Sample Synthesis.

Yifei Liu, Rex Shen, Xiaotong Shen

This paper introduces a novel Perturbation-Assisted Inference (PAI) framework utilizing synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method. The framework focuses on uncertainty quantification in complex data scenarios, particularly involving unstructured data while utilizing deep learning models. On one hand, PASS employs a generative model to create synthetic data that closely mirrors raw data while preserving its rank properties through data perturbation, thereby enhancing data diversity and bolstering privacy...

38656858

April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#17

JOURNAL ARTICLE

Learning Graph Attentions via Replicator Dynamics.

Bo Jiang, Ziyan Zhang, Sheng Ge, Beibei Wang, Xiao Wang, Jin Tang

Graph Attention (GA), which aims to learn the attention coefficients for graph edges, has achieved impressive performance in GNNs on many graph learning tasks. However, existing GAs are usually learned based on edges' (or connected nodes') features, which fail to fully capture the rich structural information of edges. Some recent research attempts to incorporate the structural information into GA learning, but how to fully exploit it in GA learning is still a challenging problem. To address this challenge, in this work, we propose to leverage a new Replicator Dynamics model for graph attention learning, termed Graph Replicator Attention (GRA)...

38656857

April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#18

JOURNAL ARTICLE

A Survey on Efficient Vision Transformers: Algorithms, Techniques, and Performance Benchmarking.

Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou

Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism, outperforming earlier convolutional neural networks. However, ViT deployment and performance have grown steadily with their size, number of trainable parameters, and operations. Furthermore, self-attention's computational and memory cost quadratically increases with the image resolution...

38656856

April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#19

JOURNAL ARTICLE

NeuralRecon: Real-Time Coherent 3D Scene Reconstruction from Monocular Video.

Xi Chen, Jiaming Sun, Yiming Xie, Hujun Bao, Xiaowei Zhou

We present a novel framework named NeuralRecon for real-time 3D scene reconstruction from a monocular video. Unlike previous methods that estimate single-view depth maps separately on each key-frame and fuse them later, we propose to directly reconstruct local surfaces represented as sparse TSDF volumes for each video fragment sequentially by a neural network. A learning-based TSDF fusion module based on gated recurrent units is used to guide the network to fuse features from previous fragments. This design allows the network to capture local smoothness prior and global shape prior of 3D surfaces when sequentially reconstructing the surfaces, resulting in accurate, coherent, and real-time surface reconstruction...

38656855

April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#20

JOURNAL ARTICLE

Cross-Modal Hashing Method with Properties of Hamming Space: A New Perspective.

Zhikai Hu, Yiu-Ming Cheung, Mengke Li, Weichao Lan

Cross-modal hashing (CMH) has attracted considerable attention in recent years. Almost all existing CMH methods primarily focus on reducing the modality gap and semantic gap, i.e., aligning multi-modal features and their semantics in Hamming space, without taking into account the space gap, i.e., difference between the real number space and the Hamming space. In fact, the space gap can affect the performance of CMH methods. In this paper, we analyze and demonstrate how the space gap affects the existing CMH methods, which therefore raises two problems: solution space compression and loss function oscillation...

38652619

April 23, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
