Papers in the journal IEEE Transactions on Pattern Analysis and Machine Intelligence (Page 2)

#21

JOURNAL ARTICLE

Tianfei Zhou, Wenguan Wang

Deep learning based semantic segmentation solutions have yielded compelling results over the preceding decade. They encompass diverse network architectures (FCN based or attention based), along with various mask decoding schemes (parametric softmax based or pixel-query based). Despite the divergence, they can be grouped within a unified framework by interpreting the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, we reveal inherent limitations within the parametric segmentation regime, and accordingly develop a nonparametric alternative based on non-learnable prototypes...

38598386

April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
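In the prototype view described in this abstract, a softmax classifier scores a pixel embedding by its dot product with one learnable weight vector per class, so each weight vector acts as a class prototype. A nonparametric alternative can be sketched in NumPy. For illustration only, this sketch assumes each prototype is the normalized mean embedding of its class and that each pixel takes the label of its most cosine-similar prototype. The function names and the averaging rule are hypothetical and do not describe the authors' implementation.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Hypothetical non-learnable prototypes: normalized mean embedding per class.

    features: (N, D) pixel embeddings; labels: (N,) ints in [0, num_classes).
    Assumes every class has at least one labeled pixel.
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    protos = np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def nearest_prototype(features, protos):
    """Label each embedding with the index of its most cosine-similar prototype."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    return np.argmax(feats @ protos.T, axis=1)  # (N, C) similarities -> (N,) labels

# Toy 2-D embeddings: class 0 points roughly along x, class 1 roughly along y.
train = np.array([[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]])
train_labels = np.array([0, 0, 1, 1])
protos = class_prototypes(train, train_labels, num_classes=2)
pred = nearest_prototype(np.array([[0.9, 0.1], [0.2, 0.8]]), protos)
```

Because the prototypes are computed from data rather than trained, nothing in `nearest_prototype` has learnable parameters.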

#22

JOURNAL ARTICLE

A Modular Neural Motion Retargeting System Decoupling Skeleton and Shape Perception.

Jiaxu Zhang, Zhigang Tu, Junwu Weng, Junsong Yuan, Bo Du

Mapping motion between characters with different structures that correspond to homeomorphic graphs, while preserving motion semantics and perceiving shape geometries, poses significant challenges in skinned motion retargeting. We propose M-R2ET, a modular neural motion retargeting system that comprehensively addresses these challenges. The key insight driving M-R2ET is its capacity to learn residual motion modifications within a canonical skeleton space. Specifically, a cross-structure alignment module is designed to learn joint correspondences among diverse skeletons, enabling motion copy and forming a reliable initial motion for semantics and geometry perception...

38598385

April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#23

JOURNAL ARTICLE

STMixer: A One-Stage Sparse Action Detector.

Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang

Traditional video action detectors typically adopt a two-stage pipeline, in which a person detector first generates actor boxes and 3D RoIAlign then extracts actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and because feature sampling is constrained to the inside of the box, richer context information outside it is not effectively leveraged. Recently, a few query-based action detectors have been proposed to predict action instances in an end-to-end manner...

38598384

April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#24

JOURNAL ARTICLE

Representing Noisy Image Without Denoising.

Shuren Qi, Yushu Zhang, Chao Wang, Tao Xiang, Xiaochun Cao, Yong Xiang

A long-standing topic in artificial intelligence is the effective recognition of patterns from noisy images. In this regard, the recent data-driven paradigm either 1) improves representation robustness by adding noisy samples in the training phase (i.e., data augmentation) or 2) pre-processes the noisy image by learning to solve the inverse problem (i.e., image denoising). However, such methods generally suffer from inefficient processing and unstable results, limiting their practical applications. In this paper, we explore a non-learning paradigm that aims to derive robust representations directly from noisy images, without denoising as a pre-processing step...

38598383

April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#25

JOURNAL ARTICLE

Learning Local and Global Temporal Contexts for Video Semantic Segmentation.

Guolei Sun, Yun Liu, Henghui Ding, Min Wu, Luc Van Gool

Contextual information plays a core role in video semantic segmentation (VSS). This paper divides contexts for VSS into two types: local temporal contexts (LTC), which are drawn from neighboring frames, and global temporal contexts (GTC), which are drawn from the whole video. LTC includes static and motional contexts, corresponding to static and moving content in neighboring frames, respectively. Static and motional contexts have each been studied before; however, no prior work has learned these highly complementary contexts simultaneously...

38598382

April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#26

JOURNAL ARTICLE

Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects.

Kexin Zhang, Qingsong Wen, Chaoli Zhang, Rongyao Cai, Ming Jin, Yong Liu, James Y Zhang, Yuxuan Liang, Guansong Pang, Dongjin Song, Shirui Pan

Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. The most prominent advantage of SSL is that it reduces the dependence on labeled data. With a pre-training and fine-tuning strategy, high performance can be achieved even with a small amount of labeled data. Although many surveys of self-supervised learning have been published for computer vision and natural language processing, a comprehensive survey of SSL for time series is still missing. To fill this gap, we review current state-of-the-art SSL methods for time series data in this article...

38598381

April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#27

JOURNAL ARTICLE

PPDM++: Parallel Point Detection and Matching for Fast and Accurate HOI Detection.

Yue Liao, Si Liu, Yulu Gao, Aixi Zhang, Zhimin Li, Fei Wang, Bo Li

Human-Object Interaction (HOI) detection aims to understand human activities by detecting interaction triplets. Previous HOI detection methods adopt a two-stage instance-driven paradigm. Unfortunately, the many non-interactive human-object pairs generated by the first stage are the main obstacle preventing HOI detectors from achieving high efficiency and strong performance. To remedy this, we propose a novel top-down interaction-driven paradigm that detects interactions first and then bridges interactive human-object pairs through these interactions...

38598380

April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#28

JOURNAL ARTICLE

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments.

Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang

Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments. It becomes increasingly crucial in the field of embodied AI, with potential applications in autonomous navigation, search and rescue, and human-robot interaction. In this paper, we propose to address a more practical yet challenging counterpart setting - vision-language navigation in continuous environments (VLN-CE). To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments...

38593013

April 9, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#29

JOURNAL ARTICLE

Multiview Tensor Spectral Clustering via Co-regularization.

Hongmin Cai, Yu Wang, Fei Qi, Zhuoyao Wang, Yiu-Ming Cheung

Graph-based multi-view clustering encodes multi-view data into sample affinities to find a consensus representation, effectively overcoming heterogeneity across different views. However, traditional affinity measures tend to collapse as the feature dimension grows, making it difficult to estimate a unified alignment that reveals both cross-view and inner relationships. To tackle this challenge, we propose to achieve multi-view uniform clustering via consensus representation co-regularization. First, the sample affinities are encoded by both the popular dyadic affinity and recent high-order affinities to comprehensively characterize the spatial distributions of the high-dimension, low-sample-size (HDLSS) data...

38593012

April 9, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#30

JOURNAL ARTICLE

"Seeing" ENF From Neuromorphic Events: Modeling and Robust Estimation.

Lexuan Xu, Guang Hua, Haijian Zhang, Lei Yu

Most artificial lights exhibit subtle fluctuations in intensity and frequency in response to the influence of the grid's alternating current, providing the potential to estimate the Electric Network Frequency (ENF) from conventional frame-based videos. Nevertheless, the performance of Video-based ENF (V-ENF) estimation largely relies on the imaging quality and thus may suffer from significant interference caused by non-ideal sampling, scene diversity, motion interference, and extreme lighting conditions. In this paper, we show that the ENF can be extracted without the above limitations from a new modality provided by the so-called event camera, a neuromorphic sensor that encodes the light intensity variations and asynchronously emits events with extremely high temporal resolution and high dynamic range...

38593011

April 9, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
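Many AC-powered lights fluctuate in intensity at twice the grid frequency (about 100 Hz on a 50 Hz grid), so a basic ENF estimate is the location of the spectral peak near that flicker frequency. The NumPy sketch below is a naive frame-style baseline, not the event-based model or the robust estimator proposed in the paper. It assumes the events have already been converted into a uniformly sampled intensity trace `signal` at rate `fs`; the function and parameter names (`estimate_enf`, `nominal`, `harmonic`, `band`) are hypothetical.

```python
import numpy as np

def estimate_enf(signal, fs, nominal=50.0, harmonic=2, band=1.0, nfft=None):
    """Estimate grid frequency from a light-intensity trace sampled at fs Hz.

    Assumes the light flickers at `harmonic` times the grid frequency and only
    searches within +/- (harmonic * band) Hz of the nominal flicker frequency.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove DC so it cannot dominate
    n = nfft or 8 * len(x)                # zero-pad for a finer frequency grid
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x)), n=n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    center = harmonic * nominal
    mask = (freqs >= center - harmonic * band) & (freqs <= center + harmonic * band)
    flicker = freqs[mask][np.argmax(spectrum[mask])]
    return flicker / harmonic             # map flicker frequency back to ENF

fs = 2000.0
t = np.arange(0, 2.0, 1.0 / fs)
trace = 1.0 + 0.1 * np.cos(2 * np.pi * 2 * 50.03 * t)  # synthetic 100.06 Hz flicker
enf = estimate_enf(trace, fs)
```

A single FFT peak over the whole trace yields one average frequency; tracking ENF over time would repeat this on short sliding windows, which trades frequency resolution for time resolution.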

#31

JOURNAL ARTICLE

Context-Based Meta-Reinforcement Learning with Bayesian Nonparametric Models.

Zhenshan Bing, Yuqi Yun, Kai Huang, Alois Knoll

Deep reinforcement learning agents usually need to collect a large number of interactions to solve a single task. In contrast, meta-reinforcement learning (meta-RL) aims to quickly adapt to new tasks using a small amount of experience by leveraging the knowledge from training on a set of similar tasks. State-of-the-art context-based meta-RL algorithms use the context to encode the task information and train a policy conditioned on the inferred latent task encoding. However, most recent works are limited to parametric tasks, where a handful of variables control the full variation in the task distribution, and they also fail to work in non-stationary environments due to the few-shot adaptation setting...

38593010

April 9, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#32

JOURNAL ARTICLE

Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks.

Lei Zhang, Yuhang Zhou, Yi Yang, Xinbo Gao

Despite providing high-performance solutions for computer vision tasks, deep neural network (DNN) models have been shown to be extremely vulnerable to adversarial attacks. Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked. Moreover, the commonly used adaptive learning and fine-tuning techniques are unsuitable for adversarial defense, since it is essentially a zero-shot problem when deployed. To tackle this challenge, we propose an attack-agnostic defense method named Meta Invariance Defense (MID)...

38587963

April 8, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#33

JOURNAL ARTICLE

Convergence Analysis of Mean Shift.

Ryoya Yamasaki, Toshiyuki Tanaka

The mean shift (MS) algorithm seeks a mode of the kernel density estimate (KDE). This study presents a convergence guarantee of the mode estimate sequence generated by the MS algorithm and an evaluation of the convergence rate, under fairly mild conditions, with the help of the argument concerning the Łojasiewicz inequality. Our findings extend existing ones covering analytic kernels and the Epanechnikov kernel. Those are significant in that they cover the biweight kernel, which is optimal among non-negative kernels in terms of the asymptotic statistical efficiency for the KDE-based mode estimation...

38587962

April 8, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
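The mean shift iteration analyzed in this paper can be sketched in a few lines of NumPy. This sketch assumes a Gaussian kernel for simplicity (the paper's results cover other kernels, including the biweight kernel). With a Gaussian kernel, each update moves the current estimate to the kernel-weighted mean of the data, and the sequence climbs toward a mode of the KDE. The function name and the stopping rule are illustrative choices.

```python
import numpy as np

def mean_shift_mode(data, start, bandwidth=1.0, tol=1e-8, max_iter=500):
    """Gaussian-kernel mean shift: repeatedly replace y by the weighted mean of data.

    data: (N, D) samples; start: (D,) initial point. Returns the final estimate.
    """
    x = np.asarray(data, dtype=float)
    y = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        d2 = np.sum((x - y) ** 2, axis=1)           # squared distances to y
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))    # Gaussian kernel weights
        y_new = w @ x / w.sum()                     # kernel-weighted mean
        if np.linalg.norm(y_new - y) < tol:         # stop when the step is tiny
            return y_new
        y = y_new
    return y

data = np.array([[-0.5], [0.0], [0.5], [9.5], [10.0], [10.5]])  # two 1-D clusters
left = mean_shift_mode(data, start=[1.0])    # converges near 0
right = mean_shift_mode(data, start=[9.0])   # converges near 10
```

Which mode is reached depends on the starting point, which is why mean shift clustering runs one such sequence per data point and groups points whose sequences end at the same mode.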

#34

JOURNAL ARTICLE

A Closed-Form, Pairwise Solution to Local Non-Rigid Structure-from-Motion.

Shaifali Parashar, Yuxuan Long, Mathieu Salzmann, Pascal Fua

A recent trend in Non-Rigid Structure-from-Motion (NRSfM) is to express local, differential constraints between pairs of images, from which the surface normal at any point can be obtained by solving a system of polynomial equations. While this approach is more successful than its counterparts relying on global constraints, the resulting methods face two main problems: First, most of the equation systems they formulate are of high degree and must be solved using computationally expensive polynomial solvers...

38578851

April 4, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#35

JOURNAL ARTICLE

SpectralGPT: Spectral Remote Sensing Foundation Model.

Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia, Antonio Plaza, Paolo Gamba, Jon Atli Benediktsson, Jocelyn Chanussot

The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner. While most foundation models are tailored to effectively process RGB images for various visual tasks, there is a noticeable gap in research focused on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications. To fill this gap, we created for the first time a universal RS foundation model, named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT)...

38568772

April 3, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#36

JOURNAL ARTICLE

Scalable Video Object Segmentation With Identification Mechanism.

Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang

This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS). Previous VOS methods decode features with a single positive object, limiting the learning of multi-object representation as they must match and segment each target separately under multi-object scenarios. Additionally, earlier techniques catered to specific application objectives and lacked the flexibility to fulfill different speed-accuracy requirements. To address these problems, we present two innovative approaches, Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST)...

38564351

April 2, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#37

JOURNAL ARTICLE

An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits.

Kai Li, Fenghua Xie, Hang Chen, Kexin Yuan, Xiaolin Hu

Audio-visual approaches involving visual inputs have laid the foundation for recent progress in speech separation. However, the optimization of the concurrent usage of auditory and visual inputs is still an active research area. Inspired by the cortico-thalamo-cortical circuit, in which the sensory processing mechanisms of different modalities modulate one another via the non-lemniscal sensory thalamus, we propose a novel cortico-thalamo-cortical neural network (CTCNet) for audio-visual speech separation (AVSS)...

38564350

April 2, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#38

JOURNAL ARTICLE

NeRF-Texture: Synthesizing Neural Radiance Field Textures.

Yi-Hua Huang, Yan-Pei Cao, Yu-Kun Lai, Ying Shan, Lin Gao

Texture synthesis is a fundamental problem in computer graphics that would benefit various applications. Existing methods are effective in handling 2D image textures. In contrast, many real-world textures contain meso-structure in the 3D geometry space, such as grass, leaves, and fabrics, which cannot be effectively modeled using only 2D image textures. We propose a novel texture synthesis method with Neural Radiance Fields (NeRF) to capture and synthesize textures from given multi-view images. In the proposed NeRF texture representation, a scene with fine geometric details is disentangled into the meso-structure textures and the underlying base shape...

38564349

April 2, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#39

JOURNAL ARTICLE

Transformer based Pluralistic Image Completion with Reduced Information Loss.

Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu

Transformer based methods have achieved great success in image inpainting recently. However, we find that these solutions regard each pixel as a token and thus suffer from information loss in two respects: 1) they downsample the input image to much lower resolutions for efficiency, and 2) they quantize the 256³ possible RGB values to a small number (such as 512) of quantized color values. The indices of the quantized pixels are used as tokens for the inputs and prediction targets of the transformer. To mitigate these issues, we propose a new transformer based framework called "PUT"...

38564348

April 2, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
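To make the quantization loss concrete: there are 256³ = 16,777,216 possible RGB values, so mapping them onto 512 tokens merges 32,768 colors into each token on average. The abstract does not specify how the 512 colors are chosen; the NumPy sketch below assumes a uniform 8 × 8 × 8 grid (8³ = 512) purely for illustration, whereas a palette learned by clustering is another common choice. `quantize_rgb` and `dequantize` are hypothetical helpers, not part of PUT.

```python
import numpy as np

LEVELS = 8              # assumed: 8 levels per channel -> 8**3 = 512 colors
STEP = 256 // LEVELS    # 32 consecutive intensity values share one level

def quantize_rgb(img):
    """Map a uint8 RGB image (H, W, 3) to token indices in [0, 512)."""
    q = img.astype(np.int64) // STEP  # per-channel level in 0..7
    return q[..., 0] * LEVELS * LEVELS + q[..., 1] * LEVELS + q[..., 2]

def dequantize(tokens):
    """Map token indices back to the center color of each grid cell."""
    r, rem = np.divmod(tokens, LEVELS * LEVELS)
    g, b = np.divmod(rem, LEVELS)
    return (np.stack([r, g, b], axis=-1) * STEP + STEP // 2).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
tokens = quantize_rgb(img)
recon = dequantize(tokens)
max_err = np.abs(recon.astype(int) - img.astype(int)).max()  # at most 16 per channel
```

Under this assumed grid, the round trip can shift each channel by up to 16 intensity levels, which is the kind of information loss the abstract attributes to pixel-token quantization.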

#40

JOURNAL ARTICLE

Rotation Equivariant Proximal Operator for Deep Unfolding Methods in Image Restoration.

Jiahong Fu, Qi Xie, Deyu Meng, Zongben Xu

The deep unfolding approach has attracted significant attention in computer vision tasks, as it connects conventional model-based image processing with more recent deep learning techniques. Specifically, by establishing a direct correspondence between the algorithm operators at each implementation step and the network modules within each layer, one can rationally construct an almost "white box" network architecture with high interpretability. In this architecture, only the predefined component of the proximal operator, known as a proximal network, needs manual configuration, enabling the network to automatically extract intrinsic image priors in a data-driven manner...

38557620

April 1, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
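The correspondence between algorithm steps and network layers can be illustrated with ISTA for an l1-regularized least-squares problem, an assumed and much simpler stand-in for image restoration. Each loop iteration plays the role of one layer, and soft-thresholding is the proximal operator; in a deep unfolding network that operator would be replaced by a learned proximal network (a rotation-equivariant one in this paper), and the step size and threshold could be learned as well. This NumPy sketch keeps everything fixed and is not the authors' architecture.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||x||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def unfolded_ista(A, y, lam=0.1, num_layers=100, step=None, prox=soft_threshold):
    """Minimize 0.5*||Ax - y||^2 + lam*||x||_1 with a fixed number of ISTA steps.

    Each iteration corresponds to one 'layer'; in deep unfolding, `step`, `lam`,
    and `prox` would be learned per layer instead of fixed.
    """
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(num_layers):
        grad = A.T @ (A @ x - y)                # gradient step on the data-fidelity term
        x = prox(x - step * grad, step * lam)   # proximal step on the prior term
    return x

A = np.eye(3)
y = np.array([1.0, -0.05, 0.5])
x_hat = unfolded_ista(A, y, lam=0.1, num_layers=10)
```

With `A` equal to the identity, the minimizer is simply `soft_threshold(y, lam)`, which makes the example easy to check by hand.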
