IEEE Transactions on Pattern Analysis and Machine Intelligence

https://read.qxmd.com/read/38598386/prototype-based-semantic-segmentation
#21
JOURNAL ARTICLE
Tianfei Zhou, Wenguan Wang
Deep learning based semantic segmentation solutions have yielded compelling results over the preceding decade. They encompass diverse network architectures (FCN based or attention based), along with various mask decoding schemes (parametric softmax based or pixel-query based). Despite the divergence, they can be grouped within a unified framework by interpreting the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, we reveal inherent limitations within the parametric segmentation regime, and accordingly develop a nonparametric alternative based on non-learnable prototypes...
April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
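The nonparametric view in the abstract above can be illustrated with a toy nearest-prototype classifier. Each class is represented by fixed, non-learnable prototypes, and every pixel embedding is assigned to the class of its most similar prototype. This is a minimal sketch under stated assumptions and not the authors' method: the prototypes here are simply hypothetical per-class mean embeddings, similarity is cosine similarity, and the 2-D "pixel embeddings" are made-up data.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def class_mean_prototypes(embeddings: List[Vector], labels: List[int]) -> Dict[int, List[float]]:
    """Hypothetical non-learnable prototypes: the mean embedding of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for emb, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(emb)
            counts[lab] = 0
        sums[lab] = [s + e for s, e in zip(sums[lab], emb)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

def segment(pixels: List[Vector], prototypes: Dict[int, List[float]]) -> List[int]:
    """Label each pixel embedding with the class of its most similar prototype."""
    return [max(prototypes, key=lambda c: cosine(p, prototypes[c])) for p in pixels]

# Toy 2-D "pixel embeddings" for two classes (illustrative data only).
train = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
train_labels = [0, 0, 1, 1]
protos = class_mean_prototypes(train, train_labels)
print(segment([[0.8, 0.2], [0.2, 0.9]], protos))  # [0, 1]
```

Because nothing in `segment` is trained, adding a class only requires computing a new prototype, which is the kind of flexibility a nonparametric scheme trades against learned softmax weights.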
https://read.qxmd.com/read/38598385/a-modular-neural-motion-retargeting-system-decoupling-skeleton-and-shape-perception
#22
JOURNAL ARTICLE
Jiaxu Zhang, Zhigang Tu, Junwu Weng, Junsong Yuan, Bo Du
Mapping motion between characters whose structures differ but correspond to homeomorphic graphs, while preserving motion semantics and perceiving shape geometry, poses significant challenges in skinned motion retargeting. We propose M-R2ET, a modular neural motion retargeting system, to comprehensively address these challenges. The key insight driving M-R2ET is its capacity to learn residual motion modifications within a canonical skeleton space. Specifically, a cross-structure alignment module is designed to learn joint correspondences among diverse skeletons, enabling motion copy and forming a reliable initial motion for semantics and geometry perception...
April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38598384/stmixer-a-one-stage-sparse-action-detector
#23
JOURNAL ARTICLE
Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang
Traditional video action detectors typically adopt a two-stage pipeline, in which a person detector first generates actor boxes and 3D RoIAlign then extracts actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and because feature sampling is constrained to the inside of the box, it fails to leverage the richer context information outside it. Recently, a few query-based action detectors have been proposed to predict action instances in an end-to-end manner...
April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38598383/representing-noisy-image-without-denoising
#24
JOURNAL ARTICLE
Shuren Qi, Yushu Zhang, Chao Wang, Tao Xiang, Xiaochun Cao, Yong Xiang
A long-standing topic in artificial intelligence is the effective recognition of patterns from noisy images. In this regard, the recent data-driven paradigm either 1) improves representation robustness by adding noisy samples in the training phase (i.e., data augmentation) or 2) pre-processes the noisy image by learning to solve the inverse problem (i.e., image denoising). However, such methods are generally inefficient and yield unstable results, limiting their practical applications. In this paper, we explore a non-learning paradigm that aims to derive robust representations directly from noisy images, without denoising as a pre-processing step...
April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38598382/learning-local-and-global-temporal-contexts-for-video-semantic-segmentation
#25
JOURNAL ARTICLE
Guolei Sun, Yun Liu, Henghui Ding, Min Wu, Luc Van Gool
Contextual information plays a core role in video semantic segmentation (VSS). This paper divides the contexts for VSS into two kinds: local temporal contexts (LTC), which are defined by neighboring frames, and global temporal contexts (GTC), which represent contexts from the whole video. LTC include static and motional contexts, corresponding to static and moving content in neighboring frames, respectively. Both static and motional contexts have been studied previously. However, no prior work has learned static and motional contexts, which are highly complementary, simultaneously...
April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38598381/self-supervised-learning-for-time-series-analysis-taxonomy-progress-and-prospects
#26
JOURNAL ARTICLE
Kexin Zhang, Qingsong Wen, Chaoli Zhang, Rongyao Cai, Ming Jin, Yong Liu, James Y Zhang, Yuxuan Liang, Guansong Pang, Dongjin Song, Shirui Pan
Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. Its most prominent advantage is that it reduces the dependence on labeled data: with a pre-training and fine-tuning strategy, high performance can be achieved even with a small amount of labeled data. Although many self-supervised surveys have been published for computer vision and natural language processing, a comprehensive survey of time series SSL is still missing. To fill this gap, we review current state-of-the-art SSL methods for time series data in this article...
April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38598380/ppdm-parallel-point-detection-and-matching-for-fast-and-accurate-hoi-detection
#27
JOURNAL ARTICLE
Yue Liao, Si Liu, Yulu Gao, Aixi Zhang, Zhimin Li, Fei Wang, Bo Li
Human-Object Interaction (HOI) detection aims to understand human activities by detecting interaction triplets. Previous HOI detection methods adopt a two-stage instance-driven paradigm. Unfortunately, the many non-interactive human-object pairs generated by the first stage are the main obstacle preventing HOI detectors from achieving high efficiency and strong performance. To remedy this, we propose a novel top-down interaction-driven paradigm that detects interactions first and then bridges interactive human-object pairs through those interactions...
April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38593013/etpnav-evolving-topological-planning-for-vision-language-navigation-in-continuous-environments
#28
JOURNAL ARTICLE
Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang
Vision-language navigation is a task that requires an agent to follow instructions to navigate in environments. It has become increasingly important in the field of embodied AI, with potential applications in autonomous navigation, search and rescue, and human-robot interaction. In this paper, we address a more practical yet challenging counterpart setting: vision-language navigation in continuous environments (VLN-CE). To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments...
April 9, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38593012/multiview-tensor-spectral-clustering-via-co-regularization
#29
JOURNAL ARTICLE
Hongmin Cai, Yu Wang, Fei Qi, Zhuoyao Wang, Yiu-Ming Cheung
Graph-based multi-view clustering encodes multi-view data into sample affinities to find a consensus representation, effectively overcoming heterogeneity across different views. However, traditional affinity measures tend to collapse as the feature dimension grows, making it difficult to estimate a unified alignment that reveals both cross-view and inner relationships. To tackle this challenge, we propose to achieve multi-view uniform clustering via consensus representation co-regularization. First, the sample affinities are encoded by both the popular dyadic affinity and recent high-order affinities to comprehensively characterize the spatial distributions of the high-dimension, low-sample-size (HDLSS) data...
April 9, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
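The abstract's remark that traditional affinity measures collapse as the feature dimension grows reflects a well-known effect, distance concentration: for random high-dimensional data, a point's nearest and farthest neighbors become almost equally far away, so distance-based affinities become nearly uniform. The sketch below measures the relative contrast (d_max - d_min) / d_min from one query to uniformly random points. The data distribution, sample size, and seed are arbitrary assumptions, and the code only illustrates the phenomenon; it is not the paper's co-regularization method.

```python
import math
import random

def relative_contrast(dim: int, n_points: int = 50, seed: int = 0) -> float:
    """(max - min) / min Euclidean distance from one random query to n random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as the dimension grows, so "near" and "far" blur together.
for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```

With only 50 samples and up to 1000 features, this toy setup is itself a small HDLSS regime, which is where such collapse tends to matter most.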
https://read.qxmd.com/read/38593011/-seeing-enf-from-neuromorphic-events-modeling-and-robust-estimation
#30
JOURNAL ARTICLE
Lexuan Xu, Guang Hua, Haijian Zhang, Lei Yu
Most artificial lights exhibit subtle fluctuations in intensity and frequency in response to the influence of the grid's alternating current, providing the potential to estimate the Electric Network Frequency (ENF) from conventional frame-based videos. Nevertheless, the performance of Video-based ENF (V-ENF) estimation largely relies on the imaging quality and thus may suffer from significant interference caused by non-ideal sampling, scene diversity, motion interference, and extreme lighting conditions. In this paper, we show that the ENF can be extracted without the above limitations from a new modality provided by the so-called event camera, a neuromorphic sensor that encodes the light intensity variations and asynchronously emits events with extremely high temporal resolution and high dynamic range...
April 9, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
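Lamp brightness typically flickers at twice the mains frequency (about 100 Hz on a 50 Hz grid). A simple baseline for ENF estimation is therefore to locate the spectral peak of a light-intensity signal near that harmonic and halve it. The sketch below does this on a synthetic, uniformly sampled intensity trace. It assumes the asynchronous events have already been converted into such a trace, which is the hard part the paper addresses. The sampling rate, search band, step size, and function names are illustrative assumptions, not the authors' estimator.

```python
import cmath
import math
from typing import Sequence, Tuple

def dft_magnitude(signal: Sequence[float], freq_hz: float, fs_hz: float) -> float:
    """Magnitude of the discrete-time Fourier transform of `signal` at one frequency."""
    w = -2j * math.pi * freq_hz / fs_hz
    return abs(sum(x * cmath.exp(w * n) for n, x in enumerate(signal)))

def estimate_enf(signal: Sequence[float], fs_hz: float,
                 band: Tuple[float, float] = (99.0, 101.0),
                 step_hz: float = 0.01) -> float:
    """Find the flicker peak inside `band` and halve it to get the grid frequency."""
    mean = sum(signal) / len(signal)
    centered = [x - mean for x in signal]  # drop the DC (average brightness) term
    n_steps = int(round((band[1] - band[0]) / step_hz))
    candidates = [band[0] + k * step_hz for k in range(n_steps + 1)]
    peak = max(candidates, key=lambda f: dft_magnitude(centered, f, fs_hz))
    return peak / 2.0

# Synthetic trace: brightness flickering at 2 x 50.1 Hz, sampled at 1 kHz for 2 s.
fs = 1000.0
trace = [1.0 + 0.1 * math.cos(2 * math.pi * 100.2 * n / fs) for n in range(2000)]
print(round(estimate_enf(trace, fs), 2))  # 50.1
```

A fine frequency grid over a narrow band is used instead of a full FFT because the ENF deviates from its nominal value by only a fraction of a hertz.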
https://read.qxmd.com/read/38593010/context-based-meta-reinforcement-learning-with-bayesian-nonparametric-models
#31
JOURNAL ARTICLE
Zhenshan Bing, Yuqi Yun, Kai Huang, Alois Knoll
Deep reinforcement learning agents usually need to collect a large number of interactions to solve a single task. In contrast, meta-reinforcement learning (meta-RL) aims to quickly adapt to new tasks using a small amount of experience by leveraging the knowledge from training on a set of similar tasks. State-of-the-art context-based meta-RL algorithms use the context to encode the task information and train a policy conditioned on the inferred latent task encoding. However, most recent works are limited to parametric tasks, where a handful of variables control the full variation in the task distribution, and they also fail to work in non-stationary environments due to the few-shot adaptation setting...
April 9, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38587963/meta-invariance-defense-towards-generalizable-robustness-to-unknown-adversarial-attacks
#32
JOURNAL ARTICLE
Lei Zhang, Yuhang Zhou, Yi Yang, Xinbo Gao
Despite providing high-performance solutions for computer vision tasks, deep neural network (DNN) models have been shown to be extremely vulnerable to adversarial attacks. Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked. Moreover, commonly used adaptive learning and fine-tuning techniques are unsuitable for adversarial defense, since defense at deployment is essentially a zero-shot problem. Thus, to tackle this challenge, we propose an attack-agnostic defense method named Meta Invariance Defense (MID)...
April 8, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38587962/convergence-analysis-of-mean-shift
#33
JOURNAL ARTICLE
Ryoya Yamasaki, Toshiyuki Tanaka
The mean shift (MS) algorithm seeks a mode of the kernel density estimate (KDE). This study presents a convergence guarantee for the mode estimate sequence generated by the MS algorithm and an evaluation of its convergence rate, under fairly mild conditions, using an argument based on the Łojasiewicz inequality. Our findings extend existing ones covering analytic kernels and the Epanechnikov kernel. They are significant in that they cover the biweight kernel, which is optimal among non-negative kernels in terms of asymptotic statistical efficiency for KDE-based mode estimation...
April 8, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
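For readers unfamiliar with the iteration being analyzed, here is a one-dimensional mean-shift sketch using the biweight kernel mentioned in the abstract. With kernel profile k(r) = (1 - r)^2 on [0, 1], where r = ((x - x_i) / h)^2, the mean-shift update uses g(r) = -k'(r) = 2(1 - r). Each step therefore moves x to a weighted average of the samples, with weights proportional to 1 - r inside the bandwidth h and zero outside. The data, bandwidth, and stopping tolerance are illustrative assumptions. The code shows the mode-seeking sequence whose convergence the paper studies, not the paper's analysis.

```python
from typing import Sequence

def biweight_mean_shift(data: Sequence[float], start: float, h: float,
                        tol: float = 1e-10, max_iter: int = 1000) -> float:
    """Iterate the 1-D mean-shift update with a biweight kernel of bandwidth h."""
    x = start
    for _ in range(max_iter):
        # g(r) = -k'(r) is proportional to (1 - r) for r < 1 and 0 otherwise.
        weights = [max(0.0, 1.0 - ((x - xi) / h) ** 2) for xi in data]
        total = sum(weights)
        if total == 0.0:  # no samples inside the window: x is not moved
            return x
        x_new = sum(w * xi for w, xi in zip(weights, data)) / total
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

samples = [-1.0, 0.0, 1.0, 4.8, 5.0, 5.2]
print(round(biweight_mean_shift(samples, start=0.5, h=2.0), 6))  # 0.0
print(round(biweight_mean_shift(samples, start=4.0, h=2.0), 6))  # 5.0
```

Because the biweight kernel has compact support, each start converges to the mode of its own cluster, and samples farther than h away have no influence on that cluster's mode.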
https://read.qxmd.com/read/38578851/a-closed-form-pairwise-solution-to-local-non-rigid-structure-from-motion
#34
JOURNAL ARTICLE
Shaifali Parashar, Yuxuan Long, Mathieu Salzmann, Pascal Fua
A recent trend in Non-Rigid Structure-from-Motion (NRSfM) is to express local, differential constraints between pairs of images, from which the surface normal at any point can be obtained by solving a system of polynomial equations. While this approach is more successful than its counterparts relying on global constraints, the resulting methods face two main problems: First, most of the equation systems they formulate are of high degree and must be solved using computationally expensive polynomial solvers...
April 4, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38568772/spectralgpt-spectral-remote-sensing-foundation-model
#35
JOURNAL ARTICLE
Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia, Antonio Plaza, Paolo Gamba, Jon Atli Benediktsson, Jocelyn Chanussot
Foundation models have recently garnered significant attention due to their potential to revolutionize visual representation learning in a self-supervised manner. While most foundation models are tailored to process RGB images for various visual tasks, there is a noticeable gap in research on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications. To fill this gap, we created, for the first time, a universal RS foundation model named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT)...
April 3, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38564351/scalable-video-object-segmentation-with-identification-mechanism
#36
JOURNAL ARTICLE
Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang
This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS). Previous VOS methods decode features with a single positive object, limiting the learning of multi-object representation as they must match and segment each target separately under multi-object scenarios. Additionally, earlier techniques catered to specific application objectives and lacked the flexibility to fulfill different speed-accuracy requirements. To address these problems, we present two innovative approaches, Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST)...
April 2, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38564350/an-audio-visual-speech-separation-model-inspired-by-cortico-thalamo-cortical-circuits
#37
JOURNAL ARTICLE
Kai Li, Fenghua Xie, Hang Chen, Kexin Yuan, Xiaolin Hu
Audio-visual approaches involving visual inputs have laid the foundation for recent progress in speech separation. However, the optimization of the concurrent usage of auditory and visual inputs is still an active research area. Inspired by the cortico-thalamo-cortical circuit, in which the sensory processing mechanisms of different modalities modulate one another via the non-lemniscal sensory thalamus, we propose a novel cortico-thalamo-cortical neural network (CTCNet) for audio-visual speech separation (AVSS)...
April 2, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38564349/nerf-texture-synthesizing-neural-radiance-field-textures
#38
JOURNAL ARTICLE
Yi-Hua Huang, Yan-Pei Cao, Yu-Kun Lai, Ying Shan, Lin Gao
Texture synthesis is a fundamental problem in computer graphics that would benefit various applications. Existing methods are effective in handling 2D image textures. In contrast, many real-world textures contain meso-structure in the 3D geometry space, such as grass, leaves, and fabrics, which cannot be effectively modeled using only 2D image textures. We propose a novel texture synthesis method with Neural Radiance Fields (NeRF) to capture and synthesize textures from given multi-view images. In the proposed NeRF texture representation, a scene with fine geometric details is disentangled into the meso-structure textures and the underlying base shape...
April 2, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38564348/transformer-based-pluralistic-image-completion-with-reduced-information-loss
#39
JOURNAL ARTICLE
Qiankun Liu, Yuqi Jiang, Zhentao Tan, Dongdong Chen, Ying Fu, Qi Chu, Gang Hua, Nenghai Yu
Transformer-based methods have recently achieved great success in image inpainting. However, we find that these solutions regard each pixel as a token and thus suffer from information loss in two respects: 1) they downsample the input image to much lower resolutions for efficiency, and 2) they quantize the 256³ possible RGB values to a small number (such as 512) of quantized color values. The indices of the quantized pixels are used as tokens for the inputs and prediction targets of the transformer. To mitigate these issues, we propose a new transformer-based framework called "PUT"...
April 2, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
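The second source of information loss can be made concrete. A 24-bit RGB pixel has 256³ (about 16.7 million) possible values, so mapping it to one of 512 token indices necessarily discards detail. The sketch below uses a uniform 8 x 8 x 8 grid as the 512-color palette. That grid is a simplifying assumption; a learned palette, such as k-means cluster centers, is another option. The code shows the pixel-to-token mapping and the reconstruction error it introduces, not the PUT framework itself.

```python
from typing import Tuple

RGB = Tuple[int, int, int]
LEVELS = 8            # 8 levels per channel -> 8**3 = 512 tokens
BIN = 256 // LEVELS   # each level covers 32 intensity values

def rgb_to_token(pixel: RGB) -> int:
    """Map an 8-bit RGB pixel to one of 512 token indices."""
    r, g, b = (c // BIN for c in pixel)
    return (r * LEVELS + g) * LEVELS + b

def token_to_rgb(token: int) -> RGB:
    """Reconstruct a pixel as the center of its quantization bin."""
    r, rem = divmod(token, LEVELS * LEVELS)
    g, b = divmod(rem, LEVELS)
    return tuple(level * BIN + BIN // 2 for level in (r, g, b))

pixel = (255, 130, 3)
token = rgb_to_token(pixel)
restored = token_to_rgb(token)
print(token, restored)  # 480 (240, 144, 16)
max_error = max(abs(a - b) for a, b in zip(pixel, restored))
print(max_error)        # 15
```

Every one of the 32³ distinct colors inside a bin collapses to the same token, and each channel can be off by up to 16 intensity levels after reconstruction. Avoiding this is the motivation for not quantizing the input.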
https://read.qxmd.com/read/38557620/rotation-equivariant-proximal-operator-for-deep-unfolding-methods-in-image-restoration
#40
JOURNAL ARTICLE
Jiahong Fu, Qi Xie, Deyu Meng, Zongben Xu
The deep unfolding approach has attracted significant attention in computer vision tasks because it connects conventional model-based image processing with more recent deep learning techniques. Specifically, by establishing a direct correspondence between the algorithm operators at each iteration step and the network modules within each layer, one can rationally construct an almost "white box" network architecture with high interpretability. In this architecture, only the predefined component of the proximal operator, known as a proximal network, needs manual configuration, enabling the network to automatically extract intrinsic image priors in a data-driven manner...
April 1, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
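The operator-to-layer correspondence described above can be illustrated with the classic example of unfolding ISTA (iterative shrinkage-thresholding) for min_x 0.5 * ||Ax - y||^2 + lambda * ||x||_1. Each iteration, consisting of a gradient step on the data term followed by a proximal step, plays the role of one "layer". In this sketch the proximal operator is the hand-crafted soft-thresholding function. A deep unfolding network would replace it with a learned proximal network; the paper's contribution concerns making that network rotation equivariant, which this sketch does not attempt. The matrix, step size, lambda, and layer count are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def soft_threshold(v: Vector, tau: float) -> Vector:
    """Proximal operator of tau * ||x||_1: shrink each entry toward zero by tau."""
    return [vi - tau if vi > tau else vi + tau if vi < -tau else 0.0 for vi in v]

def unfolded_ista(a: Matrix, y: Vector, lam: float, step: float, n_layers: int) -> Vector:
    """Run a fixed number of ISTA iterations; each iteration acts as one layer."""
    at = transpose(a)
    x = [0.0] * len(at)
    for _ in range(n_layers):
        residual = [axi - yi for axi, yi in zip(matvec(a, x), y)]  # A x - y
        grad = matvec(at, residual)                               # A^T (A x - y)
        z = [xi - step * gi for xi, gi in zip(x, grad)]           # gradient step
        x = soft_threshold(z, step * lam)                         # proximal step
    return x

# Sanity check: with A = I and step = 1, ISTA reduces to soft-thresholding y.
print(unfolded_ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0, step=1.0, n_layers=5))
# [2.0, 0.0]
```

Fixing `n_layers` rather than iterating to convergence is what makes the algorithm a network: in an unfolded model, `step`, `lam`, and the proximal operator itself become per-layer trainable components.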