journal
Journals IEEE Transactions on Pattern A...

IEEE Transactions on Pattern Analysis and Machine Intelligence

https://read.qxmd.com/read/38687660/ood-control-generalizing-control-in-unseen-environments
#1
JOURNAL ARTICLE
Nanyang Ye, Zhaoyu Zeng, Jundong Zhou, Lin Zhu, Yuxiao Duan, Yifei Wu, Junqi Wu, Haoqi Zeng, Qinying Gu, Xinbing Wang, Chenghu Zhou
Generalizing out-of-distribution (OoD) is critical but challenging in real applications such as unmanned aerial vehicle (UAV) flight control. Previous machine learning-based control has shown promise in dealing with complex real-world environments but suffers huge performance degradation facing OoD scenarios, posing risks to the stability and safety of UAVs. In this paper, we found that the introduced random noises during training surprisingly yield theoretically guaranteed performances via a proposed functional optimization framework...
April 30, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38687659/zero-shot-neural-architecture-search-challenges-solutions-and-opportunities
#2
JOURNAL ARTICLE
Guihong Li, Duc Hoang, Kartikeya Bhardwaj, Ming Lin, Zhangyang Wang, Radu Marculescu
Recently, zero-shot (or training-free) Neural Architecture Search (NAS) approaches have been proposed to liberate NAS from the expensive training process. The key idea behind zero-shot NAS approaches is to design proxies that can predict the accuracy of some given networks without training the network parameters. The proxies proposed so far are usually inspired by recent progress in theoretical understanding of deep learning and have shown great potential on several datasets and NAS benchmarks. This paper aims to comprehensively review and compare the state-of-the-art (SOTA) zero-shot NAS approaches, with an emphasis on their hardware awareness...
April 30, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38683716/a-review-of-state-of-the-art-mixed-precision-neural-network-frameworks
#3
JOURNAL ARTICLE
Mariam Rakka, Mohammed E Fouda, Pramod Khargonekar, Fadi Kurdahi
Mixed-precision Deep Neural Networks (DNNs) provide an efficient solution for hardware deployment, especially under resource constraints, while maintaining model accuracy. Identifying the ideal bit precision for each layer, however, remains a challenge given the vast array of models, datasets, and quantization schemes, leading to an expansive search space. Recent literature has addressed this challenge, resulting in several promising frameworks. This paper offers a comprehensive overview of the standard quantization classifications prevalent in existing studies...
April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38683715/uncertainty-boosted-robust-video-activity-anticipation
#4
JOURNAL ARTICLE
Zhaobo Qi, Shuhui Wang, Weigang Zhang, Qingming Huang
Video activity anticipation aims to predict what will happen in the future, embracing a broad application prospect ranging from robot vision and autonomous driving. Despite the recent progress, the data uncertainty issue, reflected as the content evolution process and dynamic correlation in event labels, has been somehow ignored. This reduces the model generalization ability and deep understanding on video content, leading to serious error accumulation and degraded performance. In this paper, we address the uncertainty learning problem and propose an uncertainty-boosted robust video activity anticipation framework, which generates uncertainty values to indicate the credibility of the anticipation results...
April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38683714/learning-to-holistically-detect-bridges-from-large-size-vhr-remote-sensing-imagery
#5
JOURNAL ARTICLE
Yansheng Li, Junwei Luo, Yongjun Zhang, Yihua Tan, Jin-Gang Yu, Song Bai
Bridge detection in remote sensing images (RSIs) plays a crucial role in various applications, but it poses unique challenges compared to the detection of other objects. In RSIs, bridges exhibit considerable variations in terms of their spatial scales and aspect ratios. Therefore, to ensure the visibility and integrity of bridges, it is essential to perform holistic bridge detection in large-size very-high-resolution (VHR) RSIs. However, the lack of datasets with large-size VHR RSIs limits the deep learning algorithms' performance on bridge detection...
April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38683713/consistency-aware-anchor-pyramid-network-for-crowd-localization
#6
JOURNAL ARTICLE
Xinyan Liu, Guorong Li, Yuankai Qi, Zhenjun Han, Anton van den Hengel, Nicu Sebe, Ming-Hsuan Yang, Qingming Huang
Crowd localization aims to predict the positions of humans in images of crowded scenes. While existing methods have made significant progress, two primary challenges remain: (i) a fixed number of evenly distributed anchors can cause excessive or insufficient predictions across regions in an image with varying crowd densities, and (ii) ranking inconsistency of predictions between the testing and training phases leads to the model being sub-optimal in inference. To address these issues, we propose a Consistency-Aware Anchor Pyramid Network (CAAPN) comprising two key components: an Adaptive Anchor Generator (AAG) and a Localizer with Augmented Matching (LAM)...
April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38683712/searching-to-exploit-memorization-effect-in-deep-learning-with-noisy-labels
#7
JOURNAL ARTICLE
Hansi Yang, Quanming Yao, Bo Han, James T Kwok
Sample selection approaches are popular in robust learning from noisy labels. However, how to control the selection process properly so that deep networks can benefit from the memorization effect is a hard problem. In this paper, motivated by the success of automated machine learning (AutoML), we propose to control the selection process by bi-level optimization. Specifically, we parameterize the selection process by exploiting the general patterns of the memorization effect in the upper-level, and then update these parameters using predicting accuracy obtained from model training in the lower-level...
April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38683711/a-versatile-framework-for-multi-scene-person-re-identification
#8
JOURNAL ARTICLE
Wei-Shi Zheng, Junkai Yan, Yi-Xing Peng
Person Re-identification (ReID) has been extensively developed for a decade in order to learn the association of images of the same person across non-overlapping camera views. To overcome significant variations between images across camera views, mountains of variants of ReID models were developed for solving a number of challenges, such as resolution change, clothing change, occlusion, modality change, and so on. Despite the impressive performance of many ReID variants, these variants typically function distinctly and cannot be applied to other challenges...
April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38669166/hierarchical-recognizing-vector-graphics-and-a-new-chart-based-vector-graphics-dataset
#9
JOURNAL ARTICLE
Shuguang Dou, Xinyang Jiang, Lu Liu, Lu Ying, Caihua Shan, Yifei Shen, Xuanyi Dong, Yun Wang, Dongsheng Li, Cairong Zhao
The conventional approach to image recognition has been based on raster graphics, which can suffer from aliasing and information loss when scaled up or down. In this paper, we propose a novel approach that leverages the benefits of vector graphics for object localization and classification. Our method, called YOLaT (You Only Look at Text), takes the textual document of vector graphics as input, rather than rendering it into pixels. YOLaT builds multi-graphs to model the structural and spatial information in vector graphics and utilizes a dual-stream graph neural network (GNN) to detect objects from the graph...
April 26, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38669165/structure-guided-image-completion-with-image-level-and-object-level-semantic-discriminators
#10
JOURNAL ARTICLE
Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Qing Liu, Sohrab Amirghodsi, Yuqian Zhou, Jiebo Luo
Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. Such a limitation is partially due to the lack of semantic-level constraints inside the hole region as well as the lack of a mechanism to enforce realistic object generation. In this work, we propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects...
April 26, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38662568/enhancing-video-language-representations-with-structural-spatio-temporal-alignment
#11
JOURNAL ARTICLE
Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua, Shuicheng Yan
While pre-training large-scale video-language models (VLMs) has shown remarkable potential for various downstream video-language tasks, existing VLMs can still suffer from certain commonly seen limitations, e.g., coarse-grained cross-modal aligning, under-modeling of temporal dynamics, detached video-language view. In this work, we target enhancing VLMs with a fine-grained structural spatio-temporal alignment learning method (namely Finsta). First of all, we represent the input texts and videos with fine-grained scene graph (SG) structures, both of which are further unified into a holistic SG (HSG) for bridging two modalities...
April 25, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38662567/appearance-based-gaze-estimation-with-deep-learning-a-review-and-benchmark
#12
JOURNAL ARTICLE
Yihua Cheng, Haofei Wang, Yiwei Bao, Feng Lu
Human gaze provides valuable information on human focus and intentions, making it a crucial area of research. Recently, deep learning has revolutionized appearance-based gaze estimation. However, due to the unique features of gaze estimation research, such as the unfair comparison between 2D gaze positions and 3D gaze vectors and the different pre-processing and post-processing methods, there is a lack of a definitive guideline for developing deep learning-based gaze estimation algorithms. In this paper, we present a systematic review of the appearance-based gaze estimation methods using deep learning...
April 25, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38662566/perceptual-video-coding-for-machines-via-satisfied-machine-ratio-modeling
#13
JOURNAL ARTICLE
Qi Zhang, Shanshe Wang, Xinfeng Zhang, Chuanmin Jia, Zhao Wang, Siwei Ma, Wen Gao
Video Coding for Machines (VCM) aims to compress visual signals for machine analysis. However, existing methods only consider a few machines, neglecting the majority. Moreover, the machine's perceptual characteristics are not leveraged effectively, resulting in suboptimal compression efficiency. To overcome these limitations, this paper introduces Satisfied Machine Ratio (SMR), a metric that statistically evaluates the perceptual quality of compressed images and videos for machines by aggregating satisfaction scores from them...
April 25, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38662565/latency-aware-unified-dynamic-networks-for-efficient-image-recognition
#14
JOURNAL ARTICLE
Yizeng Han, Zeyu Liu, Zhihang Yuan, Yifan Pu, Chaofei Wang, Shiji Song, Gao Huang
Dynamic computation has emerged as a promising strategy to improve the inference efficiency of deep networks. It allows selective activation of various computing units, such as layers or convolution channels, or adaptive allocation of computation to highly informative spatial regions in image features, thus significantly reducing unnecessary computations conditioned on each input sample. However, the practical efficiency of dynamic models does not always correspond to theoretical outcomes. This discrepancy stems from three key challenges: 1) The absence of a unified formulation for various dynamic inference paradigms, owing to the fragmented research landscape; 2) The undue emphasis on algorithm design while neglecting scheduling strategies, which are critical for optimizing computational performance and resource utilization in CUDA-enabled GPU settings; and 3) The cumbersome process of evaluating practical latency, as most existing libraries are tailored for static operators...
April 25, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38656859/what-makes-deviant-places
#15
JOURNAL ARTICLE
Jin-Hwi Park, Young-Jae Park, Ilyung Cheong, Junoh Lee, Young Eun Huh, Hae-Gon Jeon
Urban safety plays an essential role in the quality of citizens' lives and in the sustainable development of cities. In recent years, researchers have attempted to apply machine learning techniques to identify the role of location-specific attributes in the development of urban safety. However, existing studies have mainly relied on limited images (e.g., map images, single- or four-directional images) of areas based on a relatively large geographical unit and have narrowly focused on severe crime rates, which limits their predictive performance and implications for urban safety...
April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38656858/novel-uncertainty-quantification-through-perturbation-assisted-sample-synthesis
#16
JOURNAL ARTICLE
Yifei Liu, Rex Shen, Xiaotong Shen
This paper introduces a novel Perturbation-Assisted Inference (PAI) framework utilizing synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method. The framework focuses on uncertainty quantification in complex data scenarios, particularly involving unstructured data while utilizing deep learning models. On one hand, PASS employs a generative model to create synthetic data that closely mirrors raw data while preserving its rank properties through data perturbation, thereby enhancing data diversity and bolstering privacy...
April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38656857/learning-graph-attentions-via-replicator-dynamics
#17
JOURNAL ARTICLE
Bo Jiang, Ziyan Zhang, Sheng Ge, Beibei Wang, Xiao Wang, Jin Tang
Graph Attention (GA) which aims to learn the attention coefficients for graph edges has achieved impressive performance in GNNs on many graph learning tasks. However, existing GAs are usually learned based on edges' (or connected nodes') features which fail to fully capture the rich structural information of edges. Some recent research attempts to incorporate the structural information into GA learning but how to fully exploit them in GA learning is still a challenging problem. To address this challenge, in this work, we propose to leverage a new Replicator Dynamics model for graph attention learning, termed Graph Replicator Attention (GRA)...
April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38656856/a-survey-on-efficient-vision-transformers-algorithms-techniques-and-performance-benchmarking
#18
JOURNAL ARTICLE
Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou
Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tackle computer vision applications. Their main feature is the capacity to extract global information through the self-attention mechanism, outperforming earlier convolutional neural networks. However, ViT deployment and performance have grown steadily with their size, number of trainable parameters, and operations. Furthermore, self-attention's computational and memory cost quadratically increases with the image resolution...
April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38656855/neuralrecon-real-time-coherent-3d-scene-reconstruction-from-monocular-video
#19
JOURNAL ARTICLE
Xi Chen, Jiaming Sun, Yiming Xie, Hujun Bao, Xiaowei Zhou
We present a novel framework named NeuralRecon for real-time 3D scene reconstruction from a monocular video. Unlike previous methods that estimate single-view depth maps separately on each key-frame and fuse them later, we propose to directly reconstruct local surfaces represented as sparse TSDF volumes for each video fragment sequentially by a neural network. A learning-based TSDF fusion module based on gated recurrent units is used to guide the network to fuse features from previous fragments. This design allows the network to capture local smoothness prior and global shape prior of 3D surfaces when sequentially reconstructing the surfaces, resulting in accurate, coherent, and real-time surface reconstruction...
April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
https://read.qxmd.com/read/38652619/cross-modal-hashing-method-with-properties-of-hamming-space-a-new-perspective
#20
JOURNAL ARTICLE
Zhikai Hu, Yiu-Ming Cheung, Mengke Li, Weichao Lan
Cross-modal hashing (CMH) has attracted considerable attention in recent years. Almost all existing CMH methods primarily focus on reducing the modality gap and semantic gap, i.e., aligning multi-modal features and their semantics in Hamming space, without taking into account the space gap, i.e., difference between the real number space and the Hamming space. In fact, the space gap can affect the performance of CMH methods. In this paper, we analyze and demonstrate how the space gap affects the existing CMH methods, which therefore raises two problems: solution space compression and loss function oscillation...
April 23, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
journal
journal
34134
1
2
Fetch more papers »
Fetching more papers... Fetching...
Remove bar
Read by QxMD icon Read
×

Save your favorite articles in one place with a free QxMD account.

×

Search Tips

Use Boolean operators: AND/OR

diabetic AND foot
diabetes OR diabetic

Exclude a word using the 'minus' sign

Virchow -triad

Use Parentheses

water AND (cup OR glass)

Add an asterisk (*) at end of a word to include word stems

Neuro* will search for Neurology, Neuroscientist, Neurological, and so on

Use quotes to search for an exact phrase

"primary prevention of cancer"
(heart or cardiac or cardio*) AND arrest -"American Heart Association"

We want to hear from doctors like you!

Take a second to answer a survey question.