Most recent papers in the journal IEEE Transactions on Pattern Analysis and Machine Intelligence

#1

JOURNAL ARTICLE

Fast Building Instance Proxy Reconstruction for Large Urban Scenes.

Jianwei Guo, Haobo Qin, Yinchang Zhou, Xin Chen, Liangliang Nan, Hui Huang

Digitalization of large-scale urban scenes (in particular buildings) has been a long-standing open problem, which is attributable to challenges in data acquisition, such as incomplete scene coverage, lack of semantics, low efficiency, and low reliability in path planning. In this paper, we address these challenges in urban building reconstruction from aerial images and propose an effective workflow and a few novel algorithms for efficient 3D building instance proxy reconstruction for large urban scenes. Specifically, we propose a novel learning-based approach to instance segmentation of urban buildings from aerial images, followed by a voting-based algorithm that fuses the multi-view instance information into a sparse point cloud (reconstructed using a standard Structure from Motion pipeline)...

38625775

April 16, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
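
To illustrate the voting-based fusion step mentioned in the abstract above, here is a minimal sketch of majority voting over per-view instance labels. It is not the authors' algorithm: the function name, the `min_votes` threshold, and the assumption that instance IDs are already consistent across views are all hypothetical choices made for this example.

```python
from collections import Counter


def fuse_instance_votes(point_observations, min_votes=2):
    """Give each sparse 3D point the instance ID it received most often.

    point_observations maps a point ID to the instance IDs observed at that
    point's projection in each view (None means background or not visible).
    Assumption: instance IDs are already consistent across views.
    """
    fused = {}
    for point_id, labels in point_observations.items():
        votes = Counter(label for label in labels if label is not None)
        if not votes:
            fused[point_id] = None
            continue
        label, count = votes.most_common(1)[0]
        # Reject weakly supported labels (hypothetical threshold).
        fused[point_id] = label if count >= min_votes else None
    return fused


observations = {
    "p0": [3, 3, None, 3],  # three views agree on building 3
    "p1": [3, 7, 7],        # building 7 wins two votes to one
    "p2": [None, 5],        # a single vote is below min_votes
}
fused = fuse_instance_votes(observations)
```

With the default threshold, `p0` is assigned instance 3, `p1` instance 7, and `p2` stays unlabeled.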

#2

JOURNAL ARTICLE

Bridging Visual and Textual Semantics: Towards Consistency for Unbiased Scene Graph Generation.

Ruonan Zhang, Gaoyun An, Yiqing Hao, Dapeng Oliver Wu

Scene Graph Generation (SGG) aims to detect visual relationships in an image. However, due to long-tailed bias, SGG is far from practical. Most methods depend heavily on statistical co-occurrence to generate a balanced dataset, so they are dataset-specific and easily affected by noise. The fundamental cause is that SGG is simplified as a classification task instead of a reasoning task, so the ability to capture fine-grained details is limited and the difficulty of handling ambiguity is increased...

38625774

April 16, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#3

JOURNAL ARTICLE

Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models.

Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, Kede Ma

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets via designing minimalistic BVQA models...

38625773

April 16, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#4

JOURNAL ARTICLE

On the Benefit of Optimal Transport for Curriculum Reinforcement Learning.

Pascal Klink, Carlo D'Eramo, Jan Peters, Joni Pajarinen

Curriculum reinforcement learning (CRL) allows solving complex tasks by generating a tailored sequence of learning tasks, starting from easy ones and subsequently increasing their difficulty. Although the potential of curricula in RL has been clearly shown in various works, it is less clear how to generate them for a given learning environment, resulting in various methods aiming to automate this task. In this work, we focus on framing curricula as interpolations between task distributions, which has previously been shown to be a viable approach to CRL...

38625772

April 16, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
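
The idea of a curriculum as an interpolation between task distributions can be sketched for the simplest case, assuming tasks are described by one scalar parameter with a Gaussian distribution. For two 1-D Gaussians, the Wasserstein-2 (optimal transport) geodesic interpolates the mean and the standard deviation linearly. The function name and the example numbers are assumptions for illustration, and the abstract does not state that the paper uses this particular setting.

```python
def interpolate_gaussian_w2(mean0, std0, mean1, std1, t):
    """Return (mean, std) at fraction t along the Wasserstein-2 geodesic
    between the 1-D Gaussians N(mean0, std0^2) and N(mean1, std1^2)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    mean = (1.0 - t) * mean0 + t * mean1
    std = (1.0 - t) * std0 + t * std1
    return mean, std


# Hypothetical curriculum: start with easy, widely spread tasks around 0.0
# and move toward a narrow target task distribution around 4.0.
curriculum = [interpolate_gaussian_w2(0.0, 0.5, 4.0, 0.1, k / 4) for k in range(5)]
```

Each element of `curriculum` is the (mean, standard deviation) of the task distribution at one curriculum stage, from the initial distribution to the target.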

#5

JOURNAL ARTICLE

Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming Data.

Yukun Cao, Yuan Feng, Hairu Wang, Xike Xie, S Kevin Zhou

Recently, there has been a trend of designing neural data structures to go beyond handcrafted data structures by leveraging patterns of data distributions for better accuracy and adaptivity. Sketches are widely used data structures in real-time web analysis, network monitoring, and self-driving to estimate item frequencies of data streams within limited space. However, existing sketches have not fully exploited the patterns of the data stream distributions, making it challenging to tightly couple them with neural networks that excel at memorizing pattern information...

38619951

April 15, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
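
For readers unfamiliar with sketches, the following is a minimal Count-Min sketch, a well-known handcrafted structure for estimating item frequencies in limited space. It is shown only as background for the abstract above and is not the neural sketch the paper proposes. The class layout, the default sizes, and the use of `hashlib.blake2b` as the hash family are choices made for this example.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator; estimates never fall below the true count."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by prefixing the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


sketch = CountMinSketch()
stream = ["a", "b", "a", "c", "a", "b"]
for item in stream:
    sketch.add(item)
```

Here `sketch.estimate("a")` is at least 3, the true count; hash collisions can only push an estimate upward.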

#6

JOURNAL ARTICLE

Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces.

Pattarawat Chormai, Jan Herrmann, Klaus-Robert Muller, Gregoire Montavon

Explainable AI aims to overcome the black-box nature of complex ML models like neural networks by generating explanations for their predictions. Explanations often take the form of a heatmap identifying input features (e.g. pixels) that are relevant to the model's decision. These explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by extracting at some intermediate layer of a neural network, subspaces that capture the multiple and distinct activation patterns (e...

38607718

April 12, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#7

JOURNAL ARTICLE

Deep Learning Methods for Calibrated Photometric Stereo and Beyond.

Yakun Ju, Kin-Man Lam, Wuyuan Xie, Huiyu Zhou, Junyu Dong, Boxin Shi

Photometric stereo recovers the surface normals of an object from multiple images with varying shading cues, i.e., by modeling the relationship between surface orientation and intensity at each pixel. Photometric stereo excels in per-pixel resolution and fine reconstruction detail. However, it is a complicated problem because of the non-linear relationship caused by non-Lambertian surface reflectance. Recently, various deep learning methods have shown a powerful ability in the context of photometric stereo against non-Lambertian surfaces...

38607717

April 12, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
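
As background for the abstract above, the classical Lambertian baseline can be sketched in a few lines: with three known light directions, the intensity at a pixel satisfies I = albedo * (L n), so the scaled normal is the solution of a 3x3 linear system. This is the textbook linear case, not any of the deep learning methods the paper surveys. The function names and the example lights are assumptions for illustration.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_normal(lights, intensities):
    """Recover (unit normal, albedo) at one pixel from three calibrated lights.

    Solves lights @ g = intensities by Cramer's rule, where g = albedo * normal.
    Assumes a Lambertian surface with no shadows or highlights.
    """
    d = det3(lights)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    g = []
    for col in range(3):
        replaced = [row[:col] + [intensities[r]] + row[col + 1:]
                    for r, row in enumerate(lights)]
        g.append(det3(replaced) / d)
    albedo = math.sqrt(sum(c * c for c in g))
    if albedo == 0.0:
        raise ValueError("pixel is completely dark; normal is undefined")
    return [c / albedo for c in g], albedo


a = 1.0 / math.sqrt(2.0)
lights = [[0.0, 0.0, 1.0], [a, 0.0, a], [0.0, a, a]]  # unit light directions
intensities = [0.8, 0.8 * a, 0.8 * a]  # rendered from normal (0, 0, 1), albedo 0.8
normal, albedo = solve_normal(lights, intensities)
```

Non-Lambertian reflectance breaks the linear model used here, which is the difficulty the abstract points to.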

#8

JOURNAL ARTICLE

RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion.

Haowen Wang, Zhengping Che, Yufan Yang, Mingyuan Wang, Zhiyuan Xu, Xiuquan Qiao, Mengshi Qi, Feifei Feng, Jian Tang

Raw depth images captured in indoor scenarios frequently exhibit extensive missing values due to the inherent limitations of the sensors and environments. For example, transparent materials frequently elude detection by depth sensors; surfaces may introduce measurement inaccuracies due to their polished textures, extended distances, and oblique incidence angles from the sensor. The presence of incomplete depth maps imposes significant challenges for subsequent vision applications, prompting the development of numerous depth completion techniques to mitigate this problem...

38607716

April 12, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#9

JOURNAL ARTICLE

Ensemble Predictors: Possibilistic Combination of Conformal Predictors for Multivariate Time Series Classification.

Andrea Campagner, Marilia Barandas, Duarte Folgado, Hugo Gamboa, Federico Cabitza

In this article, we propose a conceptual framework to study ensembles of conformal predictors (CP), which we call Ensemble Predictors (EP). Our approach is inspired by the application of imprecise probabilities in information fusion. Based on the proposed framework, we study, for the first time in the literature, the theoretical properties of CP ensembles in a general setting, by focusing on simple and commonly used possibilistic combination rules. We also illustrate the applicability of the proposed methods in the setting of multivariate time-series classification, showing that these methods provide better performance (in terms of robustness, conservativeness, accuracy and running time) than both standard classification algorithms and other combination rules proposed in the literature, on a large set of benchmarks from the UCR time series archive...

38607715

April 12, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
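
The abstract above does not list the combination rules it studies, so the following sketch only assumes two of the simplest possibilistic operators, minimum and maximum, applied label by label to the p-values of several conformal predictors. The function names, labels, and p-values are hypothetical, and no validity guarantee is claimed for the combined sets.

```python
def combine_p_values(p_values_per_predictor, rule="min"):
    """Combine per-label conformal p-values from several predictors.

    rule="min" acts as a conjunctive merge, rule="max" as a disjunctive one.
    Every predictor must report a p-value for the same set of labels.
    """
    combiner = {"min": min, "max": max}[rule]
    labels = p_values_per_predictor[0].keys()
    return {label: combiner(p[label] for p in p_values_per_predictor)
            for label in labels}


def prediction_set(p_values, epsilon):
    """Keep every label whose combined p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > epsilon}


# Hypothetical p-values from two conformal predictors for one time series.
predictor_a = {"walk": 0.60, "run": 0.20, "sit": 0.03}
predictor_b = {"walk": 0.40, "run": 0.04, "sit": 0.08}

strict = prediction_set(combine_p_values([predictor_a, predictor_b], "min"), 0.05)
lenient = prediction_set(combine_p_values([predictor_a, predictor_b], "max"), 0.05)
```

The minimum rule keeps only `walk`, while the maximum rule keeps all three labels, which shows how the choice of rule trades set size against caution.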

#10

JOURNAL ARTICLE

Sensitivity-Aware Density Estimation in Multiple Dimensions.

Aleix Boquet-Pujadas, Pol Del Aguila Pla, Michael Unser

We formulate an optimization problem to estimate probability densities in the context of multidimensional problems that are sampled with uneven probability. It considers detector sensitivity as a heterogeneous density and takes advantage of the computational speed and flexible boundary conditions offered by splines on a grid. We choose to regularize the Hessian of the spline via the nuclear norm to promote sparsity. As a result, the method is spatially adaptive and stable against the choice of the regularization parameter, which plays the role of the bandwidth...

38607714

April 12, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#11

JOURNAL ARTICLE

Transferring Annotator- and Instance-dependent Transition Matrix for Learning from Crowds.

Shikun Li, Xiaobo Xia, Jiankang Deng, Shiming Ge, Tongliang Liu

Learning from crowds refers to the setting in which the annotations of training data are obtained through crowd-sourcing services. Multiple annotators each complete their own small part of the annotations, and labeling mistakes that depend on the annotator occur frequently. Modeling the label-noise generation process with a noise transition matrix is a powerful tool for tackling label noise. In real-world crowd-sourcing scenarios, noise transition matrices are both annotator- and instance-dependent. However, due to the high complexity of annotator- and instance-dependent transition matrices (AIDTM), annotation sparsity, which means each annotator only labels a tiny part of instances, makes modeling AIDTM very challenging...

38607713

April 12, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
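
A noise transition matrix, as referred to in the abstract above, maps a distribution over true labels to a distribution over observed noisy labels. The sketch below uses one fixed matrix per annotator, which is a simplification: in the annotator- and instance-dependent setting the matrix would also change with each instance. The annotator names and all numbers are made up for illustration.

```python
def noisy_posterior(clean_posterior, transition):
    """Push a clean class posterior through a noise transition matrix.

    transition[i][j] is P(observed label = j | true label = i), so each row
    sums to 1. Returns the distribution over observed (noisy) labels.
    """
    num_classes = len(clean_posterior)
    return [sum(clean_posterior[i] * transition[i][j] for i in range(num_classes))
            for j in range(num_classes)]


# Hypothetical annotators: one careful, one who often confuses the classes.
transitions = {
    "careful": [[0.95, 0.05], [0.05, 0.95]],
    "sloppy": [[0.80, 0.20], [0.30, 0.70]],
}
clean = [0.9, 0.1]  # a model's belief over the true label of one instance
noisy = {name: noisy_posterior(clean, t) for name, t in transitions.items()}
```

For the `sloppy` annotator the observed-label distribution becomes roughly [0.75, 0.25], which is what a model trained directly on that annotator's labels would be pulled toward.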

#12

JOURNAL ARTICLE

Bridging Actions: Generate 3D Poses and Shapes In-Between Photos.

Wen-Li Wei, Jen-Chun Lin

Generating realistic 3D human motion has been a fundamental goal of the game/animation industry. This work presents a novel transition generation technique that can bridge the actions of people in the foreground by generating 3D poses and shapes in-between photos, allowing 3D animators/novice users to easily create/edit 3D motions. To achieve this, we propose an adaptive motion network (ADAM-Net) that effectively learns human motion from masked action sequences to generate kinematically compliant 3D poses and shapes in-between given temporally-sparse photos...

38607712

April 12, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#13

JOURNAL ARTICLE

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning.

Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen

3D dense captioning requires a model to translate its understanding of an input 3D scene into several captions associated with different object regions. Existing methods adopt a sophisticated "detect-then-describe" pipeline, which builds explicit relation modules upon a 3D detector with numerous hand-crafted components. While these methods have achieved initial success, the cascade pipeline tends to accumulate errors because of duplicated and inaccurate box estimations and messy 3D scenes. In this paper, we first propose Vote2Cap-DETR, a simple-yet-effective transformer framework that decouples the decoding process of caption generation and object localization through parallel decoding...

38607711

April 12, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#14

JOURNAL ARTICLE

Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition.

Yukun Zuo, Hantao Yao, Liansheng Zhuang, Changsheng Xu

Audio-visual video recognition (AVVR) aims to integrate audio and visual clues to categorize videos accurately. While existing methods train AVVR models using provided datasets and achieve satisfactory results, they struggle to retain historical class knowledge when confronted with new classes in real-world situations. Currently, there are no dedicated methods for addressing this problem, so this paper concentrates on exploring Class Incremental Audio-Visual Video Recognition (CIAVVR). For CIAVVR, since both stored data and learned model of past classes contain historical knowledge, the core challenge is how to capture past data knowledge and past model knowledge to prevent catastrophic forgetting...

38607710

April 12, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#15

JOURNAL ARTICLE

XGrad: Boosting Gradient-Based Optimizers With Weight Prediction.

Lei Guan, Dongsheng Li, Yanqi Shi, Jian Meng

In this paper, we propose a general deep learning training framework, XGrad, which introduces weight prediction into popular gradient-based optimizers to boost their convergence and generalization when training deep neural network (DNN) models. In particular, ahead of each mini-batch training, the future weights are predicted according to the update rule of the used optimizer and are then applied to both the forward pass and backward propagation. In this way, during the whole training period, the optimizer always utilizes the gradients w...

38602857

April 11, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
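
The weight-prediction idea in the abstract above can be sketched for SGD with momentum on a toy one-dimensional problem. The prediction rule used here (step ahead along the cached momentum, scaled by a `lookahead` factor) is an assumption made for illustration, since the truncated abstract does not give the exact formula; the objective and hyperparameters are also hypothetical.

```python
def grad(w):
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def sgdm_with_weight_prediction(w, steps=200, lr=0.05, momentum=0.9, lookahead=1):
    """SGD with momentum where gradients are evaluated at predicted weights."""
    velocity = 0.0
    for _ in range(steps):
        # Predict the future weights from the optimizer's own update rule
        # (assumed form: reuse the cached velocity for `lookahead` steps).
        w_pred = w - lr * lookahead * velocity
        # Forward and backward passes both use the predicted weights.
        g = grad(w_pred)
        # The update itself is applied to the real weights.
        velocity = momentum * velocity + g
        w = w - lr * velocity
    return w


w_final = sgdm_with_weight_prediction(0.0)
```

On this quadratic the iterate converges to the minimizer at 3.0; the sketch shows only the mechanics and says nothing about the convergence or generalization gains reported in the paper.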

#16

JOURNAL ARTICLE

Efficient and Robust Point Cloud Registration via Heuristics-guided Parameter Search.

Tianyu Huang, Haoang Li, Liangzu Peng, Yinlong Liu, Yun-Hui Liu

Estimating the rigid transformation with 6 degrees of freedom based on a putative 3D correspondence set is a crucial procedure in point cloud registration. Existing correspondence identification methods usually lead to large outlier ratios (> 95% is common), underscoring the significance of robust registration methods. Many researchers turn to parameter search-based strategies (e.g., Branch-and-Bound) for robust registration. Although related methods show high robustness, their efficiency is limited by the high-dimensional search space...

38602856

April 11, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#17

JOURNAL ARTICLE

On the Consistency and Large-Scale Extension of Multiple Kernel Clustering.

Weixuan Liang, Chang Tang, Xinwang Liu, Yong Liu, Jiyuan Liu, En Zhu, Kunlun He

Existing multiple kernel clustering (MKC) algorithms have two ubiquitous problems. From the theoretical perspective, most MKC algorithms lack sufficient theoretical analysis, especially the consistency of learned parameters, such as the kernel weights. From the practical perspective, the high complexity makes MKC unable to handle large-scale datasets. This paper tries to address the above two issues. We first make a consistency analysis of an influential MKC method named Simple Multiple Kernel k-Means (SimpleMKKM)...

38602855

April 11, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#18

JOURNAL ARTICLE

PERF: Panoramic Neural Radiance Field from a Single Panorama.

Guangcong Wang, Peng Wang, Zhaoxi Chen, Wenping Wang, Chen Change Loy, Ziwei Liu

Neural Radiance Field (NeRF) has achieved substantial progress in novel view synthesis given multi-view images. Recently, some works have attempted to train a NeRF from a single image with 3D priors. They mainly focus on a limited field of view with a few occlusions, which greatly limits their scalability to real-world 360-degree panoramic scenarios with large-size occlusions. In this paper, we present PERF, a 360-degree novel view synthesis framework that trains a panoramic neural radiance field from a single panorama...

38598389

April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#19

JOURNAL ARTICLE

Hybrid Open-set Segmentation with Synthetic Negative Data.

Matej Grcic, Sinisa Segvic

Open-set segmentation can be conceived by complementing closed-set classification with anomaly detection. Many of the existing dense anomaly detectors operate through generative modelling of regular data or by discriminating with respect to negative data. These two approaches optimize different objectives and therefore exhibit different failure modes. Consequently, we propose a novel anomaly score that fuses generative and discriminative cues. Our score can be implemented by upgrading any closed-set segmentation model with dense estimates of dataset posterior and unnormalized data likelihood...

38598388

April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#20

JOURNAL ARTICLE

Bayesian optimization for sparse neural networks with trainable activation functions.

Mohamed Fakhfakh, Lotfi Chaari

In the literature on deep neural networks, there is considerable interest in developing activation functions that can enhance neural network performance. In recent years, there has been renewed scientific interest in proposing activation functions that can be trained throughout the learning process, as they appear to improve network performance, especially by reducing overfitting. In this paper, we propose a trainable activation function whose parameters need to be estimated. A fully Bayesian model is developed to automatically estimate from the learning data both the model weights and activation function parameters...

38598387

April 10, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
