Papers with the keyword scene recognition (Page 2)

#21

JOURNAL ARTICLE

Consistency-Aware Anchor Pyramid Network for Crowd Localization.

Xinyan Liu, Guorong Li, Yuankai Qi, Zhenjun Han, Anton van den Hengel, Nicu Sebe, Ming-Hsuan Yang, Qingming Huang

Crowd localization aims to predict the positions of humans in images of crowded scenes. While existing methods have made significant progress, two primary challenges remain: (i) a fixed number of evenly distributed anchors can cause excessive or insufficient predictions across regions in an image with varying crowd densities, and (ii) ranking inconsistency of predictions between the testing and training phases leads to the model being sub-optimal in inference. To address these issues, we propose a Consistency-Aware Anchor Pyramid Network (CAAPN) comprising two key components: an Adaptive Anchor Generator (AAG) and a Localizer with Augmented Matching (LAM)...

38683713

April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
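Challenge (i) above can be made concrete with a toy example. The following is a minimal sketch, assuming a hypothetical 64 x 64 image and an anchor stride of 8 pixels (neither value comes from the paper), of evenly distributed point anchors. It is not the paper's Adaptive Anchor Generator. It only shows that a uniform grid gives every region the same anchor budget, however many people that region actually contains.

```python
def grid_anchors(height, width, stride):
    """Place one anchor point at the centre of every stride x stride cell (row by row)."""
    anchors = []
    for y in range(stride // 2, height, stride):
        for x in range(stride // 2, width, stride):
            anchors.append((x, y))
    return anchors


def anchors_in_region(anchors, x0, y0, x1, y1):
    """Count anchors whose point falls inside [x0, x1) x [y0, y1)."""
    return sum(1 for x, y in anchors if x0 <= x < x1 and y0 <= y < y1)


anchors = grid_anchors(64, 64, 8)  # hypothetical image size and stride
# Suppose the left half holds a dense crowd and the right half is nearly empty:
# both halves still receive exactly the same number of anchors.
print(len(anchors))                               # 64
print(anchors_in_region(anchors, 0, 0, 32, 64))   # 32
print(anchors_in_region(anchors, 32, 0, 64, 64))  # 32
```

A dense region therefore risks too few anchors (missed people) and a sparse region too many (spurious predictions). This is the imbalance the abstract attributes to fixed, evenly distributed anchors.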

#22

JOURNAL ARTICLE

A Versatile Framework for Multi-Scene Person Re-Identification.

Wei-Shi Zheng, Junkai Yan, Yi-Xing Peng

Person Re-identification (ReID) has been extensively developed over the past decade to learn the association of images of the same person across non-overlapping camera views. To overcome significant variations between images across camera views, numerous ReID model variants have been developed to solve a number of challenges, such as resolution change, clothing change, occlusion, modality change, and so on. Despite the impressive performance of many ReID variants, these variants typically function distinctly and cannot be applied to other challenges...

38683711

April 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
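ReID is commonly evaluated as a retrieval problem. An image from one camera (the query) is embedded, and images from other cameras (the gallery) are ranked by similarity to it. The sketch below assumes the embeddings have already been produced by some trained model; the 2-D vectors are hypothetical toy values. Cosine similarity is one common choice of metric and is not necessarily the one used by the framework described above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Hypothetical embeddings: one query, three gallery images from other cameras.
query = [1.0, 0.0]
gallery = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
print(rank_gallery(query, gallery))  # [1, 2, 0]
```

The variations the abstract lists (resolution, clothing, occlusion, modality) all act by pushing embeddings of the same person apart, which lowers the true match in this ranking.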

#23

JOURNAL ARTICLE

Adopting Graph Neural Networks to Analyze Human-Object Interactions for Inferring Activities of Daily Living.

Peng Su, Dejiu Chen

Human Activity Recognition (HAR) refers to a field that aims to identify human activities by adopting multiple techniques. In this field, different applications, such as smart homes and assistive robots, are introduced to support individuals in their Activities of Daily Living (ADL) by analyzing data collected from various sensors. Apart from wearable sensors, the adoption of camera frames to analyze and classify ADL has emerged as a promising trend for achieving the identification and classification of ADL...

38676184

April 17, 2024: Sensors

#24

JOURNAL ARTICLE

Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators.

Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Qing Liu, Sohrab Amirghodsi, Yuqian Zhou, Jiebo Luo

Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. Such a limitation is partially due to the lack of semantic-level constraints inside the hole region as well as the lack of a mechanism to enforce realistic object generation. In this work, we propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects...

38669165

April 26, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#25

JOURNAL ARTICLE

Probing the content of affective semantic memory following caregiving-related early adversity.

Anna Vannucci, Andrea Fields, Paul A Bloom, Nicolas L Camacho, Tricia Choy, Amaesha Durazi, Syntia Hadis, Chelsea Harmon, Charlotte Heleniak, Michelle VanTieghem, Mary Dozier, Michael P Milham, Simona Ghetti, Nim Tottenham

Cognitive science has demonstrated that we construct knowledge about the world by abstracting patterns from routinely encountered experiences and storing them as semantic memories. This preregistered study tested the hypothesis that caregiving-related early adversities (crEAs) shape affective semantic memories to reflect the content of those adverse interpersonal-affective experiences. We also tested the hypothesis that because affective semantic memories may continue to evolve in response to later-occurring positive experiences, child-perceived attachment security will inform their content...

38664866

April 25, 2024: Developmental Science

#26

JOURNAL ARTICLE

Accuracy and efficiency stereo matching network with adaptive feature modulation.

Sen Lin, Xinxin Zhuo, Baozhen Qi

Feature enhancement plays a crucial role in improving the quality and discriminative power of features used in matching tasks. By enhancing the informative and invariant aspects of features, the matching process becomes more robust and reliable, enabling accurate predictions even in challenging scenarios, such as occlusion and reflection in stereo matching. In this paper, we propose an end-to-end dual-dimension feature modulation network called DFMNet to address the issue of mismatches in interference areas...

38662662

2024: PloS One

#27

JOURNAL ARTICLE

Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment.

Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua, Shuicheng Yan

While pre-training large-scale video-language models (VLMs) has shown remarkable potential for various downstream video-language tasks, existing VLMs can still suffer from certain commonly seen limitations, e.g., coarse-grained cross-modal aligning, under-modeling of temporal dynamics, detached video-language view. In this work, we target enhancing VLMs with a fine-grained structural spatio-temporal alignment learning method (namely Finsta). First of all, we represent the input texts and videos with fine-grained scene graph (SG) structures, both of which are further unified into a holistic SG (HSG) for bridging two modalities...

38662568

April 25, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#28

JOURNAL ARTICLE

The time course of encoding specific and gist episodic memory representations among young and older adults.

Nathaniel R Greene, Moshe Naveh-Benjamin

How rapidly can we encode the specifics versus the gist of episodic memories? Competing theories have opposing answers, but empirical tests are based primarily on tasks of item memory. Few studies have addressed this question with tasks measuring the binding of event components (e.g., a person and a location), which forms the core of episodic memory. None of these prior studies included older adults, whose episodic memories are less specific in nature. We addressed this critical gap by presenting face-scene pairs (e...

38661636

April 25, 2024: Journal of Experimental Psychology. General

#29

JOURNAL ARTICLE

NeuralRecon: Real-Time Coherent 3D Scene Reconstruction from Monocular Video.

Xi Chen, Jiaming Sun, Yiming Xie, Hujun Bao, Xiaowei Zhou

We present a novel framework named NeuralRecon for real-time 3D scene reconstruction from a monocular video. Unlike previous methods that estimate single-view depth maps separately on each key-frame and fuse them later, we propose to directly reconstruct local surfaces represented as sparse TSDF volumes for each video fragment sequentially by a neural network. A learning-based TSDF fusion module based on gated recurrent units is used to guide the network to fuse features from previous fragments. This design allows the network to capture local smoothness prior and global shape prior of 3D surfaces when sequentially reconstructing the surfaces, resulting in accurate, coherent, and real-time surface reconstruction...

38656855

April 24, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
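For contrast with the learned, GRU-based fusion described above, the sketch below shows the classical non-learned TSDF update: a per-voxel weighted running average of truncated signed distances. This is the traditional rule that learning-based fusion modules aim to improve on, not NeuralRecon's method. The truncation distance, per-observation weight of 1 and weight cap are hypothetical parameters chosen for illustration.

```python
def truncate(sdf, trunc):
    """Scale a signed distance by the truncation band and clamp it to [-1, 1]."""
    return max(-1.0, min(1.0, sdf / trunc))


def fuse_voxel(tsdf, weight, new_sdf, trunc, new_weight=1.0, max_weight=100.0):
    """Classical weighted running-average TSDF update for a single voxel.

    Returns the fused TSDF value and the updated (capped) accumulated weight.
    """
    d = truncate(new_sdf, trunc)
    fused = (weight * tsdf + new_weight * d) / (weight + new_weight)
    return fused, min(weight + new_weight, max_weight)


# One voxel observed in three frames with a hypothetical 4 cm truncation band.
tsdf, weight = 0.0, 0.0
for observed in (0.02, -0.02, 0.02):
    tsdf, weight = fuse_voxel(tsdf, weight, observed, trunc=0.04)
    print(round(tsdf, 4), weight)
```

Each voxel is averaged independently, so this rule carries no notion of local smoothness or global shape. Those are the priors the abstract says the learned fusion module is meant to capture.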

#30

JOURNAL ARTICLE

Exploring the Semantic-Inconsistency Effect in Scenes Using a Continuous Measure of Linguistic-Semantic Similarity.

Claudia Damiano, Maarten Leemans, Johan Wagemans

Viewers use contextual information to visually explore complex scenes. Object recognition is facilitated by exploiting object-scene relations (which objects are expected in a given scene) and object-object relations (which objects are expected because of the occurrence of other objects). Semantically inconsistent objects deviate from these expectations, so they tend to capture viewers' attention (the semantic-inconsistency effect). Some objects fit the identity of a scene more or less than others, yet semantic inconsistencies have hitherto been operationalized as binary (consistent vs...

38652604

April 23, 2024: Psychological Science

#31

JOURNAL ARTICLE

DeepMesh: Differentiable Iso-Surface Extraction.

Benoit Guillard, Edoardo Remelli, Artem Lukoianov, Pierre Yvernay, Stephan R Richter, Timur Bagautdinov, Pierre Baque, Pascal Fua

Geometric Deep Learning has recently made striking progress with the advent of continuous deep implicit fields. They allow for detailed modeling of watertight surfaces of arbitrary topology while not relying on a 3D Euclidean grid, resulting in a learnable parameterization that is unlimited in resolution. Unfortunately, these methods are often unsuitable for applications that require an explicit mesh-based surface representation because converting an implicit field to such a representation relies on the Marching Cubes algorithm, which cannot be differentiated with respect to the underlying implicit field...

38648137

April 22, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
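The per-edge step of Marching Cubes helps explain why the conversion is awkward to differentiate. The sketch below is a minimal version, assuming the common convention that a sample counts as inside when its signed distance is negative. For a fixed edge, the vertex position is a smooth function of the two endpoint samples. However, which edges receive a vertex at all (and therefore the mesh topology) is decided by a discrete sign test. This sketch does not reproduce DeepMesh's differentiable alternative.

```python
def edge_vertex(p0, p1, s0, s1):
    """Zero crossing of a linearly interpolated SDF along the edge p0 -> p1.

    Marching Cubes only places a vertex on edges whose endpoints lie on
    opposite sides of the surface (here: inside means s < 0).
    """
    if (s0 < 0) == (s1 < 0):
        raise ValueError("no sign change on this edge")  # discrete decision
    t = s0 / (s0 - s1)  # denominator is nonzero because the signs differ
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))


# Unit edge along x with hypothetical SDF samples -0.25 and 0.75.
print(edge_vertex((0, 0, 0), (1, 0, 0), -0.25, 0.75))  # (0.25, 0.0, 0.0)
```

A small change to `s0` or `s1` moves the vertex smoothly, but a change that flips a sign adds or removes the vertex entirely. No gradient of the interpolation formula accounts for that jump.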

#32

JOURNAL ARTICLE

A multi-featured expression recognition model incorporating attention mechanism and object detection structure for psychological problem diagnosis.

Xiufeng Zhang, Bingyi Li, Guobin Qi

Expression is the main method for judging the emotional state and psychological condition of the human body, and the prediction of changes in facial expressions can effectively determine the mental health of a person, thus avoiding serious psychological or psychiatric disorders due to early negligence. From a computer vision perspective, most researchers have focused on facial expression analysis, and in some cases, body posture is also considered. However, their performance is more limited under unconstrained natural conditions, which requires more information to be used in human emotion analysis...

38641188

April 18, 2024: Physiology & Behavior

#33

JOURNAL ARTICLE

Macaque claustrum, pulvinar and putative dorsolateral amygdala support the cross-modal association of social audio-visual stimuli based on meaning.

Mathilda Froesel, Maëva Gacoin, Simon Clavagnier, Marc Hauser, Quentin Goudard, Suliann Ben Hamed

Social communication draws on several cognitive functions such as perception, emotion recognition and attention. The association of audio-visual information is essential to the processing of species-specific communication signals. In this study, we use functional magnetic resonance imaging in order to identify the subcortical areas involved in the cross-modal association of visual and auditory information based on their common social meaning. We identified three subcortical regions involved in audio-visual processing of species-specific communicative signals: the dorsolateral amygdala, the claustrum and the pulvinar...

38637993

April 18, 2024: European Journal of Neuroscience

#34

JOURNAL ARTICLE

Beyond visual integration: sensitivity of the temporal-parietal junction for objects, places, and faces.

Johannes Rennig, Christina Langenberger, Hans-Otto Karnath

One important role of the TPJ is the contribution to perception of the global gist in hierarchically organized stimuli where individual elements create a global visual percept. However, the link between clinical findings in simultanagnosia and neuroimaging in healthy subjects is missing for real-world global stimuli, like visual scenes. It is well-known that hierarchical, global stimuli activate TPJ regions and that simultanagnosia patients show deficits during the recognition of hierarchical stimuli and real-world visual scenes...

38637870

April 18, 2024: Behavioral and Brain Functions: BBF

#35

JOURNAL ARTICLE

Mediating sequential turn-on and turn-off fluorescence signals for discriminative detection of Ag+ and Hg2+ via readily available CdSe quantum dots.

Rong Wang, Zi Yi Xu, Ting Li, Nian Bing Li, Hong Qun Luo

Realizing the accurate recognition and quantification of heavy metal ions is pivotal but challenging in the environmental, biological, and physiological science fields. In this work, orange-fluorescence-emitting quantum dots (OQDs) have been facilely synthesized by a one-step method. The participation of silver ion (Ag+) can evoke the unique aggregation-induced emission (AIE) of OQDs, resulting in prominent fluorescence enhancement, which has rarely been reported previously. Moreover, the Ag+-triggered turn-on fluorescence can be continuously shut down by mercury ion (Hg2+)...

38636427

April 13, 2024: Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy

#36

JOURNAL ARTICLE

Multiscale apple recognition method based on improved CenterNet.

Han Zhou

Traditional apple-picking robots are unable to detect apples in real time in complex environments. In order to improve detection efficiency, a fast CenterNet apple recognition method for multiple apple targets in dense scenes is proposed. This method can quickly and accurately identify multiple apple targets in dense scenes. The backbone network mainly consists of a ResNet-44 fully convolutional network, a region proposal network (RPN), and region of interest (ROI). The experimental results show that the improved YoloV5 network model has a higher recognition accuracy of 94...

38633658

April 15, 2024: Heliyon
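In the original CenterNet formulation ("Objects as Points"), each object is predicted as a peak in a center heatmap. Peaks are extracted by keeping cells that equal the maximum of their 3 x 3 neighbourhood, usually implemented with 3 x 3 max pooling, instead of box-level non-maximum suppression. The pure-Python sketch below shows that decoding step. The heatmap values and the 0.3 score threshold are hypothetical, and the sketch says nothing about the specific improvements made in this paper.

```python
def heatmap_peaks(heatmap, threshold=0.3):
    """Return (row, col, score) for cells that are the maximum of their 3x3 neighbourhood.

    Equivalent to comparing the heatmap with its 3x3 max-pooled version;
    equal-valued plateaus can yield more than one peak, as with max pooling.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            if v == max(neighbours):
                peaks.append((r, c, v))
    return sorted(peaks, key=lambda p: p[2], reverse=True)


# Hypothetical 4x4 center heatmap containing two apples.
heat = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.8],
    [0.0, 0.1, 0.4, 0.7],
]
print(heatmap_peaks(heat))  # [(1, 1, 0.9), (2, 3, 0.8)]
```

Because each detection is a single local maximum, closely packed apples in a dense scene remain separable as long as their centers produce distinct peaks.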

#37

JOURNAL ARTICLE

Weakly supervised temporal action localization with actionness-guided false positive suppression.

Zhilin Li, Zilei Wang, Qinying Liu

Weakly supervised temporal action localization aims to locate the temporal boundaries of action instances in untrimmed videos using video-level labels and assign them the corresponding action category. Generally, it is solved by a pipeline called "localization-by-classification", which finds the action instances by classifying video snippets. However, since this approach optimizes the video-level classification objective, the generated activation sequences often suffer interference from class-related scenes, resulting in a large number of false positives in the prediction results...

38626617

April 15, 2024: Neural Networks: the Official Journal of the International Neural Network Society
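The "localization-by-classification" pipeline mentioned above turns a per-snippet class activation sequence into temporal segments. The sketch below uses a simple version of this step: threshold the activations and merge consecutive snippets above the threshold. The threshold and the 1-second snippet length are hypothetical. This is a generic illustration, not the paper's actionness-guided suppression.

```python
def segments_from_activation(scores, threshold, snippet_seconds):
    """Group consecutive snippets with activation >= threshold into (start, end) times in seconds."""
    segments = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # a candidate action instance begins
        elif s < threshold and start is not None:
            segments.append((start * snippet_seconds, i * snippet_seconds))
            start = None
    if start is not None:  # segment runs to the end of the video
        segments.append((start * snippet_seconds, len(scores) * snippet_seconds))
    return segments


# Hypothetical activation sequence for one action class, one score per 1 s snippet.
scores = [0.1, 0.7, 0.8, 0.2, 0.6, 0.9, 0.9]
print(segments_from_activation(scores, threshold=0.5, snippet_seconds=1.0))
# [(1.0, 3.0), (4.0, 7.0)]
```

Nothing in this step distinguishes a snippet that scores high because the action occurs from one that scores high because of class-related scene context. Both pass the threshold alike, which is how the false positives described in the abstract arise.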

#38

JOURNAL ARTICLE

Fast Building Instance Proxy Reconstruction for Large Urban Scenes.

Jianwei Guo, Haobo Qin, Yinchang Zhou, Xin Chen, Liangliang Nan, Hui Huang

Digitalization of large-scale urban scenes (in particular buildings) has been a long-standing open problem, owing to challenges in data acquisition such as incomplete scene coverage, lack of semantics, low efficiency, and low reliability in path planning. In this paper, we address these challenges in urban building reconstruction from aerial images, and we propose an effective workflow and a few novel algorithms for efficient 3D building instance proxy reconstruction for large urban scenes. Specifically, we propose a novel learning-based approach to instance segmentation of urban buildings from aerial images followed by a voting-based algorithm to fuse the multi-view instance information to a sparse point cloud (reconstructed using a standard Structure from Motion pipeline)...

38625775

April 16, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
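A simple form of the voting-based fusion described above is a per-point majority vote. Each 3D point from Structure from Motion is seen in several views and collects the instance label it projects into in each one. The sketch below assumes those per-view labels have already been mapped into one global instance ID space; that association step is a hypothetical precondition that is not shown. This is not the paper's specific algorithm.

```python
from collections import Counter


def vote_instance_labels(observations):
    """Fuse multi-view instance labels per 3D point by majority vote.

    observations: dict mapping point_id -> list of (globally consistent) instance
    labels, one per view in which the point is visible. Ties go to the label seen
    first, following Counter.most_common ordering. Points with no labels are skipped.
    """
    fused = {}
    for point_id, labels in observations.items():
        if labels:
            fused[point_id] = Counter(labels).most_common(1)[0][0]
    return fused


# Hypothetical observations for four sparse points.
obs = {0: [3, 3, 5], 1: [5, 5, 3], 2: [7], 3: []}
print(vote_instance_labels(obs))  # {0: 3, 1: 5, 2: 7}
```

Voting lets a segmentation error in one view be outvoted by the other views in which the same point is visible.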

#39

JOURNAL ARTICLE

Bridging Visual and Textual Semantics: Towards Consistency for Unbiased Scene Graph Generation.

Ruonan Zhang, Gaoyun An, Yiqing Hao, Dapeng Oliver Wu

Scene Graph Generation (SGG) aims to detect visual relationships in an image. However, due to long-tailed bias, SGG is far from practical. Most methods depend heavily on co-occurrence statistics to generate a balanced dataset, so they are dataset-specific and easily affected by noise. The fundamental cause is that SGG is simplified as a classification task instead of a reasoning task, so the ability to capture fine-grained details is limited and ambiguity becomes harder to handle...

38625774

April 16, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence

#40

JOURNAL ARTICLE

Figure-ground segmentation based on motion in the archerfish.

Svetlana Volotsky, Ronen Segev

Figure-ground segmentation is a fundamental process in visual perception that involves separating visual stimuli into distinct meaningful objects and their surrounding context, thus allowing the brain to interpret and understand complex visual scenes. Mammals exhibit varying figure-ground segmentation capabilities, ranging from primates that can perform well on figure-ground segmentation tasks to rodents that perform poorly. To explore figure-ground segmentation capabilities in teleost fish, we studied how the archerfish, an expert visual hunter, performs figure-ground segmentation...

38616235

April 15, 2024: Animal Cognition
