https://read.qxmd.com/read/38300778/emotional-video-captioning-with-vision-based-emotion-interpretation-network
#21
JOURNAL ARTICLE
Peipei Song, Dan Guo, Xun Yang, Shengeng Tang, Meng Wang
Effectively summarizing and re-expressing video content in natural language, in a more human-like fashion, is one of the key topics in the field of multimedia content understanding. Despite good progress in recent years, existing efforts usually overlook the emotions in user-generated videos, making the generated sentences flat and impersonal. To fill this research gap, this paper presents a novel emotional video captioning framework in which a Vision-based Emotion Interpretation Network is designed to effectively capture the emotions conveyed in videos and to describe the visual content in both factual and emotional language...
February 1, 2024: IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society
https://read.qxmd.com/read/38261479/every-problem-every-step-all-in-focus-learning-to-solve-vision-language-problems-with-integrated-attention
#22
JOURNAL ARTICLE
Xianyu Chen, Jinhui Yang, Shi Chen, Louis Wang, Ming Jiang, Qi Zhao
Integrating information from vision and language modalities has sparked interesting applications in the fields of computer vision and natural language processing. Existing methods, though promising in tasks like image captioning and visual question answering, face challenges in understanding real-life issues and offering step-by-step solutions. In particular, they typically limit their scope to solutions with a sequential structure, thus ignoring complex inter-step dependencies. To bridge this gap, we propose a graph-based approach to vision-language problem solving...
January 23, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
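The abstract above argues that flat, sequential solution formats miss inter-step dependencies. As a hedged illustration of why a graph representation helps (not the paper's actual model), the sketch below encodes invented troubleshooting steps as a directed graph with networkx; any ordering that respects the edges is a valid solution sequence.

```python
# A hedged sketch of representing solution steps as a dependency graph
# rather than a flat sequence. The step names are illustrative only.
import networkx as nx

steps = nx.DiGraph()
steps.add_edges_from([
    ("read the error message", "identify the failing component"),
    ("check the cable connection", "identify the failing component"),
    ("identify the failing component", "replace the component"),
])

# A valid step ordering must respect every dependency edge.
print(list(nx.topological_sort(steps)))
```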
https://read.qxmd.com/read/38249011/images-words-and-imagination-accessible-descriptions-to-support-blind-and-low-vision-art-exploration-and-engagement
#23
JOURNAL ARTICLE
Stacy A Doore, David Istrati, Chenchang Xu, Yixuan Qiu, Anais Sarrazin, Nicholas A Giudice
The lack of accessible information conveyed by descriptions of art images presents significant barriers for people with blindness and low vision (BLV) to engage with visual artwork. Most museums cannot easily provide accessible image descriptions that help BLV visitors build a mental representation of artwork, due to the vastness of collections, limitations in curator training, and current measures of what constitutes an effective automated caption. This paper reports the results of two studies investigating the types of information that should be included to produce high-quality accessible artwork descriptions, based on input from BLV description evaluators...
January 18, 2024: Journal of Imaging
https://read.qxmd.com/read/38235175/dataset-of-clinical-cases-images-image-labels-and-captions-from-open-access-case-reports-from-pubmed-central-1990-2023
#24
JOURNAL ARTICLE
Mauro Andrés Nievas Offidani, Claudio Augusto Delrieux
This paper details the acquisition, structure, and preprocessing of the MultiCaRe Dataset, a multimodal case report dataset containing data from 75,382 open-access PubMed Central articles spanning 1990 to 2023. The dataset includes 96,428 clinical cases, 135,596 images, and their corresponding labels and captions. Data extraction was performed using several APIs and packages, including Biopython, requests, BeautifulSoup, the BioC API for PMC, and the Europe PMC RESTful API. Image labels were created from the contents of the corresponding captions, using Spark NLP for Healthcare and manual annotation...
February 2024: Data in Brief
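The abstract above names the Europe PMC RESTful API among its data sources. Below is a minimal sketch of querying that public API for open-access case reports with requests; the query fields and response keys follow the public API documentation as I recall it, so verify them against the live service before relying on this.

```python
# Minimal sketch of querying the Europe PMC RESTful API for open-access
# case reports. Field names (OPEN_ACCESS, PUB_TYPE, resultList.result)
# are assumptions based on the public API docs; verify before use.
import requests

BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def search_case_reports(term: str, page_size: int = 25) -> list[dict]:
    """Return basic metadata for open-access case reports matching `term`."""
    params = {
        "query": f'OPEN_ACCESS:Y AND PUB_TYPE:"Case Reports" AND {term}',
        "format": "json",
        "pageSize": page_size,
    }
    resp = requests.get(BASE, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["resultList"]["result"]

for rec in search_case_reports("pneumonia")[:5]:
    print(rec.get("pmcid"), rec.get("title"))
```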
https://read.qxmd.com/read/38227993/contact-lens-sensor-for-ocular-inflammation-monitoring
#25
JOURNAL ARTICLE
Yuqi Shi, Lin Wang, Yubing Hu, Yihan Zhang, Wenhao Le, Guohui Liu, Michael Tomaschek, Nan Jiang, Ali K Yetisen
Contact lens sensors have emerged as point-of-care devices in recent healthcare developments for monitoring and diagnosing ocular physiological conditions. Fluorescence sensing technologies are widely applied in contact lens sensors due to their accuracy, high sensitivity, and specificity. Because the ascorbic acid (AA) level in tears is closely related to ocular inflammation, a fluorescent contact lens sensor incorporating a BSA-Au nanocluster (NC) probe is developed for in situ tear AA detection. The NCs are first synthesized to obtain a fluorescent probe, which exhibits high reusability through the quench/recover (KMnO4/AA) process...
January 6, 2024: Biosensors & Bioelectronics
https://read.qxmd.com/read/38203152/enhancing-surveillance-systems-integration-of-object-behavior-and-space-information-in-captions-for-advanced-risk-assessment
#26
JOURNAL ARTICLE
Minseong Jeon, Jaepil Ko, Kyungjoo Cheoi
This paper presents a novel approach to risk assessment that incorporates image captioning as a fundamental component to enhance the effectiveness of surveillance systems. The proposed system uses image captioning to generate descriptive captions that portray the relationships among objects, actions, and spatial elements within the observed scene, and then evaluates the risk level based on the content of these captions. After defining the risk levels to be detected in the surveillance system, we constructed a dataset consisting of [Image-Caption-Danger Score]...
January 3, 2024: Sensors
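To make the caption-to-risk step above concrete, here is a deliberately simple keyword-lookup sketch; the keyword table and scores are illustrative assumptions, not the paper's learned [Image-Caption-Danger Score] mapping.

```python
# A hedged sketch of scoring a generated caption by the riskiest keyword
# it contains. Keywords and scores are invented for illustration.
RISK_KEYWORDS = {
    "fighting": 3, "weapon": 3, "falling": 2, "running": 1, "walking": 0,
}

def risk_score(caption: str) -> int:
    words = caption.lower().split()
    return max((RISK_KEYWORDS.get(w, 0) for w in words), default=0)

print(risk_score("two people fighting near the entrance"))  # -> 3
```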
https://read.qxmd.com/read/38190676/user-unified-semantic-enhancement-with-momentum-contrast-for-image-text-retrieval
#27
JOURNAL ARTICLE
Yan Zhang, Zhong Ji, Di Wang, Yanwei Pang, Xuelong Li
As a fundamental and challenging task in bridging the language and vision domains, Image-Text Retrieval (ITR) aims to search for target instances that are semantically relevant to a given query from the other modality; its key challenge is measuring semantic similarity across modalities. Although significant progress has been achieved, existing approaches typically suffer from two major limitations: (1) directly exploiting bottom-up-attention-based region-level features, in which every region is treated equally, hurts the accuracy of the representation...
January 5, 2024: IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society
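The key challenge named in the abstract above, measuring semantic similarity across modalities, usually reduces to cosine similarity in a shared embedding space. The sketch below uses random vectors as stand-ins for the paper's learned encoders, so only the retrieval mechanics are real.

```python
# A minimal sketch of cross-modal retrieval by cosine similarity in a
# shared embedding space. Random projections stand in for real encoders.
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Pretend embeddings: 4 images and 4 captions in a shared 256-d space.
image_emb = l2_normalize(rng.normal(size=(4, 256)))
text_emb = l2_normalize(rng.normal(size=(4, 256)))

# Cosine similarity matrix: sim[i, j] = <image_i, text_j>.
sim = image_emb @ text_emb.T

# Text-to-image retrieval: for each caption, rank images by similarity.
ranks = np.argsort(-sim.T, axis=1)
print(ranks[0])  # image indices, best match first, for caption 0
```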
https://read.qxmd.com/read/38124859/retracted-medical-image-captioning-using-optimized-deep-learning-model
#28
Computational Intelligence and Neuroscience
[This retracts the article DOI: 10.1155/2022/9638438.]
2023: Computational Intelligence and Neuroscience
https://read.qxmd.com/read/38109234/enhancing-visual-grounding-in-vision-language-pre-training-with-position-guided-text-prompts
#29
JOURNAL ARTICLE
Alex Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan
Vision-Language Pre-Training (VLP) has demonstrated remarkable potential in aligning image and text pairs, paving the way for a wide range of cross-modal learning tasks. Nevertheless, we have observed that VLP models often fall short in terms of visual grounding and localization capabilities, which are crucial for many downstream tasks, such as visual reasoning. In response, we introduce a novel Position-guided Text Prompt (PTP) paradigm to bolster the visual grounding abilities of cross-modal models trained with VLP...
December 18, 2023: IEEE Transactions on Pattern Analysis and Machine Intelligence
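As a rough illustration of the position-guided prompt idea above, the sketch below partitions an image into a grid and phrases an object's location as a fill-in-the-blank sentence; the template string and grid size are assumptions, not necessarily the paper's exact design.

```python
# A hedged sketch of position-guided text prompts: map an object's pixel
# location to a grid block and verbalize it. Template and grid size are
# illustrative assumptions.
def position_prompt(block_id: int, obj: str) -> str:
    return f"The block {block_id} has a {obj}."

def block_of(x: float, y: float, w: int, h: int, grid: int = 3) -> int:
    """Map a pixel coordinate to a block index in a grid x grid partition."""
    col = min(int(x / w * grid), grid - 1)
    row = min(int(y / h * grid), grid - 1)
    return row * grid + col

# An object detected at pixel (400, 120) in a 640x480 image:
bid = block_of(400, 120, w=640, h=480)
print(position_prompt(bid, "dog"))  # -> "The block 1 has a dog."
```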
https://read.qxmd.com/read/38083226/image-captioning-for-the-visually-impaired-and-blind-a-recipe-for-low-resource-languages
#30
JOURNAL ARTICLE
Batyr Arystanbekov, Askat Kuzdeuov, Shakhizat Nurgaliyev, Huseyin Atakan Varol
Visually impaired and blind people often face a range of socioeconomic problems that can make it difficult for them to live independently and participate fully in society. Advances in machine learning open new avenues for implementing assistive devices for the visually impaired and blind. In this work, we combined image captioning and text-to-speech technologies to create an assistive device for the visually impaired and blind. Our system provides the user with descriptive auditory feedback, in the Kazakh language, on a scene acquired in real time by a head-mounted camera...
July 2023: Annual International Conference of the IEEE Engineering in Medicine and Biology Society
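The pipeline described above reduces to two stages: caption a camera frame, then speak the caption. A minimal sketch follows; caption_frame is a hypothetical placeholder for the captioning model, and pyttsx3 stands in for the paper's Kazakh text-to-speech component.

```python
# A minimal caption-then-speak sketch. caption_frame is a hypothetical
# stand-in for a captioning model; pyttsx3 is an offline TTS engine used
# here only as a generic substitute for the paper's Kazakh TTS.
import pyttsx3

def caption_frame(frame) -> str:
    """Placeholder for an image-captioning model inference call."""
    return "a person is standing near a doorway"

def describe_aloud(frame) -> None:
    caption = caption_frame(frame)
    engine = pyttsx3.init()
    engine.say(caption)
    engine.runAndWait()
```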
https://read.qxmd.com/read/38062184/refcap-image-captioning-with-referent-objects-attributes
#31
JOURNAL ARTICLE
Seokmok Park, Joonki Paik
In recent years, significant progress has been made in visual-linguistic multimodality research, leading to advancements in visual comprehension and its applications in computer vision tasks. One fundamental task in visual-linguistic understanding is image captioning: generating a human-understandable textual description of an input image. This paper introduces a referring-expression image captioning model that incorporates supervision from objects of interest. Our model uses user-specified object keywords as a prefix to generate captions specific to the target object...
December 7, 2023: Scientific Reports
https://read.qxmd.com/read/38048244/protoclip-prototypical-contrastive-language-image-pretraining
#32
JOURNAL ARTICLE
Delong Chen, Zhao Wu, Fan Liu, Zaiquan Yang, Shaoqiu Zheng, Ying Tan, Erjin Zhou
Contrastive language image pretraining (CLIP) has received widespread attention since its learned representations transfer well to various downstream tasks. During training of the CLIP model, the InfoNCE objective aligns positive image-text pairs and separates negative ones. We show an underlying representation-grouping effect during this process: the InfoNCE objective indirectly groups semantically similar representations together via randomly emerging within-modal anchors. Based on this understanding, this article introduces prototypical contrastive language image pretraining (ProtoCLIP), which enhances such grouping by boosting its efficiency and increasing its robustness to the modality gap...
December 4, 2023: IEEE Transactions on Neural Networks and Learning Systems
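For readers unfamiliar with the InfoNCE objective the abstract builds on, here is a minimal PyTorch sketch: matched image-text pairs form the diagonal of a batch similarity matrix and are pulled together, while off-diagonal pairs are pushed apart. This is the standard CLIP-style loss, not ProtoCLIP's prototype extension.

```python
# A minimal sketch of the symmetric InfoNCE loss used in CLIP-style
# pretraining: pair i's image and text are positives; all other batch
# pairs are negatives.
import torch
import torch.nn.functional as F

def info_nce(image_emb: torch.Tensor, text_emb: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0))           # pair i matches pair i
    # Symmetric cross-entropy over both retrieval directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = info_nce(torch.randn(8, 512), torch.randn(8, 512))
```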
https://read.qxmd.com/read/38042601/radiology-report-generation-with-medical-knowledge-and-multilevel-image-report-alignment-a-new-method-and-its-verification
#33
JOURNAL ARTICLE
Guosheng Zhao, Zijian Zhao, Wuxian Gong, Feng Li
Medical report generation is an integral part of computer-aided diagnosis, aimed at reducing the workload of radiologists and physicians and alerting them to misdiagnosis risks. In general, medical report generation is an image captioning task. Because medical reports are long sequences with data bias, existing medical report generation models lack medical knowledge and ignore the interactive alignment between the two modalities of reports and images. This paper attempts to mitigate these deficiencies by proposing an approach based on knowledge enhancement with multilevel alignment (MKMIA)...
December 2023: Artificial Intelligence in Medicine
https://read.qxmd.com/read/38035197/exsclaim-harnessing-materials-science-literature-for-self-labeled-microscopy-datasets
#34
JOURNAL ARTICLE
Eric Schwenker, Weixin Jiang, Trevor Spreadbury, Nicola Ferrier, Oliver Cossairt, Maria K Y Chan
This work introduces the EXSCLAIM! toolkit for the automatic extraction, separation, and caption-based natural language annotation of images from the scientific literature. EXSCLAIM! is used to show how rule-based natural language processing and image recognition can be leveraged to construct an electron microscopy dataset containing thousands of keyword-annotated nanostructure images. Moreover, it is demonstrated how a combination of statistical topic modeling and semantic word-similarity comparisons can increase the number and variety of keyword annotations beyond the standard annotations from EXSCLAIM! With large-scale imaging datasets constructed from the scientific literature, users are well positioned to train neural networks for classification and recognition tasks specific to microscopy, tasks often otherwise inhibited by a lack of sufficient annotated training data...
November 10, 2023: Patterns
https://read.qxmd.com/read/38034836/dense-captioning-and-multidimensional-evaluations-for-indoor-robotic-scenes
#35
JOURNAL ARTICLE
Hua Wang, Wenshuai Wang, Wenhao Li, Hong Liu
The field of human-computer interaction is expanding, especially within the domain of intelligent technologies. Scene understanding, which entails the generation of advanced semantic descriptions from scene content, is crucial for effective interaction. Despite its importance, it remains a significant challenge. This study introduces RGBD2Cap, an innovative method that uses RGBD images for scene semantic description. We utilize a multimodal fusion module to integrate RGB and Depth information for extracting multi-level features...
2023: Frontiers in Neurorobotics
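The multimodal fusion module mentioned above is not specified in this excerpt; the sketch below shows the generic pattern such modules follow, channel-concatenating RGB and depth feature maps and mixing them with a 1x1 convolution. Treat the layer sizes as assumptions, not RGBD2Cap's actual architecture.

```python
# A hedged sketch of a generic RGB-D fusion module: concatenate the two
# feature maps along channels and mix with a 1x1 convolution. Channel
# counts are illustrative assumptions.
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    def __init__(self, rgb_ch: int, depth_ch: int, out_ch: int):
        super().__init__()
        self.mix = nn.Conv2d(rgb_ch + depth_ch, out_ch, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor):
        fused = torch.cat([rgb_feat, depth_feat], dim=1)  # channel concat
        return torch.relu(self.mix(fused))

fusion = ConcatFusion(rgb_ch=256, depth_ch=64, out_ch=256)
out = fusion(torch.randn(1, 256, 14, 14), torch.randn(1, 64, 14, 14))
```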
https://read.qxmd.com/read/38018504/the-readability-of-patient-facing-social-media-posts-on-common-otolaryngologic-diagnoses
#36
JOURNAL ARTICLE
Elliot Morse, Eseosa Odigie, Helen Gillespie, Anaïs Rameau
OBJECTIVE: To assess the readability of patient-facing educational information about the most common otolaryngology diagnoses on popular social media platforms.
STUDY DESIGN: Cross-sectional study.
SETTING: Social media platforms.
METHODS: The top 5 otolaryngologic diagnoses were identified from the National Ambulatory Medical Care Survey database. Facebook, Twitter, TikTok, and Instagram were searched using these terms, and the top 25 patient-facing posts from unique accounts for each search term and poster type (otolaryngologist, other medical professional, layperson) were identified...
November 29, 2023: Otolaryngology—Head and Neck Surgery
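Readability of text like these posts is typically scored with grade-level formulas. A minimal sketch using the textstat package follows; the excerpt does not say which metrics the authors used, and the sample post is invented.

```python
# A minimal readability-scoring sketch with the textstat package. The
# sample post is invented; the study's actual metrics are not stated in
# this excerpt.
import textstat

post = ("Tonsillitis is an inflammation of the tonsils, usually caused by "
        "a viral or bacterial infection, and often improves on its own.")

print("Flesch Reading Ease:", textstat.flesch_reading_ease(post))
print("Flesch-Kincaid Grade:", textstat.flesch_kincaid_grade(post))
```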
https://read.qxmd.com/read/37952385/self-supervised-multi-modal-training-from-uncurated-images-and-reports-enables-monitoring-ai-in-radiology
#37
JOURNAL ARTICLE
Sangjoon Park, Eun Sun Lee, Kyung Sook Shin, Jeong Eun Lee, Jong Chul Ye
The escalating demand for artificial intelligence (AI) systems that can monitor and supervise human errors and abnormalities in healthcare presents unique challenges. Recent advances in vision-language models reveal the challenges of monitoring AI by understanding both visual and textual concepts and their semantic correspondences. However, there has been limited success in applying vision-language models in the medical domain. Current vision-language models and learning strategies for photographic images and captions call for a web-scale corpus of image-text pairs, which is often not feasible in the medical domain...
November 7, 2023: Medical Image Analysis
https://read.qxmd.com/read/37935806/your-smartphone-could-act-as-a-pulse-oximeter-and-as-a-single-lead-ecg
#38
JOURNAL ARTICLE
Ahsan Mehmood, Asma Sarouji, M Mahboob Ur Rahman, Tareq Y Al-Naffouri
In the post-COVID-19 era, every new wave of the pandemic heightens public concern and interest in learning more about one's state of well-being. It is therefore timely to develop ubiquitous, low-cost, non-invasive tools for rapid and continuous monitoring of the body vitals that reflect overall health. Against this backdrop, this work proposes a deep learning approach to turn a smartphone, the popular hand-held personal gadget, into a diagnostic tool that measures and monitors the three most important body vitals, i...
November 6, 2023: Scientific Reports
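The excerpt does not detail the paper's deep learning pipeline, but the classical signal-processing core of camera-based pulse estimation is easy to sketch: the mean red-channel intensity of fingertip video frames forms a photoplethysmogram (PPG) whose dominant frequency gives the pulse rate. The signal below is synthetic and the fixed frame rate is an assumption.

```python
# A hedged sketch of pulse estimation from a camera-derived PPG signal:
# find the dominant frequency in the physiologically plausible band.
import numpy as np

def heart_rate_bpm(red_means: np.ndarray, fps: float) -> float:
    """Estimate pulse rate from per-frame mean red-channel values."""
    signal = red_means - red_means.mean()          # remove DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)         # 42-240 bpm band
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak * 60.0

# Synthetic 30 fps, 10 s recording with a 1.2 Hz (72 bpm) pulse component.
t = np.arange(300) / 30.0
print(heart_rate_bpm(100 + np.sin(2 * np.pi * 1.2 * t), fps=30.0))  # 72.0
```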
https://read.qxmd.com/read/37935698/deeppatent2-a-large-scale-benchmarking-corpus-for-technical-drawing-understanding
#39
JOURNAL ARTICLE
Kehinde Ajayi, Xin Wei, Martin Gryder, Winston Shields, Jian Wu, Shawn M Jones, Michal Kucer, Diane Oyen
Recent advances in computer vision (CV) and natural language processing have been driven by exploiting big data in practical applications. However, these research fields are still limited by the sheer volume, versatility, and diversity of the available datasets. CV tasks such as image captioning, which has primarily been carried out on natural images, still struggle to produce accurate and meaningful captions for the sketched images often included in scientific and technical documents. The advancement of other tasks, such as 3D reconstruction from 2D images, requires larger datasets with multiple viewpoints...
November 7, 2023: Scientific Data
https://read.qxmd.com/read/37930907/image-captioning-with-controllable-and-adaptive-length-levels
#40
JOURNAL ARTICLE
Ning Ding, Chaorui Deng, Mingkui Tan, Qing Du, Zhiwei Ge, Qi Wu
Image captioning is one of the fundamental problems of computer vision and has drawn great attention over the years. However, most existing image captioning methods focus on improving the quality of the captions while ignoring control over caption style. In this work, we aim to improve the controllability of image captioning methods, specifically the ability to describe an image either roughly or in detail. We find this can be achieved by adding a simple length-level embedding to existing models, which enables them to generate length-controllable captions describing the image at a specified level of detail, and further improves diversity...
November 6, 2023: IEEE Transactions on Pattern Analysis and Machine Intelligence
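The abstract's central mechanism, a length-level embedding added to an existing captioner, can be sketched in a few lines of PyTorch. The dimensions, vocabulary size, and number of levels below are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch of conditioning a caption decoder on a learned
# "length level": one level vector is broadcast over every token input.
# All sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class LengthConditionedEmbedding(nn.Module):
    def __init__(self, vocab_size: int = 10000, d_model: int = 512,
                 num_length_levels: int = 4):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.length_level = nn.Embedding(num_length_levels, d_model)

    def forward(self, token_ids: torch.Tensor, level: int) -> torch.Tensor:
        # Broadcast one length-level vector over every caption position.
        lvl = self.length_level(torch.tensor(level))
        return self.tok(token_ids) + lvl

emb = LengthConditionedEmbedding()
x = emb(torch.randint(0, 10000, (1, 12)), level=3)  # level 3 = most detailed
```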