https://read.qxmd.com/read/38625780/visual-analytics-for-efficient-image-exploration-and-user-guided-image-captioning
#1
JOURNAL ARTICLE
Yiran Li, Junpeng Wang, Prince Aboagye, Chin-Chia Michael Yeh, Yan Zheng, Liang Wang, Wei Zhang, Kwan-Liu Ma
Recent advancements in pre-trained language-image models have ushered in a new era of visual comprehension. Leveraging the power of these models, this paper tackles two issues within the realm of visual analytics: (1) the efficient exploration of large-scale image datasets and identification of data biases within them; (2) the evaluation of image captions and steering of their generation process. On the one hand, by visually examining the captions generated from language-image models for an image dataset, we gain deeper insights into the visual contents, unearthing data biases that may be entrenched within the dataset...
April 16, 2024: IEEE Transactions on Visualization and Computer Graphics
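As a rough illustration of the captioning step this kind of pipeline depends on, the sketch below batch-captions an image folder with the open-source BLIP model via Hugging Face Transformers. The choice of model, file layout, and loop are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch: caption every image in a folder with a pretrained
# language-image model (BLIP here; the paper does not commit to a model).
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

name = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(name)
model = BlipForConditionalGeneration.from_pretrained(name)

def caption_dataset(image_dir: str) -> dict:
    """Map file name -> generated caption; the captions can then be
    browsed or clustered to surface biases in the dataset's contents."""
    captions = {}
    for path in Path(image_dir).glob("*.jpg"):
        image = Image.open(path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=30)
        captions[path.name] = processor.decode(out[0], skip_special_tokens=True)
    return captions
```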
https://read.qxmd.com/read/38611501/application-of-multimodal-transformer-model-in-intelligent-agricultural-disease-detection-and-question-answering-systems
#2
JOURNAL ARTICLE
Yuchun Lu, Xiaoyi Lu, Liping Zheng, Min Sun, Siyu Chen, Baiyan Chen, Tong Wang, Jiming Yang, Chunli Lv
In this study, an innovative approach based on multimodal data and the transformer model was proposed to address challenges in agricultural disease detection and question-answering systems. This method effectively integrates image, text, and sensor data, utilizing deep learning technologies to analyze and process complex agriculture-related issues in depth. The study achieves technical breakthroughs and provides new perspectives and tools for the development of intelligent agriculture. In the task of agricultural disease detection, the proposed method demonstrated outstanding performance, achieving a precision, recall, and accuracy of 0...
March 28, 2024: Plants (Basel, Switzerland)
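The abstract names image, text, and sensor data as inputs to one transformer. A generic fusion pattern consistent with that description (not the authors' architecture; all dimensions are placeholders) looks like this in PyTorch:

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Project each modality to a shared width, treat each as one token,
    and let a Transformer encoder attend across modalities."""
    def __init__(self, img_dim=2048, txt_dim=768, sensor_dim=16,
                 d_model=256, num_classes=10):
        super().__init__()
        self.proj_img = nn.Linear(img_dim, d_model)
        self.proj_txt = nn.Linear(txt_dim, d_model)
        self.proj_sen = nn.Linear(sensor_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)  # e.g. disease classes

    def forward(self, img, txt, sensor):
        tokens = torch.stack([self.proj_img(img), self.proj_txt(txt),
                              self.proj_sen(sensor)], dim=1)  # (B, 3, d)
        return self.head(self.encoder(tokens).mean(dim=1))
```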
https://read.qxmd.com/read/38560155/xrayswingen-automatic-medical-reporting-for-x-ray-exams-with-multimodal-model
#3
JOURNAL ARTICLE
Gilvan Veras Magalhães, Roney L de S Santos, Luis H S Vogado, Anselmo Cardoso de Paiva, Pedro de Alcântara Dos Santos Neto
The importance of radiology in modern medicine is acknowledged for its non-invasive diagnostic capabilities, yet the manual formulation of unstructured medical reports poses time constraints and error risks. This study addresses the common limitation of Artificial Intelligence applications in medical image captioning, which typically focus on classification problems, lacking detailed information about the patient's condition. Despite advancements in AI-generated medical reports that incorporate descriptive details from X-ray images, which are essential for comprehensive reports, the challenge persists...
April 15, 2024: Heliyon
https://read.qxmd.com/read/38547777/hierarchical-medical-image-report-adversarial-generation-with-hybrid-discriminator
#4
JOURNAL ARTICLE
Junsan Zhang, Ming Cheng, Qiaoqiao Cheng, Xiuxuan Shen, Yao Wan, Jie Zhu, Mengxuan Liu
BACKGROUND AND OBJECTIVES: Generating coherent reports from medical images is an important task for reducing doctors' workload. Unlike traditional image captioning tasks, medical image report generation faces more challenges. Current models often fail to characterize some abnormal findings, and some generate reports of low quality. In this study, we propose a model to generate high-quality reports from medical images. METHODS: In this paper, we propose a model called Hybrid Discriminator Generative Adversarial Network (HDGAN), which combines a Generative Adversarial Network (GAN) with Reinforcement Learning (RL)...
March 21, 2024: Artificial Intelligence in Medicine
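The GAN-plus-RL combination in this abstract is commonly wired up by using the discriminator's score as a sequence-level reward. The sketch below shows that generic pattern (a REINFORCE update with a greedy-decode baseline); it is my assumption about the general idea, not HDGAN's exact formulation:

```python
import torch

def reinforce_loss(log_probs: torch.Tensor,
                   disc_scores: torch.Tensor,
                   baseline: torch.Tensor) -> torch.Tensor:
    """log_probs:   (B,) summed token log-probs of sampled reports.
    disc_scores: (B,) discriminator's 'looks real' score per report.
    baseline:    (B,) score of a greedy decode, to reduce variance."""
    advantage = (disc_scores - baseline).detach()  # reward carries no grad
    return -(advantage * log_probs).mean()
```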
https://read.qxmd.com/read/38545917/improved-image-caption-rating-datasets-game-and-model
#5
JOURNAL ARTICLE
Andrew Taylor Scott, Lothar D Narins, Anagha Kulkarni, Mar Castanon, Benjamin Kao, Shasta Ihorn, Yue-Ting Siu, Ilmi Yoon
How well a caption fits an image can be difficult to assess due to the subjective nature of caption quality. What makes a good caption? We investigate this problem by focusing on image-caption ratings and by generating high-quality datasets from human feedback through gamification. We validate the datasets by showing a higher level of inter-rater agreement and by using them to train custom machine learning models to predict new ratings. Our approach outperforms previous metrics: the resulting datasets are more easily learned and are of higher quality than other currently available datasets for image-caption rating...
April 2023: Extended Abstracts on Human Factors in Computing Systems
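A common way to realize the rating model this abstract describes is to regress the human rating from joint image and caption embeddings. The sketch below assumes precomputed embeddings (e.g. from a CLIP-style encoder) and is illustrative rather than the paper's actual model:

```python
import torch
import torch.nn as nn

class CaptionRater(nn.Module):
    """Regress a scalar image-caption fit rating from embeddings."""
    def __init__(self, emb_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * emb_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, img_emb, cap_emb):
        joint = torch.cat([img_emb, cap_emb], dim=-1)
        return self.mlp(joint).squeeze(-1)  # train with MSE against ratings
```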
https://read.qxmd.com/read/38544059/insights-into-object-semantics-leveraging-transformer-networks-for-advanced-image-captioning
#6
JOURNAL ARTICLE
Deema Abdal Hafeth, Stefanos Kollias
Image captioning is a technique used to generate descriptive captions for images. Typically, it involves employing a Convolutional Neural Network (CNN) as the encoder to extract visual features, and a decoder model, often based on Recurrent Neural Networks (RNNs), to generate the captions. Recently, the encoder-decoder architecture has witnessed the widespread adoption of the self-attention mechanism. However, this approach faces certain challenges that require further research. One such challenge is that the extracted visual features do not fully exploit the available image information, primarily due to the absence of semantic concepts...
March 11, 2024: Sensors
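Since the abstract recaps the textbook CNN-encoder / RNN-decoder pipeline, here is a minimal baseline in that mold (a sketch of the generic architecture the abstract describes, not the paper's proposed model):

```python
import torch
import torch.nn as nn
import torchvision.models as models

class Captioner(nn.Module):
    def __init__(self, vocab_size, embed=256, hidden=512):
        super().__init__()
        cnn = models.resnet50(weights="IMAGENET1K_V2")
        self.encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop fc
        self.img_proj = nn.Linear(2048, embed)
        self.word_emb = nn.Embedding(vocab_size, embed)
        self.rnn = nn.LSTM(embed, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, tokens):
        feat = self.encoder(images).flatten(1)      # (B, 2048) visual features
        img_tok = self.img_proj(feat).unsqueeze(1)  # image as the first token
        seq = torch.cat([img_tok, self.word_emb(tokens)], dim=1)
        states, _ = self.rnn(seq)
        return self.out(states)                     # next-token logits
```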
https://read.qxmd.com/read/38539736/style-enhanced-transformer-for-image-captioning-in-construction-scenes
#7
JOURNAL ARTICLE
Kani Song, Linlin Chen, Hengyou Wang
Image captioning is important for improving the intelligence of construction projects and assisting managers in mastering construction site activities. However, there are few image-captioning models for construction scenes at present, and the existing methods do not perform well in complex construction scenes. According to the characteristics of construction scenes, we label a text description dataset based on the MOCS dataset and propose a style-enhanced Transformer for image captioning in construction scenes, simply called SETCAP...
March 1, 2024: Entropy
https://read.qxmd.com/read/38537293/advancing-medical-imaging-with-language-models-featuring-a-spotlight-on-chatgpt
#8
JOURNAL ARTICLE
Mingzhe Hu, Joshua Yuan Qian, Shaoyan Pan, Yuheng Li, Richard L J Qiu, Xiaofeng Yang
This review paper aims to serve as a comprehensive guide and instructional resource for researchers seeking to effectively implement language models in medical imaging research. First, we present the fundamental principles and evolution of language models, dedicating particular attention to large language models. We then review the current literature on how language models are being used to improve medical imaging, emphasizing a range of applications such as image captioning, report generation, report classification, findings extraction, visual question answering systems, interpretable diagnosis, and so on...
March 27, 2024: Physics in Medicine and Biology
https://read.qxmd.com/read/38524308/hollman-facilitations-a-user-friendly-tool-of-supporting-children-with-visual-impairment-and-their-families-in-daily-life
#9
JOURNAL ARTICLE
Tiziana Battistin, Silvia Trentin, Enrica Polato, Maria Eleonora Reffo
The Robert Hollman Foundation (RHF) designed "Hollman Facilitations" (HF), a user-friendly way of supporting children with visual impairment (VI) and their families on a daily basis. This tool consists of specifically designed pictures on simple A4 sheets, which highlight with images and captions the key aspects of these children's everyday lives. Professionals can easily modify Hollman Facilitations to tailor them to the unique developmental needs of each child with VI and to their individual strengths and weaknesses...
June 2024: MethodsX
https://read.qxmd.com/read/38508675/icga-gpt-report-generation-and-question-answering-for-indocyanine-green-angiography-images
#10
JOURNAL ARTICLE
Xiaolan Chen, Weiyi Zhang, Ziwei Zhao, Pusheng Xu, Yingfeng Zheng, Danli Shi, Mingguang He
BACKGROUND: Indocyanine green angiography (ICGA) is vital for diagnosing chorioretinal diseases, but its interpretation and patient communication require extensive expertise and time-consuming efforts. We aim to develop a bilingual ICGA report generation and question-answering (QA) system. METHODS: Our dataset comprised 213 129 ICGA images from 2919 participants. The system comprised two stages: image-text alignment for report generation by a multimodal transformer architecture, and large language model (LLM)-based QA with ICGA text reports and human-input questions...
March 20, 2024: British Journal of Ophthalmology
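The two-stage design in the abstract (report generation, then LLM-based QA over the report) can be orchestrated as below. Both stage interfaces are placeholders of my own; the actual models are not specified here:

```python
def answer_icga_question(image, question, generate_report, ask_llm):
    """generate_report: image -> report text (the multimodal transformer).
    ask_llm:         prompt -> answer text (any chat-style LLM)."""
    report = generate_report(image)          # stage 1: image-to-report
    prompt = (f"ICGA report:\n{report}\n\n"
              f"Question: {question}\n"
              "Answer using only the report above:")
    return ask_llm(prompt)                   # stage 2: report-grounded QA
```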
https://read.qxmd.com/read/38507381/cross-modal-retrieval-with-noisy-correspondence-via-consistency-refining-and-mining
#11
JOURNAL ARTICLE
Xinran Ma, Mouxing Yang, Yunfan Li, Peng Hu, Jiancheng Lv, Xi Peng
The success of existing cross-modal retrieval (CMR) methods relies heavily on the assumption that the annotated cross-modal correspondence is faultless. In practice, however, the correspondence of some pairs is inevitably contaminated during data collection or annotation, leading to the so-called Noisy Correspondence (NC) problem. To alleviate the influence of NC, we propose a novel method termed Consistency REfining And Mining (CREAM) by revealing and exploiting the difference between correspondence and consistency...
March 20, 2024: IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society
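One standard way to act on noisy correspondence, loosely in the spirit of this abstract (not CREAM's actual formulation), is to down-weight pairs whose cross-modal similarity looks inconsistent within a batch:

```python
import torch
import torch.nn.functional as F

def weighted_contrastive_loss(img_emb, txt_emb, tau=0.07):
    """InfoNCE over a batch, with per-pair weights that shrink the
    contribution of likely-mismatched (noisy) pairs."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / tau
    targets = torch.arange(len(img_emb), device=logits.device)
    per_pair = F.cross_entropy(logits, targets, reduction="none")
    # Low-loss pairs look consistent; weight them up, noisy ones down.
    weights = torch.softmax(-per_pair.detach(), dim=0) * len(per_pair)
    return (weights * per_pair).mean()
```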
https://read.qxmd.com/read/38506968/leveraging-the-capabilities-of-ai-novice-neurology-trained-operators-performing-cardiac-pocus-in-patients-with-acute-brain-injury
#12
JOURNAL ARTICLE
Jennifer Mears, Safa Kaleem, Rohan Panchamia, Hooman Kamel, Chris Tam, Richard Thalappillil, Santosh Murthy, Alexander E Merkler, Cenai Zhang, Judy H Ch'ang
BACKGROUND: Cardiac point-of-care ultrasound (cPOCUS) can aid in the diagnosis and treatment of cardiac disorders. Such disorders can arise as complications of acute brain injury, but most neurologic intensive care unit (NICU) providers do not receive formal training in cPOCUS. Caption artificial intelligence (AI) uses a novel deep learning (DL) algorithm to guide novice cPOCUS users in obtaining diagnostic-quality cardiac images. The primary objective of this study was to determine how often NICU providers with minimal cPOCUS experience capture quality images using DL-guided cPOCUS as well as the association between DL-guided cPOCUS and change in management and time to formal echocardiograms in the NICU...
March 20, 2024: Neurocritical Care
https://read.qxmd.com/read/38504017/a-visual-language-foundation-model-for-computational-pathology
#13
JOURNAL ARTICLE
Ming Y Lu, Bowen Chen, Drew F K Williamson, Richard J Chen, Ivy Liang, Tong Ding, Guillaume Jaume, Igor Odintsov, Long Phi Le, Georg Gerber, Anil V Parwani, Andrew Zhang, Faisal Mahmood
The accelerated adoption of digital pathology and advances in deep learning have enabled the development of robust models for various pathology tasks across a diverse array of diseases and patient cohorts. However, model training is often difficult due to label scarcity in the medical domain, and a model's usage is limited by the specific task and disease for which it is trained. Additionally, most models in histopathology leverage only image data, a stark contrast to how humans teach each other and reason about histopathologic entities...
March 2024: Nature Medicine
https://read.qxmd.com/read/38470582/towards-video-anomaly-retrieval-from-video-anomaly-detection-new-benchmarks-and-model
#14
JOURNAL ARTICLE
Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang
Video anomaly detection (VAD) has received increasing attention due to its potential applications. Its current dominant tasks focus on detecting anomalies online, which can be roughly interpreted as binary or multi-class event classification. However, such a setup, which maps complicated anomalous events to single labels, e.g., "vandalism", is superficial, since single labels are insufficient to characterize anomalous events. In reality, users tend to search for a specific video rather than a series of approximate videos...
March 12, 2024: IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society
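Retrieval as framed by the abstract reduces, at inference time, to ranking videos by similarity to a free-text query in a shared embedding space. A minimal sketch (the encoders that produce the embeddings are assumed):

```python
import torch
import torch.nn.functional as F

def retrieve(query_emb: torch.Tensor, video_embs: torch.Tensor, k: int = 5):
    """query_emb: (d,) text embedding; video_embs: (N, d) video embeddings.
    Returns the top-k (index, cosine similarity) pairs."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), video_embs, dim=-1)
    scores, idx = sims.topk(k)
    return list(zip(idx.tolist(), scores.tolist()))
```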
https://read.qxmd.com/read/38446647/bridging-the-cross-modality-semantic-gap-in-visual-question-answering
#15
JOURNAL ARTICLE
Boyue Wang, Yujian Ma, Xiaoyan Li, Junbin Gao, Yongli Hu, Baocai Yin
The objective of visual question answering (VQA) is to adequately comprehend a question and identify relevant contents in an image that can provide an answer. Existing approaches in VQA often combine visual and question features directly to create a unified cross-modality representation for answer inference. However, this kind of approach fails to bridge the semantic gap between visual and text modalities, resulting in a lack of alignment in cross-modality semantics and the inability to match key visual content accurately...
March 6, 2024: IEEE Transactions on Neural Networks and Learning Systems
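A common device for the alignment problem this abstract raises is question-guided cross-attention, where question tokens query image-region features. The sketch below shows that generic mechanism, not this paper's specific design:

```python
import torch.nn as nn

class QuestionGuidedAttention(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, question_tokens, region_feats):
        # Query = question, Key/Value = image regions; the attention
        # weights expose which regions each question token matched.
        attended, weights = self.attn(question_tokens, region_feats, region_feats)
        return attended, weights
```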
https://read.qxmd.com/read/38421845/zeronlg-aligning-and-autoencoding-domains-for-zero-shot-multimodal-and-multilingual-natural-language-generation
#16
JOURNAL ARTICLE
Bang Yang, Fenglin Liu, Yuexian Zou, Xian Wu, Yaowei Wang, David A Clifton
Natural Language Generation (NLG) accepts input data in the form of images, videos, or text and generates corresponding natural language text as output. Existing NLG methods mainly adopt a supervised approach and rely heavily on coupled data-to-text pairs. However, for many targeted scenarios and for non-English languages, sufficient quantities of labeled data are often not available. As a result, it is necessary to collect and label data-text pairs for training, which is both costly and time-consuming. To relax the dependency on labeled data of downstream tasks, we propose an intuitive and effective zero-shot learning framework, ZeroNLG, which can deal with multiple NLG tasks, including image-to-text (image captioning), video-to-text (video captioning), and text-to-text (neural machine translation), across English, Chinese, German, and French within a unified framework...
February 29, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
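Conceptually, the zero-shot trick the abstract describes is to align all encoders into one latent space and train the text decoder only by autoencoding unlabeled text, so that at test time any modality's embedding can be decoded. A schematic sketch under that reading (interfaces assumed, not the released code):

```python
def zero_shot_caption(image, vision_encoder, text_decoder):
    """vision_encoder: image -> vector in the shared latent space.
    text_decoder:   latent vector -> text, trained by text autoencoding."""
    latent = vision_encoder(image)   # no paired image-text data was needed
    return text_decoder(latent)
```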
https://read.qxmd.com/read/38373123/zero-shot-video-grounding-with-pseudo-query-lookup-and-verification
#17
JOURNAL ARTICLE
Yu Lu, Ruijie Quan, Linchao Zhu, Yi Yang
Video grounding, the process of identifying a specific moment in an untrimmed video based on a natural language query, has become a popular topic in video understanding. However, fully supervised learning approaches for video grounding that require large amounts of annotated data can be expensive and time-consuming. Recently, zero-shot video grounding (ZS-VG) methods that leverage pre-trained object detectors and language models to generate pseudo-supervision for training video grounding models have been developed...
February 19, 2024: IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society
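The pseudo-supervision recipe the abstract sketches (pretrained detector plus language model) can be outlined as follows; the component interfaces are placeholders of mine:

```python
def build_pseudo_pairs(video_clips, detect_objects, make_query):
    """detect_objects: clip -> list of object names (pretrained detector).
    make_query:     object names -> pseudo sentence (language model).
    Returns (clip index, pseudo query) pairs usable as training data."""
    pairs = []
    for t, clip in enumerate(video_clips):
        objects = detect_objects(clip)              # e.g. ["person", "bicycle"]
        if objects:
            pairs.append((t, make_query(objects)))  # e.g. "a person on a bicycle"
    return pairs
```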
https://read.qxmd.com/read/38366336/characterizing-anti-vaping-posts-for-effective-communication-on-instagram-using-multimodal-deep-learning
#18
JOURNAL ARTICLE
Zidian Xie, Shijian Deng, Pinxin Liu, Xubin Lou, Chenliang Xu, Dongmei Li
INTRODUCTION: Instagram is a popular social networking platform for sharing photos, with a large proportion of youth and young adult users. We aim to use artificial intelligence to identify key features in anti-vaping Instagram image posts associated with high social media user engagement. AIMS AND METHODS: We collected 8972 anti-vaping Instagram image posts and hand-coded 2200 Instagram images to identify nine image features, such as warning signs and whether a person is shown vaping...
February 15, 2024: Nicotine & Tobacco Research
https://read.qxmd.com/read/38349824/smart-syntax-calibrated-multi-aspect-relation-transformer-for-change-captioning
#19
JOURNAL ARTICLE
Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Qingming Huang
Change captioning aims to describe the semantic change between two similar images. In this process, viewpoint change is the most typical distractor: it introduces pseudo changes in the appearance and position of objects, thereby overwhelming the real change. Besides, since the visual signal of change appears in a local region with weak features, it is difficult for the model to directly translate the learned change features into a sentence. In this paper, we propose a syntax-calibrated multi-aspect relation transformer to learn effective change features under different scenes and build reliable cross-modal alignment between the change features and linguistic words during caption generation...
February 13, 2024: IEEE Transactions on Pattern Analysis and Machine Intelligence
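At its core, change captioning conditions the decoder on features of both images plus some explicit change signal. A toy version of that input construction (an illustration, not the paper's calibrated multi-aspect relations):

```python
import torch

def change_features(feat_before: torch.Tensor,
                    feat_after: torch.Tensor) -> torch.Tensor:
    """Concatenate both images' features with their difference, a crude
    change signal a caption decoder could attend over."""
    diff = feat_after - feat_before
    return torch.cat([feat_before, feat_after, diff], dim=-1)
```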
https://read.qxmd.com/read/38308094/content-analysis-of-oral-mouth-cancer-related-posts-on-instagram
#20
JOURNAL ARTICLE
Omar Al Karadsheh, Alaa Atef, Dua'a Alqaisi, Siraj Zabadi, Yazan Hassona
OBJECTIVE: To examine the content of Instagram posts about oral cancer and assess its usefulness in promoting oral cancer awareness and early detection practices. METHODS: A systematic search of Instagram for posts about oral (mouth) cancer was conducted using the hashtags #oralcancer and #mouthcancer. The posts' usefulness in promoting awareness and early detection was assessed using the early detection usefulness score, and caption readability was assessed using the Flesch-Kincaid readability score...
February 2, 2024: Oral Diseases
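The Flesch-Kincaid score this abstract relies on is a simple closed-form readability formula. Below is the standard grade-level variant with a rough vowel-group syllable counter (the heuristic counter is mine; published tools count syllables more carefully):

```python
import re

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade = 0.39*(words/sentences)
                             + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return 0.39 * n_words / sentences + 11.8 * syllables / n_words - 15.59
```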