Please use this identifier to cite or link to this item:
http://hdl.handle.net/1893/36966
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Khan, Sulaiman | en_UK |
dc.contributor.author | Biswas, Md Rafiul | en_UK |
dc.contributor.author | Murad, Alina | en_UK |
dc.contributor.author | Ali, Hazrat | en_UK |
dc.contributor.author | Shah, Zubair | en_UK |
dc.date.accessioned | 2025-04-03T00:03:01Z | - |
dc.date.available | 2025-04-03T00:03:01Z | - |
dc.date.issued | 2024-08-07 | en_UK |
dc.identifier.uri | http://hdl.handle.net/1893/36966 | - |
dc.description.abstract | Recent developments in multimodal large language models (MLLMs) have spurred significant interest in their potential applications across various medical imaging domains. On the one hand, there is a temptation to use these generative models to synthesize realistic-looking medical image data, while on the other hand, the ability to identify synthetic image data in a pool of data is also significantly important. In this study, we explore the potential of the Gemini (gemini-1.0-pro-vision-latest) and GPT-4V (gpt-4-vision-preview) models for medical image analysis using two modalities of medical image data. Utilizing synthetic and real imaging data, both Gemini AI and GPT-4V are first used to classify real versus synthetic images, followed by an interpretation and analysis of the input images. Experimental results demonstrate that both Gemini and GPT-4V could perform some interpretation of the input images. In this specific experiment, Gemini was able to perform slightly better than GPT-4V on the classification task. In contrast, responses associated with GPT-4V were mostly generic in nature. Our early investigation presented in this work provides insights into the potential of MLLMs to assist with the classification and interpretation of retinal fundoscopy and lung X-ray images. We also identify key limitations associated with the early investigation study on MLLMs for specialized tasks in medical image analysis. | en_UK |
dc.language.iso | en | en_UK |
dc.publisher | IEEE | en_UK |
dc.relation | Khan S, Biswas MR, Murad A, Ali H & Shah Z (2024) An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging. <i>2024 IEEE International Conference on Information Reuse and Integration for Data Science (IRI)</i>, San Jose, CA, USA, 07.08.2024-09.08.2024. https://doi.org/10.1109/iri62200.2024.00056 | en_UK |
dc.rights.uri | http://www.rioxx.net/licenses/under-embargo-all-rights-reserved | en_UK |
dc.subject | LLM | en_UK |
dc.subject | ChatGPT | en_UK |
dc.subject | Gemini AI | en_UK |
dc.subject | Multimodal data | en_UK |
dc.subject | Retina | en_UK |
dc.subject | Lung | en_UK |
dc.title | An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging | en_UK |
dc.type | Conference Paper | en_UK |
dc.rights.embargodate | 2999-12-31 | en_UK |
dc.identifier.doi | 10.1109/iri62200.2024.00056 | en_UK |
dc.citation.issn | 2835-5776 | en_UK |
dc.citation.publicationstatus | Published | en_UK |
dc.citation.peerreviewed | Refereed | en_UK |
dc.type.status | VoR - Version of Record | en_UK |
dc.author.email | ali.hazrat@stir.ac.uk | en_UK |
dc.citation.conferencedates | 2024-08-07 - 2024-08-09 | en_UK |
dc.citation.conferencelocation | San Jose, CA, USA | en_UK |
dc.citation.conferencename | 2024 IEEE International Conference on Information Reuse and Integration for Data Science (IRI) | en_UK |
dc.citation.date | 08/10/2024 | en_UK |
dc.citation.isbn | 979-8-3503-5118-7 | en_UK |
dc.contributor.affiliation | Hamad Bin Khalifa University | en_UK |
dc.contributor.affiliation | Hamad Bin Khalifa University | en_UK |
dc.contributor.affiliation | Foundation University, Islamabad | en_UK |
dc.contributor.affiliation | Sohar University | en_UK |
dc.contributor.affiliation | Hamad Bin Khalifa University | en_UK |
dc.identifier.wtid | 2069246 | en_UK |
dc.contributor.orcid | 0000-0003-3058-5794 | en_UK |
dc.date.accepted | 2024-06-17 | en_UK |
dcterms.dateAccepted | 2024-06-17 | en_UK |
dc.date.filedepositdate | 2024-11-12 | en_UK |
rioxxterms.apc | not required | en_UK |
rioxxterms.type | Conference Paper/Proceeding/Abstract | en_UK |
rioxxterms.version | VoR | en_UK |
local.rioxx.author | Khan, Sulaiman| | en_UK |
local.rioxx.author | Biswas, Md Rafiul| | en_UK |
local.rioxx.author | Murad, Alina| | en_UK |
local.rioxx.author | Ali, Hazrat|0000-0003-3058-5794 | en_UK |
local.rioxx.author | Shah, Zubair| | en_UK |
local.rioxx.project | Internal Project|University of Stirling|https://isni.org/isni/0000000122484331 | en_UK |
local.rioxx.freetoreaddate | 2024-11-12 | en_UK |
local.rioxx.licence | http://www.rioxx.net/licenses/under-embargo-all-rights-reserved|| | en_UK |
local.rioxx.filename | An_Early_Investigation_into_the_Utility_of_Multimodal_Large_Language_Models_in_Medical_Imaging.pdf | en_UK |
local.rioxx.filecount | 1 | en_UK |
local.rioxx.source | 979-8-3503-5118-7 | en_UK |
Appears in Collections: | Computing Science and Mathematics Conference Papers and Proceedings |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
An_Early_Investigation_into_the_Utility_of_Multimodal_Large_Language_Models_in_Medical_Imaging.pdf | Fulltext - Published Version | 1.15 MB | Adobe PDF | Under Permanent Embargo |
This item is protected by original copyright.
Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/