Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/36697
Full metadata record
DC Field	Value	Language
dc.contributor.author	Abd-alrazaq, Alaa	en_UK
dc.contributor.author	Nashwan, Abdulqadir J	en_UK
dc.contributor.author	Shah, Zubair	en_UK
dc.contributor.author	Abujaber, Ahmad	en_UK
dc.contributor.author	Alhuwail, Dari	en_UK
dc.contributor.author	Schneider, Jens	en_UK
dc.contributor.author	AlSaad, Rawan	en_UK
dc.contributor.author	Ali, Hazrat	en_UK
dc.contributor.author	Alomoush, Waleed	en_UK
dc.contributor.author	Ahmed, Arfan	en_UK
dc.contributor.author	Aziz, Sarah	en_UK
dc.date.accessioned	2025-03-08T01:11:53Z	-
dc.date.available	2025-03-08T01:11:53Z	-
dc.date.issued	2024-03-05	en_UK
dc.identifier.other	e49411	en_UK
dc.identifier.uri	http://hdl.handle.net/1893/36697	-
dc.description.abstract	Background: Research gaps refer to unanswered questions in the existing body of knowledge, either due to a lack of studies or inconclusive results. Research gaps are essential starting points and motivation in scientific research. Traditional methods for identifying research gaps, such as literature reviews and expert opinions, can be time consuming, labor intensive, and prone to bias. They may also fall short when dealing with rapidly evolving or time-sensitive subjects. Thus, innovative scalable approaches are needed to identify research gaps, systematically assess the literature, and prioritize areas for further study in the topic of interest. Objective: In this paper, we propose a machine learning–based approach for identifying research gaps through the analysis of scientific literature. We used the COVID-19 pandemic as a case study. Methods: We conducted an analysis to identify research gaps in COVID-19 literature using the COVID-19 Open Research (CORD-19) data set, which comprises 1,121,433 papers related to the COVID-19 pandemic. Our approach is based on the BERTopic topic modeling technique, which leverages transformers and class-based term frequency-inverse document frequency to create dense clusters allowing for easily interpretable topics. Our BERTopic-based approach involves 3 stages: embedding documents, clustering documents (dimension reduction and clustering), and representing topics (generating candidates and maximizing candidate relevance). Results: After applying the study selection criteria, we included 33,206 abstracts in the analysis of this study. The final list of research gaps identified 21 different areas, which were grouped into 6 principal topics. These topics were: “virus of COVID-19,” “risk factors of COVID-19,” “prevention of COVID-19,” “treatment of COVID-19,” “health care delivery during COVID-19,” and “impact of COVID-19.” The most prominent topic, observed in over half of the analyzed studies, was “the impact of COVID-19.” Conclusions: The proposed machine learning–based approach has the potential to identify research gaps in scientific literature. This study is not intended to replace individual literature research within a selected topic. Instead, it can serve as a guide to formulate precise literature search queries in specific areas associated with research questions that previous publications have earmarked for future exploration. Future research should leverage an up-to-date list of studies that are retrieved from the most common databases in the target area. When feasible, full texts or, at minimum, discussion sections should be analyzed rather than limiting their analysis to abstracts. Furthermore, future studies could evaluate more efficient modeling algorithms, especially those combining topic modeling with statistical uncertainty quantification, such as conformal prediction.	en_UK
dc.language.iso	en	en_UK
dc.publisher	JMIR Publications Inc.	en_UK
dc.relation	Abd-alrazaq A, Nashwan AJ, Shah Z, Abujaber A, Alhuwail D, Schneider J, AlSaad R, Ali H, Alomoush W, Ahmed A & Aziz S (2024) Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study. JMIR Formative Research, 8, Art. No.: e49411. https://doi.org/10.2196/49411	en_UK
dc.rights	This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.	en_UK
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	en_UK
dc.subject	research gaps	en_UK
dc.subject	research gap	en_UK
dc.subject	research topic	en_UK
dc.subject	research topics	en_UK
dc.subject	scientific literature	en_UK
dc.subject	literature review	en_UK
dc.subject	machine learning	en_UK
dc.subject	COVID-19	en_UK
dc.subject	BERTopic	en_UK
dc.subject	topic clustering	en_UK
dc.subject	text analysis	en_UK
dc.subject	BERT	en_UK
dc.subject	NLP	en_UK
dc.subject	natural language processing	en_UK
dc.subject	review methods	en_UK
dc.subject	review methodology	en_UK
dc.subject	SARS-CoV-2	en_UK
dc.subject	coronavirus	en_UK
dc.subject	COVID	en_UK
dc.title	Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study	en_UK
dc.type	Journal Article	en_UK
dc.identifier.doi	10.2196/49411	en_UK
dc.identifier.pmid	38441952	en_UK
dc.citation.jtitle	JMIR Formative Research	en_UK
dc.citation.issn	2561-326X	en_UK
dc.citation.volume	8	en_UK
dc.citation.publicationstatus	Published	en_UK
dc.citation.peerreviewed	Refereed	en_UK
dc.type.status	VoR - Version of Record	en_UK
dc.author.email	ali.hazrat@stir.ac.uk	en_UK
dc.citation.date	05/03/2024	en_UK
dc.contributor.affiliation	Weill Cornell Medicine	en_UK
dc.contributor.affiliation	Hamad Medical Corporation	en_UK
dc.contributor.affiliation	Hamad Bin Khalifa University	en_UK
dc.contributor.affiliation	Hamad Medical Corporation	en_UK
dc.contributor.affiliation	Kuwait University	en_UK
dc.contributor.affiliation	Hamad Bin Khalifa University	en_UK
dc.contributor.affiliation	Weill Cornell Medicine	en_UK
dc.contributor.affiliation	Sohar University	en_UK
dc.contributor.affiliation	Weill Cornell Medicine	en_UK
dc.contributor.affiliation	Weill Cornell Medicine	en_UK
dc.identifier.isi	WOS:001183499200002	en_UK
dc.identifier.scopusid	2-s2.0-85191195535	en_UK
dc.identifier.wtid	2074200	en_UK
dc.contributor.orcid	0000-0001-7695-4626	en_UK
dc.contributor.orcid	0000-0003-4845-4119	en_UK
dc.contributor.orcid	0000-0001-7389-3274	en_UK
dc.contributor.orcid	0000-0002-8704-4991	en_UK
dc.contributor.orcid	0000-0001-5038-3044	en_UK
dc.contributor.orcid	0000-0002-0546-2816	en_UK
dc.contributor.orcid	0000-0002-3235-0860	en_UK
dc.contributor.orcid	0000-0003-3058-5794	en_UK
dc.contributor.orcid	0000-0002-2937-4327	en_UK
dc.contributor.orcid	0000-0002-4025-5767	en_UK
dc.contributor.orcid	0000-0002-0861-9743	en_UK
dc.date.accepted	2024-02-06	en_UK
dcterms.dateAccepted	2024-02-06	en_UK
dc.date.filedepositdate	2024-12-13	en_UK
dc.subject.tag	COVID-19	en_UK
rioxxterms.apc	not required	en_UK
rioxxterms.version	VoR	en_UK
local.rioxx.author	Abd-alrazaq, Alaa|0000-0001-7695-4626	en_UK
local.rioxx.author	Nashwan, Abdulqadir J|0000-0003-4845-4119	en_UK
local.rioxx.author	Shah, Zubair|0000-0001-7389-3274	en_UK
local.rioxx.author	Abujaber, Ahmad|0000-0002-8704-4991	en_UK
local.rioxx.author	Alhuwail, Dari|0000-0001-5038-3044	en_UK
local.rioxx.author	Schneider, Jens|0000-0002-0546-2816	en_UK
local.rioxx.author	AlSaad, Rawan|0000-0002-3235-0860	en_UK
local.rioxx.author	Ali, Hazrat|0000-0003-3058-5794	en_UK
local.rioxx.author	Alomoush, Waleed|0000-0002-2937-4327	en_UK
local.rioxx.author	Ahmed, Arfan|0000-0002-4025-5767	en_UK
local.rioxx.author	Aziz, Sarah|0000-0002-0861-9743	en_UK
local.rioxx.project	Internal Project|University of Stirling|https://isni.org/isni/0000000122484331	en_UK
local.rioxx.freetoreaddate	2024-12-13	en_UK
local.rioxx.licence	http://creativecommons.org/licenses/by/4.0/|2024-12-13|	en_UK
local.rioxx.filename	formative-2024-1-e49411.pdf	en_UK
local.rioxx.filecount	1	en_UK
local.rioxx.source	2561-326X	en_UK
Appears in Collections: Computing Science and Mathematics Journal Articles

Files in This Item:
File	Description	Size	Format
formative-2024-1-e49411.pdf	Fulltext - Published Version	215.83 kB	Adobe PDF

This item is protected by original copyright

A file in this item is licensed under a Creative Commons License.

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/