Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/28020
Full metadata record
DC Field	Value	Language
dc.contributor.author	Mai, Florian	en_UK
dc.contributor.author	Galke, Lukas	en_UK
dc.contributor.author	Scherp, Ansgar	en_UK
dc.date.accessioned	2018-10-24T14:34:50Z	-
dc.date.available	2018-10-24T14:34:50Z	-
dc.date.issued	2018-12-31	en_UK
dc.identifier.uri	http://hdl.handle.net/1893/28020	-
dc.description.abstract	For (semi-)automated subject indexing systems in digital libraries, it is often more practical to use metadata such as the title of a publication instead of the full-text or the abstract. Therefore, it is desirable to have good text mining and text classification algorithms that operate well already on the title of a publication. So far, the classification performance on titles is not competitive with the performance on the full-texts if the same number of training samples is used for training. However, it is much easier to obtain title data in large quantities and to use it for training than full-text data. In this paper, we investigate the question how models obtained from training on increasing amounts of title training data compare to models from training on a constant number of full-texts. We evaluate this question on a large-scale dataset from the medical domain (PubMed) and from economics (EconBiz). In these datasets, the titles and annotations of millions of publications are available, and they outnumber the available full-texts by a factor of 20 and 15, respectively. To exploit these large amounts of data to their full potential, we develop three strong deep learning classifiers and evaluate their performance on the two datasets. The results are promising. On the EconBiz dataset, all three classifiers outperform their full-text counterparts by a large margin. The best title-based classifier outperforms the best full-text method by 9.4%. On the PubMed dataset, the best title-based method almost reaches the performance of the best full-text classifier, with a difference of only 2.9%.	en_UK
dc.language.iso	en	en_UK
dc.publisher	ACM	en_UK
dc.relation	Mai F, Galke L & Scherp A (2018) Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Competitive Performance to Full-Text. In: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries. 18th ACM/IEEE on Joint Conference on Digital Libraries, Fort Worth, TX, USA, 03.06.2018-07.06.2018. New York: ACM, pp. 169-178. https://doi.org/10.1145/3197026.3197039	en_UK
dc.rights	The publisher does not allow this work to be made publicly available in this Repository. Please use the Request a Copy feature at the foot of the Repository record to request a copy directly from the author. You can only request a copy if you wish to use this work for your own research or private study.	en_UK
dc.rights.uri	http://www.rioxx.net/licenses/under-embargo-all-rights-reserved	en_UK
dc.subject	text classification	en_UK
dc.subject	deep learning	en_UK
dc.subject	digital libraries	en_UK
dc.title	Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Competitive Performance to Full-Text	en_UK
dc.type	Conference Paper	en_UK
dc.rights.embargodate	2999-12-31	en_UK
dc.rights.embargoreason	[p169-mai.pdf] The publisher does not allow this work to be made publicly available in this Repository therefore there is an embargo on the full text of the work.	en_UK
dc.identifier.doi	10.1145/3197026.3197039	en_UK
dc.citation.jtitle	Proceedings of the ACM/IEEE Joint Conference on Digital Libraries	en_UK
dc.citation.spage	169	en_UK
dc.citation.epage	178	en_UK
dc.citation.publicationstatus	Published	en_UK
dc.type.status	VoR - Version of Record	en_UK
dc.contributor.funder	European Commission	en_UK
dc.author.email	ansgar.scherp@stir.ac.uk	en_UK
dc.citation.btitle	Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries	en_UK
dc.citation.conferencedates	2018-06-03 - 2018-06-07	en_UK
dc.citation.conferencelocation	Fort Worth, TX, USA	en_UK
dc.citation.conferencename	18th ACM/IEEE on Joint Conference on Digital Libraries	en_UK
dc.citation.isbn	9781450351782	en_UK
dc.publisher.address	New York	en_UK
dc.contributor.affiliation	University of Kiel	en_UK
dc.contributor.affiliation	University of Kiel	en_UK
dc.contributor.affiliation	University of Kiel	en_UK
dc.identifier.scopusid	2-s2.0-85048891192	en_UK
dc.identifier.wtid	1007148	en_UK
dc.contributor.orcid	0000-0002-2653-9245	en_UK
dc.date.accepted	2018-03-08	en_UK
dcterms.dateAccepted	2018-03-08	en_UK
dc.date.filedepositdate	2018-10-18	en_UK
rioxxterms.apc	not required	en_UK
rioxxterms.type	Conference Paper/Proceeding/Abstract	en_UK
rioxxterms.version	VoR	en_UK
local.rioxx.author	Mai, Florian|	en_UK
local.rioxx.author	Galke, Lukas|	en_UK
local.rioxx.author	Scherp, Ansgar|0000-0002-2653-9245	en_UK
local.rioxx.project	Project ID unknown|European Commission (Horizon 2020)|	en_UK
local.rioxx.freetoreaddate	2268-12-01	en_UK
local.rioxx.licence	http://www.rioxx.net/licenses/under-embargo-all-rights-reserved||	en_UK
local.rioxx.filename	p169-mai.pdf	en_UK
local.rioxx.filecount	1	en_UK
local.rioxx.source	9781450351782	en_UK
Appears in Collections: Computing Science and Mathematics Conference Papers and Proceedings

Files in This Item:
File	Description	Size	Format
p169-mai.pdf	Fulltext - Published Version	1.2 MB	Adobe PDF	Under Permanent Embargo	Request a copy


This item is protected by original copyright



Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.