Automatic Title Generation in Scientific Articles for Authorship Assistance: A Summarization Approach

Jan Wira Gotama Putra; Masayu Leylia Khodra

doi:10.5614/itbj.ict.res.appl.2017.11.3.3

Authors

Jan Wira Gotama Putra, Department of Computer Science, School of Electrical Engineering & Informatics, Bandung Institute of Technology, Jalan Ganesa No. 10, Bandung 40132
Masayu Leylia Khodra, Department of Computer Science, School of Electrical Engineering & Informatics, Bandung Institute of Technology, Jalan Ganesa No. 10, Bandung 40132


Keywords:

adaptive K-nearest neighbor (AKNN), chemistry domain, computational linguistics domain, rhetorical categories, scientific article, summarization, title generation.

Abstract

This paper presents a study on automatic title generation for scientific articles that considers sentence information types known as rhetorical categories. A title can be seen as a high-compression summary of a document. A rhetorical category is the type of information the author conveys in a textual unit, for example the background, method, or result of the research. The experiment in this study focused on extracting research purpose and research method information for inclusion in a computer-generated title. Sentences are classified into rhetorical categories, after which they are filtered using three methods. Three title candidates whose contents reflect the filtered sentences are then generated using a template-based or an adaptive K-nearest neighbor approach. The experiment was conducted on datasets from two domains: computational linguistics and chemistry. Our study obtained an average F1-measure of 0.109-0.255 for computer-generated titles compared to the original titles. In a human evaluation, the automatically generated titles were deemed 'relatively acceptable' in the computational linguistics domain and 'not acceptable' in the chemistry domain. It can be concluded that rhetorical categories have unexplored potential to improve the performance of summarization tasks in general.
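
To make the reported F1-measure concrete, the following is a minimal sketch of how a generated title could be scored against an original title. It assumes a case-insensitive unigram overlap on whitespace tokens; the abstract does not state the exact tokenization or matching scheme used in the study, so the function name `title_f1` and these choices are illustrative assumptions only.

```python
from collections import Counter


def title_f1(generated: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated and a reference title.

    Assumption (not from the paper): lowercase, whitespace tokenization.
    """
    gen_tokens = generated.lower().split()
    ref_tokens = reference.lower().split()
    if not gen_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: a shared word counts at most as often
    # as it appears in both titles.
    overlap = sum((Counter(gen_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen_tokens)  # share of generated words that match
    recall = overlap / len(ref_tokens)     # share of reference words recovered
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, a short generated title whose words all appear in a longer original title has perfect precision but low recall, which is one way a title can look plausible and still score a low F1.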

