Rhetorical Sentences Classification Based on Section Class and Title of Paper for Experimental Technical Papers
DOI:
https://doi.org/10.5614/itbj.ict.res.appl.2015.9.3.5
Abstract
Rhetorical sentence classification is an interesting approach to extractive summarization, but it needs further development because automatic rhetorical sentence classification still performs poorly. Rhetorical sentences are sentences that contain rhetorical words or phrases; they appear not only in the body of a paper but also in its title. In this study, features related to section class and title class that were proposed in previous research were developed further. Our method uses different techniques for automatic section class extraction, for which we introduce new, format-based features. Furthermore, we propose automatic rhetorical phrase extraction from the title. The corpus was a collection of technical-experimental scientific papers. Classification used the Support Vector Machine (SVM) algorithm and the Naïve Bayes algorithm. The four categories were Problem, Method, Data, and Result. It was hypothesized that these features would improve classification accuracy compared to previous methods. The F-measure for these categories reached up to 14%.