Automatic Tailored Multi-Paper Summarization based on Rhetorical Document Profile and Summary Specification

Masayu Leylia Khodra; Dwi Hendratmo Widyantoro; E. Aminudin Aziz; Bambang Riyanto Trilaksono

doi:10.5614/itbj.ict.2012.6.3.4

Authors

Masayu Leylia Khodra School of Electrical Engineering and Informatics, Bandung Institute of Technology, Jalan Ganesa No.10, Bandung 40132
Dwi Hendratmo Widyantoro School of Electrical Engineering and Informatics, Bandung Institute of Technology, Jalan Ganesa No.10, Bandung 40132
E. Aminudin Aziz Faculty of Language and Arts Education, Indonesia University of Education, Jalan Dr. Setiabudhi No. 229 Bandung
Bambang Riyanto Trilaksono School of Electrical Engineering and Informatics, Bandung Institute of Technology, Jalan Ganesa No.10, Bandung 40132

Abstract

To help researchers cope with limited time and with articles of low relevance to their needs, an automatic tailored multi-paper summarization (TMPS) system is proposed. In this paper, we extend Teufel's tailored summary to handle multiple papers and to support a more flexible representation of user information needs. TMPS extracts a Rhetorical Document Profile (RDP) from each paper and composes a summary according to the user's information needs. Building Plan Language (BPLAN) is introduced as a formalization of Teufel's building plan and is used to write summary specifications, a more flexible representation of user information needs. Surface repair operators are embedded in BPLAN to improve the readability of the extractive summary. Our experiment shows that the RDP extraction module achieves an average performance of 94.46%, which indicates that high-quality extracts are available for summary composition. A generality evaluation shows that BPLAN is flexible enough to compose various forms of summary, and a subjective evaluation provides evidence that the surface repair operators improve the readability of the resulting summaries.
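The abstract describes the pipeline only at a high level. As a rough illustration, and not the authors' BPLAN (whose syntax is not given here), the following Python sketch assumes that an RDP is a mapping from rhetorical slot names to sentences extracted from one paper, and that a summary specification is an ordered list of (section heading, slot) pairs. The names `RhetoricalDocumentProfile`, `compose_summary`, `repair_surface` and the slot labels `AIM`, `RESULT` and `LIMITATION` are hypothetical. The single surface-repair operator shown, which rewrites a sentence-initial "We" as an attribution to the source paper, is a toy stand-in for the paper's operators.

```python
from dataclasses import dataclass, field


@dataclass
class RhetoricalDocumentProfile:
    """Hypothetical RDP: rhetorical slot name -> sentences extracted from one paper."""
    paper_id: str
    slots: dict[str, list[str]] = field(default_factory=dict)


# Hypothetical summary specification: ordered (section heading, RDP slot) pairs.
# This is NOT BPLAN syntax; it only mimics the idea of a building plan.
Specification = list[tuple[str, str]]


def repair_surface(sentence: str, paper_id: str) -> str:
    """Toy surface-repair operator: attribute a leading 'We' to the source paper."""
    if sentence.startswith("We "):
        return f"The authors of {paper_id} " + sentence[len("We "):]
    return sentence


def compose_summary(profiles: list[RhetoricalDocumentProfile], spec: Specification) -> str:
    """Fill each specified section with the matching slot from every paper, in order."""
    sections = []
    for heading, slot in spec:
        lines = [
            repair_surface(sentence, profile.paper_id)
            for profile in profiles
            for sentence in profile.slots.get(slot, [])
        ]
        if lines:  # skip sections for which no paper provides content
            sections.append(heading + ":\n" + "\n".join(lines))
    return "\n\n".join(sections)


if __name__ == "__main__":
    profiles = [
        RhetoricalDocumentProfile("P1", {"AIM": ["We propose a tailored summarizer."],
                                         "RESULT": ["Extracted result sentence."]}),
        RhetoricalDocumentProfile("P2", {"AIM": ["We study citation-based surveys."]}),
    ]
    spec = [("Aims", "AIM"), ("Results", "RESULT"), ("Limitations", "LIMITATION")]
    print(compose_summary(profiles, spec))
```

Running the example prints an "Aims" section containing both attributed sentences and a "Results" section; the "Limitations" heading is dropped because no profile fills that slot. Keeping the specification as data, separate from the extracted profiles, reflects the abstract's point that the same RDPs can be recomposed into different forms of summary.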

References

Maxie, G., Critical Writing and Reading of Review Articles, The Canadian Veterinary Journal, 31(6), pp. 413-414, 1990.

Torraco, R.J., Writing Integrative Literature Reviews: Guidelines and Examples, Human Resource Development Review, 4(3), pp. 356-367, 2005.

Mohammad, S., Dorr, B., Egan, M., Hassan, A., Muthukrishan, P., Qazvinian, V., Radev, D. & Zajic, D., Using Citations to Generate Surveys of Scientific Paradigms, in Proc. of HLT/NAACL 2009.

Wang, M., Tanaka, H. & Zhong, Y., Generating Summaries of Multiple Technical Articles, in Proc. of Sino-Japan Symposium on IIN, 2000.

Fiszman, M. & Rindflesch, T.C., Abstraction Summarization for Managing the Biomedical Research Literature, in Proc. of HLT/NAACL 2004.

Jiaming, Z., Exploiting Textual Structures of Technical Papers for Automatic Multi-Document Summarization, PhD Thesis, NUS, 2008.

Shiyan, O., Khoo, C.S.G. & Goh, D.H., Design and Development of a Concept-based Multi-Document Summarization System for Research Abstracts, Journal of Information Science, 34, pp. 308-326, 2008.

Saracevic, T., Relevance: A Review of the Literature and a Framework for Thinking on the Notion in Information Science. Part III: Behavior and Effects of Relevance, Journal of the American Society for Information Science and Technology, 58, pp. 2126-2144, 2007.

Borlund, P., The Concept of Relevance, Journal of the American Society for Information Science and Technology, 54, pp. 913-925, 2003.

Teufel, S., Argumentative Zoning: Information Extraction from Scientific Text, PhD Dissertation, University of Edinburgh, 1999.

Teufel, S., Siddharthan, A. & Batchelor, C., Towards Discipline-Independent Argumentative Zoning: Evidence from Chemistry and Computational Linguistics, in Proc. of the Conference on Empirical Methods in NLP 2009.

Jones, K.S., Automatic Summarising: The State of the Art, Information Processing and Management, 43, pp. 1449-1481, 2007.

Marcu, D., Automatic Abstracting, Encyclopedia of Library and Information Science, pp.245-256, 2003.

Gupta, V. & Lehal, G.S., A Survey of Text Summarization Extractive Techniques, Journal of Emerging Technologies in Web Intelligence, 2(3), 2010.

Radev, D.R., Hovy, E. & McKeown, K., Introduction to the Special Issue on Summarization, Computational Linguistics (Special Issue on Summarization), 28(4), 2002.

Agarwal, N. & Gvr, K., Towards Multi-Document Summarization of Scientific Articles: Making Interesting Comparisons with SciSumm, in Proc. of the Workshop on Automatic Summarization, ACL 2011.

Bird, S., Dale, R., Dorr, B.J. & Gibson, B., The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics, in Proc. of Language Resources and Evaluation Conference 2008.

PDFBox, http://pdfbox.apache.org/ (November 2011).

Luebbert, D.L., Method and System for Handling Text That Includes Paragraph Delimiters of Differing Formats, US Patent, June 1996.

OpenNLP, http://opennlp.sourceforge.net (April 2011).

Reynar, J.C. & Ratnaparkhi, A., A Maximum Entropy Approach to Identifying Sentence Boundaries, in Proc. of the 5th Conference on Applied Natural Language Processing 1997.

Hyland, K. & Tse, P., Metadiscourse in Academic Writing: A Reappraisal, Applied Linguistics, 25, pp. 156-177, 2004.

Hsu, C.W., Chang, C.C. & Lin, C.J., A Practical Guide to Support Vector Classification, www.csie.ntu.edu.tw/~cjlin/papers/guide/ (December 2009).

Sun, A., Lim, E.P. & Liu, Y., On Strategies for Imbalanced Text Classification Using SVM: A Comparative Study, Decision Support Systems, Elsevier, 2009.

Chau, M. & Chen, H., A Machine Learning Approach to Web Page Filtering Using Content and Structure Analysis, Decision Support Systems, Elsevier, 44, pp. 482-494, 2008.

Sun, A., Lim, E.P. & Ng, W.K., Performance Measurement Framework for Hierarchical Text Classification, Journal of the American Society for Information Science and Technology, 54, pp. 1014-1028, 2003.

Zhang, Y., Dang, Y., Chen, H., Thurmond, M. & Larson, C., Automatic Online News Monitoring and Classification for Syndromic Surveillance, Decision Support Systems, Elsevier, 2009.

Hsu, C.W. & Lin, C.J., A Comparison of Methods for Multiclass Support Vector Machines, IEEE Transactions on Neural Networks, 13, pp. 415-425, 2002.

Rifkin, R. & Klautau, A., In Defense of One-Vs-All Classification, Journal of Machine Learning Research, 5, pp. 101-141, 2004.

Allwein, E.L., Schapire, R.E. & Singer, Y., Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers, Journal of Machine Learning Research, 1, pp. 113-141, 2000.

Hinkel, E., Tense, Aspect and the Passive Voice in L1 and L2 Academic Texts, Language Teaching Research, 8, pp. 5-29, 2004.

WordNet: Standoff Files, http://wordnet.princeton.edu (April 2012).

Fellbaum, C., Osherson, A. & Clark, P.E., Putting Semantics into WordNet's "Morphosemantic" Links, Springer Lecture Notes in Informatics, 5603, pp. 350-358, 2009.

Carbonell, J. & Goldstein, J., The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries, in Proc. of SIGIR 1998.

Khodra, M.L., Widyantoro, D.H., Aziz, E.A. & Trilaksono, B.R., Information Extraction for Scientific Paper Using Rhetorical Classifier, in Proc. of ICEEI 2011.

Gorrell, G., Ford, N., Madden, A., Holdridge, P. & Eaglestone, B., Countering Method Bias in Questionnaire-Based User Studies, Journal of Documentation, 67, pp. 507-524, 2011.