Automatic Tailored Multi-Paper Summarization based on Rhetorical Document Profile and Summary Specification

Masayu Leylia Khodra, Dwi Hendratmo Widyantoro, E. Aminudin Aziz, Bambang Riyanto Trilaksono

Abstract


To help researchers cope with time constraints and low relevance when using scientific articles, we propose automatic tailored multi-paper summarization (TMPS). In this paper, we extend Teufel’s tailored summary to handle multiple papers and a more flexible representation of user information needs. Our TMPS extracts a Rhetorical Document Profile (RDP) from each paper and presents a summary based on user information needs. Building Plan Language (BPLAN) is introduced as a formalization of Teufel’s building plan and is used to represent the summary specification, which is a more flexible representation of user information needs. Surface repair is embedded within BPLAN to improve the readability of the extractive summary. Our experiment shows that the average performance of the RDP extraction module is 94.46%, which promises high-quality extracts for summary composition. Generality evaluation shows that BPLAN is flexible enough to compose various forms of summary. Subjective evaluation provides evidence that surface repair operators can improve the readability of the resulting summary.







DOI: http://dx.doi.org/10.5614%2Fitbj.ict.2012.6.3.4


