Free Model of Sentence Classifier for Automatic Extraction of Topic Sentences
This research employs a free model that uses only sentential features, without paragraph context, to extract the topic sentences of a paragraph. To find the optimal combination of features, corpus-based classification is used to construct a sentence classifier as the model. The sentence classifier is trained using a Support Vector Machine (SVM). The experiment shows that position and meta-discourse features are more important than syntactic features for extracting topic sentences, and that the best performer (80.68%) is the SVM classifier with all features.
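The approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cue list `CUES`, the four features, the toy paragraphs, and the hyperparameters are all assumptions made for illustration, and a small linear SVM trained by subgradient descent on the hinge loss stands in for the SVM tooling used in the study.

```python
# Hypothetical cue phrases; the paper's meta-discourse lexicon is not listed here.
CUES = ("this paper", "we propose", "in this section", "in summary")

def features(sentence, index, total):
    """Map one sentence to an illustrative feature vector (assumed features)."""
    text = sentence.lower()
    n_tokens = len(text.split())
    return [
        1.0 if index == 0 else 0.0,                        # position: first sentence
        1.0 if index == total - 1 else 0.0,                # position: last sentence
        1.0 if any(cue in text for cue in CUES) else 0.0,  # meta-discourse cue present
        min(n_tokens, 40) / 40.0,                          # length, a crude surface stand-in
    ]

def score(w, b, x):
    """Signed distance-like score of x under the linear model (w, b)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM: subgradient descent on the L2-regularised hinge loss.
    Labels must be +1 (topic sentence) or -1 (other)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            violated = label * score(w, b, x) < 1  # inside the margin or misclassified
            for j, xj in enumerate(x):
                grad = lam * w[j] - (label * xj if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * label
    return w, b

def predict(w, b, x):
    return 1 if score(w, b, x) >= 0 else -1

# Toy training data (invented for this sketch, not taken from the corpus).
paragraphs = [
    (["This paper presents a classifier for topic sentences.",
      "It uses several features.",
      "Results are reported later."], [1, -1, -1]),
    (["We propose a simple feature set.",
      "Each feature is cheap to compute.",
      "The details follow."], [1, -1, -1]),
]

X, y = [], []
for sentences, labels in paragraphs:
    for i, sentence in enumerate(sentences):
        X.append(features(sentence, i, len(sentences)))
        y.append(labels[i])

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

Each sentence is scored on its own feature vector, so no information about neighbouring sentences enters the classifier. Comparing feature groups, as the experiment does, would amount to retraining with subsets of the vector's columns and comparing the resulting accuracy.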
ITB Journal Publisher, LPPM – ITB,
Center for Research and Community Services (CRCS) Building Floor 7th,
Jl. Ganesha No. 10 Bandung 40132, Indonesia,