Analysis of parameters on author attribution of Spanish electronic short texts

  1. Mario Crespo 1
  1. 1 Illinois State University, Normal, United States. ROR https://ror.org/050kcr883

Journal: Research in Corpus Linguistics (RiCL)

ISSN: 2243-4712

Year of publication: 2016

Issue: 4

Pages: 25-32

Type: Article

DOI: 10.32714/RICL.04.03

Abstract

Forensic linguistics is the analysis of language as it relates to the law, either as evidence or as legal discourse. Authorship attribution is the task of identifying the author of a document; when language is used as evidence in a courtroom, it is of interest to police investigators and to the wider judicial process. Recent advances in forensic linguistics concern the analysis of texts from emails, social networks and mobile phone messages. This work continues previous research and explores how the classification algorithm, the size of the text and the type of linguistic feature affect the results of authorship attribution for short Spanish messages posted on online forums. Important differences in precision were observed when varying both the size of the texts investigated and the algorithms used for classification.
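The three parameters named in the abstract (feature type, text size and classifier) can be illustrated with a minimal Python sketch. It is not the article's method: the abstract does not say which features or algorithms were used, so the character trigrams, word unigrams and nearest-profile cosine classifier below are simple stand-ins, the function names are hypothetical, and the toy messages are invented for illustration. The score computed is plain accuracy, used here as a rough proxy for the precision mentioned above.

```python
from collections import Counter
import math


def char_trigrams(text):
    """Count overlapping character 3-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def word_unigrams(text):
    """Count whitespace-separated tokens in the lowercased text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_profiles(samples, extract):
    """Sum the feature counts of each author's training messages."""
    profiles = {}
    for author, text in samples:
        profiles.setdefault(author, Counter()).update(extract(text))
    return profiles


def attribute(text, profiles, extract):
    """Return the author whose profile is most similar to the text."""
    features = extract(text)
    return max(profiles, key=lambda author: cosine(features, profiles[author]))


def accuracy(test, profiles, extract, max_chars=None):
    """Share of test messages attributed correctly.

    max_chars, if given, truncates each message to simulate shorter texts.
    """
    correct = 0
    for author, text in test:
        if max_chars is not None:
            text = text[:max_chars]
        correct += int(attribute(text, profiles, extract) == author)
    return correct / len(test)


# Invented toy messages (not from the article's corpus).
TRAIN = [
    ("ana", "hola a todos, creo que el precio es muy alto, saludos"),
    ("ana", "hola, creo que tienes razón, saludos"),
    ("luis", "q tal gente!!! yo no compro eso ni loco jajaja"),
    ("luis", "jajaja q bueno!!! ni loco vuelvo"),
]
TEST = [
    ("ana", "hola, creo que el envío es lento, saludos"),
    ("luis", "q va!!! eso no jajaja"),
]

if __name__ == "__main__":
    for name, extract in [("char trigrams", char_trigrams),
                          ("word unigrams", word_unigrams)]:
        profiles = train_profiles(TRAIN, extract)
        for size in (10, 20, None):
            score = accuracy(TEST, profiles, extract, max_chars=size)
            print(f"{name}, max_chars={size}: accuracy={score:.2f}")
```

Looping over the feature extractor and `max_chars` mirrors the experimental design the abstract describes. Comparing classifiers would mean swapping `attribute` for another decision rule while keeping the same loop.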
