This page presents a small selection of papers on measures of distinctiveness that together offer a good introduction to the topic. The project bibliography (under construction) is available on Zotero.

The list of recommended literature can be found here.

Achananuparp, Palakorn, Xiaohua Hu, and Xiajiong Shen, ‘The Evaluation of Sentence Similarity Measures’, in Data Warehousing and Knowledge Discovery, ed. by Il-Yeol Song, Johann Eder, and Tho Manh Nguyen (Berlin, Heidelberg: Springer Berlin Heidelberg, 2008), 5182, 305–16
Adamzik, Kirsten, Textsorten – Texttypologie. Eine Kommentierte Bibliographie (Münster: Nodus, 1995)
Adamzik, Kirsten, Roger Gaberell, and Gottfried Kolde, Kontrastive Textologie: Untersuchungen zur deutschen und französischen Sprach- und Literaturwissenschaft, Textsorten, 2 (Tübingen: Stauffenburg-Verl, 2001)
Albitar, Shereen, Sébastien Fournier, and Bernard Espinasse, ‘An Effective TF/IDF-Based Text-to-Text Semantic Similarity Measure for Text Classification’, in Web Information Systems Engineering – WISE 2014, ed. by Boualem Benatallah, Azer Bestavros, Yannis Manolopoulos, Athena Vakali, and Yanchun Zhang (Cham: Springer International Publishing, 2014), 105–14
Altmann, Eduardo G., Janet B. Pierrehumbert, and Adilson E. Motter, ‘Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words’, ed. by Enrico Scalas, PLoS ONE, 4.11 (2009), e7678
Altmann, E. G., G. Cristadoro, and M. D. Esposti, ‘On the Origin of Long-Range Correlations in Texts’, Proceedings of the National Academy of Sciences, 109.29 (2012), 11582–87
André Salem, Ludovic Lebart, ‘Statistique Textuelle’, ResearchGate, 1994 [accessed 7 September 2019]
Angenot, Marc, Le roman populaire: recherches en paralittérature (Montreal: Presses de l’université du Québec, 1975) [accessed 31 October 2018]
Angenot, Marc, Les Dehors de La Littérature, Unichamp Essentiel ; 31 (Champion, 2013)
Anthony, Laurence, ‘AntConc: Design and Development of a Freeware Corpus Analysis Toolkit for the Technical Writing Classroom’, in IPCC 2005. Proceedings. International Professional Communication Conference, 2005. (IEEE, 2005), pp. 729–737
Auer, P., ‘Anmerkungen Zum Salienzbegriff in Der Soziolinguistik’, Linguistik Online, 66.4 (2014)
Auerbach, Erich, Mimesis: dargestellte Wirklichkeit in der abendländischen Literatur, Sammlung Dalp, 11. Auflage (Tübingen: A. Francke Verlag, 2015)
Baayen, Harald, ‘Statistical Models for Word Frequency Distributions: A Linguistic Evaluation’, Computers and the Humanities, 26.5–6 (1992), 347–63
Baayen, R. H., Analyzing Linguistic Data: A Practical Introduction to Statistics Using R (Cambridge: Cambridge University Press, 2008)
Baeza-Yates, Ricardo, and Berthier Ribeiro Neto, Modern Information Retrieval (Harlow, 1999)
Baker, L. Douglas, and Andrew Kachites McCallum, ‘Distributional Clustering of Words for Text Classification’, in Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR ’98 (presented at the 21st annual international ACM SIGIR conference, Melbourne, Australia: ACM Press, 1998), pp. 96–103
Baron, Alistair, Paul Rayson, and Dawn Archer, ‘Word Frequency and Key Word Statistics in Historical Corpus Linguistics’, Anglistik, 20.1 (2009)
Bassi, Erica, ‘A Contrastive Analysis of Keywords in Newspaper Articles on the “Kyoto Protocol”’, in Keyness in Texts, 2010 [accessed 23 July 2020]
Beliga, Slobodan, ‘Keyword Extraction: A Review of Methods and Approaches’, University of Rijeka, Department of Informatics, Rijeka, 2014, 1–9
Beliga, Slobodan, Ana Meštrović, and Sanda Martinčić-Ipšić, ‘Selectivity-Based Keyword Extraction Method’, International Journal on Semantic Web and Information Systems (IJSWIS), 12.3 (2016), 1–26
Beliga, Slobodan, Ana Meštrović, and Sanda Martinčić-Ipšić, ‘An Overview of Graph-Based Keyword Extraction Methods and Approaches’, Journal of Information and Organizational Sciences, 39.1 (2015), 1–20
Bertels, Ann, and Dirk Speelman, ‘“Keywords Method” versus “Calcul Des Spécificités”: A Comparison of Tools and Methods’, International Journal of Corpus Linguistics, 18.4 (2013), 536–60
Bestgen, Yves, ‘Inadequacy of the Chi-Squared Test to Examine Vocabulary Differences between Corpora’, Literary and Linguistic Computing, 29.2 (2014), 164–70
Bestgen, Yves, ‘Evaluating the Frequency Threshold for Selecting Lexical Bundles by Means of an Extension of the Fisher’s Exact Test’, Corpora, 13.2 (2018), 205–28
Bharti, Santosh Kumar, and Korra Sathya Babu, ‘Automatic Keyword Extraction for Text Summarization: A Survey’, ArXiv Preprint ArXiv:1704.03242, 2017
Biber, Douglas, Variation Across Speech and Writing (Cambridge, 1988)
Biber, Douglas, ‘Methodological Issues Regarding Corpus-Based Analyses of Linguistic Variation’, Literary and Linguistic Computing, 5.4 (1990), 257–69
Biber, Douglas, Susan Conrad, and Randi Reppen, Corpus Linguistics: Investigating Language Structure and Use (Cambridge University Press, 1998)
Biemann, Chris, and Alexander Mehler, Text Mining: From Ontology Learning to Automated Text Processing Applications (Springer, 2014)
Blumenthal-Dramé, Alice, Adriana Hanulíková, and Bernd Kortmann, ‘Editorial: Perceptual Linguistic Salience: Modeling Causes and Consequences’, Frontiers in Psychology, 8 (2017)
Bondi, Marina, ‘Perspectives on Keywords and Keyness’, in Keyness in Texts, 2010 [accessed 23 July 2020]
Bonin, Emmanuel, and Alain Dallo, ‘Hyperbase et Lexico 3, outils lexicométriques pour l’historien’, Histoire & mesure, XVIII.3/4 (2003), 389–402
Bordet, Geneviève, ‘Marina Bondi (Dir.), Mike Scott (Dir.), Keyness in Texts. Amsterdam/ Philadelphia: John Benjamins Publishing Company, 2010’, ASp. La Revue Du GERAS, 71, 2017, 179–88 [accessed 23 July 2020]
Breuer, Ulrich, ‘Text/Sorte/Genre: Konkurrenz Und Konvergenz Linguistischer Und Literaturwissenschaftlicher Klassifikationen?’, Mitteilungen des deutschen Germanistenverbands, 44, 1997, 53–63
Brezina, Vaclav, and Miriam Meyerhoff, ‘Significant or Random?: A Critical Review of Sociolinguistic Generalisations Based on Large Corpora’, International Journal of Corpus Linguistics, 19.1 (2014), 1–28
Brinegar, Claude S., ‘Mark Twain and the Quintus Curtius Snodgrass Letters: A Statistical Test of Authorship’, Journal of the American Statistical Association, 58.301 (1963), 85–96
Brinker, Klaus, Linguistische Textanalyse (Tübingen, 2001)
Bruza, P. D., D. W. Song, and K. F. Wong, ‘Aboutness from a Commonsense Perspective’, Journal of the American Society for Information Science, 51.12 (2000), 1090–1105
Burrows, J. F., ‘Not Unless You Ask Nicely: The Interpretative Nexus Between Analysis and Information’, Literary and Linguistic Computing, 7.2 (1992), 91–109
Burrows, John, ‘Who Wrote Shamela? Verifying the Authorship of a Parodic Text’, Digital Scholarship in the Humanities, 20.4 (2005), 437–50
Burrows, John, ‘All the Way Through: Testing for Authorship in Different Frequency Strata’, Literary and Linguistic Computing, 22.1 (2007), 27–47
Burrows, John, and Hugh Craig, ‘Lucy Hutchinson and the Authorship of Two Seventeenth-Century Poems: A Computational Approach’, The Seventeenth Century, 16.2 (2001), 259–82
Campos, Ricardo, Vítor Mangaravite, Arian Pasquali, Alipio Jorge, Célia Nunes, and Adam Jatowt, ‘YAKE! Keyword Extraction from Single Documents Using Multiple Local Features’, Information Sciences, 509 (2020), 257–289
Chang, Kent K., and Simon DeDeo, ‘Divergence and the Complexity of Difference in Text and Culture’, Journal of Cultural Analytics, 2020, 17585
Chen, Francine R., Thorsten H. Brants, and Annie E. Zaenen, ‘Systems and Methods for Sentence Based Interactive Topic-Based Text Summarization’, 2008 [accessed 17 September 2019]
Chen, Kewen, Zuping Zhang, Jun Long, and Hao Zhang, ‘Turning from TF-IDF to TF-IGM for Term Weighting in Text Classification’, Expert Systems with Applications, 66 (2016), 245–60
Church, Kenneth, and William Gale, ‘Inverse Document Frequency (IDF): A Measure of Deviations from Poisson’, in Third Workshop on Very Large Corpora, 1995 [accessed 12 June 2020]
Clement, R., ‘Ngram and Bayesian Classification of Documents for Topic and Authorship’, Literary and Linguistic Computing, 18.4 (2003), 423–47
Conference on Artificial Intelligence, Innovative Applications of Artificial Intelligence Conference, and Association for the Advancement of Artificial Intelligence, eds., Single Document Keyphrase Extraction Using Neighborhood Knowledge (Menlo Park, Calif: AAAI Press, 2008)