This page lists a small selection of papers on measures of distinctiveness that serve as a good introduction to the topic. The full project bibliography (under construction) is available on Zotero.

The list of recommended literature can be found here.

Achananuparp, Palakorn, Xiaohua Hu, and Xiajiong Shen, ‘The Evaluation of Sentence Similarity Measures’, in Data Warehousing and Knowledge Discovery, ed. by Il-Yeol Song, Johann Eder, and Tho Manh Nguyen (Berlin, Heidelberg: Springer Berlin Heidelberg, 2008), 5182, 305–16
Adamzik, Kirsten, Textsorten – Texttypologie. Eine Kommentierte Bibliographie (Münster: Nodus, 1995)
Adamzik, Kirsten, Roger Gaberell, and Gottfried Kolde, Kontrastive Textologie: Untersuchungen zur deutschen und französischen Sprach- und Literaturwissenschaft, Textsorten, 2 (Tübingen: Stauffenburg Verlag, 2001)
Albitar, Shereen, Sébastien Fournier, and Bernard Espinasse, ‘An Effective TF/IDF-Based Text-to-Text Semantic Similarity Measure for Text Classification’, in Web Information Systems Engineering – WISE 2014, ed. by Boualem Benatallah, Azer Bestavros, Yannis Manolopoulos, Athena Vakali, and Yanchun Zhang (Cham: Springer International Publishing, 2014), 105–14
Allison, Sarah, Ryan Heuser, Matthew Jockers, Franco Moretti, and Michael Witmore, ‘Quantitative Formalism: An Experiment (Stanford Literary Lab, Pamphlet 1)’, Pamphlets of the Stanford Literary Lab, 2011
Altmann, Eduardo G., Janet B. Pierrehumbert, and Adilson E. Motter, ‘Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words’, ed. by Enrico Scalas, PLoS ONE, 4.11 (2009), e7678
Altmann, E. G., G. Cristadoro, and M. D. Esposti, ‘On the Origin of Long-Range Correlations in Texts’, Proceedings of the National Academy of Sciences, 109.29 (2012), 11582–87
Anderson, Wendy J., The Phraseology of Administrative French: A Corpus-Based Study, Language and Computers, 57 (Amsterdam: Rodopi, 2006)
André Salem, Ludovic Lebart, ‘Statistique Textuelle’, ResearchGate, 1994 [accessed 7 September 2019]
Andrej Karpathy, Let’s Build GPT: From Scratch, in Code, Spelled Out, 2023 [accessed 8 January 2024]
Angenot, Marc, Les Dehors de La Littérature, Unichamp Essentiel, 31 (Champion, 2013)
Angenot, Marc, Le roman populaire: recherches en paralittérature (Montreal: Presses de l’université du Québec, 1975) [accessed 31 October 2018]
Anthony, Laurence, ‘AntConc: Design and Development of a Freeware Corpus Analysis Toolkit for the Technical Writing Classroom’, in IPCC 2005. Proceedings. International Professional Communication Conference, 2005 (IEEE, 2005), pp. 729–37
Aubry, Danielle, Du roman-feuilleton à la série télévisuelle: pour une rhétorique du genre et de la sérialité (Peter Lang, 2006)
Auer, P., ‘Anmerkungen Zum Salienzbegriff in Der Soziolinguistik’, Linguistik Online, 66.4 (2014)
Auerbach, Erich, Mimesis: dargestellte Wirklichkeit in der abendländischen Literatur, Sammlung Dalp, 11. Auflage (Tübingen: A. Francke Verlag, 2015)
Baayen, Harald, ‘Statistical Models for Word Frequency Distributions: A Linguistic Evaluation’, Computers and the Humanities, 26.5–6 (1992), 347–63
Baayen, R. H., Analyzing Linguistic Data: A Practical Introduction to Statistics Using R (Cambridge: Cambridge University Press, 2008)
Babbie, Earl R., The Practice of Social Research, international edn, 12th edn (Belmont, Calif: Wadsworth Cengage Learning, 2010)
Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto, Modern Information Retrieval (Harlow, 1999)
Baker, Paul, ‘Querying Keywords: Questions of Difference, Frequency, and Sense in Keywords Analysis’, Journal of English Linguistics, 32.4 (2004), 346–59
Baker, L. Douglas, and Andrew Kachites McCallum, ‘Distributional Clustering of Words for Text Classification’, in Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR ’98 (presented at the 21st annual international ACM SIGIR conference, Melbourne, Australia: ACM Press, 1998), pp. 96–103
Barber, Ros, ‘Big Data or Not Enough? Zeta Test Reliability and the Attribution of Henry VI’, Digital Scholarship in the Humanities, 36.3 (2021), 542–64
Baron, Alistair, Paul Rayson, and Dawn Archer, ‘Word Frequency and Key Word Statistics in Historical Corpus Linguistics’, Anglistik, 20.1 (2009)
Bassi, Erica, ‘A Contrastive Analysis of Keywords in Newspaper Articles on the “Kyoto Protocol”’, in Keyness in Texts, 2010 [accessed 23 July 2020]
Baudou, Jacques, La Science-fiction, Que sais-je?, 1e édition (Presses Universitaires de France - PUF, 2003)
Baumann, Christiane, and Gisela Lerch, Extreme Gegenwart: französische Literatur der 80er Jahre (Bremen: Manholt, 1989)
Beliga, Slobodan, ‘Keyword Extraction: A Review of Methods and Approaches’, University of Rijeka, Department of Informatics, Rijeka, 2014, 1–9
Beliga, Slobodan, Ana Meštrović, and Sanda Martinčić-Ipšić, ‘Selectivity-Based Keyword Extraction Method’, International Journal on Semantic Web and Information Systems (IJSWIS), 12.3 (2016), 1–26
Beliga, Slobodan, Ana Meštrović, and Sanda Martinčić-Ipšić, ‘An Overview of Graph-Based Keyword Extraction Methods and Approaches’, Journal of Information and Organizational Sciences, 39.1 (2015), 1–20
Bentolila, Éric, ‘Le Roman Policier Français de 1970 et 2000 : Une Analyse Littéraire’ (unpublished doctoral thesis, Université Grenoble Alpes (ComUE), 2016) [accessed 8 February 2023]
Bertels, Ann, and Dirk Speelman, ‘“Keywords Method” versus “Calcul Des Spécificités”: A Comparison of Tools and Methods’, International Journal of Corpus Linguistics, 18.4 (2013), 536–60
Bestgen, Yves, ‘Inadequacy of the Chi-Squared Test to Examine Vocabulary Differences between Corpora’, Literary and Linguistic Computing, 29.2 (2014), 164–70
Bestgen, Yves, ‘Evaluating the Frequency Threshold for Selecting Lexical Bundles by Means of an Extension of the Fisher’s Exact Test’, Corpora, 13.2 (2018), 205–28
Bharti, Santosh Kumar, and Korra Sathya Babu, ‘Automatic Keyword Extraction for Text Summarization: A Survey’, arXiv preprint arXiv:1704.03242, 2017
Biber, Douglas, Variation Across Speech and Writing (Cambridge, 1988)
Biber, Douglas, ‘Methodological Issues Regarding Corpus-Based Analyses of Linguistic Variation’, Literary and Linguistic Computing, 5.4 (1990), 257–69
Biber, Douglas, Variation across Speech and Writing (Cambridge University Press, 1991)
Biber, Douglas, Susan Conrad, and Randi Reppen, Corpus Linguistics: Investigating Language Structure and Use (Cambridge University Press, 1998)
Biemann, Chris, and Alexander Mehler, Text Mining: From Ontology Learning to Automated Text Processing Applications (Springer, 2014)
Blei, David M., ‘Probabilistic Topic Models’, Communications of the ACM, 55.4 (2012), 77
Bleton, Paul, ‘“Meurtre” ne rime à rien. La ville dans le roman policier français des années 1958-1981’, Revue critique de fixxion française contemporaine, 10, 2014, 13–23
Blumenthal-Dramé, Alice, Adriana Hanulíková, and Bernd Kortmann, ‘Editorial: Perceptual Linguistic Salience: Modeling Causes and Consequences’, Frontiers in Psychology, 8 (2017)
Boleda, Gemma, ‘Distributional Semantics and Linguistic Theory’, Annual Review of Linguistics, 6.1 (2020), 213–34
Bondi, Marina, ‘Perspectives on Keywords and Keyness’, in Keyness in Texts, 2010 [accessed 23 July 2020]
Bondi, Marina, and Mike Scott, eds., Keyness in Texts, Studies in Corpus Linguistics, v. 41 (Amsterdam ; Philadelphia: John Benjamins Pub. Co, 2010)
Bonin, Emmanuel, and Alain Dallo, ‘Hyperbase et Lexico 3, outils lexicométriques pour l’historien’, Histoire & mesure, XVIII.3/4 (2003), 389–402
Bordet, Geneviève, ‘Marina Bondi (Dir.), Mike Scott (Dir.), Keyness in Texts. Amsterdam/ Philadelphia: John Benjamins Publishing Company, 2010’, ASp. La Revue Du GERAS, 71, 2017, 179–88 [accessed 23 July 2020]