This page lists a small selection of papers on measures of distinctiveness that serve as a good introduction to the topic. The full project bibliography (under construction) is available on Zotero.

The list of recommended literature can be found here.

Achananuparp, Palakorn, Xiaohua Hu, and Xiajiong Shen, ‘The Evaluation of Sentence Similarity Measures’, in Data Warehousing and Knowledge Discovery, ed. by Il-Yeol Song, Johann Eder, and Tho Manh Nguyen (Berlin, Heidelberg: Springer Berlin Heidelberg, 2008), 5182, 305–16 <https://doi.org/10.1007/978-3-540-85836-2_29>
Adamzik, Kirsten, Textsorten – Texttypologie. Eine Kommentierte Bibliographie (Münster: Nodus, 1995) <https://www.unige.ch/lettres/alman/fr/enseignants/anciens/adamzik/arbeitskreis-textsorten/>
Adamzik, Kirsten, Roger Gaberell, and Gottfried Kolde, Kontrastive Textologie: Untersuchungen zur deutschen und französischen Sprach- und Literaturwissenschaft, Textsorten, 2 (Tübingen: Stauffenburg-Verl, 2001)
Albitar, Shereen, Sébastien Fournier, and Bernard Espinasse, ‘An Effective TF/IDF-Based Text-to-Text Semantic Similarity Measure for Text Classification’, in Web Information Systems Engineering – WISE 2014, ed. by Boualem Benatallah, Azer Bestavros, Yannis Manolopoulos, Athena Vakali, and Yanchun Zhang (Cham: Springer International Publishing, 2014), 105–14 <https://doi.org/10.1007/978-3-319-11749-2_8>
Allison, Sarah, Ryan Heuser, Matthew Jockers, Franco Moretti, and Michael Witmore, ‘Quantitative Formalism: An Experiment (Stanford Literary Lab, Pamphlet 1)’, Pamphlets of the Stanford Literary Lab, 2011 <https://litlab.stanford.edu/LiteraryLabPamphlet1.pdf>
Altmann, Eduardo G., Janet B. Pierrehumbert, and Adilson E. Motter, ‘Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words’, ed. by Enrico Scalas, PLoS ONE, 4.11 (2009), e7678 <https://doi.org/10.1371/journal.pone.0007678>
Altmann, E. G., G. Cristadoro, and M. D. Esposti, ‘On the Origin of Long-Range Correlations in Texts’, Proceedings of the National Academy of Sciences, 109.29 (2012), 11582–87 <https://doi.org/10.1073/pnas.1117723109>
Anderson, Wendy J., The Phraseology of Administrative French: A Corpus-Based Study, Language and Computers, 57 (Amsterdam: Rodopi, 2006)
Lebart, Ludovic, and André Salem, ‘Statistique Textuelle’, ResearchGate, 1994 <https://www.researchgate.net/publication/44832136_Statistique_textuelle> [accessed 7 September 2019]
Angenot, Marc, Les Dehors de La Littérature, Unichamp Essentiel, 31 (Champion, 2013)
Angenot, Marc, Le roman populaire: recherches en paralittérature (Montreal: Presses de l’université du Québec, 1975) <http://digitale-objekte.hbz-nrw.de/storage/2009/11/07/file_18/3292944.pdf> [accessed 31 October 2018]
Anthony, Laurence, ‘AntConc: Design and Development of a Freeware Corpus Analysis Toolkit for the Technical Writing Classroom’, in IPCC 2005. Proceedings. International Professional Communication Conference, 2005. (IEEE, 2005), pp. 729–37 <https://doi.org/10.1109/IPCC.2005.1494244>
Auer, P., ‘Anmerkungen Zum Salienzbegriff in Der Soziolinguistik’, Linguistik Online, 66.4 (2014) <https://doi.org/10.13092/lo.66.1569>
Auerbach, Erich, Mimesis: dargestellte Wirklichkeit in der abendländischen Literatur, Sammlung Dalp, 11. Auflage (Tübingen: A. Francke Verlag, 2015)
Baayen, Harald, ‘Statistical Models for Word Frequency Distributions: A Linguistic Evaluation’, Computers and the Humanities, 26.5–6 (1992), 347–63 <https://doi.org/10.1007/BF00136980>
Baayen, R. H., Analyzing Linguistic Data: A Practical Introduction to Statistics Using R (Cambridge: Cambridge University Press, 2008) <https://doi.org/10.1017/CBO9780511801686>
Baeza-Yates, Ricardo, and Berthier Ribeiro Neto, Modern Information Retrieval (Harlow, 1999)
Baker, L. Douglas, and Andrew Kachites McCallum, ‘Distributional Clustering of Words for Text Classification’, in Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR ’98 (presented at the 21st Annual International ACM SIGIR Conference, Melbourne, Australia: ACM Press, 1998), pp. 96–103 <https://doi.org/10.1145/290941.290970>
Barber, Ros, ‘Big Data or Not Enough? Zeta Test Reliability and the Attribution of Henry VI’, Digital Scholarship in the Humanities, 36.3 (2021), 542–64 <https://doi.org/10.1093/llc/fqaa041>
Baron, Alistair, Paul Rayson, and Dawn Archer, ‘Word Frequency and Key Word Statistics in Historical Corpus Linguistics’, Anglistik, 20.1 (2009)
Bassi, Erica, ‘A Contrastive Analysis of Keywords in Newspaper Articles on the “Kyoto Protocol”’, in Keyness in Texts, 2010 <https://benjamins.com/catalog/scl.41.15bas> [accessed 23 July 2020]
Beliga, Slobodan, ‘Keyword Extraction: A Review of Methods and Approaches’, University of Rijeka, Department of Informatics, Rijeka, 2014, 1–9
Beliga, Slobodan, Ana Meštrović, and Sanda Martinčić-Ipšić, ‘Selectivity-Based Keyword Extraction Method’, International Journal on Semantic Web and Information Systems (IJSWIS), 12.3 (2016), 1–26
Beliga, Slobodan, Ana Meštrović, and Sanda Martinčić-Ipšić, ‘An Overview of Graph-Based Keyword Extraction Methods and Approaches’, Journal of Information and Organizational Sciences, 39.1 (2015), 1–20
Bertels, Ann, and Dirk Speelman, ‘“Keywords Method” versus “Calcul Des Spécificités”: A Comparison of Tools and Methods’, International Journal of Corpus Linguistics, 18.4 (2013), 536–60 <https://doi.org/10.1075/ijcl.18.4.04ber>
Bestgen, Yves, ‘Inadequacy of the Chi-Squared Test to Examine Vocabulary Differences between Corpora’, Literary and Linguistic Computing, 29.2 (2014), 164–70 <https://doi.org/10.1093/llc/fqt020>
Bestgen, Yves, ‘Evaluating the Frequency Threshold for Selecting Lexical Bundles by Means of an Extension of the Fisher’s Exact Test’, Corpora, 13.2 (2018), 205–28 <https://doi.org/10.3366/cor.2018.0144>
Bharti, Santosh Kumar, and Korra Sathya Babu, ‘Automatic Keyword Extraction for Text Summarization: A Survey’, ArXiv Preprint ArXiv:1704.03242, 2017
Biber, Douglas, Variation Across Speech and Writing (Cambridge, 1988)
Biber, Douglas, ‘Methodological Issues Regarding Corpus-Based Analyses of Linguistic Variation’, Literary and Linguistic Computing, 5.4 (1990), 257–69 <https://doi.org/10.1093/llc/5.4.257>
Biber, Douglas, Variation across Speech and Writing (Cambridge University Press, 1991)
Biber, Douglas, Susan Conrad, and Randi Reppen, Corpus Linguistics: Investigating Language Structure and Use (Cambridge University Press, 1998)
Biemann, Chris, and Alexander Mehler, Text Mining: From Ontology Learning to Automated Text Processing Applications (Springer, 2014)
Blei, David M., ‘Probabilistic Topic Models’, Communications of the ACM, 55.4 (2012), 77 <https://doi.org/10.1145/2133806.2133826>
Bleton, Paul, ‘“Meurtre” Ne Rime à Rien. La Ville Dans Le Roman Policier Français Des Années 1958-1981’, Revue Critique de Fixxion Française Contemporaine, 10, 2014, 13–23
Blumenthal-Dramé, Alice, Adriana Hanulíková, and Bernd Kortmann, ‘Editorial: Perceptual Linguistic Salience: Modeling Causes and Consequences’, Frontiers in Psychology, 8 (2017) <https://doi.org/10.3389/fpsyg.2017.00411>
Boleda, Gemma, ‘Distributional Semantics and Linguistic Theory’, Annual Review of Linguistics, 6.1 (2020), 213–34 <https://doi.org/10.1146/annurev-linguistics-011619-030303>
Bondi, Marina, ‘Perspectives on Keywords and Keyness’, in Keyness in Texts, 2010 <https://benjamins.com/catalog/scl.41.01bon> [accessed 23 July 2020]
Bonin, Emmanuel, and Alain Dallo, ‘Hyperbase et Lexico 3, outils lexicométriques pour l’historien’, Histoire & mesure, XVIII.3/4 (2003), 389–402 <https://doi.org/10.4000/histoiremesure.840>
Bordet, Geneviève, ‘Marina Bondi (Dir.), Mike Scott (Dir.), Keyness in Texts. Amsterdam/ Philadelphia: John Benjamins Publishing Company, 2010’, ASp. La Revue Du GERAS, 71, 2017, 179–88 <http://journals.openedition.org/asp/4932> [accessed 23 July 2020]
Bouma, Gerlof, ‘Normalized (Pointwise) Mutual Information in Collocation Extraction’, Proceedings of GSCL, 30 (2009), 31–40
Breuer, Ulrich, ‘Text/Sorte/Genre: Konkurrenz Und Konvergenz Linguistischer Und Literaturwissenschaftlicher Klassifikationen?’, Mitteilungen des deutschen Germanistenverbands, 44, 1997, 53–63
Brezina, Vaclav, and Miriam Meyerhoff, ‘Significant or Random?: A Critical Review of Sociolinguistic Generalisations Based on Large Corpora’, International Journal of Corpus Linguistics, 19.1 (2014), 1–28 <https://doi.org/10.1075/ijcl.19.1.01bre>
Brinegar, Claude S., ‘Mark Twain and the Quintus Curtius Snodgrass Letters: A Statistical Test of Authorship’, Journal of the American Statistical Association, 58.301 (1963), 85–96 <https://doi.org/10.1080/01621459.1963.10500834>
Brinker, Klaus, Linguistische Textanalyse (Tübingen, 2001)
Bruza, P. D., D. W. Song, and K. F. Wong, ‘Aboutness from a Commonsense Perspective’, Journal of the American Society for Information Science, 51.12 (2000), 1090–1105 <https://doi.org/10.1002/1097-4571(2000)9999:9999<::AID-ASI1026>3.0.CO;2-Y>
Bubenhofer, Noah, ‘Semantische Äquivalenz in Geburtserzählungen: Anwendung von Word Embeddings’, Zeitschrift für germanistische Linguistik, 48.3 (2020), 562–89 <https://doi.org/10.1515/zgl-2020-2014>
Burnard, Lou, Christof Schöch, and Carolin Odebrecht, ‘In search of comity: TEI for distant reading’, Journal of the Text Encoding Initiative, Issue 14, 2021 <https://doi.org/10.4000/jtei.3500>
Burrows, J. F., ‘Not Unless You Ask Nicely: The Interpretative Nexus Between Analysis and Information’, Literary and Linguistic Computing, 7.2 (1992), 91–109 <https://doi.org/10.1093/llc/7.2.91>