The chi-squared test and the log-likelihood ratio test are somewhat more complex statistical measures of distribution, and both rest on hypothesis testing. They are widely used in corpus linguistics (CL) and are implemented in several corpus analysis tools, such as WordSmith Tools (Scott 1997), Wmatrix (Rayson et al. 2009), and AntConc (Anthony 2005). One problem with these measures is that their p-values are generally very low, because two text corpora can never be identical. The more important problem, however, is that they are designed to compare statistically independent events and therefore treat corpora as a "bag of words". These tests use only the total number of words in each corpus and do not account for the uneven distribution of words within a corpus (Lijffijt 2014).
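To make concrete what "using only the total number of words" means, here is a minimal sketch of both statistics for a single word, computed from a 2x2 contingency table (word versus all other words, corpus A versus corpus B). The function names `keyness` and `p_value_df1` and the example counts are illustrative assumptions; they are not taken from WordSmith Tools, Wmatrix, or AntConc, and those tools may compute their scores differently.

```python
import math


def keyness(freq_a, total_a, freq_b, total_b):
    """Return (chi2, g2) for one word, given its frequency and the
    total token count in each of two corpora (hypothetical helper)."""
    if freq_a + freq_b == 0:
        raise ValueError("the word must occur at least once in one corpus")

    # Observed 2x2 table: rows are corpora, columns are word / other words.
    observed = [
        [freq_a, total_a - freq_a],
        [freq_b, total_b - freq_b],
    ]
    grand_total = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [freq_a + freq_b, grand_total - freq_a - freq_b]

    chi2 = 0.0
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the word were equally likely in both corpora.
            expected = row_totals[i] * col_totals[j] / grand_total
            o = observed[i][j]
            chi2 += (o - expected) ** 2 / expected
            if o > 0:  # a cell with zero observations contributes nothing to G2
                g2 += 2 * o * math.log(o / expected)
    return chi2, g2


def p_value_df1(statistic):
    """Upper-tail p-value of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Made-up counts: a word occurring 120 times in 100,000 tokens (corpus A)
# and 60 times in 100,000 tokens (corpus B).
chi2, g2 = keyness(120, 100_000, 60_000 // 1000, 100_000)
print(chi2, p_value_df1(chi2))
print(g2, p_value_df1(g2))
```

Note that the only inputs are four numbers per word. Whether the 120 occurrences are spread evenly across all texts in corpus A or concentrated in a single text makes no difference to the result, which is exactly the bag-of-words limitation described above.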

Bibliography

Peters, Christine, ‘Text Mining, Travel Writing, and the Semantics of the Global. An AntConc Analysis of Alexander von Humboldt’s Reise in Die Aequinoktial-Gegenden Des Neuen Kontinents’, in Digital Methods in the Humanities: Challenges, Ideas, Perspectives (Bielefeld University Press, 2021), pp. 185–215
Stefanowitsch, Anatol, ‘Text [Keyword Analysis]’, in Corpus Linguistics: A Guide to the Methodology, Textbooks in Language Sciences, 7 (LangSci Press, 2020), pp. 353–96
Pojanapunya, Punjaporn, and Richard Watson Todd, ‘Log-Likelihood and Odds Ratio: Keyness Statistics for Different Purposes of Keyword Analysis’, Corpus Linguistics and Linguistic Theory, 14.1 (2018), pp. 133–67, http://doi.org/10.1515/cllt-2015-0030
Froehlich, Heather, ‘Corpus Analysis with Antconc’, Programming Historian, 2015 <https://programminghistorian.org/en/lessons/corpus-analysis-with-antconc>
Lijffijt, Jefrey, and others, ‘Significance Testing of Word Frequencies in Corpora’, Digital Scholarship in the Humanities, 31.2 (2014), pp. 374–97, http://doi.org/10.1093/llc/fqu064
Brezina, Vaclav, and Miriam Meyerhoff, ‘Significant or Random?: A Critical Review of Sociolinguistic Generalisations Based on Large Corpora’, International Journal of Corpus Linguistics, 19.1 (2014), pp. 1–28, http://doi.org/10.1075/ijcl.19.1.01bre
Paquot, Magali, and Yves Bestgen, ‘Distinctive Words in Academic Writing: A Comparison of Three Statistical Tests for Keyword Extraction’, in Corpora: Pragmatics and Discourse, ed. by Andreas H. Jucker, Daniel Schreier, and Marianne Hundt (Brill | Rodopi, 2009), http://doi.org/10.1163/9789042029101_014
Chen, Francine R., Thorsten H. Brants, and Annie E. Zaenen, ‘Systems and Methods for Sentence Based Interactive Topic-Based Text Summarization’, 20 May 2008 <https://patents.google.com/patent/US7376893B2/en>
Rayson, Paul, ‘From Key Words to Key Semantic Domains’, International Journal of Corpus Linguistics, 13.4 (2008), pp. 519–49, http://doi.org/10.1075/ijcl.13.4.06ray
Anthony, Laurence, ‘AntConc: Design and Development of a Freeware Corpus Analysis Toolkit for the Technical Writing Classroom’, 2005, pp. 729–37, http://doi.org/10.1109/IPCC.2005.1494244
Rayson, Paul, ‘Wmatrix: A Web-Based Corpus Processing Environment.’ (Computing Department, Lancaster University, 2005)
Kilgarriff, Adam, ‘Language Is Never, Ever, Ever, Random’, Corpus Linguistics and Linguistic Theory, 1.2 (2005), pp. 263–76, http://doi.org/10.1515/cllt.2005.1.2.263
Gamon, Michael, ‘Linguistic Correlates of Style: Authorship Classification with Deep Linguistic Analysis Features’, in Proceedings of the 20th International Conference on Computational Linguistics, COLING ’04 (Association for Computational Linguistics, 2004), http://doi.org/10.3115/1220355.1220443
Rayson, Paul, and Roger Garside, ‘Comparing Corpora Using Frequency Profiling’, in Proceedings of the Workshop on Comparing Corpora - Volume 9, WCC ’00 (Association for Computational Linguistics, 2000), pp. 1–6, http://doi.org/10.3115/1117729.1117730
Scott, Mike, ‘PC Analysis of Key Words and Key Key Words’, System, 25.2 (1997), pp. 233–45, http://doi.org/10.1016/S0346-251X(97)00011-0
Dunning, Ted, ‘Accurate Methods for the Statistics of Surprise and Coincidence’, Computational Linguistics, 19.1 (1993), p. 14 <http://aclweb.org/anthology/J93-1003>
Cressie, Noel A. C., and Timothy R. C. Read, ‘Pearsons-X2 and the Loglikelihood Ratio Statistic-G2: A Comparative Review’, 1989, http://doi.org/10.2307/1403582
Woolf, Barnet, ‘The Log-Likelihood Ratio Test (the G-Test)’, Annals of Human Genetics, 21.4 (1957), pp. 397–409, http://doi.org/10.1111/j.1469-1809.1972.tb00293.x