The chi-squared test and the log-likelihood ratio test are somewhat more complex statistical measures, each grounded in a hypothesis test on the distribution of word frequencies. Both are widely used in corpus linguistics (CL) and are implemented in several corpus analysis tools, including WordSmith Tools (Scott 1997), Wmatrix (Rayson et al. 2009) and AntConc (Anthony 2005). One problem with these measures is that p-values are generally very low, because two text corpora are never identical. The more important problem, however, is that they are designed to compare statistically independent events and therefore treat corpora as a ‘bag of words’. The tests use only the total number of words in each corpus and do not account for the uneven distribution of words within a corpus (Lijffijt 2014).
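
As a rough illustration of how the two statistics are computed for a single word, the following Python sketch builds a 2×2 contingency table (target word versus all other words, corpus 1 versus corpus 2) and derives Pearson's chi-squared and the log-likelihood ratio (G2) from it. The function names and the example counts are hypothetical and chosen only for illustration; the tools named above may compute these statistics differently in detail.

```python
import math


def contingency(a, b, n1, n2):
    """Build a 2x2 table for one word.

    a, b   -- frequency of the target word in corpus 1 and corpus 2
    n1, n2 -- total number of words in corpus 1 and corpus 2
    Rows: target word / all other words. Columns: corpus 1 / corpus 2.
    """
    return [[a, b], [n1 - a, n2 - b]]


def expected(table):
    """Expected counts if the word were equally frequent in both corpora."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson's chi-squared: sum of (O - E)^2 / E over all four cells.

    Assumes every expected count is non-zero, i.e. the word occurs at
    least once overall and neither corpus is empty.
    """
    exp = expected(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )


def log_likelihood(table):
    """Log-likelihood ratio G2: 2 * sum of O * ln(O / E).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    exp = expected(table)
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


# Hypothetical counts: the word occurs 100 times in a 1,000-word corpus
# and 100 times in a 2,000-word corpus.
table = contingency(100, 100, 1000, 2000)
print(chi_square(table), log_likelihood(table))
```

With one degree of freedom, a value above 3.84 corresponds to p < 0.05. Note that both functions receive only the word's total frequency and the corpus sizes: how the occurrences are spread across the individual texts of a corpus never enters the calculation, which is the ‘bag of words’ limitation described above.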

Bibliography

Peters, Christine, ‘Text Mining, Travel Writing, and the Semantics of the Global. An AntConc Analysis of Alexander von Humboldt’s Reise in Die Aequinoktial-Gegenden Des Neuen Kontinents’, in Digital Methods in the Humanities: Challenges, Ideas, Perspectives (Bielefeld: Bielefeld University Press, 2021), pp. 185–215
Stefanowitsch, Anatol, ‘Text [Keyword Analysis]’, in Corpus Linguistics: A Guide to the Methodology, Textbooks in Language Sciences, 7 (LangSci Press, 2020), pp. 353–96
Pojanapunya, Punjaporn, and Richard Watson Todd, ‘Log-Likelihood and Odds Ratio: Keyness Statistics for Different Purposes of Keyword Analysis’, Corpus Linguistics and Linguistic Theory, 14.1 (2018), 133–67 <https://doi.org/10.1515/cllt-2015-0030>
Froehlich, Heather, ‘Corpus Analysis with Antconc’, Programming Historian, 2015 <https://programminghistorian.org/en/lessons/corpus-analysis-with-antconc> [accessed 15 February 2021]
Lijffijt, Jefrey, Terttu Nevalainen, Tanja Säily, Panagiotis Papapetrou, Kai Puolamäki, and Heikki Mannila, ‘Significance Testing of Word Frequencies in Corpora’, Digital Scholarship in the Humanities, 31.2 (2014), 374–97 <https://doi.org/10.1093/llc/fqu064>
Brezina, Vaclav, and Miriam Meyerhoff, ‘Significant or Random?: A Critical Review of Sociolinguistic Generalisations Based on Large Corpora’, International Journal of Corpus Linguistics, 19.1 (2014), 1–28 <https://doi.org/10.1075/ijcl.19.1.01bre>
Paquot, Magali, and Yves Bestgen, ‘Distinctive Words in Academic Writing: A Comparison of Three Statistical Tests for Keyword Extraction’, in Corpora: Pragmatics and Discourse, ed. by Andreas H. Jucker, Daniel Schreier, and Marianne Hundt (Brill | Rodopi, 2009) <https://doi.org/10.1163/9789042029101_014>
Chen, Francine R., Thorsten H. Brants, and Annie E. Zaenen, ‘Systems and Methods for Sentence Based Interactive Topic-Based Text Summarization’, 2008 <https://patents.google.com/patent/US7376893B2/en> [accessed 17 September 2019]
Rayson, Paul, ‘From Key Words to Key Semantic Domains’, International Journal of Corpus Linguistics, 13.4 (2008), 519–49 <https://doi.org/10.1075/ijcl.13.4.06ray>
Anthony, Laurence, ‘AntConc: Design and Development of a Freeware Corpus Analysis Toolkit for the Technical Writing Classroom’, 2005, pp. 729–37 <https://doi.org/10.1109/IPCC.2005.1494244>
Rayson, Paul, ‘Wmatrix: A Web-Based Corpus Processing Environment’ (Lancaster, UK: Computing Department, Lancaster University, 2005)
Kilgarriff, Adam, ‘Language Is Never, Ever, Ever, Random’, Corpus Linguistics and Linguistic Theory, 1.2 (2005), 263–76 <https://doi.org/10.1515/cllt.2005.1.2.263>
Gamon, Michael, ‘Linguistic Correlates of Style: Authorship Classification with Deep Linguistic Analysis Features’, in Proceedings of the 20th International Conference on Computational Linguistics, COLING ’04 (Stroudsburg, PA, USA: Association for Computational Linguistics, 2004) <https://doi.org/10.3115/1220355.1220443>
Rayson, Paul, and Roger Garside, ‘Comparing Corpora Using Frequency Profiling’, in Proceedings of the Workshop on Comparing Corpora - Volume 9, WCC ’00 (Stroudsburg, PA, USA: Association for Computational Linguistics, 2000), pp. 1–6 <https://doi.org/10.3115/1117729.1117730>
Scott, Mike, ‘PC Analysis of Key Words and Key Key Words’, System, 25.2 (1997), 233–45 <https://doi.org/10.1016/S0346-251X(97)00011-0>
Dunning, Ted, ‘Accurate Methods for the Statistics of Surprise and Coincidence’, Computational Linguistics, 19.1 (1993), 14 <http://aclweb.org/anthology/J93-1003>
Cressie, Noel A. C., and Timothy R. C. Read, ‘Pearson's X2 and the Loglikelihood Ratio Statistic G2: A Comparative Review’, 1989 <https://doi.org/10.2307/1403582>
Woolf, Barnet, ‘The Log-Likelihood Ratio Test (the G-Test)’, Annals of Human Genetics, 21.4 (1957), 397–409 <https://doi.org/10.1111/j.1469-1809.1972.tb00293.x>