The chi-squared test and the log-likelihood ratio test are somewhat more complex statistical tests of distribution, each based on an underlying hypothesis test. These measures are widely used in corpus linguistics (CL) and are implemented in several corpus analysis tools, such as WordSmith Tools (Scott 1997), Wmatrix (Rayson et al. 2009) and AntConc (Anthony 2005). One problem with these measures is that their p-values are generally very low, because two text corpora can never be identical. The more important problem, however, is that they are designed to compare statistically independent events and therefore treat corpora as a "bag of words". These tests use only the total number of words in the corpus and do not account for the uneven distribution of words within a corpus (Lijffijt et al. 2014).
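
As a rough illustration of how both statistics are typically computed for a single word, the following minimal sketch builds the usual 2x2 contingency table (target word vs. all other words, corpus A vs. corpus B) and derives Pearson's chi-squared and the log-likelihood ratio G2 from it. The function names `keyness` and `p_value_df1` and all frequencies are hypothetical and chosen only for this example. The sketch is not the implementation used by any of the tools named above, and it assumes the word occurs at least once in the two corpora combined.

```python
import math


def keyness(freq_a, size_a, freq_b, size_b):
    """Return (chi-squared, G2) for one word counted in two corpora.

    freq_a / freq_b: occurrences of the word in corpus A / B
    size_a / size_b: total number of tokens in corpus A / B
    Assumes the word occurs at least once overall (hypothetical helper).
    """
    # Observed 2x2 table: rows = corpus, columns = target word / other words
    observed = [
        [freq_a, size_a - freq_a],
        [freq_b, size_b - freq_b],
    ]
    total = size_a + size_b
    row_totals = [size_a, size_b]
    col_totals = [freq_a + freq_b, total - freq_a - freq_b]

    chi2 = 0.0
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the word were equally frequent in both corpora
            expected = row_totals[i] * col_totals[j] / total
            o = observed[i][j]
            chi2 += (o - expected) ** 2 / expected
            if o > 0:  # a cell with zero observations contributes nothing to G2
                g2 += 2 * o * math.log(o / expected)
    return chi2, g2


def p_value_df1(statistic):
    """Upper-tail p-value of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Invented frequencies: the same relative frequencies at two corpus sizes
small = keyness(12, 10_000, 20, 10_000)
large = keyness(120, 100_000, 200, 100_000)
print(small, p_value_df1(small[0]))
print(large, p_value_df1(large[0]))
```

Because both statistics grow in proportion to corpus size when the relative frequencies stay the same, the larger pair of corpora yields a much smaller p-value for the same relative difference. Note also that only the four totals enter the calculation: how the occurrences are spread across the individual texts of each corpus plays no role, which is the bag-of-words limitation described above.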

Bibliography

McGillivray, Barbara, and Gábor Mihály Tóth, ‘Frequency’, in Applying Language Technology in Humanities Research: Design, Application, and the Underlying Logic, ed. by Barbara McGillivray and Gábor Mihály Tóth (Cham: Springer International Publishing, 2020), pp. 35–46 <https://doi.org/10.1007/978-3-030-46493-6_3>
Froehlich, Heather, ‘Corpus Analysis with Antconc’, Programming Historian, 2015 <https://programminghistorian.org/en/lessons/corpus-analysis-with-antconc> [accessed 15 February 2021]
Savoy, Jacques, ‘Comparative Evaluation of Term Selection Functions for Authorship Attribution’, Literary and Linguistic Computing, 30.2 (2015), 246–61 <https://doi.org/10.1093/llc/fqt047>
Gries, Stefan Th., ‘The Most Under-Used Statistical Method in Corpus Linguistics: Multi-Level (and Mixed-Effects) Models’, Corpora, 10.1 (2015), 95–125 <https://doi.org/10.3366/cor.2015.0068>
Bestgen, Yves, ‘Inadequacy of the Chi-Squared Test to Examine Vocabulary Differences between Corpora’, Literary and Linguistic Computing, 29.2 (2014), 164–70 <https://doi.org/10.1093/llc/fqt020>
Lijffijt, Jefrey, Terttu Nevalainen, Tanja Säily, Panagiotis Papapetrou, Kai Puolamäki, and Heikki Mannila, ‘Significance Testing of Word Frequencies in Corpora’, Digital Scholarship in the Humanities, 31.2 (2014), 374–97 <https://doi.org/10.1093/llc/fqu064>
Parsons, Kathryn, Agata McCormac, and Marcus Butavicius, Human Dimensions of Corpora Comparison: An Analysis of Kilgarriff’s (2001) Approach (Defence Science and Technology Organisation, Edinburgh (Australia), Command, Control, Communications and Intelligence Division, April 2009) <https://apps.dtic.mil/docs/citations/ADA506585> [accessed 17 September 2019]
Lüdeling, Anke, and Merja Kytö, eds., ‘Statistical Methods for Corpus Exploitation’, in Handbooks of Linguistics and Communication Science (Berlin, New York: Mouton de Gruyter, 2009) <https://doi.org/10.1515/9783110213881.2.777>
Oakes, Michael P., and Malcolm Farrow, ‘Use of the Chi-Squared Test to Examine Vocabulary Differences in English Language Corpora Representing Seven Different Countries’, Literary and Linguistic Computing, 22.1 (2007), 85–99 <https://doi.org/10.1093/llc/fql044>
Anthony, Laurence, ‘AntConc: Design and Development of a Freeware Corpus Analysis Toolkit for the Technical Writing Classroom’, 2005, pp. 729–37 <https://doi.org/10.1109/IPCC.2005.1494244>
Lancaster, H. O., and E. Seneta, ‘Chi-Square Distribution’, in Encyclopedia of Biostatistics, ed. by Peter Armitage and Theodore Colton (Chichester, UK: John Wiley & Sons, Ltd, 2005), p. b2a15018 <https://doi.org/10.1002/0470011815.b2a15018>
Rayson, Paul, ‘Wmatrix: A Web-Based Corpus Processing Environment.’ (Lancaster, UK: Computing Department, Lancaster University, 2005)
Cavaglià, Gabriela, ‘Measuring Corpus Homogeneity Using a Range of Measures for Inter-Document Distance’, ResearchGate, 2002 <https://www.researchgate.net/publication/267784878_ITRI-02-08_Measuring_corpus_homogeneity_using_a_range_of_measures_for_inter-document_distance_Measuring_corpus_homogeneity_using_a_range_of_measures_for_inter-document_distance> [accessed 17 September 2019]
Kilgarriff, Adam, ‘Comparing Corpora’, International Journal of Corpus Linguistics, 6.1 (2001), 97–133 <https://doi.org/10.1075/ijcl.6.1.05kil>
Scott, Mike, ‘PC Analysis of Key Words and Key Key Words’, System, 25.2 (1997), 233–45 <https://doi.org/10.1016/S0346-251X(97)00011-0>
Kilgarriff, Adam, ‘Using Word Frequency Lists to Measure Corpus Homogeneity and Similarity between Corpora’, in Fifth Workshop on Very Large Corpora, 1997 <https://www.aclweb.org/anthology/W97-0122> [accessed 6 September 2019]
Cressie, Noel A. C., and Timothy R. C. Read, ‘Pearson’s X² and the Loglikelihood Ratio Statistic G²: A Comparative Review’, 1989 <https://doi.org/10.2307/1403582>
Plackett, R. L., ‘Karl Pearson and the Chi-Squared Test’, International Statistical Review / Revue Internationale de Statistique, 51.1 (1983), 59 <https://doi.org/10.2307/1402731>
Brinegar, Claude S., ‘Mark Twain and the Quintus Curtius Snodgrass Letters: A Statistical Test of Authorship’, Journal of the American Statistical Association, 58.301 (1963), 85–96 <https://doi.org/10.1080/01621459.1963.10500834>