Log-Likelihood Ratio Test – Zeta and Company

The chi-squared test and the log-likelihood ratio test are somewhat more complex statistical measures that compare distributions and rest on an underlying hypothesis test. These measures are widely used in CL and are implemented in several corpus analysis tools, such as WordSmith Tools (Scott 1997), Wmatrix (Rayson et al. 2009), and AntConc (Anthony 2005). One problem with these measures is that p-values are generally very low, because two text corpora can never be identical. The more important problem, however, is that they are designed to compare statistically independent events and therefore treat corpora as a “bag of words”. These tests use only the total number of words in each corpus and do not take into account the uneven distribution of words within a corpus (Lijffijt et al. 2014).
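To make the bag-of-words limitation concrete, here is a minimal sketch of the log-likelihood ratio (G2) for a single word, computed from a 2x2 contingency table (target word vs. all other words, corpus A vs. corpus B). The function names and all counts are hypothetical and chosen only for illustration; this is not the implementation used by any of the tools named above. The p-value assumes the usual chi-squared approximation with one degree of freedom.

```python
import math


def log_likelihood_ratio(freq_a, total_a, freq_b, total_b):
    """G2 for one word from a 2x2 table: (word, other words) x (corpus A, corpus B)."""
    observed = [
        [freq_a, total_a - freq_a],
        [freq_b, total_b - freq_b],
    ]
    grand_total = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [freq_a + freq_b, grand_total - freq_a - freq_b]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a cell with zero observations contributes nothing
                e = row_totals[i] * col_totals[j] / grand_total
                g2 += o * math.log(o / e)
    return 2.0 * g2


def p_value_df1(g2):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(g2, 0.0) / 2.0))


# Hypothetical per-text counts of one word; every text has 10,000 tokens.
text_length = 10_000
corpus_a_even = [10, 10, 10, 10, 10]   # word spread evenly over five texts
corpus_a_bursty = [50, 0, 0, 0, 0]     # same total, concentrated in one text
corpus_b = [4, 4, 4, 4, 4]

total_a = text_length * len(corpus_a_even)
total_b = text_length * len(corpus_b)

g2_even = log_likelihood_ratio(sum(corpus_a_even), total_a, sum(corpus_b), total_b)
g2_bursty = log_likelihood_ratio(sum(corpus_a_bursty), total_a, sum(corpus_b), total_b)

# Only the corpus-wide totals enter the statistic, so both versions of
# corpus A receive exactly the same score and the same p-value.
print(g2_even, g2_bursty, p_value_df1(g2_even))
```

Because the per-text counts are summed before the test is applied, the evenly dispersed and the bursty version of corpus A are indistinguishable to the statistic, which is the limitation described above.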
Bibliography



Peters, Christine, ‘Text Mining, Travel Writing, and the Semantics of the Global. An AntConc Analysis of Alexander von Humboldt’s Reise in die Aequinoktial-Gegenden des Neuen Kontinents’, in Digital Methods in the Humanities: Challenges, Ideas, Perspectives (Bielefeld: Bielefeld University Press, 2021), pp. 185–215
Stefanowitsch, Anatol, ‘Text [Keyword Analysis]’, in Corpus Linguistics: A Guide to the Methodology, Textbooks in Language Sciences, 7 (LangSci Press, 2020), pp. 353–96
Pojanapunya, Punjaporn, and Richard Watson Todd, ‘Log-Likelihood and Odds Ratio: Keyness Statistics for Different Purposes of Keyword Analysis’, Corpus Linguistics and Linguistic Theory, 14.1 (2018), 133–67 <https://doi.org/10.1515/cllt-2015-0030>
Froehlich, Heather, ‘Corpus Analysis with Antconc’, Programming Historian, 2015 <https://programminghistorian.org/en/lessons/corpus-analysis-with-antconc> [accessed 15 February 2021]
Lijffijt, Jefrey, Terttu Nevalainen, Tanja Säily, Panagiotis Papapetrou, Kai Puolamäki, and Heikki Mannila, ‘Significance Testing of Word Frequencies in Corpora’, Digital Scholarship in the Humanities, 31.2 (2014), 374–97 <https://doi.org/10.1093/llc/fqu064>
Brezina, Vaclav, and Miriam Meyerhoff, ‘Significant or Random?: A Critical Review of Sociolinguistic Generalisations Based on Large Corpora’, International Journal of Corpus Linguistics, 19.1 (2014), 1–28 <https://doi.org/10.1075/ijcl.19.1.01bre>
Paquot, Magali, and Yves Bestgen, ‘Distinctive Words in Academic Writing: A Comparison of Three Statistical Tests for Keyword Extraction’, in Corpora: Pragmatics and Discourse, ed. by Andreas H. Jucker, Daniel Schreier, and Marianne Hundt (Brill | Rodopi, 2009) <https://doi.org/10.1163/9789042029101_014>
Chen, Francine R., Thorsten H. Brants, and Annie E. Zaenen, ‘Systems and Methods for Sentence Based Interactive Topic-Based Text Summarization’, 2008 <https://patents.google.com/patent/US7376893B2/en> [accessed 17 September 2019]
Rayson, Paul, ‘From Key Words to Key Semantic Domains’, International Journal of Corpus Linguistics, 13.4 (2008), 519–49 <https://doi.org/10.1075/ijcl.13.4.06ray>
Anthony, Laurence, ‘AntConc: Design and Development of a Freeware Corpus Analysis Toolkit for the Technical Writing Classroom’, 2005, pp. 729–37 <https://doi.org/10.1109/IPCC.2005.1494244>
Rayson, Paul, ‘Wmatrix: A Web-Based Corpus Processing Environment.’ (Lancaster, UK: Computing Department, Lancaster University, 2005)
Kilgarriff, Adam, ‘Language Is Never, Ever, Ever, Random’, Corpus Linguistics and Linguistic Theory, 1.2 (2005), 263–76 <https://doi.org/10.1515/cllt.2005.1.2.263>
abstractNote%22%3A%22Language%20users%20never%20choose%20words%20randomly%2C%20and%20language%20is%20essentially%20non-random.%20Statistical%20hypothesis%20testing%20uses%20a%20null%20hypothesis%2C%20which%20posits%20randomness.%20Hence%2C%20when%20we%20look%20at%20linguistic%20phenomena%20in%20corpora%2C%20the%20null%20hypothesis%20will%20never%20be%20true.%20Moreover%2C%20where%20there%20is%20enough%20data%2C%20we%20shall%20%28almost%29%20always%20be%20able%20to%20establish%20that%20it%20is%20not%20true.%20In%20corpus%20studies%2C%20we%20frequently%20do%20have%20enough%20data%2C%20so%20the%20fact%20that%20a%20relation%20between%20two%20phenomena%20is%20demonstrably%20non-random%2C%20does%20not%20support%20the%20inference%20that%20it%20is%20not%20arbitrary.%20We%20present%20experimental%20evidence%20of%20how%20arbitrary%20associations%20between%20word%20frequencies%20and%20corpora%20are%20systematically%20non-random.%20We%20review%20literature%20in%20which%20hypothesis%20testing%20has%20been%20used%2C%20and%20show%20how%20it%20has%20often%20led%20to%20unhelpful%20or%20misleading%20results.%22%2C%22date%22%3A%222005%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%2210.1515%5C%2Fcllt.2005.1.2.263%22%2C%22ISSN%22%3A%221613-7035%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fwww.degruyter.com%5C%2Fview%5C%2Fj%5C%2Fcllt.2005.1.issue-2%5C%2Fcllt.2005.1.2.263%5C%2Fcllt.2005.1.2.263.xml%22%2C%22collections%22%3A%5B%222CZHD96W%22%5D%2C%22dateModified%22%3A%222020-10-13T05%3A29%3A41Z%22%7D%7D%2C%7B%22key%22%3A%22QN7JA7PK%22%2C%22library%22%3A%7B%22id%22%3A2241481%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A228821%2C%22username%22%3A%22christof.s%22%2C%22name%22%3A%22Christof%20Sch%5Cu00f6ch%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Fchristof.s%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Gamon%22%2C%22parsedDate%22%3A%222004%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20cl
ass%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3EGamon%2C%20Michael%2C%20%26%23x2018%3BLinguistic%20Correlates%20of%20Style%3A%20Authorship%20Classification%20with%20Deep%20Linguistic%20Analysis%20Features%26%23x2019%3B%2C%20in%20%3Ci%3EProceedings%20of%20the%2020th%20International%20Conference%20on%20Computational%20Linguistics%3C%5C%2Fi%3E%2C%20COLING%20%26%23x2019%3B04%20%28Stroudsburg%2C%20PA%2C%20USA%3A%20Association%20for%20Computational%20Linguistics%2C%202004%29%20%26lt%3B%3Ca%20href%3D%27https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.3115%5C%2F1220355.1220443%27%3Ehttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.3115%5C%2F1220355.1220443%3C%5C%2Fa%3E%26gt%3B%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22Linguistic%20Correlates%20of%20Style%3A%20Authorship%20Classification%20with%20Deep%20Linguistic%20Analysis%20Features%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Michael%22%2C%22lastName%22%3A%22Gamon%22%7D%5D%2C%22abstractNote%22%3A%22The%20identification%20of%20authorship%20falls%20into%20the%20category%20of%20style%20classification%2C%20an%20interesting%20sub-field%20of%20text%20categorization%20that%20deals%20with%20properties%20of%20the%20form%20of%20linguistic%20expression%20as%20opposed%20to%20the%20content%20of%20a%20text.%20Various%20feature%20sets%20and%20classification%20methods%20have%20been%20proposed%20in%20the%20literature%2C%20geared%20towards%20abstracting%20away%20from%20the%20content%20of%20a%20text%2C%20and%20focusing%20on%20its%20stylistic%20properties.%20We%20demonstrate%20that%20in%20a%20realistically%20difficult%20authorship%20attribution%20scenario%2C%20deep%20linguistic%20analysis%20features%20such%20as%20context%20free%20production%20frequencies%20and%20semantic%20relationship%20frequencies%20achieve%20significant%20er
ror%20reduction%20over%20more%20commonly%20used%20%5C%22shallow%5C%22%20features%20such%20as%20function%20word%20frequencies%20and%20part%20of%20speech%20trigrams.%20Modern%20machine%20learning%20techniques%20like%20support%20vector%20machines%20allow%20us%20to%20explore%20large%20feature%20vectors%2C%20combining%20these%20different%20feature%20sets%20to%20achieve%20high%20classification%20accuracy%20in%20style-based%20tasks.%22%2C%22date%22%3A%222004%22%2C%22proceedingsTitle%22%3A%22Proceedings%20of%20the%2020th%20International%20Conference%20on%20Computational%20Linguistics%22%2C%22conferenceName%22%3A%22%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%2210.3115%5C%2F1220355.1220443%22%2C%22ISBN%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.3115%5C%2F1220355.1220443%22%2C%22collections%22%3A%5B%222CZHD96W%22%5D%2C%22dateModified%22%3A%222020-10-13T05%3A29%3A41Z%22%7D%7D%2C%7B%22key%22%3A%22U7U9Z5M8%22%2C%22library%22%3A%7B%22id%22%3A2241481%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A228821%2C%22username%22%3A%22christof.s%22%2C%22name%22%3A%22Christof%20Sch%5Cu00f6ch%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Fchristof.s%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Rayson%20and%20Garside%22%2C%22parsedDate%22%3A%222000%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3ERayson%2C%20Paul%2C%20and%20Roger%20Garside%2C%20%26%23x2018%3BComparing%20Corpora%20Using%20Frequency%20Profiling%26%23x2019%3B%2C%20in%20%3Ci%3EProceedings%20of%20the%20Workshop%20on%20Comparing%20Corpora%20-%20Volume%209%3C%5C%2Fi%3E%2C%20WCC%20%26%23x2019%3B00%20%28Stroudsburg%2C%20PA%2C%20USA%3A%20Association%20for%20Computational%20Linguistics%2C%202000%29%2C%20pp.%201%26%23x2013%3B6%20%26l
t%3B%3Ca%20href%3D%27https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.3115%5C%2F1117729.1117730%27%3Ehttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.3115%5C%2F1117729.1117730%3C%5C%2Fa%3E%26gt%3B%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22Comparing%20Corpora%20Using%20Frequency%20Profiling%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Paul%22%2C%22lastName%22%3A%22Rayson%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Roger%22%2C%22lastName%22%3A%22Garside%22%7D%5D%2C%22abstractNote%22%3A%22This%20paper%20describes%20a%20method%20of%20comparing%20corpora%20which%20uses%20frequency%20profiling.%20The%20method%20can%20be%20used%20to%20discover%20key%20words%20in%20the%20corpora%20which%20differentiate%20one%20corpus%20from%20another.%20Using%20annotated%20corpora%2C%20it%20can%20be%20applied%20to%20discover%20key%20grammatical%20or%20word-sense%20categories.%20This%20can%20be%20used%20as%20a%20quick%20way%20in%20to%20find%20the%20differences%20between%20the%20corpora%20and%20is%20shown%20to%20have%20applications%20in%20the%20study%20of%20social%20differentiation%20in%20the%20use%20of%20English%20vocabulary%2C%20profiling%20of%20learner%20English%20and%20document%20analysis%20in%20the%20software%20engineering%20process.%22%2C%22date%22%3A%222000%22%2C%22proceedingsTitle%22%3A%22Proceedings%20of%20the%20Workshop%20on%20Comparing%20Corpora%20-%20Volume%209%22%2C%22conferenceName%22%3A%22%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%2210.3115%5C%2F1117729.1117730%22%2C%22ISBN%22%3A%22%22%2C%22url%22%3A%22https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.3115%5C%2F1117729.1117730%22%2C%22collections%22%3A%5B%22IUKRIB7T%22%2C%222CZHD96W%22%2C%22GTTUG6GN%22%5D%2C%22dateModified%22%3A%222020-10-13T05%3A31%3A26Z%22%7D%7D%2C%7B%22key%22%3A%22775JLQX4%22%2C%22library%22%3A%7B%22id%22%3A2241481%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A5206995%2C%22username%22%3A%22roettgerm
ann%22%2C%22name%22%3A%22%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Froettgermann%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Scott%22%2C%22parsedDate%22%3A%221997%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3EScott%2C%20Mike%2C%20%26%23x2018%3BPC%20Analysis%20of%20Key%20Words%20and%20Key%20Key%20Words%26%23x2019%3B%2C%20%3Ci%3ESystem%3C%5C%2Fi%3E%2C%2025.2%20%281997%29%2C%20233%26%23x2013%3B45%20%26lt%3B%3Ca%20href%3D%27https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1016%5C%2FS0346-251X%2897%2900011-0%27%3Ehttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1016%5C%2FS0346-251X%2897%2900011-0%3C%5C%2Fa%3E%26gt%3B%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22PC%20Analysis%20of%20key%20words%20and%20key%20key%20words%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Mike%22%2C%22lastName%22%3A%22Scott%22%7D%5D%2C%22abstractNote%22%3A%22PC%20analysis%20of%20key%20words%20%5Cu2014%20And%20key%20key%20words%22%2C%22date%22%3A%226%5C%2F1997%22%2C%22language%22%3A%22eng%22%2C%22DOI%22%3A%2210.1016%5C%2FS0346-251X%2897%2900011-0%22%2C%22ISSN%22%3A%220346251X%22%2C%22url%22%3A%22%22%2C%22collections%22%3A%5B%22IUKRIB7T%22%5D%2C%22dateModified%22%3A%222024-02-20T09%3A07%3A15Z%22%7D%7D%2C%7B%22key%22%3A%228ZQCYHBL%22%2C%22library%22%3A%7B%22id%22%3A2241481%7D%2C%22meta%22%3A%7B%22creatorSummary%22%3A%22Dunning%22%2C%22parsedDate%22%3A%221993%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3EDunning%2C%20Ted%2C%20%26%23x2018%3BAccurate%20Met
hods%20for%20the%20Statistics%20of%20Surprise%20and%20Coincidence%26%23x2019%3B%2C%20%3Ci%3EComputational%20Linguistics%3C%5C%2Fi%3E%2C%2019.1%20%281993%29%2C%2014%20%26lt%3B%3Ca%20href%3D%27http%3A%5C%2F%5C%2Faclweb.org%5C%2Fanthology%5C%2FJ93-1003%27%3Ehttp%3A%5C%2F%5C%2Faclweb.org%5C%2Fanthology%5C%2FJ93-1003%3C%5C%2Fa%3E%26gt%3B%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22Accurate%20Methods%20for%20the%20Statistics%20of%20Surprise%20and%20Coincidence%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Ted%22%2C%22lastName%22%3A%22Dunning%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%221993%22%2C%22language%22%3A%22en%22%2C%22DOI%22%3A%22%22%2C%22ISSN%22%3A%22%22%2C%22url%22%3A%22http%3A%5C%2F%5C%2Faclweb.org%5C%2Fanthology%5C%2FJ93-1003%22%2C%22collections%22%3A%5B%22IUKRIB7T%22%2C%222CZHD96W%22%5D%2C%22dateModified%22%3A%222024-02-20T09%3A00%3A41Z%22%7D%7D%2C%7B%22key%22%3A%22RKF7VLQ9%22%2C%22library%22%3A%7B%22id%22%3A2241481%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A228821%2C%22username%22%3A%22christof.s%22%2C%22name%22%3A%22Christof%20Sch%5Cu00f6ch%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Fchristof.s%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Cressie%20and%20Read%22%2C%22parsedDate%22%3A%221989%22%2C%22numChildren%22%3A1%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3ECressie%2C%20Noel%20A.%20C.%2C%20and%20Timothy%20R.%20C.%20Read%2C%20%26%23x2018%3BPearsons-X2%20and%20the%20Loglikelihood%20Ratio%20Statistic-G2%3A%20A%20Comparative%20Review%26%23x2019%3B%2C%201989%20%26lt%3B%3Ca%20href%3D%27https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.2307%5C%2F1403582%27%3Ehttps%3A%5C%2F%5C%2Fdoi.org%5C%2
F10.2307%5C%2F1403582%3C%5C%2Fa%3E%26gt%3B%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22Pearsons-X2%20and%20the%20loglikelihood%20ratio%20statistic-G2%3A%20a%20comparative%20review%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Noel%20A.%20C.%22%2C%22lastName%22%3A%22Cressie%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Timothy%20R.%20C.%22%2C%22lastName%22%3A%22Read%22%7D%5D%2C%22abstractNote%22%3A%22Summary%20The%20importance%20of%20developing%20useful%20and%20appropriate%20statistical%20methods%20for%20analyzing%20discrete%20multivariate%20data%20is%20apparent%20from%20the%20enormous%20amount%20of%20attention%20this%20subject%20has%20commanded%20in%20the%20literature%20over%20the%20last%20thirty%20years.%20Central%20to%20these%20discussions%20has%20been%20Pearson%27s%20X2%20statistic%20and%20the%20loglikelihood%20ratio%20statistic%20G2.%20Our%20review%20seeks%20to%20consolidate%20this%20fragmented%20literature%20and%20develop%20a%20unifying%20theme%20for%20much%20of%20this%20research.%20The%20traditional%20X2%20and%20G2%20statistics%20are%20viewed%20as%20members%20of%20the%20power-divergence%20family%20of%20statistics%2C%20and%20are%20linked%20through%20a%20single%20real-valued%20parameter.%20The%20principal%20areas%20covered%20in%20this%20comparative%20survey%20are%20small-sample%20comparisons%20of%20X2%20and%20G2%20under%20both%20classical%20%28fixed-cells%29%20assumptions%20and%20sparseness%20assumptions%2C%20efficiency%20comparisons%2C%20and%20various%20modifications%20to%20the%20test%20statistics%20%28including%20parameter%20estimation%20for%20ungrouped%20data%2C%20data-dependent%20and%20overlapping%20cell%20boundaries%2C%20serially%20dependent%20data%2C%20and%20smoothing%29.%20Finally%20some%20future%20areas%20for%20research%20are%20discussed.%22%2C%22date%22%3A%221989%22%2C%22proceedingsTitle%22%3A%22%22%2C%22conferenceName%22%3A%22
%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%2210.2307%5C%2F1403582%22%2C%22ISBN%22%3A%22%22%2C%22url%22%3A%22%22%2C%22collections%22%3A%5B%222CZHD96W%22%2C%22NG7P7RZR%22%5D%2C%22dateModified%22%3A%222020-10-13T05%3A30%3A30Z%22%7D%7D%2C%7B%22key%22%3A%228A9H3P69%22%2C%22library%22%3A%7B%22id%22%3A2241481%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A228821%2C%22username%22%3A%22christof.s%22%2C%22name%22%3A%22Christof%20Sch%5Cu00f6ch%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Fchristof.s%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Woolf%22%2C%22parsedDate%22%3A%221957%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3EWoolf%2C%20Barnet%2C%20%26%23x2018%3BThe%20Log-Likelihood%20Ratio%20Test%20%28the%20G-Test%29%26%23x2019%3B%2C%20%3Ci%3EAnnals%20of%20Human%20Genetics%3C%5C%2Fi%3E%2C%2021.4%20%281957%29%2C%20397%26%23x2013%3B409%20%26lt%3B%3Ca%20href%3D%27https%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1111%5C%2Fj.1469-1809.1972.tb00293.x%27%3Ehttps%3A%5C%2F%5C%2Fdoi.org%5C%2F10.1111%5C%2Fj.1469-1809.1972.tb00293.x%3C%5C%2Fa%3E%26gt%3B%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22The%20log-likelihood%20ratio%20test%20%28the%20G-test%29%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Barnet%22%2C%22lastName%22%3A%22Woolf%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%2205%5C%2F1957%22%2C%22language%22%3A%22en%22%2C%22DOI%22%3A%2210.1111%5C%2Fj.1469-1809.1972.tb00293.x%22%2C%22ISSN%22%3A%2200034800%22%2C%22url%22%3A%22http%3A%5C%2F%5C%2Fdoi.wiley.com%5C%2F10.1111%5C%2Fj.1469-1809.1972.tb00293.x%22%2C%22collections%22%3A%5B%5D%2C%22dateModified%22%3A%222020-07-07T13%3A19%3A46Z%22%7D%7D%5
D%7D

				

  Peters, Christine, ‘Text Mining, Travel Writing, and the Semantics of the Global. An AntConc Analysis of Alexander von Humboldt’s Reise in die Aequinoktial-Gegenden des Neuen Kontinents’, in Digital Methods in the Humanities: Challenges, Ideas, Perspectives (Bielefeld: Bielefeld University Press, 2021), pp. 185–215

				
				

  Stefanowitsch, Anatol, ‘Text [Keyword Analysis]’, in Corpus Linguistics: A Guide to the Methodology, Textbooks in Language Sciences, 7 (LangSci Press, 2020), pp. 353–96

				
				

  Pojanapunya, Punjaporn, and Richard Watson Todd, ‘Log-Likelihood and Odds Ratio: Keyness Statistics for Different Purposes of Keyword Analysis’, Corpus Linguistics and Linguistic Theory, 14.1 (2018), 133–67 <https://doi.org/10.1515/cllt-2015-0030>

				
				

  Froehlich, Heather, ‘Corpus Analysis with Antconc’, Programming Historian, 2015 <https://programminghistorian.org/en/lessons/corpus-analysis-with-antconc> [accessed 15 February 2021]

				
				

  Lijffijt, Jefrey, Terttu Nevalainen, Tanja Säily, Panagiotis Papapetrou, Kai Puolamäki, and Heikki Mannila, ‘Significance Testing of Word Frequencies in Corpora’, Digital Scholarship in the Humanities, 31.2 (2014), 374–97 <https://doi.org/10.1093/llc/fqu064>

				
				

  Brezina, Vaclav, and Miriam Meyerhoff, ‘Significant or Random?: A Critical Review of Sociolinguistic Generalisations Based on Large Corpora’, International Journal of Corpus Linguistics, 19.1 (2014), 1–28 <https://doi.org/10.1075/ijcl.19.1.01bre>

				
				

  Paquot, Magali, and Yves Bestgen, ‘Distinctive Words in Academic Writing: A Comparison of Three Statistical Tests for Keyword Extraction’, in Corpora: Pragmatics and Discourse, ed. by Andreas H. Jucker, Daniel Schreier, and Marianne Hundt (Brill | Rodopi, 2009) <https://doi.org/10.1163/9789042029101_014>

				
				

  Chen, Francine R., Thorsten H. Brants, and Annie E. Zaenen, ‘Systems and Methods for Sentence Based Interactive Topic-Based Text Summarization’, 2008 <https://patents.google.com/patent/US7376893B2/en> [accessed 17 September 2019]

				
				

  Rayson, Paul, ‘From Key Words to Key Semantic Domains’, International Journal of Corpus Linguistics, 13.4 (2008), 519–49 <https://doi.org/10.1075/ijcl.13.4.06ray>

				
				

  Anthony, Laurence, ‘AntConc: Design and Development of a Freeware Corpus Analysis Toolkit for the Technical Writing Classroom’, in Proceedings of Professional Communication Conference, 2005, pp. 729–37 <https://doi.org/10.1109/IPCC.2005.1494244>

				
				

  Rayson, Paul, ‘Wmatrix: A Web-Based Corpus Processing Environment.’ (Lancaster, UK: Computing Department, Lancaster University, 2005)

				
				

  Kilgarriff, Adam, ‘Language Is Never, Ever, Ever, Random’, Corpus Linguistics and Linguistic Theory, 1.2 (2005), 263–76 <https://doi.org/10.1515/cllt.2005.1.2.263>

				
				

  Gamon, Michael, ‘Linguistic Correlates of Style: Authorship Classification with Deep Linguistic Analysis Features’, in Proceedings of the 20th International Conference on Computational Linguistics, COLING ’04 (Stroudsburg, PA, USA: Association for Computational Linguistics, 2004) <https://doi.org/10.3115/1220355.1220443>

				
				

  Rayson, Paul, and Roger Garside, ‘Comparing Corpora Using Frequency Profiling’, in Proceedings of the Workshop on Comparing Corpora - Volume 9, WCC ’00 (Stroudsburg, PA, USA: Association for Computational Linguistics, 2000), pp. 1–6 <https://doi.org/10.3115/1117729.1117730>

				
				

  Scott, Mike, ‘PC Analysis of Key Words and Key Key Words’, System, 25.2 (1997), 233–45 <https://doi.org/10.1016/S0346-251X(97)00011-0>

				
				

  Dunning, Ted, ‘Accurate Methods for the Statistics of Surprise and Coincidence’, Computational Linguistics, 19.1 (1993), 61–74 <http://aclweb.org/anthology/J93-1003>

				
				

  Cressie, Noel A. C., and Timothy R. C. Read, ‘Pearson’s X² and the Loglikelihood Ratio Statistic G²: A Comparative Review’, International Statistical Review, 57.1 (1989), 19–43 <https://doi.org/10.2307/1403582>

				
				

  Woolf, Barnet, ‘The Log-Likelihood Ratio Test (the G-Test)’, Annals of Human Genetics, 21.4 (1957), 397–409 <https://doi.org/10.1111/j.1469-1809.1972.tb00293.x>