TF-IDF – Zeta and Company

TF-IDF (Term Frequency-Inverse Document Frequency) goes back to Luhn (1957), who proposed weighting terms by their frequency, and to Spärck Jones (1972), who added the inverse document frequency component. The measure indicates how important a word is to a document within a collection of texts. Today there are many variants and applications of the tf-idf measure. A prominent example is the TfidfVectorizer included in the Python library "sklearn" (scikit-learn), which offers many useful parameters. The tf-idf measure implemented in our framework is based on this implementation.
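To make the idea concrete, the following is a minimal sketch of a tf-idf computation using only the Python standard library. It is not the framework's code: the function name tfidf, the toy corpus and the whitespace tokenisation are assumptions made for illustration. The weighting (raw term counts, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation) is meant to resemble the default behaviour documented for scikit-learn's TfidfVectorizer; the framework may use different parameters.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting (illustrative, not necessarily what the framework uses):
    raw term counts, idf = ln((1 + n) / (1 + df)) + 1, then L2 normalisation.
    """
    # Naive tokenisation: lowercase and split on whitespace (an assumption).
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Term count multiplied by the smoothed inverse document frequency.
        raw = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Scale each document vector to unit Euclidean length.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        weights.append({t: w / norm for t, w in raw.items()} if norm else {})
    return weights


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
scores = tfidf(corpus)
# Terms of the first document, highest weight first.
print(sorted(scores[0].items(), key=lambda kv: -kv[1]))
```

In the first document, "mat" and "cat" each occur once, but "mat" appears in no other document and therefore receives the higher weight. This is the sense in which tf-idf captures how distinctive a word is for a document within a collection.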
Bibliography



Havrlant, Lukáš, and Vladik Kreinovich, ‘A Simple Probabilistic Explanation of Term Frequency-Inverse Document Frequency (Tf-Idf) Heuristic (and Variations Motivated by This Explanation)’, International Journal of General Systems, 46.1 (2017), 27–36 <https://doi.org/10.1080/03081079.2017.1291635>

Chen, Kewen, Zuping Zhang, Jun Long, and Hao Zhang, ‘Turning from TF-IDF to TF-IGM for Term Weighting in Text Classification’, Expert Systems with Applications, 66 (2016), 245–60 <https://doi.org/10.1016/j.eswa.2016.09.009>

Albitar, Shereen, Sébastien Fournier, and Bernard Espinasse, ‘An Effective TF/IDF-Based Text-to-Text Semantic Similarity Measure for Text Classification’, in Web Information Systems Engineering – WISE 2014, ed. by Boualem Benatallah, Azer Bestavros, Yannis Manolopoulos, Athena Vakali, and Yanchun Zhang (Cham: Springer International Publishing, 2014), 105–14 <https://doi.org/10.1007/978-3-319-11749-2_8>

Zhang, Wen, Taketoshi Yoshida, and Xijin Tang, ‘A Comparative Study of TF*IDF, LSI and Multi-Words for Text Classification’, Expert Systems with Applications, 38.3 (2011), 2758–65 <https://doi.org/10.1016/j.eswa.2010.08.066>

Wu, Ho Chung, Robert Wing Pong Luk, Kam Fai Wong, and Kui Lam Kwok, ‘Interpreting TF-IDF Term Weights as Making Relevance Decisions’, ACM Transactions on Information Systems, 26.3 (2008), 1–37 <https://doi.org/10.1145/1361684.1361686>

Achananuparp, Palakorn, Xiaohua Hu, and Xiajiong Shen, ‘The Evaluation of Sentence Similarity Measures’, in Data Warehousing and Knowledge Discovery, ed. by Il-Yeol Song, Johann Eder, and Tho Manh Nguyen (Berlin, Heidelberg: Springer Berlin Heidelberg, 2008), 5182, 305–16 <https://doi.org/10.1007/978-3-540-85836-2_29>

Yun-tao, Zhang, Gong Ling, and Wang Yong-cheng, ‘An Improved TF-IDF Approach for Text Classification’, Journal of Zhejiang University-SCIENCE A, 6.1 (2005), 49–55 <https://doi.org/10.1007/BF02842477>

Jones, Karen Spärck, ‘IDF Term Weighting and IR Research Lessons’, Journal of Documentation, 2004 <https://doi.org/10.1108/00220410410560591>

Robertson, Stephen, ‘Understanding Inverse Document Frequency: On Theoretical Arguments for IDF’, Journal of Documentation, 60.5 (2004), 503–20 <https://doi.org/10.1108/00220410410560582>

Ramos, Juan Enrique, ‘Using TF-IDF to Determine Word Relevance in Document Queries’, 2003

Church, Kenneth, and William Gale, ‘Inverse Document Frequency (IDF): A Measure of Deviations from Poisson’, in Third Workshop on Very Large Corpora, 1995 <https://www.aclweb.org/anthology/W95-0110> [accessed 12 June 2020]
%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3ERobertson%2C%20S.%20E.%2C%20and%20S.%20Walker%2C%20%26%23x2018%3BSome%20Simple%20Effective%20Approximations%20to%20the%202-Poisson%20Model%20for%20Probabilistic%20Weighted%20Retrieval%26%23x2019%3B%2C%20in%20%3Ci%3ESIGIR%20%26%23x2019%3B94%3C%5C%2Fi%3E%2C%20ed.%20by%20Bruce%20W.%20Croft%20and%20C.%20J.%20van%20Rijsbergen%20%28Springer%20London%2C%201994%29%2C%20pp.%20232%26%23x2013%3B41%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22conferencePaper%22%2C%22title%22%3A%22Some%20Simple%20Effective%20Approximations%20to%20the%202-Poisson%20Model%20for%20Probabilistic%20Weighted%20Retrieval%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22S.%20E.%22%2C%22lastName%22%3A%22Robertson%22%7D%2C%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22S.%22%2C%22lastName%22%3A%22Walker%22%7D%2C%7B%22creatorType%22%3A%22editor%22%2C%22firstName%22%3A%22Bruce%20W.%22%2C%22lastName%22%3A%22Croft%22%7D%2C%7B%22creatorType%22%3A%22editor%22%2C%22firstName%22%3A%22C.%20J.%22%2C%22lastName%22%3A%22van%20Rijsbergen%22%7D%5D%2C%22abstractNote%22%3A%22The%202-Poisson%20model%20for%20term%20frequencies%20is%20used%20to%20suggest%20ways%20of%20incorporating%20certain%20variables%20in%20probabilistic%20models%20for%20information%20retrieval.%20The%20variables%20concerned%20are%20within-document%20term%20frequency%2C%20document%20length%2C%20and%20within-query%20term%20frequency.%20Simple%20weighting%20functions%20are%20developed%2C%20and%20tested%20on%20the%20TREC%20test%20collection.%20Considerable%20performance%20improvements%20%28over%20simple%20inverse%20collection%20frequency%20weighting%29%20are%20demonstrated.%22%2C%22date%22%3A%221994%22%2C%22proceedingsTitle%22%3A%22SIGIR%20%5Cu201994%22%2C%22conferenceName%22%3A%22%22%2C%22lan
guage%22%3A%22en%22%2C%22DOI%22%3A%22%22%2C%22ISBN%22%3A%22978-1-4471-2099-5%22%2C%22url%22%3A%22%22%2C%22collections%22%3A%5B%22GA9F3B2S%22%5D%2C%22dateModified%22%3A%222020-10-13T05%3A29%3A36Z%22%7D%7D%2C%7B%22key%22%3A%22XRDF9PWT%22%2C%22library%22%3A%7B%22id%22%3A2241481%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A1194126%2C%22username%22%3A%22dkltimon%22%2C%22name%22%3A%22Keli%20Du%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Fdkltimon%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Sp%5Cu00e4rck%20Jones%22%2C%22parsedDate%22%3A%221972%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3ESp%26%23xE4%3Brck%20Jones%2C%20Karen%2C%20%26%23x2018%3BA%20Statistical%20Interpretation%20of%20Term%20Specificity%20and%20Its%20Application%20in%20Retrieval.%26%23x2019%3B%2C%20%3Ci%3EJournal%20of%20Documentation%3C%5C%2Fi%3E%2C%2028%20%281972%29%2C%2011%26%23x2013%3B21%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22A%20statistical%20interpretation%20of%20term%20specificity%20and%20its%20application%20in%20retrieval.%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Karen%22%2C%22lastName%22%3A%22Sp%5Cu00e4rck%20Jones%22%7D%5D%2C%22abstractNote%22%3A%22Reprinted%20in%20Journal%20of%20Documentation%2C%2060.%20493-502%2C%202004.%22%2C%22date%22%3A%221972%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%22%22%2C%22ISSN%22%3A%22%22%2C%22url%22%3A%22%22%2C%22collections%22%3A%5B%22IUKRIB7T%22%5D%2C%22dateModified%22%3A%222021-12-19T18%3A20%3A05Z%22%7D%7D%2C%7B%22key%22%3A%22586KKGL8%22%2C%22library%22%3A%7B%22id%22%3A2241481%7D%2C%22meta%22%3A%7B%22lastModifiedByUser%22%3A%7B%22id%22%3A5935700%2C%22username%2
2%3A%22yulyadudar%22%2C%22name%22%3A%22Iuliia%20Dudar%22%2C%22links%22%3A%7B%22alternate%22%3A%7B%22href%22%3A%22https%3A%5C%2F%5C%2Fwww.zotero.org%5C%2Fyulyadudar%22%2C%22type%22%3A%22text%5C%2Fhtml%22%7D%7D%7D%2C%22creatorSummary%22%3A%22Luhn%22%2C%22parsedDate%22%3A%221957%22%2C%22numChildren%22%3A0%7D%2C%22bib%22%3A%22%3Cdiv%20class%3D%5C%22csl-bib-body%5C%22%20style%3D%5C%22line-height%3A%201.35%3B%20padding-left%3A%201em%3B%20text-indent%3A-1em%3B%5C%22%3E%5Cn%20%20%3Cdiv%20class%3D%5C%22csl-entry%5C%22%3ELuhn%2C%20Hans%20Peter%2C%20%26%23x2018%3BA%20Statistical%20Approach%20to%20Mechanized%20Encoding%20and%20Searching%20of%20Literary%20Information%26%23x2019%3B%2C%20%3Ci%3EIBM%20Journal%20of%20Research%20and%20Development%3C%5C%2Fi%3E%2C%201.4%20%281957%29%2C%20309%26%23x2013%3B17%3C%5C%2Fdiv%3E%5Cn%3C%5C%2Fdiv%3E%22%2C%22data%22%3A%7B%22itemType%22%3A%22journalArticle%22%2C%22title%22%3A%22A%20statistical%20approach%20to%20mechanized%20encoding%20and%20searching%20of%20literary%20information%22%2C%22creators%22%3A%5B%7B%22creatorType%22%3A%22author%22%2C%22firstName%22%3A%22Hans%20Peter%22%2C%22lastName%22%3A%22Luhn%22%7D%5D%2C%22abstractNote%22%3A%22%22%2C%22date%22%3A%221957%22%2C%22language%22%3A%22%22%2C%22DOI%22%3A%22%22%2C%22ISSN%22%3A%22%22%2C%22url%22%3A%22%22%2C%22collections%22%3A%5B%5D%2C%22dateModified%22%3A%222022-01-05T19%3A18%3A32Z%22%7D%7D%5D%7D

				

  Havrlant, Lukáš, and Vladik Kreinovich, ‘A Simple Probabilistic Explanation of Term Frequency-Inverse Document Frequency (Tf-Idf) Heuristic (and Variations Motivated by This Explanation)’, International Journal of General Systems, 46.1 (2017), 27–36 <https://doi.org/10.1080/03081079.2017.1291635>

				
				

  Chen, Kewen, Zuping Zhang, Jun Long, and Hao Zhang, ‘Turning from TF-IDF to TF-IGM for Term Weighting in Text Classification’, Expert Systems with Applications, 66 (2016), 245–60 <https://doi.org/10.1016/j.eswa.2016.09.009>

				
				

  Albitar, Shereen, Sébastien Fournier, and Bernard Espinasse, ‘An Effective TF/IDF-Based Text-to-Text Semantic Similarity Measure for Text Classification’, in Web Information Systems Engineering – WISE 2014, ed. by Boualem Benatallah, Azer Bestavros, Yannis Manolopoulos, Athena Vakali, and Yanchun Zhang (Cham: Springer International Publishing, 2014), 105–14 <https://doi.org/10.1007/978-3-319-11749-2_8>

				
				

  Zhang, Wen, Taketoshi Yoshida, and Xijin Tang, ‘A Comparative Study of TF*IDF, LSI and Multi-Words for Text Classification’, Expert Systems with Applications, 38.3 (2011), 2758–65 <https://doi.org/10.1016/j.eswa.2010.08.066>

				
				

  Wu, Ho Chung, Robert Wing Pong Luk, Kam Fai Wong, and Kui Lam Kwok, ‘Interpreting TF-IDF Term Weights as Making Relevance Decisions’, ACM Transactions on Information Systems, 26.3 (2008), 1–37 <https://doi.org/10.1145/1361684.1361686>

				
				

  Achananuparp, Palakorn, Xiaohua Hu, and Xiajiong Shen, ‘The Evaluation of Sentence Similarity Measures’, in Data Warehousing and Knowledge Discovery, ed. by Il-Yeol Song, Johann Eder, and Tho Manh Nguyen (Berlin, Heidelberg: Springer Berlin Heidelberg, 2008), 5182, 305–16 <https://doi.org/10.1007/978-3-540-85836-2_29>

				
				

  Zhang, Yun-tao, Ling Gong, and Yong-cheng Wang, ‘An Improved TF-IDF Approach for Text Classification’, Journal of Zhejiang University-SCIENCE A, 6.1 (2005), 49–55 <https://doi.org/10.1007/BF02842477>

				
				

  Spärck Jones, Karen, ‘IDF Term Weighting and IR Research Lessons’, Journal of Documentation, 2004 <https://doi.org/10.1108/00220410410560591>

				
				

  Robertson, Stephen, ‘Understanding Inverse Document Frequency: On Theoretical Arguments for IDF’, Journal of Documentation, 60.5 (2004), 503–20 <https://doi.org/10.1108/00220410410560582>

				
				

  Ramos, Juan Enrique, ‘Using TF-IDF to Determine Word Relevance in Document Queries’, 2003

				
				

  Church, Kenneth, and William Gale, ‘Inverse Document Frequency (IDF): A Measure of Deviations from Poisson’, in Third Workshop on Very Large Corpora, 1995 <https://www.aclweb.org/anthology/W95-0110> [accessed 12 June 2020]

				
				

  Robertson, S. E., and S. Walker, ‘Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval’, in SIGIR ’94, ed. by W. Bruce Croft and C. J. van Rijsbergen (Springer London, 1994), 232–41

				
				

  Spärck Jones, Karen, ‘A Statistical Interpretation of Term Specificity and Its Application in Retrieval’, Journal of Documentation, 28 (1972), 11–21

				
				

  Luhn, Hans Peter, ‘A Statistical Approach to Mechanized Encoding and Searching of Literary Information’, IBM Journal of Research and Development, 1.4 (1957), 309–17