Type

Conference Proceedings

Authors

Johannes Leveling

Subjects

Linguistics

Topics
information retrieval, cross language evaluation forum, mean average precision, n grams, map, english, indexing, cross language information retrieval

A comparison of sub-word indexing methods for information retrieval (2009)

Abstract

This paper compares different methods of sub-word indexing and their performance on the English and German domain-specific document collection of the Cross-Language Evaluation Forum (CLEF). Four major methods to index sub-words are investigated and compared to indexing stems: 1) sequences of vowels and consonants, 2) a dictionary-based approach for decompounding, 3) overlapping character n-grams, and 4) Knuth’s algorithm for hyphenation. The performance and the effects of sub-word extraction on search time, index size, and indexing time are reported for English and German retrieval experiments. The main results are: For English, indexing sub-words does not outperform the baseline using standard retrieval on stemmed word forms (–8% mean average precision (MAP), –11% geometric MAP (GMAP), +1% relevant and retrieved documents (rel ret) for the best experiment). For German, with the exception of n-grams, all methods for indexing sub-words achieve a higher performance than the stemming baseline. The best performing sub-word indexing methods are to use consonant-vowel-consonant sequences and index them together with word stems (+17% MAP, +37% GMAP, +14% rel ret compared to the baseline), or to index syllable-like sub-words obtained from the hyphenation algorithm together with stems (+9% MAP, +23% GMAP, +11% rel ret).
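
To make two of the notions in the abstract concrete, the following is a minimal Python sketch of overlapping character n-gram extraction and of the MAP and GMAP measures. It is an illustration under stated assumptions, not the paper's implementation: the function names, the handling of words shorter than n, and the 0.00001 floor used for GMAP are all assumptions, and the record does not state which n-gram sizes or evaluation tooling were used.

```python
import math


def char_ngrams(word, n):
    """Return the overlapping character n-grams of a word.

    Assumption: a word no longer than n is returned unchanged as a
    single indexing unit.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list for one topic."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(ap_values):
    """MAP: arithmetic mean of the per-topic average precision values."""
    return sum(ap_values) / len(ap_values)


def geometric_map(ap_values, floor=1e-5):
    """GMAP: geometric mean of the per-topic average precision values.

    Assumption: values are floored at a small constant so that a topic
    with zero average precision does not force the result to zero.
    """
    log_sum = sum(math.log(max(ap, floor)) for ap in ap_values)
    return math.exp(log_sum / len(ap_values))


if __name__ == "__main__":
    print(char_ngrams("retrieval", 4))
    ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
    print(ap, mean_average_precision([1.0, 0.25]), geometric_map([1.0, 0.25]))
```

Because GMAP is a geometric mean, it is pulled down much more strongly by poorly performing topics than MAP is. Relative GMAP changes can therefore be larger than the corresponding MAP changes, as in the German figures reported above.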

Collections

Ireland -> Dublin City University -> Publication Type = Conference or Workshop Item
Ireland -> Dublin City University -> Subject = Computer Science
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres: Centre for Next Generation Localisation (CNGL)
Ireland -> Dublin City University -> Status = Published
Ireland -> Dublin City University -> Subject = Computer Science: Information retrieval
Ireland -> Dublin City University -> DCU Faculties and Centres = Research Initiatives and Centres

Full list of authors on original publication

Johannes Leveling
