Paper Type |
Contributed Paper |
Title |
IR Enhancement Using a Classified Multi-modal $s$-gram Similarity Aggregation |
Author |
Pakinee Aimmanee and Thanaruk Theeramunkong |
Email |
pakinee@siit.tu.ac.th; thanaruk@siit.tu.ac.th |
Abstract: The $s$-gram, or $s_{n,k}$-gram, is a generalization of n-gram term modeling obtained by allowing $k$ terms to be skipped within an n-gram. This paper proposes a framework of multi-modal $s_{n,k}$-gram similarity combination, that is, a combination of the similarities between a document and a query encoded with several $s_{n,k}$-grams of various $n$ and $k$. Adjusting the weights in the similarity aggregation lets us build a suitable approximate matching model between a relevant document and a query, even when the document does not contain any of the query's exact terms, or vice versa. In the experiments, three different types of weighting are used and compared in the combination of similarities. The test domain consists of two collections that are similar in content but written in different languages (English and Thai). The results show that the proposed approach significantly outperforms conventional approaches such as the unigram and bigram models. |
|
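A minimal Python sketch of the idea described in the abstract, under stated assumptions. The paper's exact $s_{n,k}$-gram definition, similarity measure, and its three weighting schemes are not given on this page, so the sketch assumes that an $s_{n,k}$-gram is $n$ tokens picked with exactly $k$ tokens skipped between consecutive picks ($k = 0$ gives the ordinary n-gram), uses cosine similarity over gram counts, and combines the per-$(n, k)$ similarities with a hypothetical fixed weight map. The names `s_grams`, `cosine`, and `aggregated_similarity` are illustrative only, not the authors' implementation.

```python
from collections import Counter
from math import sqrt


def s_grams(tokens, n, k):
    """Assumed s_{n,k}-gram extractor: n tokens, with exactly k tokens
    skipped between consecutive picks (k = 0 yields ordinary n-grams)."""
    step = k + 1
    span = (n - 1) * step + 1  # number of tokens one s-gram covers
    return [tuple(tokens[i + j * step] for j in range(n))
            for i in range(len(tokens) - span + 1)]


def cosine(a, b):
    """Cosine similarity between two Counters of grams (0.0 if either is empty)."""
    dot = sum(count * b[g] for g, count in a.items() if g in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def aggregated_similarity(doc, query, weights):
    """Weighted sum of per-(n, k) similarities.

    `weights` is a hypothetical map {(n, k): weight}; the paper compares
    three weighting types that are not reproduced here.
    """
    return sum(
        w * cosine(Counter(s_grams(doc, n, k)), Counter(s_grams(query, n, k)))
        for (n, k), w in weights.items()
    )


if __name__ == "__main__":
    doc = "a b c d".split()
    query = "a x c".split()
    weights = {(1, 0): 0.5, (2, 0): 0.25, (2, 1): 0.25}
    print(aggregated_similarity(doc, query, weights))
```

With doc = "a b c d" and query = "a x c", the plain bigram ($s_{2,0}$) similarity is 0 because the two share no adjacent pair, while the $s_{2,1}$ similarity is about 0.707 because both contain the skip-bigram (a, c). This is the kind of approximate match the abstract attributes to combining several $s_{n,k}$-gram similarities.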
Start & End Page |
661 - 675 |
Received Date |
2011-08-25 |
Revised Date |
|
Accepted Date |
2013-05-08 |
Keyword |
|
Volume |
Vol.41 No.3 (JULY 2014) |
DOI |
|
Citation |
Aimmanee P. and Theeramunkong T., IR Enhancement Using a Classified Multi-modal $s$-gram Similarity Aggregation, Chiang Mai J. Sci., 2014; 41(3): 661-675. |
SDGs |
|