Wikification via Co-occurrence


Wikification, the process of linking terms in a plain-text document to the Wikipedia articles that represent their correct meanings, can be thought of as a generalized Word Sense Disambiguation problem: it disambiguates multi-word expressions (MWEs) in addition to single words. Existing Wikification techniques either model the context of a given term and the Wikipedia article as bags of words, or compute global constraints among Wikipedia concepts from the link graph or link distributions. The first approach does not achieve good results because an MWE can have a very different meaning from its constituent words, which are themselves ambiguous. The second approach does not produce high accuracy because the link structure or link distribution alone is often biased or incomplete, since Wikipedia pages are often sparsely linked. In this paper, we present a simple but surprisingly powerful framework for sense disambiguation using co-occurrences of Wikipedia links in the Wikipedia corpus. We propose an iterative method to enrich the sparsely linked articles by adding more links, and then use the resulting link co-occurrence matrix to disambiguate an input document with a sliding-window algorithm. Our prototype system achieves over 90% precision and recall on news articles and compares favorably with all four state-of-the-art Wikification techniques.
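To make the core idea concrete, the following is a minimal sketch of disambiguation by link co-occurrence with a sliding window. It is an illustration under simplifying assumptions, not the system described in this paper: the function names (`build_cooccurrence`, `cooc`, `disambiguate`), the raw-count scoring rule, the window size, and the toy data are all hypothetical, and the iterative link-enrichment step is omitted.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(articles):
    """Count, for each unordered pair of link targets, the articles containing both.

    `articles` is a list of link lists, one per Wikipedia article (toy input).
    """
    counts = Counter()
    for links in articles:
        # sorted(set(...)) gives each unordered pair one canonical key per article.
        for a, b in combinations(sorted(set(links)), 2):
            counts[(a, b)] += 1
    return counts


def cooc(counts, a, b):
    """Look up the co-occurrence count of two concepts (0 if never seen together)."""
    if a == b:
        return 0
    return counts[tuple(sorted((a, b)))]


def disambiguate(mentions, candidates, counts, window=2):
    """Pick one concept per mention using neighbours inside a sliding window.

    Assumed scoring rule: sum of raw co-occurrence counts between a candidate
    sense and every candidate sense of the neighbouring mentions.
    """
    result = []
    for i, term in enumerate(mentions):
        lo, hi = max(0, i - window), min(len(mentions), i + window + 1)
        context = [c for j in range(lo, hi) if j != i
                   for c in candidates[mentions[j]]]
        best = max(candidates[term],
                   key=lambda s: sum(cooc(counts, s, c) for c in context))
        result.append(best)
    return result


# Toy data (hypothetical): links found in four articles.
articles = [
    ["Apple Inc.", "iPhone", "Steve Jobs"],
    ["Apple Inc.", "iPhone"],
    ["Apple (fruit)", "Orchard", "Pie"],
    ["Pie", "Orchard"],
]
candidates = {
    "apple": ["Apple (fruit)", "Apple Inc."],
    "iphone": ["iPhone"],
    "orchard": ["Orchard"],
}
counts = build_cooccurrence(articles)
print(disambiguate(["apple", "iphone"], candidates, counts))
print(disambiguate(["orchard", "apple"], candidates, counts))
```

In this toy setting the ambiguous term "apple" resolves to a different concept depending on which unambiguous neighbour falls inside the window, which is the behaviour the co-occurrence matrix is meant to capture.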



Figure 1: Snapshot of a Wikipedia article “before” and “after” iteration. The source and destination of newly added links are indicated at the bottom.
Figure 2: Result of Wikifying Example 2 via Link Co-occurrence. Concepts are enclosed in square brackets.
Figure 3: Architecture of Wikification via Link Co-occurrence