Entity linking is the automated process of identifying the objects or concepts mentioned in a body of text. Take the following text, for example:
It’s difficult to remember a time when I wasn’t conducting literature searches and looking for research gaps to fill.
Concepts mentioned in this text include literature and gaps. Each could refer to several different entities. Literature could represent the concept of writing as an art form, written work in general, or specifically academic literature. Because it is mentioned here in the context of a literature search, however, it is likely to refer to academic literature. Gaps may be physical spaces between objects or conceptual breaks in continuity, but since they are mentioned in this text as research gaps, we can infer that these gaps are conceptual.
When a computer carries out the task of entity linking, it uses the context in which an entity is mentioned to identify which specific entity the text refers to. It does this by referring to a knowledge base, such as Wikipedia. If you haven’t heard of entity linking before, you may have seen it referred to by one of its other names: named entity linking, named entity disambiguation, named entity recognition and disambiguation, or named entity normalization.
Examples of the use of entity linking to assist in comparing groups of literature
1. Comparing perspectives and attitudes
In a recent study of perspectives on smart cities, Marrone & Hammerle (2018) compared topics across news media, trade publications, academic articles and government reports. This allowed them to compare the sources to which citizens, businesses, research organisations and governments were exposed, thus gaining insight into the attitudes and perspectives of these groups. The comparison was carried out using an entity linker, TAGME, which allowed search strings that referred to the same entity to be merged.
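To illustrate what merging search strings means in practice, here is a minimal Python sketch. It does not call TAGME and is not the authors' code. The lookup table `SURFACE_TO_ENTITY` and the function `count_entities` are invented for illustration and simply stand in for the output of an entity linker.

```python
from collections import Counter

# Hypothetical mapping from surface strings to entities (an assumption for
# this sketch). A real entity linker would produce these links from its
# knowledge base and the surrounding context.
SURFACE_TO_ENTITY = {
    "smart city": "Smart city",
    "smart cities": "Smart city",
    "iot": "Internet of things",
    "internet of things": "Internet of things",
}


def count_entities(mentions):
    """Count mentions per entity, merging strings that share an entity."""
    counts = Counter()
    for mention in mentions:
        entity = SURFACE_TO_ENTITY.get(mention.lower())
        if entity is not None:  # skip strings the lookup does not know
            counts[entity] += 1
    return counts


mentions = ["Smart cities", "IoT", "smart city", "Internet of Things", "Smart City"]
entity_counts = count_entities(mentions)
print(entity_counts["Smart city"])          # 3
print(entity_counts["Internet of things"])  # 2
```

Counting by entity rather than by string means that spelling variants and abbreviations no longer split one topic into several smaller ones.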
2. Comparing practitioner and academic literature
In a second study by the same authors (Marrone & Hammerle, 2017), misalignment between practitioner and academic literatures was examined, again using an entity linker. Topics were compared across the two groups of literature, focusing on those which were salient in practitioner literature. This facilitated identification of areas of interest to practitioners which are not discussed regularly in academic literature. In short, it elicited research gaps – areas where research is needed by practitioners or is likely to be relevant to practice.
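The general idea of this comparison can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the method used in the study: the function `salient_gaps`, its thresholds and the topic labels are all invented here, and salience is approximated by a raw mention count.

```python
from collections import Counter


def salient_gaps(practitioner_topics, academic_topics,
                 min_practitioner=2, max_academic=0):
    """Return topics that are frequent in practitioner literature but
    rare in academic literature (thresholds are illustrative assumptions)."""
    practitioner = Counter(practitioner_topics)
    academic = Counter(academic_topics)
    return sorted(
        topic
        for topic, n in practitioner.items()
        if n >= min_practitioner and academic[topic] <= max_academic
    )


# Invented entity labels, one per linked mention; not data from the study.
practitioner = ["cloud migration", "cloud migration", "service desk",
                "service desk", "service desk", "incident management"]
academic = ["service desk", "incident management", "incident management"]

gaps = salient_gaps(practitioner, academic)
print(gaps)  # ['cloud migration']
```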
How entity linking is done
According to Piccinno & Ferragina (2014), the entity linking process, as carried out by the tool TAGME, may be divided into three stages: spotting, disambiguation, and pruning.
- Spotting involves scanning the text for meaningful sequences to produce a set of possible mentions (such as literature searches in the example text given above). The SEA then retrieves a list of candidate entities from its knowledge base for each mention. This list will contain all the possible meanings that it can associate with the mention (such as literature as art, as all writing or as academic literature).
- Disambiguation then takes place, where the SEA assigns a score to each candidate entity in the list by modelling how strongly the entity correlates with the mention in its context. The highest-scoring candidate becomes the candidate annotation (for the mention literature in the example text, the candidate annotation could be academic literature).
- Pruning is the final stage, in which the SEA decides whether to discard a candidate annotation based on the other annotations that it has made to the text. This decision therefore depends on whether the annotation makes sense given the overall context of the text.
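The three stages above can be sketched as a toy pipeline in Python. This is a deliberately simplified illustration, not how TAGME works internally. The tiny `KNOWLEDGE_BASE`, the word-overlap scoring and every function name are assumptions made for this example, and pruning is replaced here by a plain score threshold rather than a check against the other annotations.

```python
import re

# Toy knowledge base (illustrative assumption): each mention maps to candidate
# entities, and each candidate lists context words that support it.
KNOWLEDGE_BASE = {
    "literature": {
        "Literature (art form)": {"novel", "poetry", "fiction"},
        "Academic literature": {"research", "searches", "study"},
    },
    "gaps": {
        "Gap (physical space)": {"wall", "fence", "teeth"},
        "Research gap": {"research", "literature", "fill"},
    },
    "time": {
        "Time (magazine)": {"magazine", "cover", "issue"},
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def spot(tokens):
    """Spotting: keep the tokens that the knowledge base knows as mentions."""
    return [t for t in tokens if t in KNOWLEDGE_BASE]


def disambiguate(mention, tokens):
    """Disambiguation: score each candidate by its overlap with the context."""
    context = set(tokens) - {mention}
    scores = {
        entity: len(words & context)
        for entity, words in KNOWLEDGE_BASE[mention].items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def prune(annotations, min_score=1):
    """Pruning (simplified): drop annotations with too little support."""
    return {m: e for m, (e, s) in annotations.items() if s >= min_score}


def link(text):
    """Run spotting, disambiguation and pruning in sequence."""
    tokens = tokenize(text)
    candidates = {m: disambiguate(m, tokens) for m in spot(tokens)}
    return prune(candidates)


text = ("It's difficult to remember a time when I wasn't conducting "
        "literature searches and looking for research gaps to fill.")
result = link(text)
print(result)
# {'literature': 'Academic literature', 'gaps': 'Research gap'}
```

In this sketch the word time is spotted as a possible mention, but its only candidate finds no support in the surrounding words, so the annotation is discarded at the pruning stage.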
By removing ambiguities, entity linking can improve the performance of your data analysis. As an automated process, it also helps avoid the bias that can be introduced when we code text manually.
Marrone, M. & Hammerle, M. (2017). Relevant research areas in IT Service Management: An examination of academic and practitioner literatures. Communications of the Association for Information Systems, Vol. 41, Article 23. Available at: http://aisel.aisnet.org/cais/vol41/iss1/23
Marrone, M. & Hammerle, M. (2018). Smart Cities: A review and analysis of stakeholders’ literature. Business and Information Systems Engineering. Available at: https://doi.org/10.1007/s12599-018-0535-3
Piccinno, F. & Ferragina, P. (2014). From TagME to WAT: A new entity annotator. In Proceedings of the 1st International Workshop on Entity Recognition & Disambiguation (pp. 55-62).