Entity linking Systems for Literature Reviews

Let’s face it. In busy academic lives there is hardly any time for deep reading, let alone for staying up to date in real time with everything that is published in your area. Yet staying up to date with the latest knowledge and reviewing literature regularly is our bread and butter as academics.

In addition, published literature reviews help establish your expertise of a particular area. Researchers increasingly automate the coding process in literature reviews and accelerate the literature review process by using computer-assisted tools like Leximancer, topic modelling, Bibliometrix, R packages, NVivo, etc.

However, existing approaches for coding textual data do not account for lexical ambiguity; that is, instances in which individual words have multiple meanings.

To counter this, we developed a method to conduct rapid and comprehensive analyses of diverse literature types by using entity linking in literature reviews. We present a new literature review framework that embeds entity linking.

See the framework step by step below:

In the same paper, we present an example where we apply the framework to review the literature on digital disruption and digital transformation.

On how to adapt the framework to your needs, see the full paper:

Marrone, M., Lemke, S., Kolbe, L.M. (2022), Entity linking Systems for Literature Reviews, Scientometrics. Forthcoming.

Trends in FinTech Research and Practice: a systematic review

Many industry sectors have experienced significant disruption in recent years through the introduction of new financial technology (or FinTech), including process automation in financial services and the adoption of cryptocurrencies. From the first telegraph cable in 1866 to blockchain in 2009, the evolution of financial technologies has always been aligned with innovations in information systems (IS).

How do FinTech and Information Systems relate to each other? Where are the crossovers, where do they intersect, where do they diverge?
This question drove me and my colleagues to conduct a systematic literature review and to compare academic with practitioner literature.

Findings from our review show that the practitioner-oriented literature foreshadowed the rise of FinTech by extensively reporting on algorithm-based and electronic trading (2009 onwards), followed by reporting on FinTech start-ups and funding successes (2014 onwards).

The practitioner literature subsequently reported on alternative finance models, the introduction of cryptocurrencies, and risks and regulatory issues. Academic literature on FinTech began to rise from 2014 onwards, focusing initially on the development of FinTech in the aftermath of the 2007-2008 global financial crisis.

Research attention subsequently shifted to FinTech innovations (alternative finance, cryptocurrency and blockchain, machine-based methods for financial analysis and forecasting, including artificial intelligence), as well as risk and regulatory issues.

IS work on FinTech started to emerge from 2015 onwards, initially focusing on mobile payment systems and peer-to-peer lending. However, the body of work at the intersection of FinTech and IS is still small.

Changes in FinTech literature over time

Our review sheds light on several opportunities for future research, including financial inclusion, the impacts arising from COVID-19, and the emergence of new business models, such as Banking as a Service (BaaS).

Full paper reference: 

Cai, C., Marrone, M., & Linnenluecke, M. (2022). Trends in FinTech Research and Practice: Examining the Intersection with the Information Systems Field. Communications of the Association for Information Systems. https://www.researchgate.net/publication/359107231_Trends_in_FinTech_Research_and_Practice_Examining_the_Intersection_with_the_Information_Systems_Field

Developing interdisciplinary research maps from business/management and the environmental sciences

Summary of “Interdisciplinary Research Maps: A new technique for visualizing research topics” (Marrone & Linnenluecke, 2020)

Interdisciplinary research is challenging, in part due to the sheer magnitude of knowledge embedded within disciplines, and also to the lack of a common shared understanding across them.

To bridge the gap in understanding between the disciplines of business/management and the environmental sciences, Marrone & Linnenluecke (2020) developed a ‘map’ of topics, concepts, and ideas discussed in top publications in these fields of research.

Developing the map

  • Data for the study was sourced by selecting articles published since 2011 in the top four journals by impact factor in each field, through the Scopus database. The abstracts, titles and publication years of 4,827 environmental sciences articles and 2,671 business and management articles were downloaded.
  • These data were exported to two separate Comma Separated Values (CSV) files, one for each of the areas of interest. The titles of the publications were then merged with their respective abstracts and the files were analysed using the TAGME entity linking tool to compile a list of all possible topics from the text in the abstracts and titles.
  • The researchers ‘cleaned’ the results of the analysis by deleting topics that made little meaningful sense, given the context in which they were used. After cleaning the results, 7,915 topics were retained in the environmental sciences articles, and 4,293 in business/management articles.
  • A map was created (see figure below) to show the frequency with which topics are mentioned in each field.
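The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the column names `title` and `abstract`, the tiny in-memory CSV data, and the `annotate` function (a hypothetical phrase lookup standing in for a call to an entity linker such as TAGME) are all invented for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for an entity linker: maps surface phrases to topics.
# A real study would call an entity linking tool here instead.
TOY_LINKS = {
    "climate change": "Climate change",
    "global warming": "Climate change",
    "decision-making": "Decision-making",
    "decision making": "Decision-making",
    "mergers": "Mergers and acquisitions",
}

def annotate(text):
    """Return the set of topics whose surface phrases occur in the text."""
    lowered = text.lower()
    return {topic for phrase, topic in TOY_LINKS.items() if phrase in lowered}

def topic_counts(csv_text):
    """Merge each title with its abstract and count articles per topic."""
    counts = Counter()
    n_articles = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        merged = row["title"] + ". " + row["abstract"]
        counts.update(annotate(merged))
        n_articles += 1
    return counts, n_articles

# Made-up sample data, one CSV per field (assumed column names).
ENV_CSV = """title,abstract
Global warming and crops,We study decision making under climate change.
Sea level rise,Climate change drives coastal risk.
"""
BUS_CSV = """title,abstract
Mergers in emerging markets,Decision-making in firms during mergers.
Board structure,We examine firm expansion.
"""

env_counts, env_n = topic_counts(ENV_CSV)
bus_counts, bus_n = topic_counts(BUS_CSV)

# One (x, y) point per topic: share of articles mentioning it in each field.
topic_map = {
    topic: (env_counts[topic] / env_n, bus_counts[topic] / bus_n)
    for topic in sorted(set(env_counts) | set(bus_counts))
}
for topic, (env_share, bus_share) in topic_map.items():
    print(f"{topic}: environmental={env_share:.2f}, business={bus_share:.2f}")
```

Plotting each topic at its pair of shares gives a simple version of the map: topics near one axis belong mostly to one field, while topics near the diagonal are discussed in both.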

What the map tells us

1.     Topics that are frequently identified in one literature, but not the other

Some topics are represented almost exclusively in the environmental sciences articles, many of them linked to concerns about climate change. Meanwhile, in the business/management journals, the topics which arise most frequently are related to firm structure and expansion. There is an opportunity here, for future research to connect topics across these two fields. For example, further interdisciplinary research could serve to explore the impacts of climate change on business and management decisions, such as asset valuations and investments.

2.     Topics that are frequently associated with both literatures

Several topics are common to both sets of literature. For example, “decision-making” is a frequently discussed topic in both business/management and the environmental sciences. The topic of “China” also arises frequently in both disciplines, but in different ways. Articles in the business/management journals address management challenges and economic opportunities in China, while those in the environmental sciences journals address the role of emerging economies like China’s in climate adaptation and mitigation efforts. Such areas of topic convergence may provide fruitful avenues for future interdisciplinary research.

Figure 1: Topics are represented as dots; those associated with the business/management literature are coloured red, and those associated with the environmental sciences blue.

Reference:

Marrone, M. and Linnenluecke, M.K., 2020. Interdisciplinary Research Maps: A new technique for visualizing research topics. PLOS ONE, 15(11), p.e0242283.

Intelligent Machines: Will AI replace academic researchers?

According to a recent article in Forbes magazine, we can expect to see most people collaborating on their work with an AI (Artificial Intelligence) counterpart by 2030. Will these ‘counterparts’ simply enhance what we do, or will they ultimately come to replace human beings altogether?

Transhumanism

Transhumanists examine the possibilities presented by the interactions between people and technology. They tend to welcome technological developments as means of enhancing our intellectual, physical, and psychological capacities (Bostrom, 2003; figure 1).

Figure 1. Adapted from Bostrom (2003, p.12)

An ethical minefield?

It’s not all good news, however. The concept of merging people with technology throws up a host of ethical considerations.

  • Who will access technological advantages? Those who can afford to pay for AI implants, wearable technology and tools? Will a group of enhanced humans emerge at the expense of everyone else?
  • If a person performs a criminal act, while coupled with an AI element, who bears responsibility? Should it be the person linked to the AI, the AI itself, or the person who developed or programmed the AI?

How close are we to developing human-like intelligence?

The idea of people integrating with computers is far from new. The famous computer pioneer J.C.R. Licklider coined the term “man-computer symbiosis” in the 1960s. Licklider predicted that computing technology would eventually advance to the extent that you would be able “to think in interaction with a computer in the same way that you think with a colleague whose competence supplements your own” (Licklider, quoted in Lesh et al., 2004). Almost 60 years on, are we there yet?

The sort of technology that could result in human-like intelligence, and in the ethical conundrums outlined above, is still some way off. Yet there are some very interesting possibilities available to us. Researchers, for example, may approach a form of symbiosis with computers. AI-based search tools are being developed that can help us navigate the broad swathes of literature available to us, speeding up our engagement with information (Extance, 2018). This is where we believe ResGap comes in.

Harnessing the potential of AI

Technology such as that available through ResGap can enhance your research performance, by allowing you to quickly find research gaps, understand how your research field has evolved over time, and identify hot and cold topics. By letting ResGap do the filtering for you, pointing you to where you will most productively direct your attention, you are able to engage with an astonishing breadth of information.

The kind of technology offered by ResGap doesn’t replace you as a researcher. You can work with ResGap, but not yet in Licklider’s “man-computer symbiosis”. Rather, you are still in the driving seat, harnessing the power of an extremely useful tool. You won’t be able to access the whole space available to posthumans in Bostrom’s model (above: The Space of Possible Modes of Being), but you’ll certainly be pushing at the envelope of what is accessible to humans. It’s not a bad place to start.

Bostrom, N. (2003) Introduction to transhumanism. Presented at the Intensive Seminar on Transhumanism, Yale University, 26 June 2003. Available at:  https://www.slideshare.net/danila/introduction-to-transhumanism?from_action=save

Extance, A. (2018) How AI technology can tame the scientific literature. Nature. Available at: https://www.nature.com/articles/d41586-018-06617-5

Lesh, N., Marks, J., Rich, C., Sidner, C.L. (2004) Man-Computer Symbiosis Revisited: Achieving Natural Communication and Collaboration with Computers. IEICE Transactions on Information and Systems.

A tale of two research fields: from Team Mental Model theory to research model creation

Team Mental Model Theory

Team mental models are organized mental representations of the team’s relevant environment, shared across team members (Klimoski & Mohammed, 1994). They emerge because individual team members tend to categorise elements of their environments, such as tasks, situations, response patterns or relationships. These categorisations then become shared over time, thanks to communication within the team.

The extent to which mental categorisations are shared across team members can vary widely. They may be highly consistent with one another or completely incongruent. Importantly, when researchers talk about shared mental models, they do not suggest that an identical set of categorisations is held by every member of the team. Rather it is suggested that there exists some degree of consistency or convergence between individuals’ mental models (Kang, 2006; Rentsch, 2008).

Teams with higher levels of convergence of mental models can perform better. A shared mental model enables team members to anticipate the needs and actions of others in the team (Cannon-Bowers, 1993). Thus, the team can coordinate its actions, enhancing its decision-making capacity (Stout, 1999).

From TMM to research models

If we conceptualise research communities as teams, we can begin to see how TMM theory applies to researchers seeking to identify what is researched by individual members of the team. A degree of sharedness will exist across the mental models held by researchers in each field. They are likely to use the same terms in reference to the same concepts. However, mental models which are shared within one field may diverge significantly from those in a neighbouring field. An understanding of the ways in which different groups think about a topic has vast potential in the pursuit of interdisciplinary research.

Our map, generated by resgap.com, helps illustrate the idea that mental models lie on a continuum rather than forming a dichotomy (say, from very infrequent discussion in one literature to seemingly identical mentions in both literatures). As an example, we could study how the term “techno-stress” is studied by Psychology and Information Systems researchers. Our tool would help uncover which topics are frequently discussed in both fields, as well as topics that are frequently discussed in one field but infrequently discussed in the other.

By mapping what is discussed among researchers in separate fields, we may increase the effectiveness of our research. It is possible to develop a systemic understanding of who is doing what, increasing the coordination of our actions with those of other members of the research community. We become able to anticipate the needs and actions of our fellow researchers.

The unique contribution of resgap.com lies in its use of entity linking. Because resgap.com does not rely on keyword identification alone, it identifies discussion of concepts, not just the usage of similar terms. This helps overcome the challenge presented by differences in the use of terminology. Different terms may be used in different fields to describe the same concept, or the same terms may be used in different fields to describe completely different concepts. However, resgap.com can cut through the confusion, allowing us to see where fields overlap, and where relevant and valuable research may be directed.

References

Cannon-Bowers, J. A., Salas, E., & Converse, S. (1993). Shared mental models in expert team decision making. In N. J. Castellan, Jr. (Ed.), Individual and group decision making: Current issues (pp. 221-246). Hillsdale, NJ, US: Lawrence Erlbaum Associates, Inc.

Kang, H.-R., Yang, H.-D., & Rowley, C. (2006). Factors in team effectiveness: Cognitive and demographic similarities of software development team members. Human Relations, 59(12), 1681–1710. https://doi.org/10.1177/0018726706072891

Klimoski, R., & Mohammed, S. (1994). Team Mental Model: Construct or Metaphor? Journal of Management, 20(2), 403–437. https://doi.org/10.1177/014920639402000206

Marrone, M, Hammerle, M (2018) Smart Cities: A Review and Analysis of Stakeholders’ Literature. Bus Inf Syst Eng 60: 197. https://doi.org/10.1007/s12599-018-0535-3

Rentsch, J. R., Small, E. E. & Hanges, P. J. (2008) Cognitions in organizations and teams: What is the meaning of cognitive similarity? In D. B. Smith (Ed.), LEA’s organization and management series. The people make the place: Dynamic linkages between individuals and organizations (pp. 127-155). New York, NY: Taylor & Francis Group/Lawrence Erlbaum Associates.

Stout, R. J., Cannon-Bowers, J. A., Salas, E., & Milanovich, D. M. (1999). Planning, Shared Mental Models, and Coordinated Performance: An Empirical Link Is Established. Human Factors, 41(1), 61–71. https://doi.org/10.1518/001872099779577273

Finding research gaps in IT Service Management (and in ITIL research)

Summary of “Relevant Research Areas in IT Service Management: An Examination of Academic and Practitioner Literatures” (Marrone & Hammerle, 2017)

IT Service Management is a field of IS research which is widely used and popular in practice (Iden & Eikebrokk, 2013; Marrone & Kolbe, 2011). This study compares business and academic literature in the field. Since the behaviour of practitioners is influenced by the (business) literature they read (Carroll & McCombs, 2003), the comparison uncovers aspects of professional behaviour or practices not explored by academic research, termed “practice-oriented research gaps” (Müller-Bloch and Kranz, 2015).

Academic literature used in the study comprised abstracts of papers from selected information systems publications, identified by database searches using keywords related to ITSM. Practitioner literature was identified through searches of selected popular press and specialist IS publications. Both sets of literature were published during a time span of 16 years (from 1 January, 2000, to 1 May, 2016). A semantic entity annotator (the technology used by resgap.com) was employed to identify topics in the two groups of identified literature, then keyword analysis was applied to identify statistically significant topics.
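The summary here does not say which statistic the keyword analysis used. One common choice for comparing two corpora is the log-likelihood (G2) keyness score, so the sketch below assumes that statistic purely for illustration. The topic counts and corpus sizes are made up, and the function name `log_likelihood` is hypothetical.

```python
import math

def log_likelihood(a, b, total_a, total_b):
    """Log-likelihood (G2) keyness for a topic seen `a` times in corpus A
    (size total_a) and `b` times in corpus B (size total_b)."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    for observed, expected in ((a, expected_a), (b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Chi-square critical value for 1 degree of freedom at p < 0.05.
CRITICAL_P05 = 3.84

# Made-up counts: (practitioner mentions, academic mentions) per topic.
counts = {"Gamification": (40, 2), "Implementation": (100, 100)}
practitioner_total, academic_total = 10_000, 10_000

for topic, (prac, acad) in counts.items():
    score = log_likelihood(prac, acad, practitioner_total, academic_total)
    flag = "significant" if score > CRITICAL_P05 else "not significant"
    print(f"{topic}: G2={score:.2f} ({flag})")
```

In this toy data, a topic mentioned equally often in both corpora scores zero, while one concentrated in the practitioner corpus scores well above the threshold. That is the kind of signal that would point to a practice-oriented research gap.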

For each set of literature, eight of the 10 most frequently used topics also appeared regularly in the other set, suggesting that academics and practitioners view many of the same topics as highly important. However, several of the most frequently used topics differed, suggesting a degree of misalignment. Practitioner literature tended to focus on topics associated with the physical implementation and application of ITSM, while academic literature highlighted the idea of implementation.

Research gaps identified

The study uncovered four broad practice-oriented research gaps. For each of these, three examples of possible research questions are provided, using a taxonomy proposed by Jarvinen (2000). They are categorised as conceptual-analytical, theory creating or testing, and artefact building or evaluating.

1.    Combining Frameworks

The combination of different frameworks by organisations, to support the use of ITSM, is frequently discussed in practitioner literature. In contrast, most academic papers only consider the use of one framework at a time. Some papers do present evidence that firms combine two frameworks: CoBIT (Control Objectives for Information and Related Technologies) and Information Technology Infrastructure Library (Cater-Steel, Tan, & Toleman, 2006; de Espindola, Luciano, & Audy, 2009; Lapão, 2011; Vogt, Küller, Hertweck, & Hales, 2011). However, practitioners discuss a wide range of frameworks that organizations use simultaneously.

Potential research questions:

  • How does co-implementing frameworks help strengthen areas that a single framework does not cover, such as business-IT alignment, knowledge management, organizational learning, outsourcing, and competitive advantage? (Conceptual analytical)
  • What theory can best reflect why different organizations consider strategic and technical factors when choosing to co-implement ITSM frameworks? (Theory creating)
  • Can one develop a model that indicates the most appropriate mix of ITSM frameworks based on an organization’s specific requirements? (Artefacts building)

2.    Infrastructure

Little academic research has addressed how improvements in infrastructure help organizations achieve beneficial outcomes of implementing ITSM. Further, research has not described the impact that implementing ITSM has on an organization’s infrastructure or cloud computing.

Potential research questions:

  • Which ITSM processes, if any, contribute to the effective management of cloud services? (Conceptual analytical)
  • As organizations increase their reliance on cloud service providers, what is the impact on the benefits that they receive when implementing ITSM? (Artefacts evaluation)
  • How do ITSM frameworks help organizations implement cloud services? (Conceptual analytical)

3.    Software and gamification

The practitioner literature often warns that IT departments implementing ITSM may prioritise software tools over processes, to their disadvantage. It further proposes that gamified tools may offer significant benefits in implementation. However, the advantages and difficulties associated with relying on tools when implementing ITSM are not discussed in the academic literature, nor are the effects of gamification examined.

Potential research questions:

  • How can an organization best use tools to support the implementation of ITSM? (Conceptual analytical)
  • Which kind of model could explain the benefits received due to the use of ITSM tools? (Theory creating)
  • How effectively does gamification help train staff in the ITSM processes—specifically as it concerns content retention and engagement and staff retention? (Artefacts evaluating)

4.    Regulation compliance

Practitioner literature suggests that several organizations implemented ITSM motivated by the need to comply with regulations, such as the Sarbanes-Oxley Act (SOX), introduced in 2002. The impact of regulation on the implementation of ITSM is less evident in the academic literature.

Potential research questions:

  • What is the relationship between the types of regulations introduced and the ITSM organizations implement? (Conceptual analytical)
  • Which kind of model could explain how organizations implement ITSM due to the introduction of different regulations compared to other rationales for adoption? (Theory creating)
  • How effectively did SOX encourage organizations to pay closer attention to their IT governance? (Artefacts evaluating)

References

Carroll, C. E., & McCombs, M. (2003). Agenda-setting effects of business news on the public’s images and opinions about major corporations. Corporate Reputation Review, 6(1), 36-46.

Cater-Steel, A., Tan, W.-G., & Toleman, M. (2006). Challenge of adopting multiple process improvement frameworks. In Proceedings of the European Conference on Information Systems.

de Espindola, R. S., Luciano, E. M., & Audy, J. L. N. (2009). An overview of the adoption of IT governance models and software process quality instruments at Brazil—preliminary results of a survey. In Proceedings of the 42nd Hawaii International Conference on System Sciences.

Iden, J., & Eikebrokk, T. R. (2013). Implementing IT service management: A systematic literature review. International Journal of Information Management, 33(3), 512-523.

Jarvinen, P. (2000). Research questions guiding selection of an appropriate research method. In Proceedings of the European Conference on Information Systems.

Lapão, L. V. (2011). Organizational challenges and barriers to implementing IT governance in a hospital. Electronic Journal of Information Systems Evaluation, 14(1), 37-45.

Marrone, M., & Kolbe, L. M. (2011). Uncovering ITIL claims: IT executives’ perception on benefits and Business-IT alignment. Information Systems and E-Business Management, 9(3), 363-380

Marrone, M., & Hammerle, M. (2017) Relevant Research Areas in IT Service Management: An Examination of Academic and Practitioner Literatures. Communications of the Association for Information Systems 41(1), 517-543

Müller-Bloch, C., & Kranz, J. (2015). A framework for rigorously identifying research gaps in qualitative literature reviews. In Proceedings of the International Conference on Information Systems.

Vogt, M., Küller, P., Hertweck, D., & Hales, K. (2011). Adapting IT governance frameworks using domain specific requirements methods: Examples from small & medium enterprises and emergency management. In Proceedings of the Americas Conference on Information Systems.

Smart Cities: a literature review (in plain English!)

Summary of “Smart Cities: A Review and Analysis of Stakeholders’ Literature” (Marrone and Hammerle, 2018)

There has been increasing interest in recent years in the use of digital technology to help deal with the “wicked problems” of environmental degradation and poverty in towns and cities. Cities where attempts are made to achieve this are known as “smart cities”.

This literature review compared the views of different groups of people on the idea of “smart cities”, seeking to compare diverse perspectives by examining the topics discussed in different categories of publication. Since what people read, hear and see will influence and reflect their views, analysis of the publications they are exposed to can give us an insight into those views (McCombs and Shaw 1972; Carroll and McCombs 2003). In this study, for example, the views of those who live in towns and cities were considered by reviewing news media, while the views of those involved in research organisations were analysed using academic publications (see table).

Group                            Literature category
Citizens                         News media
People involved in business      Trade publications
People involved in research      Academic publications
People involved in government    Government reports

The topics arising in different categories of literature were compared using resgap.com technology. Key topics for all categories of literature were:

  • Internet of Things
  • Technology
  • Infrastructure
  • Smart grid
  • Urban planning
  • Energy
  • Transport
  • Innovation
  • Sustainability

Key topics which arose frequently in news media but less so in other categories of literature, suggesting that citizens were concerned about them but that other groups did not consider them to be of such high importance, were:

  • Autonomous car
  • Hackers
  • Start-up company

Further analysis of these topics revealed some interesting differences between the ways in which they were discussed in news media and in other categories of literature. In the case of the “Autonomous car” topic, all categories of literature addressed the benefits of autonomous cars. However, while other literature types focused more on how a reliance on autonomous vehicles might come about, news media tended to present this transportation method as potentially disruptive, considering the risks associated with it. News media was also the only literature category to focus on how people might be involved in the use of autonomous cars.

On the topic of “Hackers”, news media presented more detail regarding the intricacies of hacking, compared with other types of literature, and suggested reasons why hackers have not yet become widespread in smart cities. News media expressed the importance of preventing hacking to protect the people who use smart city services and emphasised how lack of action on the part of companies and governments could leave smart city services open to attack from hackers.

Regarding “Start-up company”, although all categories of literature highlighted the importance of start-ups in the development of smart cities and of fostering connections between different groups to enable start-ups to be successful, news media alone specifically highlighted how innovations brought about by start-ups may help to serve people and impact their everyday lives. Other literature types were more focused on the opportunities for economic growth and profits brought about by developments in the smart city space.

Existing academic research suggests that the perspectives of citizens are often ignored in the development of smart cities (Hollands 2015). The results of this review suggest that citizens are under-represented, rather than being completely ignored. The research gaps identified here are in person-centred topics, such as privacy, which are important to citizens. These should be addressed by practitioners involved in developing and marketing smart city services and by government and academic bodies involved in producing smart city policies.

References:

Carroll CE, McCombs M (2003) Agenda-setting effects of business news on the public’s images and opinions about major corporations. Corp Reput Rev 6:36–46. https://doi.org/10.1057/palgrave.crr.1540188

Hollands RG (2015) Critical interventions into the corporate smart city. Camb J Reg Econ Soc 8:61–77

Marrone, M, Hammerle, M (2018) Smart Cities: A Review and Analysis of Stakeholders’ Literature. Bus Inf Syst Eng 60: 197. https://doi.org/10.1007/s12599-018-0535-3

McCombs ME, Shaw DL (1972) The agenda-setting function of mass media. Public Opin Q 36:176–187

Literature reviews – Keyword searches don’t work

The trouble with keyword searches

Keywords are expected to help us identify relevant papers, when we conduct literature searches. Unfortunately, they’re not as effective at doing this as we might hope, since they aren’t always representative of the content of an article.

Subjectivity in keyword selection

Keywords may be selected by the author of a paper, in which case they are likely to represent the themes which the author deems most important in their article (Névéol et al. 2010). However, these may not necessarily correspond with the dominant themes found in the paper itself. When authors do not provide keywords to accompany their own publications, keywords may be selected by editors, who then add their subjective interpretations of the text (Gerdsri et al., 2013; p.420).

Inconsistent terminology

Terminology used in keywords may vary according to preference, so that different terms are used by different authors to represent the same concept. Where standardised indexing terms are used, such as the Medical Subject Headings (MeSH®) in the bibliographic database MEDLINE®, these can be substantially different from the author-selected keywords (Figure 1).

Author keywords:

  • Decision-making
  • Rural health services
  • Interhospital transport
  • Survival analysis

MEDLINE indexing terms:

  • Adult
  • Aged
  • Cohort studies
  • Decision making
  • Diagnosis-related groups
  • Female
  • Health services accessibility
  • Hospital mortality
  • Hospitals, community/organisation & administration
  • Hospitals, rural/organisation & administration
  • Humans
  • Intensive care units/utilisation
  • Length of stay
  • Male
  • Middle aged
  • New Hampshire/epidemiology
  • Outcome assessment (health care)
  • Patient transfer/statistics & numerical data
  • Prospective studies
  • Survival analysis

Figure 1: Author keywords and MeSH indexing terms assigned to a sample article indexed in MEDLINE (Névéol et al. 2010)

Limited number of keywords

Author-selected keywords are also usually limited in number, typically to between 4 and 8 per article. This small number of keywords is unlikely to provide a comprehensive overview of the topics or themes in an article. Indeed, indexers assigned an average of 13.0 (+/-11.9) terms to papers in a collection of 14,398 open-access articles in PubMed Central®, suggesting that a greater number of terms is required to capture the thematic content of most papers.

An alternative approach

Fortunately, there’s a better way to enhance literature searches. Entity linking allows us to consider the context of words as well as the relationships between them. By linking words which carry the same meaning to an entity, we can extract entities from text, rather than relying on subjectively assigned keywords. Entities represent the themes contained in the text, removing the ambiguity associated with varying use of terminology.
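To make the contrast with keyword matching concrete, here is a minimal sketch in which several surface forms are linked to a single entity before searching. The alias table, paper titles, and function name `entities_in` are all hypothetical. A real entity linker also uses context to choose between candidate entities, which this toy lookup does not attempt.

```python
import re

# Hypothetical alias table: each surface form points to one entity.
ALIASES = {
    "itil": "ITIL",
    "it infrastructure library": "ITIL",
    "information technology infrastructure library": "ITIL",
}

def entities_in(text):
    """Return the entities whose aliases appear as whole phrases in the text."""
    found = set()
    for alias, entity in ALIASES.items():
        pattern = r"\b" + re.escape(alias) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(entity)
    return found

# Made-up paper titles for illustration.
papers = {
    "P1": "Adopting ITIL in a public hospital",
    "P2": "The Information Technology Infrastructure Library and service quality",
    "P3": "Cloud computing adoption in small firms",
}

# A plain keyword search only finds the literal string.
keyword_hits = sorted(pid for pid, title in papers.items() if "itil" in title.lower())
# An entity-based search finds every paper that mentions the entity.
entity_hits = sorted(pid for pid, title in papers.items() if "ITIL" in entities_in(title))

print(keyword_hits)  # ['P1']
print(entity_hits)   # ['P1', 'P2']
```

The keyword search misses the second paper because it spells the framework out in full, while the entity-based search retrieves both.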

References

Névéol, A., Doğan, R. I., & Lu, Z. (2010). Author Keywords in Biomedical Journal Articles. AMIA Annual Symposium Proceedings, 2010, 537–541.

Gerdsri, N., Kongthon, A. & Vatananan, R. S. (2013) Mapping the knowledge evolution and professional network in the field of technology roadmapping: a bibliometric analysis. Technology Analysis & Strategic Management, 25(4), 403-422.

Can entity linking be used for literature reviews?

Entity linking is a term used to describe the automated process, carried out by a computer, of identifying objects or concepts mentioned in a body of text. Take the following text, for example:

It’s difficult to remember a time when I wasn’t conducting literature searches and looking for research gaps to fill.

Concepts mentioned in this text include literature and gaps. Each could refer to several different entities. Literature could represent the concept of writing as an art form, written work in general, or specifically academic literature. Because it is mentioned here in the context of a literature search, however, it is likely to refer to academic literature. Gaps may be physical spaces between objects or conceptual breaks in continuity, but since they are mentioned in this text as research gaps, we can infer that these gaps are conceptual.

When a computer carries out the task of entity linking, it uses the context in which an entity is mentioned to identify which specific entity the text refers to. It does this by referring to a knowledge base, such as Wikipedia. If you haven’t heard of entity linking before, you may have seen it referred to by one of its other names: named entity linking, named entity disambiguation, named entity recognition and disambiguation, or named entity normalization.

Examples of the use of entity linking to assist in comparing groups of literature

1.    Comparing perspectives and attitudes

In a recent study of perspectives on smart cities, Marrone & Hammerle (2018) compared topics across news media, trade publications, academic articles and government reports. This allowed them to compare the sources to which citizens, businesses, research organisations and governments were exposed, thus gaining insight into the attitudes and perspectives of these groups. The comparison was carried out using an entity linker, TAGME, which allowed search strings that referred to the same entity to be merged.
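To make the idea of merging search strings concrete, here is a minimal sketch. It assumes the entity linker has already mapped each surface form to an entity; the LINKED table, the example mentions and the function name are all hypothetical, and none of this reproduces the study's data or TAGME's API.

```python
from collections import Counter

# Hypothetical output of an entity linker: surface form -> linked entity.
LINKED = {
    "smart city": "Smart city",
    "smart cities": "Smart city",
    "intelligent city": "Smart city",
    "iot": "Internet of things",
    "internet of things": "Internet of things",
}

def count_entities(mentions):
    """Count mentions per entity, merging surface forms that share an entity."""
    return Counter(LINKED[m.lower()] for m in mentions if m.lower() in LINKED)

# Invented mentions from two source types, for illustration only.
news_mentions = ["smart cities", "Smart city", "IoT"]
academic_mentions = ["intelligent city", "internet of things", "IoT"]

print(count_entities(news_mentions))
print(count_entities(academic_mentions))
```

Because the counts are keyed by entity rather than by the exact wording, sources that use different terminology for the same concept can be compared directly.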

2.    Comparing practitioner and academic literature

In a second study by the same authors (Marrone & Hammerle, 2017), misalignment between practitioner and academic literatures was examined, again using an entity linker. Topics were compared across the two groups of literature, focusing on those which were salient in practitioner literature. This facilitated the identification of areas of interest to practitioners which are not discussed regularly in academic literature. In short, it elicited research gaps: areas where research is needed by practitioners or is likely to be relevant to practice.
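One way such a comparison could be sketched is to look for entities that take up a large share of practitioner mentions but a small share of academic mentions. The entity names, counts, thresholds and function names below are invented for illustration; they are not the study's data or its actual salience measure.

```python
def relative_frequencies(counts):
    """Convert raw entity counts into shares of all mentions in a corpus."""
    total = sum(counts.values())
    return {entity: n / total for entity, n in counts.items()}

def candidate_gaps(practitioner_counts, academic_counts,
                   min_share=0.2, max_share=0.1):
    """Entities salient for practitioners but rarely discussed by academics.

    min_share and max_share are arbitrary illustrative thresholds.
    """
    prac = relative_frequencies(practitioner_counts)
    acad = relative_frequencies(academic_counts)
    return sorted(
        entity for entity, share in prac.items()
        if share >= min_share and acad.get(entity, 0.0) <= max_share
    )

# Invented counts, for illustration only.
practitioner = {"Entity A": 40, "Entity B": 35, "Entity C": 25}
academic = {"Entity A": 5, "Entity B": 45, "Entity C": 50}

print(candidate_gaps(practitioner, academic))
```

In this toy data only Entity A passes both thresholds, so it would be flagged as a possible research gap for a human reviewer to assess.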

How entity linking is done

According to Piccinno and Ferragina (2014), the entity linking process, as carried out by the tool TAGME, may be divided into three stages: spotting, disambiguation, and pruning.

  • Spotting involves scanning the text for meaningful sequences of words to produce a set of possible mentions (such as literature searches in the example text given above). The entity linker then retrieves a list of candidate entities from its knowledge base for each mention. This list contains all the possible meanings that it can associate with the mention (such as literature as an art form, as written work in general, or as academic literature).
  • Disambiguation then takes place: the entity linker assigns a score to each candidate entity in the list by modelling how strongly the entity correlates with the mention in its context. The candidate with the highest score becomes the candidate annotation (for the mention literature in the example text, the candidate annotation could be academic literature).
  • Pruning is the final stage, in which the entity linker decides whether to discard a candidate annotation based on the other annotations it has made to the text. This decision therefore depends on whether the annotation makes sense given the overall context of the text.
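The three stages can be illustrated with a toy sketch applied to the example sentence from earlier. This is not TAGME's algorithm: the tiny in-memory knowledge base, the word-overlap score and the score threshold used for pruning (standing in for the check against other annotations) are all simplifying assumptions, and every name below is hypothetical.

```python
# Toy knowledge base (hypothetical): mention -> candidate entities,
# each described by a few context words that hint at that meaning.
KNOWLEDGE_BASE = {
    "literature": {
        "Literature (art form)": {"novel", "poetry", "fiction"},
        "Academic literature": {"research", "searches", "review", "journal"},
    },
    "gaps": {
        "Gap (physical space)": {"wall", "fence", "teeth"},
        "Research gap": {"research", "literature", "fill"},
    },
}

def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    return [w.strip(".,;:!?").lower() for w in text.split()]

def spot(tokens):
    """Spotting: keep tokens that the knowledge base knows as mentions."""
    return [t for t in tokens if t in KNOWLEDGE_BASE]

def disambiguate(mention, tokens):
    """Disambiguation: score candidates by context-word overlap, keep the best."""
    context = set(tokens) - {mention}
    scores = {
        entity: len(clues & context) / len(clues)
        for entity, clues in KNOWLEDGE_BASE[mention].items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

def prune(annotations, threshold=0.3):
    """Pruning (simplified): discard annotations scoring below a threshold."""
    return {m: e for m, (e, s) in annotations.items() if s >= threshold}

def link_entities(text):
    """Run spotting, disambiguation and pruning in sequence."""
    tokens = tokenize(text)
    annotations = {m: disambiguate(m, tokens) for m in spot(tokens)}
    return prune(annotations)

text = ("It's difficult to remember a time when I wasn't conducting "
        "literature searches and looking for research gaps to fill.")
print(link_entities(text))
```

With this toy knowledge base, literature is linked to academic literature and gaps to research gap, because the surrounding words overlap with the clues for those candidates. A real entity linker draws its candidates and context statistics from a large knowledge base such as Wikipedia.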

By removing ambiguities, entity linking can improve the performance of your data analysis. As an automated process, it also reduces the subjective bias that can be introduced when we code text manually.

Sources:

Marrone, M. & Hammerle, M. (2017) Relevant research areas in IT Service Management: An examination of academic and practitioner literatures. Communications of the Association for Information Systems: Vol. 41, Article 23. Available at: http://aisel.aisnet.org/cais/vol41/iss1/23

Marrone, M. & Hammerle, M. (2018) Smart Cities: A review and analysis of stakeholders’ literature. Business and Information Systems Engineering. Available at: https://doi.org/10.1007/s12599-018-0535-3

Piccinno, F., & Ferragina, P. (2014). From TagME to WAT: A new entity annotator. In Proceedings of the 1st International Workshop on Entity Recognition & Disambiguation (pp. 55-62).