Entity linking Systems for Literature Reviews

Let’s face it. In busy academic lives there is hardly any time for deep reading, let alone for staying up to date in real time with everything published in your area. Yet staying up to date with the latest knowledge and reviewing the literature regularly is our bread and butter as academics.

In addition, published literature reviews help establish your expertise in a particular area. Researchers increasingly automate the coding process in literature reviews and accelerate the literature review process by using computer-assisted tools like Leximancer, topic modelling, Bibliometrix, R packages, NVivo, etc.

However, existing approaches for coding textual data do not account for lexical ambiguity; that is, instances in which individual words have multiple meanings.

To counter this, we developed a method to conduct rapid and comprehensive analyses of diverse literature types by using entity linking in literature reviews. We present a new literature review framework that embeds entity linking.

See the framework step by step below:

In the same paper, we present an example where we apply the framework to review the literature on digital disruption and digital transformation.

On how to adapt the framework to your needs, see the full paper:

Marrone, M., Lemke, S., Kolbe, L.M. (2022), Entity linking Systems for Literature Reviews, Scientometrics. Forthcoming.

Trends in FinTech Research and Practice: a systematic review

Many industry sectors have experienced significant disruption in recent years through the introduction of new financial technology (or FinTech), including process automation in financial services and the adoption of cryptocurrencies. From the first telegraph cable in 1866 to blockchain in 2009, the evolution of financial technologies has always been aligned with innovations in information systems (IS).

How do FinTech and Information Systems relate to each other? Where are the crossovers, where do they intersect, where do they diverge?
This question drove me and my colleagues to conduct a systematic literature review and to compare academic with practitioner literature.

Findings from our review show that the practitioner-oriented literature foreshadowed the rise of FinTech by extensively reporting on algorithm-based and electronic trading (2009 onwards), followed by reporting on FinTech start-ups and funding successes (2014 onwards).

The practitioner literature subsequently reported on alternative finance models, the introduction of cryptocurrencies, and risks and regulatory issues. Academic literature on FinTech began to rise from 2014 onwards, focusing initially on the development of FinTech in the aftermath of the 2007-2008 global financial crisis.

Research attention subsequently shifted to FinTech innovations (alternative finance, cryptocurrency and blockchain, machine-based methods for financial analysis and forecasting, including artificial intelligence), as well as risk and regulatory issues.

IS work on FinTech started to emerge from 2015 onwards, initially focusing on mobile payment systems and peer-to-peer lending. However, the body of work at the intersection of FinTech and IS is still small.

Changes in FinTech literature over time

Our review sheds light on several opportunities for future research, including financial inclusion, the impacts arising from COVID-19, and the emergence of new business models, such as Banking as a Service (BaaS).

Full paper reference: 

Cai, C., Marrone, M., & Linnenluecke, M. (2022). Trends in FinTech Research and Practice: Examining the Intersection with the Information Systems Field. Communications of the Association for Information Systems. https://www.researchgate.net/publication/359107231_Trends_in_FinTech_Research_and_Practice_Examining_the_Intersection_with_the_Information_Systems_Field

Finding research gaps in IT Service Management (and in ITIL research)

Summary of “Relevant Research Areas in IT Service Management: An Examination of Academic and Practitioner Literatures” (Marrone & Hammerle, 2017)

IT Service Management is a field of IS research which is widely used and popular in practice (Iden & Eikebrokk, 2013; Marrone & Kolbe, 2011). This study compares business and academic literature in the field. Since the behaviour of practitioners is influenced by the (business) literature they read (Carroll & McCombs, 2003), the comparison uncovers aspects of professional behaviour or practices not explored by academic research, termed “practice-oriented research gaps” (Müller-Bloch and Kranz, 2015).

Academic literature used in the study comprised abstracts of papers from selected information systems publications, identified by database searches using keywords related to ITSM. Practitioner literature was identified through searches of selected popular press and specialist IS publications. Both sets of literature were published over a span of 16 years (from 1 January 2000 to 1 May 2016). A semantic entity annotator (the technology used by resgap.com) was employed to identify topics in the two sets of literature, and keyword analysis was then applied to identify statistically significant topics.
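This summary does not say which statistic the keyword analysis used. A common choice for comparing two corpora is the log-likelihood (G2) keyness score, so the sketch below assumes that measure. The topic counts, corpus sizes and the 3.84 cut-off (chi-square, one degree of freedom, p < 0.05) are illustrative assumptions, not figures from the paper.

```python
import math


def log_likelihood(count_a: int, total_a: int, count_b: int, total_b: int) -> float:
    """G2 keyness of one topic when comparing corpus A with corpus B."""
    combined_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Hypothetical counts: topic -> (mentions in practitioner texts, mentions in academic texts)
topic_counts = {
    "Cloud computing": (120, 15),
    "Gamification": (40, 2),
    "Business-IT alignment": (30, 28),
}
PRACTITIONER_TOTAL = 10_000  # hypothetical number of topic annotations per corpus
ACADEMIC_TOTAL = 10_000
CRITICAL_VALUE = 3.84  # assumed significance threshold

scores = {
    topic: log_likelihood(p, PRACTITIONER_TOTAL, a, ACADEMIC_TOTAL)
    for topic, (p, a) in topic_counts.items()
}
significant = {topic for topic, score in scores.items() if score > CRITICAL_VALUE}
print(sorted(significant))
```

With these made-up counts, a topic mentioned about equally often in both corpora scores close to zero and is filtered out, while topics that are much more frequent in one corpus pass the threshold.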

For each set of literature, eight of the 10 most frequently used topics also appeared regularly in the other set, suggesting that academics and practitioners view many of the same topics as highly important. However, several of the most frequently used topics differed, suggesting a degree of misalignment. Practitioner literature tended to focus on topics associated with the physical implementation and application of ITSM, while academic literature highlighted the idea of implementation.

Research gaps identified

The study uncovered four broad practice-oriented research gaps. For each of these, three examples of possible research questions are provided, using a taxonomy proposed by Jarvinen (2000). They are categorised as conceptual-analytical, theory creating or testing, and artefact building or evaluating.

1.    Combining Frameworks

The combination of different frameworks by organisations, to support the use of ITSM, is frequently discussed in practitioner literature. In contrast, most academic papers only consider the use of one framework at a time. Some papers do present evidence that firms combine two frameworks: CoBIT (Control Objectives for Information and Related Technologies) and Information Technology Infrastructure Library (Cater-Steel, Tan, & Toleman, 2006; de Espindola, Luciano, & Audy, 2009; Lapão, 2011; Vogt, Küller, Hertweck, & Hales, 2011). However, practitioners discuss a wide range of frameworks that organizations use simultaneously.

Potential research questions:

  • How does co-implementing frameworks help strengthen areas that a single framework does not cover, such as business-IT alignment, knowledge management, organizational learning, outsourcing, and competitive advantage? (Conceptual analytical)
  • What theory can best reflect why different organizations consider strategic and technical factors when choosing to co-implement ITSM frameworks? (Theory creating)
  • Can one develop a model that indicates the most appropriate mix of ITSM frameworks based on an organization’s specific requirements? (Artefacts building)

2.    Infrastructure

Little academic research has addressed how improvements in infrastructure help organizations achieve beneficial outcomes of implementing ITSM. Further, research has not described the impact that implementing ITSM has on an organization’s infrastructure or cloud computing.

Potential research questions:

  • Which ITSM processes, if any, contribute to the effective management of cloud services? (Conceptual analytical)
  • As organizations increase their reliance on cloud service providers, what is the impact on the benefits that they receive when implementing ITSM? (Artefacts evaluation)
  • How do ITSM frameworks help organizations implement cloud services? (Conceptual analytical)

3.    Software and gamification

The practitioner literature often warns that IT departments implementing ITSM may prioritise software tools over processes, to their disadvantage. It further proposes that gamified tools may offer significant benefits in implementation. However, the advantages and difficulties associated with relying on tools when implementing ITSM are not discussed in the academic literature, nor are the effects of gamification examined.

Potential research questions:

  • How can an organization best use tools to support the implementation of ITSM? (Conceptual analytical)
  • Which kind of model could explain the benefits received due to the use of ITSM tools? (Theory creating)
  • How effectively does gamification help train staff in the ITSM processes—specifically as it concerns content retention and engagement and staff retention? (Artefacts evaluating)

4.    Regulation compliance

Practitioner literature suggests that several organizations implemented ITSM, motivated by the need to comply with regulations, such as the Sarbanes-Oxley Act (SOX) introduced in 2002. The impact of regulation on the implementation of ITSM is less evident in the academic literature.

Potential research questions:

  • What is the relationship between the types of regulations introduced and the ITSM organizations implement? (Conceptual analytical)
  • Which kind of model could explain how organizations implement ITSM due to the introduction of different regulations compared to other rationales for adoption? (Theory creating)
  • How effectively did SOX encourage organizations to pay closer attention to their IT governance? (Artefacts evaluating)

References

Carroll, C. E., & McCombs, M. (2003). Agenda-setting effects of business news on the public’s images and opinions about major corporations. Corporate Reputation Review, 6(1), 36-46.

Cater-Steel, A., Tan, W.-G., & Toleman, M. (2006). Challenge of adopting multiple process improvement frameworks. In Proceedings of the European Conference on Information Systems.

de Espindola, R. S., Luciano, E. M., & Audy, J. L. N. (2009). An overview of the adoption of IT governance models and software process quality instruments at Brazil—preliminary results of a survey. In Proceedings of the 42nd Hawaii International Conference on System Sciences.

Iden, J., & Eikebrokk, T. R. (2013). Implementing IT service management: A systematic literature review. International Journal of Information Management, 33(3), 512-523.

Jarvinen, P. (2000). Research questions guiding selection of an appropriate research method. In Proceedings of the European Conference on Information Systems.

Lapão, L. V. (2011). Organizational challenges and barriers to implementing IT governance in a hospital. Electronic Journal of Information Systems Evaluation, 14(1), 37-45.

Marrone, M., & Kolbe, L. M. (2011). Uncovering ITIL claims: IT executives’ perception on benefits and Business-IT alignment. Information Systems and E-Business Management, 9(3), 363-380.

Marrone, M., & Hammerle, M. (2017). Relevant Research Areas in IT Service Management: An Examination of Academic and Practitioner Literatures. Communications of the Association for Information Systems, 41(1), 517-543.

Müller-Bloch, C., & Kranz, J. (2015). A framework for rigorously identifying research gaps in qualitative literature reviews. In Proceedings of the International Conference on Information Systems.

Vogt, M., Küller, P., Hertweck, D., & Hales, K. (2011). Adapting IT governance frameworks using domain specific requirements methods: Examples from small & medium enterprises and emergency management. In Proceedings of the Americas Conference on Information Systems.

Smart Cities: a literature review (in plain English!)

Summary of “Smart Cities: A Review and Analysis of Stakeholders’ Literature” (Marrone and Hammerle, 2018)

There has been increasing interest in recent years in the use of digital technology to help deal with the “wicked problems” of environmental degradation and poverty in towns and cities. Cities where attempts are made to achieve this are known as “smart cities”.

This literature review compared the views of different groups of people on the idea of “smart cities”, seeking to compare diverse perspectives by examining the topics discussed in different categories of publication. Since what people read, hear and see will influence and reflect their views, analysis of the publications they are exposed to can give us an insight into those views (McCombs and Shaw 1972; Carroll and McCombs 2003). In this study, for example, the views of those who live in towns and cities were considered by reviewing news media, while the views of those involved in research organisations were analysed using academic publications (see table).

Group                         | Literature category
Citizens                      | News media
People involved in business   | Trade publications
People involved in research   | Academic publications
People involved in government | Government reports

The topics arising in different categories of literature were compared using resgap.com technology. Key topics for all categories of literature were:

  • Internet of Things
  • Technology
  • Infrastructure
  • Smart grid
  • Urban planning
  • Energy
  • Transport
  • Innovation
  • Sustainability

Key topics which arose frequently in news media but less so in other categories of literature, suggesting that citizens were concerned about them but that other groups did not consider them to be of such high importance, were:

  • Autonomous car
  • Hackers
  • Start-up company
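This kind of comparison can be pictured with sets of key topics per literature category. The sketch below is only illustrative: the category and topic names come from this summary, but which topics sit in each category's set is a simplified assumption, not the study's data.

```python
# Hypothetical key-topic sets per literature category (simplified for illustration).
key_topics = {
    "News media": {
        "Internet of Things", "Energy", "Transport",
        "Autonomous car", "Hackers", "Start-up company",
    },
    "Trade publications": {"Internet of Things", "Energy", "Transport", "Smart grid"},
    "Academic publications": {"Internet of Things", "Energy", "Transport", "Urban planning"},
    "Government reports": {"Internet of Things", "Energy", "Transport", "Infrastructure"},
}

# Topics that are key in every category.
shared = set.intersection(*key_topics.values())

# Topics that are key in news media but in none of the other categories.
other_categories = set().union(
    *(topics for name, topics in key_topics.items() if name != "News media")
)
news_only = key_topics["News media"] - other_categories

print(sorted(shared))
print(sorted(news_only))
```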

Further analysis of these topics revealed some interesting differences between the ways in which they were discussed in news media and in other categories of literature. In the case of the “Autonomous car” topic, all categories of literature addressed the benefits of autonomous cars. However, while other literature types focused more on how a reliance on autonomous vehicles might come about, news media tended to present this transportation method as potentially disruptive, considering the risks associated with it. News media was also the only literature category to focus on how people might be involved in the use of autonomous cars.

On the topic of “Hackers”, news media presented more detail regarding the intricacies of hacking, compared with other types of literature, and suggested reasons why hackers have not yet become widespread in smart cities. News media expressed the importance of preventing hacking to protect the people who use smart city services and emphasised how lack of action on the part of companies and governments could leave smart city services open to attack from hackers.

Regarding “Start-up company”, although all categories of literature highlighted the importance of start-ups in the development of smart cities and of fostering connections between different groups to enable start-ups to be successful, news media alone specifically highlighted how innovations brought about by start-ups may help to serve people and impact their everyday lives. Other literature types were more focused on the opportunities for economic growth and profits brought about by developments in the smart city space.

Existing academic research suggests that the perspectives of citizens are often ignored in the development of smart cities (Hollands 2015). The results of this review suggest that citizens are under-represented, rather than being completely ignored. The research gaps identified here are in person-centred topics, such as privacy, which are important to citizens. These should be addressed by practitioners involved in developing and marketing smart city services and by government and academic bodies involved in producing smart city policies.

References:

Carroll CE, McCombs M (2003) Agenda-setting effects of business news on the public’s images and opinions about major corporations. Corp Reput Rev 6:36–46. https://doi.org/10.1057/palgrave.crr.1540188

Hollands RG (2015) Critical interventions into the corporate smart city. Camb J Reg Econ Soc 8:61–77

Marrone, M, Hammerle, M (2018) Smart Cities: A Review and Analysis of Stakeholders’ Literature. Bus Inf Syst Eng 60: 197. https://doi.org/10.1007/s12599-018-0535-3

McCombs ME, Shaw DL (1972) The agenda-setting function of mass media. Public Opin Q 36:176–187

Literature reviews – Keyword searches don’t work

The trouble with keyword searches

Keywords are expected to help us identify relevant papers when we conduct literature searches. Unfortunately, they’re not as effective at doing this as we might hope, since they aren’t always representative of the content of an article.

Subjectivity in keyword selection

Keywords may be selected by the author of a paper, in which case they are likely to represent the themes which the author deems most important in their article (Névéol et al. 2010). However, these may not necessarily correspond with the dominant themes found in the paper itself. When authors do not provide keywords to accompany their own publications, the keywords may be selected by editors, who then add their subjective interpretations of the text (Gerdsri et al., 2013; p.420).

Inconsistent terminology

Terminology used in keywords may vary according to preference, so that different terms are used by different authors to represent the same concept. Where standardised indexing terms are used, such as the Medical Subject Headings (MeSH®) in the bibliographic database MEDLINE®, these can be substantially different from the author-selected keywords (Figure 1).

Author keywords:

  • Decision-making
  • Rural health services
  • Interhospital transport
  • Survival analysis

MEDLINE indexing terms:

  • Adult
  • Aged
  • Cohort studies
  • Decision making
  • Diagnosis-related groups
  • Female
  • Health services accessibility
  • Hospital mortality
  • Hospitals, community/organisation & administration
  • Hospitals, rural/organisation & administration
  • Humans
  • Intensive care units/utilisation
  • Length of stay
  • Male
  • Middle aged
  • New Hampshire/epidemiology
  • Outcome assessment (health care)
  • Patient transfer/statistics & numerical data
  • Prospective studies
  • Survival analysis

Figure 1: Author keywords and MeSH indexing terms assigned to a sample article indexed in MEDLINE (Névéol et al. 2010)
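To make the mismatch concrete, the sketch below compares the two term lists from Figure 1, first by exact string and then after a light normalisation. It assumes that the first four terms in the figure are the author keywords and the rest are the indexing terms; the `normalise` helper is a hypothetical convenience written for this example, not part of any indexing tool.

```python
def normalise(term: str) -> str:
    """Lower-case a term and treat hyphens as spaces."""
    return " ".join(term.lower().replace("-", " ").split())


author_keywords = [
    "Decision-making", "Rural health services",
    "Interhospital transport", "Survival analysis",
]
mesh_terms = [
    "Adult", "Aged", "Cohort studies", "Decision making",
    "Diagnosis-related groups", "Female", "Health services accessibility",
    "Hospital mortality", "Hospitals, community/organisation & administration",
    "Hospitals, rural/organisation & administration", "Humans",
    "Intensive care units/utilisation", "Length of stay", "Male", "Middle aged",
    "New Hampshire/epidemiology", "Outcome assessment (health care)",
    "Patient transfer/statistics & numerical data", "Prospective studies",
    "Survival analysis",
]

# Terms shared by both lists, before and after normalisation.
exact_matches = set(author_keywords) & set(mesh_terms)
normalised_matches = {normalise(k) for k in author_keywords} & {
    normalise(t) for t in mesh_terms
}

print(sorted(exact_matches))
print(sorted(normalised_matches))
```

Even after normalisation, only two of the four author keywords line up with an indexing term, and most of the indexing terms have no author-keyword counterpart at all.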

Limited number of keywords

Author-selected keywords are also usually limited in number, typically to between 4 and 8 per article. This small number of keywords is unlikely to provide a comprehensive overview of the topics or themes in an article. Indeed, indexers assigned an average of 13.0 (+/-11.9) terms to papers in a collection of 14,398 open-access articles in PubMed Central®, suggesting that a greater number of terms is required to capture the thematic content of most papers.

An alternative approach

Fortunately, there’s a better way to enhance literature searches. Entity linking allows us to consider the context of words as well as the relationships between them. By linking words which carry the same meaning to an entity, we can extract entities from text, rather than relying on subjectively assigned keywords. Entities represent the themes contained in the text, removing the ambiguity associated with varying use of terminology.
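As a toy illustration, the sketch below counts a handful of mentions twice: once by raw string and once after mapping each string to an entity. The alias table is a hypothetical, hand-written stand-in for a knowledge base; a real entity linker would also use the surrounding context to choose between candidate entities.

```python
from collections import Counter

# Hypothetical alias table standing in for a knowledge base.
ALIASES = {
    "itil": "ITIL",
    "it infrastructure library": "ITIL",
    "information technology infrastructure library": "ITIL",
    "cloud": "Cloud computing",
    "cloud computing": "Cloud computing",
}

# Surface strings as they might appear across several abstracts.
mentions = ["ITIL", "IT Infrastructure Library", "cloud", "Cloud computing", "itil"]

# Counting raw strings splits one theme across several spellings.
by_string = Counter(m.lower() for m in mentions)

# Counting entities merges the spellings into one theme each.
by_entity = Counter(ALIASES[m.lower()] for m in mentions)

print(by_string)
print(by_entity)
```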

References

Névéol, A., Doğan, R. I., & Lu, Z. (2010). Author Keywords in Biomedical Journal Articles. AMIA Annual Symposium Proceedings, 2010, 537–541.

Gerdsri, N., Kongthon, A. & Vatananan, R. S. (2013) Mapping the knowledge evolution and professional network in the field of technology roadmapping: a bibliometric analysis. Technology Analysis & Strategic Management, 25(4), 403-422.

Can entity linking be used for literature reviews?

Entity linking is a term used to describe the automated process, carried out by a computer, of identifying objects or concepts mentioned in a body of text. Take the following text, for example:

It’s difficult to remember a time when I wasn’t conducting literature searches and looking for research gaps to fill.

Concepts mentioned in this text include literature and gaps. Each could refer to several different entities. Literature could represent the concept of writing as an art form, written work in general, or specifically academic literature. Because it is mentioned here in the context of a literature search, however, it is likely to refer to academic literature. Gaps may be physical spaces between objects or conceptual breaks in continuity, but since they are mentioned in this text as research gaps, we can infer that these gaps are conceptual.

When a computer carries out the task of entity linking, it uses the context in which an entity is mentioned to identify which specific entity the text refers to. It does this by referring to a knowledge base, such as Wikipedia. If you haven’t heard of entity linking before, you may have seen it referred to by one of its other names: named entity linking, named entity disambiguation, named entity recognition and disambiguation, or named entity normalization.

Examples of the use of entity linking to assist in comparing groups of literature

1.    Comparing perspectives and attitudes

In a recent study of perspectives on smart cities, Marrone & Hammerle (2018) compared topics across news media, trade publications, academic articles and government reports. This allowed them to compare sources to which citizens, businesses, research organisations and governments were exposed, thus gaining insight into the attitudes and perspectives of these groups. The comparison was carried out using an entity linker, TAGME, which allowed search strings that referred to the same entity to be merged.

2.    Comparing practitioner and academic literature

In a second study by the same authors (Marrone & Hammerle, 2017), misalignment between practitioner and academic literatures was examined, again using an entity linker. Topics were compared across the two groups of literature, focusing on those which were salient in practitioner literature. This facilitated identification of areas of interest to practitioners which are not discussed regularly in academic literature. In short, it elicited research gaps – areas where research is needed by practitioners or is likely to be relevant to practice.

How entity linking is done

According to Piccinno and Ferragina (2014), the entity linking process, as carried out by the tool, TAGME, may be divided into three stages: spotting, disambiguation, and pruning.

  • Spotting involves scanning the text for meaningful sequences to produce a set of possible mentions (such as literature searches in the example text given above). The semantic entity annotator (SEA) then retrieves a list of candidate entities from its knowledge base for each mention. This list contains all the possible meanings that it can associate with the mention (such as literature as art, as all writing or as academic literature).
  • Disambiguation then takes place: the SEA assigns a score to each candidate entity in the list by modelling how strongly the entity correlates with the mention in its context. The candidate with the highest score becomes the candidate annotation (for the mention literature in the example text, the candidate annotation could be academic literature).
  • Pruning is the final stage, in which the SEA decides if it will discard a candidate annotation based on the other annotations that it has made to the text. This decision will therefore depend on whether the annotation makes sense given the overall context of the text.
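The three stages can be sketched in a few lines of code. This is a toy illustration only, not TAGME's actual algorithm: the mini knowledge base and its cue words are hypothetical, disambiguation is approximated by counting cue words that appear in the surrounding text, and pruning is reduced to a minimum-score threshold rather than a check against the other annotations.

```python
import re

# Hypothetical mini knowledge base: mention -> {candidate entity: context cue words}.
KNOWLEDGE_BASE = {
    "literature": {
        "Literature (art form)": {"novel", "poetry", "fiction"},
        "Academic literature": {"research", "searches", "review", "papers"},
    },
    "gaps": {
        "Gap (physical space)": {"wall", "fence", "teeth"},
        "Research gap": {"research", "literature", "fill"},
    },
    "time": {
        "Time (magazine)": {"magazine", "cover", "issue"},
    },
}


def tokenize(text):
    """Split text into lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def spot(tokens):
    """Spotting: keep the tokens that match a mention in the knowledge base."""
    return [token for token in tokens if token in KNOWLEDGE_BASE]


def disambiguate(mention, tokens):
    """Disambiguation: score candidates by cue-word overlap and return the best."""
    context = set(tokens) - {mention}
    scores = {
        entity: len(cues & context)
        for entity, cues in KNOWLEDGE_BASE[mention].items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def link(text, min_score=1):
    """Run spotting, disambiguation and a simplified pruning step."""
    tokens = tokenize(text)
    annotations = {}
    for mention in spot(tokens):
        entity, score = disambiguate(mention, tokens)
        if score >= min_score:  # pruning: drop annotations with no contextual support
            annotations[mention] = entity
    return annotations


text = (
    "It's difficult to remember a time when I wasn't conducting literature "
    "searches and looking for research gaps to fill."
)
print(link(text))
```

In this toy run, literature and gaps are linked to their academic senses because of the surrounding words, while the spotted mention time is pruned because nothing in the sentence supports its only candidate.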

By removing ambiguities, entity linking can improve the performance of your data analysis. As an automated process, it prevents the introduction of bias, which occurs when we manually code text.

Sources:

Marrone, M. & Hammerle, M. (2017) Relevant research areas in IT Service Management: An examination of academic and practitioner literatures. Communications of the Association for Information Systems: Vol. 41, Article 23. Available at: http://aisel.aisnet.org/cais/vol41/iss1/23

Marrone, M. & Hammerle, M. (2018) Smart Cities: A review and analysis of stakeholders’ literature. Business and Information Systems Engineering. Available at: https://doi.org/10.1007/s12599-018-0535-3

Piccinno, F., & Ferragina, P. (2014). From TagME to WAT: A new entity annotator. In Proceedings of the 1st International Workshop on Entity Recognition & Disambiguation (pp. 55-62).