Skills needed to do a PhD: recent research using text mining and machine learning

The number of people with PhDs is growing worldwide. Doing a PhD is a significant undertaking, and dropping out can result in serious financial loss and psychological harm, such as loss of confidence.

It is not surprising that a lot of research exists on the doctoral experience with the aim to improve it and to find out what it takes to finish it. 

A body of research also examines the outcomes of a PhD, i.e. what people gain from one. This question is gaining importance because there seems to be an oversupply of PhD graduates for academia, which means PhD holders need to seek jobs elsewhere, i.e. in the corporate or public sectors.

We analysed PhD requirements to find out what PhD students need in terms of skills, attributes and qualifications.

We analysed the selection criteria for PhD candidates on a platform that advertises PhD programs as job ads. Our analysis of thousands of these ads revealed exactly what types of skills different countries and disciplines require. See the infographic below for a quick summary of the findings and implications of this research:

This study draws on the data source of PhD role advertisements (aka ‘PhD ads’) to identify what skills and/or other requirements doctoral programs seek before PhD admission. We analysed the selection criteria of 13,562 PhD ads posted in 2016-2019 on EURAXESS – Researchers in Motion, a pan-European initiative by the European Commission.

We developed a taxonomy based on the EURODOC ‘Transferable Skills for Early-Career Researchers’ framework and extracted the attributes present in each advertisement. To do this we employed text mining and machine learning approaches. We then created an updated taxonomy using a data-derived dictionary.
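To give a feel for the dictionary step, here is a minimal sketch of dictionary-based attribute extraction. It is an illustration only, not the pipeline used in the paper: the `TAXONOMY` categories and phrases, and the function names `extract_categories` and `category_counts`, are hypothetical and were made up for this example.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy (category -> indicative phrases), for illustration only.
# The study itself used a EURODOC-based taxonomy with a data-derived dictionary.
TAXONOMY = {
    "communication": ["presentation skills", "fluent in english"],
    "research": ["data analysis", "statistics", "programming"],
    "qualification": ["msc", "master of science"],
}

def extract_categories(ad_text, taxonomy=TAXONOMY):
    """Return the set of taxonomy categories whose phrases occur in the ad text."""
    text = ad_text.lower()
    found = set()
    for category, phrases in taxonomy.items():
        for phrase in phrases:
            # Word boundaries stop short phrases such as 'msc' matching inside longer words.
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                found.add(category)
                break
    return found

def category_counts(ads):
    """Count how many ads mention each category (each ad counts once per category)."""
    counts = Counter()
    for ad in ads:
        counts.update(extract_categories(ad))
    return counts
```

Counting each ad once per category keeps a long ad that repeats the same requirement from inflating the totals, which matters when comparing countries or disciplines with different advertising styles.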

You may use the interactive dashboard below to search for details on PhD ads in any of the 50+ countries and any of the 30+ disciplines represented in the data sample (2016-2021).

Dashboard with data up to 2021

See the full paper for details on the methodology and the research study:

Note: The paper is based on 2016-2019 data only and the sample for this time period can be accessed here.

Read The Conversation article on this research.

Read the Campus Morning Mail post on this research.

Entity linking Systems for Literature Reviews

Let’s face it. In busy academic lives there is hardly any time to do some deep reading, let alone stay up to date with everything that is published in your area in real time. Yet, staying up to date with latest knowledge and reviewing literature regularly is our bread and butter as academics.

In addition, published literature reviews help establish your expertise in a particular area. Researchers increasingly automate the coding process in literature reviews and accelerate the review process by using computer-assisted tools like Leximancer, topic modelling, Bibliometrix, R packages, NVivo, etc.

However, existing approaches for coding textual data do not account for lexical ambiguity; that is, instances in which individual words have multiple meanings.

To counter this, we developed a method to conduct rapid and comprehensive analyses of diverse literature types by using entity linking in literature reviews. We present a new literature review framework that embeds entity linking.
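As a toy illustration of what entity linking adds over plain keyword matching, the sketch below resolves an ambiguous word to one of several candidate entities by looking at the surrounding words. This is not how any production entity linker works and it is not the method in our paper; the `CANDIDATES` knowledge base and the `link_entity` function are hypothetical.

```python
# Hypothetical toy knowledge base: surface form -> candidate entities,
# each described by a handful of context words. Made up for illustration.
CANDIDATES = {
    "cloud": {
        "Cloud computing": {"server", "service", "data", "provider", "software"},
        "Cloud (meteorology)": {"rain", "weather", "sky", "atmosphere", "climate"},
    },
}

def link_entity(mention, sentence, candidates=CANDIDATES):
    """Pick the candidate entity whose context words overlap most with the sentence."""
    options = candidates.get(mention.lower())
    if not options:
        return None  # unknown surface form: nothing to link
    words = set(sentence.lower().replace(",", " ").replace(".", " ").split())
    # max() returns the first candidate on ties, so dictionary order is the tie-break.
    return max(options, key=lambda entity: len(options[entity] & words))
```

A keyword search for "cloud" would lump both senses together; the linker assigns each mention to a distinct concept, which is the property a literature review needs when the same word means different things in different fields.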

See the framework step by step below:

In the same paper, we present an example where we apply the framework to review the literature on digital disruption and digital transformation.

On how to adapt the framework to your needs, see the full paper:

Marrone, M., Lemke, S., Kolbe, L.M. (2022), Entity linking Systems for Literature Reviews, Scientometrics. Forthcoming.

Trends in FinTech Research and Practice: a systematic review

Many industry sectors have experienced significant disruption in recent years through the introduction of new financial technology (or FinTech), including process automation in financial services and the adoption of cryptocurrencies. From the first telegraph cable in 1866 to blockchain in 2009, the evolution of financial technologies has always been aligned with innovations in information systems (IS).

How do FinTech and Information Systems relate to each other? Where are the crossovers, where do they intersect, where do they diverge?
This question drove me and my colleagues to conduct a systematic literature review and to compare academic with practitioner literature.

Findings from our review show that the practitioner-oriented literature foreshadowed the rise of FinTech by extensively reporting on algorithm-based and electronic trading (2009 onwards), followed by reporting on FinTech start-ups and funding successes (2014 onwards).

The practitioner literature subsequently reported on alternative finance models, the introduction of cryptocurrencies, and risks and regulatory issues. Academic literature on FinTech began to rise from 2014 onwards, focusing initially on the development of FinTech in the aftermath of the 2007-2008 global financial crisis.

Research attention subsequently shifted to FinTech innovations (alternative finance, cryptocurrency and blockchain, machine-based methods for financial analysis and forecasting, including artificial intelligence), as well as risk and regulatory issues.

IS work on FinTech started to emerge from 2015 onwards, initially focusing on mobile payment systems and peer-to-peer lending. However, the body of work at the intersection of FinTech and IS is still small.

Changes in FinTech literature over time

Our review sheds light on several opportunities for future research, including financial inclusion, the impacts arising from COVID-19, and the emergence of new business models, such as Banking as a Service (BaaS).

Full paper reference: 

Cai, C., Marrone, M., & Linnenluecke, M. (2022). Trends in FinTech Research and Practice: Examining the Intersection with the Information Systems Field. Communications of the Association for Information Systems

Developing interdisciplinary research maps from business/management and the environmental sciences

Summary of “Interdisciplinary Research Maps: A new technique for visualizing research topics” (Marrone & Linnenluecke, 2020)

Interdisciplinary research is challenging, in part due to the sheer magnitude of knowledge embedded within disciplines, and also to the lack of a common shared understanding across them.

To bridge the gap in understanding between the disciplines of business/management and the environmental sciences, Marrone & Linnenluecke (2020) developed a ‘map’ of topics, concepts, and ideas discussed in top publications in these fields of research.

Developing the map

  • Data for the study was sourced by selecting articles published since 2011 in the top four journals by impact factor in each field, through the Scopus database. The abstracts, titles and publication years of 4,827 environmental sciences articles and 2,671 business and management articles were downloaded.
  • These data were exported to two separate Comma Separated Values (CSV) files, one for each of the areas of interest. The titles of the publications were then merged with their respective abstracts, and the files were analysed using the TAGME entity linking tool to compile a list of all possible topics from the text in the abstracts and titles.
  • The researchers ‘cleaned’ the results of the analysis by deleting topics that made little meaningful sense, given the context in which they were used. After cleaning the results, 7,915 topics were retained in the environmental sciences articles, and 4,293 in business/management articles.
  • A map was created (see figure below) to show the frequency with which topics are mentioned in each field.
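The last step above can be sketched in a few lines: once each article has a set of linked topics, a topic's position on the map follows from how often it appears in each field. This is a simplified sketch, not the code behind the published map; `topic_shares` and `map_coordinates` are hypothetical names, and using the share of articles as the frequency measure is our assumption for the example.

```python
from collections import Counter

def topic_shares(articles):
    """articles: list of sets of topics. Returns topic -> share of articles mentioning it."""
    counts = Counter(topic for topics in articles for topic in set(topics))
    n = len(articles)
    return {topic: count / n for topic, count in counts.items()}

def map_coordinates(env_articles, biz_articles):
    """Return topic -> (share in environmental sciences, share in business/management)."""
    env = topic_shares(env_articles)
    biz = topic_shares(biz_articles)
    return {
        topic: (env.get(topic, 0.0), biz.get(topic, 0.0))
        for topic in sorted(set(env) | set(biz))
    }
```

Topics with one coordinate near zero belong almost entirely to one field, while topics with two sizeable coordinates are candidates for interdisciplinary work.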

What the map tells us

1.     Topics that are frequently identified in one literature, but not the other

Some topics are represented almost exclusively in the environmental sciences articles, many of them linked to concerns about climate change. Meanwhile, in the business/management journals, the topics which arise most frequently are related to firm structure and expansion. There is an opportunity here, for future research to connect topics across these two fields. For example, further interdisciplinary research could serve to explore the impacts of climate change on business and management decisions, such as asset valuations and investments.

2.     Topics that are frequently associated with both literatures

Several topics are common to both sets of literature. For example, “decision-making” is a frequently discussed topic in both business/management and the environmental sciences. The topic of “China” also arises frequently in both disciplines, but in different ways. Articles in the business/management journals address management challenges and economic opportunities in China, while those in the environmental sciences journals address the role of emerging economies like China’s in climate adaptation and mitigation efforts. Such areas of topic convergence may provide fruitful avenues for future interdisciplinary research.

Figure 1: Topics are represented as dots; those associated with the business/management literature are coloured red, and those associated with the environmental sciences blue.


Marrone, M. and Linnenluecke, M.K., 2020. Interdisciplinary Research Maps: A new technique for visualizing research topics. Plos one, 15(11), p.e0242283.

Tracking trends in environmental accounting research using machine learning

Summary of “Trends in environmental accounting research within and outside of the accounting discipline” (Marrone, Linnenluecke, Richardson & Smith, 2020)

Environmental sustainability concerns us all. Its importance is reflected in the exponential rise in research into accounting for environmental degradation since the adoption of the Kyoto Protocol in 1997.

To identify those areas of environmental accounting research which might benefit from a greater exchange of ideas between accounting and non-accounting disciplines, Marrone et al. (2020) utilised a literature review powered by machine learning. The review tracks the emergence of topics and trends, both within and outside of the discipline of accounting.

The review process

  1. A range of keywords were applied to a Scopus database search, which returned 2,502 records. Eighty-three percent of these were published in non-accounting journals.
  2. The TAGME entity linking system was used to extract topics from the titles and abstracts of these journal papers.
  3. A burst algorithm was then applied to publications over time to identify ‘hot’ topics. ‘Bursts’ indicate new developments related to a topic or a sudden surge of publications in a topic area.
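To make the idea of a burst concrete, the sketch below flags years in which a topic's publication count jumps well above its recent baseline. This is a deliberately simplified threshold heuristic, not the burst algorithm used in the paper (which this summary does not specify); the function name `find_bursts` and the `window` and `threshold` parameters are hypothetical.

```python
from statistics import mean, pstdev

def find_bursts(counts_by_year, window=3, threshold=2.0):
    """Return years whose count exceeds the trailing-window mean
    by more than `threshold` standard deviations."""
    years = sorted(counts_by_year)
    bursts = []
    for i, year in enumerate(years):
        if i < window:
            continue  # not enough history to form a baseline yet
        history = [counts_by_year[y] for y in years[i - window:i]]
        baseline = mean(history)
        # Fall back to 1.0 so a perfectly flat history does not flag every tiny increase.
        spread = pstdev(history) or 1.0
        if counts_by_year[year] > baseline + threshold * spread:
            bursts.append(year)
    return bursts
```

Note that only the first year of a sustained surge is flagged here, because later years are compared against a baseline that already includes the surge. Dedicated burst-detection methods model the duration of a burst as well as its onset.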

The findings

The review compared two bodies of literature. The figures below show trending topics over the past 50 years, in accounting and non-accounting journals. Those that were trending in 2019 are highlighted in red.

Comparison shows that research in the field of accounting has recently focused on the connection of environmental accounting with corporate social responsibility (CSR) and stakeholder theory. But outside of the accounting journals, more specialised sustainability topics are explored. These include the shift to a low-carbon or circular economy, the attainment of sustainability goals (SDGs) and newer concepts such as accounting for ecosystem services.

Figure 1 Timeline of accounting journal bursts.

Figure 2 Timeline of non-accounting journal bursts.

One reason for the difference between the bodies of literature could be that accounting research is turning away from practical, interdisciplinary issues in favour of building on the theoretical foundations of the discipline. An increased exchange of ideas across disciplines could both strengthen the theoretical basis of research published in non-accounting journals and increase the range of emerging sustainability topics explored in accounting journals.

In future, the method of this review may be developed further, allowing for a more fine-grained analysis which produces updates as new issues of journals are released. Additionally, a means of quantitatively examining topic exchanges and cross citations could enable more accurate comparison of the relative relationships between literature reviews. Further improvements in natural language processing may also facilitate an increase in the quality of the automated coding conducted by entity linking tools.

Intelligent Machines: Will AI replace academic researchers?

According to a recent article in Forbes magazine, we can expect to see most people collaborating on their work with an AI (Artificial Intelligence) counterpart by 2030. Will these ‘counterparts’ simply enhance what we do, or will they ultimately come to replace human beings altogether?


Transhumanists examine the possibilities presented by the interactions between people and technology. They tend to welcome technological developments as means of enhancing our intellectual, physical, and psychological capacities (Bostrom, 2003; figure 1).

Figure 1. Adapted from Bostrom (2003, p.12)

An ethical minefield?

It’s not all good news, however. The concept of merging people with technology throws up a host of ethical considerations.

  • Who will access technological advantages? Those who can afford to pay for AI implants, wearable technology and tools? Will a group of enhanced humans emerge at the expense of everyone else?
  • If a person performs a criminal act, while coupled with an AI element, who bears responsibility? Should it be the person linked to the AI, the AI itself, or the person who developed or programmed the AI?

How close are we to developing human-like intelligence?

The idea of people integrating with computers is far from new. The famous computer pioneer J.C.R. Licklider coined the term “man-computer symbiosis” in the 1960s. Licklider predicted that computing technology would eventually advance to the extent that you would be able “to think in interaction with a computer in the same way that you think with a colleague whose competence supplements your own” (Licklider, quoted in Lesh et al., 2004). Almost 60 years on, are we there yet?

The sort of technology that could result in human-like intelligence, and in the sorts of ethical conundrums outlined above, is still some way off. Yet there are some very interesting possibilities available to us. Researchers, for example, may approach a form of symbiosis with computers. AI-based search tools are being developed that can help us navigate the broad swathes of literature available to us, speeding up our engagement with information (Extance, 2018). This is where we believe ResGap comes in.

Harnessing the potential of AI

Technology such as that available through ResGap can enhance your research performance, by allowing you to quickly find research gaps, understand how your research field has evolved over time, and identify hot and cold topics. By letting ResGap do the filtering for you, pointing you to where you will most productively direct your attention, you are able to engage with an astonishing breadth of information.

The kind of technology offered by ResGap doesn’t replace you as a researcher. You can work with ResGap, but not yet in Licklider’s “man-computer symbiosis”. Rather, you are still in the driving seat, harnessing the power of an extremely useful tool. You won’t be able to access the whole space available to posthumans in Bostrom’s model (above: The Space of Possible Modes of Being), but you’ll certainly be pushing at the envelope of what is accessible to humans. It’s not a bad place to start.

Bostrom, N. (2003) Introduction to transhumanism. Presented at the Intensive Seminar on Transhumanism, Yale University, 26 June 2003. Available at:

Extance, A. (2018) How AI technology can tame the scientific literature. Nature. Available at:

Lesh, N., Marks, J., Rich, C., Sidner, C.L. (2004) Man-Computer Symbiosis Revisited: Achieving Natural Communication and Collaboration with Computers. IEICE Transactions on Information and Systems.

A tale of two research fields: from Team Mental Model theory to research model creation

Team Mental Model Theory

Team mental models are organized mental representations of the team’s relevant environment, shared across team members (Klimoski & Mohammed, 1994). They emerge because individual team members tend to categorise elements of their environments, such as tasks, situations, response patterns or relationships. These categorisations then become shared over time, thanks to communication within the team.

The extent to which mental categorisations are shared across team members can vary widely. They may be highly consistent with one another or completely incongruent. Importantly, when researchers talk about shared mental models, they do not suggest that an identical set of categorisations is held by every member of the team. Rather, it is suggested that there exists some degree of consistency or convergence between individuals’ mental models (Kang et al., 2006; Rentsch et al., 2008).

Teams with higher levels of convergence of mental models can perform better. A shared mental model enables team members to anticipate the needs and actions of others in the team (Cannon-Bowers et al., 1993). Thus, the team can coordinate its actions, enhancing its decision-making capacity (Stout et al., 1999).

From TMM to research models

If we conceptualise research communities as teams, we can begin to see how TMM theory applies to researchers seeking to identify what is researched by individual members of the team. A degree of sharedness will exist across the mental models held by researchers in each field. They are likely to use the same terms in reference to the same concepts. However, mental models which are shared within one field may diverge significantly from those in a neighbouring field. An understanding of the ways in which different groups think about a topic has vast potential in the pursuit of interdisciplinary research.

Our map, generated by our tool, helps illustrate the idea that mental models lie on a continuum rather than forming a dichotomy (from topics very infrequently discussed in one literature to seemingly identical mentions in both literatures). As an example, we could study how the term “techno-stress” is studied by Psychology and Information Systems researchers. Our tool would help uncover the topics that are frequently discussed in both fields, as well as topics that are frequently discussed in one field but infrequently discussed in the other.

By mapping what is discussed among researchers in separate fields, we may increase the effectiveness of our research. It is possible to develop a systemic understanding of who is doing what, increasing the coordination of our actions with those of other members of the research community. We become able to anticipate the needs and actions of our fellow researchers.

The unique contribution of our tool lies in its use of entity linking. Because it does not rely on keyword identification alone, it identifies discussion of concepts, not just the usage of similar terms. This helps overcome the challenge presented by differences in the use of terminology. Different terms may be used in different fields to describe the same concept, or the same terms may be used in different fields to describe completely different concepts. However, our tool can cut through the confusion, allowing us to see where fields overlap, and where relevant and valuable research may be directed.


Cannon-Bowers, J. A., Salas, E., & Converse, S. (1993). Shared mental models in expert team decision making. In N. J. Castellan, Jr. (Ed.), Individual and group decision making: Current issues (pp. 221-246). Hillsdale, NJ, US: Lawrence Erlbaum Associates, Inc.

Kang, H.-R., Yang, H.-D., & Rowley, C. (2006). Factors in team effectiveness: Cognitive and demographic similarities of software development team members. Human Relations, 59(12), 1681–1710.

Klimoski, R., & Mohammed, S. (1994). Team Mental Model: Construct or Metaphor? Journal of Management, 20(2), 403–437.

Marrone, M, Hammerle, M (2018) Smart Cities: A Review and Analysis of Stakeholders’ Literature. Bus Inf Syst Eng 60: 197.

Rentsch, J. R., Small, E. E., & Hanges, P. J. (2008). Cognitions in organizations and teams: What is the meaning of cognitive similarity? In D. B. Smith (Ed.), LEA’s organization and management series. The people make the place: Dynamic linkages between individuals and organizations (pp. 127-155). New York, NY: Taylor & Francis Group/Lawrence Erlbaum Associates.

Stout, R. J., Cannon-Bowers, J. A., Salas, E., & Milanovich, D. M. (1999). Planning, Shared Mental Models, and Coordinated Performance: An Empirical Link Is Established. Human Factors, 41(1), 61–71.

Finding research gaps in IT Service Management (and in ITIL research)

Summary of “Relevant Research Areas in IT Service Management: An Examination of Academic and Practitioner Literatures” (Marrone & Hammerle, 2017)

IT Service Management is a field of IS research which is widely used and popular in practice (Iden & Eikebrokk, 2013; Marrone & Kolbe, 2011). This study compares business and academic literature in the field. Since the behaviour of practitioners is influenced by the (business) literature they read (Carroll & McCombs, 2003), the comparison uncovers aspects of professional behaviour or practices not explored by academic research, termed “practice-oriented research gaps” (Müller-Bloch and Kranz, 2015).

Academic literature used in the study comprised abstracts of papers from selected information systems publications, identified by database searches using keywords related to ITSM. Practitioner literature was identified through searches of selected popular press and specialist IS publications. Both sets of literature were published during a time span of 16 years (from 1 January 2000 to 1 May 2016). A semantic entity annotator (the technology used by our tool) was employed to identify topics in the two groups of identified literature, then keyword analysis was applied to identify statistically significant topics.
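For readers unfamiliar with keyword analysis, one common way to test whether a topic is used significantly more in one corpus than another is the log-likelihood (G2) keyness statistic. The sketch below assumes that statistic for illustration; this summary does not state which test the study used, and the function name `log_likelihood` and its parameters are hypothetical.

```python
import math

def log_likelihood(count_a, total_a, count_b, total_b):
    """Log-likelihood (G2) keyness of one topic across two corpora.

    count_a, count_b: occurrences of the topic in corpus A and corpus B.
    total_a, total_b: total topic occurrences in each corpus.
    """
    combined_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2
```

With one degree of freedom, a G2 value above 3.84 corresponds to p < 0.05, so a topic scoring above that threshold is used at a significantly different rate in the two literatures.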

For each set of literature, eight of the 10 most frequently used topics also appeared regularly in the other set, suggesting that academics and practitioners view many of the same topics as highly important. However, several of the most frequently used topics differed, suggesting a degree of misalignment. Practitioner literature tended to focus on topics associated with the physical implementation and application of ITSM, while academic literature highlighted the idea of implementation.

Research gaps identified

The study uncovered four broad practice-oriented research gaps. For each of these, three examples of possible research questions are provided, using a taxonomy proposed by Jarvinen (2000). They are categorised as conceptual-analytical, theory creating or testing, and artefact building or evaluating.

1.    Combining Frameworks

The combination of different frameworks by organisations, to support the use of ITSM, is frequently discussed in practitioner literature. In contrast, most academic papers only consider the use of one framework at a time. Some papers do present evidence that firms combine two frameworks: CoBIT (Control Objectives for Information and Related Technologies) and Information Technology Infrastructure Library (Cater-Steel, Tan, & Toleman, 2006; de Espindola, Luciano, & Audy, 2009; Lapão, 2011; Vogt, Küller, Hertweck, & Hales, 2011). However, practitioners discuss a wide range of frameworks that organizations use simultaneously.

Potential research questions:

  • How does co-implementing frameworks help strengthen areas that a single framework does not cover, such as business-IT alignment, knowledge management, organizational learning, outsourcing, and competitive advantage? (Conceptual analytical)
  • What theory can best reflect why different organizations consider strategic and technical factors when choosing to co-implement ITSM frameworks? (Theory creating)
  • Can one develop a model that indicates the most appropriate mix of ITSM frameworks based on an organization’s specific requirements? (Artefacts building)

2.    Infrastructure

Little academic research has addressed how improvements in infrastructure help organizations achieve beneficial outcomes of implementing ITSM. Further, research has not described the impact that implementing ITSM has on an organization’s infrastructure or cloud computing.

Potential research questions:

  • Which ITSM processes, if any, contribute to the effective management of cloud services? (Conceptual analytical)
  • As organizations increase their reliance on cloud service providers, what is the impact on the benefits that they receive when implementing ITSM? (Artefacts evaluation)
  • How do ITSM frameworks help organizations implement cloud services? (Conceptual analytical)

3.    Software and gamification

The practitioner literature often warns that IT departments implementing ITSM may prioritise software tools over processes, to their disadvantage. It further proposes that gamified tools may offer significant benefits in implementation. However, the advantages and difficulties associated with relying on tools when implementing ITSM are not discussed in the academic literature, nor are the effects of gamification examined.

Potential research questions:

  • How can an organization best use tools to support the implementation of ITSM? (Conceptual analytical)
  • Which kind of model could explain the benefits received due to the use of ITSM tools? (Theory creating)
  • How effectively does gamification help train staff in the ITSM processes—specifically as it concerns content retention and engagement and staff retention? (Artefacts evaluating)

4.    Regulation compliance

Practitioner literature suggests that several organizations implemented ITSM motivated by the need to comply with regulations, such as the Sarbanes-Oxley Act (SOX) introduced in 2002. The impact of regulation on the implementation of ITSM is less evident in the academic literature.

Potential research questions:

  • What is the relationship between the types of regulations introduced and the ITSM organizations implement? (Conceptual analytical)
  • Which kind of model could explain how organizations implement ITSM due to the introduction of different regulations compared to other rationales for adoption? (Theory creating)
  • How effectively did SOX encourage organizations to pay closer attention to their IT governance? (Artefacts evaluating)


Carroll, C. E., & McCombs, M. (2003). Agenda-setting effects of business news on the public’s images and opinions about major corporations. Corporate Reputation Review, 6(1), 36-46.

Cater-Steel, A., Tan, W.-G., & Toleman, M. (2006). Challenge of adopting multiple process improvement frameworks. In Proceedings of the European Conference on Information Systems.

de Espindola, R. S., Luciano, E. M., & Audy, J. L. N. (2009). An overview of the adoption of IT governance models and software process quality instruments at Brazil—preliminary results of a survey. In Proceedings of the 42nd Hawaii International Conference on System Sciences.

Iden, J., & Eikebrokk, T. R. (2013). Implementing IT service management: A systematic literature review. International Journal of Information Management, 33(3), 512-523.

Jarvinen, P. (2000). Research questions guiding selection of an appropriate research method. In Proceedings of the European Conference on Information Systems.

Lapão, L. V. (2011). Organizational challenges and barriers to implementing IT governance in a hospital. Electronic Journal of Information Systems Evaluation, 14(1), 37-45.

Marrone, M., & Kolbe, L. M. (2011). Uncovering ITIL claims: IT executives’ perception on benefits and Business-IT alignment. Information Systems and E-Business Management, 9(3), 363-380.

Marrone, M., & Hammerle, M. (2017). Relevant Research Areas in IT Service Management: An Examination of Academic and Practitioner Literatures. Communications of the Association for Information Systems, 41(1), 517-543.

Müller-Bloch, C., & Kranz, J. (2015). A framework for rigorously identifying research gaps in qualitative literature reviews. In Proceedings of the International Conference on Information Systems.

Vogt, M., Küller, P., Hertweck, D., & Hales, K. (2011). Adapting IT governance frameworks using domain specific requirements methods: Examples from small & medium enterprises and emergency management. In Proceedings of the Americas Conference on Information Systems.