Skills needed to do a PhD: recent research using text mining and machine learning

The number of people with PhDs is growing worldwide. Doing a PhD is a significant undertaking, and dropping out can result in serious financial loss as well as psychological costs such as loss of confidence.

It is not surprising, then, that a lot of research exists on the doctoral experience, aiming to improve it and to find out what it takes to finish.

A body of research also exists that looks at the outcomes of a PhD, i.e. what do people gain from a PhD? This question is gaining importance because there seems to be an oversupply of PhD graduates for academia, which means PhD holders need to seek jobs elsewhere, i.e. in the corporate or public sectors.

We analysed PhD requirements to find out what PhD students need in terms of skills, attributes and qualifications.

We analysed the selection criteria for PhD candidates on a platform that advertises PhD programs as job ads. Our analysis of thousands of these ads revealed exactly what types of skills different countries and disciplines require. See the infographic below for a quick summary of the findings and implications of this research:

This study draws on PhD role advertisements (aka ‘PhD ads’) as a data source to identify what skills and other requirements doctoral programs seek before PhD admission. We analysed the selection criteria of 13,562 PhD ads posted in 2016-2019 on EURAXESS – Researchers in Motion, a pan-European initiative by the European Commission.

We developed a taxonomy based on the EURODOC ‘Transferable Skills for Early-Career Researchers’ framework and extracted the attributes present in each advertisement. To do this we employed text mining and machine learning approaches. We then created an updated taxonomy using a data-derived dictionary.
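To give a feel for the dictionary side of this kind of text mining, here is a minimal sketch in Python. It is not the pipeline used in the study: the taxonomy entries, the function names `tag_ad` and `category_counts`, and the sample ads are all made up for illustration, and the machine learning step is not shown.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy (illustrative only, not the study's dictionary):
# category -> phrases that signal that category in an ad.
TAXONOMY = {
    "communication": ["communication skills", "written english", "presentation"],
    "research": ["research experience", "data analysis", "statistics"],
    "qualification": ["master's degree", "msc", "bachelor"],
}

def tag_ad(text, taxonomy=TAXONOMY):
    """Return the sorted list of taxonomy categories whose phrases occur in the ad."""
    lowered = text.lower()
    found = set()
    for category, phrases in taxonomy.items():
        for phrase in phrases:
            # Word boundaries stop 'msc' from matching inside a longer token.
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                found.add(category)
                break
    return sorted(found)

def category_counts(ads, taxonomy=TAXONOMY):
    """Count in how many ads each category appears."""
    counts = Counter()
    for ad in ads:
        counts.update(tag_ad(ad, taxonomy))
    return counts

# Made-up sample ads.
ads = [
    "Applicants need an MSc in physics and strong communication skills.",
    "Research experience in statistics is required.",
]

for ad in ads:
    print(tag_ad(ad))
print(category_counts(ads))
```

Counting categories per ad in this way is what makes comparisons across countries and disciplines possible once each ad also carries that metadata.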

You may use the interactive dashboard below to search for details on PhD ads in any of the 50+ countries and any of the 30+ disciplines represented in the data sample (2016-2021).

Dashboard with data up to 2021

See the full paper for details on the methodology and the research study:

Note: The paper is based on 2016-2019 data only and the sample for this time period can be accessed here.

Read The Conversation article on this research.

Read the Campus Morning Mail post on this research.

Entity linking Systems for Literature Reviews

Let’s face it. In busy academic lives there is hardly any time for deep reading, let alone for staying up to date with everything that is published in your area in real time. Yet staying up to date with the latest knowledge and reviewing literature regularly is our bread and butter as academics.

In addition, published literature reviews help establish your expertise in a particular area. Researchers increasingly automate the coding process in literature reviews and accelerate the review process by using computer-assisted tools like Leximancer, topic modelling, Bibliometrix, R packages, NVivo, etc.

However, existing approaches for coding textual data do not account for lexical ambiguity; that is, instances in which individual words have multiple meanings.

To counter this, we developed a method to conduct rapid and comprehensive analyses of diverse literature types by using entity linking in literature reviews. We present a new literature review framework that embeds entity linking.
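The idea of resolving lexical ambiguity can be illustrated with a toy sketch. This is not the entity linking system used in the paper: the knowledge base, the entity names and the function `link_entity` are hypothetical, and the scoring is a simple word-overlap heuristic chosen for readability.

```python
import re

# Hypothetical toy knowledge base: an ambiguous word maps to candidate
# entities, each described by a set of typical context words.
KNOWLEDGE_BASE = {
    "disruption": {
        "Disruptive_innovation": {"digital", "technology", "market", "business", "innovation"},
        "Service_interruption": {"outage", "network", "power", "delay", "failure"},
    },
}

def link_entity(mention, sentence, kb=KNOWLEDGE_BASE):
    """Link a mention to the candidate entity whose context words overlap
    most with the words of the sentence. Returns None for unknown mentions."""
    candidates = kb.get(mention.lower())
    if not candidates:
        return None
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    # Ties go to the candidate listed first in the knowledge base.
    return max(candidates, key=lambda entity: len(candidates[entity] & tokens))

print(link_entity("disruption", "Digital technology causes disruption in the banking market"))
print(link_entity("disruption", "A power failure caused network disruption"))
```

The same word is coded as two different concepts depending on its context, which is the property that plain keyword coding lacks.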

See the framework step by step below:

In the same paper, we present an example where we apply the framework to review the literature on digital disruption and digital transformation.

On how to adapt the framework to your needs, see the full paper:

Marrone, M., Lemke, S., Kolbe, L.M. (2022), Entity linking Systems for Literature Reviews, Scientometrics. Forthcoming.

Trends in FinTech Research and Practice: a systematic review

Many industry sectors have experienced significant disruption in recent years through the introduction of new financial technology (or FinTech), including process automation in financial services and the adoption of cryptocurrencies. From the first telegraph cable in 1866 to blockchain in 2009, the evolution of financial technologies has always been aligned with innovations in information systems (IS).

How do FinTech and Information Systems relate to each other? Where are the crossovers, where do they intersect, where do they diverge?
These questions drove my colleagues and me to conduct a systematic literature review and to compare academic with practitioner literature.

Findings from our review show that the practitioner-oriented literature foreshadowed the rise of FinTech by extensively reporting on algorithm-based and electronic trading (2009 onwards), followed by reporting on FinTech start-ups and funding successes (2014 onwards).

The practitioner literature subsequently reported on alternative finance models, the introduction of cryptocurrencies, and risks and regulatory issues. Academic literature on FinTech began to rise from 2014 onwards, focusing initially on the development of FinTech in the aftermath of the 2007-2008 global financial crisis.

Research attention subsequently shifted to FinTech innovations (alternative finance, cryptocurrency and blockchain, machine-based methods for financial analysis and forecasting, including artificial intelligence), as well as risk and regulatory issues.

IS work on FinTech started to emerge from 2015 onwards, initially focusing on mobile payment systems and peer-to-peer lending. However, the body of work at the intersection of FinTech and IS is still small.

Changes in FinTech literature over time

Our review sheds light on several opportunities for future research, including financial inclusion, the impacts arising from COVID-19, and the emergence of new business models, such as Banking as a Service (BaaS).

Full paper reference: 

Cai, C., Marrone, M., & Linnenluecke, M. (2022). Trends in FinTech Research and Practice: Examining the Intersection with the Information Systems Field. Communications of the Association for Information Systems. https://www.researchgate.net/publication/359107231_Trends_in_FinTech_Research_and_Practice_Examining_the_Intersection_with_the_Information_Systems_Field

Tracking trends in environmental accounting research using machine learning

Summary of “Trends in environmental accounting research within and outside of the accounting discipline” (Marrone, Linnenluecke, Richardson & Smith, 2020)

Environmental sustainability concerns us all. Its importance is reflected in the exponential rise in the profile of research into accounting for environmental degradation, which has taken place since the adoption of the Kyoto Protocol in 1997.

To identify those areas of environmental accounting research which might benefit from a greater exchange of ideas between accounting and non-accounting disciplines, Marrone et al. (2020) utilised a literature review powered by machine learning. The review tracks the emergence of topics and trends, both within and outside of the discipline of accounting.

The review process

  1. A range of keywords was used to search the Scopus database, which returned 2,502 records. Eighty-three percent of these were published in non-accounting journals.
  2. The TAGME entity linking system was used to extract topics from the titles and abstracts of these journal papers.
  3. A burst algorithm was then applied. This identified ‘hot’ topics by looking at publications over time: ‘bursts’ indicate new developments related to the topic or a sudden surge of publications in a topic area.
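The burst step can be sketched with a very simple heuristic. This is not necessarily the burst algorithm used in the review: it is a plain threshold rule, and the function name `find_bursts`, the `ratio` parameter and the yearly counts are all assumptions made for illustration. A year counts as a burst when the topic's share of that year's publications is at least `ratio` times its share over the whole period.

```python
def find_bursts(topic_counts, total_counts, ratio=2.0):
    """Return the years in which a topic's share of publications is at
    least `ratio` times its overall share across all years."""
    total_topic = sum(topic_counts.values())
    total_all = sum(total_counts.values())
    if total_topic == 0 or total_all == 0:
        return []
    baseline = total_topic / total_all  # overall share of the topic
    bursts = []
    for year in sorted(total_counts):
        papers = total_counts[year]
        if papers == 0:
            continue  # no publications that year, so no share to compute
        share = topic_counts.get(year, 0) / papers
        if share >= ratio * baseline:
            bursts.append(year)
    return bursts

# Made-up yearly counts: all papers, and papers mentioning one topic.
total_counts = {2015: 100, 2016: 100, 2017: 100, 2018: 100, 2019: 100}
topic_counts = {2015: 2, 2016: 2, 2017: 2, 2018: 4, 2019: 20}

print(find_bursts(topic_counts, total_counts))
```

Normalising by the yearly total matters because overall publication volume grows over time, so raw counts alone would make almost every topic look like it is bursting in recent years.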

The findings

The review compared two bodies of literature. The figures below show trending topics over the past 50 years, in accounting and non-accounting journals. Those that were trending in 2019 are highlighted in red.

The comparison shows that research in the field of accounting has recently focused on the connection of environmental accounting with corporate social responsibility (CSR) and stakeholder theory. But outside of the accounting journals, more specialised sustainability topics are explored. These include the shift to a low-carbon or circular economy, the attainment of the Sustainable Development Goals (SDGs) and newer concepts such as accounting for ecosystem services.

Figure 1 Timeline of accounting journal bursts.

Figure 2 Timeline of non-accounting journal bursts.

One reason for the difference between the bodies of literature could be that accounting research is turning away from practical, interdisciplinary issues in favour of building on the theoretical foundations of the discipline. An increased exchange of ideas across disciplines could both strengthen the theoretical basis of research published in non-accounting journals and increase the range of emerging sustainability topics explored in accounting journals.

In future, the method of this review may be developed further, allowing for a more fine-grained analysis which produces updates as new issues of journals are released. Additionally, a means of quantitatively examining topic exchanges and cross citations could enable more accurate comparison of the relative relationships between literature reviews. Further improvements in natural language processing may also facilitate an increase in the quality of the automated coding conducted by entity linking tools.