Skills needed to do a PhD: recent research using text mining and machine learning

The number of people with PhDs is growing worldwide. Doing a PhD is a significant undertaking, and dropping out of one can result in serious financial loss and psychological harm, such as loss of confidence.

It is not surprising that a lot of research exists on the doctoral experience, with the aim of improving it and finding out what it takes to finish.

A body of research also exists that looks at the outcomes of a PhD, i.e. what do people gain from a PhD? This question is gaining importance because there seems to be an oversupply of PhD graduates for academia, which means PhD holders need to seek jobs elsewhere, i.e. in the corporate or public sectors.

We analysed PhD requirements to find out what PhD students need in terms of skills, attributes and qualifications.

We analysed the selection criteria for PhD candidates on a platform that advertises PhD programs as job ads. Our analysis of thousands of these ads revealed exactly what types of skills different countries and disciplines require. See the infographic below for a quick summary of the findings and implications of this research:

This study draws on the data source of PhD role advertisements (aka ‘PhD ads’) to identify what skills and/or other requirements doctoral programs seek before PhD admission. We analysed the selection criteria of 13,562 PhD ads posted in 2016-2019 on EURAXESS – Researchers in Motion, a pan-European initiative by the European Commission.

We developed a taxonomy based on the EURODOC ‘Transferable Skills for Early-Career Researchers’ framework and extracted the attributes present in each advertisement. To do this we employed text mining and machine learning approaches. We then created an updated taxonomy using a data-derived dictionary.
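To make the idea of dictionary-based extraction concrete, here is a minimal sketch in Python. It is not the study's pipeline: the `TAXONOMY` categories, the phrases in them, and the sample ads are all hypothetical, and simple substring matching stands in for the text mining and machine learning steps described above.

```python
from collections import Counter

# Hypothetical mini-taxonomy: category -> indicative phrases.
# These entries are illustrative assumptions, not the study's dictionary.
TAXONOMY = {
    "communication": ["written communication", "presentation skills", "fluent in english"],
    "research": ["data analysis", "experimental design", "statistics"],
    "interpersonal": ["teamwork", "team player"],
}


def tag_ad(text):
    """Return the set of taxonomy categories whose phrases occur in the ad text."""
    lowered = text.lower()
    return {
        category
        for category, phrases in TAXONOMY.items()
        if any(phrase in lowered for phrase in phrases)
    }


def category_counts(ads):
    """Count how many ads mention each category at least once."""
    counts = Counter()
    for ad in ads:
        counts.update(tag_ad(ad))
    return counts


# Made-up example ads, for illustration only.
ads = [
    "Applicants need strong data analysis skills and must be fluent in English.",
    "We seek a team player with experience in experimental design.",
]

for category, n in sorted(category_counts(ads).items()):
    print(category, n)
```

Counting ads per category, rather than phrase occurrences, keeps one long ad from dominating the totals. Grouping the same counts by country or discipline would give the kind of breakdown the dashboard presents.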

You may use the interactive dashboard below to search for details on PhD ads in any of the 50+ countries and any of the 30+ disciplines represented in the data sample (2016-2021).

Dashboard with data up to 2021

See the full paper for details on the methodology and the research study:

Note: The paper is based on 2016-2019 data only and the sample for this time period can be accessed here.

Read The Conversation article on this research.

Read the Campus Morning Mail post on this research.

Tracking trends in environmental accounting research using machine learning

Summary of “Trends in environmental accounting research within and outside of the accounting discipline” (Marrone, Linnenluecke, Richardson & Smith, 2020)

Environmental sustainability concerns us all. Its importance is reflected in the exponential rise in the profile of research into accounting for environmental degradation, which has taken place since the adoption of the Kyoto Protocol in 1997.

To identify those areas of environmental accounting research which might benefit from a greater exchange of ideas between accounting and non-accounting disciplines, Marrone et al. (2020) utilised a literature review powered by machine learning. The review tracks the emergence of topics and trends, both within and outside of the discipline of accounting.

The review process

  1. A range of keywords were applied to a Scopus database search, which returned 2,502 records. Eighty-three percent of these were published in non-accounting journals.
  2. The TAGME Entity linking system was used to extract topics within the titles and abstracts of these journal papers.
  3. A burst algorithm was then applied. This identified ‘hot’ topics by looking at publications over time. ‘Bursts’ indicate new developments related to a topic or a sudden surge of publications in a topic area.
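The third step can be illustrated with a deliberately simplified sketch. The summary does not say which burst algorithm the authors used, so the code below is an assumption: it flags a year as a burst when a topic's share of all papers rises well above its recent baseline. The function name, the parameters (`window`, `k`, `min_jump`) and the sample counts are all hypothetical.

```python
from statistics import mean, pstdev


def find_bursts(counts_by_year, totals_by_year, window=3, k=2.0, min_jump=0.01):
    """Flag years where a topic's share of papers jumps above its recent baseline.

    A simplified threshold rule, not the algorithm used in the paper:
    a year is a burst if its share exceeds the mean of the previous
    `window` years by at least max(k * standard deviation, min_jump).
    """
    years = sorted(counts_by_year)
    shares = [counts_by_year[y] / totals_by_year[y] for y in years]
    bursts = []
    for i in range(window, len(years)):
        baseline = shares[i - window:i]
        threshold = mean(baseline) + max(k * pstdev(baseline), min_jump)
        if shares[i] > threshold:
            bursts.append(years[i])
    return bursts


# Made-up counts of papers mentioning one topic, and all papers, per year.
topic_counts = {2014: 2, 2015: 2, 2016: 2, 2017: 10, 2018: 3}
all_counts = {year: 100 for year in topic_counts}

print(find_bursts(topic_counts, all_counts))
```

Working with shares rather than raw counts means a topic is not flagged simply because the whole literature grew in a given year.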

The findings

The review compared two bodies of literature. The figures below show trending topics over the past 50 years, in accounting and non-accounting journals. Those that were trending in 2019 are highlighted in red.

Comparison shows that research in the field of accounting has recently focused on the connection of environmental accounting with corporate social responsibility (CSR) and stakeholder theory. But outside of the accounting journals, more specialised sustainability topics are explored. These include the shift to a low-carbon or circular economy, the attainment of sustainability goals (SDGs) and newer concepts such as accounting for ecosystem services.

Figure 1 Timeline of accounting journal bursts.

Figure 2 Timeline of non-accounting journal bursts.

One reason for the difference between the bodies of literature could be that accounting research is turning away from practical, interdisciplinary issues in favour of building on the theoretical foundations of the discipline. An increased exchange of ideas across disciplines could both strengthen the theoretical basis of research published in non-accounting journals and increase the range of emerging sustainability topics explored in accounting journals.

In future, the method of this review may be developed further, allowing for a more fine-grained analysis which produces updates as new issues of journals are released. Additionally, a means of quantitatively examining topic exchanges and cross citations could enable more accurate comparison of the relative relationships between literature reviews. Further improvements in natural language processing may also facilitate an increase in the quality of the automated coding conducted by entity linking tools.

Blowing Hot and Cold: What can topic dynamics tell us about research gaps?

What are “hot” and “cold” topics and how do we identify them?

The amount of scientific interest shown in different topics varies over time. Those generating increasing levels of interest are “hot” topics, while topics with falling levels of interest are “cold” (Antons, 2016).

The level of scientific interest in a topic is reflected by its prevalence in published literature, so analysis of the occurrence of topics in the literature over time can provide an indication of topic dynamics (Griffiths & Steyvers, 2004).

Entity linking is an incredibly useful tool for identifying topic occurrence in a body of literature. By taking account of the context in which a concept is mentioned, entity linking can pick up trends which would be missed by analyses based on keywords alone.
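One simple way to turn this into a label is to fit a straight line to a topic's share of the literature over time and look at the sign of the slope. The sketch below assumes that approach. The `tolerance` cut-off and the sample shares are made up, and the cited studies may define hot and cold topics differently.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def classify_topic(share_by_year, tolerance=0.001):
    """Label a topic 'hot', 'cold' or 'stable' from the trend in its yearly share.

    `tolerance` is an assumed cut-off on the yearly change in share.
    """
    years = sorted(share_by_year)
    trend = slope(years, [share_by_year[y] for y in years])
    if trend > tolerance:
        return "hot"
    if trend < -tolerance:
        return "cold"
    return "stable"


# Made-up shares of papers mentioning each topic, per year.
rising = {2015: 0.01, 2016: 0.02, 2017: 0.04, 2018: 0.05}
falling = {2015: 0.05, 2016: 0.04, 2017: 0.02, 2018: 0.01}

print(classify_topic(rising), classify_topic(falling))
```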

Why should researchers care about topic dynamics?

Information about topic dynamics can give us an insight into the ways in which scientific interest in different topics has waxed and waned in the past. This can be useful in historical research. The topic landscape can also be mapped for different journals, allowing researchers to select the most appropriate journals to which they should submit their papers (Lee, 2018).

Further, we can use data on topic dynamics to make predictions about future research trends. We may build predictive models, or we may simply assume past growth to be an indicator of future potential. An understanding of topic dynamics can therefore help researchers and funders to direct research efforts to areas where there is most likely to be demand.

Finding research gaps

Perhaps of greatest interest to potential researchers are hot topics associated with relatively small numbers of published articles. These represent emerging areas of scientific interest and may be the richest source of research gaps waiting to be filled. However, hot topics already associated with a large body of published work may still present opportunities to apply different or multiple levels of analysis, while cold topics may represent areas where interest can be revived (Antons, 2016).

Analysis of topic dynamics may also reveal opportunities to connect related areas of research which have previously been considered separately. Hopp et al. (2018) mapped the topic landscape of disruption research between 1975 and 2016 and discovered two increasingly disconnected subnetworks within the field. They suggest that reconnecting these areas should be a research priority.

Technological advances have resulted in the development of excellent tools for analysing the vast bodies of literature available to us. This presents researchers with the opportunity to understand the dynamics of scientific interest, allowing them to direct their work towards the areas where it may have the greatest impact.

Antons, D., Kleer, R. & Salge, T. O. (2016) Mapping the Topic Landscape of JPIM, 1984-2013: In Search of Hidden Structures and Development Trajectories. Journal of Product Innovation Management, 33(6), 726-749.

Griffiths, T. L. & Steyvers, M. (2004) Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl 1), 5228-5235.

Hopp, C., Antons, D., Kaminski, J. & Salge, T. O. (2018) The Topic Landscape of Disruption Research: A Call for Consolidation, Reconciliation, and Generalization. Journal of Product Innovation Management, 35(3), 458-487.

Lee, H. & Kang, P. (2018) Identifying core topics in technology and innovation management studies: a topic model approach. Journal of Technology Transfer, 43(5), 1291-1317.

A thousand cultures: bridging interdisciplinary divides

In his “Two Cultures” lecture, C.P. Snow (1961) spoke of a “gulf of mutual incomprehension” between scientists and scholars of the humanities, resulting from their disparate perspectives on the world and what we can hope to know about it. Since the Enlightenment, scientists have sought to unearth objective facts and present them in an unbiased manner, while scholars of the humanities have celebrated the role of human interpretation in building knowledge and understanding. In the succeeding decades, interdisciplinary or even transdisciplinary work has been called for, to solve increasingly complex problems, yet divides persist.

Interdisciplinary divides are not only in evidence between the sciences and humanities. They exist between quantitative and qualitative researchers, between practitioners and theorists, and between academic and business sectors (Kahn, 2011). Echoing the title of Deleuze and Guattari’s (1987) seminal book, “A Thousand Plateaus”, it is perhaps more accurate to speak of a thousand cultures than of two. Deleuze and Guattari used the analogy of rhizomes (modified plant stems which run underground horizontally) to describe aggregations of multiplicity as an alternative to “arboreal” conceptions of diverging roots or branches. We may consider “disciplines” in a similar way (see Figure 1). Within a rhizomatic conception of research, groups of researchers may share enough qualities to be considered a discipline, but each discipline will still be connected to the others.

Figure 1: Possible means of conceptualising disciplines. The image on the left is of a traditional, “arboreal” conception of disciplines, represented as a branching tree diagram. The image on the right is of an alternative, rhizomatic conception of disciplines, a horizontally interconnected network of researchers, represented here by numbered circles.

When we broaden our conception of cultures or disciplines, we acknowledge a wider range of types of interdisciplinary divide. Beyond the intellectual divides described by Snow (1961), structural divides emerge, including gulfs in the understanding and use of technology. There are institutional divides concerning conventions in career progression and journal publication, differences in academic calendars, and differences in working practices between academia and business (Kahn, 2011). Further, there is variation in conventions and language across different geographic locations.

How entity linkers can help us cross the divides

Different terms may be used in different disciplines to describe the same concept, or the same term may refer to different concepts depending on its disciplinary context. Both phenomena were encountered in a recent review of literature on the concept of “green prescriptions”. In New Zealand, the term “green prescription” is used to describe the prescribing of physical activity as therapy (e.g. Anderson, 2015). In Europe, the term has been similarly adopted to describe the prescribing of activities carried out in natural settings as therapy (Van den Berg, 2017). This practice is also referred to in Europe as performing “nature-based” (e.g. Barton, 2015) or “ecotherapy” (Bibby, 2013) interventions. In Germany, however, “green prescriptions” refer to a type of medical prescription, usually for complementary or supplementary medication (e.g. Heyde, 2014).

As the purpose of the literature review was to explore the prescribing of activities in natural environments, papers describing the New Zealand concept of a “green prescription” and papers describing the European concepts of green prescriptions and nature-based or ecotherapy interventions were relevant, while papers on the German “green prescription” were not. An entity linker would enhance a literature review of this kind, by expediting the uncovering of associated terms such as “nature-based intervention” while discarding unrelated papers, such as those using the term “green prescription” to describe the prescribing of supplementary medication.
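The toy sketch below shows the kind of disambiguation described here. It is not a real entity linker: the surface forms, entity identifiers, cue words and sample abstracts are all hypothetical, and counting cue-word overlap stands in for the context models that actual tools use.

```python
import re

# Hypothetical entities with context cue words (illustrative assumptions).
ENTITY_CUES = {
    "activity_in_nature": {"nature", "outdoor", "physical", "activity", "exercise", "walking"},
    "supplementary_medication": {"medication", "pharmacy", "reimbursed", "supplement", "acid"},
}

# Surface forms and the entities each one could refer to.
SURFACE_FORMS = {
    "green prescription": ["activity_in_nature", "supplementary_medication"],
    "nature-based intervention": ["activity_in_nature"],
    "ecotherapy": ["activity_in_nature"],
}


def link_entities(text):
    """Return the entities mentioned in `text`, resolving ambiguous terms by context."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z]+", lowered))
    linked = set()
    for form, candidates in SURFACE_FORMS.items():
        if form in lowered:
            # Pick the candidate sharing the most cue words with the text.
            # On a tie, max() keeps the first candidate listed.
            best = max(candidates, key=lambda c: len(ENTITY_CUES[c] & tokens))
            linked.add(best)
    return linked


# Made-up abstracts, for illustration only.
abstracts = [
    "Green prescriptions encourage physical activity such as walking.",
    "Folic acid on green prescription: the medication is reimbursed by the pharmacy.",
    "Ecotherapy sessions were held in a community garden.",
]

relevant = [a for a in abstracts if "activity_in_nature" in link_entities(a)]
print(len(relevant))
```

Filtering on the linked entity rather than on the search term keeps the first and third abstracts and drops the second, even though the second contains the phrase “green prescription” and the third does not.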

Entity linkers have great potential for aligning the different ways in which a term is discussed across disciplines, by removing ambiguity. By harnessing this capacity to search for references to the entity in question, discarding references to unrelated entities and including references which use different terminology, we may begin to bridge the divides between a thousand cultures or disciplines.



Anderson, Y. C., Taylor, G. M., Grant, C. C., Fulton, R. B. & Hofman, P. L. (2015) The green prescription active families programme in Taranaki, New Zealand 2007–2009: Did it reach children in need? Journal of Primary Health Care, 7(3), 192-197.

Barton, J., Sandercock, G., Pretty, J. & Wood, C. (2015) The effect of playground- and nature-based playtime interventions on physical activity and self-esteem in UK school children. International Journal of Environmental Health Research, 25(2), 196-206.

Bibby, P., Wild, A. & Bodell, S. (2013) The benefits of ecotherapy interventions on mental health conditions. British Journal of Occupational Therapy, 76, 47-47.

Deleuze, G. & Guattari, F. (1987) A Thousand Plateaus: Capitalism and Schizophrenia. (2nd edn.) University of Minnesota Press.

Heyde, I., Böschen, D., Dicheva, S., Hinrichs, A. & Peters, H. (2014) Folic acid on green prescription: When adjunctive therapy is reimbursed. Deutsche Apotheker Zeitung, 154(19), 36-38.

Kahn, J. (2011) The two (institutional) cultures: a consideration of structural barriers to interdisciplinarity. Perspectives in Biology and Medicine, 54(3), 399-408.

Snow, C.P. (1961) The Two Cultures and The Scientific Revolution. New York: Cambridge University Press

Van den Berg, A. E. (2017) From green space to green prescriptions: Challenges and opportunities for research and practice. Frontiers in Psychology, 8.