This article is part 6 of a series reviewing selected papers and associated commentary from Altmetric’s list of the top 100 most discussed and shared research and commentary of 2020.
As with the article that was the subject of part 3 of this series, the #33 article1 in Altmetric’s top 100 list for 2020 has been so widely discussed and shared not because of the merits of the article itself, but because of the criticism it has attracted.
Titled “Tracking historical changes in trustworthiness using machine learning analyses of facial cues in paintings,” the article was published in the prestigious high impact factor2 journal Nature Communications. It reports on the design of an algorithm that automatically generates trustworthiness evaluations from the facial action units (smile, eyebrows, etc.) of European portraits in large historical databases. The authors conclude that:
Our results show that trustworthiness in portraits increased over the period 1500–2000 paralleling the decline of interpersonal violence and the rise of democratic values observed in Western Europe. Further analyses suggest that this rise of trustworthiness displays is associated with increased living standards.
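To make the paper’s core idea concrete: it describes a model that maps the intensities of facial action units (AUs, from the Facial Action Coding System) in a portrait to a single perceived-trustworthiness score. The following is a deliberately simplified sketch of that kind of mapping, not the authors’ actual model — the paper trains on human trustworthiness ratings, and the weights and the `trustworthiness_score` function here are hypothetical, chosen only to illustrate the AU-to-score pipeline (and, implicitly, how easily bias can be baked into such weights).

```python
# Toy sketch only: NOT the published model. Hypothetical linear weights
# over a few real FACS action units (AU12 = lip corner puller, i.e. smiling;
# AU4 = brow lowerer, i.e. frowning), mapping AU intensities in [0, 1]
# to a perceived-trustworthiness score in [0, 1] around a 0.5 baseline.

AU_WEIGHTS = {
    "AU12_lip_corner_puller": 0.6,   # smiling pushes the score up
    "AU01_inner_brow_raiser": 0.3,
    "AU04_brow_lowerer": -0.5,       # frowning pushes the score down
}

def trustworthiness_score(action_units: dict) -> float:
    """Weighted sum of action-unit intensities, clipped to [0, 1]."""
    score = 0.5 + sum(AU_WEIGHTS.get(au, 0.0) * intensity
                      for au, intensity in action_units.items())
    return max(0.0, min(1.0, score))

smiling_portrait = {"AU12_lip_corner_puller": 0.8, "AU04_brow_lowerer": 0.0}
frowning_portrait = {"AU12_lip_corner_puller": 0.0, "AU04_brow_lowerer": 0.9}

print(trustworthiness_score(smiling_portrait))   # higher than baseline
print(trustworthiness_score(frowning_portrait))  # lower than baseline
```

Note that everything contentious in such a system lives in the weights and the training data behind them: if the ratings used to fit the model encode human prejudice, the “objective” score reproduces it — which is exactly the criticism the rest of this article discusses.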
However, when the article was published in September 2020, it attracted strong criticism, with people likening it to the pseudosciences phrenology and physiognomy. Phrenology involves measuring bumps on the skull to predict mental traits3. Physiognomy is the practice of assessing a person’s character or personality from their outer appearance – especially the face4.
In a post on Medium, human memory PhD researcher Rory Spanton identifies key flaws in the “Tracking historical changes in trustworthiness using machine learning analyses of facial cues in paintings” study:
- The algorithm is biased towards white people. Algorithms that use trustworthiness rankings to inform criminal-justice decisions about citizens already exist, so a biased algorithm that rates perceived trustworthiness has a huge potential for misuse.
- A muddy definition of trustworthiness creates enough ambiguity for bigots to claim support for the view that lower classes of society are actually less trustworthy. To many critics, this is physiognomy clothed in contemporary statistical methods, and sets the precedent for biased algorithms to infer dangerous conclusions about less privileged groups.
- Problems with the study’s algorithm are only compounded by many other issues with its logic. For example, humanities scholars have been quick to point out that paintings aren’t “cognitive fossils”; art is instead influenced by ever-changing cultural attitudes.
- The authors’ assertion that trustworthiness is linked to metrics of societal advancement – GDP per capita and a supposed decrease in interpersonal violence – is also a flawed interpretation.
In response to these criticisms, an editor’s note was attached to the article on 30 September 2020. It states that “Readers are alerted that this paper is subject to criticisms that are being considered by the editors. A further editorial response will follow the resolution of these issues.” This response has not yet been forthcoming.
But how was such a flawed article published in a prestigious academic journal in the first place? Rory Spanton provides this advice:
Huge flaws permeate [this] … problematic article and are visible even on a brief inspection. So why did the reviewers and editors at Nature Communications publish it?

[The article’s] … acceptance into a prestigious journal illustrates a current stereotype in academia: any research is immediately more publishable if it uses machine learning and tells a good story. AI is a captivating buzzword, and many academics and laypersons alike lap up AI research without giving thought to its true validity.
Supporting Rory Spanton’s observation, the paper was not just published by Nature Communications, but actively marketed by the journal publisher using a topical media hook. Evidence for this comes from a number of media articles that reported the research using similar headlines. For example:
- an article published in the New Zealand Herald titled “Why Meghan Markle’s face is more trustworthy than the Queen’s”
- an article published in the Daily Mail titled “Meghan Markle looks ‘more trustworthy’ than the Queen, according to face-scanning algorithm that reveals portraits show their subjects looking more dependable as living standards improve.”
The attractiveness of such a marketing hook to the media is highlighted by the mass reporting in the past few days of Oprah Winfrey’s interview with Meghan Markle and Prince Harry.
What does this mean for knowledge management?
Scientific research literature is a key evidence source in evidence-based management. However, prior to use, all scientific research evidence must be critically appraised to judge its trustworthiness, value, and relevance in a particular context.
The “Tracking historical changes in trustworthiness using machine learning analyses of facial cues in paintings” study highlights the importance of this critical appraisal. In particular, the study exposes the risks of artificial intelligence (AI) and machine learning perpetuating dangerous biases. As The Conversation reports, there is a growing body of literature on this topic.
The criticisms of this flawed study also reveal important lessons that need to be taken very seriously by researchers. Rory Spanton warns that:
Now more than ever, scientists must confront the reality that after publication, their work is woven into the fabric of culture and society. Their conclusions, well-founded or not, can change minds, promote agendas and influence policy. Even research that has been completely debunked can result in adverse consequences for people years later.

[This flawed study] … is not an anomaly. In any field, problematic work sometimes makes its way into reputable outlets. This is the inevitable result of the many factors and incentive structures that drive academics to publish. But these academics still form the last line of defence against bad science. Educating oneself about prejudice and the implications of bad research is as crucial in this defence as maintaining subject-specific knowledge. But until more researchers actively pursue these goals, we can expect more flawed conclusions and more racist algorithms.
- Safra, L., Chevallier, C., Grèzes, J., & Baumard, N. (2020). Tracking historical changes in trustworthiness using machine learning analyses of facial cues in paintings. Nature Communications, 11(1), 1–7. ↩
- The 2019 2-year impact factor for Nature Communications was 12.121, with an impact factor above 10 considered to be outstanding. ↩
- Wikipedia, CC BY-SA 3.0. ↩
- Wikipedia, CC BY-SA 3.0. ↩
Also published on Medium.