A seminar with James Mickens

The Assembly Student Fellows of 2021 joined Berkman staff members Zenzele Best, Hilary Ross and Sarah Newman for this final seminar with Professor Mickens on 8 April. This summary was prepared by David Stansbury.

Combating disinformation through external verification

Everyone agrees that false information, sometimes dubbed “fake news”, is a problem. But there is much less agreement on what to do about it. Relying on users as verifiers, professional fact checkers, and algorithmic solutions have all been trialled.

Harvard Professor James Mickens has a new idea. What if content published online carried an externally verifiable, tamper-resistant provenance history that both humans and machines could use and understand?

In 2019, a photo of a field strewn with trash was shared widely on social media: 12,000 times in twelve hours. It purported to show the aftermath of a climate strike protest in Australia, and fuelled a stream of denouncements of the protestors’ hypocrisy. The problem was that the photo had nothing to do with an event in Australia. It was actually a photo of an event in the UK for fans of marijuana.

This event highlights the shortcomings of our current methods for addressing false information online. For human users, unless they attended one of the events in question, it is practically impossible to know that an otherwise real image has been attached to a completely unrelated story.

Although fact checkers, who are often overwhelmed with the volume of false information they need to handle, were able to catch and debunk this particular story, they were only able to do so after it had already got considerable traction. The fact-checked version was less widely shared than the original false claims, as is often the case. 

As for the content moderation algorithms? They missed that this story had a falsely attributed image attached to it.

Professor Mickens asks: what if each part of the story had had its own digital history that could be inspected by users, fact checkers and algorithms? For example, when a picture is first uploaded to a computer, metadata are already embedded in the image. These include the type of camera used, the date and time, and settings such as shutter speed and focal length. These existing data could be supplemented by additional items such as a unique device number, much like the ones our phones and computers already have.

Crucially, an “edit history” could also be added. This history would show every time the image was altered, as well as every time it was used and how. If the image were put into a file alongside some text, such as a news story, that information would be stored with the image. When the combined file was uploaded to the internet, all of the metadata and edit history would be uploaded with it. Web browsers would need only minor changes to make this edit history viewable. This would provide key information about provenance at a glance, and its absence might encourage healthy scepticism about the veracity of the content in question.
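One way to picture such an edit history is as a chain of records in which each entry includes a hash of the entry before it, so that altering an earlier record breaks every later link. The sketch below is only an illustration under that assumption and is not the design presented in the seminar. The field names, the device identifiers and the helper functions (record_digest, append_entry, history_is_intact) are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous" value for the first entry


def record_digest(record: dict) -> str:
    """Hash a record deterministically (sorted keys give a stable encoding)."""
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(history: list, action: str, content: bytes, device_id: str) -> None:
    """Append one entry that commits to the content and to the previous entry."""
    entry = {
        "action": action,                # hypothetical field: what was done
        "device_id": device_id,          # hypothetical field: which device did it
        "content_hash": hashlib.sha256(content).hexdigest(),
        "previous": history[-1]["digest"] if history else GENESIS,
    }
    entry["digest"] = record_digest(entry)  # computed before "digest" is added
    history.append(entry)


def history_is_intact(history: list) -> bool:
    """Recompute every link; any altered or reordered entry makes this False."""
    previous = GENESIS
    for entry in history:
        body = {k: v for k, v in entry.items() if k != "digest"}
        if entry["previous"] != previous or record_digest(body) != entry["digest"]:
            return False
        previous = entry["digest"]
    return True


history = []
append_entry(history, "captured", b"original pixels", "camera-001")
append_entry(history, "cropped", b"cropped pixels", "laptop-042")
append_entry(history, "embedded in article", b"cropped pixels", "newsroom-cms")
```

Calling history_is_intact(history) returns True for the untouched chain and False once any earlier entry is changed. A hash chain on its own only makes tampering detectable; someone who rewrote the whole chain from the start would still pass this check, which is why the records would also need to be signed.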

We already have the basic technological building blocks needed to assemble this externally verifiable pipeline. Version control and viewable edit histories are already widely used in software development, through Git and various commercial content management systems. Digital signatures that verify the integrity of lower-level software are already used in the boot processes of technologies from firms like Apple and Google. So, no, this isn’t about the blockchain.
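To make the signing step concrete, here is a minimal sketch of how a device might attach a verifiable tag to a piece of content. It is a simplification and rests on a loud assumption: it uses a keyed hash (HMAC) from the Python standard library as a stand-in for a true digital signature. A real system would use asymmetric signatures, so that anyone could verify a tag without holding the device's secret. The key and the sign and verify helpers are hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-device secret. In a real deployment this would be a private
# signing key held in secure hardware, not a byte string in source code.
DEVICE_KEY = b"demo-key-for-camera-001"


def sign(content: bytes, key: bytes) -> str:
    """Produce a tag binding the content to the key (HMAC stand-in for a signature)."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(content: bytes, tag: str, key: bytes) -> bool:
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(sign(content, key), tag)


photo = b"original pixels"
tag = sign(photo, DEVICE_KEY)
```

Here verify(photo, tag, DEVICE_KEY) succeeds, while the same check fails if either the content or the key differs. That is the property the provenance pipeline relies on: a change to the content invalidates the tag.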

What’s stopping us from getting started? Practically, it is a steep hill to climb to get every device its own unique signature. Integrating software to view the verification chain into a browser is simple enough, but it needs to be designed so that a non-expert user can read it. It also needs to be not only widely available in a range of browsers, but also actually installed and used. This kind of wide-scale behaviour change and adoption is no small challenge.

There are also human rights and other concerns. Sometimes anonymity is necessary. An uneditable history may broadcast enough information about a piece of content’s creator to make them identifiable. This could put them in danger. But if you create a loophole for anonymity in these challenging cases, bad actors can make use of those same gaps in the system.

Further reading

Dis.info.dex – Disinfodex is a database of publicly available information about disinformation campaigns. It currently includes disclosures issued by major online platforms and accompanying reports from independent open source investigators (https://disinfodex.org/).