How analogue film will be the future of digital history

The pandemic meant GitHub had to wait until July to store a 21TB snapshot of its code repositories, on special film that can last a thousand years

Earlier in July, a new initiative to preserve historical open source code began, with snapshots of the code behind Facebook and Netflix, among others, archived for posterity. The open source code of these and other GitHub repositories was successfully deposited in the GitHub Arctic Code Vault. These snapshots aim to preserve the code for future generations, historians and scientists.

The storage medium GitHub is entrusting this valuable archive to is good old-fashioned film, not dissimilar to the reels people used to load into cameras before digital camera manufacturers came along claiming SD cards were better.

The GitHub Arctic Code Vault is a data repository preserved in the Arctic World Archive (AWA). This data repository is located in a decommissioned coal mine in the Svalbard archipelago, closer to the North Pole than the Arctic Circle. The archive is stored 250 meters deep in the permafrost of an Arctic mountain. GitHub originally captured a snapshot of every active public repository on 2 February 2020.

The archive holds 6,000 of the platform's most significant repositories in perpetuity, capturing the evolution of technology and software. This collection includes the source code for the Linux and Android operating systems; the programming languages Python, Ruby, and Rust; web platforms Node, V8, React, and Angular; cryptocurrencies Bitcoin and Ethereum; AI tools TensorFlow and FastAI; and many more.

Describing why it is important to maintain such a code archive, Thomas Dohmke, vice-president of special projects at GitHub, says: “Over the past 20 years, open source software has dramatically changed our lives.” For instance, the German coronavirus track and trace app and apps for finding the status of a flight or booking a car all rely on open source code.

“Moving forward, there will be no major invention that doesn’t rely on open source software,” he said. For instance, the code that Katie Bouman and the team behind the Event Horizon Telescope used to capture the first ever picture of a black hole is based on open source software. “Some 90% of all software is dependent on open source software,” says Dohmke. “No one wants to reinvent the wheel. Developers pull in libraries from GitHub.”

From a purely practical perspective, the dependency on open source code in modern software development means that developers may find the code repository their application depends on has been removed by its maintainer. “Stuff gets lost because hard disk drives fail, or the inventor intentionally deletes the repository when it becomes a burden.” He says this recently happened when the inventor of a JavaScript library decided to delete it. Its removal broke software that had coding dependencies based on it.

“We know knowledge gets lost,” says Dohmke. “For instance, you can’t find a recipe for Roman concrete or how the Romans built their structures. The original plans for the Saturn V rocket were lost.” Today, this is happening as developers strive to invent new things, which means early versions of products are not only superseded, but also forgotten. “We didn’t care about the early Amazon pages or the first blogs. Their creators have moved on.”

From a historical perspective, he adds: “The way we do software development may become irrelevant.” Without an archive, the understanding of how software development was done in the early 21st century may be lost forever.

Dohmke says the team at GitHub has put together a manual describing software development practices and how developers collaborate. Such a manual may become more important as coding becomes more automated with the advent of AI models such as GPT-3, which show that an AI can be taught how to write software.

Due to the global pandemic, the original snapshot of GitHub could not be flown to the Arctic World Archive. Instead, GitHub worked with Piql to write 21TB of repository data to 186 reels of piqlFilm.

According to Piql, film is a photosensitive, chemically stable and secure medium with proven longevity of hundreds of years. Once the data is written to film, it cannot be altered or edited. The data is stored offline and will not be affected by an electricity shortage or exposure to electromagnetic pulses.

The code was successfully deposited in the Arctic Code Vault on 8 July 2020.

Storing for a thousand years

GitHub has worked with Piql, which has developed a way to archive data built on principles of open source and future access. The medium, called piqlFilm (digital photosensitive archival film), provides authenticity measures and supplier independence, and does not require data migration. Data stored on piqlFilm can be read back both by machines and the human eye. The manufacturer estimates that archived data will remain stable for a thousand years.

GitHub is also working with Microsoft’s Project Silica, which builds on research from the University of Southampton’s optoelectronics research centre. This makes use of recent discoveries in ultrafast laser optics to store data in quartz glass by using femtosecond lasers.
