As satellites move through space, the imagery and data readings that flow constantly from them and from probes are unique and cannot be lost. Once captured, the data is processed, converted to an archive format that can be searched by the global research community, protected, and made usable by numerous applications. And every year, existing data is reprocessed as new mathematical modelling methods are devised or new measurements are added to existing information.
That’s the mission – in data terms – for the European Space Astronomy Centre (ESAC), the one part of the European Space Agency (ESA) where the word “space” means data storage capacity. It turned to NetApp storage as a service to provide the capacity it needs.
“Here, the characteristics of the storage infrastructure aren’t like anything you encounter elsewhere,” said Rubén Alvarez, IT director for science and operations at ESA.
The ESA site in Madrid is built around this storage infrastructure: the flow of data arrives from space, virtualised and containerised servers expose it, and research centres glean new insights from it.
Putting archives into production
Currently, that storage infrastructure totals around 8PB. Capacity must be added constantly, however, because data from satellite-borne measuring equipment grows at an exponential rate.
That includes the Gaia project, which has been building 3D imagery of the Milky Way since 2013 and will account for up to 3PB of the total by 2025. Then there’s Euclid, which will start analysis of dark matter in 2024 and will produce up to 20PB by 2030.
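A back-of-envelope calculation gives a feel for what those figures imply for capacity planning. The sketch below is an illustration only: the start year, planning horizon and the assumption that Euclid dominates future growth are not ESA's numbers.

```python
# Back-of-envelope projection from the figures quoted above.
# Assumptions (not ESA's planning data): ~8PB today, Euclid adding
# up to 20PB by 2030, and a seven-year horizon.
current_pb = 8                  # approximate archive size today
euclid_pb = 20                  # Euclid's expected contribution by 2030
years = 7                       # assumed planning horizon

target_pb = current_pb + euclid_pb          # ignores other missions
cagr = (target_pb / current_pb) ** (1 / years) - 1
print(f"Implied compound growth: {cagr:.1%} per year")
```

Even under these simplified assumptions, the archive would need to grow by roughly a fifth every year, which is why capacity is treated as a constantly expanding service rather than a one-off purchase.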
By contrast, the Rosetta probe, which landed on a comet in 2014 to collect data for two years, produced only 218GB. But that poses a different challenge: transmitting its data 400 million km back to Earth, with the constraint that a reading cannot be retaken if it wasn't correctly stored the first time.
Among the peculiarities of the ESA's "library of the universe" – which makes its storage like no other – is that data goes into production not as "hot data" but straight into archives. Technically, the requirement is to marry high capacity on spinning-disk hard drives – more fragile and slower than the more expensive SSDs – with the ability to withstand intensive activity (18,000 users per month) and exceptional reliability.
New data sits alongside all the data the ESA has produced since 1999, which the global space research community uses every day. Furthermore, European best practice dictates that scientific journals must make data sources available via links.
The variety of access is made more complex because the ESA’s data comprises a vast number of files. But Alvarez rejects the idea of managing storage in tiers.
“We don’t use the public cloud, except for point requirements, because sovereignty in our own datacentre in Madrid fits with the values of a European public agency,” said Alvarez.
“That’s to say the ESA is not an IT business. Its vocation is to invest enormous resources into space research, so those available to the IT director are limited. That’s why we need storage equipment that really simplifies administration tasks.”
NetApp simplifies the job
ESAC has used NetApp arrays since 2005, with FAS filers with HDDs for the data library and AFF flash-based arrays for application storage.
“We don’t have one array per workload but a cluster that holds data for everything,” said Alvarez. “That’s the most efficient way to manage the complexity and simplify the work of the IT teams.
“From the start, we didn’t decide to settle on one vendor. We just wanted to buy the most reliable storage systems and the easiest to manage. We talked to our colleagues at NASA, they told us they use NetApp, and we did as they did.
“NetApp’s support has been unfailing. Having a vendor accompany us is important at this stage. I think we were among the first customers to pay for storage by usage. We pay for NetApp to provide a storage service that functions all the time with the capacity that we need.”
Alvarez pointed to transparent maintenance processes that have no impact on production: “Maintenance is not only physical intervention to add or replace disks, or shelves of disks. To guarantee the reliability of our data, we must regularly apply updates – for controller firmware, for array OSs. You can’t just ask satellites to stop sending data or researchers to wait to access information.”
In addition to the constant need to add capacity, technical characteristics are also subject to evolution.
“For example, most of our data is in file format because that’s how the scientific community mostly accesses it,” said Alvarez. “But we have started to see demand for object protocols and have started a slow transition in this direction.”
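In miniature, the transition Alvarez describes is a shift in access semantics, not just in protocol names. The toy sketch below (hypothetical, not ESAC's actual stack) contrasts the two models: file protocols such as NFS allow in-place partial writes at an offset within a hierarchical path, while object protocols such as S3 expose flat keys where any update means re-uploading the whole object.

```python
class FileStore:
    """File-protocol semantics (NFS-like, simplified): hierarchical
    paths and in-place partial writes at a byte offset."""
    def __init__(self):
        self.files = {}

    def write_at(self, path: str, offset: int, data: bytes):
        buf = bytearray(self.files.get(path, b""))
        buf[offset:offset + len(data)] = data   # patch in place
        self.files[path] = bytes(buf)

    def read(self, path: str) -> bytes:
        return self.files[path]


class ObjectStore:
    """Object-protocol semantics (S3-like, simplified): flat keys and
    whole-object PUT/GET; changing one byte means rewriting the object."""
    def __init__(self):
        self.objects = {}

    def put(self, key: str, data: bytes):
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]
```

For an archive of mostly write-once observational files, the whole-object model is less of a constraint than it would be for transactional workloads, which is one reason a "slow transition" is plausible.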
Regarding data security, four third-party solutions handle backup, but NetApp snapshots are also in use. These protect file writes and are invoked at the slightest sign of corruption detected in the data.
Regarding access controls, Alvarez said: “Our data is designed for sharing among a large number of people, so it’s better if anyone can read it. We don’t share the same preoccupations regarding storage as a classic enterprise. And that’s the same when it comes to cyber security problems, which worry us less than they might others.”
Read more about storage as a service
- From capex to opex: Storage procurement options bloom. We look at the growing list of possibilities when it comes to paying for storage infrastructure, ranging from upfront purchases with upgrades to pure pay-as-you-go options.
- Five key questions to ask about storage-as-a-service and consumption models. We look at important questions to ask providers of consumption-based storage procurement services, such as base costs and burst, usage measurement and upgrade paths.