HPE on the ISS: In space, no one can hear your CPU fans
In The Martian, Matt Damon is stranded on Mars, and his comms link is so slow that it takes almost an hour to have a conversation. Now HPE is going into orbit
Next week, an HPE edge computer will be launched into space on a trip to the International Space Station (ISS), where it is being deployed to provide artificial intelligence (AI) processing.
The hardware, built around HPE’s Edgeline EL4000 converged edge device and HPE ProLiant DL360 Gen10 server, will act as a first step towards a manned mission to Mars.
The Spaceborne Computer-2 is scheduled to launch into orbit on the 15th Northrop Grumman Resupply Mission to Space Station (NG-15) on 20 February and will be available for use on the ISS for two to three years.
It builds on a proof-of-concept system from HPE, Spaceborne Computer, launched into space in 2017 to operate on the ISS for a one-year mission. Nasa’s goal was to test whether affordable, commercial off-the-shelf servers used on Earth could withstand being launched into space and operate on the ISS.
The results of the pilot led Nasa to ask HPE to build a larger machine, which requires twice the rack space. Compared to its predecessor, said HPE, Spaceborne Computer-2 will offer twice as much compute speed with purpose-built edge computing capabilities powered by the HPE Edgeline Converged Edge system and HPE ProLiant server to ingest data from a range of devices, including satellites and cameras, and process it in real time.
It is also equipped with graphics processing units (GPUs) to support applications that process image-intensive data, such as shots of polar ice caps on Earth or medical X-rays. HPE said the GPU capabilities will also support specific projects using AI and machine learning techniques.
The base software on the Spaceborne Computer-2 is standard Red Hat Enterprise Linux 7.8 and Nasa’s TReK 5.2.2 (Telescience Resource Kit), a suite of software applications and libraries that can be used to monitor and control assets in space or on the ground.
The ISS has 3D printers on board to enable astronauts to print new tools and parts they may require. HPE said the Spaceborne Computer-2 could be used to verify that the parts printed matched the specifications on file.
Spaceborne Computer-2 is also different from the first version in that it has been designed for serviceability in space, which means the astronauts on the ISS may be called on to work like a datacentre operator to replace failed components.
HPE has provided an inventory of spare parts, such as fans and power supplies, for the Spaceborne Computer-2, but the redundancy of the system means that parts do not have to be replaced immediately. Instead, any hardware fixes are scheduled as part of routine maintenance tasks performed by the astronauts on the ISS.
Read more about tech in space
- Like most applications, space systems are migrating from centralised to distributed processing for more autonomous decision-making, lower latency, less dependence on communications systems and lower power.
- Neil Armstrong stepped onto the lunar surface on 20 July 1969. Computers in space have come a long way since then.
The system can also throttle its power consumption from 600W down to a 300W idle state, depending on the power available on the ISS.
Through a collaboration with Microsoft Azure Space, researchers around the world can also run experiments on Spaceborne Computer-2, which bursts processing to the Azure cloud for computationally intensive workloads.
“The most important benefit to delivering reliable in-space computing with Spaceborne Computer-2 is making real-time insights a reality,” said Mark Fernandez, solution architect, converged edge systems, at HPE, and principal investigator for Spaceborne Computer-2. “Space explorers can now transform how they conduct research based on readily available data and improve decision-making.”
For Fernandez, computing that can operate independently of Earth is a necessary step towards astronauts becoming self-sufficient in space. Network latency on the ISS is between 700ms (milliseconds) and 900ms, and bandwidth is about 2Mbps, which is not much better than the network speeds available from dial-up modems in the 1990s.
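To put those numbers in context, a back-of-the-envelope calculation shows why processing data in orbit beats downlinking it. The 1GiB dataset below is a hypothetical figure, not one from the article; only the 2Mbps bandwidth is cited by Fernandez.

```python
# Back-of-the-envelope: why raw downlink is impractical at ISS bandwidth.
BANDWIDTH_BPS = 2_000_000      # ~2Mbps ISS link, per the article
DATASET_BYTES = 1 * 1024**3    # hypothetical 1GiB of raw sensor data

transfer_seconds = DATASET_BYTES * 8 / BANDWIDTH_BPS
print(f"{transfer_seconds / 3600:.1f} hours to downlink 1GiB")  # about 1.2 hours
```

At that rate, a single raw image set can tie up the link for over an hour, while the processed result, an insight rather than the data behind it, can fit in a handful of packets.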
“We want to take advantage of precious bandwidth, so we focus on encoding and compressing messages back and forth,” said Fernandez. The GPU and CPU on the Spaceborne Computer-2 are used to compress the communications.
For example, the system status is updated every six seconds, and this update is encoded to fit into a single IP packet. “This means it has a high probability that it will get through,” he said.
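The encode-and-compress step Fernandez describes might look something like the following sketch. The telemetry fields, the JSON-plus-zlib encoding and the 1,400-byte packet budget are all illustrative assumptions, not HPE's actual scheme.

```python
import json
import zlib

# Hypothetical telemetry snapshot; field names are illustrative,
# not HPE's real schema.
status = {
    "ts": 1613779200,
    "node": "sbc2-el4000-1",
    "cpu_temp_c": [41.5, 42.0, 40.8, 41.2],
    "fan_rpm": [5200, 5150, 5300, 5100],
    "psu_watts": 412.7,
    "ecc_corrected": 0,
    "state": "NOMINAL",
}

# Conservative budget so the payload fits one IP datagram without
# fragmentation (assumed figure).
MAX_PAYLOAD = 1400

def encode_status(snapshot: dict, budget: int = MAX_PAYLOAD) -> bytes:
    """Serialise and compress a status snapshot to fit one packet."""
    raw = json.dumps(snapshot, separators=(",", ":")).encode("utf-8")
    packed = zlib.compress(raw, 9)
    if len(packed) > budget:
        raise ValueError(f"snapshot too large: {len(packed)} > {budget} bytes")
    return packed

def decode_status(payload: bytes) -> dict:
    """Reverse encode_status on the ground."""
    return json.loads(zlib.decompress(payload))

packet = encode_status(status)
assert decode_status(packet) == status
```

The design rationale follows from Fernandez's point: a single datagram either arrives whole or is retransmitted whole, so there is no partial-update state to reconcile over a lossy, high-latency link.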
How HPE monitors servers on the ISS
In systems management, a preventative design focuses on anticipating errors or failures, such as memory experiencing increased error rates in high-radiation environments. Mark Fernandez, principal investigator for Spaceborne Computer-2, said a typical preventative approach would involve augmented memory monitoring that checks and re-checks for errors. One of the downsides of this approach is that it can slow down processing.
To avoid this, he said HPE uses a consequential design on the ISS. “Our consequential design treats all sensors and sensor data equally from a systems management perspective, and does not affect processing performance. Only when a standard reading falls out of range is action potentially taken,” said Fernandez. In effect, the systems management software assesses the consequences of an event before deciding whether to act.
“Aspirationally, we are working on artificial intelligence and machine learning to analyse this standard data and potentially recommend a change in operating parameters and/or a maintenance window,” he added.
One of the areas HPE needed to address in Spaceborne Computer-2 was monitoring. Datacentre computing has evolved to focus on power efficiency and performance, which, said Fernandez, has led to monitoring of individual nodes becoming centralised. But on the ISS, he said, “we don’t have that luxury in space”.
Instead, the system uses hardware redundancy and a status table with a series of actions. The system software is encoded so that if a parameter falls out of a given range, there is a ranking of steps to follow to fix the problem.
Instead of preventative design, said Fernandez, system management takes a consequential approach. Here, a ranking model attempts to provide an understanding of the consequences of applying a given fix, to overcome an undesirable system status parameter reading.
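A minimal sketch of such a status table, with ranked remedial actions per parameter, might look like the following. The parameter names, nominal ranges and action names are invented for illustration; they are not HPE's actual table.

```python
# Hypothetical "consequential" status table: each monitored parameter
# has a nominal range and a ranked list of remedial actions, tried in
# order of least consequence first.
STATUS_TABLE = {
    "cpu_temp_c": {
        "range": (0.0, 85.0),
        "actions": ["throttle_clock", "shift_workload", "shutdown_node"],
    },
    "psu_watts": {
        "range": (250.0, 600.0),
        "actions": ["reduce_duty_cycle", "fail_over_psu"],
    },
}

def evaluate(readings: dict) -> dict:
    """Return ranked actions for any parameter outside its nominal range.

    While readings are in range, nothing is done -- monitoring itself
    does not interfere with processing, per the consequential design.
    """
    out_of_range = {}
    for name, value in readings.items():
        entry = STATUS_TABLE.get(name)
        if entry is None:
            continue  # unknown sensors are logged elsewhere, not acted on
        lo, hi = entry["range"]
        if not (lo <= value <= hi):
            out_of_range[name] = entry["actions"]
    return out_of_range

# Nominal readings trigger nothing; an overheating CPU returns its
# ranked fix list.
assert evaluate({"cpu_temp_c": 42.0, "psu_watts": 410.0}) == {}
assert "cpu_temp_c" in evaluate({"cpu_temp_c": 91.0})
```

The ranking encodes the trade-off Fernandez describes: the cheapest fix is attempted first, and drastic steps such as shutting down a node are reserved for when lesser actions fail.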