In this podcast, we talk to Rainer Kaese, senior manager for business development in storage products at Toshiba Electronics Europe, about how temperature affects hard disk drives (HDDs).

Kaese says the thing to keep an eye on is airflow, and that hard disk drive failure rates tend to multiply significantly if the drives run above their optimum average temperature.

Kaese also highlights the key SMART (Self-Monitoring, Analysis, and Reporting Technology) value to monitor for good hard drive health when it comes to temperature.

What does it mean for the system when a hard disk drive overheats? The temperature of the hard disk drive is something that you should keep a good eye on in your systems. There are two failure modes for the hard disk drive, which are related to the temperature of the drive itself. In a system, in operation, the hard disk drive heats up, and it needs to be cooled somehow. It does not overheat like other components, such as CPUs [central processing units]. We don't need heat sinks, but at least a little bit of airflow is required in operation. And there are two limits. One is the functional limit. So, if a hard disk drive heats up to an internal temperature of 60°C or 70°C, it is still functioning. Above this, it may not function anymore. So internal temperatures of 60°C for server hard disk drives or 70°C for client hard disk drives should be avoided at all costs because this is the functional limit. You will probably recognise immediately if a hard disk drive is that hot because the system may not work anymore.

Also important, but not so well recognised, is the reliability limit. And there it starts way earlier. Talking about reliability, the annualised average failure rate of a server hard disk drive is 0.4%. That means four out of a thousand per year may fail. This is very low. So, hard disk drives rarely fail. For a client machine, it’s a little bit higher. It’s 0.9%, which is still less than 1%. So only nine out of a thousand per year may fail. But this is if the average operating temperature is in the range of 40°C. You may operate it a little bit higher, 41°C, 43°C, or 45°C. But if the average temperature over its lifetime reaches 50°C, the failure probability is already 1.5 times higher than the nominal rate I explained earlier. If your hard disk drive is operated at 55°C on average, it’s double. And at 60°C, it’s triple. I mean, if it was initially, let’s say 0.9% or say 1% per year, at 60°C average temperature, you have a 3% failure rate. I mean, 97% of your drives will not fail. But 3% is something that you don’t need. If you cool your hard disk drives better, you can avoid this higher failure probability. So, the bottom line is 40°C, maximum 45°C should be maintained. Then you can enjoy the best reliability for your hard disk drive and the lowest expected failure rate.
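The arithmetic Kaese walks through can be sketched in a few lines of Python. This is only an illustration: the multipliers at 50°C, 55°C and 60°C are the ones quoted in the interview, while the function names and the step-wise treatment of temperatures between the quoted points (including a multiplier of 1.0 for anything below 50°C) are assumptions made for the example.

```python
# Failure-rate multipliers quoted in the interview, keyed by the average
# lifetime temperature (in °C) at which each one applies.
QUOTED_MULTIPLIERS = [(50, 1.5), (55, 2.0), (60, 3.0)]


def failure_multiplier(avg_temp_c: float) -> float:
    """Return the multiplier for a given average temperature.

    Assumption: the interview gives no values between the quoted points,
    so this sketch uses the highest quoted point at or below the
    temperature, and 1.0 (the nominal rate) below 50°C.
    """
    multiplier = 1.0
    for threshold_c, factor in QUOTED_MULTIPLIERS:
        if avg_temp_c >= threshold_c:
            multiplier = factor
    return multiplier


def expected_annual_failures(drive_count: int, nominal_afr: float,
                             avg_temp_c: float) -> float:
    """Expected number of drive failures per year.

    nominal_afr is the annualised failure rate as a fraction,
    for example 0.004 for the 0.4% quoted for server drives.
    """
    return drive_count * nominal_afr * failure_multiplier(avg_temp_c)


if __name__ == "__main__":
    # 1,000 server drives at the nominal 0.4% rate, cool versus hot.
    for temp_c in (40, 50, 55, 60):
        failures = expected_annual_failures(1000, 0.004, temp_c)
        print(f"{temp_c}°C average: about {failures:.0f} failures per year")
```

With the server figure of 0.4%, a fleet of 1,000 drives goes from roughly four expected failures a year at 40°C to roughly 12 at a 60°C average, which is the tripling described above.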

So what causes internal temperature rises in hard disk drives? It’s always the same thing: improper cooling. So, as I said, the heat dissipation of the hard disk drive is not that high. But it needs a little bit of airflow, either with good convection in fanless systems or with proper airflow around the hard disk drive in systems with fans. In all of our experiments, with the equipment we have reviewed here in our laboratory in Düsseldorf, but also at customer sites, whenever a hard disk drive reached more than 45°C, 50°C or 60°C, there was always some problem with the cooling. Either a fan was missing or defective, or the system was designed in the wrong way, blocking airflow and not allowing any convection or airflow around the hard disk. Let’s say, for example, you have a two- or four-bay home NAS. Some of these boxes have closed lids, closed doors in front. It looks nice, but it blocks some of the airflow. So, what we suggest is to check the temperature of your drive. And if it is, on average, more than 45°C, try to remove obstructions, tune up the fans, or, in the worst case, if a system does not allow it, then buy another system which has better airflow.
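The check Kaese suggests can be sketched as a small Python helper. It assumes you already have a series of temperature readings in °C for a drive, for instance logged periodically from the drive's SMART temperature value; how the readings are collected is not covered here. The function name and the three result labels are hypothetical, while the 45°C average and the 60°C and 70°C functional limits are the figures from the interview.

```python
from statistics import mean

# Average temperature above which the interview recommends improving cooling.
RELIABILITY_LIMIT_C = 45

# Functional limits quoted in the interview, per drive class.
FUNCTIONAL_LIMIT_C = {"server": 60, "client": 70}


def assess_drive_temperature(readings_c, drive_class="server"):
    """Classify a drive from its logged temperature readings (in °C).

    Returns one of three hypothetical labels:
      "critical"        - a reading reached the functional limit
      "improve-cooling" - the average is above 45°C
      "ok"              - the average is at or below 45°C
    """
    if not readings_c:
        raise ValueError("need at least one temperature reading")
    if max(readings_c) >= FUNCTIONAL_LIMIT_C[drive_class]:
        return "critical"
    if mean(readings_c) > RELIABILITY_LIMIT_C:
        return "improve-cooling"
    return "ok"


if __name__ == "__main__":
    print(assess_drive_temperature([38, 41, 43]))            # average below 45°C
    print(assess_drive_temperature([47, 49, 50]))            # average above 45°C
    print(assess_drive_temperature([44, 61, 40], "server"))  # peak at or above 60°C
```

Using the average for the reliability check and the peak for the functional check mirrors the distinction drawn in the interview: reliability depends on the average temperature over the drive's lifetime, whereas the functional limit is something to avoid even briefly.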