
Get HDD temperature right, or risk more drive failures
We talk to Rainer Kaese of Toshiba about the right temperature to run hard disk drives at. Not getting it right risks higher failure rates than what would normally be expected
In this podcast, we talk to Rainer Kaese, senior manager for business development in storage products at Toshiba Electronics Europe, about how temperature affects hard disk drives (HDDs).
Kaese says the thing to keep an eye on is airflow, and that hard disk drive failure rates tend to multiply significantly if they run above their optimum average temperature.
Kaese also highlights the key SMART (Self-Monitoring, Analysis, and Reporting Technology) value to monitor for good hard drive health when it comes to temperature.
What does it mean for the system when a hard disk drive overheats?
The temperature of the hard disk drive is something that you should keep a good eye on in your systems. There are two failure modes for the hard disk drive, which are related to the temperature of the drive itself.
In a system, in operation, the hard disk drive heats up, and it needs to be somehow cooled. It does not overheat like other components, such as CPUs [central processing units]. We don’t need heat sinks, but at least a little bit of airflow is required in operation.
And there are two limits. One is the functional limit. A hard disk drive still functions up to an internal temperature of 60°C for server drives or 70°C for client drives. Above this, it may not function anymore, so internal temperatures at that level should be avoided by all means. You will probably recognise immediately if a hard disk drive is that hot, because the system may not work anymore.
Also important, but not so well recognised, is the reliability limit. And there it starts way earlier. Talking about reliability, the annualised average failure rate of a server hard disk drive is 0.4%. That means four out of a thousand per year may fail. This is very low. So, hard disk drives rarely fail.
For a client machine, it’s a little bit higher. It’s 0.9%, which is still less than 1%. So only nine out of a thousand per year may fail. But this is if the average operating temperature is in the range of 40°C.
You may operate it a little bit higher, 41°C, 43°C, or 45°C. But if the average temperature over its lifetime reaches 50°C, the failure probability is already 1.5 times the nominal rate I explained earlier.
If your hard disk drive is operated at 55°C on average, it’s double. And at 60°C, it’s triple. I mean, if it was initially, let’s say 0.9% or say 1% per year, at 60°C average temperature, you have a 3% failure rate.
I mean, 97% of your drives will not fail. But 3% is something that you don’t need. If you cool your hard disk drives in a better way, you can avoid this higher failure probability.
So, the bottom line is 40°C, maximum 45°C should be maintained. Then you can enjoy the best reliability for your hard disk drive and the lowest expected failure rate.
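The figures above can be turned into a rough calculation. The following is a minimal sketch, not a Toshiba model: the multipliers are the ones quoted in the interview, while the function names and the straight-line interpolation between the quoted points are assumptions made purely for illustration.

```python
# Failure-rate multipliers quoted in the interview, relative to the nominal
# annualised failure rate at an average operating temperature of about 40°C.
QUOTED_MULTIPLIERS = [(40.0, 1.0), (50.0, 1.5), (55.0, 2.0), (60.0, 3.0)]


def failure_rate_multiplier(avg_temp_c: float) -> float:
    """Estimate the multiplier for a lifetime-average temperature.

    Assumption: values between the quoted points are linearly interpolated.
    The interview only gives the four points listed above.
    """
    points = QUOTED_MULTIPLIERS
    if avg_temp_c <= points[0][0]:
        return points[0][1]  # at or below 40°C: nominal rate
    if avg_temp_c > points[-1][0]:
        # No figures are quoted above 60°C, so refuse to guess.
        raise ValueError("no multiplier quoted above 60°C average")
    for (t0, m0), (t1, m1) in zip(points, points[1:]):
        if t0 <= avg_temp_c <= t1:
            return m0 + (m1 - m0) * (avg_temp_c - t0) / (t1 - t0)
    raise AssertionError("unreachable: temperature not covered by table")


def expected_afr_percent(nominal_afr_percent: float, avg_temp_c: float) -> float:
    """Scale a nominal annualised failure rate (in %) by the multiplier."""
    return nominal_afr_percent * failure_rate_multiplier(avg_temp_c)


# The interview's own example: roughly 1% nominal becomes 3% at 60°C.
print(expected_afr_percent(1.0, 60.0))  # 3.0
```

With the 0.4% server figure, the same calculation gives about 0.6% at a 50°C average. Anything between the quoted points should be treated as a guess.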
So what causes internal temperature rises in hard disk drives?
It’s always the same thing: improper cooling.
So, as I said, the heat dissipation of the hard disk drive is not that high. But it needs a little bit of airflow, either with good convection in fanless systems or with a proper airflow around the hard disk drive if there are systems with fans.
In all of our experiments, with the equipment we have reviewed here in our laboratory in Düsseldorf and also at customer sites, whenever a hard disk drive reached more than 45°C, 50°C or 60°C, there was some problem with the cooling. Either a fan was missing, a fan was defective, or the system was designed in the wrong way, blocking airflow and not allowing any convection or airflow around the hard disk.
Let’s say, for example, if you have a two- or four-bay home NAS, some of these boxes have closed lids, closed doors in front.
It looks nice, but it blocks some of the airflow. So, what we suggest is to check the temperature of your drive. And if it is, on average, more than 45°C, try to remove obstructions, tune up the fans, or, in the worst case, if a system does not allow it, then buy another system which has better airflow.
What problems can be caused by not managing temperature in hard disk drives, and how can you mitigate these problems in the datacentre?
The first thing is to watch the temperature of your hard disk drives. Here, we are talking about the average temperature, say around 40°C.
Then let’s say if there’s a month of 50°C in summer, but three months of 30°C in winter, that is still okay. So, we’re always talking about the average temperature.
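The seasonal example works out as a time-weighted average. A minimal sketch of the arithmetic follows; the function name and the list-of-pairs input are illustrative choices, not something described in the interview.

```python
def time_weighted_average(periods):
    """Average temperature over periods given as (duration, temperature_c).

    Durations can be in any unit (months here), as long as it is consistent.
    """
    total_duration = sum(duration for duration, _ in periods)
    weighted_sum = sum(duration * temp for duration, temp in periods)
    return weighted_sum / total_duration


# One month at 50°C in summer, three months at 30°C in winter.
print(time_weighted_average([(1, 50.0), (3, 30.0)]))  # 35.0
```

The result, 35°C, is below the 40°C guideline, which is why a single hot month is still acceptable.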
That means check the temperature of your systems from time to time. You can do this in the graphical user interface [GUI] of your system. A NAS [network-attached storage] may have a GUI where you can see the temperature of the hard disk drive.
In the datacentre, you just check the SMART values of your hard disk. SMART value number 194 is the temperature in degrees Celsius. Check it from time to time. If it is below 45°C, it’s okay.
If it is more than 45°C on average, it may still work, but you can experience higher failure rates. If you are okay with higher failures because you cannot do anything in your system anyway, that’s still okay. But please be aware things are not as good as they could be.
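One way to automate this check is to read attribute 194 from the attribute table printed by a tool such as `smartctl -A` from smartmontools. The sketch below only parses text that has already been captured. The sample table is made up for illustration, the function names are hypothetical, and the column layout is an assumption that may not hold for every drive (SAS and NVMe devices, for instance, report temperature differently).

```python
import re
from typing import Optional

# Illustrative sample only, not output from a real drive.
SAMPLE_ATTRIBUTES = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1234
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       38 (Min/Max 21/47)
"""


def parse_temperature_c(attribute_table: str) -> Optional[int]:
    """Return the raw value of SMART attribute 194, or None if not found.

    Assumes the ten-column table layout shown in the sample, with the
    raw value starting in the tenth column.
    """
    for line in attribute_table.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "194":
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def assess(temp_c: int) -> str:
    """Apply the 45°C guideline from the interview to a reading."""
    return "ok" if temp_c <= 45 else "check airflow"


temp = parse_temperature_c(SAMPLE_ATTRIBUTES)
print(temp, assess(temp))  # 38 ok
```

A single reading is only a snapshot. Since the guideline refers to the average temperature, it makes sense to log readings over time and compare the average against 45°C.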
If your system allows, remove obstructions, tune up the fans. Or, if you have built your own server or your own gaming PC and your hard disk drive is running at 50°C, 55°C, put in another fan.
It can be a small one around the hard disk drive or drives to provide at least a little bit of convection and airflow. This is a good thing you can do to your hard disk drive. You will enjoy a longer lifetime or lower failure rates.
It’s all about the airflow. A hard disk drive has to breathe somehow. It doesn’t need to be much, but there should be some airflow to take away the heat, to keep it in a reasonable temperature range.
And as I said, this is possible. There’s no reason why it should not be. It just has to be done by the supplier or manufacturer of your equipment. Or if you manage the thing by yourself, it can be done by yourself. But the very first thing is keeping an eye on it.