In this guest post, Mark Fenton, product manager at datacentre software visualisation provider Future Facilities, sets out why mixing monitoring and computational fluid dynamics technologies could help datacentre operators manage their sites better
Datacentre monitoring uses data from sensors to visualise what is happening in the datacentre in real time, and it is of paramount importance to guaranteeing performance amid changing demands.
This is key as it is the only reliable way for operators to be alerted to critical issues caused by failure or human error. Without this awareness the datacentre would be at risk of outages.
In a colocation set-up, this has consequences for the operator’s reputation, its ability to retain customers and its profitability; in an enterprise environment, it undermines the facility’s ability to support reliable business operations. However, monitoring can’t tell operators the full picture, leaving the datacentre vulnerable, because monitoring systems are blind to what’s beyond their sensors.
With typical datacentre air temperatures ranging from 20 to 40 °C (68 to 104 °F), there can be significant changes in temperature over very short distances. Variations of 10 to 20 °C (18 to 36 °F) across a server inlet are not uncommon when a cabinet is suffering from poor airflow management, for example. This means that even if sensors are in the recommended positions – at the top, middle and bottom of the rack’s front door (as recommended by ASHRAE T.C. 9.9) – there is still a large area that isn’t being monitored, not least within the rack itself.
Some monitoring systems attempt to fill in the gaps between the sensors with an estimated temperature map of the whole facility. The logic is that if one sensor reads 20°C and another 30°C, it must be 25°C in the middle. Unfortunately, airflow is not that uniform, and the true temperature between the sensors could be drastically – even dangerously – different.
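To make the pitfall concrete, here is a toy sketch (not any vendor’s actual algorithm) of the linear interpolation such a system implicitly performs between two rack-door sensors:

```python
# Toy illustration: a monitoring system that linearly interpolates
# between two sensor readings, assuming the gradient between them
# is uniform.
def interpolate_temp(t_low, t_high, fraction):
    """Estimate the temperature at a point `fraction` of the way
    between two sensors, assuming a linear gradient."""
    return t_low + fraction * (t_high - t_low)

# Two sensors on a rack door read 20 C and 30 C.
estimate = interpolate_temp(20.0, 30.0, 0.5)
print(estimate)  # 25.0 -- the assumed midpoint value

# In reality, a recirculation hotspot between the sensors could push
# the true mid-rack temperature well above this estimate without
# either sensor ever crossing its alarm threshold.
```

The estimate is only as good as the linearity assumption, which turbulent, recirculating datacentre airflow routinely violates.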
What’s more, when sensors trigger alarms, these only tell operators there is a problem and not the root cause. For example, they can flag that something is overheating, but not where the hot air is coming from.
Without this critical information it is very challenging for datacentre managers to resolve the problem. As a result, they will likely have to resort to increasing their cooling fans or turning down their overall cooling temperatures, which are expensive and inefficient solutions.
Furthermore, monitoring can’t predict the impact of failure. Operators will plan for failure, and ensure that the cooling and power infrastructure is sized properly for their required level of redundancy.
In this context, redundancy refers to the amount of backup infrastructure the datacentre needs to take over in the event of failure. However, this is just theory: you cannot be sure that the redundancy extends down to every piece of IT. For example, it’s not uncommon to see sites where a single cooling unit is supporting critical load while operating at full capacity. This is highly problematic if the unit fails or has to be turned off for maintenance.
As such, it’s clear that facilities managers need a tool that can test resilience – the load supported under failure – rather than redundancy. Monitoring is unable to do this as it can only reveal the status in the datacentre. Therefore, without another tool in place, the only way to identify resilience would be to purposely trigger failure, which isn’t a practical solution.
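The gap between redundancy on paper and resilience under failure can be sketched with a deliberately simplified capacity check (an illustrative toy, not a substitute for simulation – real resilience also depends on airflow paths, which a capacity sum cannot capture):

```python
# Simplified sketch: can the remaining cooling units carry the IT
# load when any single unit fails (an "N+1"-style nominal check)?
# This says nothing about whether the surviving units' airflow
# actually reaches every rack -- hence the case for CFD.
def survives_single_failure(unit_capacities_kw, it_load_kw):
    """True if the site can lose its largest cooling unit and still
    have enough nominal capacity for the IT load."""
    total = sum(unit_capacities_kw)
    worst_case = total - max(unit_capacities_kw)
    return worst_case >= it_load_kw

# Four 100 kW units serving 350 kW of IT load looks comfortable in
# aggregate (400 kW installed), but fails the single-failure test.
print(survives_single_failure([100, 100, 100, 100], 350))  # False
print(survives_single_failure([100, 100, 100, 100], 250))  # True
```

Even where this nominal check passes, only simulation can confirm that the surviving units’ cool air reaches every server inlet.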
Although monitoring systems accurately record the present and past state of the datacentre environment, this cannot and should not be used to project the impact of future changes such as new hardware deployment. This is because the relationships between IT infrastructure and cooling paths aren’t linear; past performance doesn’t indicate how future scenarios will unfold. Consequently, monitoring places operators in a position where they have to take a ‘try it and see’ approach, which creates a significant level of risk when making changes to a mission-critical facility.
It’s clear, then, that monitoring delivers tangible benefits; however, it also has fundamental limitations. A dedicated computational fluid dynamics (CFD) tool – which analyses fluid flows using numerical methods – can help overcome these gaps.
Firstly, CFD can be used to calculate the environmental conditions at every point in the datacentre facility. This means that specific conditions – be they temperature, humidity, pressure, or air speed – can be investigated at any location in precise detail. For example, operators can follow airflow on its path through the datacentre and identify why thermal challenges are arising.
Secondly, with a CFD model facilities managers can simulate resilience safely; modelling turning off cooling units or failing circuit breakers to see how these changes will impact the IT infrastructure. They could even simulate failing the entire cooling system to see how long they would have to find a resolution in a disaster situation.
And finally, CFD takes the guesswork out of future performance by letting users simulate potential scenarios using data from both monitoring and asset management systems. For instance, it can show the impact of new servers in the virtual world while the physical servers still sit on the loading bay in the real one, meaning any potential problems can be identified and resolved before they occur.
Monitoring solutions also offer significant benefits to the CFD model. By connecting the two together, the model can constantly import monitored data – such as IT loads, air temperatures, or cooling controller information – to calibrate itself against reality. This way, the CFD model remains reflective of its real-world counterpart.
To keep a datacentre running around the clock, operators need control. To have control, they must know what’s happening right now, why it’s happening, and what will happen in the future. Only a combination of monitoring and CFD simulation can answer all three, making them a powerful duo for ensuring datacentre performance under increasing demand.