Monitoring Conditions In Data Centers

Maximizing uptime and efficiency while minimizing cost requires analyzing a large number of highly dynamic variables.

By Marc Cram

For most people, predicting the future by looking at the past is at best a difficult exercise and at worst a way to make a fool of oneself. If, at the start of the NFL season, I pick one team to win the Super Bowl, I have a 1-in-32 chance of being right by luck alone. Out of stubborn loyalty to my former hometown, I pick the Tennessee Titans (originally the Houston Oilers). I’ve never been right. On the other hand, if I had picked Bill Belichick, Tom Brady, and the Patriots at the start of each of the past 20 seasons, I would have been right 30% of the time, the Patriots having won six championships during that interval. That’s not chance; the success of their system has been proven by their ability to bring new players into the fold and turn them into a cohesive team.

True visionaries can mentally follow cause and effect in a logical progression, making what appear on the surface to be great leaps in judging the direction of future outcomes. In the pandemic aftermath of 2021, most people can state with reasonable accuracy whether they will be doing better financially, and what technology they will be using, at the end of the year. Being accurate beyond 2021, however, becomes more difficult. Handling all the variables is too complex for most of us mortals.


Stock analysts known as “quants” use mathematics, finance, and computer skills to examine countless variables and attributes of stocks to determine when to buy, when to sell, and how to maximize profit while reducing risk. They are true “fortune tellers.” Factors believed to be leading indicators of the market weigh heavily in the equations and algorithms used by quants. And quants take advantage of massive quantities of historical data, combined with sophisticated data analysis running in a data center, to predict the future value of a stock. The visionaries of the quant world rely on artificial intelligence and high-speed computers to crunch the numbers and make many leaps of logical progression to identify opportunities and deliver value for investors.

In like fashion, most enterprise and hyperscale data center operators are looking for an edge when it comes to uptime and efficiency. Maximizing both while minimizing cost requires analyzing a large number of highly dynamic variables and then picking a course for operations over the next few minutes, hours, days, weeks, months, and years. This is not a task for the faint of heart or the slow moving. Rapid-fire changes in data coming from sensors spewing updates hundreds or thousands of times per second can throw off the best of control mechanisms. Temperature inside and outside of the data center, delta T across the IT loads, humidity of external and internal air, atmospheric pressure, air pressure differential across air filters, water temperature, time of day, day of the year, instantaneous compute load, compute load trend cycles, instantaneous power draw, daily power draw trends, grid capacity utilization, fan speed, airflow, and myriad other variables can all be fed to a machine learning/artificial intelligence system to quickly respond, adjust, and optimize the workloads and systems of the data center.
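
To make that concrete, here is a minimal, hypothetical sketch in Python of how one snapshot of the kinds of telemetry listed above might be gathered into a feature vector for a machine learning model. The field names and values are illustrative assumptions, not a description of any vendor’s actual system.

```python
# Hypothetical sketch only: gather one snapshot of the kinds of telemetry
# described above into a flat feature vector for a machine learning model.
# All field names and values are illustrative assumptions.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class FacilitySnapshot:
    timestamp: datetime
    outside_air_temp_c: float       # external dry-bulb temperature
    supply_air_temp_c: float        # cold-aisle supply temperature
    delta_t_c: float                # temperature rise across the IT load
    relative_humidity_pct: float    # internal relative humidity
    filter_dp_pa: float             # pressure drop across the air filters
    chilled_water_temp_c: float
    it_load_kw: float               # instantaneous compute power draw
    fan_speed_pct: float


def to_feature_vector(snapshot: FacilitySnapshot) -> list:
    """Flatten a snapshot into the ordered numeric inputs a model expects."""
    fields = asdict(snapshot)
    fields.pop("timestamp")         # time of day would be encoded separately
    return [float(value) for value in fields.values()]


snap = FacilitySnapshot(
    timestamp=datetime.now(timezone.utc),
    outside_air_temp_c=28.5,
    supply_air_temp_c=22.0,
    delta_t_c=10.5,
    relative_humidity_pct=45.0,
    filter_dp_pa=120.0,
    chilled_water_temp_c=16.0,
    it_load_kw=850.0,
    fan_speed_pct=70.0,
)
print(to_feature_vector(snap))
```

In a real deployment, thousands of such snapshots per second would stream from sensors throughout the facility into the control system’s models.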

Then…

In 2014, Google’s Joe Kava announced the company was using artificial intelligence (AI) to improve their data center efficiency. “We had a standalone model that we would run, and it would spit out recommendations, and then the engineers and the operators in the facility would go and change the setpoints on the chillers, and the heat exchangers, and the pumps, and all that to match what the AI system said,” Kava said in an interview with Data Center Knowledge. “That was manual.”

Recently…

In a 2018 follow-up interview investigating the impact of AI in the data center, Data Center Knowledge reported that the company was “aggressively rolling out what Kava referred to as a ‘tier-two automated control system.’ Instead of simply making recommendations, this tier two system makes all the cooling-plant tweaks on its own, continuously, in real time.”

“Under a recent tornado watch, the AI system managing the cooling plant at one of Google’s data centers in the Midwest changed the plant’s settings in a way which the facility’s human operators found counterintuitive. After closer scrutiny, however, it did what had to be done to save energy under those specific circumstances.

“Weather conditions that make a severe thunderstorm likely to form include a big drop in atmospheric pressure and dramatic temperature and humidity changes. Weather plays a big role in the way some of the more sophisticated data center cooling systems are tuned, and the software running Google’s cooling system recalibrated it to take advantage of the changes — no matter how small the advantage.”

In 2018, Google data centers monitored 21 different variables to optimize their PUE [power usage effectiveness] into the range of 1.09 to 1.11, even under lightly loaded conditions, such as when opening a brand-new data center. What Kava and the Google team have not discussed is how many and what type of sensors were distributed inside and outside of the data center to accomplish this feat of wizardry. When you consider that they might be measuring temperature, humidity, and airflow inside 50,000 or more servers in a single facility, continuous monitoring, logging, and analysis can consume a sizable amount of bandwidth, storage, and compute resources. Compound that with disparate sensor systems for monitoring and managing the plumbing, cooling, lighting, and power using BACnet, Modbus, Ethernet, Zigbee, Wi-Fi, and other protocols, and you begin to see that pulling all these data sources together can be a challenge.
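
For reference, PUE itself is a simple ratio: total facility power divided by the power delivered to the IT equipment. The short Python sketch below illustrates the calculation with made-up numbers; the figures are not Google’s.

```python
# PUE (power usage effectiveness) is total facility power divided by the
# power delivered to IT equipment. The numbers below are illustrative only.
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw


# A facility drawing 1,100 kW overall to run a 1,000 kW IT load:
print(f"PUE = {pue(1100.0, 1000.0):.2f}")  # prints: PUE = 1.10
```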

Sensors of all shapes, sizes, and protocols are available on the market to help companies like Google measure the variables used in their AI-based operation models. Battery-powered wireless sensors featuring a one-way protocol, IP-addressable wired sensors that daisy-chain together via standard patch cords, and sensors soldered directly to the server motherboard or embedded inside the CPU itself are all available for deployment in the modern data center. All of the major cooling, lighting, and power infrastructure companies provide comprehensive sensor offerings targeting environments like Google’s data centers.

Looking Ahead…

Like the quant stock picker, the data center operator projecting operations into the future requires even more calculation. Modeling what-if scenarios under varying weather conditions, power availability, new IT hardware deployments, new cooling technologies, and operating loads requires a great deal of inference.
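
As a toy illustration of that kind of what-if modeling, the Python sketch below sweeps a hypothetical cooling surrogate across a few outside-air temperatures and IT loads. The surrogate and its coefficients are invented for illustration; a real study would rely on a calibrated plant model or a trained ML model.

```python
# Toy what-if sweep. The cooling "model" is an invented linear stand-in,
# not a physical model; a real study would use a calibrated plant model
# or a trained ML surrogate.
from itertools import product


def estimated_cooling_kw(outside_temp_c: float, it_load_kw: float) -> float:
    """Hypothetical surrogate: cooling power grows with IT load and outside temperature."""
    base_fraction = 0.10    # assumed cooling overhead in mild weather
    temp_penalty = 0.004    # assumed extra fraction per degree C above 20 C
    overhead = base_fraction + temp_penalty * max(0.0, outside_temp_c - 20.0)
    return it_load_kw * overhead


for temp_c, load_kw in product([15.0, 25.0, 35.0], [500.0, 1000.0]):
    cooling = estimated_cooling_kw(temp_c, load_kw)
    print(f"{temp_c:4.0f} C outside, {load_kw:6.0f} kW IT -> {cooling:6.1f} kW cooling")
```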

Leaders in the data center community, like Google, will tell you that the Internet of Things (IoT) is going to grow exponentially over time, straining bandwidth and computational infrastructure around the globe. Edge computing resources will be put in place to deal with the deluge, but the shortage of knowledgeable people to support the proliferation of equipment will require that everything be remotely monitored. To keep things as simple as possible, data center operators, especially the public cloud vendors, are likely to try to standardize the sensors, monitoring hardware, and software analytics tools used from the edge to the core. AI will be required to determine when to perform preventive maintenance on mission-critical infrastructure, to orchestrate workloads across systems, and to optimize operating conditions for efficiency. Major vendors of infrastructure systems are anticipating this need, putting forth a variety of sensor systems and networks that coordinate and integrate across platforms. Global specialists in electrical and digital building infrastructure are all delivering technology today that enables the data center operator to remotely collect sensor data, manage that data, and determine what steps to take based on that data.
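
As one simplified example of the preventive-maintenance idea, the sketch below flags a fan for inspection when its recent vibration readings drift well above a historical baseline. The readings, units, and 3-sigma threshold are hypothetical; production systems would use trained models and far richer telemetry.

```python
# Simplified preventive-maintenance check. All readings, units, and the
# 3-sigma threshold are hypothetical; real systems would use trained models
# and far richer telemetry.
from statistics import mean, stdev


def needs_maintenance(history: list, recent: list, sigma_threshold: float = 3.0) -> bool:
    """Flag when the recent average exceeds the baseline by N standard deviations."""
    baseline, spread = mean(history), stdev(history)
    return mean(recent) > baseline + sigma_threshold * spread


baseline_vibration = [0.21, 0.20, 0.22, 0.19, 0.21, 0.20, 0.22, 0.21]  # fan vibration, mm/s
last_hour = [0.35, 0.37, 0.36]
if needs_maintenance(baseline_vibration, last_hour):
    print("Schedule fan inspection before a failure occurs")
```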

Where will the data center industry and its infrastructure monitoring go from here? Look for the Open Compute Project (OCP) to provide some insight. Members already have efforts under way to support open silicon, open firmware, open networking, open storage, open rack managers, and open AI/compute accelerator modules. Can a standard for sensors and for AI algorithms be far behind? They already work with IPMI, SMBus, PMBus, and several other standards. I’d be surprised if they don’t drive an open sensor standard soon, just as several OCP members are now pushing for standards on the home automation front.

Cram is director of new market development for Server Technology, a brand of Legrand. A technology evangelist, he is driven by a passion to deliver a positive power experience for the data center owner/operator. He earned a bachelor’s degree in electrical engineering from Rice University and has more than 30 years of experience in the field of electronics.
