The Risks and Rewards of Liquid Cooling in the Data Center

Why data centers must commit to liquid cooling

 

Jason Matteson

 

Translator's Note

As the technology has developed and iterated, all three liquid cooling approaches (cold plate, immersion and spray cooling) have made great strides and are increasingly mature. The traditional assumption is that liquid cooling requires a much higher initial investment than air cooling, but viewed over the equipment's full life cycle, liquid cooling matches or even surpasses traditional air cooling in saving energy, space and water. Liquid cooling also offers a clear performance advantage, which makes it the unavoidable, high-efficiency option for high-heat-density applications.

 

 

Jason Matteson, Iceotope's director of product strategy, explores how data centers can bridge the gap between the present and the future of liquid cooling.

 

When it comes to data center cooling, operators have long known that liquid is a far more efficient medium for removing heat than air.

 

However, to date there have been many barriers to adopting and benefiting from liquid cooling strategies, which goes some way to explaining why the data center industry remains in transition today.

 

The leading obstacle to the adoption of liquid cooling is the perception of risk.

 

In a historically conservative industry where, for example, reliability trumps efficiency, risk aversion has opened up a chasm between the understanding of the potential benefits that could be accrued with liquid cooling and the business decisions that could deliver those gains.

 

 

 

How can the industry finally bridge this gap?

 

Perception of Risk #1: Leaking Liquids in the Data Center

 

One classic risk perception concerns the chance that leaking water or other liquids could interfere with the operation of servers or other IT equipment.

 

This has been especially true of water direct-to-chip solutions, where cold plates are installed inside the servers.

 

The risk of leakage, coupled with the use of water as a coolant, has unfortunately proven to be a real concern as well as the source of actual damage.

 

While it is not openly admitted, it cannot be denied that water damage and leakage have been a major cause of downtime over the years.

 

Putting running water alongside electrical equipment is not only risky, but potentially dangerous.

 

Not unnaturally, therefore, some operators have felt that the potential benefits of deploying liquid cooling do not outweigh the apparent risk to valuable IT loads, especially where costs can accumulate from lost data, damaged equipment and downtime.

 

 

It is not unusual for a customer to simply say that they don't want water in the data center.

 

Despite this, the customer's site would undoubtedly have liquid circulating through the underfloor technical space and through its air handlers.

 

Typically, a chilled water loop would be in place, cooling the air moving around the room.

 

However, the customer journey didn't reflect this understanding at the time.

 

Perception of Risk #2: Isn't Air-based Data Center Cooling Sufficient?

 

The next question is often whether the data center operator really needs liquid cooling at all: isn't air sufficient?

 

The answer is that while we may well have managed with air for decades, the reality today is that air cooling is no longer sufficient to ensure the reliable operation of data center loads.

 

A recent paper published by ASHRAE TC9.9, The Emergence of Liquid Cooling in Mainstream Data Centers, highlights exactly this point.

 

 

New technologies are just on the horizon which require liquid direct to rack and chip, says the ASHRAE TC9.9 technical committee, generally considered a leading global authority on data center power and cooling trends and best practices.

 

Not only is the need for liquid cooling being driven by chip density as well as application performance, it says; there is also an urgent need for the industry to prepare for liquid technologies now.

 

By way of example, the high-performance computing (HPC) community has for some time deployed liquid cooling as an industry norm.

 

With no seeming ill effect on uptime or availability, the technology has enabled CPUs and GPUs to be run reliably at maximum performance while minimizing their leakage power.

 

This is in a market sector where data processing speed and volume matter, and even fractional percentage improvements can make a real difference.

 

Amongst most industry players, there is growing awareness of the significant value that liquid cooling brings.

 

For example, it allows higher density racks to be deployed, making white space more productive; it facilitates greater data processing capacity and performance (the real work of the data center), increased energy efficiency, lower carbon impact and the opportunity for heat recovery.
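
To make the efficiency point concrete, here is a minimal back-of-envelope sketch in Python of how a lower facility PUE translates into annual energy savings for a fixed IT load. The PUE values and the 1 MW load are assumptions chosen purely for illustration; the article itself quotes no such figures.

```python
# Back-of-envelope estimate of annual facility energy for a fixed IT load
# under an air-cooled versus a liquid-cooled design. The PUE figures are
# HYPOTHETICAL placeholders for illustration; the article quotes no values.

HOURS_PER_YEAR = 8760

def annual_facility_energy_mwh(it_load_kw: float, pue: float) -> float:
    """Facility energy (MWh/year) = IT load (kW) * PUE * hours / 1000."""
    return it_load_kw * pue * HOURS_PER_YEAR / 1000

it_load_kw = 1000    # 1 MW of IT load (assumed)
pue_air = 1.5        # assumed PUE for a conventional air-cooled facility
pue_liquid = 1.15    # assumed PUE for a liquid-cooled facility

air = annual_facility_energy_mwh(it_load_kw, pue_air)
liquid = annual_facility_energy_mwh(it_load_kw, pue_liquid)

print(f"Air-cooled:    {air:,.0f} MWh/yr")
print(f"Liquid-cooled: {liquid:,.0f} MWh/yr")
print(f"Difference:    {air - liquid:,.0f} MWh/yr ({(air - liquid) / air:.0%} less)")
```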

 

Pilots with technologies such as precision immersion (chassis-level liquid cooling) suggest that IT equipment is also more reliable and requires fewer manual service interventions when installed in dielectric-cooled environments.

 

The future doesn't have to resemble the past

 

Technology has evolved, as it does.

 

At the same time, data center cooling technology has reached an inflection point, meaning the change that was already happening is now gathering momentum.

 

The newest chipsets and related solutions being launched by all the major vendors increasingly require liquid cooling.

 

Many websites and much documentation already state this need.

       

The underlying challenge is that superficial changes to the air cooling system, like adding more fans or reducing hardware density, will no longer be enough.
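
One way to see why adding fans eventually stops working is to look at the airflow a rack needs: for a given supply-to-exhaust temperature rise, the required volumetric flow grows linearly with rack power. The sketch below applies the standard relationship Q = P / (rho * cp * dT) with assumed air properties and an assumed 12 K temperature rise; the rack powers are illustrative, not figures from the article.

```python
# Airflow needed to remove a rack's heat with air alone:
#   Q (m^3/s) = P (W) / (rho * cp * dT)
# The air properties and the 12 K supply-to-exhaust temperature rise are
# assumed values, used only to illustrate the scaling.

RHO_AIR = 1.2     # kg/m^3, approximate density of air
CP_AIR = 1005.0   # J/(kg*K), specific heat capacity of air
DELTA_T = 12.0    # K, assumed allowable air temperature rise across the rack

def required_airflow_m3s(rack_power_kw: float) -> float:
    """Volumetric airflow (m^3/s) needed to carry away rack_power_kw of heat."""
    return rack_power_kw * 1000.0 / (RHO_AIR * CP_AIR * DELTA_T)

for kw in (10, 20, 40, 80):
    q = required_airflow_m3s(kw)
    cfm = q * 2118.88  # 1 m^3/s is roughly 2118.88 CFM
    print(f"{kw:>3} kW rack -> {q:5.2f} m^3/s (~{cfm:,.0f} CFM)")
```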

 

IBM, for example, has just announced 2nm chipsets promising 45% higher performance than today's most advanced 7nm processors, further empowering the rise of analytics, AI and machine learning.

 

Enterprise CIOs and IT strategists are increasingly being tasked with solving newer challenges, from advanced analytics, machine learning and artificial intelligence to 5G, the internet of things (IoT) and edge compute.

 

 

That means data center operators, and the digital infrastructure that underpins them, must support much higher power demands and rack densities too.

 

For many companies, the question facing the data center sector is no longer "if", but when and how liquid cooling will become ubiquitous.

 

The cooling stakes in the data center have risen, not just for hyperscalers, university supercomputers and other pioneers, but for colo providers and 'standard' enterprise-level server rooms too.

 

Cooling and dissipating heat from increasingly hot and power-hungry systems and hardware has suddenly become much more challenging.

 


 

This is largely because this slow-moving and conservative industry remains wedded to the use of hugely inefficient air-based cooling systems.

 

Solutions for the liquid cooling risk-reward equation

 

Power users like HPC specialists and high-end researchers have been immersed in the world of liquid cooling for years now.

 

Partly as a result, the range of liquid cooled solutions has developed and grown.

 

Latterly, more specific innovations to suit the requirements of mainstream and colocation data centers have also become available.

 

Today, there are a few key liquid cooling technologies, each with its own pros and cons.

 

While thermal engineers argue that direct-to-chip is the superior approach, not every customer will need that level of performance right down to the CPU or chip.

 

For others, an immersion cooling solution might deliver sufficient thermal improvement.

 

One question we have not yet mentioned is the serviceability of IT equipment and the ongoing process of IT moves, adds and replacements in the white space.

 

If your servers are submerged in an immersion tank, how do you access them safely for essential maintenance and repairs, and what might that mean for warranties?

 

The latest innovation, precision immersion (or chassis-level precision immersion), is essentially a hybrid of liquid cooling approaches, combining the best features of full immersion and direct-to-chip.

 

Optimised at chassis level, precision cooling is focussed on user convenience; it can be retrofitted into the data center using standard equipment racks, reducing risk and complexity whilst simplifying deployment.

 

Bridging the budget gap

 

A recent white paper by data center physical infrastructure leader Schneider Electric investigated the capital costs of immersive liquid cooling versus air cooling in the large data center, and the typical cost differential is not as large as one might expect.

 

In fact, Schneider were able to demonstrate that at a like-for-like rack density of 10 kW in a 2 MW data center, the CAPEX requirement is broadly the same.

 

 

Because compaction is a key benefit of liquid cooling, Schneider also quantified the CAPEX difference when liquid cooling is deployed at 20 kW per rack and 40 kW per rack for the same capacity data center, achieving 10% and 14% CAPEX savings respectively.
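
Taken together, the quoted figures can be restated as a simple like-for-like comparison at equal facility capacity. The sketch below does only that: the percentage savings are the ones cited above, while the baseline air-cooled cost per MW is an assumed placeholder used solely to make the arithmetic visible.

```python
# Restating the Schneider Electric comparison quoted above as a small table.
# Only the percentage savings (0%, 10%, 14%) come from the figures cited;
# the air-cooled baseline cost per MW is an ASSUMED placeholder.

BASELINE_COST_PER_MW = 7_000_000   # assumed illustrative air-cooled CAPEX, $/MW
CAPACITY_MW = 2                    # the 2 MW facility used in the comparison

capex_saving_by_density = {10: 0.00, 20: 0.10, 40: 0.14}  # kW per rack -> saving

air_capex = BASELINE_COST_PER_MW * CAPACITY_MW
for density_kw, saving in capex_saving_by_density.items():
    liquid_capex = air_capex * (1 - saving)
    print(f"{density_kw:>2} kW/rack: liquid CAPEX ${liquid_capex:,.0f} "
          f"vs air ${air_capex:,.0f} (saving {saving:.0%})")
```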

 

More recently, a white paper published by multi-disciplinary engineering company Cundall suggests that savings of up to 20% in CAPEX should be expected.

 

The choice to bridge the gap and deploy liquid cooling involves a complex equation with a lot of moving parts, but it will usually come down to the dollar, pound or euro.

 

Newer chassis-level technologies will continue to meet the cooling requirements of high-density CPUs and GPUs for the foreseeable future.

 

What's more, the technology can deliver space savings, efficiency savings and a lower total cost of ownership (TCO).

 

For the astute data center operator, chassis-level precision liquid cooling makes a sound engineering and business case.

 


 

 

 
 

 

Translation: Xue Yun, non-standard refrigeration and air conditioning product development, Vertiv; member of the DKV (DeepKnowledge Volunteer) program

 

Proofreading: Jia Menglin, HVAC engineer, Alibaba Cloud; elite member of the DKV (DeepKnowledge Volunteer) program

 


 

