Keeping Data Centers Up And Running
Emerson Network Power, which provides services dedicated to protecting data center infrastructure through its Liebert Services business, offers a list of “Best Practices to Avoid Data Center Failure by Human Error” in centers of any size.
“The data center is like a living entity, constantly growing and changing, with many moving parts,” said Ahmad Moshiri, director of power technical support for Emerson Network Power’s Liebert Services business at the Emerson Network Power Learning Center. “There is no doubt that human errors in the data center cause a great deal of downtime, and some of these can be avoided by adhering to some simple steps.”
Shielding Emergency OFF Buttons: Emergency OFF buttons are generally located near doorways in the data center. Often, these buttons are not covered or labeled and are pressed by mistake during an emergency, which shuts down power to the entire data center. Labeling and covering emergency OFF buttons prevents someone from pushing one accidentally.
Documented Method of Procedure: A documented method of procedure is the answer to many unforeseen human errors. This step-by-step, task-oriented procedure mitigates or eliminates the risk associated with performing maintenance. Do not limit the procedure to one vendor, and ensure backup plans are included in case of unforeseen events.
Correct Component Labeling: If protection devices, such as circuit breakers, are not labeled correctly, the error can directly undermine efforts to keep the data center load up. To operate a power system correctly and safely, all switching devices, as well as the facility one-line diagram, must be labeled correctly to ensure the correct sequence of operation. Procedures should be in place to double-check device labeling.
Consistent Operating of the System: Sometimes data center managers get too comfortable with operating the systems, do not follow procedures, forget or skip steps, or perform the procedure from memory and inadvertently shut down the wrong equipment. It is critical to keep all operational procedures up to date and follow the instructions to operate the system.
Ongoing Personnel Training: Ensure all individuals with access to the data center, including IT, emergency, security, and facility personnel, have basic knowledge of equipment so that it’s not shut down by mistake.
Secure Access Policies: Organizations without data center sign-in policies run the risk of security breaches. Having a sign-in policy that requires an escort for visitors, such as vendors, will enable data center managers to know who is entering and exiting the facility at all times.
Enforcing Food/Drinks Policies: Liquids pose the greatest risk of shorting out critical computer components. The best way to communicate a data center’s food/drink policy is to post a sign outside the door that states what the policy is and how vigorously it is enforced.
Avoiding Contaminants: Poor indoor air quality allows dust particles and debris to enter servers and other IT infrastructure. Much of the problem can be alleviated by having all personnel who access the data center wear antistatic booties, or by placing a mat outside the data center. Equipment should also be packed and unpacked outside the data center, because bringing boxed equipment inside increases the chances that fibers from boxes and skids will end up in server racks and other IT infrastructure.