Questions You Should Be Asking Your Hosting Provider
Internet data centres probably look much the same to the casual eye: mission-critical computer systems and servers racked up into cabinets, forming single rows with access corridors between.
They will have some sort of environmental controls, such as air conditioning and fire suppression, as well as redundant or backup power supplies and redundant data communications. But not all Internet Data Centres are created equal. The devil is in the detail of what they provide.
For enterprises that demand high availability, or as close as possible to 100 per cent uptime, it is critical to first conduct a detailed audit of potential hosting providers.
This involves digging below the surface of marketing material to ensure adequate infrastructure and procedures are in place to maintain the required uptime of hosted server environments.
The main aim of an Internet Data Centre (IDC) is to eliminate as many single points of failure as possible within the infrastructure and systems of a facility. This is achieved through ‘redundancy’.
However, redundancy costs money, because instead of one system, you are purchasing two or more. Potentially, a provider can keep adding layers of redundancy to reduce the risk of downtime or failure to an absolute minimum. Because each additional layer adds cost, the decision becomes a trade-off between risk and cost.
Some systems, for instance the EFTPOS system, must be available 100 per cent of the time. Consequently, an enormous amount of money is spent to ensure redundancy and fault tolerant systems are in place to get as close to 100 per cent uptime as possible.
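To put "as close to 100 per cent uptime as possible" into concrete terms, the short sketch below converts an uptime percentage into the hours of downtime it permits over a 24x365 year. The function name and the sample targets are illustrative assumptions, not figures quoted by any provider.

```python
HOURS_PER_YEAR = 24 * 365  # the 24x365 operating year used throughout this article


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Return the hours of downtime per year implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Illustrative targets only; they are not taken from any provider's SLA.
    for target in (99.0, 99.9, 99.99):
        hours = downtime_hours_per_year(target)
        print(f"{target}% uptime allows about {hours:.2f} hours of downtime a year")
```

Even a 99 per cent target leaves room for more than three days of outage a year, which is why the gap between marketing figures and actual infrastructure is worth auditing.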
The two key issues for an IDC are the reliability of its power supply and its cooling systems. Servers cannot operate unless both systems are fully functional, 24 hours a day, 365 days a year. If cooling systems go down, server rooms heat up extremely quickly and, ultimately, server shut-downs must be instigated to avoid hardware and system damage. For companies running 24x365 internet-based operations, such as e-commerce, emergency server shut-downs are not a viable option.
Therefore, IDCs must have power and cooling systems that are both reliable and redundant.
Redundancy is provided by either:
- two devices or systems running and functioning in parallel with one another, or
- a backup standby device or system that will automatically come into operation if the primary device/system fails.
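As a rough illustration of why either form of redundancy helps, the sketch below estimates the combined availability of several redundant units. It assumes that failures are independent, that any one working unit can carry the full load, and that fail-over is instantaneous. These are simplifying assumptions for the example, not claims about any real facility, and the function name is hypothetical.

```python
def combined_availability(unit_availability: float, units: int) -> float:
    """Availability of a group of redundant units.

    Assumes independent failures and that a single working unit is enough,
    so the group is down only when every unit is down at the same time.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    return 1.0 - (1.0 - unit_availability) ** units


if __name__ == "__main__":
    # Illustrative figure: each unit is assumed to be available 99% of the time.
    for count in (1, 2, 3):
        print(f"{count} unit(s): {combined_availability(0.99, count):.6f}")
```

Under these assumptions each extra unit removes most of the remaining downtime, but it also adds the full cost of another system, which is the risk versus cost argument in practice.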
The ability to scale both of these systems (power and cooling) is also incredibly important with the trend toward denser, more highly concentrated server hosting environments (ie, the growing popularity of blade servers that have a high concentration of CPUs in a small space).
As a result, blade servers can cause power and cooling issues for some IDCs.
Virtually all power into hosted machines is converted into heat that must be dissipated. Some older IDCs have power systems incapable of handling the higher loads of modern server environments, and cooling systems unable to cool cabinets containing densely packed servers or blade server environments. While the demand for blade server hosting is increasing, the supply of IDCs that can handle them is lagging behind. So, if you want to run blade servers, be sure your IDC can cool them effectively.
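Because virtually all of the electrical power becomes heat, a cabinet's power draw gives a first estimate of the cooling it needs. The sketch below converts an electrical load in kilowatts into BTU per hour and tons of refrigeration using standard conversion factors. The function name and the 10 kW example cabinet are assumptions made for illustration.

```python
from typing import Tuple

BTU_PER_HOUR_PER_KW = 3412.14    # 1 kW of electrical load is about 3,412 BTU/h of heat
BTU_PER_HOUR_PER_TON = 12000.0   # 1 ton of refrigeration removes 12,000 BTU/h


def cooling_load(it_load_kw: float) -> Tuple[float, float]:
    """Return (BTU per hour, tons of refrigeration) needed to remove the heat
    produced by it_load_kw of equipment, assuming all power becomes heat."""
    if it_load_kw < 0:
        raise ValueError("it_load_kw must not be negative")
    btu_per_hour = it_load_kw * BTU_PER_HOUR_PER_KW
    return btu_per_hour, btu_per_hour / BTU_PER_HOUR_PER_TON


if __name__ == "__main__":
    # Hypothetical densely packed cabinet drawing 10 kW.
    btu, tons = cooling_load(10.0)
    print(f"10 kW cabinet: {btu:.0f} BTU/h, about {tons:.2f} tons of cooling")
```

This is only the heat from the hosted equipment itself; a real cooling design would also account for lighting, people and losses in the power systems.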
If you are intending to use an Internet Data Centre, below is a list of questions you should ask.
1. Business Stability and Environment Ownership:
a. How long have you been in business?
b. Is your company stable and profitable, and viable over the longer term?
c. Do you own your own Internet Data Centre? (ie, do you have direct control over the facility?)
d. What size of client do you service? Is the business sector your focus, or is residential mass market your priority?
Comments:
A number of hosting providers are in fact ‘Virtual Providers’ - they do not own the IDC their gear is housed in and therefore do not have direct control over infrastructure designed to support it.
The type and size of client being served will provide an indication of the IDC's capability. If the provider serves large organisations, this gives you some indication that the IDC has met their stringent requirements, as larger organisations will likely undertake detailed audits of IDCs before signing up.
2. Power System Redundancy:
a. Do you have a mains power system that is scaleable and capable of managing additional power requirements as I grow my hosted environment?
b. Do you have centralised Uninterruptible Power Supply (UPS) systems and backup generator systems in place?
c. For High Availability customer requirements, do you have UPS and generator redundancy?
Comments:
A number of existing Data Centres were built some years ago when the power requirements of server equipment were more modest and the variation in server power use was not great (5 per cent or less between high and low CPU utilisation). The average power utilisation of server systems has been trending up for a number of years as the processing power of CPUs increases. Power management technologies now deployed in servers and other communication equipment have resulted in a huge variation in power draw between a server operating at full capacity and one in idle mode. For example, blade server power draw variation can be as much as 80-90 per cent.
A Data Centre, therefore, has to have the capacity to increase the total amount of power it delivers to its server rooms, and its switchboards and backup generation have to be equal to the task. Some IDCs simply can't deliver, as their legacy power systems are not scaleable. It's very difficult to take systems offline for upgrades when you have customers reliant on 24x365 availability.
UPS and backup generators capable of carrying the full load of all hosted server equipment are mandatory for an IDC to ensure continuity of supply. Server equipment housed in High Availability facilities must be connected through redundant dual power supplies to the cabinet, dual UPS (run in parallel), and ideally dual back-up generators.
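One way to test a provider's answer on UPS and generator redundancy is to check whether the remaining units could still carry the full load if the largest one failed. The sketch below does that arithmetic. The function name and all capacity figures are hypothetical, and a real assessment would also consider battery run time, power factor and growth headroom.

```python
from typing import Sequence


def survives_single_failure(load_kw: float, unit_capacities_kw: Sequence[float]) -> bool:
    """Return True if the full load can still be carried after the largest
    single unit (UPS or generator) fails."""
    if len(unit_capacities_kw) < 2:
        # With one unit or none there is no redundancy at all.
        return False
    remaining_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining_kw >= load_kw


if __name__ == "__main__":
    # Hypothetical facility with a 400 kW hosted load.
    print(survives_single_failure(400, [500, 500]))  # dual UPS, each sized for the full load
    print(survives_single_failure(400, [300, 300]))  # dual UPS that only cope when both run
```

The second case shows why "we have two UPS units" is not a complete answer: two units that share the load but cannot each carry it alone are not redundant.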
3. Key Infrastructure Maintenance:
a. Do you regularly maintain/check/test your Mains Power Supply systems and have service contracts with qualified personnel in place?
b. Do you regularly maintain/check/test your UPS and have service contracts with qualified personnel in place?
c. Do you regularly maintain/check/test your backup generator and have service contracts with qualified personnel in place?
d. Do you regularly maintain/check/test cooling systems and have service contracts with qualified personnel in place?
Comments:
Mains power, UPS, generator and cooling systems require specialists to maintain and service them effectively. Agreements with suitable service levels should be in place with external specialists to ensure that regular maintenance and testing are undertaken.
4. Physical Structure and Location:
a. Do your co-location rooms have concrete floors, concrete exterior walls and concrete ceilings?
b. Are your co-location rooms on the ground floor, in a basement or on the first or second floor of the building?
c. Is the IDC in an area prone to flooding? (ie, is it near a river or stream, at the bottom of a valley, or close to the harbour's edge?)
d. Is there plumbing running above the server cabinets?
e. Is the IDC directly in the landing path of an international airport?
f. Are there any fuel dumps, major gas pipelines, petrol stations, liquefied petroleum storage tanks or any highly combustible substances stored nearby?
g. Do you have raised flooring in your co-location rooms?
Comments:
Ideally, a co-location room should have concrete floors so cabinets can be seismically secured to the floor to mitigate earthquake risk by stopping server cabinets toppling over. Concrete exterior walls mitigate security risks, and concrete ceilings are preferred to tin roofing due to the risk of leaks.
It is important that the co-location rooms are on the ground floor as server equipment is heavy - a single cabinet can have 800kg or more of gear in it, so the floor of a 100-cabinet co-location room would have to be able to support 50-100 tons of weight. For this reason, co-location rooms on the first and second floors of buildings designed for general office use run the risk of exceeding weight limits.
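The floor loading arithmetic is simple enough to check for any room size. The sketch below multiplies cabinet count by weight per cabinet and reports the result in metric tonnes (1,000 kg), which is an assumption about the unit intended above. The function name is hypothetical, and the 500 kg and 1,000 kg figures are illustrative bounds chosen to match the 50-100 range rather than measured values.

```python
def total_floor_load_tonnes(cabinets: int, kg_per_cabinet: float) -> float:
    """Return the total equipment weight on the floor in metric tonnes."""
    if cabinets < 0 or kg_per_cabinet < 0:
        raise ValueError("cabinets and kg_per_cabinet must not be negative")
    return cabinets * kg_per_cabinet / 1000.0


if __name__ == "__main__":
    # 100 cabinets at the 800 kg figure, plus illustrative lighter and heavier loads.
    for kg in (500, 800, 1000):
        tonnes = total_floor_load_tonnes(100, kg)
        print(f"100 cabinets at {kg} kg each: {tonnes:.0f} tonnes")
```

This gives only the total weight. What a structural engineer will ask about is the load per square metre, which depends on cabinet footprint and layout.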
Co-location rooms situated in basements run the risk of being quickly flooded in the event of leaks or external floods, and often basement ceilings have mains water supplies running across them, meaning the co-located servers are seriously at risk if a leak occurs.
Obviously being in a flood-prone area is not a good idea - sensitive electronic equipment and water do not mix well.
It might be rare, but planes do crash and the threat is significantly higher if the IDC is near a busy international airport, in the flight path of planes landing and taking off.
Likewise, customers who take their hosting environment seriously will not consider an IDC close to a site storing highly combustible substances due to the risk of serious damage if an incident ever happened.
Raised flooring in the co-location rooms means that if water does ever leak or get into these rooms, the cabinets and server equipment are unlikely to come into contact with a minor flood.
Ideally, the IDC should have all water piping servicing heating, ventilation and air-conditioning (HVAC) systems under the raised flooring, and have water sensor and alarm systems in place to detect potential leaks.
5. Connectivity to Outside World:
a. How many upstream international bandwidth providers do you use?
b. Are you load-balanced across your upstream providers?
c. How many fibre optic circuits do you have into your IDC?
d. Do you have primary and secondary circuits for redundancy purposes?
e. Are your circuits through one carrier or do you use multiple carriers?
f. Are circuits into your IDC in separate ducting or trenches out on the road and do they come into the building at different entry points?
General Comments:
For an IDC, reliable internet connectivity is critical. International bandwidth redundancy is very important as approximately 80 per cent of bandwidth used by New Zealanders is international (ie, residential and business customers accessing sites offshore rather than locally). It is only a matter of time before an international provider has short-term issues on its network due to congestion or hardware or software failure. So it is important that an IDC is not overly dependent on a single upstream provider but is ‘load-balanced’ across multiple providers. If one fails, bandwidth is still available to hosted customers via the others.
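Load-balancing only protects customers if the surviving providers have enough spare capacity. The sketch below works out, for each provider in turn, how heavily the remaining links would be used if that provider failed. The provider names, capacities and traffic figure are all hypothetical.

```python
from typing import Dict


def utilisation_after_failure(
    peak_traffic_mbps: float, capacities_mbps: Dict[str, float]
) -> Dict[str, float]:
    """For each provider, return the utilisation of the surviving links if
    that provider fails. A value above 1.0 means the survivors cannot carry
    the peak traffic."""
    results = {}
    for failed in capacities_mbps:
        surviving = sum(
            capacity for name, capacity in capacities_mbps.items() if name != failed
        )
        results[failed] = float("inf") if surviving == 0 else peak_traffic_mbps / surviving
    return results


if __name__ == "__main__":
    # Hypothetical upstream providers and capacities, in Mbps.
    upstreams = {"Provider A": 1000.0, "Provider B": 500.0, "Provider C": 500.0}
    for failed, load in utilisation_after_failure(600.0, upstreams).items():
        print(f"If {failed} fails, the remaining links run at {load:.0%} of capacity")
```

An IDC with a single upstream shows up here as infinite utilisation, which is the dependency the questions above are designed to expose.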
The physical connectivity from an IDC to the outside world is also important. Ideally, an IDC has multiple circuits through a variety of carrier networks, with a primary connection and a secondary (fail-over) connection for each carrier's circuit.
Ensuring that each fibre optic circuit is physically separated will reduce the likelihood of all circuits being accidentally cut. Again, this is one of the details that mission critical enterprises will require from a potential hosting supplier. They may request street level maps of fibre optic circuits to ensure ‘physical diversity’.
Physical diversity basically means that individual fibre connections are contained in separate trenches, which means they cannot all be accidentally dug up by a roading contractor.
6. Alarms, Monitoring, Fire Systems and Access:
a. Do you have monitoring and alarms on key parts of your infrastructure?
b. Do you have Closed Circuit TV monitoring IDC entranceways and co-location rooms?
c. What smoke detection systems do you have in place?
d. Do you have intrusion monitoring and alarms in place and are these monitored?
e. How is authorised access to the IDC managed?
f. Do you have flood monitors and alarms in your co-location space and in other key areas?
g. In the event of a fire, how are fire services alerted and are they conversant with dealing with a fire in a co-location space?
h. What kind of fire fighting equipment is in place?
General Comments:
All key equipment and systems in the IDC need to be monitored so engineers and building managers are alerted by alarms when a device or system starts, or even looks likely, to fail. Multiple types of alarm systems should be in place so that the IDC is not reliant on one system or person in the event of a failure. Ideally, the equivalent of a full Building Management System should be in place, so every part of the building and physical infrastructure is monitored and alarmed.
For security purposes there should only be a limited number of entranceways to co-location rooms, and these entranceways should have proximity card or biometric access systems in place. Alarm code systems should only let authorised personnel enter the IDC. There should be a minimum of two solid doors - either side of a ‘mantrap’ - that authorised personnel must pass through using their proximity cards.
All entranceways should also be monitored by CCTV, and for high security rooms, CCTV cameras should operate down every cabinet row in the actual co-location room.
Police checks should be run on all staff working in the Network Operations Centre and on staff with direct access to server equipment. Access to co-location rooms should be limited to staff directly responsible for supporting hosted clients. Controls and procedures need to be in place to govern the activity of customers when accessing the server rooms.
Some co-location facilities run water-based cooling systems, which means there are copper water pipes running to process coolers that reside inside the co-location space or in adjacent corridors. While these copper pipes lie beneath the level of the raised floor, it's important that water detection alarm systems are in place to immediately warn of a water leak.
Not all New Zealand data centres have raised floors, and in those without them water detection alarms are even more important. The water detection alarm systems should also be near key electrical equipment, such as the centralised UPS systems.
Co-location rooms should have Very Early Smoke Detection Apparatus (VESDA) systems in place. These systems constantly monitor air particles, raising the alert of a potential fire before one actually starts. The VESDA system typically raises an alert directly with the fire department, as well as NOC and building management staff.
The fire department needs to know how to access the building, and how to respond to fires involving co-location equipment without causing more problems than they are solving.
Some co-location facilities have automated gas fire suppressant systems in place. However, these are very expensive to implement and maintain, and are not feasible in very large co-location rooms or those designed with a high stud for cooling. In the absence of gas suppressant systems, manually operated CO2 fire extinguishers should be readily available in key areas of the IDC.
7. Network Redundancy:
a. Do you operate a full mesh core network?
b. Do you run a 100 Mbps, 1 Gbps or 10 Gbps core network?
c. Are you able to make Hot Standby Router Protocol services available to customers?
d. Do you make BGP routing available to your customers who require this level of redundancy?
Comments:
A ‘full mesh network’ is important to eliminate single points of failure within the core network of a Data Centre. A full mesh network means each device is connected to every other device in the core network, providing multiple paths through the network. This greatly enhances the uptime of the overall network and increases the availability to the outside world for customers hosted in the facility.
The bandwidth requirements of hosted enterprises are on average doubling every 18 months worldwide. This places huge demands on a hosting facility's networks that must cope with increases in network traffic. IDCs need to scale their core network capability to meet this requirement.
Hot Standby Router Protocol (HSRP) and BGP routing are important considerations as they help to eliminate reliance on the router that connects the hosted enterprise to the core network of the IDC.
8. Disaster Recovery:
a. What disaster recovery contingencies do you have?
b. Do you have insurance to cover a major catastrophe?
c. Can you provide multi-site hosting for redundancy purposes?
Comments:
An IDC should have disaster recovery plans to mitigate the risk of both minor and major catastrophes. These plans should cover eventualities ranging from a failure in a single system through to the complete destruction of the facility, such as through a major earthquake.
Insurance cover should be in place to cover the cost of restoring the facility and any loss of business revenue if such events occur.
The IDC should be able to provide offsite hosting for high availability customers, so that if the primary site becomes unavailable, hosted machines at the secondary site will still be available to the outside world.
To achieve this, an IDC will likely have an arrangement with another data centre to host important back-up network equipment offsite, as well as equipment for customers requiring offsite disaster recovery hosting. It may also have backup equipment for various services hosted overseas for redundancy and disaster recovery.