Software to the switch: Redefining data centre networking
Networking is catching up with the rest of data centre technology. Lenovo’s innovative Cloud Network Operating System (CNOS),...
Technology drives data centre design, and a primary goal for data centre architects is to build facilities so that services and applications are always available. While their designs are based on current technologies, expectations about near-future developments also shape their intentions. So what can a consideration of such developments since the 1980s – when the data centre as we recognise it began to appear – tell us about the future evolution of these processing powerhouses?
In the 1990s, a typical data centre design was based on the ‘one application per server’ rule, with no deliberate coordination between applications and servers. Over time, however, applications demanded more resources than single servers could provide, and so external storage systems were hooked up to allow applications to share larger datasets. These storage systems proliferated and acquired greater connectivity, turning the 1990s – in data centre terms – into the decade of the storage area network, or ‘SAN’.
The early 2000s saw an important shift take place. Technology from VMware brought a new concept, the ‘virtual server’. This enabled two or more virtualised servers (called ‘virtual machines’ or ‘VMs’) to coexist on the same physical server hardware. Then, as networking and SAN technologies matured, data centre architects were able to place the operating system (OS), the application and the data on the SAN.
By combining virtualisation and SAN features, it was then possible to avoid the ‘big server with internal storage’ silo dilemma. Data, application, VM and OS were all consolidated onto the SAN. Next up, server vendors introduced ‘blade’ servers – compact units installed as multiple separate components in a blade chassis with little or no local storage, but still capable of supporting applications and ‘hypervisors’ (the software that creates and runs VMs).
The blade server model meant that greater densities of processing power could be consolidated into available data centre rackspace. Consolidation thus became the prevalent trend, and soon customer preferences refocused on vendors able to deliver the fastest, safest and most efficient external storage solution. The role of the server, meanwhile, became somewhat marginalised.
While vendors battled to sell the best storage solutions to the mainstream market, leading web-based companies found themselves forced to rethink their specific data centre designs in order to meet escalating demands from internet-driven use cases. These companies arose from an all-digital base, and were sufficiently deep-pocketed to afford to do things their own way.
The most well-known of these rethinkers – Amazon, Google and Facebook – needed ‘mega data centres’ with thousands of servers and huge data storage capacities. They needed massive scale and flexibility, and reliability was a primary requirement. So they developed software solutions to be installed on standard x86 server nodes to meet these demands – solutions now being made generally available as hyper-converged infrastructure (HCI).
This is where the story really starts.
In its simplest configuration, an HCI solution comprises a number of standard x86 server nodes in a cluster connected with a high-performance network. Each node uses a hypervisor layer and a virtual storage appliance (VSA).
The VSA is a controller that combines all the disk resources in the cluster into one pool of storage capacity, and also provides advanced features such as failover and data integrity. In some HCI solutions the controller runs as a virtual machine; in others it is part of the hypervisor layer. A summary diagram of a hyper-converged node architecture looks like this:
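To make the VSA’s pooling and failover roles concrete, here is a minimal Python sketch. It is purely illustrative – the class and method names are hypothetical, not from any vendor’s product – and models a cluster that pools the capacity of node-local disks and replicates each block onto two nodes, so a read still succeeds after a single node failure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One x86 server node contributing its local disks to the shared pool."""
    name: str
    disk_gb: int
    blocks: dict = field(default_factory=dict)  # block_id -> data

class Cluster:
    """Toy model of a VSA: pools node-local disks into one capacity pool
    and replicates each block onto two nodes for failover."""
    def __init__(self, nodes, replicas=2):
        self.nodes = nodes
        self.replicas = replicas

    @property
    def pool_capacity_gb(self):
        # The VSA presents all local disks as a single pool of capacity.
        return sum(n.disk_gb for n in self.nodes)

    def write(self, block_id, data):
        # Place copies on distinct nodes (placement policy is simplified
        # here to "least-loaded first").
        targets = sorted(self.nodes, key=lambda n: len(n.blocks))[: self.replicas]
        for node in targets:
            node.blocks[block_id] = data

    def read(self, block_id):
        # Any surviving replica satisfies the read - this is the failover path.
        for node in self.nodes:
            if block_id in node.blocks:
                return node.blocks[block_id]
        raise KeyError(block_id)

nodes = [Node("node-a", 4000), Node("node-b", 4000), Node("node-c", 4000)]
cluster = Cluster(nodes)
cluster.write("vm1-disk", b"...")
nodes[0].blocks.clear()                     # simulate losing one node
assert cluster.read("vm1-disk") == b"..."   # data survives the failure
print(cluster.pool_capacity_gb)             # 12000
```

Real VSAs add data integrity checks, rebalancing and locality-aware placement on top of this basic pattern.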
The impact of these innovations was marked. When virtualisation solutions became generally available in the mid-2000s, they changed the economic dynamics of the data centre by delivering more bang per buck. By simplifying the technology even further, HCI also brings down ownership costs.
The complexity of the proprietary storage controllers is moved to software in the server nodes, and the overall architecture is made more streamlined. Hyper-convergence consolidates the management tasks, and in some solutions just about everything can be controlled from a single management console.
Implemented correctly, hyper-converged solutions can increase both overall system performance and the specific response times experienced by users. We have seen improvements of up to 40 per cent over the performance levels typical of traditional data centre architectures.
VSA scale-out is a contributory factor in operating speed gains: the fact that there is a controller in each node, plus the close proximity of the controller to the hypervisor and the data, all helps boost performance.
In a hyper-converged solution, nodes are connected and storage is distributed across them, which leads to a great deal of crisscrossing network traffic. This communication must be secure, reliable and high-performance. The network’s design and switches – like Lenovo’s Top-of-Rack (ToR) range, with advanced features such as converged enhanced ethernet and software components – are critical to fully optimised HCI solutions.*
In a hyper-converged data centre infrastructure, features such as availability and performance can be assigned to specific components. In a scenario with three nodes, for example, two switches could be assigned to work together as one to deliver performance as and where needed.
Configuring the switches with an inter-switch link (ISL) means the two physical switches appear and function as one logical switch. The MAC (media access control) tables for the nodes are shared between the switches, and each node can be attached to both physical switches to further increase redundancy. A Virtual Link Aggregation Group (VLAG) configuration, also known as a Multi-Chassis Link Aggregation Group (MC-LAG), can then be applied, allowing both 10Gb node links to be active simultaneously – increasing bandwidth and balancing traffic across the two links.
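As an illustration only – the commands below are a hedged sketch in the style of Lenovo’s CNOS CLI, and the exact syntax, port numbers and tier IDs will vary by switch model and firmware release – a VLAG pair might be configured along these lines:

```
! Illustrative only: CNOS-style syntax; verify against your switch's
! command reference before use. Applied on each of the two ToR switches.
vlag tier-id 10                  ! same tier ID on both peers
vlag isl port-channel 100        ! the ISL carries shared MAC-table state
vlag enable
!
interface ethernet 1/1           ! 10Gb downlink to an HCI node
  channel-group 20 mode active   ! LACP aggregation towards the node
!
vlag instance 1 port-channel 20  ! the node sees one logical switch
vlag instance 1 enable
```

The node’s two links terminate on different physical switches, yet its LACP bond sees a single partner, so both links stay active.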
* Lenovo RackSwitch G8124E, Lenovo RackSwitch G8272 and Lenovo RackSwitch G8296
With inter-node traffic in HCI infrastructures growing as the number of VMs and the volume of data grow, network performance – especially latency – is an increasingly important concern for designers of data centre infrastructure. High-performance computing (HPC) solutions were the first to introduce advanced communication using remote direct memory access (RDMA) over the network: direct access from the memory of one server into the memory of another, without involving either machine’s OS.
Microsoft Storage Spaces Direct (S2D) now supports RDMA over converged ethernet (RoCE), an ethernet protocol that offloads work from the CPU and reduces network latency. RDMA will eventually be supported by all HCI solutions; Lenovo rack switches support it already. In such architectures, the internal network is very ‘flat’, and hence fast and efficient.
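On the Windows Server side, enabling RDMA for S2D traffic over RoCE typically uses the standard networking cmdlets. The sketch below is a hedged example: the adapter name and the 802.1p priority value are placeholders, and a production deployment also needs matching priority flow control configured on the switches.

```powershell
# Enable RDMA on the storage-facing adapter ("SMB1" is a placeholder name)
Enable-NetAdapterRdma -Name "SMB1"

# Classify SMB Direct traffic (TCP port 445) into 802.1p priority 3
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# RoCE requires lossless ethernet: enable priority flow control for that class
Enable-NetQosFlowControl -Priority 3
Enable-NetAdapterQos -Name "SMB1"
```

With the NIC and the ToR switches agreeing on the same lossless priority class, storage traffic bypasses the CPU’s network stack end to end.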
It’s not hard to predict that other new networking developments will be software-defined – architectures where it is possible to ‘daisy chain’ server nodes in a hyper-converged solution. We can think of a further development of networking adapters, networking software and server hardware, all allowing the servers to handle the networking protocols. Switches will still be needed for external communication, but their role will also change.
In an HCI solution with all functionality based in software, it is easier to consolidate information about the setup and to make changes from a single management console. The overall solution will be simpler and more straightforward to grow and manage, and easier to maintain.
When such consolidated management is part of the solution, the upgrade process, root-cause analysis (should errors occur) and the addition of new services will be less demanding of the IT team’s precious time. The Prism tool that is part of the Nutanix solution is an example of this ethos. In a single console – ‘single pane of glass’, as it’s sometimes known – they are able to perform multiple tasks (add services, say, upgrade firmware or troubleshoot the system) with a single click – one-click management, as Nutanix calls it.
As described earlier, the first phase of HCI development focused on replacing the SAN, integrating the hypervisor layer into nodes that also provide storage. A comparison of the currently available HCI offerings reveals differences in features: some offer better integration, some better hardware utilisation, some better management. Clearly, a direction of travel has emerged, and it will continue to advance data centre functionality. Until now, however, we have seen only the initial phases. The next developments will be about cloud integration, and I’ll explain why.
HCI and cloud are correlated, and we can use Microsoft S2D, Azure and Azure Stack to illustrate how that correlation works. The software-defined storage solution Microsoft S2D is the storage engine in Microsoft Azure public cloud and Microsoft Azure Stack on-premises cloud. By adding Microsoft Hyper-V to the mix, it forms the foundation for the HCI solution from Microsoft. Microsoft S2D is a part of Windows Server 2016 – the Microsoft Azure Stack on-premises cloud solution will be launched to the general market later in 2017.
Some of the large cloud providers using hyper-converged solutions in their large scale-out data centres were referenced earlier. With Azure and Azure Stack, Microsoft says that cloud is a platform rather than a place. It is talking about a hybrid solution, where customers can choose to run their service on or off-premises. It will have the same look and feel wherever it is run, and customers can choose where to run their workloads based on requirements, security, cost or other factors. This is all doable because Microsoft is using S2D and Hyper-V in both Azure Stack (on-premises) and the Azure cloud service.
Nutanix is another instructive example of these emergent opportunities. Nutanix describes its solution as the “Enterprise Cloud Platform”: with its integrated management platform providing one-click management, it arguably offers the best-integrated feature set currently available to data centre managers. Until Azure Stack launches officially, Microsoft deployment, by contrast, remains very much command-line based.
VMware, meanwhile, is growing its cloud services from the strong position that the vendor occupies in the data centre. The addition of its vSAN solution is a strong first step towards a seamless integration with cloud, and we can expect that VMware will also soon be able to provide a ‘single pane of glass’-style management console experience.
From their respective market positions, all the vendors discussed here seem to be heading in the same unequivocal direction: steadily toward HCI as the foundation for the data centre of the future. This puts us at Lenovo in the IT equivalent of pole position. We work closely with most HCI software vendors, and know how best to install and configure nodes for any workload.
In principle, software should run as expected in any vendor-specific environment, but it is still important that hardware is tested and certified for the exact solution it is intended to support: the specification and architecture for a Microsoft solution is not necessarily the ideal configuration for a VMware solution. Lenovo’s hardware management solution, XClarity, is designed for integration into the management consoles of HCI stacks (including those from Microsoft, VMware and Nutanix), giving customers an assured one-console management experience.
The central role of the server node in the software-defined data centre demands increased availability, performance, management and security from the platform. Lenovo leads in all these areas. For instance, with security, dual Trusted Platform Modules (TPM) secure both applications and firmware.
Independent evidence supports Lenovo’s market pre-eminence in HCI and other segments. The latest report from Technology Business Research, for instance, shows Lenovo servers receiving top customer satisfaction ratings, and ITIC’s 2017 Global Server Hardware, Server OS Reliability Report found that Lenovo servers deliver the highest availability of all x86 vendors. Both reports provide further evidence that Lenovo is ready to embrace the software-defined future now unfolding for data centres around the world.