Building a software-defined data centre
Faced with an influx of data and the need to effectively extract its value, companies need to be able to store and analyse data efficiently. Fortunately, many solutions are available to help achieve these goals.
If the 20th century was characterised by the race for oil, then data has arguably been the new oil rush since the early 2000s. If proof were needed, in 2014 Google, a company that specialises in managing data, overtook ExxonMobil to become the world's second most valuable company by market capitalisation, behind only Apple.
As information has grown in significance, new data-related concepts and tools have emerged to help companies manage their data assets. “With the arrival of the Internet of Things (IoT), machine learning and the democratisation of big data, companies have been forced to update their IT infrastructure to get full value from their data,” says Ludovic Levé, managing director France, Data Centre Group.
The Big Data Executive Survey, carried out by New Vantage Partners LLC, shows that 80% of companies consider their investments in big data to have been successful and almost half report a return on investment. In addition, 37% of Fortune 1000 companies have invested more than $100 million in big data over the past five years.
“These projects won’t succeed, however, unless the infrastructure and the storage are used optimally. Nowadays, many different technologies, such as distributed storage, scale-out NAS and object storage on Flash or HDD bays, sit side by side to meet various requirements,” explains Nicolas Mahé, head of server products at Lenovo France.
Distributed storage, which is often associated with the Hadoop Distributed File System (HDFS) and grid computing, offers several advantages over traditional storage solutions. The first is that it is very cost-effective, as it can be deployed directly on the servers and their existing disks.
Another advantage is its proximity to data-processing environments: each node in the cluster provides both storage and processing capability, so computation can run where the data sits. It is worth noting, however, that distributed storage requires considerable configuration effort to perform optimally, for example when it is used for big data workloads.
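The benefit of having storage and processing on the same nodes is data locality: work is scheduled where the data already sits. The following Python sketch illustrates the general idea with a simple two-pass placement. The function `place_tasks`, the block and node names and the slot model are hypothetical; they do not reproduce the scheduling logic of Hadoop or any other product.

```python
# Hypothetical sketch: data-local task placement in a cluster where every
# node both stores data blocks and runs computation. Names are illustrative,
# not the API of HDFS or any real scheduler.


def place_tasks(block_locations, node_slots):
    """Assign one task per block, preferring nodes that already hold a replica.

    block_locations: dict mapping block id -> list of nodes holding a replica
    node_slots: dict mapping node name -> number of free task slots
    Returns (assignment dict block id -> node, count of data-local tasks).
    Blocks that find no free slot anywhere are left unassigned.
    """
    free = dict(node_slots)  # copy so the caller's dict is not modified
    assignment = {}
    local_count = 0
    remote_blocks = []

    # Pass 1: run each task on a node that stores its block, so the data
    # is read from local disks instead of crossing the network.
    for block in sorted(block_locations):
        holders = [n for n in block_locations[block] if free.get(n, 0) > 0]
        if holders:
            node = max(holders, key=lambda n: free[n])  # holder with most free slots
            assignment[block] = node
            free[node] -= 1
            local_count += 1
        else:
            remote_blocks.append(block)

    # Pass 2: leftover tasks go to any node with a free slot; their data
    # must be fetched over the network from a replica holder.
    for block in remote_blocks:
        spare = [n for n in sorted(free) if free[n] > 0]
        if spare:
            assignment[block] = spare[0]
            free[spare[0]] -= 1

    return assignment, local_count


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n2"], "b3": ["n3"]}
    slots = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
    print(place_tasks(blocks, slots))
    # ({'b1': 'n1', 'b2': 'n2', 'b3': 'n4'}, 2)
```

In this toy run, two of the three tasks read their data locally; the third has to pull its block across the network, which is exactly the traffic a locality-aware design tries to minimise.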
Clustered, or scale-out, network-attached storage (NAS) is also worth considering. Unlike distributed storage, however, it does not process data on the cluster nodes themselves; the nodes' CPU power is devoted entirely to storage functions. Thanks to its performance, scale-out NAS is now widely used to handle very high data volumes in fields such as high-performance computing (HPC) and scientific computing. Its advantage is that adding a node increases bandwidth as well as capacity.
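The claim that each extra node adds bandwidth as well as capacity can be expressed as simple arithmetic. This Python sketch assumes identical nodes and uses made-up per-node figures; the `NodeSpec` class, the `efficiency` factor and the scale-up comparison are illustrative assumptions, not specifications of any Lenovo or IBM system.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    capacity_tb: float      # usable capacity per node (illustrative figure)
    throughput_gb_s: float  # sustained throughput per node (illustrative figure)


def scale_out_totals(node: NodeSpec, nodes: int, efficiency: float = 1.0):
    """Aggregate capacity (TB) and throughput (GB/s) of a scale-out cluster.

    Assumes identical nodes; each one adds its full capacity and a fraction
    `efficiency` of its throughput (1.0 means ideal linear scaling).
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return node.capacity_tb * nodes, node.throughput_gb_s * nodes * efficiency


def scale_up_totals(controller_gb_s: float, shelf_capacity_tb: float, shelves: int):
    """Contrast case: adding disk shelves behind one controller grows capacity,
    but all I/O still passes through that controller, so throughput is flat."""
    return shelf_capacity_tb * shelves, controller_gb_s


if __name__ == "__main__":
    node = NodeSpec(capacity_tb=100.0, throughput_gb_s=2.0)
    for n in (4, 8):
        print(n, "nodes:", scale_out_totals(node, n))
    # 4 nodes: (400.0, 8.0)
    # 8 nodes: (800.0, 16.0)
```

Real clusters rarely scale perfectly, which is why the sketch exposes an `efficiency` factor rather than assuming ideal growth.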
This is the type of infrastructure that Lenovo has installed at Cineca, an Italian research consortium, for its HPC environment. Its storage system is based on Lenovo NeXtScale System and System x servers, with IBM’s General Parallel File System (GPFS) on the software side. Scale-out NAS also includes more functionality (backup, restore, compression, replication and so on) than conventional distributed storage solutions. One drawback is that performance drops when these systems are used to store small files, so they are better suited to large files.
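Why small files hurt can be seen with a back-of-the-envelope model in which every file carries a fixed metadata cost on top of its transfer time. The Python sketch below is only that: the function name, the 1,000 MB/s bandwidth and the 1 ms per-file overhead are hypothetical, not measurements of GPFS or any other file system.

```python
def effective_throughput_mb_s(file_size_mb, bandwidth_mb_s, per_file_overhead_s):
    """Throughput achieved when writing a stream of equally sized files.

    Simplified model (an assumption, not a measured property of any product):
    each file pays a fixed metadata cost (create, lookup, close) plus its
    transfer time at the system's streaming bandwidth.
    """
    if file_size_mb <= 0 or bandwidth_mb_s <= 0 or per_file_overhead_s < 0:
        raise ValueError("sizes and bandwidth must be positive, overhead non-negative")
    time_per_file_s = per_file_overhead_s + file_size_mb / bandwidth_mb_s
    return file_size_mb / time_per_file_s


if __name__ == "__main__":
    # Illustrative figures: 1,000 MB/s streaming bandwidth, 1 ms per-file overhead.
    for size_mb in (1000, 1, 0.01):
        rate = effective_throughput_mb_s(size_mb, 1000, 0.001)
        print(f"{size_mb} MB files: {rate:.1f} MB/s")
    # 1000 MB files: 999.0 MB/s
    # 1 MB files: 500.0 MB/s
    # 0.01 MB files: 9.9 MB/s
```

Under these assumptions, the fixed per-file cost dominates once files shrink to a few kilobytes, so effective throughput collapses even though the raw bandwidth is unchanged.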
Although Flash storage is often considered too costly for big data, its adoption is growing. In some cases, enterprises want real-time results, which Flash bays can deliver, especially when data volumes are smaller than those typical of big data. These bays also prove useful when organisations are using in-memory databases (IMDB) such as SAP HANA. And when software-defined storage (SDS) solutions such as VMware vSAN are used with hybrid (SSD and HDD) bays, Flash can also deliver very satisfactory results in terms of cost and performance.
Finally, object storage brings its own advantages. Like the first two types of storage mentioned, it is built on an array of distributed nodes and can store data sets containing billions, or even trillions, of objects. Because these data sets can be replicated across sites in different geographical locations, they can serve different queries on the same data in various parts of the world. Many vendors are now offering solutions built directly on object storage: Lenovo, for example, has partnered with Cloudian to offer dedicated appliances.
All these technologies call for smart storage management. A simple storage area network (SAN) is no longer enough. SDS is becoming essential, whether through VMware vSAN (or similar products) or open-source-based solutions such as OpenStack or Red Hat Ceph Storage. These technologies can make data centre infrastructure more scalable and, above all, easier to administer. SDS can also integrate functions such as backup, restore, compression, and merge and purge directly.
Object storage can also provide greater data protection, which is one of the main concerns of IT directors, especially as the forthcoming European General Data Protection Regulation (GDPR) promises to be extremely rigorous. In addition to conventional software-defined networking (SDN), security information and event management (SIEM) and infrastructure protection solutions, some manufacturers are proposing security ‘by design’, building security into the hardware itself. Lenovo offers this by building firmware integrity verification directly into its chips, which is designed to prevent malicious code from running on its systems.
To meet the cost and performance targets they have set themselves, companies today have a wide choice of solutions for processing and storing large volumes of data. “It is important to ensure that the infrastructure is fully scalable so that it can process large volumes of unstructured data. One thing is certain: whether companies opt for simple distributed storage, which is easy to implement and fairly cheap, or for vast scale-out NAS infrastructures, which are costlier but offer better performance, they will have to upgrade their systems if they are going to extract maximum value from their data,” concludes Levé.