
MI

Modern Infrastructure

Citrix Synergy and Modern Infrastructure Decisions Summit


Creating tomorrow's data centers

EDITOR'S LETTER

#HASHTAG

Surface Tension

Twitter on
#BigData

CONTAINER SECURITY

SURVEY SAYS

Containment
Strategy

Platform as a Service

TWO QUESTIONS

NETWORKING

Contained Chaos

Striving for Simplicity

TECHNICALLY SPEAKING

THE NEXT BIG THING

Dell Pickle

Spark-ing the
Big Data Bonfire

Is It Big Data, or Fast?


Big data is growing at a rapid pace,
and enterprise IT wants its info
faster than ever.

NOVEMBER/DECEMBER 2016, VOL. 5, NO. 10

EDITOR'S LETTER

Home
Editor's Letter
Is It Big Data, or Fast?
Hashtag
Surface Tension
Containment Strategy
Survey Says
Two Questions
Striving for Simplicity
Technically Speaking
The Next Big Thing

WHAT WAS ONCE merely confusing is now exceptionally complex. Data center networks are no longer just a maze of physical cables, but a tangled web of overlays and firewall rules. Database management is about more than ensuring you have enough capacity, as companies collect increasing volumes of data and expect real-time analysis.
And yet users demand simplicity; they expect the underlying infrastructure to be invisible. Executives want IT to function like a utility. When they turn on the tap, they don't care about the plumbing required to deliver the water; they simply want it to work. This is the tension threatening to plunge IT shops into chaos: to build and support ever more complex data center infrastructure while making it appear effortless.
Nowhere is this tension clearer than in the growing demand to store and digest big data. However, it's not just about big data volume today. It's about doing something with that data, and doing it now. Contributor Paul Korzeniowski explores this tension and some of the tools emerging to help IT professionals in "Is It Big Data, or Fast?"
Curiously, technologies that once aimed to streamline operations have sometimes led to more complexity. Networking overlays, for example, have given operators the ability to steer traffic and create logical resource pools, but they also come with additional management overhead. In "Striving for Simplicity," contributor Ethan Banks explains the problem with managing and maintaining today's multi-layered data center networks and challenges the idea that more overlays are the answer to the complexity problem.
In today's world, the IT professional must handle complex problems with a steady hand. When things go right, everybody just sees water flowing from a tap.
NICK MARTIN is Executive Editor of Modern Infrastructure. You can reach him at [email protected].

MODERN INFRASTRUCTURE NOVEMBER/DECEMBER 2016

BIG DATA

Is It Big Data, or Fast?

The need to provide employees with immediate access to information is reshaping the big data market.
BY PAUL KORZENIOWSKI

DATA GENERATION IS increasing at mind-boggling rates, and the evidence surrounds us: 21 million tweets and 9 billion email messages are sent every hour. Soon, even more
information will be created. Sensors will collect performance data on items like light bulbs, personal medical
devices will monitor insulin rates and inventory will be
tracked as it moves from place to place.
As a result, IDC expects data volumes to double every two years and reach 40 zettabytes (a zettabyte equals one million petabytes) in 2020. How corporations use that information will be the difference between business success and business failure. Consequently, enterprises want to do more than collect information for future analysis; they want to evaluate it in real time, a desire that is dramatically changing the data management market.
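The doubling rule is easy to sanity-check with a few lines of Python. This is only a rough sketch of the projection's arithmetic, working backward from the article's 2020 figure, not IDC's actual model:

```python
# Rough sanity check of the IDC projection: data volume doubling
# every two years and reaching 40 zettabytes in 2020.
def projected_volume_zb(year, base_year=2020, base_zb=40.0):
    """Volume in zettabytes, assuming a clean two-year doubling."""
    return base_zb * 2 ** ((year - base_year) / 2)

for year in (2014, 2016, 2018, 2020):
    print(year, round(projected_volume_zb(year), 1))
# 2016 works out to 10.0 ZB, 2018 to 20.0 ZB, 2020 to 40.0 ZB
```

Under that assumption, the data volume of 2016 would be only a quarter of what IDC projects for 2020.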
Recently, big data systems have been all the rage. In
fact, IDC projects that the market will grow at 23.1% annually and reach $48.6 billion in 2019. These systems have
been gaining traction for a few reasons. They allow organizations to collect large volumes of information and use
commodity hardware and open source tools to examine
it. Businesses are then able to justify deployments that are
much less expensive than traditional proprietary database
management systems (DBMSes). Consequently, Hadoop

clusters built from thousands of nodes have become common in many organizations.

INSTANT GRATIFICATION

But with competition increasing, management is placing new demands on IT. "Knowledge is power, and knowledge of yesterday is not as valuable as knowledge about what's happening now in many, but not all, circumstances," said W. Roy Schulte, vice president and analyst at Gartner.
Businesses want to analyze information in real time, an emerging practice dubbed "fast data." Traditionally, acting on large volumes of data instantly was viewed as impossible; the hardware needed to support such applications is expensive. But the use of commodity servers and the rapidly decreasing cost of flash memory now make it possible for organizations to process large volumes of data without breaking the bank.
In addition, new data management techniques are
emerging that enable firms to analyze information
instantly.
Transaction systems include checks, so only valid transactions take place. A bank would not want to approve two
transactions entered within milliseconds that took all of
the money out of a checking account. Analytical systems collect information and illustrate trends, such as more time being taken by call center staff handling customer inquiries. By linking the two, corporations could build new applications that perform tasks, like instantly approving a customer's request for an overdraft because the client's payment history is strong.
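Linking the two might look like the following toy sketch in Python. The data, function names and thresholds are all invented for illustration, not any vendor's API: a transactional check rejects near-simultaneous withdrawals, while an analytical score over payment history drives the overdraft decision.

```python
from datetime import datetime, timedelta

# Toy transactional check: reject a second withdrawal arriving within
# milliseconds of the previous one, as in the two-transactions example.
def valid_withdrawal(balance, amount, last_tx_time, now, window_ms=100):
    if (now - last_tx_time) < timedelta(milliseconds=window_ms):
        return False  # suspiciously close to the previous transaction
    return amount <= balance

# Toy analytical score: share of past payments made on time.
def payment_history_score(payments):
    on_time = sum(1 for p in payments if p["on_time"])
    return on_time / len(payments)

# Linking the two: approve an overdraft only for strong payment history.
def approve_overdraft(payments, threshold=0.9):
    return payment_history_score(payments) >= threshold

history = [{"on_time": True}] * 19 + [{"on_time": False}]
print(approve_overdraft(history))  # 0.95 >= 0.9, so True
```

A real system would evaluate both checks as events stream in, rather than querying a store after the fact.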

POTENTIAL PRODUCTS ABOUND

Traditional data management systems worked only with data at rest, storing information in memory, on a disk, in a file, in a database or in an in-memory data grid and evaluating it later. Emerging products, which are being labeled streaming systems, work with data in motion: information that is evaluated the instant it arrives.
The new streaming platforms use a variety of approaches, all with the goal of delivering immediate analysis. "You don't need any DBMS at all for some fast data applications," noted Gartner's Schulte.
In certain cases, traditional DBMS products have morphed to support the new functionality. For instance, Hadoop is a parallel data processing framework that has traditionally relied on a MapReduce job model. Here, data is collected, and batch jobs, which take minutes or hours to complete, eventually present the data to users for

HIGHLIGHTS

The need to provide employees with immediate access to info is reshaping the big data market.

IDC expects data volumes to double every two years and reach 40 zettabytes in 2020.

Companies want to evaluate data in real time, completely changing the data management market.


evaluation. To address the demand for fast data, the Apache Software Foundation, which oversees Hadoop, took on Spark, which runs on top of Hadoop and provides an alternative to the traditional batch MapReduce model. Spark supports real-time data streams and fast interactive queries.
However, in some cases, a business wants to store a copy of the event stream and use it for later analysis. Originally developed by the engineering team at Twitter, Apache Storm processes unbounded streams of data at a rate of millions of messages per second. Apache Kafka, developed by engineers at LinkedIn, is a high-throughput distributed message queue system designed to support fast data applications. In addition, start-ups are adding streaming functionality to NewSQL and NoSQL systems, trying to bridge the traditionally antithetical desires of processing fast data and storing information for later analysis.
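The batch-versus-stream distinction can be illustrated with a toy word count in plain Python generators; this is only a sketch of the two models, not Spark's or Storm's actual APIs. The batch version waits for the full data set, while the streaming version updates its answer as each record arrives.

```python
from collections import Counter

# Batch model: collect everything first, then run one job over it all.
def batch_word_count(records):
    return Counter(word for line in records for word in line.split())

# Streaming model: update state the instant each record arrives.
def streaming_word_count(record_stream):
    counts = Counter()
    for line in record_stream:
        counts.update(line.split())
        yield dict(counts)  # an up-to-date answer after every record

events = ["big data", "fast data", "big bets"]
print(batch_word_count(events)["data"])  # 2, but only after all records
snapshots = list(streaming_word_count(events))
print(snapshots[0])  # after the first event: {'big': 1, 'data': 1}
```

The streaming version gives a usable answer after every event, which is the property fast data applications pay for.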

PERISHABLE INFORMATION

The various products are built to handle the increasingly high-volume, complex information streams that businesses generate. Examples of the new data sources include news feeds, web clickstreams, social media posts and email. This new, usually unstructured data (information that does not fit neatly into the rows and columns of a traditional DBMS) is growing at higher rates than structured data. Consequently, these emerging data repositories ingest large amounts of diverse information, as many as millions of inputs every second.
Large companies already have thousands of event
streams running in the organization at any given moment.

How Big Is Big Data?

TECHNICALLY SPEAKING, a zettabyte is 10^21 bytes, or a billion terabytes, but data capacity numbers that large can be hard to digest. In more practical terms, a zettabyte could be expressed as the equivalent of 152 million years of high-definition video. Forty zettabytes (the level IDC expects data volume to reach in 2020) split among the 7 billion people on earth equals about 5.7 TB per person.
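The sidebar's arithmetic checks out in a few lines of Python:

```python
ZETTABYTE = 10**21  # bytes
TERABYTE = 10**12   # bytes

# A zettabyte is a billion terabytes...
print(ZETTABYTE // TERABYTE)  # 1000000000

# ...and 40 ZB split among 7 billion people is roughly 5.7 TB each.
per_person_tb = 40 * ZETTABYTE / (7 * 10**9) / TERABYTE
print(round(per_person_tb, 1))  # 5.7
```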

Naturally, firms want to tap into that information and improve operations. Many of these moments are "perishable insights," urgent business risks and opportunities that firms can only detect and act on at a moment's notice, according to Mike Gualtieri, vice president and principal analyst at Forrester Research.

EQUIPPED FOR BATTLE

Wargaming.net is an online multiplayer game developer that was founded in 1998. In June 2015, the company was searching for a fast data platform to support a 100-node, 200 TB application running on Apache Spark. The gaming firm evaluated products and opted for Cloudera because of its strong customer support, according to Sergei Vasiuk, development director at Wargaming.net.
The gaming supplier began deploying the product in
June 2015 and had it operating by the end of the year.


Currently, a dozen fast data applications support functions
that include securing network connections and analytics
outlining how well individual players fare.


BUILDING A NEW INFRASTRUCTURE



But most companies are not ready for fast data for a number of reasons. First, the applications are complex and
hard to build, and almost always combine data from multiple sources. For example, a telecom support application
links incoming call data with customer profiles in order to
enable contact center agents to upsell, offering coupons
for an upgrade to a higher-tier calling plan.
For such connections to be coded into the applications,
new development tools are needed.
Developers require products that create streaming
flows and rely on new runtime platforms. These tools are
now being developed. However, as with any first-generation solution, the current products lack the amenities found in older, more mature products, such as robust development, testing, integration and administration functionality. Often, the customer has to write the code to deliver that functionality, which increases development time as well as the complexity of system maintenance.
Because the streaming platforms and development
tools are new, many IT departments have little to no
experience with them. Firms need to develop different
design practices than those now used with traditional IT
architectures. Employees then need to work with them in

order to understand how to best write these applications.


Finally, the products are expensive. While many suppliers offer free, limited-function entry systems, pricing for fast data products can quickly rise into the hundreds of thousands of dollars. These projects can cost millions to deploy once other factors, like development tools and labor costs, are factored into the equation.

GET READY FOR THE FUTURE

The advent of mobile and social media is altering customer expectations: They want answers right now. So, firms need to collect more information and move immediately in order to satisfy customer demands.
As noted, new data sources are gaining traction, and their future is bright. "The internet of [things] is the single biggest driver for fast data demand," Gartner's Schulte stated. "By 2020, more than half of all new application projects will incorporate some amount of IoT processing, large or small. Some of these will use a stream analytics platform, and the remainder will write stream processing into the application code."
Businesses are generating new types of information,
and the volume of data is growing significantly as a result.
The need for immediate analysis is becoming a more
common expectation. Consequently, a variety of new platforms have emerged and are jockeying for acceptance as fast data reshapes the data management marketplace.

PAUL KORZENIOWSKI is a freelance writer who specializes in modern infrastructure issues. Email him at [email protected].


#Hashtag
Twitter on #BigData

Patrick Demaret @patrickdemaret
90% of the data that exist in the world today has been created in the last 2 years! #bigdata via @SenseableCity @crassociati at #SFE16

Ernest Moniz @ErnestMoniz
We're working with the @VA to help heal those who served, poring through the VA's enormous data sets. #CancerMoonshot #BigData

Kevin McIsaac @DataScienceAUS
There is a massive gap between #bigdata & customer insights that needs to be bridged with #predictiveanalytics. What are you focused on?

Chiru Bhavansikar @AskChiru
Obama: My Successor Will Govern a Country Being Transformed by AI. #AI #IA #BigData #DataScience

Dario Olivini @xflofoxx
Self data analysis or guided dashboards? What will you prefer? #BusinessIntelligence #BigData #futureBI

Andy D @HITstrategy
Focus on the most impactful use cases 4 HC Analytics rather than how 2 use #BigData the technology! #PutData2Work

Marco Bossi @marcoatbossi
#IoT success at all levels will depend on infrastructure, #security & #BigData #analytics capabilities

Nils Schaetti @nschaetti
Where the domain of #InternetOfThings stops, #BigData domain begins.


CONTAINER SECURITY

Containment Strategy

How to prevent kernel breakouts and ensure the security of container-based workloads.
BY JIM O'REILLY

CONTAINERS ARE THE hottest software idea in IT. The concept of sharing the common parts of a virtual machine (the operating system, management tools and even applications) reduces the memory footprint of any image by a large factor, while saving the network bandwidth associated with loading many copies of essentially the same code.
These are not trivial savings. Early estimates of containers supporting three to five times the number of instances that traditional hypervisor-based approaches can manage are proving true. In some cases, such as the VDI market, results are even better. Notably, containers can be created and deployed in a fraction of the time it takes to build a VM.
The economics of containers are substantially better than those of hypervisor virtualization, but containers are a new technology, and that young technology still has to incorporate the (sometimes painful) lessons learned from hypervisor virtualization. While many organizations are working with containers at some level, most would admit to serious fears in the area of security.
The most critical issue is multi-tenancy protection. Hypervisors have been around well over a decade and, more importantly, have gone through several CPU lifecycles.


Intel and Advanced Micro Devices have added features to prevent cross-memory hacks in hypervisors.
These features protect systems with no local storage
drives, but the advent of local instance stores used to accelerate apps meant that erased data, and especially SSD
data, could be exposed across tenants. Hypervisor vendors
rose to the occasion and now flag blocks as unwritten. If an
instance tries to read a block that hasn't yet been written,
the hypervisor sends all zeros and hides any data in that
block.
Without these safeguards, hypervisors would be unsafe,
and any tenant could gain access to the data in other instances. Sharing a single operating system image across all
the containers in a server nullifies the hardware memory
barrier protection, and the storage issue is caught up in
the immaturity of container development.
These two problems can be mitigated by running the
containers inside a VM. This protects the containers in
one VM from a cross-memory exploit of another VM,
while the hypervisor provides the needed storage protection. All the major clouds and hypervisors, including
Azure, now support containers.
The layers of protection can come at a cost, though,
since during a scale expansion, the VM may have to be
created prior to building containers. These technologies


operate on different timescales, with container deployment times measured in milliseconds against VM build
times measured in seconds. Even with the restrictions,
VM-based containers are a viable approach and by far the
most common method of deployment. There has been
considerable work toward developing lightweight hypervisor deployments. For instance, Intel Clear Containers is
a hypervisor built for containers. Among other things, it
uses kernel same-page merging to securely share memory
pages among VMs to reduce memory footprint. VMware also supports containers, which, given its dominance in virtualization, is important for operational confidence in many shops.

USER ACCESS CONTROLS

Beyond cross-tenancy exploits, containers carry privilege escalation risks, where an application getting root
access can gain control of the host. Another problem is
a denial-of-service (DoS) attack, or even a bug-driven issue, where all of the resources are grabbed by a single container. These problems are much easier to create in
container environments. Docker, for instance, shares its
namespace with the host system, which would never be
the case on a hypervisor-based system.

HIGHLIGHTS

Containers are immature and must incorporate the lessons learned from hypervisor virtualization.

Security risks exist, but the worries of ops pros are beginning to soften as the market matures.

Multi-tenancy protection remains the most critical issue for hypervisor security.


Escalation attacks can be mitigated by running containers as ordinary users rather than root. In Docker, this means adding -u to the start command. Removing SUID flags bolsters this fix. Isolating namespaces between containers limits rogue apps from taking over server storage space. Control groups can be used to set resource limits and to stop DoS attacks that suck up server resources.

Security Concerns Fall

Security risks still exist, but the worries of operations professionals are beginning to soften as the market matures. This year, drastically fewer respondents than last year cited container security as a moderate or major concern.

2015: 61% | 2016: 11%

SOURCE: CLUSTERHQ
POISONED IMAGES

Another major protection from attack, especially in the


private cloud, is to use trusted public repositories for
images. Today, almost all mashups use code from many
public repository sources to build out an app. This saves
enormous development time and cost, so it is an essential practice in a world of tight IT budgets. Still, plenty of
horror stories abound. Even high-class repositories can
propagate malware, and there are recent cases of such
code remaining hidden in popular libraries for years.
Code from trusted repositories is still vulnerable to virus penetration. Image control is a critical problem in any environment today, not just containers. Use trusted repositories that support image signatures, and use those signatures to validate images as they are loaded into the library and later into a container. There are services for signature validation, and proper use of these services will limit your exposure to malware penetration. Docker Hub and Quay are two trusted public container registries.
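At its simplest, signature validation means refusing any image whose digest doesn't match the one the publisher recorded. A minimal Python sketch follows; the image bytes are invented, and real registries use full signing schemes such as Docker Content Trust rather than a bare hash comparison:

```python
import hashlib

def image_digest(image_bytes):
    """Content-addressed digest, in the sha256:<hex> style registries use."""
    return "sha256:" + hashlib.sha256(image_bytes).hexdigest()

def validate_image(image_bytes, published_digest):
    """Refuse to load any image whose digest doesn't match the publisher's."""
    return image_digest(image_bytes) == published_digest

image = b"FROM alpine:3.4\nRUN echo hello\n"
trusted = image_digest(image)  # recorded when the image is published

assert validate_image(image, trusted)             # untampered image loads
assert not validate_image(image + b"#", trusted)  # any change is rejected
```

The point of the content-addressed digest is that even a one-byte change to the image produces a completely different hash.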
Another problem that is not particular to containers,
but is far more serious because of the typical microservices


environment used with containers, is that users are expecting control over the app mashups that they run. This
makes repository control a bit like herding cats. A forced
user-level validation of both source identification and
signature checking is a critical need for a stable, secure
environment. The Docker security benchmark on GitHub
is a utility that checks for many of the known security
problems. Building one's own validated image library for

users to access may be the ultimate embodiment of this approach, but the downsides are that coders are hard to discipline, and a lack of agility on the librarians' part will almost guarantee that the library is bypassed. Any repository has to have very tight security, with limited access for obtaining images from third-party repositories and no write access for the user base. To facilitate image library management, you can use Docker's registry server or CoreOS Enterprise Registry.

VALIDATION AND ENCRYPTION

Version control of applications and operating systems in an image is a related vulnerability area. Again, it's not just a container question, but the very rapid evolution of containers, and Docker's tendency to tear down operating code structures and replace them as new releases are made, requires strong discipline. Misaligned versions often offer up an attack surface.
Image scanner tools are available to automate image
and file validation. Docker has Nautilus, and CoreOS
offers Clair. The issue of encrypting images at rest or in
motion is still somewhat unsettled. Generally, the more
encryption of vulnerable files is practiced, the more protection we have against malware. For images, encryption should protect against virus or Trojan attacks on the image code and, when coupled with signature scanning and validated image lists, should keep malware at bay. Here, containers have a distinct advantage over hypervisors. With many fewer image files flying around, the encryption and decryption load on servers is much lower.

The container daemon is another point of vulnerability. This is the process that manages containers and,
if compromised, could access anything in the system.
Limiting access is the first step in securing the daemon.
Encrypting transfers is essential if the daemon is exposed

to the network, while using a minimal Linux configuration
with only limited administrative tools reduces the attack
surface.
With all of the above, we have the basics of a secure
environment for creating containers and building their
images. Protecting container stacks while they are running is still a work in progress. There is a good deal of
startup activity in the monitoring area, which provides a
first step in controlling what is typically a volatile instance
mix. cAdvisor is a good open source tool for monitoring containers, while Docker offers the stats command. On their own, these tools guarantee data overload, so their output needs to be fed into a suitable analytics package such as Splunk or Sumo Logic's Docker Log Analysis App.
By establishing a baseline of normal operations, any traces
of abnormal access due to malware can be spotted and
remediated.
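The baseline idea can be sketched in a few lines of Python. The request counts and the simple three-sigma rule here are made up for illustration; a real pipeline would feed cAdvisor or docker stats output into an analytics package:

```python
import statistics

def find_anomalies(baseline, observed, sigmas=3):
    """Flag samples that sit more than `sigmas` standard deviations
    from the mean of normal operations."""
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline)
    return [x for x in observed if abs(x - mean) > sigmas * stdev]

# Requests per second during normal operations...
baseline = [95, 102, 99, 104, 97, 101, 100, 98]
# ...and a window where one container suddenly starts hammering the host.
observed = [101, 99, 450, 103]
print(find_anomalies(baseline, observed))  # [450]
```

Anything the detector flags would then be investigated as a possible trace of malware or a runaway container.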

Containers have evolved a long way in just a few years. A secure environment does require strong discipline, but it's notable that the container community is leading in areas such as image management.
We can expect hardware support for containers to arrive in one or two generations of CPUs, matching capabilities available for hypervisors today. When that happens, we can expect a move to simplified bare-metal container deployments. There will be further challenges, such as the incorporation of software-defined infrastructure into the container ecosystem. But containers are on an equal footing with VMs from a security perspective, and way ahead on agility and speed of deployment.

JIM O'REILLY is a consultant focusing on storage, infrastructure and software issues.


Survey Says
Platform as a Service

Which of the following services is your company using?*

SaaS: 50%
IaaS: 45%
PaaS: 44%
None: 14%

*RESPONDENTS COULD CHOOSE ALL THAT APPLIED; SOURCE: TECHTARGET QUARTERLY SURVEY ON PLATFORM AS A SERVICE; N=1,332

45%: Respondents who say a self-service provisioning feature was a critical factor in their PaaS vendor choice
SOURCE: TECHTARGET QUARTERLY SURVEY ON PLATFORM AS A SERVICE; N=267

What are the top business objectives for your upcoming PaaS purchase?*

Reduce software development costs: 48%
Shorten development cycles: 38%
Respond faster to changing business needs: 24%
Implement standard platforms: 22%
Allow for continuous, incremental improvements: 21%
Improve geographic availability: 17%
Improve software quality: 17%
Improve governance and compliance: 14%
Reduce environmental impact: 14%
Augment internal resources: 12%
Strengthen DevOps support: 12%

*RESPONDENTS COULD CHOOSE UP TO THREE ANSWERS; SOURCE: TECHTARGET QUARTERLY SURVEY ON PLATFORM AS A SERVICE; N=565


TWO QUESTIONS


Contained
Chaos
IT wants familiarity, and VMware
aims to simplify containers.
BY NICK MARTIN


CONTAINERIZATION IS DISRUPTING IT, with customers and vendors alike scrambling to integrate containers into existing infrastructures. One such company trying to keep up with the latest trend is VMware, which promised to give customers a familiar way to manage containers. We spoke with Kit Colbert, VP and GM of the cloud-native apps business unit at VMware, to get an update on the company's projects.

What's the status of vSphere Integrated Containers (VIC) today?

It's been an interesting evolution. Last year, we announced this prototype called Project Bonneville, which was sort of a proof of concept: the idea that this integration of Docker containers into the vSphere runtime is achievable.
The original VIC was completely Docker-centric. We
The original VIC was completely Docker-centric. We

actually modified the Docker engine. Unfortunately, due to the architecture of the Docker engine, we weren't able to do what you might call a clean integration in a way that leveraged well-defined APIs. We had to get deep in there in order to make it work the way we wanted, to get the deep integration into vSphere. That approach wasn't ideal, but normally that would be OK, except for the fact that in pretty much every major Docker release, they've rewritten major subsystems of the Docker engine and refactored sections of code. So each time there was an update, we had to throw out a bunch of code, rewrite and reintegrate with Docker. It was a big effort over the latter half of last year just to stay abreast of everything that was happening in the Docker world, and it just wasn't tenable. The second reason is that it was overly Docker-centric.
Today, everyone uses Docker, but things like Kubernetes are coming online, as are other container technologies like Rocket. We don't see too much adoption yet, but we want to have an architecture that will allow us to support those technologies if they end up coming to prominence.
What trends are driving the frenzied adoption
of these products?

When you look at companies like Google, Facebook,



Twitter, there are a set of technologies they use to achieve these high software delivery velocities: things like containers, microservices, distributed architectures and DevOps-type patterns.
Now, I think that's trickling down to enterprises that don't have the money to hire 1,000 Ph.D.s but still want to implement these processes. That's where I think industrialized solutions come in, which is exactly what we're trying to do with our enterprise container infrastructure offerings, like VIC and Photon Platform. Let's contain a lot of complexity, allowing developers to go fast, but still maintain that control, governance and [service-level agreements].

Editor's note: After this interview was conducted, VMware announced the availability of vSphere 6.5, which includes support for vSphere Integrated Containers in Enterprise and Enterprise Plus versions. The company also said it plans to make Photon Platform generally available by the end of the year.

NICK MARTIN is Executive Editor of Modern Infrastructure. You can reach him at [email protected].


NETWORKING

Striving for Simplicity

Today's data center networks are too complex; where do we go from here?
BY ETHAN BANKS

IN THE NOT-TOO-DISTANT past, traffic forwarding within the data center was simple. One IP address would talk to another IP address. The addresses belonged to endpoints: bare-metal hosts or virtual machines talking to other bare-metal hosts or virtual machines. The path between those IP addresses was known to the data center switches as entries in the routing and bridging tables.
If an engineer needed to troubleshoot poor performance or odd behavior between two IP endpoints, a good starting point was constructing the path between the two by looking at those tables. Equal-cost multipath and multichassis link aggregation added complexity to this process, but on the whole, operators could find out exactly which path any given data center conversation traversed.
There was little to complicate traffic flows between
endpoints. Network-address translation, encryption or
tunneling were rarely present. Those sorts of functions
tended to be located at the data center edge, communicating with devices outside the trusted perimeter.

Striving for
Simplicity
Today's data center networks are too complex; where do we go from here?

THE MODERN DATA CENTER

BY ETHAN BANKS


The modern data center network looks different as business needs have morphed. The once relatively simple data


center is now a unified infrastructure platform on which


applications run. The data center runs as a whole; it's an engine for application delivery.
Increasingly, infrastructure is transparent to developers
and their applications. A thoroughly modern infrastructure is an abstraction upon which developers lay their applications. Pools of resources are allocated on demand, and the developer doesn't have to worry about the infrastructure. Rather, the infrastructure just works.
The modern data center also handles security in a distributed way that coordinates with the dynamic standing
up and tearing down of workloads. No longer does traffic
have to be pushed through a central, physical firewall to
enforce a security policy. Rather, a central security policy
is constructed, and a security manager installs the relevant parts of that policy onto the affected hosts, VMs or
containers. There is no infrastructure chokepoint and no
arcane routing requirements to enforce such a policy.
At a high level, we've been describing private cloud
architecture. Abstracting physical infrastructure in this
way allows for a simpler collaboration with the public
cloud. Thus, hybrid cloud architectures are growing in
popularity, with the expectation that public cloud workloads have the same security and connectivity as private
cloud workloads.


THE LAYERS

With hybrid cloud architectures becoming the new normal, it's important to note the impact these trends have on networking. No longer is the data center as simple as one IP address talking to another, with routing and bridging tables a consultation away when there's trouble.
The infrastructure mechanisms that deliver modern
data center flexibility rely on complex networking. Driving this complexity is the need for workload segregation,
service-policy enforcement and security. Thus, rather than
a sea of IP addresses, the modern data center looks more
like a layer cake.
At the bottom of our layer cake is the underlay network.
This network is the basis on which all other network services will ride. This is also the network that looks the most
familiar to the average network engineer. When they peer
into their routing and bridging tables, they are seeing the underlay network: the data center foundation.
The underlay by itself, however, can't provide everything that the hybrid cloud needs. One growing requirement is segregation, referred to as multi-tenancy. A tenant could be an application, a business unit or a customer. A tenant's traffic is segregated from other traffic through virtual extensible LAN (VXLAN) encapsulation technology. Traffic from one segment is encapsulated in a VXLAN


packet, delivered in this wrapper across the network and decapsulated on the other side. VXLAN is a second layer, an overlay, on top of our base underlay.
Not only does it provide segregation of traffic, but VXLAN can also be used to route traffic via a specific path across the network. Let's say the data center needs to forward traffic through a specific firewall and load balancer.
In a modern network, firewalls and load balancers are
likely to exist as virtualized network functions, residing
potentially anywhere in the data center. To route traffic
exactly where it needs to go, VXLAN encapsulation can
be used to tunnel traffic flows from device to device until
it has traversed all required devices.
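Concretely, the VXLAN "wrapper" is an eight-byte header placed in front of the tenant's Ethernet frame, carrying a 24-bit segment ID (the VNI). A minimal stdlib Python sketch of the encapsulation, for illustration only (real VTEPs also build the outer UDP/IP headers, with UDP destination port 4789):

```python
import struct

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the 8-byte VXLAN header (RFC 7348) to an inner Ethernet frame.

    Layout: flags byte (0x08 = "VNI present"), 3 reserved bytes,
    24-bit VNI, 1 reserved byte. In a real deployment this payload
    then rides in a UDP datagram across the underlay network.
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI is a 24-bit tenant identifier")
    header = bytes([0x08, 0, 0, 0]) + struct.pack("!I", vni << 8)
    return header + inner_frame

def vxlan_decap(packet: bytes) -> tuple:
    """Strip the VXLAN header, returning (vni, inner_frame)."""
    (word,) = struct.unpack("!I", packet[4:8])
    return word >> 8, packet[8:]
```

The VNI is what keeps one tenant's frames from ever being delivered into another tenant's segment, even when both reuse the same private IP space.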
Firewall rules form another layer in our overlay and
underlay cake. A central policy manager inserts firewall
rules host by host. Each host ends up with its own set of
rules that govern forwarding into and out of the device.
Known as microsegmentation, this is a practical way to
ensure security in a scalable data center.
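The compile-and-distribute pattern behind microsegmentation can be sketched in a few lines. The rule shape and workload labels here are invented for illustration; real products express far richer policy models:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    src_tag: str  # workload label, e.g. "web" (labels invented for this sketch)
    dst_tag: str
    port: int
    action: str   # "allow" or "deny"

# One central policy, written against workload labels rather than addresses.
POLICY = [
    Rule("web", "db", 3306, "allow"),
    Rule("any", "db", 3306, "deny"),
]

def rules_for_host(policy, host_workloads):
    """The distribution step: a host receives only the slice of the
    central policy that mentions a workload actually running on it."""
    tags = set(host_workloads)
    return [r for r in policy if r.src_tag in tags or r.dst_tag in tags]
```

Because each host enforces only its own slice, there is no central chokepoint to scale or route around.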
A wildcard that adds yet more networking complexity is the container. Container networking is a nascent technology, governed by namespaces, proxy servers and network-address translation to enable containers to communicate with each other as well as the outside world: yet another layer.

TROUBLE FOR OPERATORS

This complexity is a potential issue for network operators.


Most networking issues are tied to connectivity or performance. Two endpoints that should be able to connect

but cannot is one sort of problem. Two endpoints that


connect but aren't communicating as quickly as expected is a different problem.
Troubleshoot a connectivity problem with the packet
walk method. From one network device to another, follow
the path that a packet would take to arrive at its destination. When the actual IP endpoints (the underlay) are known, this is straightforward.
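Each hop of that packet walk is a longest-prefix-match lookup against the local routing table; a stdlib sketch with invented prefixes shows why the plain-underlay case is tractable:

```python
import ipaddress

# A miniature routing table: prefix -> next hop (all values invented).
ROUTES = {
    ipaddress.ip_network("10.1.0.0/16"): "spine-1",
    ipaddress.ip_network("10.1.20.0/24"): "leaf-2",
    ipaddress.ip_network("0.0.0.0/0"): "edge-fw",
}

def next_hop(dst: str) -> str:
    """Longest-prefix match: the lookup a packet walk repeats at every hop."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in ROUTES if addr in net),
               key=lambda net: net.prefixlen)
    return ROUTES[best]
```

Repeat that lookup device by device and the path falls out; the overlays described next are what break this simple loop.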

In the modern data center, the underlay is used to
transmit VXLAN or other overlay packets. On top of that,
we add firewall rules and then perhaps network-address
translation or proxy services; a packet walk becomes
more difficult and fraught with nuance. To diagnose a connectivity issue, an operator needs to know the source and destination of the packet (container, virtual machine or bare-metal host), the firewall policies governing that packet, packet encapsulation and the service chain to be followed.
Assuming the operator understands the application
flow and works in a flat, silo-free IT organization, this isn't so bad. Still, it is not easy. Looking up media access
control and IP addresses in bridging and routing tables is

only one small part of a more elaborate troubleshooting


process. And the fact is that modern infrastructure is often
ephemeral, and operators can be troubleshooting issues
that happened in the past and can't be reconstructed.
Performance challenges are even harder to diagnose.
The sheer number of network devices touching a given
conversation likely involves a virtual operating system,
a hypervisor soft switch, a virtual firewall, a top-of-rack
switch, a spine switch and then the reverse all the way to
the other endpoint.
When some workloads are in the public cloud, matters
become more complex. Putting infrastructure or platform
as a service in the equation means adding high latency
and additional tunneling to our troubleshooting equation.

INDUSTRY RESPONSES

We're stuck with IP. And since we're stuck with IP while at the same time needing additional functionality, overlays
are here to stay. Overlays give us the ability to steer and
segregate traffic, and that functionality is important. With
it, we can treat our infrastructure as pools of resources,
adding and subtracting capacity at will. The issue then
becomes one of managing the network complexity we've added to our environments.
The networking industry has taken on this complexity
challenge in a couple of ways. The first is acceptance. If
we agree that the complexity is here to stay, then we'll provide tools that allow us to discover or visualize what's
happening on the network. For example, Cisco provides
enhanced tools for operators to troubleshoot end-to-end

connectivity issues on its Application Centric Infrastructure. VMware recently bought Arkin, a visualization tool
that correlates workloads with firewall policy and VXLAN
segmentation in a GUI paired with a natural language
search engine.
Effective troubleshooting and visualization tools are,
increasingly, strong points in modern data center platforms. However, some people have reacted against the
complexity by creating forwarding schemes that eschew
overlays if at all possible.
For instance, the Romana.io open source project relies
on a hierarchical IP addressing scheme combined with
host-based firewall rules to create segmentation and a
central security policy. The open source Project Calico is
similar. Romana.io and Project Calico are both interesting
in that they offer forwarding schemes that scale to large data centers while still handling security and segmentation requirements, and they do it without an overlay.
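The hierarchical addressing idea can be sketched with the stdlib ipaddress module: carve a base prefix into tenant blocks and then per-host blocks, so segment membership is readable from the address itself and policy can match on prefixes. The split sizes are invented for illustration; Romana.io and Calico make their own layout choices:

```python
import ipaddress

def hierarchical_plan(base: str, tenant_bits: int, host_bits: int):
    """Carve one base prefix into per-tenant blocks, then per-host
    blocks, so a workload's tenant and host are encoded in its
    address and no overlay is needed to segregate traffic."""
    base_net = ipaddress.ip_network(base)
    plan = {}
    for tenant_net in base_net.subnets(prefixlen_diff=tenant_bits):
        plan[tenant_net] = list(tenant_net.subnets(prefixlen_diff=host_bits))
    return plan

# 10.0.0.0/8 -> 256 tenant /16s, each holding 256 per-host /24s
plan = hierarchical_plan("10.0.0.0/8", tenant_bits=8, host_bits=8)
```

A firewall rule for "tenant A may not reach tenant B" then collapses to two prefixes rather than thousands of endpoint addresses.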
Perhaps the biggest question isn't about how to handle network complexity but is about the humans supporting the solution. There's a thought out there that automation will allow IT staff to be thinned. As a twenty-year IT infrastructure veteran, I don't see it that way. With great complexity comes a great support requirement. Organizations won't want to be on hold with their vendors when the magic goes sideways. They'll want to have pros who know the system at the ready to fix what's broken.
ETHAN BANKS, CCIE #20655, is a hands-on networking practitioner who has designed, built and maintained networks for higher
education, state government, financial institutions and technology
corporations.


TECHNICALLY SPEAKING


Dell Pickle
The tech giant's latest acquisition
may leave a sour taste in the mouths
of both partners and customers.
BY BRIAN KIRSCH

MERGERS AND ACQUISITIONS are part of the norm for many IT

companies. Larger companies buy smaller ones and incorporate the technology into their portfolio, and life moves
on. This routine became a bit more complex recently as
Dell purchased EMC and all the companies under the
EMC Federation umbrella. Dell had already purchased
several companies selling data center technologies, such
as Quest Software, Compellent Technologies and Wyse,
but the EMC deal is different. Previous acquisitions were
meant to augment Dell's technology base and help it move from the consumer side to the data center side, but the EMC acquisition brought in several redundant products.
Dell added several crown jewels in VMware, RSA and a host of other technologies that were under EMC. The challenge is that several products now in the Dell Technologies family compete with each other. Dell storage, formerly Compellent, now competes with several EMC

storage offerings. While Dell has said VMware will continue to operate independently, VMware's vRealize Suite competes directly with Quest Software. You could even say VMware's VSAN competes with both Dell and EMC
storage products. The list goes on. IT vendor relationships
are complex enough when they are separate companies;
putting them under the same roof can make it even harder
for vendors and customers alike.
One of the first questions admins should ask is if and
when a product will be phased out. Certain products that
overlap, such as Dell Compellent storage lines and Quest
monitoring tools, are products to watch. However, it is
very unlikely Dell would simply cut off money-making
product lines. A more likely outcome is a change in upgrade paths. Rather than upgrading within the existing
line, customers could be encouraged to switch to another
product in the Dell Technologies family. While this type
of change is not ideal, customers may find reasons to
switch. Dell might offer incentives in the form of additional discounts or training to customers willing to make
the transition. Companies prefer not to switch vendors or
products if they have something that works, but a savvy
IT professional should recognize this is the time to move
up in size, scope or capacity within the Dell Technologies
family. Waiting now could mean the product teams merge
and you will be left with no choice but to upgrade.
Support within the new Dell Technologies will also

change, even if the call-in numbers stay the same. We all


have been in the situation of vendors pointing fingers at
each other. In a perfect world, one company with a single
technical support center to call sounds ideal. But in reality,
the sheer number of products and technologies simply makes the concept impossible. With the dizzying array of
products falling under the Dell umbrella, it is doubtful
that support will drastically improve because no support
engineer could have such a wide scope of product knowledge. If support for EMC Federation products was any
indication, most customers will find their calls are simply
bounced from one business division to another because
support will still be siloed across business groups.
The sales side of the new Dell Technologies will be
equally complexfor both employees and customers.
I don't think it's possible for one sales representative to
truly know every product and service now under the Dell
Technologies umbrella, but hopefully the sales person
will at least know where to go for those answers. Remember, some Dell-held companies, such as VMware, will continue to operate independently and remain publicly traded, meaning they are held accountable to both Dell Technologies and stockholders. It was complex before; look for it to stay complex now that the merger is complete.
Where this merger gets interesting is when it comes to
partnerships for new and updated offerings. Dell has a lot
of experience with hyper-converged infrastructure with
its continued partnership with Nutanix. While EMC has
gone down the hyper-converged route with VMware and

VCE, it has not really taken off as it should have. Dell's


hardware experience with hyper-converged infrastructure
along with VMware software can create a true competitor to Nutanix. While Nutanix, VMware and EMC don't
always see eye to eye, the true winner here is Dell. The
company will now supply much of the hardware used in

both VxRail and Nutanix hyper-converged products. The
term frenemy uniquely describes the relationship between
the newly expanded Dell and Nutanix.
Unfortunately for the customers, these partnerships
create a complex battleground where it's not clear who is fighting whom. As administrators, we need these companies to work together, but we're likely to see subtle combat strategies creep in. Software drivers for a particular application or vendor are delayed; support calls point the finger at a third party; sales and marketing spread fear, uncertainty and doubt; the list goes on when company
partners are also competitors. This can raise customer
anxiety, making the prospect of buying everything from
one vendor even more attractive.

The new Dell Technologies will affect different parts


of the business in different ways. Product lifecycles and
engineering should get better as Dell emphasizes the best
of its best products, encouraging customers to move away
from parallel products in its lineup. Sure, it may require an
upgrade to an unfamiliar product, but if it comes with discounts and incentives, it may not be hard to swallow. The
sales aspect looks to be a little confusing with changing
partnerships and product families, so administrators will

need to do their due diligence. The new Dell Technologies


isn't a bad thing. It's a bit more complex, but it also shows there can be a one-stop shopping experience for the data center. While competition is good for the market, it's just as important that everything works together.
BRIAN KIRSCH is an IT architect and instructor at Milwaukee Area Technical College, focusing primarily on the virtualization and storage environments.


THE NEXT BIG THING


Spark-ing
the Big Data
Bonfire
AI is making a comeback, and it's going to affect your data center soon.
BY MIKE MATCHETT


BIG DATA AND artificial intelligence will affect the world, and already are, in mind-boggling ways. That includes, of course, our data centers.
The term artificial intelligence (AI) is making a comeback. I interpret AI as a larger, encompassing umbrella that includes machine learning (which in turn includes deep learning methods), but also heavily implies thought. Somehow, machine learning is safe to talk about. It's just some applied math (built over probabilities, linear algebra, differential equations) under the hood. But use the term AI and, suddenly, you get wildly different emotional reactions; for example, the Terminator is coming.

However, today's broader field of AI is working toward providing humanity with enhanced and automated vision, speech and reasoning.
If you'd like to stay on top of what's happening practically in these areas, here are some emerging big data and AI trends to watch that might affect you and your data center sooner rather than later:
■ Where there is a Spark. Apache Spark is replacing basic Hadoop MapReduce for latency-sensitive big data jobs with its in-memory, real-time queries and fast machine learning at scale. And with familiar, analyst-friendly data constructs and languages, Spark brings it all within reach of us middling hacker types.
As far as production bulletproofing, it's not quite fully baked. But version two of Spark was just released, and it's solidifying fast. Even so, this fast-moving ecosystem and potential Next Big Things like Apache Flink are already turning heads.
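Part of the appeal is that a Spark job reads like a few chained collection operations. A stdlib-only sketch of the classic word count, mirroring the flatMap/map/reduceByKey shape of Spark's RDD API (no cluster here, just the dataflow):

```python
from collections import Counter

lines = ["big data is fast", "fast data is big data"]  # stand-in for an RDD

# flatMap: split every line into words
words = [w for line in lines for w in line.split()]
# map to (word, 1) then reduceByKey(+): Counter performs the keyed reduction
counts = Counter(words)

# The PySpark equivalent would read roughly:
#   sc.textFile(path).flatMap(str.split) \
#     .map(lambda w: (w, 1)).reduceByKey(operator.add)
```

The point isn't the word count; it's that the same three-operation shape scales from a list in memory to a cluster-wide dataset.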
■ Even I can do it. A few years ago, all this big data stuff required doctorate-level data scientists. In response, a few creative startups attempted to short-circuit those rare and expensive math geeks out of the standard corporate analytics loop and provide the spreadsheet-oriented business intelligence analyst some direct big data access.
Today, as with Spark, I get a real sense that big data

analytics is finally within reach of the average engineer


or programming techie. The average IT geek may still
need to apply him or herself to some serious study but
can achieve great success creating massive organizational
value. In other words, there is now a large and growing
middle ground where smart non-data scientists can be
very productive with applied machine learning even on
big and real-time data streams.
Platforms like Spark are providing more accessible big
data access through higher-level programming languages
like Python and R.
We can see even easier approaches emerging with new
point-and-click, drag-and-drop big data analytics products
from companies like Dataiku or Cask. You still need to understand extract, transform and load (ETL) concepts and what machine learning is and can do, but you certainly don't need to program low-level parallel linear algebra in MapReduce anymore.
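Those ETL concepts boil down to three composable steps; a toy stdlib sketch (sample data and field names invented for illustration):

```python
import csv
import io

RAW = "host,cpu\nweb-1, 93 \ndb-1,41\nweb-2,not-a-number\n"  # invented sample

def extract(text):
    """Extract: parse the raw feed into records."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: coerce types and drop records that fail validation."""
    clean = []
    for row in rows:
        try:
            clean.append({"host": row["host"].strip(), "cpu": int(row["cpu"])})
        except ValueError:
            continue  # a real pipeline would quarantine the bad record
    return clean

def load(rows, sink):
    """Load: append the cleaned records to a destination (here, a list)."""
    sink.extend(rows)
    return sink

warehouse = load(transform(extract(RAW)), [])
```

The drag-and-drop tools are, underneath, wiring together exactly these stages; knowing the stages is what remains mandatory.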
■ Data flow management now tops the IT systems management stack. At a lower level, we are all familiar with silo data storage management, which is down in the infrastructure layer. But new paradigms are enabling IT to manage data itself and data flows as first-class systems management resources, the same as network, storage, server, virtualization and applications.
For example, enterprise data lakes and end-to-end production big data flows need professional data monitoring,
managing, troubleshooting, planning and architecting.
Like other systems management areas, data flows can
have their own service-level agreements, availability goals,

performance targets, capacity shortfalls and security concerns. And flowing data has provenance, lineage, veracity
and a whole lot of related metadata to track dynamically.
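Treating a flow as a managed object means giving it a record type and checks of its own; a hedged sketch of what such a record might carry (field names invented for illustration):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class FlowRecord:
    """One dataset in a flow, carrying the kinds of metadata listed above."""
    dataset: str
    source: str                # provenance: where the data came from
    produced_at: datetime
    parents: list = field(default_factory=list)  # lineage: upstream datasets

def sla_breached(rec: FlowRecord, freshness: timedelta, now: datetime) -> bool:
    """A data-flow SLA check: is this dataset staler than its target?"""
    return now - rec.produced_at > freshness
```

Monitoring a flow then looks like monitoring any other managed resource: evaluate checks like this one continuously and alert on breaches.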
Much of this may seem familiar to longtime IT experts.
But this is a new world, and providing big data and big data
flows with their own systems management focus has real
merit as data grows larger and faster.
I wrote recently about how the classic siloed IT practitioner might think to grow his career; big data management would be an interesting career direction. New
vendors like StreamSets are tackling this area head-on,
while others that started with more ETL and data lake catalog and security products are evolving in this direction.
■ Super scale-up comes around. Those of us long in the IT world know that there are two mega-trends that cycle back and forth: centralize vs. distribute and scale-up vs. scale-out. Sure, every new cycle uses newer technology and brings a distinct flavor, but if you step back far enough, you can see a cyclical frequency.
Big data has been aiming at scale-out on commodity hardware for a decade. Now, it's bouncing back a bit toward scale-up. To be fair, it is really scale-up within scale-out grids, but a new crop of graphics processing units (GPUs) is putting the spotlight on bigger, and not necessarily commodity, nodes. For example, Kinetica worked with IBM on a custom four-Nvidia-GPU/1 TB RAM system to power its fast, agile-query big data database: no static pre-indexing needed. And Nvidia recently rolled out a powerful eight-GPU DGX-1 appliance designed especially for deep learning.

I have no doubt this trend hasn't finished swinging back and forth yet. Internet of things applications are going to
push quite a bit of the big data opportunities out toward
the edge, which means super scale-out by definition. As
always, a practical approach will likely use both scale-up
and scale-out in new combinations. (How many folks kept
mainframes that now can run thousands of VMs, each
capable of supporting unknown numbers of containers?)
Eventually, all data will be big data, and machine learning (and the broader AI capabilities) will be applied everywhere to dynamically optimize just about everything. Given the power easily available to anyone through
cloud computing, the impending explosion of internet of

things data sources and increasingly accessible packaged


algorithms, this is becoming a distinct possibility in our
lifetimes.
The data center of the near future may soon be a converged host of all the data an organization can muster,
continually fed by real-time data flows, supporting both
transactional systems of record and opportunistic systems
of engagement, and all driven by as much automated intelligence as possible. The number of enterprise IT management startups touting machine learning as part of their value proposition is increasing daily.
MIKE MATCHETT is senior analyst at Taneja Group. Reach him on
Twitter: @smworldbigdata.


Modern Infrastructure is a SearchITOperations.com publication.

Margie Semilof, Editorial Director
Nick Martin, Executive Editor
Adam Hughes, Managing Editor
Phil Sweeney, Managing Editor
Moriah Sargent, Managing Editor, E-Products
Linda Koury, Director of Online Design
Rebecca Kitchens, Publisher, [email protected]

Follow @ModernInfra on Twitter!

TechTarget, 275 Grove Street, Newton, MA 02466
www.techtarget.com

© 2016 TechTarget Inc. No part of this publication may be transmitted or reproduced in any form or by any means without written permission from the publisher. TechTarget reprints are available through The YGS Group.

About TechTarget: TechTarget publishes media for information technology professionals. More than 100 focused websites enable quick access to a deep store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access to independent expert commentary and advice. At IT Knowledge Exchange, our social community, you can get advice and share solutions with peers and experts.

COVER IMAGE AND PAGE 3: ALEUTIE/FOTOLIA