
Volume 7 Number 2 • SUMMER 2021

NEXT-GENERATION DATABASES PROVIDE AN EMBARRASSMENT OF RICHES FOR DATA MANAGERS

Enabling Data Intelligence
Log Parsing With AI
The Coming Tsunami of Automation

WWW.DBTA.COM


BIG DATA
QUARTERLY
Summer 2021 CONTENTS
editor's note | Joyce Wells
2  AI, Automation, and New Ways of Getting More Value From Data

departments
3  BIG DATA BRIEFING: Key news on big data product launches, partnerships, and acquisitions
6  TRENDING NOW | Danny Allan: DevOps Is Working Its Way Through Some Growing Pains
23  TRENDING NOW | Sam Rehman: Designing From the Inside Out
25  INSIGHTS: Enabling Data Intelligence: Q&A With Zaloni's Ben Sharma

features
4  THE VOICE OF BIG DATA: Taking Data and Analytics to the Next Level in 2021: Q&A With Radiant Advisors' John O'Brien
8  FEATURE ARTICLE | Joe McKendrick: Next-Generation Databases Provide an Embarrassment of Riches for Data Managers
24  BIG DATA BY THE NUMBERS: Key Trends in Data Management

columns
27  DATA SCIENCE PLAYBOOK | Jim Scott: Log Parsing With AI: Faster and With Greater Accuracy
28  DATA DIRECTIONS | Michael Corey & Don Sullivan: The Coming Tsunami of Automation
30  THE DATA-ENABLED ORGANIZATION | Lindy Ryan: Enabling the Entire Organization
31  THE IoT INSIDER | Bart Schouw: The Broken Promise of the Connected Oven
32  GOVERNING GUIDELINES | Kimberly Nevala: AI Governance: Cultivating Critical Thinking

PUBLISHED BY Unisphere Media—a Division of Information Today, Inc.
EDITORIAL & SALES OFFICE: 121 Chanlon Road, New Providence, NJ 07974
CORPORATE HEADQUARTERS: 143 Old Marlton Pike, Medford, NJ 08055

Thomas Hogan Jr., Group Publisher, 609-654-6266; thoganjr@infotoday
Joyce Wells, Editor-in-Chief, 908-795-3704; [email protected]
Joseph McKendrick, Contributing Editor; [email protected]
Stephanie Simone, Managing Editor, 908-795-3520; [email protected]
Adam Shepherd, Advertising and Sales Coordinator, 908-795-3705; [email protected]
Don Zayacz, Advertising Sales Assistant, 908-795-3703; [email protected]
Lauree Padgett, Editorial Services
Tiffany Chamenko, Production Manager
Erica Pannella, Senior Graphic Designer
Jackie Crawford, Ad Trafficking Coordinator
Sheila Willison, Marketing Manager, Events and Circulation, 859-278-2223; [email protected]
DawnEl Harris, Director of Web Events; [email protected]

ADVERTISING
Stephen Faig, Business Development Manager, 908-795-3702; [email protected]

INFORMATION TODAY, INC. EXECUTIVE MANAGEMENT
Thomas H. Hogan, President and CEO
Roger R. Bilboul, Chairman of the Board
Mike Flaherty, CFO
Thomas Hogan Jr., Vice President, Marketing and Business Development
Bill Spence, Vice President, Information Technology

BIG DATA QUARTERLY (ISSN: 2376-7383) is published quarterly (Spring, Summer, Fall, and Winter) by Unisphere Media, a division of Information Today, Inc.

POSTMASTER: Send all address changes to: Big Data Quarterly, 143 Old Marlton Pike, Medford, NJ 08055

Copyright 2021, Information Today, Inc. All rights reserved. PRINTED IN THE UNITED STATES OF AMERICA

Big Data Quarterly is a resource for IT managers and professionals, providing information on the enterprise and technology issues surrounding the "big data" phenomenon and the need to better manage and extract value from large quantities of structured, unstructured, and semi-structured data. Big Data Quarterly provides in-depth articles on the expanding range of NewSQL, NoSQL, Hadoop, and private/public/hybrid cloud technologies, as well as new capabilities for traditional data management systems. Articles cover business- and technology-related topics, including business intelligence and advanced analytics, data security and governance, data integration, data quality and master data management, social media analytics, and data warehousing.

No part of this magazine may be reproduced by any means—print, electronic, or any other—without written permission of the publisher.

COPYRIGHT INFORMATION: Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by Information Today, Inc., provided that the base fee of US $2.00 per page is paid directly to Copyright Clearance Center (CCC), 222 Rosewood Drive, Danvers, MA 01923, phone 978-750-8400, fax 978-750-4744, USA. For those organizations that have been granted a photocopy license by CCC, a separate system of payment has been arranged. Photocopies for academic use: Persons desiring to make academic course packs with articles from this journal should contact the Copyright Clearance Center to request authorization through CCC's Academic Permissions Service (APS), subject to the conditions thereof. Same CCC address as above. Be sure to reference APS.

Creation of derivative works, such as informative abstracts, unless agreed to in writing by the copyright owner, is forbidden. Acceptance of advertisement does not imply an endorsement by Big Data Quarterly. Big Data Quarterly disclaims responsibility for the statements, either of fact or opinion, advanced by the contributors and/or authors. The views in this publication are those of the authors and do not necessarily reflect the views of Information Today, Inc. (ITI) or the editors.

SUBSCRIPTION INFORMATION: Subscriptions to Big Data Quarterly are available at the following rates (per year): Subscribers in the U.S.—$97.95; Single issue price: $25

© 2021 Information Today, Inc.
EDITOR'S NOTE
AI, Automation, and New Ways of Getting More Value From Data
By Joyce Wells

As restrictions caused by the pandemic begin to ease and a "new normal" for business comes into view, some organizations are emerging as winners. Key to thriving and not just surviving, many experts say, is the ability to use data more effectively, enabled by the well-applied use of the right database technologies and cloud resources as well as automation and AI. The implications of these transformative technologies are examined from a range of perspectives in a variety of articles in this issue of Big Data Quarterly.

In his cover article exploring current database management trends, contributing editor Joe McKendrick notes that if there is one characteristic that singularly dominates the data scene now, it is the vast assortment of systems available. Today, he points out, there is a database for every type of function, creating choice and opportunity but also requiring knowledge about the best database environment for the use case.

DataOps, MLOps, robotic process automation, and low-code/no-code development are some of the new approaches taking hold, according to Radiant Advisors' John O'Brien, who shares his take on the top technology trends helping organizations gain value from their data. There is a new appreciation for being agile in terms of BI and data analytics since, over the course of the past 15 months, many companies have discovered that what they were doing was simply not fast enough, O'Brien notes. DataOps is building on the best practices that have emerged from the DevOps world and is relying heavily on automation and data governance, adds Zaloni's Ben Sharma in a separate interview. "You must be able to use data effectively in a timely manner so that you can adapt to change," Sharma stresses. Adding to the discussion of where AI and automation can add value, NVIDIA's Jim Scott shares how natural language processing methods are now being used to automate the task of parsing network security logs.

A number of authors in this issue also focus on the human considerations involved in new technology adoption. LicenseFortress' Michael Corey and VMware's Don Sullivan weigh in on the impact of what they call the coming "tsunami of automation." As companies increasingly look to automation to address the intertwined concerns of labor availability, profitability, efficiency, and speed, there is almost nothing outside of the arts and sports that will not be automated, they suggest. Also looking at automation and innovation from the human perspective, Software AG's Bart Schouw relates his own experience as a consumer purchasing a high-end "smart" oven and the problem of achieving the critical last mile of customer service. Connected products necessitate connected CX across all channels, he concludes. Meanwhile, Veeam's Danny Allan considers what's necessary to achieve the productivity that DevOps promises and observes that cultural shifts are the hardest kinds of reorganizations to pull off.

And there are many other articles that examine the use of new approaches and best practices to improve data analytics, security, and governance. Radiant Advisors' Lindy Ryan spotlights the challenges of data-enabling the entire organization, SAS' Kimberly Nevala looks at AI governance, and EPAM Systems' Sam Rehman underscores the importance of embedding security by design to defend the enterprise against cyberthreats.

To stay on top of the latest data trends, research, and news, visit www.dbta.com/bdq, register for highly informative weekly webinars at www.dbta.com/Webinars, and take advantage of the extensive white paper library available at www.dbta.com/DBTA-Downloads/WhitePapers.



BIG DATA BRIEFING
Key news on big data product launches, partnerships, and acquisitions

Dell Technologies has announced plans to spin off its 81% equity ownership interest in VMware, resulting in two standalone companies. The transaction is expected to close during the fourth quarter of calendar 2021. www.delltechnologies.com and www.vmware.com

Kyndryl will be the name of the new, independent company that will be created following the separation of IBM's Managed Infrastructure Services business, which is expected to occur by the end of 2021. Kyndryl will be headquartered in New York City. www.kyndryl.com and www.ibm.com

Alluxio, a developer of open source cloud data orchestration software, has introduced a go-to-market solution in collaboration with Intel to offer an in-memory acceleration layer with 3rd Gen Intel Xeon Scalable processors and Intel Optane persistent memory (PMem) 200 series. www.alluxio.io and www.intel.com

MarkLogic, a provider of cloud data integration and data management software, has announced the general availability of a custom connector for AWS Glue, a fully managed, serverless data integration service, to create, run, and monitor data integration pipelines. www.marklogic.com

Redis Labs has closed $110 million in financing led by a new investor, Tiger Global, bringing the company's valuation to more than $2 billion. The company's Series G round also included participation from another new investor, SoftBank Vision Fund 2, and existing Redis Labs investor TCV. https://redislabs.com

Matillion, an enterprise cloud data integration platform, has introduced Matillion ETL for Delta Lake on Databricks, enabling data professionals across the business to aggregate and share data in a single environment. www.matillion.com and https://databricks.com

Denodo, a leader in data virtualization, is releasing Denodo Standard, a new data integration solution available on cloud marketplaces that leverages Denodo's modern data virtualization engine to deliver superior performance and productivity. www.denodo.com

Sumo Logic has partnered with AWS for the launch of Amazon CloudWatch Metric Streams, a fully managed, scalable, and low-latency service that streams Amazon CloudWatch metrics to partners via Amazon Kinesis Data Firehose. AWS and Sumo Logic customers now have a fully managed solution for streaming CloudWatch metrics into Sumo Logic to help simplify the monitoring and troubleshooting of AWS infrastructure, services, and applications. www.sumologic.com and https://aws.amazon.com

Cloudera, the enterprise data cloud company, announced the Cloudera Data Platform is now available on Google Cloud, allowing customers to get positive business results fast with instant access to quality data on a scalable, open source, enterprise data cloud platform. www.cloudera.com and https://cloud.google.com

DataStax has introduced Astra Serverless, an open, multi-cloud serverless DBaaS that delivers a combination of pay-as-you-go data together with the freedom and agility of multi-cloud and open source technology. DataStax Astra builds upon the Apache Cassandra open source database and introduces a modern, microservices-based architecture that separates compute from storage, enabling database resources to scale up and down on demand to match application requirements and traffic, independent of compute resources. www.datastax.com

Aerospike, provider of next-generation, real-time NoSQL data solutions, has released the Aerospike Kubernetes Operator and made advancements in Aerospike Cloud Managed Service to help enterprises unlock cloud productivity and agility with scale-out cloud data. www.aerospike.com

HVR, an independent provider of real-time cloud data replication technology, is expanding its partnership with Snowflake, enabling customers to utilize HVR within the Snowflake Data Cloud through Snowflake's Partner Connect. www.hvr-software.com and www.snowflake.com

MathWorks has introduced the newest release of the MATLAB and Simulink product families. Release 2021a (R2021a) offers hundreds of new and updated features and functions in MATLAB and Simulink, along with three new products and 12 major updates. www.mathworks.com

Elastic and Grafana Labs are forming a partnership to deliver the best possible experience of both Elasticsearch and Grafana. Through joint development of the official Grafana Elasticsearch plugin, users can combine the benefits of Grafana's visualization platform with the full capabilities of Elasticsearch. www.elastic.co and https://grafana.com



THE VOICE OF BIG DATA
Taking Data and Analytics to the Next Level in 2021

In the future, 2021 may be viewed as a sharp turning point when companies recovered from the dramatic changes of 2020 and moved from surviving to thriving. Recently, John O'Brien, CEO and principal advisor, Radiant Advisors, spoke with BDQ about the key challenges and opportunities for organizations as they emerge from the seismic shifts of the past year and how they are refocusing their analytics efforts to get more value from data.

(Pictured: John O'Brien, CEO and Principal Advisor, Radiant Advisors)

What are the big challenges facing customers that you are seeing right now?
We see companies now making the flip between last year and this year. In 2020, there was a lot going on in our clients' companies. Obviously, everyone got stopped in their tracks. A lot of companies had good programs going on that got deprioritized or put on hold, but were not canceled. They were in a very reactive mode, but it was because there were off-the-charts amounts of volatility.

How is this revealed?
Some companies—we call it the letter K—were skyrocketing up because the COVID situation had a positive impact on them. And quite a few had the bottom line of the letter K as they were spiraling down. At first, companies were thinking about how to get by temporarily, but as the months and quarters went on, this went into, "It's not changing anytime soon." Then there were the companies that were saying, "Well, how are we going to thrive in this?" A lot of companies were thinking about a), what they had to do to get by, and then b), when things do start to turn around in 2021 or 2022, being able to move with that.

What has been the greatest change for companies over the past year?
The biggest impact that we see in companies from a data analytics perspective is that the agile and speed components of analytics were really tested. There was a new definition of what agile meant for BI and data analytics work. Some large enterprises found out what they were doing wasn't fast enough. You have to be able to react fast. You have to have first-mover advantage.

What else?
The second thing was that a lot of them were saying they were doing digital transformation, but they were still doing their old business model in a virtual, remote-work kind of way. They were not really embracing digital transformation. At the end of the year, we saw companies with this clarity about what digital transformation and being agile mean. They came into 2021 wanting to revisit their data and analytics strategy, and they have this renewed energy and enthusiasm about what it needs to be because they just went through a year in the trenches.

How has it changed their perspective?
A lot of companies are now trying to figure out how to be business outcome-driven. They were working on it with the luxury of time and with the economy going well. Now, it's a table-stakes, got-to-survive, next-generation-of-the-company situation, and they are trying to prioritize what is important. They have learned a lot, which will really set them up for the next couple of years.

What are some of the newer approaches that are being considered? What about the "data lakehouse"?
That one's pretty interesting. The data lakehouse is really the evolution beyond the data warehouse and data lake, but what you'll find is the question of whether or not all companies need to evolve that far. It does make sense. We always said compute technologies of SQL engines will get good enough at some point in order to run a lakehouse type of model. Up until now, data lakes have not been efficient enough. A data lake basically has the highest level of affordable scalability and flexibility of formats, but it's actually one of the worst performing. I think the real question, when we get asked by clients about the lakehouse, is whether the architecture fits their needs and their maturity. Having something as robust as a lakehouse might be a little bit of overkill for some of these companies.

How do you feel about low code, and is this approach becoming more relevant for analytics?
I'm a big fan of the low-code and no-code world. We've been helping companies move to this kind of paradigm where what you have are people that are the knowledge workers, the subject-matter experts—the business SMEs—and when you give them a no-code or low-code kind of tool to work with data, they are going to be 10 times more efficient in finding, exploring, and validating data than a data engineer or data scientist. They have the business knowledge, which means that when they look at something, they can say, "That's not right" or "That's bad data." It is business data enablement, and we've got a whole practice dedicated to this.

Is this tied to the metamorphosis companies went through in 2020?
It's the next incarnation of agile BI that we've seen for a couple of decades. It's just that now, companies are trying to be data-driven, which means that they are enabling everybody in the business to work with the data. In the technologies, the low-code world is one of the biggest trends moving forward. And then the second one is cloud. Very simply, [it is] the ability for people to self-provision their own needs and resources; that's the other piece where you're taking time out of the cycle.

There is also a strong emphasis on blending what were previously siloed processes. How are methodologies such as DataOps being used now?
A significant aspect of DataOps around continuous integration and continuous deployment is that the team not only has to do the engineering work and the testing, but now they are responsible for embedding governance. We have actually leaned more on DataOps to embed data governance inside of every data pipeline with monitoring and alerting notifications.
We created an organizational strategy of enabling teams with the goal of making these agile delivery teams better at deploying their own code. If, organizationally, you have this team in an enablement, supportive role, they will make sure that those agile delivery teams have everything they need to do a good job, which is embed governance—embed data quality, auditing, and proper security. The challenge that we have heard from some clients in the field further down this journey is that the data engineer who likes to write and integrate code now becomes a full-stack engineer, which means that person needs to understand security and all these other parts.

Has it been effective?
Some teams have struggled a little bit because it is a shift from a data engineer to becoming a full-stack engineer. In some cases, these teams are deploying things into containers and deploying Kubernetes into the fabric, and they are saying that it is just too much technology and too many layers. In the past, there was a security layer, a data layer, and a governance layer, and you could be in just one space. Now that we're slicing vertically, data engineers have to sign up to do all of that.

How quickly is DataOps being embraced?
I do believe the key is having what we call IT enablement organizations that are focused on how agile delivery teams are doing. These teams need best practices, good processes, and a methodology more than they need the technologies. The old way of thinking is, "I'll buy a technology; it'll solve my problem." It doesn't work that way. The number-one challenge in these companies is, "How do I change culture? How do I change to democratize the data workforce and enable self-service?" That is what they need to focus on, and that's where the challenge lies.

Are there any other approaches on the horizon that you think are going to be helpful?
What I think will be interesting in the future is related more to machine learning. We have been talking about operationalizing analytics for over 5 years. It has finally come together to become MLOps in order to scale. It is an inverted paradigm shift from BI. In BI, we build a dashboard and we're done. In ML, you put it into production and your work is beginning because you have to start monitoring all the time.
MLOps will be one of the big trends that I think will continue to evolve. One of my favorite new kinds of technologies in the space is the automated feature engineering. We get a lot of questions about whether all of the data from 2020 is really garbage for a training dataset because it was so unrealistic. Training datasets are a big challenge. There is new technology that allows you to take AI and give it a 1,000-column dataset, and it will churn through that until it gets down to the dozen or so that are the most relevant for you, and the AutoML piece allows you to look at the high predictors.
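
For illustration only, here is a generic sketch of the automated feature engineering O'Brien describes: winnowing a wide training set down to its strongest predictors. It uses scikit-learn rather than any specific AutoML product, and the file and column names are hypothetical.

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel

    df = pd.read_csv("training_data.csv")             # assume ~1,000 feature columns
    X, y = df.drop(columns=["target"]), df["target"]  # "target" is a placeholder label

    # Rank every column by model importance and keep only the top dozen
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=0),
        max_features=12,
        threshold=-np.inf,  # disable the importance cutoff; keep exactly max_features
    ).fit(X, y)

    print(X.columns[selector.get_support()].tolist())  # the high predictors
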
Is there anything else?
The other major trend for 2021 is going to be robotic process automation (RPA). We expect RPA to come in and make a big hit on the self-service data analytics world. Those are probably the two main categories that will dominate over the next year or two.

What's next?
We are at a turning point right now as companies are going from surviving to thriving. That's how we view it. There is a lot of pent-up demand to figure out how to do it right, which is nice. In years past, everybody wanted to just go buy something.

Interview conducted, edited, and condensed by Joyce Wells.



TRENDING NOW
DevOps Is Working Its Way Through Some Growing Pains
By Danny Allan

A few years ago, if you had asked a group of C-level executives to project which software delivery trend would be more important for organizations, most would have ranked DevOps ahead of containers. DevOps promised to profoundly refocus and re-energize software teams while containers seemed to be an interesting new way to repackage resources that were already there.

Today, the rankings have flipped. Container adoption is accelerating—and the technology that orchestrates container usage, Kubernetes, is regularly being described as "revolutionary." DevOps? While it's still popular—up to three-quarters of all organizations use a DevOps blueprint—more users say they're struggling with integrations. One survey found that 86% of organizations consider software delivery a top priority, but only 10% say they're successful at it. A dozen years after the term was first coined, DevOps may be heading for Gartner's so-called "Trough of Disillusionment."

The good news is that DevOps, similar to many technology concepts, will likely climb out of the trough and head toward the fourth and fifth stages of Gartner's "Hype Cycle"—a "Slope of Enlightenment," leading to a "Plateau of Productivity." What organizations need to do is be patient and understand that DevOps—unlike a move to containers—is a journey that takes time and will, by nature, have its ups and downs.

Kubernetes is, at its core, a game-changing technology. Organizations use container platforms to create and run applications a whole new way—and Kubernetes orchestrates the way container infrastructures operate. Kubernetes opens up a new approach for delivering services—providing scalability, portability, and better economics. It allows development teams to move faster, work more cost-efficiently, and move applications on any cloud or on-premises platform. These are tangible technical and financial benefits that won't diminish over time.

Kubernetes is following the same trajectory virtualization did back in the 2000s. Virtualization essentially solved many of the storage and resource constraints organizations were experiencing using physical servers. Now, containers present a lightweight alternative to virtual machines, and Kubernetes provides a framework for managing containers at scale.

Challenging Operational Complexity
DevOps is more complicated—and more of a trigger for frustration. It's focused on restructuring teams and processes. It attempts to solve a process challenge of operational complexity and miscommunication. Rather than have developers and operations teams working in silos on different schedules with different priorities, DevOps attempts to bring them together as one cohesive unit.

The problem is, DevOps isn't a cookie-cutter approach that works the same way with every organization. Some entities are able to use DevOps to sort out inefficiencies and get intransigent workers aligned on a common goal. Others run into trouble getting workers motivated or managing the change process. While some organizations are able to implement DevOps quickly, others may find it can take time. It all depends on the culture of the organization.

DevOps aims to reduce complexity—but for some organizations, it can have the reverse effect. Teams have to get used to new sets of rules, timetables, reporting structures, and general working conditions. Workers may need to be retrained or reassigned, which introduces more complexity into an already challenging process of delivering software.

Although DevOps introduces some technical changes, it mainly involves making a cultural shift. Cultural shifts are the hardest kinds of reorganizations to pull off. They require buy-in from workers at all levels—and they usually take time to execute correctly.

DevOps also depends heavily on automation. Essentially, the goal is to automate as many tasks as possible to ensure that the overall system runs like a clock—moving builds from step to step, performing regular tests, catching flaws, and capturing data. It's a lofty goal, but in practice, it's hard to do. Even the most efficiently designed software pipelines have to be configured correctly and then monitored and managed to account for variations in delivery processes.

Plus, is automating every task always the best strategy? Business needs, timetables, and staff resources evolve day by day, month by month, quarter by quarter. Some tasks that need to change constantly might be better performed by a human who has the experience to tweak a process on the fly. Alternatively, some tasks that are performed infrequently—say, once a year—might be better performed slowly, with a high degree of analysis. If you don't automate those tasks, you don't technically have true DevOps. You have human factors in the middle that slow it down. It's this yin and yang. Because processes are always evolving, automation can annoy stakeholders when it doesn't do everything it promises.

Workers on some DevOps teams are finding that silos are easier to eliminate on paper than in actual practice. In other words, "DevOps" engineers get hired and then are put in charge of an issue—managing infrastructure, security, scaling, or fault tolerance—without developer involvement. While the goal was to create a team approach, team leaders still put individual tasks in silos. Best practices such as test-driven and behavior-driven development and agile/scrum processes are ignored when it comes to managing infrastructure.

Airing Frustrations
If organizations don't commit to developing and sustaining a DevOps culture, it can lead to other frustrations, including the following:
• The initial euphoria teams experience when new work patterns are introduced can fade over time. If team members lose interest in the project or fail to generate the results they're looking for, they can become disenchanted.
• With new processes come more demands. Muscled up with a new DevOps practice, the company overpromises on delivery schedules, forcing team members to scramble and fall short of expectations.
• Leadership engagement may drop off. If managers aren't committed to a DevOps initiative, team members can lose focus, leading to diminished effectiveness for the DevOps project overall.
• Respected evangelists may leave or move on to new roles. Team-focused environments can suffer if they lose key members who played important roles.

On the positive side, one benefit many organizations are seeing doesn't focus on the delivery of software itself. It focuses more on the broad-based success of the company. What DevOps has done is given the software delivery function a seat at the table to decide the direction of the company. Since DevOps itself is iterative and incremental, the team members who drive the discipline bring a certain pragmatism to decision-making processes. They can provide insights that the organization needs not only about how software should be a part of every decision but also about what will work and what won't. Installing a DevOps process provides a gut check for the business about what can and should be rolled out before investing large sums in an initiative that will fail.

Reaching the Productivity That DevOps Promises
The best way organizations can fight through feelings of disillusionment over DevOps is to recognize that they're there and commit to a long-term vision for the overall process. They need to invest in the tools and the training to reinforce the initiative. They must continually evolve their software delivery strategies to meet their business requirements—and perhaps inject more human involvement in certain tasks if it can make a difference. By making these moves and working with team members to address their frustrations, organizations can set their sights on the productivity that DevOps promises.

Danny Allan is CTO at Veeam (www.veeam.com).



NEXT-GENERATION DATABASES
Provide an Embarrassment of Riches for Data Managers
By Joe McKendrick

If there has been one characteristic that has been emblematic of data management in the early 2020s, it would be the incredible variety of approaches. Not too long ago, the data management space was dominated by either relational database management systems (RDBMSs) at one end or desktop-based databases on the other. Now, there is a database for every type of function—an embarrassment of riches for data managers. Cloud databases, multi-model databases, time series databases, graph and document databases, and a range of additional NoSQL systems all offer capabilities to address business situations across digital and data-driven environments. The question is: What is the best database environment for the purpose at hand?

The move to next-generation databases is driven by their ability to help companies achieve competitiveness and reach customers faster and more efficiently. These new breeds of systems can be a force for business transformation—whether it is generating new sources of revenue, enhancing customer experience, or producing data-driven insights that improve how organizations interact with customers.

"Advances in web technology, social networking, mobile devices, and Internet of Things have resulted in the sudden explosion of structured, semi-structured, and unstructured data generated by global-scope applications," according to a published analysis by Ali Davoudian of Carleton University. Due to their inflexibility and the cost and complexity of data transformation and migration, traditional RDBMSs cannot meet many of today's digital requirements alone, Davoudian and his colleagues stated. "Such applications have a variety of requirements from database systems, including horizontal scalability to linearly adapt to the massive amounts of data and the increasing rate of query processing by making use of additional resources, high availability and fault tolerance to respond to client requests, even in the case of hardware or software failure or upgrade events, transaction reliability to support strongly consistent data, and database schema maintainability to reduce the cost of schema evolution."

NoSQL systems, for example, "are used not as a revolutionary replacement for the relational database systems but as a remedy for certain types of distributed applications involved with a massive amount of data that need to be highly scalable and available," said Davoudian.

LEADING THE NEXT GENERATION
It isn't just the NoSQL systems that are gaining attention—open source databases have also increasingly been implemented as solutions. "We're seeing a huge growth in technologies like Apache Cassandra and Apache HBase," said Ken LaPorte, manager of the data infrastructure engineering team at Bloomberg. "We are also starting to leverage sharded database technologies, like Citus and Vitess, which build on our existing strong foundation of traditional PostgreSQL and MySQL deployments," said LaPorte. "We've had numerous tenants successfully migrate their use cases to these sharded database technologies."

Two factors are driving this growth, he continued. "First is the explosion in the quantity of data we store, especially unstructured/semi-structured data. Second is the combination of organic and institutional migration from traditional large-scale RDBMS deployments to more directed, lower-level technologies that more closely resemble the desired use cases."

This momentum toward more focused, flexible databases shows no signs of letting up. Raj Rathee, head of product management at Exasol, expects "a continued uptick in growth of specialized databases as opposed to the one-fits-all approach." For example, he said, "We're likely to see increased use of graph databases for representing and analyzing relationships, and NoSQL databases for storing IoT events. For analytics and data warehouses, in-memory and columnar databases—especially those that support MPP—will continue to be adopted for their unmatched performance and scalability for processing large data volumes."

Jim Webber, chief scientist at Neo4j, sees graphs as the databases to watch over the year ahead. "They've reached the early mainstream now, and their impact for transactional and analytical applications is profound," he said. "Early adopters are already profiting from graphs, and now the mainstream wants in. Graphs have been bubbling away under the surface for 10 years, but are at an inflection point now. The enterprise is no longer merely curious about graphs but hungry for them."

Multi-model databases also represent "a new journey to handle the variety of data," according to an analysis published by Jiaheng Lu of the Department of Computer Science at the University of Helsinki. "The variety of data is one of the most challenging issues for the research and practice in data management systems." Multi-model DBMSs build a single database platform to manage multi-model data, Lu said, noting, "Even though multi-model databases are a newly emerging area, in recent years, we have witnessed many database systems to embrace this category."

Of course, this is a rapidly changing field, with varying business and technology circumstances, making it difficult to predict the path these data environments will take. Marc Caruso, chief architect at Syntax, sees a hybrid environment encompassing many types of approaches emerging. "Data migration is happening right now and at a large scale," he said. "We are seeing continued migrations into open source relational database solutions, non-relational database solutions, PaaS-based database solutions, and a combination thereof." The primary focus of these initiatives can be grouped under the heading of reducing operating costs, whether the initiatives are being undertaken to reduce hefty support contracts from major vendors, reduce headcount expense, or gain performance efficiencies by migrating to a more purpose-built database solution, Caruso noted.

WEIGHING THE ADVANTAGES
Every database type has specific strengths for specific functions within today's enterprises. Graphs, which are a type of NoSQL database, for example, can map very closely to business requirements. Graphs are expressive and can deal with the complexities, irregularities, and contradictions of modern business, said Neo4j's Webber. "As such, users of graphs not only experience superb performance from their databases but find that the results they achieve are accurate and actionable. Moreover, graphs are the natural underlay for the best machine-learning systems."

NoSQL databases, in general, are well-suited to big data environments. "NoSQL databases have high scalability which makes them suitable for dealing with large volumes of data while offering cost-effectiveness," added Arthur Iinuma, president of ISBX. "They combine management of unstructured data, flexible systems, and complex analysis. NoSQL databases also have high agility and dynamic schema, which makes them ideal for big data and Internet of Things applications."



Open source databases also provide high degrees of flexibility and cost-effective performance. The variety and flexibility these next-generation databases provide is impressive, enabling "far more options than ever for engineers to match specs to their use cases. Further, these next-generation databases provide a stronger link between data and compute tools, such as Spark SQL, Presto, or Dremio," Bloomberg's LaPorte said.

In addition, this new breed of databases has "the ability to scale horizontally with more ease than ever before," LaPorte continued. "For instance, Cassandra allows for doubling in size on commodity hardware with minimal effort from either a service provider or our in-house application teams." Illustrating the performance and cost balance, LaPorte related how one of Bloomberg's clients "reduced their hardware footprint by 50% by moving from a traditional RDBMS to Cassandra. As a side effect, they were able to increase both performance and future scalability."

Stacy Robin, founder of The Disruptive Diva, points to the emerging class of AI databases that deliver advanced analytics as a basic feature. "Emerging technologies, independently, are useful. However, their real value is when you start creating mashups," she said. AI databases that are built for training machine learning and deep learning models are where businesses should be looking to truly transform the value of robust databases and take their intelligence a step further, Robin said.

WEIGHING THE RISKS
Of course, with any new wave of technology innovation, there are inherent risks that must be factored into designs and deployments. "As with traditional RDBMSs, not all use cases match next-generation databases," said LaPorte. "When your business use case requires an ACID-compliant database, a traditional RDBMS may fit your use case better than current NoSQL implementations. The performance and scalability trade-offs provided by NoSQL technologies like HBase and Cassandra necessarily sacrifice these ACID guarantees."

There are many considerations that need to be made when transitioning to next-generation database solutions, "including the capabilities of the future-state solution versus the current state, the impact to licensing and support contracts, and a method to ensure that the correct solutions are deployed," said Syntax's Caruso. "We are continuing to see use cases in which a suboptimal technology is chosen because the technical team wants to work with it."

The relative immaturity of databases such as multi-model also needs to be considered. "There still remains a long journey toward a mature and robust multi-model DBMS comparable with verified solutions from the world of relational databases," according to Helsinki University's Lu.

IT leaders "need to be brave when faced with graphs, because the model is different from what they've done for their whole career," said Webber. "An open mind and a little reading will get them in the right mindset." From there, picking a reasonably well-understood graph problem—such as product recommendations or customer relationships—and solving it with a graph implementation is the gateway to a bigger universe of graph technology, including sophisticated analysis and machine learning, Webber explained.

In addition, smaller organizations simply may not have the time or expertise to implement a variety of database types. "Smaller businesses, with fewer resources, may have far less understanding of these complex tools and the opportunities for its use," said Disruptive Diva's Robin.

DECISION TIME
It comes down to assessing these factors when determining the best fit for business applications and data. "There are many datasets which cannot be stored in a traditional RDBMS," said LaPorte. "The question we must ask is whether that's the best choice." LaPorte puts this another way: "Is there a different technology, or group of technologies, which suit the use case more cleanly?"

Traditional relational databases "were primarily engineered for transactions," said Exasol's Rathee. "Some of the jobs that traditional relational databases do not handle well include high-performance or real-time analytics, IoT use cases that require very high ingest rates and cheap archived storage of mostly non-useful event data, time series data, and relationship and network analysis as done by graph databases, and AI."

The need for business flexibility also is a major factor that may lead away from more traditional databases. "I encourage people to not limit themselves to one solution when a combination of technologies may often provide a better overall solution," said LaPorte. "For example, one high-throughput solution uses Cassandra for permanent storage and analytics, along with a Solr front end for interactive uses."

In this way, "stringing together a search engine like Apache Solr with Cassandra and Apache Spark provides an incredibly flexible approach to replacing large-scale interactive applications with dynamic automated data," LaPorte continued. "Something like this would have historically been shoehorned into a traditional RDBMS deployment with multiple compromises, resulting in fragile applications running on top. Plus, due to its size and complexity, it may also have required frequent care and feeding from a DBA to ensure its reliable performance and stability."
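
A hedged sketch of the kind of pairing LaPorte describes, with Cassandra holding the data and Spark running analytics over it (an interactive front end such as Solr would query the same data separately), might look like this on the PySpark side. The keyspace, table, host, and connector version are illustrative assumptions, not a reference configuration:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("cassandra-analytics")
        # Hypothetical connector version; match it to your Spark/Scala build
        .config("spark.jars.packages",
                "com.datastax.spark:spark-cassandra-connector_2.12:3.1.0")
        .config("spark.cassandra.connection.host", "cassandra.internal")
        .getOrCreate()
    )

    # Load a Cassandra table as a DataFrame and run an analytical aggregate
    orders = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace="shop", table="orders")
        .load()
    )
    orders.groupBy("customer_id").count().show()
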
A forward-looking approach is key. "As with most decisions, it's best not to limit your scope to the problem in front of you but to think about what the landscape will look like 5 to 10 years from now," said LaPorte. "How will your data grow? How will access patterns change? How liquid is your data? The more liquid and flexible your data is, the easier it will be to pivot as your use cases evolve."



Best Practices Series: DATA STRATEGIES for the REAL-TIME ERA

Redis Labs (PAGE 14): DELIVERING VALUE FOR THE REAL-TIME ENTERPRISE
Qlik (PAGE 17): DATA ONBOARDING: OVERCOMING THE CHALLENGE
Swim (PAGE 18): CONTINUOUS INTELLIGENCE: APPS THAT STAY IN SYNC WITH THE REAL WORLD
Semarchy (PAGE 19): THE KEY ROLE OF THE DATA HUB FOR REAL-TIME DATA STRATEGIES
CData (PAGE 20): LOGICAL DATA ARCHITECTURES FOR REAL-TIME INTEGRATION
GigaSpaces (PAGE 21): DIGITAL INTEGRATION HUB: THE ARCHITECTURE OF DIGITAL TRANSFORMATION
DataStax (PAGE 22): THE FUTURE OF DATA MANAGEMENT IN THE CLOUD


MODERN DATA STRATEGIES FOR THE REAL-TIME ENTERPRISE

One of the many lessons learned from COVID-19 was the role that real-time data analytics can play in helping businesses navigate through even the most intense and disruptive crises—from keeping abreast of supply-chain issues to ensuring timely deliveries to customers' homes. Real-time data—and associated analytics—has long been a feasible, though expensive, goal for many of today's data operations. It also forms the crux of AI powered by streaming data that supports continuous intelligence initiatives, which will eventually enable real-time decision making across a broad range of enterprise functions.

Even with the emergence of new technologies such as Apache Kafka and Spark, the ability to effectively support the speed and scalability requirements of real-time data can be difficult for many enterprises. Nearly half of the organizations that participated in a recent survey of subscribers to Database Trends and Applications are focused on streaming data (47%), followed closely by IoT (41%). Another 21% of respondents reported 5G initiatives. When asked about data from these initiatives straining their infrastructure, 47% of companies responded that this was a concern. Only 21% of the companies surveyed are confident that their infrastructure is capable of handling these trends ("DBTA Digital Transformation and Cloud Workloads Study," January 2021).

Thus, the ongoing shift in data strategies means new opportunities and challenges in terms of performance, complexity, governance, and integration. This has been a problem vexing many enterprises in recent years, according to a Forrester study of 253 companies, which cited the increasingly complicated aspects of data management. "Enterprises look to fast data solutions to help, but challenges prevent them from achieving the necessary functionalities to reap business benefit," the researchers stated ("Don't Get Caught Waiting on Fast Data," October 2018).

Forrester's data also showed that "the majority of enterprises surveyed say they need to analyze their data within a day or less, yet many current solutions process data at a much slower rate than these demands require. Considering that almost 90% of organizations report that they typically require their data to be ingested and analyzed within one day or less, and that 88% need to perform analytics in near-real time on stored streamed data, it's clear that businesses cannot afford to wait around for the insights that fuel growth."

Of course, at a time when enterprises are looking at approaches such as machine learning to respond almost instantly to events and requests, an inability to deliver insights within 24 hours—let alone 24 minutes—calls for a rethinking of data strategies. There are many tactical deployments of real-time solutions in production, such as traffic management systems that reroute customer shipments, for instance, but the power of real-time analytics has yet to be fully leveraged.

The following are recommendations to ensure confidence and performance in developing real-time analytical capabilities:


Work the business problem at hand. As with any relatively involved technology initiative, real-time analytics requires support from data keepers and originators from across the enterprise. The key is to identify and document the advantages to be gained from applying real-time analysis to specific business problems, and to measure the benefits eventually delivered against an initially established baseline of performance.

Encourage the building of real-time analytics into everyone's job. Decision makers at all levels need to be assured that the data delivering insights is the right data, at the right time, for the right purposes. This calls for education, training, and awareness to help managers and employees embrace—and be able to configure and customize—the information and insights that are being delivered to them. Collaboration and team building between data, IT, and business teams will help move the process forward.

Develop and communicate use cases. To help decision makers fully appreciate the potential of real-time capabilities, proponents should document use cases and how results have been successfully delivered. There may be confusion about how and why capabilities such as predictive analytics, AI, and edge-based processing need to be applied, and, as success stories unfold, they can be starting points.

Take things one step at a time. Develop well-focused use cases involving real-time engagements built on learnings and technology available to the business. As is particularly prevalent with established organizations, there may have been considerable investments already made in existing data management and storage systems. Rather than attempting to replace systems, real-time-capable systems need to support existing applications and infrastructure.

Automate operations and workflows as much as possible. This is made possible through newer approaches such as DataOps. DataOps offers a collaborative and automated, process-oriented methodology to improve the quality and reduce the cycle time of data analytics. DataOps may be key to real-time analytics initiatives going forward, as it allows a consistent process of monitoring data, checking for quality, and integrating streams into applications on demand.
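
To make the DataOps recommendation concrete, a minimal, generic sketch of an embedded quality gate follows: one automated check a pipeline step might run before a batch reaches downstream applications, raising an alert on failure. The checks, column names, and alert hook are illustrative assumptions, not any particular product's API.

    import pandas as pd

    def alert(message: str) -> None:
        # Stand-in for a real notifier (email, chat, pager, etc.)
        print(f"[ALERT] {message}")

    def quality_gate(batch: pd.DataFrame) -> pd.DataFrame:
        # Governance embedded in the pipeline step itself
        checks = {
            "no_missing_keys": batch["order_id"].notna().all(),
            "no_duplicate_keys": batch["order_id"].is_unique,
            "row_volume_sane": 0 < len(batch) <= 1_000_000,
        }
        failed = [name for name, ok in checks.items() if not ok]
        if failed:
            alert(f"Quality gate failed: {failed}")
            raise ValueError(f"Batch rejected: {failed}")
        return batch  # only vetted data flows on to applications
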
Be proactive about requirements for storage and compute. Storage and compute requirements have been escalating for decades, and the rise of real-time data analytics will demand exponential increases in these technology requirements. "Fast data is different because it powers real-time insights, decisions, and machine learning," the authors of the Forrester report stated. "Hardware and software infrastructure must expand to handle fast data workloads. IT professionals should consult with data science, AI engineering, and application development teams to forecast storage and compute needs for next-gen, real-time digital services."

Keep data governance front and center. All data flowing into applications or to decision makers in real time needs to be vetted not only for quality but for its relevance to the business. New sources of data are constantly being added to the streams—often from outside parties. This requires the enforcement of rules and oversight to ensure that the information aligns with business users' needs.

EMPOWERING THE REAL-TIME ENTERPRISE
A real-time enterprise can't rely on just any data—decision makers need to be assured that the data moving through their enterprises is of the highest quality, accuracy, and timeliness. Real-time data analytics is one of the most compelling, yet complicated, initiatives being seen in the data technology space these days. Applications' ability to sense and respond to situations as they occur will deliver benefits for competing in both the digital and traditional economies.

—Joe McKendrick


sponsored content

Delivering Value for the Real-Time Enterprise

Today, organizations of all types need to act quickly on information to solve problems, make decisions, and create business value. In industries undergoing rapid digital transformation such as retail and financial services, real-time capabilities have become pressing requirements.

One of the fastest in-memory databases available, Redis Enterprise provides the ideal data layer to meet real-time business needs. It's built to give you extreme performance, horizontal scalability, hybrid and multi-cloud deployment options, and the ability to geo-distribute data at local and global scale without sacrificing response times. Let's explore the practical benefits for banking and retail enterprises.

REAL-TIME BANKING
FinTech No Longer Has the Luxury of Time
The financial services industry evolved as a batch-processing entity, with business processes built around the concepts of overnight processing runs, daily back-office reconciliation, and monthly statements. These processes were designed using traditional relational database architectures with large volumes of data organized into rows and columns. The problem now is that financial services companies no longer have the luxury of time. Legacy database architectures and time-tested business models still power much of the financial services industry, but the field is increasingly embracing real-time payments, real-time e-commerce, and real-time gross settlement.

Disk-based database architectures were not built to support real-time applications, and although workarounds exist to extend these disk-based databases with an in-memory cache, this patchwork approach contributes to greater complexity, higher coordination costs, scalability bottlenecks, and limitations in adopting the latest software architectures. In addition to their inability to support real-time applications, legacy databases have additional drawbacks: they don't scale gracefully, they are not flexible enough, they are not reliable enough, and they are complex and difficult to work with.

When Milliseconds Matter
Today's best-practice technology stack already powers flexible real-time services for the world's most popular applications in the digital economy, and these same technologies will inevitably transform financial services. Digital fintech disruptors and technology companies have started to go down this path. The good news is that there's still time for traditional financial institutions to follow suit. Real-time financial services call for real-time capabilities that include high throughput, minimal latency, high scalability, high availability, and flexibility in data models.

To make the transition to real-time, you need to learn to work with the data layer in a new way. The shape of the data in your application matters to performance and capability. This means that forcing everything into rows and columns is no longer sufficient.

The fastest growing, most important applications are now being built with modern data structures. That's why the ideal approach is to build on a single in-memory database that can handle multiple data structures with a unified operational interface. This represents a highly effective evolution in databases, well-adapted for where the marketplace is headed.

Unless the data layer is extremely fast, your applications simply won't be able to deliver the real-time performance your customers demand. Redis Enterprise is the only solution that delivers this kind of performance as an in-memory database with support for multiple data structures with dedicated engines.
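
As a minimal illustration of that point (one connection, one command interface, several native data structures), a sketch using the open source redis-py client might look like this. Host, keys, and values are placeholders, and this is generic Redis usage rather than anything specific to Redis Enterprise:

    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    r.set("session:42:status", "active")                  # plain string
    r.hset("account:42", mapping={"balance": "1042.50",   # hash: a field/value record
                                  "currency": "USD"})
    r.zadd("leaderboard:traders", {"acct:42": 1042.5,     # sorted set: ranked scores
                                   "acct:77": 987.25})

    print(r.get("session:42:status"))
    print(r.hgetall("account:42"))
    print(r.zrevrange("leaderboard:traders", 0, 2, withscores=True))
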
The most launched, most used, and most loved database in the world, Redis is beloved by software developers for its simplicity, flexibility, and extensibility. Unmatched performance and ease of development are major reasons why financial institutions like Deutsche Börse, Xignite, and Vetr are already using Redis Enterprise to power real-time applications.

To learn more about how Redis supports real-time financial services, download "Building the Highway to Real-Time Financial Services."

Data Innovation in Banking and Finance
In addition to the need for real-time decision making and execution, banking institutions and financial services companies face serious challenges around strained margins, changing customer behavior, a tough regulatory environment in the wake of the 2008 financial crisis, and competition from digital-native market entrants. Not surprisingly, these challenges have been exacerbated by the COVID-19 pandemic. However, tackling these problems offers the sector an opportunity to expand its reach and launch new products and services.





architectures, AI, and the growing use of analytics. Every single middle-market asset management, insurance, and financial institution surveyed by BDO in 2020 said they have developed—or are planning to develop—a digital strategy. But despite the fact that literally everyone is working on a digital strategy, only one-quarter (27%) of those institutions are executing their strategies.

There are four areas where data-layer technologies in particular can help traditional banking and financial services firms overcome the new challenges and profit from emerging opportunities.

1. Customers increasingly demand an omnichannel experience from their financial services providers. Traditional banks that successfully implement such a strategy can turn their physical branches into a competitive advantage, see improved recommendation rates, and encourage customers to take on more products and services.
2. Regulations are requiring financial institutions to share customer data through open banking processes. But meeting these standards is not just a cost. In the UK, which was an early adopter of open banking, the measures have been shown to unlock new revenue opportunities.
3. Financial institutions face a growing threat from fraud and cybercrime. Data layer technologies can help financial services companies meet these challenges, giving customers confidence that their financial security is in good hands.
4. Availability and scalability are vital to ensuring that new and innovative services can actually be delivered to customers, providing banks with the flexibility to meet changing conditions and build customer trust.

In banking and finance, companies that want to stay relevant must manage and use data in ways that benefit their customers, enable agile business processes, and support new products and services.

Redis Enterprise brings real-time performance to use cases like identity verification, transaction scoring, and more. Redis Enterprise can also bring the power of in-memory processing to other components of a fraud detection system. The RedisGraph module enables fast graph processing that can be used to detect synthetic fraud, and RedisAI brings real-time AI model-serving to power more efficient transaction analysis.
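As an illustration of how graph processing can surface synthetic identities, the following hedged sketch issues RedisGraph commands through redis-py's generic command interface; the graph name, node labels, and records are hypothetical:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Build a tiny identity graph: two accounts sharing one phone number
# is a common marker of synthetic-identity fraud.
r.execute_command(
    "GRAPH.QUERY", "fraud",
    "CREATE (p:Phone {num:'555-0100'}), "
    "(:Account {id:'a1'})-[:USES]->(p), "
    "(:Account {id:'a2'})-[:USES]->(p)"
)

# Find pairs of accounts that share a phone number.
result = r.execute_command(
    "GRAPH.QUERY", "fraud",
    "MATCH (a:Account)-[:USES]->(p:Phone)<-[:USES]-(b:Account) "
    "WHERE a.id < b.id RETURN a.id, b.id, p.num"
)
print(result)
```

In a production fraud pipeline the shared attributes would be richer (devices, addresses, document numbers), but the Cypher pattern match stays the same shape.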
Furthermore, Redis Enterprise Cloud and tiered storage options in Redis Enterprise offer an attractive TCO by eliminating data center-related spending and improving IT productivity to let your organization focus on rapid innovation, rather than just keeping the lights on. Finally, Redis Enterprise provides enterprise-grade reliability, performance, and high availability for mission-critical financial applications. It ensures five-nines (99.999%) availability around the world with active-active geo-distribution across regions, and provides an in-memory data layer that delivers sub-millisecond latency at virtually any scale.

To learn more about how Redis helps financial services companies address the increasing demands of modern finance, download "Data Innovation Opportunities in Banking and Finance."

REAL-TIME RETAIL
Delivering Real-Time Retail

During the last decade, widespread access to fast, reliable broadband and the evolution of innovative online services combined to create the era of real-time retail. Consumer expectations of retailers were transformed by their experiences buying from Amazon and other ecommerce companies that were able to provide fast, efficient shopping experiences. In 2020, the trend was amplified even further. COVID-19 lockdowns restricted shopping at brick-and-mortar stores in many countries, sending even more consumers online.

Today, retailers cannot afford to offer mediocre experiences. Nine out of 10 U.S. consumers say they will abandon a retailer's website if it is too slow, according to a 2020 study compiled by Retail Systems Research for Yottaa. Worse, almost six out of 10 (57%) will visit a competitor's site instead and one in five (21%) would never return. If that's not bad enough, one in seven (14%) would vent their frustration on social media.

But faster performance for mobile apps and websites is only the beginning. Retailers can make the user experience feel more responsive by showing products in stock, allowing shoppers to search purchase history, and supporting buying online and picking up curbside.

A retailer's omnichannel and supply chain systems must also be able to scale up when required to meet increased demand around predictable, major events of the retailer's year such as Black Friday and Cyber Monday, as well as special events such as the release of limited-edition items. Additionally, these systems must be able to scale up to meet consumers' expectations even during unpredictable surges in demand such as those caused by surprise endorsements from online influencers or by unexpected external events.

Supporting Instant Retail Experiences

Without a world-class data layer, a retailer will struggle to develop a genuinely effective and compelling real-time offer. The data layer underpins key elements within a successful real-time retail proposition. It must provide a consistent real-time view of inventory, managing updates from stores and enterprise systems to give customers and staff a clear, accurate view of stock availability. It also needs to be resilient and scalable, to satisfy consumers' expectations, and to manage periods of increased demand.

As an in-memory database delivering multiple data structures with best-in-class performance, Redis Enterprise is perfectly suited to meet the demands of real-time retail. It




provides the performance needed to deliver a great shopping experience, ensuring that retail applications and websites are always fast and responsive, and supports real-time inventory. Redis Enterprise can scale up capacity and performance in response to demand from real-time applications, with no need to change application code and without incurring additional costs, downtime, or disruption. By providing automated failure detection, failover, and cluster recovery, Redis Enterprise helps ensure that retailers can continue operating even after experiencing bursts of traffic during seasonal peaks or unexpected surges in demand.

To learn more about how Redis supports real-time retail, download "Retail in the Era of Real-Time Everything."

Enabling Real-Time Inventory Systems

Real-time systems allow large, multi-site retailers to optimize inventory, yield management, and supply-chain management. Relying on historical data makes inventory forecasting less accurate, increasing costs from carrying excess inventory and requiring unnecessary shipping. Retailers can also face reduced yields due to poor execution of enterprise-wide pricing and promotional strategies—for example, the inability to allocate available inventory to the highest-margin locations. And real-time inventory is an essential component of a unified national order-fulfillment strategy, letting retailers pool geographically clustered store locations and warehouses to contribute to a single inventory.

Retailers without real-time inventory management risk product unavailability in the face of natural events and disasters. Before an anticipated event that may disrupt operations, real-time inventory management lets companies redirect fulfillment to healthy regions or proactively stock potentially impacted areas. More importantly, a store database must remain available even if it becomes cut off from the enterprise. This allows the retailer to continue operating with assurance that all of its inventory will automatically sync with the enterprise database—without any conflicts—once connections are re-established.
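One common way to implement this kind of safe, concurrent stock update on open source Redis is optimistic locking with WATCH/MULTI. A minimal sketch follows, with hypothetical key naming; it illustrates the general technique rather than Redis Enterprise's replication machinery:

```python
import redis

def reserve_stock(r: redis.Redis, sku: str, qty: int) -> bool:
    """Atomically reserve qty units of a SKU, or fail if stock is short."""
    key = f"stock:{sku}"
    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch(key)                  # optimistic lock on the key
                available = int(pipe.get(key) or 0)
                if available < qty:
                    pipe.unwatch()
                    return False                 # not enough stock
                pipe.multi()
                pipe.decrby(key, qty)            # atomic decrement
                pipe.execute()
                return True
            except redis.WatchError:
                continue                         # another writer won; retry

r = redis.Redis(decode_responses=True)
r.set("stock:sku-42", 10)
print(reserve_stock(r, "sku-42", 3))  # True, 7 units remain
```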
Real-time inventory is a critical piece of an omnichannel retail strategy, delivering a unified, seamless, and consistent customer experience across all channels, including in-store shopping, websites, mobile apps, email, and social media.

Optimizing Inventory, Yield, and Supply Chain Logistics

For complex real-time inventory systems, Redis Enterprise is chosen by leading retailers because it uniquely provides the capabilities required from a mission-critical database. Large retailers like Staples, Gap, and many others are already reaping these benefits.

Leveraging Redis Enterprise's active-active database replication, with conflict-free resolution, allows enterprises to avoid the complexity and costs of managing message brokers between their in-store and enterprise databases while at the same time ensuring consistency. This eliminates the need for auditing, reconciliation, and risk of data duplication. In the event that a store becomes disconnected from the enterprise database, Redis Enterprise will automatically reconcile once the database becomes available.

Enterprise operations can also leverage the replica database to get an exact view into each store—allowing them to make inventory/yield/supply-chain management decisions based on real-time information and send updates to the stores as needed. This simplifies ship-from-store functionality and change management for store-based order fulfillment, and it helps ensure compliance with corporate promotions, pricing, inventory levels, and so on. Finally, it improves yield management by controlling local discounting behaviors.

Just as importantly, since Redis Enterprise is a multi-model database, it allows developers to choose the data structure best suited to the SLAs and data access patterns of their application. This is one of the many reasons why adoption of Redis Enterprise is growing within microservices architectures. For example, you can choose to deploy your database as a key-value store, a graph database, a time-series database, a cache, a streaming engine, a search engine, and/or a document store—and many others—with each database deployed on the same multi-tenant Redis Enterprise cluster, minimizing the complexity and costs of technology and vendor sprawl.

To learn more about how Redis supports real-time inventory management, download "Real-Time Inventory: Building Competitive Advantage."

The Redis Enterprise Real-Time Advantage

For any company, particularly those in the most high-pressure market sectors undergoing rapid digital transformation, Redis Enterprise offers the high availability, superior performance, and flexibility developers need to deliver real-time applications that meet the expectations of today's customers. Using active-active geo-distributed technology, Redis Enterprise allows Redis databases to be replicated across multiple geographic regions, delivering local latencies, rapid automated failover, and data consistency for globally distributed applications. Redis Enterprise is also available as a managed service via all three major cloud providers, enabling developers to reduce time to market by quickly launching databases in the cloud.

Get started today with a free trial at https://redislabs.com/try-free.

Redis Labs
www.redislabs.com



sponsored content

Data Onboarding:
Overcoming the Challenge
Adam Mayer, Senior Technical Product Marketing Manager, Qlik

MID-TO-LARGE-SIZED ENTERPRISES are challenged with managing and deriving value from terabytes of data. Qlik research with IDC shows that, in the months leading up to the pandemic and in the early months of lockdown, many organizations increased their data loads with new external data (40%), new internal data (45%), and new data types (45%). Unfortunately, much of this data is going to waste. As many as 68% of organizations say they fail to leverage most available data.

Why? Among the multiple factors is data onboarding.

Even with modern analytics in place to create impactful insights from data, data onboarding roadblocks thwart those analyses from creating real value. One of the main sticking points is a struggle with uniting online and offline sources. This hinders leveraging analytics-ready data in near-real-time, which is crucial for modern enterprises to compete.

There are, however, practical steps to overcome these obstacles and improve data onboarding to take full advantage of data.

WATCH FORMAT LIMITATIONS

Format is extremely important to data onboarding. If data is in proprietary formats, users cannot simply integrate it without conversion. Leveraging flat files can simplify the procedure, but the data must still be transformed into an analytics-ready state.

There are challenges for financial services companies and businesses that possess sensitive customer information. This data is often stored and coded in silos to create higher levels of security to match governance measures. Understanding how to safely unleash these data types is crucial to driving more value.

Equally important is data delivery. While some organizations have modernized their data pipelines to accelerate and automate delivery at scale, others are overly reliant on legacy systems and mainframes. Relying on labor-intensive processes for transactional data slows availability and analysis, impacting the speed of decision-making.

REDUCE MANUAL METHODS

It is widely understood that siloed data hurts organizational success, and many business leaders recognize the need to modernize accessibility through the cloud, modern data warehouses, or data lakes. But transmission is just as important as destination. Many are still relying on brittle ETL (extract, transform, load) processes which are slow, work in batch mode, take significant time to complete, and are not fit for modern enterprises. ETL processes also often require manual programming to map data sources to targets, imposing even greater query burdens on production systems.

Instead, organizations should deliver their data so it can be accessed at any time, with data transformation completed after it is loaded to the target, through a data catalog. Businesses can speed up and smooth out their data onboarding process through data integration solutions that use automated Change Data Capture (CDC). This enables data from different sources to be replicated and streamed in near real-time to one or more destinations of choice that will be kept up-to-date with the freshest data when changes occur at the source.

CDC can shift the approach to ELT (extract, load, transform), an alternative to the outdated ETL. ELT decouples the transformation from the extracts (data from the source) and loads (into the target systems), with the data transformation occurring further downstream. Unlike traditional ETL, ELT lends itself well to automation, reducing time-consuming and labor-intensive manual programming.
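The ELT pattern described above can be sketched generically. In this illustration, SQLite stands in for the target system and the change records are invented; it shows the load-first, transform-in-target principle only, not Qlik's actual pipeline:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: CDC-style change records land in a raw staging table as-is.
conn.execute("CREATE TABLE raw_orders (op TEXT, id INTEGER, amount TEXT)")
changes = [("insert", 1, "19.99"), ("insert", 2, "5.00"), ("update", 1, "24.99")]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", changes)

# Transform: done downstream, inside the target, after loading.
# Here we keep only the latest version of each order and cast types.
conn.execute("""
    CREATE TABLE orders AS
    SELECT id, CAST(amount AS REAL) AS amount
    FROM raw_orders AS r
    WHERE rowid = (SELECT MAX(rowid) FROM raw_orders WHERE id = r.id)
""")
print(conn.execute("SELECT * FROM orders ORDER BY id").fetchall())
# [(1, 24.99), (2, 5.0)]
```

Because the transformation is a declarative query inside the target, it can be regenerated or automated rather than hand-coded for every source-to-target mapping.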
ADDRESS NON-TECHNICAL BARRIERS

While there may be compliance issues in sharing too much information, businesses must find a happy medium that protects data while enabling more employees to leverage the right data. Through an enterprise-wide data catalog, online and offline data sources can be brought together and made available to employees in a single source of truth, while providing governance using role-based access. This enables more accurate and trusted data to be shared, without sensitive information ending up in the hands of the wrong employee.

DATA ONBOARDING DONE RIGHT

Data onboarding continues to challenge many enterprises, preventing them from taking full advantage of their data. Thankfully, organizations have technology solutions to provide ongoing, governed access to more data to those who need it. Modernizing data onboarding enables enterprises to deliver greater value and bottom-line impact.

ABOUT QLIK

Qlik's vision is a data-literate world, where everyone can use data and analytics to improve decision-making and solve their most challenging problems. A private SaaS company, Qlik provides an end-to-end, real-time data integration and analytics cloud platform to close the gaps between data, insights, and action. By transforming data into Active Intelligence, businesses can drive better decisions, improve revenue and profitability, and optimize customer relationships. Qlik does business in more than 100 countries and serves over 50,000 customers around the world. Learn more at www.qlik.com.

Qlik
www.qlik.com



sponsored content

Continuous Intelligence: Apps That Stay in Sync with the Real World
ORGANIZATIONS ARE DROWNING IN STREAMS OF real-time data from their products, assets, cloud services, apps, and infrastructure. How can they use it to derive continuously useful insights? Traditional "store then analyze" applications, built around a database for state management, can't help: Data flows are boundless, and the value of events is short-lived. Users want applications that react instantly, accurately, and in sync with the real world, but growing data rates and the need for "always-on" situational awareness make databases too slow.

Fortunately, there is a new class of event-driven, continuous intelligence applications that respond immediately and in sync with the real world, to continuously deliver insights of business value. They
• Analyze events the moment they arrive, because data is only ephemerally useful and applications need to react in real time;
• Continuously discover complex relationships between data sources, including geospatial (e.g., proximity), analytical (e.g., correlation), and even predicted states, in a rich business context; and
• Deliver insights and project outcomes at the rate of change of the business captured in its event streams, databases, and activity of users.

Continuous intelligence is not just about speed. Open source projects and commercial products alike boost performance using in-memory databases, data grids, and caching, but none offers a single stack that lets developers easily create and deploy applications that continuously seek, stream, and deliver insights that are always concurrent with the real world.

Swim Continuum builds and scales applications directly from event streams; it creates active computational models that analyze, learn, and predict on-the-fly, to deliver contextually rich responses. Relationships between real-world data sources are fluid, and discovering and analyzing their interdependence with enterprise context provides deep insights and accelerates decision-making. Swim Continuum builds a stateful, in-memory graph of linked, concurrent, stateful actors, called Web Agents, from streaming data; these continuously analyze events, find relationships, learn, and project, and stream their insights to UIs, applications, and storage. Swim developers need only Java skills, and Swim applications integrate easily with widely used components like Apache Kafka or Apache Pulsar for event streaming, enterprise and cloud databases and data lakes, or Apache Spark for batch analysis. Swim applications deploy securely in containers using existing DevOps tools, and scale automatically as event loads change. They are resilient, persistent, and efficient, typically using only about 10% of the infrastructure required for database-driven streaming applications, while delivering up to a million-fold performance increase.

For example, Swim Continuum is deployed at national scale by a global telecommunications service provider to continuously aggregate and analyze petabytes of streaming data per day from thousands of cell towers that connect millions of subscribers. The application enables the provider to continuously optimize and predict connection quality and ensure a superior network experience for their subscribers. Swim Web Agents create a smart, interactive model of each network element, and continuously calculate hundreds of KPIs and performance metrics that stream in real time to business applications and storage. The solution gives the operator continuous situational awareness to improve service quality and transform their network operations into a real-time asset.
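Swim agents themselves are written in Java against Swim's own APIs, but the underlying pattern, one long-lived stateful model per real-world entity updated continuously from an event stream, can be sketched generically in Python with the kafka-python client. The topic, event fields, and KPI below are hypothetical:

```python
import json
from collections import defaultdict
from kafka import KafkaConsumer  # kafka-python client

# One in-memory state object per cell tower, updated on every event;
# a drastically simplified stand-in for Swim's per-entity Web Agents.
towers = defaultdict(lambda: {"count": 0, "latency_sum": 0.0})

consumer = KafkaConsumer(
    "tower-metrics",                      # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for msg in consumer:
    event = msg.value                     # e.g. {"tower": "t-17", "latency_ms": 42.0}
    state = towers[event["tower"]]
    state["count"] += 1
    state["latency_sum"] += event["latency_ms"]
    # The KPI is recomputed continuously, not in an after-the-fact batch job.
    state["avg_latency_ms"] = state["latency_sum"] / state["count"]
```

The contrast with "store then analyze" is that the aggregate is always current the instant an event arrives, rather than materialized later by a query.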
Event-driven intelligence has become an imperative for any fast-moving, real-time enterprise seeking to closely monitor, manage, and adapt its business. Swim Continuum is an open core application development and runtime platform that can be added to stream processors such as Apache Kafka, simplifying event-driven, continuous intelligence. It augments Apache Kafka with easy development, scalable deployment, persistence, and comprehensive management of continuous applications. Its benefits are applicable to every enterprise, at any scale: It gives decision makers a finger on the pulse of the business, and the insights needed to adapt instantly.

To learn more about how Fortune 100 companies in telecommunications, energy, industrial automation, and other innovative sectors use Swim Continuum to monitor diverse data streams, anticipate disruption, and rapidly respond to global changes in their industries, visit www.swim.ai and follow us on Twitter @swim.

Swim
www.swim.ai



sponsored content

The Key Role of the Data Hub for Real-Time Data Strategies
REAL-TIME CHALLENGES

Data has become a critical asset for all companies: It is essential for any business to rely on trusted and up-to-date data, and to be able to quickly act on information. This trend is amplified by the increasing number of devices and platforms that produce data in real time.

Real-time data strategies bring many opportunities but also new challenges:
• Gathering large amounts of data brings quality issues.
• Data governance is foundational to the success of strategic initiatives that rely on real-time data.
• Advanced integration capabilities are required to exchange data with heterogeneous sources and targets.
• Processing ever-growing volumes of data requires highly performant and scalable systems.

THE DATA HUB APPROACH

A data hub is key in a real-time data strategy. It centralizes information across applications and provides services to manage the enterprise's core data (such as customers and products). Data is governed, mastered, and managed in a centrally understood, non-intrusive way. The data hub not only provides master data to all applications and processes, but it connects business applications to analytics structures like data warehouses and data lakes.

CENTRALIZE THE RIGHT DATA

Data from the sources is heterogeneous and comes with quality issues such as duplicated, incomplete, invalid, or non-standardized information. The data hub is the central point where these issues are resolved: It validates, cleanses, enriches, and consolidates data. It also exposes interfaces and services to support the same level of quality for data authored using the data hub user interfaces or in external applications. A data hub strongly reduces the lead time to obtain reliable data that can be leveraged with other systems.
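The validate-cleanse-consolidate step can be pictured with a toy match-and-merge pass. This is a sketch of the general golden-record technique, not Semarchy xDM's actual engine; the records and the match rule are invented:

```python
from collections import defaultdict

records = [
    {"name": "ACME Corp.", "email": "Sales@Acme.com ", "phone": ""},
    {"name": "Acme Corporation", "email": "sales@acme.com", "phone": "555-0100"},
    {"name": "Initech", "email": "info@initech.com", "phone": "555-0199"},
]

# Cleanse: standardize the match key (here, a normalized email address).
def match_key(rec):
    return rec["email"].strip().lower()

# Match: group records that share a key.
groups = defaultdict(list)
for rec in records:
    groups[match_key(rec)].append(rec)

# Merge: build one golden record per group, keeping the most complete value.
golden = []
for key, dupes in groups.items():
    merged = {"email": key}
    for field in ("name", "phone"):
        merged[field] = max((r[field] for r in dupes), key=len)
    golden.append(merged)

print(golden)  # two golden records: one for Acme, one for Initech
```

A real hub layers survivorship rules, stewardship workflows, and lineage on top of this core loop, but the match-then-merge shape is the same.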
GOVERN REAL-TIME DATA

With growing interactions and observations collected in real time through various acquisition points, it is critical to define a framework for the governance, maintenance, and usage of all data. Setting up collaborative governance within a data hub drives accurate interpretation of all the relationships between the core data elements.

However, governance is nothing if not enforced. A data hub allows for policy definition and enforcement in a single platform with:
• Roles and authorization management
• An enterprise business glossary and data dictionary
• Collaboration features
• Data quality, workflow, and business KPIs

INSTANT INTEGRATION: LOAD, EXPOSE, AND BROADCAST

A real-time ecosystem requires collecting a very large amount of data from numerous sensors and systems and sharing accurate and consolidated data with analytics systems and business applications. This requires extensive integration capabilities, including:
• Inbound and outbound integration with standards-based APIs
• Broadcasting data changes to downstream systems using event-streaming technologies
• Data certification and quality at the core of the integration processing

REAL-TIME ANALYTICS

Analytics probably benefits the most from real-time data strategies: By using instant data, real-time analytics translates into faster decision making and allows prediction of future trends. However, the accuracy of analytics depends on the quality of collected data. Without data quality, analysis and predictions are flawed, and real-time data feeds captured are "noise." By providing a solid data quality foundation, a data hub brings meaning to this "noise" and trust to the analytics.

PERFORMANCE AND SCALABILITY

To support increasing volumes, the data hub must natively support high availability and scalability and must be available on the most popular cloud platforms, such as Microsoft Azure, Amazon Web Services, and Google Cloud. These cloud offerings allow administrators to choose the backend database technology and size, as well as configure virtual machines with load balancing and high availability.

SEMARCHY xDM AND THE INTELLIGENT DATA HUB

With Semarchy xDM, companies with real-time data strategies create their own data hub. This modern data management solution delivers data quality, cleansing, deduplication, and curation capabilities—both on-premises and in the cloud. Along with consolidating trusted data, xDM helps organizations enforce data governance requirements and track changes over time. This way, operational analytics, predictive analytics, and traditional analytics share the same master data, reference data, business glossary, and data catalogs to transform customer interactions and product definitions into measurable business value.

30-DAY TRIAL

Ready to enter the real-time data era with the Intelligent Data Hub™? https://www.semarchy.com/download/

Semarchy
www.semarchy.com



sponsored content

Logical Data Architectures for Real-Time Integration

AS A BUSINESS, YOUR DATA GOALS initially center around understanding the data you need and how that data will be used. Once you have defined your data sources and your policies and procedures around data, your attention should move to driving efficiency. Data efficiency is an end goal, but it is important to realize that its execution takes many forms. Environmental adaptability, speed of replication, ease of use, and data protection are all factors that play into the efficiency of your data processes.

While batch-oriented data architectures like ETL/ELT with data warehousing remain popular, the need for real-time data is increasingly critical. From analytics and decision support to AI and ML, to data and process integration—real-time data drives efficiency, and often without the compliance and governance challenges inherent with data warehousing.

At the same time, the shift to cloud and hybrid-cloud infrastructure has led to growing challenges with data fragmentation. As organizations shift their infrastructure to cloud technologies, data has become more decentralized and more challenging to leverage as an asset. While APIs provide an extensibility point for accessing data, every integration is unique, making it challenging to extract actionable insights from disparate systems.

LOGICAL DATA WAREHOUSING

Data virtualization (DV) technologies offer a contemporary approach to both real-time integration and fragmentation. Data virtualization is a method of building a "logical" data access layer, or logical data warehouse, that provides a unified data layer for BI systems or enterprise applications to query. Instead of consolidating data to a single repository, data remains at the source and is accessed on-demand in real time.

At CData, our connectivity solutions simplify real-time data integration to the point where it is accessible to any user. Our driver technologies and connectivity solutions create a logical data layer for data operations that makes all your data sources look and behave exactly like a standard database to applications.

ELIMINATE DATA SILOS

Our data virtualization technologies create a logical data connectivity layer with real-time access to every data source that matters. We enable users to integrate with 250+ SaaS, NoSQL, and Big Data sources, through universally accessible plug-n-play interfaces that easily extend modern and legacy applications. This means you don't need developer resources to connect your BI, analytics, or reporting applications directly to live data—since those applications already know how to connect to a database, they can use our drivers to work with any data in real time.

As performance is critical in real-time integrations, all our connectivity solutions are hyper-optimized to make the fewest requests and return data as quickly as possible. They transparently support features such as bulk/batch integration and query folding. This query folding means that our drivers will intelligently push specific functionality down to the data source and let the data source process the request. Our solutions pass as much data processing as possible to the source to reduce the amount of client-side processing work, minimize the number of API requests, and reduce the size of returned data sets.
support to AI and ML, to data and ELIMINATE DATA SILOS across a broad spectrum of use.
process integration—real-time data Our data virtualization technologies But thinking strategically doesn’t
drives efficiency, and often without the create a logical data connectivity layer mean you have to make major
compliance and governance challenges with real-time access to every data source investments in new technologies
inherent with data warehousing. that matters. We enable users to integrate and replace what is working. At
At the same time, the shift to cloud with 250+ SaaS, NoSQL, and Big Data CData, we offer a tactical approach
and hybrid-cloud infrastructure has sources, though universally accessible to data connectivity that supports
led to growing challenges with data plug-n-play interfaces that easily broad applications across every
fragmentation. As organizations extend modern and legacy applications. facet of data management.
shift their infrastructure to cloud This means you don’t need developer Whether you are looking for real-time
technologies, data has become more resources to connect your BI, analytics, connectivity for analytics, supporting
decentralized and more challenging or reporting applications directly to enterprise IT and business units with
to leverage as an asset. While APIs live data—since those applications integration, setting up a data warehousing
provide an extensibility point for already know how to connect to a system, developing an application, or
accessing data, every integration database, they can use our drivers to building connectivity for just about
is unique, making it challenging work with any data in real-time. anything else—we’ve worked with
to extract actionable insights from As performance is critical in real- customers to support nearly every data
disparate systems. time integrations, all our connectivity connectivity need.
solutions are hyper-optimized to To learn more, visit us online at
LOGICAL DATA WAREHOUSING make the fewest requests and return www.cdata.com.
Data virtualization (DV) technologies data as quickly as possible. They
offer a contemporary approach to both transparently support features such CData Software
real-time integration and fragmentation. as bulk/batch integration and query www.cdata.com



sponsored content

Digital Integration Hub: The Architecture of Digital Transformation
ENTERPRISES ARE STRIVING TO GLEAN REAL-TIME insights from their data—especially from operational data—to boost profitability, provide superior customer experiences, and adhere to regulations. However, ingesting and analyzing rapidly growing data volumes at the speed of business, and from diverse data sources, is presenting a huge challenge, particularly for enterprises with unconnected and incompatible systems, including legacy core infrastructure.

One technology that is changing data infrastructure and powering in-the-moment data experiences is the Digital Integration Hub (DIH). Coined by Gartner, a DIH is an application architecture that connects to multiple systems of record and data stores regardless of whether they reside on-premise or in the cloud, and aggregates operational data into a low-latency data fabric. A DIH will help organizations offload and decouple from legacy systems of record and databases to rapidly introduce new digital applications and provide the scale and availability required for always-on services. The ability to seamlessly integrate with existing infrastructure and offload the data to a cloud-native, high-performance, compute-and-storage tier will enable fast time-to-value and lower risk, with the ability to continuously migrate to the cloud without the need to completely divest from existing mission-critical systems.

GIGASPACES SMART ODS: A MODERN APPROACH TO DATA MANAGEMENT

With the GigaSpaces Smart ODS, organizations can achieve the full potential of their data by augmenting their traditional infrastructure to meet modern business needs. Gartner cites GigaSpaces Smart ODS as one of four global vendors that provides an out-of-the-box DIH that enables organizations to:
• Consolidate any data type from multiple systems of record to deliver a unified, modern real-time operational data store
• Implement an event-driven microservices architecture to rapidly introduce new modern digital applications
• Decouple digital applications from the systems of record to provide speed and scale for digital applications
• Ensure always-on applications and services across all environments
• Modernize and offload from expensive legacy infrastructure
• Develop once and deploy anywhere—on-premise, cloud, hybrid, and multicloud scenarios with a cloud-native platform

The following native and modular components can be leveraged according to the needs of the organization. They are part of a unified installation with a single orchestration tool.
• High-Performance Data Store—Highly available, low-latency in-memory processing and storage to run your applications, reports, and analytics in real time on operational data. With no-code integrations, you can connect to your data stores in one click, or leverage built-in CDC and enjoy high ingestion of any data structure, supporting millions of IOPS. Ultra-fast data processing includes real-time transformations and dynamic server-side aggregations as part of data preparation and enrichment. Intelligent management of data in multiple storage tiers—hot on RAM, warm on SSD, and cold on your database—with an automated data lifecycle driven by business policy logic and actual usage patterns.
• Smart Caching—Robust in-memory data grid technology powers a low-latency, distributed, and scalable caching tier on any operational data source, optimized for your applications' advanced queries, with native secondary indexing and dynamic fast aggregations. AI-driven autonomous scale out/up handles your unexpected workloads.
• Server-Side Aggregation—Performs aggregations in-memory, eliminating the need to retrieve the data set from the space to the client side, iterate the result set, and perform the aggregation. Performing aggregation operations on-the-fly and on the server side delivers optimal flexibility to data consumers, since it is not necessary to pre-aggregate data on predefined schemas and no development is required. (The sketch after this list illustrates the pattern.)
• Streaming & Event-Driven Architecture—Define and manage multiple data events (aka triggers) and enable multiple digital applications to subscribe to real-time events for critical decision-making. Support high ingestion of real-time events and data streams, valid for use cases such as IoT, or when using message bus solutions such as Apache Kafka.
• Unified API Layer & Co-Located Microservices—Use of standard RESTful API and JDBC/SQL queries for all digital applications with a unified API gateway/proxy. This, combined with the ability to run distributed business logic co-located with partitioned data, delivers extreme performance and unparalleled agility.
• Change Data Capture (CDC)—Existing ODS solutions use either third-party CDC solutions or batch ETL. With GigaSpaces Smart ODS, it is possible to easily connect to your legacy infrastructure or existing ODS in a non-intrusive way, to get immediate data updates. This is delivered as part of Smart ODS with unified installation and orchestration.
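The value of server-side aggregation can be illustrated generically by contrasting client-side reduction with in-store aggregation. SQLite stands in for the data store here; this shows the pattern, not GigaSpaces' API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("XYZ", 100, 10.5), ("XYZ", 50, 10.7), ("ABC", 200, 3.2)],
)

# Client-side aggregation: every row crosses the wire, then is reduced here.
rows = conn.execute("SELECT symbol, qty, price FROM trades").fetchall()
notional = {}
for symbol, qty, price in rows:
    notional[symbol] = notional.get(symbol, 0.0) + qty * price

# Server-side aggregation: only one small result row per group returns.
server_side = conn.execute(
    "SELECT symbol, SUM(qty * price) FROM trades GROUP BY symbol"
).fetchall()

assert notional == dict(server_side)
```

At millions of operational records, the difference between these two paths is exactly the network and client CPU cost the DIH architecture is trying to avoid.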
GigaSpaces
www.gigaspaces.com



sponsored content

The Future of Data Management in the Cloud
SUCCESS WITH CLOUD-NATIVE APPLICATIONS requires enterprises to make platform choices that will shape everything they do—for better or worse—for many years to come. Organizations face two critical questions when considering how to successfully build apps in the cloud:
1. How can development teams ensure that the applications they run in the cloud will perform at scale, minimize complexity and cost, and adapt easily to the cloud?
2. How can they deploy cloud-native technology stacks that keep applications and data management systems fully aligned and seamlessly integrated?

We believe the technology required to answer these questions is ready and waiting. As we'll explain, Kubernetes and Apache Cassandra™ have already emerged as ideal platforms for cloud-native application development and data management, giving teams a faster and more confident path to success with their cloud-native data applications.

KUBERNETES REPRESENTS THE FUTURE OF CLOUD-NATIVE APPLICATION DEVELOPMENT.

Kubernetes today is the de facto global standard for container orchestration. Many of the capabilities that make Kubernetes a first-rate container orchestration tool also make it ideal for an even more strategic role: enabling enterprises to build and deploy cloud-native applications.

In this role, Kubernetes makes it easy to move applications seamlessly across multi-cloud and hybrid cloud environments and achieve lower total cost of ownership (TCO) without sacrificing performance or highly available applications that can scale up or down—instantly and automatically.

CASSANDRA REPRESENTS THE FUTURE FOR CLOUD-NATIVE, DATA-INTENSIVE APPLICATION DEVELOPMENT.

You've heard it all before: Data is a modern enterprise's most valuable asset. So, data-driven applications are where it's at for developers. And while most businesses could do great things with the data they're sitting on right now, the tools they've used to work with data in the past hold them back and limit this vast potential.

The problem: Business data is changing—your business database isn't. Relational database (RDBMS) technology was built for a world where business data looked and behaved in very specific ways. Cloud-native applications play by different rules, and that's where the trouble starts.
• Static data structures. Data today doesn't stick to pre-defined structures. Customer data may include call center transcripts, geo-spatial data, software usage telemetry, and dozens of other sources. There's gold in these mountains of data, but your RDBMS has no idea where to start digging.
• Runaway data growth and scalability. Your RDBMS is designed to scale, but you must be ready with server space and a storage array when it's time to grow.
• Scalability gaps. When you pair scalable, high-performance cloud applications with legacy database systems that simply can't achieve similar scalability gains, the results are predictable—and invariably disappointing.

The solution: Going cloud-native with Cassandra and NoSQL. NoSQL databases have emerged as a go-to option for dealing with the realities of modern business data management and aligning an enterprise's data management systems with the demands of a cloud-native application environment. Cassandra provides low latency, high performance and availability, openness, and scalability. Multi-cloud performance may be the single most important requirement today for many enterprise teams—and it's an area where Cassandra truly stands out.

KUBERNETES AND CASSANDRA ARE AN IDEAL PAIRING FOR BUILDING AND RUNNING MODERN, CLOUD-NATIVE DATA APPS.

At one time, Kubernetes wasn't a platform of choice for running databases or data management applications. It was difficult to run stateful data on Kubernetes, which often resulted in a single point of failure. On the apps side, that's not a big deal, but it causes challenges on the database side. Today, the truth about Kubernetes is very different. It now enables developers to run applications and databases side by side—creating a single, fully containerized technology stack, accessed via a shared control plane and leveraging a common management toolset. In addition, using the K8ssandra operator, persistent data is now possible on Kubernetes. And Cassandra plays a starring role in this success story.

KUBERNETES AND CASSANDRA ARE A CLOUD-NATIVE MATCH MADE IN HEAVEN—ALMOST. AND DATASTAX BRINGS IT ALL TOGETHER.

DataStax is on a mission to make working with Cassandra as simple, easy, and profitable as possible. We understand Cassandra because we played a major role in launching the project and setting the groundwork for a thriving and diverse development community—one that continues to make Cassandra one of the world's most versatile, robust, and reliable NoSQL database systems. Both developed by DataStax, the open source cass-operator abstracts away the complexities of deploying Cassandra on Kubernetes, and the open source Stargate Data API includes native APIs, allowing native JSON and GraphQL to be used rather than CQL.
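To get a feel for Cassandra's data model from application code, here is a minimal sketch using the open source DataStax Python driver; the keyspace, table, and replication settings are illustrative only:

```python
from cassandra.cluster import Cluster

# Connect to a local Cassandra node (e.g., one exposed by a Kubernetes service).
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute(
    "CREATE KEYSPACE IF NOT EXISTS shop WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.set_keyspace("shop")

# Partition by user, cluster by time: reads for one user hit one partition.
session.execute(
    "CREATE TABLE IF NOT EXISTS events ("
    " user_id text, ts timeuuid, payload text,"
    " PRIMARY KEY (user_id, ts))"
)
session.execute(
    "INSERT INTO events (user_id, ts, payload) VALUES (%s, now(), %s)",
    ("u1", "clicked-checkout"),
)
for row in session.execute("SELECT * FROM events WHERE user_id = %s", ("u1",)):
    print(row.user_id, row.payload)
```

Designing tables around access patterns rather than normalization is the habit shift that makes Cassandra scale horizontally without the RDBMS growing pains described above.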
DataStax
www.datastax.com



TRENDING NOW
Designing From the Inside Out
By Sam Rehman

Everyone is talking about digital transformation and the new normal of working from home, which has inarguably brought cybersecurity threats to a new level. With the boundaries of the network perimeter all but disappearing, the risks of an attack are greater than ever. While remote workforces require considerable attention, it is equally important to remember that cyberattacks happen everywhere. In early January 2021, for example, one of the country's largest wireless carriers was hacked when employees in its physical retail locations were scammed by individuals who brazenly downloaded software onto a store computer. After using employee credentials to gain access to the company's customer relationship management system, a wide range of customer information was lifted, from PIN codes to credit card numbers.

Similar stories are becoming more commonplace, though hacking has been a major threat to enterprises and government agencies for decades. Few veterans of information technology will forget the notorious Mafiaboy hacks in 2000. With a series of distributed denial of service (DDoS) attacks, which bombard a site or application with so many requests that the server is unable to keep up, 15-year-old Michael Calce was able to shut down the websites of E*Trade, Dell, Amazon, CNN, and Yahoo. With everything heading toward digital, it's not surprising to see this activity escalating, particularly as companies rush to accelerate time-to-value for customers and make their services more scalable and accessible. At the same time, the enterprise threat landscape has become increasingly dynamic, expansive, and fluid—making it harder for traditional security models and controls to defend against exploits.

Embedding Security By Design

While traditional approaches such as ring fencing will still be necessary, they are not enough for today's enterprises. What's central to prevention is embedding security by design. In other words, security must be integrated into software development, cloud infrastructure, and business systems holistically—starting first with a data strategy. No longer viewed as a byproduct of business processing, data is a critical asset that enables decision making. Therefore, a data strategy must do far more than address storage. It should start with identifying enterprise data assets, then establishing a common set of goals and objectives to ensure how it is safely stored, provisioned, processed, and governed—all of which are core to a zero-trust approach.

The zero-trust network, also known as the zero-trust architecture, was a model created in 2010 by Forrester Research analyst John Kindervag, who recognized that as data grows, so do the security threats for organizations across the board. Since then, the National Institute of Standards and Technology (NIST) has developed a free cybersecurity framework that, similar to Kindervag's model, helps organizations not only develop a shared understanding of cybersecurity risks but also reduce them with custom measures. Created in 2014 with input from private-sector and government experts, the framework (ratified as a NIST responsibility in the Cybersecurity Enhancement Act of 2014) was used by 30% of U.S. organizations in 2015 and was projected by Gartner to rise to 50% by 2020.

To make the most of this cybersecurity framework, it is recommended that organizations take a number of steps to classify data and know their core assets, align it with their regulatory requirements, enforce the principle of least privilege, define and layer controls to verify each point, and define how to observe incidents.

Whether it is a customer, partner, or employee, having the ability to identify end users in a way that is consistently reliable is one of the most fundamental controls for protecting an organization. Particularly for a growing company that is adding new users, this can become increasingly difficult to manage when compounded with the fact that most modern systems involve identities from multiple sources with different protocols, federated attributes, and identity mappings.

A Cloud-Specific Strategy

Since most data lives in the cloud now, it's essential to have a cloud-specific data security strategy. This starts with data classification and takes into account all the elastic and agile access semantics. The next step is to take a careful look at an encryption strategy—at rest, in use, and in transit—and make sure it is understood how the keys are managed and refreshed. Last but not least, it is imperative to have a robust disaster recovery plan in place.

This cannot be overstated: The most effective cybersecurity strategy is one that is architected into an enterprise's digital ecosystem and includes proactive (offensive security) and reactive (defensive security) measures.

Venturing into 2021 has been nothing short of perilous. But with a keen awareness of the threat landscape and a zero-trust architecture by design, organizations are less likely to become another statistic and far more likely to gain a competitive edge.

Sam Rehman is senior vice president and chief information security officer of EPAM Systems (www.epam.com).



BIG DATA BY
THE NUMBERS
KEY TRENDS IN DATA MANAGEMENT
Polyglot persistence, the practice of selecting the best database for the job, is on the rise. The days of the one-size-fits-all approach are gone as companies strongly embrace a range of NoSQL/NewSQL and relational, cloud and on-prem, and proprietary and open source options. With this trend comes the opportunity to choose the right database for the job, but also greater complexity.

The top reasons for using multiple database platforms are:
1. Supporting multiple applications: 81%
2. Supporting multiple departments: 54%
3. Application vendor requirements: 42%
4. Supporting multiple workloads: 36%
5. Supporting increasing data volumes: 29%
6. Managing database licensing and support costs: 28%
7. Supporting unstructured data growth: 22%
8. Deployment in multiple or hybrid cloud: 17%
9. Avoiding vendor lock-in: 16%

Including both relational and non-relational database management platforms, the majority of organizations are relying on more than one brand of database management system:
• 1 brand: 11%
• 2–3 brands: 46%
• 4–5 brands: 23%
• More than 5 brands: 13%
• Unsure: 8%

The top 5 challenges with diverse data environments are:
1. Searching/discovering data: 50%
2. Reducing data latency: 49%
3. Scaling for growth: 48%
4. Siloed or inaccessible data: 41%
5. Aligning security: 7%

Source: "Thriving in a Multi-Database World: PASS 2021 Survey on Data Diversity," produced by Unisphere Research in partnership with the Professional Association for SQL Server (PASS) and sponsored by Dell Technologies

The most popular digital transformation projects being undertaken by organizations right now involve cloud solutions, BI and data analytics, and cybersecurity. In addition, IoT and AI/machine learning (ML) are also important initiatives.

Top digital transformation priorities:
1. Cloud solutions
2. BI or data analytics
3. Cybersecurity/information security
4. IoT
5. AI/ML

Source: "DBTA Digital Transformation and Cloud Workloads Study," produced by Unisphere Research and sponsored by Aerospike

While it is no surprise that Oracle prevails as the leading vendor in a survey of Quest-IOUG members, there are numerous other data environments commonly seen within Oracle sites.

The most popular DBMSs now in use or planned for use within Oracle sites are:
1. Microsoft SQL Server: 70%
2. MySQL: 46%
3. Apache Hadoop/HBase: 29%
4. PostgreSQL: 24%
5. IBM Db2: 23%
6. MongoDB: 19%
7. Teradata: 19%

Source: "2020 Quest IOUG Database Priorities Survey," produced by Unisphere Research and sponsored by Dell EMC



INSIGHTS
Enabling Data Intelligence:
Q&A With Zaloni’s Ben Sharma
Today, organizations across all industries are struggling to achieve enterprisewide visibility and maximize use of data and related digital assets for business advantage.

Recently, Ben Sharma, co-founder and chief product officer of Zaloni, a provider of enterprise DataOps software whose flagship product is the Arena platform, spoke with BDQ about the challenges companies are facing in their efforts to gain more value from data.

The past year has been a really volatile time. What are the challenges that you're helping companies navigate through?
There are several challenges that come to mind as we have worked with customers in various different verticals over the last several years and, more importantly, in the last couple of months. One is that there are more data silos being created. As you think about the speed to execute, oftentimes, that actually means going as fast as you can and doing whatever you need to do to get the results or create the outcomes. And that's creating more data silos.

What is happening?
Organizations that lack a data strategy for managing data across silos are struggling because once they create the data sprawl, they don't have the governance, they don't know how to manage data, and they don't know what data exists where. Having a single unified view of governance has become critical from our perspective and from what we're seeing from various customers and their use cases. Along with that, organizations that do not have a strong approach in terms of how they think about DataOps and automation are struggling because it just takes too long for them to get access to the data and to give data to the right people for the right business use cases. The two critical things are that, one, we see companies struggling with governance and then, two, we see organizations struggling with time-to-market or time-to-insight.

What is Zaloni's definition of DataOps?
Our view is quite simple. DataOps has emerged as a discipline taking the learnings from some of the best practices in the DevOps world: having an automated approach in terms of bringing in the data and making sure the data gets validated, making sure that you can trust this data. And at the same time, there is governance and other principles being applied to that data in an implicit way based on your data strategy and the policies that you have adopted within your organization. Those are the critical aspects of making sure that you have a DataOps mindset or approach.

Automation and governance are the two critical pillars there.
That's right. And when I say "automation," it's not just moving data from point A to point B; it's also validating it, running your test cases, and making sure that they succeed before you promote from one environment to another environment, before you actually make the data available from one zone to another zone in a trusted manner for the rest of your data consumers. All of that is front and center in terms of a DataOps approach.

Has the rise of hybrid architectures combining multi-cloud and on-prem deployments made data management more difficult for organizations?
If you think about it, every cloud provider has their own way of doing things, which fits their use cases and how they're bringing their services to the market. Now, if you're the customer, and you're trying to do these things across multiple infrastructures and multiple platforms, you don't have a common way of thinking about data. You don't have a common way of thinking about security. You have a very fragmented approach, and unless there is an abstraction layer that allows you to think about this in an organized manner, you have to build this in a very proprietary manner each time you are standing up these environments. That creates more challenges in terms of thinking about governance and thinking about compliance to various regulatory requirements that you may have, depending on your industry. All of that adds and multiplies in terms of the challenges that you have to deal with as you manage data.




What is at stake for companies that don't take a comprehensive approach?
To put it very simply, it's a question of survivability. How do companies survive, given that they have to adapt to change? In the past 12 months, we saw retailers whose approaches were no longer valid because they didn't have any traffic coming into their stores. Changing to an online model, accelerating in terms of digital transformation, and making adjustments sooner than later in a very kind of agile mode were critically important for these businesses to survive. What we see is that using data to make informed decisions so that you can retain and grow your customers is at stake. You must be able to use data effectively in a timely manner so that you can adapt to change, and so that you can reinvent some of the business models.

That brings us to the idea of data democratization, which has been a major theme for Zaloni.
At the highest level, what we mean by that is that companies need to be able to provide or enable access to data across their organization but do it in a meaningful way where you're providing the right data to the right people. As an organization, you also have a responsibility to safeguard sensitive data. If there is PII data, you need to think about complying with CCPA- and GDPR-type regulations so that you're protecting the data, have role-based access control on the data, and are making sure that you're not letting the data be available in an ungoverned manner because that, from our perspective, reduces the trust in the data. You need to have an approach where you can say that this is the original data, which may or may not be trusted, but then do something to the data to apply checks and balances and make it more trusted so that as people in the rest of the organization consume it, they can know that this data has been approved by a centralized data authority.

Does that tie into the rebranding last year of the Zaloni data platform as Arena?
Absolutely. If you think about Arena and its significance, our software platform provides a common gathering place, if you will, so that data and the data consumers, the data governance folks, and other partners are aligned with a unified view. We are allowing our customers to create these experiences where data's potential is realized through collaboration and controlled access across the organization. From our perspective, when we talk about Arena, Arena is that space where data and information are not only organized, accessed, and shared, but are also transformed into insights, into something that is meaningful for your business use cases. With Arena, we think about having a unified view of data across all your different environments and we call it the three C's. It's where we catalog the data, then allow you to control the data, no matter where it exists, so that you have the right governance model for the data.

And the third C?
And then the third piece which is critically important is: How do we allow you to consume that data so that your data consumers can come in and have easy access to it?

Looking ahead, is there a direction that you're going in that is possibly different than other companies?
There are two key things that we're focused on. One is, now that we have a base foundation where we have all these different capabilities along the data supply chain for enabling a DataOps approach in these organizations, we're adding more and more machine learning capabilities in our platform so that we can make data management and data governance much more intelligent. The idea is that as you bring in the data, our platform can automatically detect that data and not take bad records forward in the process so that you can automatically enable trust in that data. Our platform also automatically detects sensitive data so that, as you need to comply with various regulations, we can flag datasets that have PII so that you're not making them generally available or you are automatically applying our masking and tokenization functions to anonymize that data. Things like that—that are more about augmenting that data management approach with system-generated intelligence—are what we broadly call "data intelligence." Enabling data intelligence from a DataOps perspective—and from a data supply chain perspective—is one of the key things we are focused on.

And the second?
The second thing that we are all focused on is becoming that single pane of glass—that single cockpit—for customers across all of the different cloud providers. We are not just providing a shim layer on top of these cloud service providers. We're actually doing deep integration with these cloud service providers: talking to their APIs, leveraging the innovation and the new services that they're bringing to the market, but at the same time, providing that layer of abstraction so that our customers do not have to deal with the internal details and they have much more portability in terms of moving the data from one environment to another environment. Those are the two things that are front and center in our focus as we go forward.

Interview conducted, edited, and condensed by Joyce Wells.



DATA SCIENCE PLAYBOOK

Log Parsing With AI: Faster and With Greater Accuracy
Network security logs are a ubiquitous record of system runtime states and messages of system activities and events. They become the primary source of system behavior and are critical when triaging abnormalities in otherwise normal system execution. The logs are usually unstructured textual messages that are difficult to go through manually because of the ever-increasing rate at which they are created. The raw data from the logs is unstructured, noisy, and inconsistent; thus, some preprocessing and parsing is essential.

Parsing logs with regular expressions is the most widely utilized method available for network log analysis. A regular expression (regex) is a sequence of characters specifying how to match a sequence of characters. Outside of one-off parsing, you are most likely going to use regular expressions to repeatedly parse and normalize log files as part of the analysis infrastructure. However, as the log file format changes, regular expressions fail, and this can create failures in how log data is processed and evaluated. This is often the case as log structures vary in source, format, and time. As the number of sources increases, the number of custom regex parsers increases as well.
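To make that fragility concrete, here is a small sketch in Python; the log format, field names, and pattern are all invented for illustration:

```python
# A hypothetical regex parser for one specific log format.
import re

# Matches lines like: "2021-06-01 12:00:01 sshd FAILED_LOGIN 203.0.113.7"
PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<source>\w+) (?P<action>\w+) (?P<ip>[\d.]+)"
)

line = "2021-06-01 12:00:01 sshd FAILED_LOGIN 203.0.113.7"
print(PATTERN.match(line).groupdict())  # parses cleanly into named fields

# The same event from a source that uses an ISO timestamp and reorders
# two fields no longer matches, and the record silently drops out of
# the pipeline.
changed = "2021-06-01T12:00:01Z FAILED_LOGIN sshd 203.0.113.7"
print(PATTERN.match(changed))  # None
```

Each new source or format revision means another pattern like this one to write and maintain.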
Advances in NLP

To mitigate the need to create hundreds of custom parsers for each log, natural language processing (NLP) methods are now utilized to automate the task of parsing network security logs. These initial NLP techniques were N-gram analysis, distance measures (Jaccard, Levenshtein), and word embeddings (word2vec). These methods attempt to evaluate the raw log data, extract necessary features from it (source, time, action), and restructure the log in a way it can be analyzed using common techniques. NLP methods are used when the features of the logs are unknown.
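As a quick, hypothetical illustration of one of those early techniques, Jaccard similarity over token sets can suggest whether two raw lines share a template (the log lines are invented):

```python
# Jaccard similarity over whitespace tokens -- one of the early,
# pre-transformer techniques mentioned above. Log lines are invented.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

log1 = "Failed password for root from 203.0.113.7 port 4711"
log2 = "Failed password for admin from 198.51.100.2 port 2200"
log3 = "Accepted publickey for deploy from 192.0.2.10"

print(round(jaccard(log1, log2), 2))  # ~0.45: likely the same template
print(round(jaccard(log1, log3), 2))  # ~0.17: probably a different event
```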
The last few years have yielded advances in NLP that take advantage of more complex neural network word representations than were seen in word2vec. Bidirectional Encoder Representations from Transformers (BERT), introduced by Google researchers, is one such innovation. Rather than encoding a sequence in a single direction, BERT's transformer encoder attends to the tokens on both sides of every position at once, and it is trained by predicting tokens that have been masked out of the input. The bidirectional training of language models gives them deeper insight into the context of the text.
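One way to see that bidirectional context at work, assuming the Hugging Face transformers package and the public bert-base-uncased checkpoint (neither of which is specific to this column), is a masked-token prediction:

```python
# The model fills in [MASK] using context from both the left ("Failed")
# and the right ("for root from ..."). Assumes the Hugging Face
# "transformers" package; this is not part of cyBERT itself.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for guess in fill_mask("Failed [MASK] for root from an unknown host."):
    print(round(guess["score"], 3), guess["token_str"])
```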
Enter cyBERT

While BERT has achieved state-of-the-art results in a variety of NLP tasks related to written human language, applying its pretrained base model directly to network security logs required additional experimentation and training as well as adjustment of the size of the input sequences that could be fed into a BERT model. This resulted in cyBERT (https://github.com/rapidsai/clx/tree/branch-0.11/notebooks/cybert).

The cyBERT project is an ongoing experiment to train and optimize transformer networks to provide flexible and robust parsing of logs of heterogeneous network security data. It is part of the Cyber Log Accelerators (CLX) library, used to bring the GPU acceleration of RAPIDS to real-world cybersecurity use cases. The goal of cyBERT and CLX is to allow network security personnel, cyber data scientists, digital forensic analysts, and threat hunters to develop network security log data workflows that do not require custom regex parsing processes to get the data into a format for evaluation and diagnosis.

Network security logs contain file paths, IP addresses, port numbers, and hexadecimal values in a firm order versus what you would see in a typical string of words. The combination of these log inputs can lead to complex regex that can change depending on the source or the time of creation. cyBERT removes the need to create the regex parsers as it determines each of the log inputs intuitively without having to account for every combination of characters.
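The linked CLX notebooks show the real workflow; as a rough sketch of the underlying idea only, with a hypothetical checkpoint name and nothing from the actual cyBERT/CLX API, log parsing becomes token classification:

```python
# A transformer fine-tuned to tag log fields replaces per-format regex.
# "example-org/log-field-bert" is a hypothetical checkpoint; assumes the
# Hugging Face "transformers" package. Illustrative only.
from transformers import pipeline

tagger = pipeline(
    "token-classification",
    model="example-org/log-field-bert",
    aggregation_strategy="simple",  # merge word pieces back into fields
)

raw = "Jun  1 12:00:01 host-07 sshd[2541]: Failed password for root from 203.0.113.7 port 4711"

for span in tagger(raw):
    # Each span carries a predicted field label (e.g., timestamp,
    # hostname, ip), however the source orders or formats them.
    print(span["entity_group"], "->", span["word"])
```

Retraining the tagger, rather than rewriting patterns, is what absorbs a format change.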

A Game Changer
cyBERT is built to be general enough that an organization can take it and train it for its custom network behavior. Instead of using the default corpus of English-language words in BERT, cyBERT is developed using a custom tokenizer and representation trained from scratch on a large corpus of diverse cyber logs. Providing a toolset powered by NLP to perform log parsing is a game changer in the critical and time-sensitive area of cybersecurity.

Jim Scott is head of developer relations, Data Science, at NVIDIA (www.nvidia.com). Over his career, he has held positions running operations, engineering, architecture, and QA teams in the big data, regulatory, digital advertising, retail analytics, IoT, financial services, manufacturing, healthcare, chemicals, and geographical information systems industries.



DATA DIRECTIONS
The Coming Tsunami of Automation
As many people in the world pass through the eye of the pandemic storm, aspiring to survive and experience the sun again, another catastrophe looms—albeit barely visible. This potentially devastating problem lurks around the corner from the big-box store that may have just provided their COVID-19 vaccinations, but it is of a very different nature. We are facing a pending tsunami of automation.

This pending tsunami is fueled by many converging factors, with technology at its center. As companies struggle to fill positions, they will look toward automation as a solution. And, as a rising minimum wage affects their bottom lines, they will seek ways to replace people with more cost-effective, profit-friendly tools. COVID-19 was a wake-up call for many companies to find alternative ways to do business.

Winners and Losers

Of course, automation won't be a disaster for everyone, as the six sister cities of Silicon Valley (www.dbta.com/BigDataQuarterly/Articles/The-Six-Cities-of-Silicon-Valley-125014.aspx) will create millions of new millionaires and thousands of new billionaires. We suggest that all the delighted recipients of whichever jab was received at the corporate superstore—amid their bliss that the ordeal may finally be over and that sanity may return soon—should also consider the organization behind the building they just left and ask why anyone is actually working in there.

Next, if they look around at the other stores in the mall, they may be terrified when they consider that the plethora of fast-food restaurants, hardware stores, gas stations, and everything else in view will soon be automated. Moreover, the self-driving trucks making the deliveries in the future will be able to travel 24 hours a day. They'll follow highway regulations as a matter of programming, so all is not bad. But these trucks will also unload themselves. This scenario isn't science fiction or a figment of Tesla CEO Elon Musk's imagination, and it isn't waiting for a future generation. It's an inevitability, and it's coming to a strip mall near you.

As everyone emerges from their bunkers and shelters, they will discover that the companies of the six sister cities of Silicon Valley are replicating as quickly as the perpetrator of the last global catastrophe. Processor technologies that would baffle Star Trek's Mr. Spock are being blasted out by NVIDIA, Intel, AMD, and others. Cloud companies such as Google, AWS, Microsoft, and VMware are providing effectively unlimited capacity and services. These technologies and platforms are enabling old, new, and yet-to-be-booted-up companies to conceive of genius uses of AI inference, machine-learning training, and deep-learning muscle memory. There is almost nothing outside of the arts and sports that will not be automated.
No Heavy Lifting

Let's consider examples such as the amazing Amazon "Go" store (www.geekwire.com/2020/amazon-goes-bigger-first-amazon-go-grocery-new-seattle-store-using-cashierless-technology/amp). In this scenario, a mind-bending array of cameras and interpolations determines what items the human customer has selected and then charges them to the customer's credit card. Notice that this is the only mention of a human doing anything in this article. How are the shelves stacked? Surely that job requires a hefty and sweaty employee? Not so. Think of "Stretch" (https://yorknewstimes.com/news/boston-dynamics-unveils-stretch-a-warehouse-robot/video_ef1a5b45-76b1-5965-aa65-f6b519565972.html), a new robot from Boston Dynamics.

And, if you aren't a fan of Stretch and want to move into the 25th century a bit more slowly, try automating your warehouse processes with technology offered by companies such as 6 River Systems (https://6river.com/automate-your-warehouse).

There are more questions to consider, such as one posed by fast-food restaurants over the last 14 months: Why do any of them have dining rooms? And wouldn't an automated system be trivial? "I'll take a number 3, large, with fries" is not linguistically challenging, and we won't need Captain Picard (from Star Trek: The Next Generation) to deploy a futuristic food synthesizer to pass that burger into the driver's-side window.

The Real Cost


Unfortunately, this new technology buffet comes at a frightening cost. For those of us concerned with the privacy-busting but comfort-creating idea of vaccine passports, there is much more to come. The 3 billion or so nucleotides that make us the unique animals we call humans can be sequenced, of course, and the data stored. So why not automate that? And, since we're on the subject, let's do it as quickly as a barcode can be scanned.

So, what will the average blue-collar worker do in this new techtopia? We can all watch with unlimited attention and enthusiasm the ongoing debate among politicians and economists about the efficacy of a universal basic income, but some quick Googling (yes, we used Google as a verb) will provide contrary views on the long-term consequences of such social engineering.

The answer is: "Who knows?" But flippancy aside, we are about to see the emergence of a gaggle of new AI/machine learning/analytics companies that will create an unlimited menu of physical and programmatic automation options and become very rich in the process. Where that will take humanity is a mystery that even The Twilight Zone's Rod Serling might choose to leave in an unknown dimension. However, back on Earth in 2021, we, the authors of this article, take solace in the fact that baseball is again being played with fans in the cheap seats—and we do believe that the Yankees and the Red Sox won't soon be automated.

Michael Corey is co-founder of LicenseFortress (www.licensefortress.com). He was recognized in 2015 and 2017 as one of the Top 100 people who influence the cloud. Corey is an Oracle Ace, VMware vExpert, a former Microsoft Data Platform MVP, and a past president of the IOUG. Check out his blog at http://michaelcorey.com.

Don Sullivan has been with VMware (www.vmware.com) since 2010 and is the product line marketing manager for Business Critical Applications and Databases with the Cloud Platform Business Unit.



THE DATA-ENABLED ORGANIZATION
Enabling the Entire Organization
As the data industry continues on a trajectory of self-service, data enablement, and analytics empowerment, organizations need to change the way they think about data professionals. Too often, we think "data professionals" but hear "data scientists," "business analysts," and "power users." And while those data professionals certainly need to be empowered with better tools, they still only make up a minority (~20%) of the people within the organization who already work with data. What about the other 80%?

User Experience Is Everything

Empowering every data user starts with reassessing how we think people work with data. There is a misconception that data is scary and that people don't know how to find, analyze, and use data in decision making. This is simply untrue. People use everyday data to make decisions routinely. They know how to find data online (search) and how to collaborate and share findings (texting and posting). Consider the process of perusing the ratings and review sections of sites such as Amazon and Yelp (trusting data) to make informed purchase decisions or select where to have lunch (taking action). These are simple choices, requiring simple analysis. But there are larger, more complex evaluations that necessitate larger, more complex analysis—selecting and buying a vehicle, researching all the variables surrounding the purchase of a new home, and so on—and people use analysis to make these decisions too. Making data approachable isn't the problem. We need to give emergent data professionals the data they need within an environment designed to make them successful so that the task of analysis isn't a chore—it's an experience.

User experience is everything. Enabling data professionals requires providing an environment to work with data in a natural, intuitive format. Too many tools are designed for the power user with a developer's mindset. While the analytics capability is certainly there, merely updating the user interface doesn't make a tool user-friendly any more than watering down bloated enterprise software makes it "self-service." Aesthetics do not equal usability—or, to put it plainly, just because something is pretty doesn't mean anyone will want to work with it.

Participation and Collaboration

Stop thinking about tools and start thinking about platforms designed to encourage participation, collaboration, and ease of use. A new generation of vendors is bridging this gap, and while these tools will likely find themselves part of an enterprise ecosystem, they offer instant value due to their familiarity. With solutions such as Datameer Spotlight for data prep or Grid.is for amping up Excel, even the most basic data user can begin providing benefits back to the business.

Next, while organizations should look to enable every business user to work with data, they should also enable IT to support data users. Data management is IT's responsibility, but so is providing access to data. If people are intimidated by IT systems and processes, this presents a barrier. Think of businesspeople making decisions with data as water flowing down a river: If you put in an obstacle, they will flow right around it. Telling an eager data user they can't work with a dataset and/or can't have access to a database just means they'll find a way to do it anyway—and likely away from the watchful gaze of IT. As gatekeepers, IT team members aren't just custodians of data; they are there to enable the organization to work with its data—to provide secure access, management structure, and governance safety.

Evaluating Experience

Shifting perspectives and adopting new organizational cultures doesn't happen overnight. However, we can make data-informed, analytics-driven decisions to measure data enablement. While metrics for the impact of "self-service" are still somewhat nebulous, we can measure the efficacy of end-user-designed experience for data work by simply changing the way we evaluate its adoption. Rather than counting how many users are engaging with a tool or platform (the number of licenses in use), we should consider how often they are using, sharing, and publishing it (how often and how it is being used). It's "screen time" analysis for organizational self-service.

Ultimately, the success of a data-enabled organization will be largely dependent on its ability to support self-service users with the platforms they need to work with data and the data with which they want (and need) to work. This means we must empower all data users and consider the needs of internal customers as much as external customers. Productive data users are happy data users and role models, and happiness starts from within.

Lindy Ryan is the chief content officer at Radiant Advisors, a trusted research and advisory firm that leverages experience and industry involvement to deliver pragmatic guidance in executing data and analytics strategies. She is an award-winning professor of visual analytics and the author of two textbooks on visual data culture and data visualization. Follow her on Twitter @Radiant_Lindy.



THE IoT INSIDER
The Broken Promise of the Connected Oven
Creating a great customer experience, the holy grail for many companies, got its start in stores. Clever retailers learned that nurturing positive feelings during the customer's interaction could be scripted in such a way that the outcome benefited the sale. The thinking was—and still is—that a positive customer experience promotes loyalty and encourages brand advocacy.

While the origin of customer experience was in-store, this quickly expanded beyond the walls of the shop into other channels—the first being call centers—and, with the rise of the internet, to email and social media. As with all things new, a lot of these efforts happened in isolation. Employees in the shop didn't have a clue to whom you had already spoken in the call center and vice versa.

A lot of time and money has been spent in the last few years to improve that. Customer 360 projects (using data from customer touchpoints to get a complete view from purchase to service and support) and omnichannel initiatives are being initiated to bring down those walls. All this sounds familiar, right?

However, a new type of channel looms on the horizon, and it is going to be the toughest one of all to achieve. In order to master it, customer experience needs to evolve to the next level—a level I call the "connected customer experience." The channel? Connected products.

With the advance of IoT, more and more products are getting smarter. But making things smarter also raises expectations because, as Peter Parker (alias Spider-Man) knows, with great power comes great responsibility. And that is exactly what is going to bite a lot of companies in the backside.

Beware the Disconnected Customer Experience

I recently had an experience that brought it all home. Last year, when I got my kitchen redone, I spent a fortune on an oven from a very exclusive brand. My wife selected the oven based on the brand, the aesthetics, and some of the features—such as being able to heat it to 300° Celsius, turning it into a real pizza oven.

Of course, I couldn't help myself and had to figure out all the gadgets and gizmos, and quite quickly ended up consulting the paper manual. It turned out that the oven—and another two devices I bought—had a connected feature. "Cool!" I thought. "Let's connect."

I downloaded the app, created my profile, and connected all three devices. This all went quite smoothly. It was nice to see the timer going off on my mobile phone, reminding me while I was working that there was still something in the oven. I even figured out how to start the oven safely from a distance. So far so good.

That is, until we wanted to bake a pizza. While heating the oven to 300°, it started to give an alert on the display: "Contact service desk." I was looking up the phone number in the manual and noticed that it said if your oven is connected, your service assistant can diagnose the problem remotely. "Aha!" I thought. "The wonders of IoT finally at work." What a deception it turned out to be.

From a traditional experience perspective, the whole conversation with the call center went fine. The phone was picked up quickly, and the person was friendly. However, when I mentioned that I had a connected oven and that I had already registered, she had to admit that there was no link between her administration and the app. I wasn't known in the (service desk's) system yet. Why not? I registered my device on the app! We then had to go through the whole registration process again. She couldn't access my oven (and honestly, I don't think she would have known what to diagnose if she could have). So, I had to read out the error codes, and a bunch of other codes from an impossible place, lying down on the ground to see the sticker at the bottom of the door (which I am pretty sure would all have been accessible through the app). A service visit has now been scheduled with a technician who is going to visit me in 10 days' time. I will keep you posted.

The Last Mile of CX

This example underlines the point that, from a customer experience perspective, if your products are connected, you also must aim for a connected customer experience. Your product must become integral to the traditional channels as well as integrated throughout the whole support ecosystem—across email, social media, chat, and call-center systems.

Why not dispatch the error message automatically from the app to the service desk? Why not have the service desk call you automatically, informing you of the issue and the resolution? With everything in place, that last mile shouldn't be too hard.
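For illustration only (the endpoint, field names, and payload are invented, and this reflects no particular vendor's API), the wiring being asked for is a few lines of glue:

```python
# A hypothetical sketch of that last mile: the oven's app forwards the
# fault straight to the service desk, so a ticket exists before anyone
# calls. Endpoint and field names are invented for illustration.
import json
from urllib import request

event = {
    "device_id": "oven-1234",
    "error_code": "E-117",  # read from the device, not from a sticker
    "registered_owner": "customer-0042",
}

req = request.Request(
    "https://servicedesk.example.com/api/tickets",  # hypothetical endpoint
    data=json.dumps(event).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
request.urlopen(req)
```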
After all, there is nothing comparable to a connected customer experience—where everything goes smoothly and as it should—to help gain customer loyalty. Which brings me to this conclusion: The end game of IoT is not device connectivity or management. The end game of IoT is business process integration, because a disconnected experience, such as the one I had, will quickly sour your customer on your product—connected or not.

Bart Schouw is vice president of technology and digital alliances, Software AG (www.softwareag.com).



GOVERNING GUIDELINES

AI Governance: Cultivating Critical Thinking


There is, to the best of my knowledge, no way to constrain base AI technologies in ways that preclude use by bad actors—short, of course, of putting the genie back in the bottle, which has never been a winning strategy in fantasy or real life. Nor is it possible for poor outcomes by well-intentioned individuals to be entirely prevented. We can, however, reduce the prospect of such situations. Doing so requires getting up close and comfortable with uncertainty and risk. It involves lowering the barriers to discussing uneasy topics openly, honestly, and without prevarication, and last, but not least, making it not only acceptable but expected for teams to actively seek to disprove their own theories and criticize their creations. In this regard, the single most important output of your AI (or any other) governance program may be the capacity for critical thinking.

Understanding the Risks

Deploying AI fairly, safely, and responsibly requires clarity about the risks and rewards of an imperfect solution, not the attainment of perfection. An AI algorithm will make mistakes. The error rate may be equal to or lower than that of a human. Regardless, until data perfectly representing every potential state—past, current, and future—exists, even a perfectly prescient algorithm will err.

Given that neither perfect data nor perfect algorithms exist, the question isn't whether errors will happen but instead: When, under what conditions, and at what frequency are mistakes likely?

Enter the premortem. This concept—which Steven Johnson highlights in Farsighted: How We Make the Decisions That Matter the Most—is particularly important in environments that are inherently complex, where uncertainty abounds and the stakes are high. Outside of rote task automation, this description applies to most AI solutions.

Rigorous premortems confront uncertainty head-on by forcing teams to explicitly consider questions such as the following:
• What are the real mistakes this solution could make and their quantifiable impact? In healthcare, a system could underestimate the need for nursing staff, apply an incorrect diagnosis code, or generate a false-negative diagnosis.
• Are there inherent limitations, including biases, that may be reflected in the solution? If an algorithm is trained on data from a predominantly elderly population, for example, the algorithm may not be as accurate when applied to younger cohorts.
• What other perceptions or factors may impact how the solution is received? Despite their increasing sophistication, AI chatbots are not known for their bedside manner.
• What happens next? When it comes to clinical decision making, a differential diagnosis or predicted outcome is not the end of the story. Rather, it's the beginning of a complex, ongoing dialogue that must marry clinical insight with empathetic human understanding.
• What is the potential harm and error tolerance of parties impacted by the solution? Consider that patients report they are more inclined to forgive a mistake made by a human doctor than a machine—even if the error is one the human clinician may make more frequently. Paradoxically, patients are also receptive to and, in many cases, welcoming of AI, for instance, in robotic surgery—if they perceive a positive benefit to their long-term health.
• What are potential ramifications of this solution today and tomorrow—even in the absence of errors?
• What other options exist to solve this problem?

Mitigating Harm and Course-Correcting

Of course, merely posing these questions is not enough. Mindful debate and out-of-the-box viewpoints should be encouraged. Such deliberations can be advanced through the use of tools such as scenario planning, decision mapping, simulation, adversarial game-playing, value modeling, and storytelling.

Done well, premortems allow diverse stakeholders to consider the impact of a given AI solution in the context of what is known and unknown. This, in turn, allows informed decision making regarding both if and how a given AI solution should be implemented, and it encourages teams to preemptively eliminate or mitigate harm and rapidly course-correct before and after a solution is deployed.

So, while we often think of governance as dictating rules, we are better served if governance promotes critical thinking. Indeed, the extent to which your AI governance program creates the capacity for continuous, constructive critique may be the extent to which your AI program does more good than harm.

Kimberly Nevala is a strategic advisor at SAS (www.sas.com). She provides counsel on the strategic value and real-world realities of emerging advanced analytics and information trends to companies worldwide. She is currently focused on demystifying the business potential and practical implications of AI and machine learning.


