Recognize The Key Characteristics of Big Data
Big data: what exactly is it? Think of big data as both internal and external data collected
from both traditional and digital sources. It's analyzed to drive improvements in your
business, which sometimes leads to unexpected discoveries. Sometimes data is already
structured in a certain way, but often it's only partially structured, or not at all; think
metadata or tweets on Twitter. The data's always been around, but today it's processed in the
blink of an eye.
You've likely heard people talking about the three V's of big data: high volume, high
velocity, and wide data variety.
What's "big" about data is just how much of it there is: the volume. But giant data volumes
are nothing new. What has changed is the desire to interpret this information. The problem
is that traditional relational database management systems, or RDBMSs, can't handle such
big volumes on a single server. As a result, they can be plagued by response times that are
too slow for many of today's applications, and they struggle with data that's not in the
format they expect. Another volume concern is how long the stored data remains valuable:
when do you throw it out? Old data takes up space, and maintaining it can become
cumbersome.
Another problem facing companies is the high velocity of data, meaning the speed at which
data arrives. Picture a tap open to maximum. Can you catch all the water? Can you do it in
one go without pausing? Coming back to data, even when you're capturing it at high speed,
you need to be certain of its consistency and completeness. Timeliness, or latency, is also a
velocity issue: is the data being captured at a reasonable rate, or so slowly that it's
worthless by the time it arrives? So for real-time analytics to be of any value, data storage
and retrieval need to happen at high velocity. For example, think about your favorite music
streaming site. Each time you log in, you instantly see a brand new list of songs and albums
recommended for you, generated from all the data collected about you.
All this data is a nightmare to manage. Trying to match data streams to specific events is a
tough task, made even tougher by the overwhelming variety of data: videos, photos, music,
GPS and sensor signals, and billions of social media text messages. This data isn't
structured, and it's estimated to make up 80% of what companies store. So there are various
data types, but there's also a variety of sources, both inside and outside your company. A
real game changer will be finding new ways to store and retrieve data as quickly as
possible, or ways of packaging it for better analysis.
While volume, velocity, and variety are the big V's, don't forget there are some smaller V's
too! Value, veracity, variability, viscosity, and virality are all relevant to any big data
plan. Big data is an evolving term in an evolving industry. So expect updates as our
understanding of what data is, and what it could be, changes.
Today's consumers are picky. It's easy to research a product before you buy it, and you
could even do this in the store. With big data, you can profile your customers and appeal to
their personal tastes in real time. Let's examine some other benefits of big data.
Cost and time reductions are almost always going to increase profits. Hadoop and similar
big data technologies offer huge improvements over older tech, like data warehouses.
There's a trend where big companies like Citi are using both the old and new tech
together, and the result is lower costs. Just think about it: you can model and test millions
of variations on a product, and then simulate them in record time. The amount of time,
money, and effort saved is truly incredible.
Big data analytics really turns up the dial on the efficiency of your business processes. For
example, instead of replacing all the machines in a factory at an average breakdown
interval, big data allows specific, real-time checks to pinpoint specific machines, or even
individual components, and just replace those as needed. Less downtime, less money spent,
far more effective.
Time is money, as they say, and the new tech is speeding up decision-making. Architectures
like Hadoop, combined with in-memory analytics, give companies lightning-fast analysis.
Caesars, a gaming company, has adopted Hadoop. They've always collected and analyzed
information from their slot machines, loyalty program, and web clickstreams. But big data
lets them analyze it in real time. Now they can react instantly, while the customer is still
in the casino. For example, if someone is having terrible luck at the slots, a Caesars
employee might approach them with a coupon to tempt them to stay longer.
Besides improving existing services, big data creates many new opportunities. Offline
companies are now trying to use big data insights to create new products and services, as
online companies have done for years. Sports teams are using data about who attended
their live games and who traveled to be there, as well as what extra activities fans do
before or after the game, like which fast food chain is the most popular. This kind of data
is invaluable for targeted advertising that gets in the heads of customers. Another example
is analyzing social media posts, which lets you know how people feel about your brand or
the quality of your service. Another development is that companies can sell their market
insights as trend data, after stripping out all the personal information. So the data itself
can be a valuable revenue stream.
But these aren't the only benefits. Big data opens up dialog with individual customers. You
can tweak web sites on the fly. It's easier to perform risk analysis, and your data is safer,
not to mention initiatives like smart cities and personalized healthcare. Likely your
organization is already collecting big data. Harness it properly, and you'll be in pole
position to reap the rewards.
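To make the factory example from this segment concrete, here is a minimal sketch of a check that singles out specific machines instead of replacing everything on a schedule. The machine IDs, the readings, and the limit are all made-up illustration values, and a real system would likely use a far richer model than a single threshold.

```python
def machines_needing_service(readings, limit):
    """Return the IDs of machines whose most recent reading exceeds `limit`.

    `readings` maps a machine ID to its list of sensor values, oldest first.
    Machines with no readings yet are skipped.
    """
    return sorted(
        machine_id
        for machine_id, values in readings.items()
        if values and values[-1] > limit
    )


# Hypothetical vibration readings per machine (units are an assumption).
SAMPLE_READINGS = {
    "press-1": [2.1, 2.3, 2.2],
    "press-2": [2.0, 3.9, 6.4],
    "lathe-7": [],
}

if __name__ == "__main__":
    print(machines_needing_service(SAMPLE_READINGS, limit=5.0))
```

Only the machine whose latest reading crosses the limit is flagged, which is the "replace those as needed" idea in miniature.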
Is your data more like a neatly organized financial filing cabinet, or like a dusty attic full of
odds and ends? Well, that's not too different from how databases view structured and
unstructured data.
Financial data, much like other numbers, dates, addresses, or names of things, is made up of
values with a specific length and format: numbers, dates, and groups of characters called
strings. This makes it structured data. It's easy to store structured data in a database.
Then you can use Structured Query Language, or SQL, to extract specific, usable
information from it. Typical sources include enterprise resource planning (ERP) systems,
customer relationship management (CRM) systems, and your financial data.
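As a small illustration of storing structured data and then querying it with SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name, column names, and sample rows are assumptions made up for the example.

```python
import sqlite3


def total_sales_by_customer(rows):
    """Load (customer, sale_date, amount) rows into SQLite and total them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        # Every column has a fixed type: this is what makes the data structured.
        conn.execute(
            "CREATE TABLE sales ("
            "customer TEXT NOT NULL, sale_date TEXT NOT NULL, amount REAL NOT NULL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) FROM sales "
            "GROUP BY customer ORDER BY customer"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample rows.
SAMPLE_SALES = [
    ("Ana", "2024-01-05", 120.0),
    ("Ben", "2024-01-06", 80.0),
    ("Ana", "2024-02-01", 30.0),
]

if __name__ == "__main__":
    for customer, total in total_sales_by_customer(SAMPLE_SALES):
        print(customer, total)
```

Because each value sits in a known column with a known type, one short query is enough to pull out usable information.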
Some structured data is machine generated, like point-of-sale data. Global Positioning
System (GPS), radio frequency identification (RFID), and other sensor data are also
created without human intervention, as are stats from servers, applications, and networks.
The other kind of structured data is the result of human interaction with a machine. Data
captured from handwritten forms is an example, as are clickstreams. Casinos might
monitor every move you make on one of their slot machines.
If you're trying to predict patterns in user behavior, structured data is your best friend.
However, only an estimated 20% of data is structured. Stop and think about the YouTube
users who upload 48 hours of video every minute. This is a massive stream of data with no
specific format, so it's unstructured data. Unstructured data makes up the remaining 80% of
big data. It's the vast majority.
Consider all the images and videos captured every day. How much data is that? Just think
about the amount of data fed into Google Earth. These examples are all machine-
generated. Scientific data from the atmosphere, radar, and sonar are also made by
machines. But throw some human interaction into the mix and you get all the text inside
documents, PDFs, and e-mails; you get every scrap of social media input; mobile text
messages; and the content of web sites like YouTube, Flickr, and Instagram.
Now you might be thinking: I don't agree – many of these examples do have structure! If
so, you're not alone. Some people feel that in an email, for example, there's a definite
structure – it's an email! The problem is that depending on the model, data mining tools
can't get any useful information from it. How do you figure out the content of the email?
What's in it? Or what's it about?
But despite the seemingly countless number of unstructured data types, they all have one
thing in common. The format they're in doesn't matter. And they can be stored without the
system knowing what they are.
The challenge for many organizations is no longer handling all this information. The
challenge is making it meaningful. The better you can understand unstructured data and
make it useful, the more value you offer your company.
How can big data help you in your industry? Pretty much any organization in any market
segment can use big data to their advantage. There's something for everyone.
There are obvious uses for the banking and finance sector. A wealth of information means
products can be tailored to meet client needs. Tracking the way people behave leads to
insights on how to monetize customer shopping habits. But with all this data streaming in,
keeping it all secure is an ongoing challenge. Banks and financial institutions must be
alert, and have systems in place to minimize risk and prevent fraud.
Take the GE Capital Retail Bank for example. They now offer custom credit cards to meet
client needs, based on the data that they've gathered. It's linked to an initiative that gathers
data on how users are spending.
It's well known that the US government owns six of the world's top ten supercomputers.
Big data has many uses in the government sector. For example, it's much easier to track,
manage, and optimize utilities. Likewise, analytics can help with running government
agencies. Crime can be reduced or prevented. And traffic congestion can be analyzed and
eased.
As another example, the Food and Drug Administration uses big data to study patterns in
food-related illnesses, which lets them provide quicker access to treatments.
Big data has a lot to offer the education sector. Analytics can be used to track student
progress and identify at-risk students. The data is also useful for improving evaluation and
staff support systems. It's difficult to decide between all the curriculums out there, but
feedback from big data makes this much easier.
Some universities already use learning management systems to track student progress.
They log the time students spend on different web pages, and keep tabs on monthly and
yearly progress.
Big data offers higher accuracy in record keeping. This directly benefits the health care
sector. But more accurate records are just the tip of the iceberg. Another benefit is faster
access to patient records, prescriptions, and treatment plans.
Imagine free, instant public health data. Imagine having enough information to reduce the
spread of chronic diseases, or of dangerous airborne infections. Thanks to big data, these
dreams are already realities.
Improved supply chain efficiency is an obvious advantage for the manufacturing sector.
Statistics also show how using big data increases the quality and quantity of output while
still reducing waste. Faster problem-solving and better decision-making are also
welcome byproducts of using big data in manufacturing.
Big data lets retailers analyze subtle trends in shopper behavior by examining the
information from point-of-sale systems, loyalty cards, and radio frequency identification
tags. This leads to better customer relations, better marketing decisions, and better
transaction methods.
There's no doubt big data offers a broad spectrum of benefits, across a number of
industries. Where can your company use this technology to improve its performance?
Cyber-thieves are devious and probably have dozens of ways to get a hold of your
personal data. You're at risk from poorly designed or poorly encrypted software, companies that
don't protect personal data, and insecure network services. Most vulnerable are
infrastructure security and data privacy. Remember, a big data security breach could affect
thousands – or millions – of people. Luckily there are a number of best practices for
securing big data. Let's examine some of them.
Who's accessing your data must be tracked, and the best way is real-time monitoring.
Any unauthorized access is flagged the instant it happens. For more serious, skilled
attacks, you'll need to have a level of cyber threat intelligence in place to detect and
prevent them.
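Here is one way the "flag it the instant it happens" idea could look in code. This is only a sketch: the event fields, user names, and permission table are hypothetical, and a production monitor would read from a live log or message stream rather than a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    user: str
    resource: str


def flag_unauthorized(events, permissions):
    """Yield each event whose user is not permitted to access the resource.

    `events` can be any iterable, so the check runs one event at a time
    as events arrive instead of waiting for a complete batch.
    """
    for event in events:
        allowed = permissions.get(event.user, set())
        if event.resource not in allowed:
            yield event


# Hypothetical permission table and event stream.
PERMISSIONS = {"alice": {"sales_db"}, "bob": {"sales_db", "hr_db"}}
EVENTS = [
    AccessEvent("alice", "sales_db"),
    AccessEvent("alice", "hr_db"),
    AccessEvent("mallory", "sales_db"),
]

if __name__ == "__main__":
    for event in flag_unauthorized(EVENTS, PERMISSIONS):
        print("unauthorized:", event.user, "->", event.resource)
```

Unknown users get an empty permission set, so anything they touch is flagged by default.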
Likely you're storing big data in the cloud. You need to vet your cloud providers. What
protection methods do they use? Are there any penalties for them not maintaining your
security? Who else is using this provider, and what do they have to say?
It's always possible that your data will get leaked. A good practice is to always remove
personal information from stored data. But making data anonymous is just a good start.
Data encryption is essential. Both your raw data and your analytics data must be
encrypted. Attribute-based encryption (ABE) provides fine-grained control over access to
encrypted data. It's a form of public-key encryption in which the user's secret key and the
ciphertext depend on attributes, such as where the user lives.
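The first practice above, removing personal information before data is stored, might be sketched like this. The field names and the choice of a keyed SHA-256 hash are assumptions for illustration. Note that this is pseudonymization rather than encryption or full anonymization, and it is not attribute-based encryption.

```python
import hashlib
import hmac

# Hypothetical names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def depersonalize(record, secret_key, id_field="customer_id"):
    """Return a copy of `record` without direct identifiers.

    The ID field is replaced by a keyed hash, so rows for the same customer
    can still be linked without storing the real ID. `secret_key` is bytes.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if id_field in cleaned:
        message = str(cleaned[id_field]).encode("utf-8")
        cleaned[id_field] = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    return cleaned


if __name__ == "__main__":
    row = {"customer_id": 42, "name": "Ana", "email": "ana@example.com", "city": "Lima"}
    print(depersonalize(row, secret_key=b"demo-key-only"))
```

The key has to be kept secret and stored separately; anyone holding it could recompute the pseudonyms from known IDs.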
Another approach is to run periodic risk assessments. If you're collecting private data from
customers, what policies do you have in place to make sure it stays secure? How do you
ensure that their right to privacy isn't breached? You might also be sharing data with
another company. Consider how exactly the data is shared. And also think about how safe
it is. The reputation of both companies is at stake – so you need to think of ways to lower
the risk. Any deliberately, or even accidentally, released data that violates user privacy
policies can spiral into a legal fiasco.
Physical devices, such as your phone, are difficult to make secure. Any time you leave
your phone on the table of a restaurant, or at work, someone could take it, or try to access
it before you come back. There are ways to bypass the lock screen and password. That's
why it's better to focus on application security over device security.
Also, you can consider isolating the devices and servers containing critical data.
The last, and perhaps most obvious, security measure is to use antivirus software. Most
antivirus packages come bundled with a few security features.
Protecting data is no easy task. We live in a world where even depersonalized data can be
run through systems that cross-reference it with existing datasets, filling in the identifiers.
So you have to do what it takes to keep your data safe.
The life cycle of a piece of data can range from seconds to decades. The type of data plays
a big role, and different organizations use different data life cycles. That said, there is
a universal big data life cycle: every piece of data passes through five basic phases.
Somewhere, someone, or something, creates a unit of data. This could be a sensor
registering wind speed, a data entry worker clicking save in a new spreadsheet, or someone
sending a text message to a friend. When you upload a document to your company's server, it
passes through the firewall and is copied, or created, in your company's database. This is
new data: it has never existed in your company before. This phase includes any data the
company collects.
The newly created data is then shared. This happens inside your company and with any
number of external parties. Social media makes sharing data a public activity. Once
information is on the Internet, it's not really a question of if your data's being shared, but a
question of when. The moment your data leaves a company's internal system, there's no
way to correct or recall it.
Whether internally or externally, someone's going to use your data. Companies use data
as information to aid their business processes. Before you upload anything you would've
had to agree to – but probably didn't read – the company's terms of service or privacy
policy. This is the legal agreement explaining how the organization will handle your data.
Most people just accept that their data is continuously being collected. But they expect to
have a choice over how it's used. I mean, nobody wants their information to be sold behind
their backs.
What happens when your data isn't being used anymore? The next phase in the data life
cycle is where data is stored in an archive. This determines the lifetime of your data.
Regulations limit data archive time in some sectors, like the US health and finance
industries, but mostly cloud storage means your data can effectively be kept forever – and
it isn't expensive to do this. This means organizations can run long-term analytics on vast
data stores.
Sadly, at some point all things must die and data is no different. You can destroy data by
erasing every single copy from a company database. This is often called data purging.
Depending on the terms of service you agreed to, you might be able to request that a
company destroy your data – but not always. Often organizations keep some form of your
data archived.
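The five phases described here can be written down as a simple ordered list. In this sketch the phase names are labels chosen for the example, and the strictly one-way order is a simplifying assumption; in practice data can be shared and used many times before it's archived.

```python
from enum import Enum


class Phase(Enum):
    CREATE = 1
    SHARE = 2
    USE = 3
    ARCHIVE = 4
    DESTROY = 5


def next_phase(phase):
    """Return the phase that follows `phase`, or None once data is destroyed."""
    if phase is Phase.DESTROY:
        return None
    return Phase(phase.value + 1)


def life_cycle():
    """List every phase in order, starting from creation."""
    phases = []
    phase = Phase.CREATE
    while phase is not None:
        phases.append(phase)
        phase = next_phase(phase)
    return phases


if __name__ == "__main__":
    print(" -> ".join(phase.name for phase in life_cycle()))
```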
Every single piece of data goes through a data life cycle. This can take moments, or
decades. With more and more space becoming available, some data might indeed live on
forever in the cloud.