Big Data
Volume
Volume is the V most associated with big data, since the term itself implies size. The quantities
of data involved can reach almost incomprehensible proportions.
Velocity
Velocity is the measure of how fast the data is coming in. Facebook has to handle a tsunami of
photographs every day. It has to ingest them all, process them, file them, and somehow, later, be
able to retrieve them.
Variety
Data was once collected from one place and delivered in one format. Where it once took the shape
of structured files, such as Excel, CSV and Access, it is now also presented in non-traditional
forms such as video, text, PDF and graphics on social media, and arrives from technology such as
wearable devices.
Often, companies do not know even the basics: what big data actually is, what its benefits are,
what infrastructure it needs, and so on. Without a clear understanding, a big data adoption
project is likely to fail. Companies may waste a lot of time and resources on things they don’t
know how to use. And if employees don’t understand big data’s value, or don’t want to change
existing processes for the sake of its adoption, they can resist it and impede the company’s
progress.
Unreliable data
It is no secret that big data isn’t 100% accurate, and on the whole that is not critical. But it
doesn’t mean we shouldn’t control how reliable our data is. It can contain wrong information,
duplicate itself, and contain contradictions. Data of extremely poor quality is unlikely to bring
useful insights or opportunities to business tasks that demand precision.
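The two problems named above, duplicates and contradictions, can be checked mechanically. The
following is a minimal sketch in plain Python, not a production data-quality tool. The function
name audit_records and the fields customer_id and country are hypothetical, chosen only for
illustration.

```python
from collections import defaultdict

def audit_records(records, key_field, check_field):
    """Return (records without exact duplicates, keys whose values contradict)."""
    seen = set()
    unique = []
    values_by_key = defaultdict(set)
    for rec in records:
        # An exact duplicate has the same value in every field.
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
        # Collect every distinct value reported for the same key.
        values_by_key[rec[key_field]].add(rec[check_field])
    # A key with more than one distinct value is a contradiction.
    conflicts = sorted(k for k, v in values_by_key.items() if len(v) > 1)
    return unique, conflicts

# Hypothetical sample data.
records = [
    {"customer_id": 1, "country": "DE"},
    {"customer_id": 1, "country": "DE"},  # exact duplicate
    {"customer_id": 2, "country": "FR"},
    {"customer_id": 2, "country": "IT"},  # contradicts the previous row
]
unique, conflicts = audit_records(records, "customer_id", "country")
```

Here unique keeps three of the four rows, and conflicts lists customer 2, whose country is
reported two different ways.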
Security
Big data brings with it issues of governance and security. By its very nature, big data flows in
from many different sources. The more nodes there are, the more vulnerable the system is to
exploits that could lead to losses. Managing these sources and ensuring both integrity and
security calls for expert governance measures.
Organizational Resistance
Organizational resistance is nothing new and is not specific to big data. It is a problem that
companies can anticipate and therefore plan for. If it is already happening in our organization,
we should know that it is not out of the ordinary. What matters most is to determine the best way
to handle the situation so that the big data initiative succeeds.
Managing big data, right from the adoption stage, is expensive. For instance, if your company
chooses an on-premises solution, it must be ready to spend money on new hardware, electricity,
new hires such as developers and administrators, and so on. Additionally, you will have to meet
the costs of developing, setting up, configuring and maintaining new software, even though the
frameworks needed are open source.
HADOOP -
Hadoop is a framework for storing and processing large volumes of data across clusters of nodes.
Its architecture allows parallel processing of data using several components:
Hadoop Architecture -
Hadoop has a Master-Slave architecture: HDFS handles distributed data storage and MapReduce
handles distributed data processing.
Name Node - The Name Node records every file and directory in the HDFS namespace.
Data Node - A Data Node manages the state of an HDFS node and lets you interact with the
blocks stored on it.
Master Node - The master node coordinates parallel processing of data using Hadoop
MapReduce.
Slave Node - The slave nodes are the additional machines in the Hadoop cluster that store
data and carry out complex calculations. Every slave node runs a Task Tracker and a Data
Node, which synchronize its processes with the Job Tracker and the Name Node
respectively.
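The parallel processing described above follows the MapReduce pattern: map each input record to
key-value pairs, group the pairs by key, then reduce each group. The following is a
single-process sketch of that pattern in plain Python. It does not use the Hadoop API, and the
function names map_phase, shuffle, reduce_phase and run_job are hypothetical. In a real cluster
each input split would be mapped on a different slave node.

```python
from collections import defaultdict

def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a total."""
    return key, sum(values)

def run_job(splits):
    """Run map, shuffle and reduce over a list of input splits."""
    mapped = [pair
              for split in splits      # one split per (simulated) node
              for line in split
              for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Two splits stand in for blocks held on two different Data Nodes.
splits = [["big data big"], ["data node"]]
counts = run_job(splits)
```

With this input, counts maps "big" to 2, "data" to 2 and "node" to 1.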
Hadoop can efficiently analyze server-log data and help an organization respond quickly to a
security breach. Server logs are computer-generated logs that capture network data operations,
particularly security and regulatory compliance data. They give companies and organizations
important insights into network usage, security threats and compliance. Hadoop is a good fit for
staging and analyzing this data.
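As an illustration of this kind of log analysis, the sketch below counts failed logins per IP
address and flags addresses that reach a threshold. It is plain Python rather than a Hadoop job.
The three-field log format (timestamp, IP address, event), the event name LOGIN_FAILED and the
function names are assumptions made for the example; real server logs will differ.

```python
from collections import Counter

def failed_logins_by_ip(lines):
    """Count LOGIN_FAILED events per IP address, skipping malformed lines."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Assumed format: "<timestamp> <ip> <event>"
        if len(parts) == 3 and parts[2] == "LOGIN_FAILED":
            counts[parts[1]] += 1
    return counts

def suspicious_ips(lines, threshold):
    """Return the IPs with at least `threshold` failed logins, sorted."""
    counts = failed_logins_by_ip(lines)
    return sorted(ip for ip, n in counts.items() if n >= threshold)

# Hypothetical sample log using documentation-reserved IP addresses.
log_lines = [
    "2024-01-01T10:00:00 203.0.113.5 LOGIN_FAILED",
    "2024-01-01T10:00:01 203.0.113.5 LOGIN_FAILED",
    "2024-01-01T10:00:02 198.51.100.7 LOGIN_OK",
    "2024-01-01T10:00:03 203.0.113.5 LOGIN_FAILED",
    "malformed line",
]
flagged = suspicious_ips(log_lines, threshold=3)
```

Only 203.0.113.5 is flagged here. The per-IP count is the same map-then-reduce shape that a
Hadoop job would distribute across many log files.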
3. Geo-location Data
We are part of a fast-growing technological world in which smartphones play a major role.
Retail, manufacturing, automotive and other enterprises can now track their customers’
movements and predict customer purchases using geo-location data from smartphones and
tablets. Hadoop clusters help organizations process enormous amounts of geo-location data to
identify trouble areas in the business.
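One common way to summarize location data is to bucket coordinates into grid cells and count
the points in each cell. The sketch below shows this in plain Python. It is an illustration
only: the function names, the cell size of 0.01 degrees and the sample coordinates are all
assumptions, and nothing here is specific to Hadoop.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to the integer index of the grid cell containing it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def visits_per_cell(points, cell_deg=0.01):
    """Count how many points fall into each grid cell."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)

# Hypothetical device locations as (latitude, longitude) pairs.
points = [
    (40.7128, -74.0060),
    (40.7129, -74.0061),  # falls in the same cell as the first point
    (40.7355, -74.0012),
]
cell_counts = visits_per_cell(points)
busiest_cell, busiest_count = cell_counts.most_common(1)[0]
```

The first two points share a cell, so busiest_count is 2. On a cluster, the per-cell counting
would be spread across nodes in the same way as any other grouped count.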
Retailers, both online and offline, use Hadoop to improve their sales. Many e-commerce
companies use Hadoop to keep track of products that customers buy together. On this basis,
when a customer is buying one product from such a group, they suggest the other products in
it.
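The "bought together" idea can be sketched as counting how often each pair of products appears
in the same order, then suggesting the most frequent partners of a given product. The example
below is plain Python on a small in-memory list, not a Hadoop job, and the function names and
sample orders are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(orders):
    """Count how often each unordered pair of products shares an order."""
    pairs = Counter()
    for order in orders:
        # sorted(set(...)) makes each pair appear once, in a fixed order.
        for a, b in combinations(sorted(set(order)), 2):
            pairs[(a, b)] += 1
    return pairs

def suggest(product, pairs, top_n=2):
    """Return the products most often bought together with `product`."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [item for item, _ in scores.most_common(top_n)]

# Hypothetical order history.
orders = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case"],
]
suggestions = suggest("phone", co_purchase_counts(orders))
```

For these orders, a customer buying a phone is shown the case first (three shared orders) and
the charger second (two shared orders).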