Case Study: Big Data Tools
TAE
Mohit Maharwade(32)
Aditya Bhurkunde(26)
7th Sem IT
Case Study on Big Data Tools
As we all know, data is everything in today's IT world, and the volume of data keeps multiplying manifold each day.
Earlier, we used to talk about kilobytes and megabytes. Nowadays, we are talking about terabytes.
Data is meaningless until it is turned into useful information and knowledge that can aid management in decision making. For this purpose, several big data tools are available in the market. These tools help in storing, analyzing, and reporting on data, and much more.
#1) CDH
CDH (Cloudera Distribution Including Apache Hadoop) aims at enterprise-class deployments of Apache Hadoop. It is fully open source and has a free platform distribution that includes Apache Hadoop, Apache Spark, Apache Impala, and many more.
It allows you to collect, process, administer, manage, discover, model, and distribute unlimited
data.
Pros:
Comprehensive distribution
Cloudera Manager administers the Hadoop cluster very well.
Easy implementation.
Less complex administration.
High security and governance
Cons:
#2) Hadoop
Apache Hadoop is a software framework used for clustered file systems and for handling big data. It processes big data sets using the MapReduce programming model.
Hadoop is an open-source framework written in Java, and it provides cross-platform support.
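To illustrate the MapReduce model mentioned above, here is a minimal single-process sketch of the classic word-count job. It only simulates the three phases (map, shuffle, reduce) in memory; it is not Hadoop's Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value by its key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key; here, sum the counts."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    # Hadoop delivers keys to each reducer in sorted order; sorting here mimics that.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data tools", "Big data keeps growing"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'growing': 1, 'keeps': 1, 'tools': 1}
```

In a real Hadoop job, the map and reduce functions run in parallel on many machines and the framework performs the shuffle over the network; the logic of each phase stays the same.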
Hadoop is one of the most widely used big data tools. In fact, over half of the Fortune 50 companies use Hadoop. Some of the big names include Amazon Web Services, Hortonworks, IBM, Intel, Microsoft, Facebook, etc.
Pros:
The core strength of Hadoop is HDFS (Hadoop Distributed File System), which can hold all types of data (video, images, JSON, XML, and plain text) on the same file system.
Highly useful for R&D purposes.
Provides quick access to data.
Highly scalable
Highly available service running on a cluster of computers
Cons:
Disk space can sometimes become an issue because HDFS stores three copies of each block by default (3x data redundancy).
I/O operations could have been optimized for better performance.
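The disk-space cost of 3x redundancy is simple arithmetic, sketched below. It assumes the HDFS defaults of `dfs.replication = 3` and a 128 MB `dfs.blocksize` (Hadoop 2.x and later); real clusters may configure both differently. The helper names (`raw_storage_tb`, `usable_capacity_tb`, `block_count`) are hypothetical.

```python
import math

# Assumed defaults: HDFS ships with dfs.replication = 3 and a 128 MB block size
# (dfs.blocksize) in Hadoop 2.x and later; real clusters may override both.
DEFAULT_REPLICATION = 3
DEFAULT_BLOCK_MB = 128


def raw_storage_tb(logical_tb: float, replication: int = DEFAULT_REPLICATION) -> float:
    """Raw disk consumed across the cluster when every block is stored `replication` times."""
    return logical_tb * replication


def usable_capacity_tb(raw_cluster_tb: float, replication: int = DEFAULT_REPLICATION) -> float:
    """How much logical data fits on a cluster with the given raw disk capacity."""
    return raw_cluster_tb / replication


def block_count(file_mb: float, block_mb: int = DEFAULT_BLOCK_MB) -> int:
    """Number of HDFS blocks a file is split into (the last block may be only partly filled)."""
    return math.ceil(file_mb / block_mb)


if __name__ == "__main__":
    print(raw_storage_tb(10))          # 10 TB of data occupies 30 TB of raw disk
    print(usable_capacity_tb(90))      # a 90 TB cluster holds about 30.0 TB of data
    blocks = block_count(1000)         # a 1000 MB file is split into 8 blocks
    print(blocks, blocks * DEFAULT_REPLICATION)  # 8 blocks, 24 stored block replicas
```

A partly filled last block only uses its actual size on disk, which is why raw storage is computed from the file size times the replication factor rather than from the block count.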
#3) Cassandra
Apache Cassandra is a free, open-source, distributed NoSQL DBMS designed to manage huge volumes of data spread across numerous commodity servers while delivering high availability. It uses CQL (Cassandra Query Language) to interact with the database.
Some of the high-profile companies using Cassandra include Accenture, American Express,
Facebook, General Electric, Honeywell, Yahoo, etc.
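Cassandra spreads data across servers by hashing each row's partition key onto a token ring and storing copies on the next nodes around that ring. The sketch below simulates the idea under simplifying assumptions: one token per node (no virtual nodes), MD5 hashing (as Cassandra's older `RandomPartitioner` used; the current default `Murmur3Partitioner` is not reproduced), and replica placement similar to `SimpleStrategy`. The `TokenRing` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto the ring. MD5 is an assumption made for simplicity here."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    """Hypothetical, simplified model of Cassandra-style data placement."""

    def __init__(self, nodes: list[str], replication_factor: int = 3) -> None:
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed node count")
        self.rf = replication_factor
        # Each node gets a single token derived from its name (no virtual nodes).
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str) -> list[str]:
        # The first replica is the first node whose token is >= the key's token,
        # wrapping around to the start of the ring when none is.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        # Further replicas are the next nodes clockwise, similar to SimpleStrategy.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    for user in ["alice", "bob", "carol"]:
        print(user, ring.replicas(user))
```

In this scheme, adding a node only reassigns keys near the new node's position on the ring rather than reshuffling all data, which is what lets the cluster grow by adding commodity servers.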
Pros:
Cons:
#4) Knime
KNIME stands for Konstanz Information Miner. It is an open-source tool used for enterprise reporting, integration, research, CRM, data mining, data analytics, text mining, and business intelligence. It supports the Linux, OS X, and Windows operating systems.
It can be considered a good alternative to SAS. Some of the top companies using Knime include Comcast, Johnson & Johnson, Canadian Tire, etc.
Pros:
Cons:
Pricing: The Knime platform is free. However, they offer other commercial products that extend the capabilities of the Knime Analytics Platform.
#5) Datawrapper
Datawrapper is an open-source platform for data visualization that helps its users generate simple, precise, and embeddable charts very quickly.
Its major customers are newsrooms all over the world, including The Times, Fortune, Mother Jones, Bloomberg, Twitter, etc.
Pros:
Device friendly. Works very well on all types of devices: mobile, tablet, or desktop.
Fully responsive
Fast
Interactive
Brings all the charts in one place.
Great customization and export options.
Requires zero coding.