Case Study Bigdata Tools

This case study discusses 5 popular big data tools:
1. CDH is an enterprise-grade distribution of Apache Hadoop from Cloudera. It allows storing and processing unlimited data across clusters, but licensing is expensive.
2. Apache Hadoop is the most widely used open-source framework for distributed storage and processing of large datasets. It is used by many large companies and is free to use.
3. Apache Cassandra is a free, open-source NoSQL database designed for managing large amounts of structured data across servers with no single point of failure.
4. KNIME is an open-source tool for data integration, reporting, analytics, and more. It offers a free platform with

Uploaded by Pratik Bhongade
Copyright © All Rights Reserved

Case Study BigData Tools

SUBJECT: BigData & Hadoop

TAE

Mohit Maharwade (32)
Aditya Bhurkunde (26)
7th Sem IT
Case Study on Big Data Tools

As we all know, data is everything in today's IT world, and the amount of it multiplies manifold each day.
Earlier, we used to talk about kilobytes and megabytes; nowadays, we talk about terabytes.
Data is meaningless until it is turned into useful information and knowledge that can aid management in decision making. For this purpose, several big data software tools are available on the market. These tools help with storing, analyzing, and reporting on data, and much more.

#1) CDH (Cloudera Distribution for Hadoop)

CDH aims at enterprise-class deployments of Hadoop. It is fully open source and has a free platform distribution that includes Apache Hadoop, Apache Spark, Apache Impala, and many more components.
It allows you to collect, process, administer, manage, discover, model, and distribute unlimited
data.
Pros:

• Comprehensive distribution.
• Cloudera Manager administers the Hadoop cluster very well.
• Easy implementation.
• Less complex administration.
• High security and governance.

Cons:

• A few complicated UI features, such as the charts in the Cloudera Manager (CM) service.
• The multiple recommended installation approaches can be confusing.

However, the per-node licensing price is quite expensive.


Pricing: CDH is available as free software from Cloudera. However, if you want to know the cost of a Hadoop cluster, the per-node cost is around $1,000 to $2,000 per terabyte.
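
As a rough worked example, the quoted range can be turned into a budget estimate. This sketch assumes the $1,000 to $2,000 figure is applied per terabyte of cluster capacity. The function name and the 50 TB input are hypothetical, and real quotes vary by vendor and hardware.

```python
def estimate_cluster_cost(terabytes, low_per_tb=1000, high_per_tb=2000):
    """Return a (low, high) cost range in USD.

    Assumption: the $1,000-$2,000 figure quoted above is charged per
    terabyte of cluster capacity.
    """
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    return terabytes * low_per_tb, terabytes * high_per_tb


low, high = estimate_cluster_cost(50)
print(f"50 TB cluster: ${low:,} to ${high:,}")  # 50 TB cluster: $50,000 to $100,000
```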

#2) Apache Hadoop

Apache Hadoop is a software framework for clustered (distributed) file storage and big data processing. It processes large datasets using the MapReduce programming model.
Hadoop is an open-source framework written in Java, and it provides cross-platform support.
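
To illustrate the MapReduce model, here is a minimal single-process sketch of the classic word-count job in plain Python. It only imitates the map, shuffle, and reduce phases. It does not use Hadoop's Java API, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group every emitted value by its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a real cluster the map and reduce calls would run in parallel on
    # many nodes; here they run one after another in a single process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data needs big tools", "Hadoop processes big data"]
print(word_count(docs))
# {'big': 3, 'data': 2, 'needs': 1, 'tools': 1, 'hadoop': 1, 'processes': 1}
```
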
It is widely regarded as the leading big data tool. In fact, over half of the Fortune 50 companies use Hadoop. Some of the big names include Amazon Web Services, Hortonworks, IBM, Intel, Microsoft, and Facebook.
Pros:

• The core strength of Hadoop is HDFS (Hadoop Distributed File System), which can hold all types of data (video, images, JSON, XML, and plain text) on the same file system.
• Highly useful for R&D purposes.
• Provides quick access to data.
• Highly scalable.
• Highly available service running on a cluster of computers.

Cons:

• Its 3x data redundancy (each block is stored three times by default) can sometimes cause disk space issues.
• I/O operations could be optimized for better performance.

Pricing: This software is free to use under the Apache License.
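
The disk-space cost of HDFS's 3x redundancy can be estimated with simple arithmetic. This sketch assumes the stock HDFS defaults (128 MB blocks via `dfs.blocksize`, replication factor 3 via `dfs.replication`). Clusters can override both, and the helper name is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128   # default HDFS block size in Hadoop 2.x and later
REPLICATION = 3       # default HDFS replication factor


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block_replicas, raw_storage_mb) for one file.

    The last block only occupies as much disk as the data it holds,
    so raw storage is file size x replication, not blocks x block size.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    return blocks, blocks * replication, file_size_mb * replication


print(hdfs_footprint(1000))  # (8, 24, 3000)
```

In other words, a 1,000 MB file is split into 8 blocks, stored as 24 block replicas, and consumes about 3,000 MB of raw disk.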

#3) Cassandra

Apache Cassandra is a free, open-source, distributed NoSQL DBMS built to manage huge volumes of data spread across numerous commodity servers while delivering high availability. It uses CQL (Cassandra Query Language) to interact with the database.
Some of the high-profile companies using Cassandra include Accenture, American Express, Facebook, General Electric, Honeywell, and Yahoo.
Pros:

• No single point of failure.
• Handles massive data very quickly.
• Log-structured storage.
• Automated replication.
• Linear scalability.
• Simple ring architecture.

Cons:

• Requires some extra effort for troubleshooting and maintenance.
• Clustering could be improved.
• No row-level locking.

Pricing: This tool is free.
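
The ring architecture and automated replication listed among Cassandra's pros can be illustrated with a toy consistent-hashing ring. This is a simplified sketch modeled on Cassandra's SimpleStrategy (one token per node, with replicas on the next nodes clockwise). It ignores virtual nodes, racks, and data centers. MD5 is an assumed stand-in for the partitioner's hash, and the class and node names are hypothetical.

```python
import bisect
import hashlib


def token(key):
    # Assumption: MD5 stands in for the partitioner's hash function (similar in
    # spirit to Cassandra's older RandomPartitioner; the current default is
    # Murmur3Partitioner).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: one token per node, SimpleStrategy-style replica placement."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Sorting (token, node) pairs lays the nodes out clockwise around the ring.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        """Return the nodes that would hold copies of `key`."""
        # First replica: the first node whose token is >= the key's token,
        # wrapping around to the start of the ring if needed.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        # Remaining replicas: the next nodes clockwise, so no single node
        # holds the only copy.
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct nodes; which ones depends on the hashes
```

Because every key maps to several consecutive nodes, any one node can fail and the data remains readable from the others. Adding a node only moves the keys between it and its neighbor, which is what allows roughly linear scaling.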

#4) Knime
KNIME (Konstanz Information Miner) is an open-source tool used for enterprise reporting, integration, research, CRM, data mining, data analytics, text mining, and business intelligence. It supports the Linux, OS X, and Windows operating systems.
It can be considered a good alternative to SAS. Some of the top companies using KNIME include Comcast, Johnson & Johnson, and Canadian Tire.
Pros:

• Simple ETL operations.
• Integrates very well with other technologies and languages.
• Rich algorithm set.
• Highly usable and organized workflows.
• Automates a lot of manual work.
• No stability issues.
• Easy to set up.

Cons:

• Data handling capacity could be improved.
• Can occupy almost the entire RAM.
• Lacks integration with graph databases.

Pricing: The KNIME Analytics Platform is free. However, KNIME also offers commercial products that extend the capabilities of the platform.
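
A KNIME workflow chains nodes (for example, a file reader, a filter, and a group-by) so that each node's output feeds the next. The plain-Python sketch below mimics that idea for a tiny ETL job. It does not use KNIME itself, and the node functions and sample data are hypothetical.

```python
import csv
import io

# Hypothetical sample data standing in for a file read by a reader node.
RAW = """region,sales
north,120
south,
north,80
east,50
"""


def read_csv_node(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_node(rows):
    """Transform: drop rows with a missing sales value and convert sales to int."""
    return [{"region": r["region"], "sales": int(r["sales"])} for r in rows if r["sales"]]


def group_by_node(rows):
    """Transform: total the sales for each region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["sales"]
    return totals


def run_workflow(data, nodes):
    # Like connected nodes in a workflow, each step's output is the next step's input.
    for node in nodes:
        data = node(data)
    return data


print(run_workflow(RAW, [read_csv_node, clean_node, group_by_node]))
# {'north': 200, 'east': 50}
```

Keeping each step as a small, single-purpose function mirrors how a visual workflow lets you swap or reorder nodes without rewriting the others.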

#5) Datawrapper

Datawrapper is an open-source data visualization platform that helps its users create simple, precise, and embeddable charts very quickly.
Its major customers are newsrooms around the world. Some of the names include The Times, Fortune, Mother Jones, Bloomberg, and Twitter.
Pros:

• Device friendly: works very well on all types of devices (mobile, tablet, or desktop).
• Fully responsive.
• Fast.
• Interactive.
• Brings all the charts into one place.
• Great customization and export options.
• Requires zero coding.

Cons:

• Limited color palettes.


Pricing: It offers a free service as well as customizable paid options, listed below.

• Single user, occasional use: 10K
• Single user, daily use: 29 €/month
• Professional team: 129 €/month
• Customized version: 279 €/month
• Enterprise version: 879 €+
