A Study of Big Data Characteristics: October 2016

This document summarizes a research paper on big data characteristics that was published in 2016. The paper identifies and defines three newly proposed characteristics of big data: verbosity, voluntariness, and versatility. It discusses how previous studies have identified characteristics such as volume, velocity, variety, and value, while new issues are still emerging. The full paper aims to explore the three newly proposed characteristics further to help handle big data efficiently.



A study of big data characteristics

Conference Paper · October 2016


DOI: 10.1109/CESYS.2016.7889917


All content following this page was uploaded by Gayatri Kapil on 05 September 2018.



A Study of Big Data Characteristics
Gayatri Kapil, Alka Agrawal, and R. A. Khan
SIST-Department of Information Technology, Babasaheb Bhimrao Ambedkar University (A Central University), Lucknow,
India

Abstract - Analysis of and research on big data have gradually become a hot topic for many organisations and can be especially helpful for industries such as banking, e-commerce, insurance and manufacturing in serving their customers. When data was low in volume, it was easily managed and processed by traditional technologies. These technologies are incapable of handling big data, which differs from other data in terms of volume, velocity and value. Researchers and practitioners have identified, defined and explored big data in terms of its characteristics, including volume, velocity, variety, value, virality, volatility, visualization, viscosity and validity. But these studies have proven insufficient because of the growing issues reported day by day. This paper identifies and defines three new characteristics of big data to be explored further in order to handle big data efficiently.

Index Terms - Data, Big Data, Big Data Characteristics.

I. INTRODUCTION

Big data is a collection of data sets or a combination of data sets. The concept of big data has been endemic within digital communication and information science since the earliest days of computing. Big data is growing day by day because data is created by everyone and for everything, from mobile devices, call centers, web servers, social networking sites, etc. [1]. The challenge is that it is too large, too fast and too hard to handle for traditional databases and existing technologies. Many organizations gather massive amounts of data generated from high-volume sources such as call centers, sensors, web logs and digital images. The success of their business depends on meeting big data challenges while continually improving operational efficiency.

Big data continuously takes in more and more data sets whose volume is beyond the capability of commonly used software tools to capture, curate, handle and process within a tolerable elapsed time. A huge amount of data is created every second in every part of the world, i.e. the volume of data never shrinks but increases day by day. Nearly five years ago, personal computer storage was tens to hundreds of gigabytes. IDC's Digital Universe Study predicts that between 2009 and 2020 digital information will grow by a factor of 44, from 0.8 ZB to 35 ZB. Many surveys expect that the volume of data will grow by 45% in the next two years, and a few say it will double [2]. Thus, big data is a moving target and requires more attention to capturing, curating, handling and processing it. Fig. 1 shows the exponential growth of big data volume with time.

Fig. 1 Growth of Data

II. IDENTIFIED BIG DATA CHARACTERISTICS

Big data is a new idea, and it has received numerous definitions from researchers, organizations and individuals. In 2001, industry analyst Doug Laney (currently with Gartner) articulated the mainstream definition of big data in terms of three V's: Volume, Velocity and Variety [3]. SAS (Statistical Analysis System) added two further dimensions, Variability and Complexity [4]. Oracle defined big data in terms of four V's: Volume, Velocity, Variety and Value [5]. Oguntimilehin A. presented big data in terms of five V's and a C: Volume, Velocity, Variety, Variability, Value and Complexity [2]. In 2014, on Data Science Central, Kirk Borne defined big data in terms of 10 V's: Volume, Variety, Velocity, Veracity, Validity, Value, Variability, Venue, Vocabulary and Vagueness [6]. All the characteristics are listed and defined in Table 1. These characteristics provide a research horizon for researchers and practitioners. The whole of big data research revolves around these characteristics (fourteen V's and a C, defined in Table 1) in order to manage and use big data efficiently and effectively. But some gaps still exist that need to be addressed in order to gain better insight into the area.

Table 1: Big Data Characteristics

| S. No. | Big Data Characteristic | Elucidation | Description |
|---|---|---|---|
| 1 | Volume | Size of Data | Quantity of collected and stored data. Data size is in TB, PB [2-4, 6]. |
| 2 | Velocity | Speed of Data | The transfer rate of data between source and destination [3-6]. |
| 3 | Value | Importance of Data | It simply represents the business value to be derived from big data [4-5, 17]. |
| 4 | Variety | Type of Data | Different types of data, like pictures, videos, audio etc., arrive at the receiving end [3-6]. |
| 5 | Veracity | Data Quality | Analysis of captured data is virtually worthless if the data is not accurate [6]. |
| 6 | Validity | Data Authenticity | Correctness or accuracy of the data used to extract results in the form of information [17]. |
| 7 | Volatility | Duration of Usefulness | How long the stored data remains useful to the user [17]. |
| 8 | Visualization | Data Process / Data Act | It is a process of representing abstract data [17]. |
| 9 | Virality | Spread Speed | The rate at which data is broadcast or spread by a user and received by different users for their use [16]. |
| 10 | Viscosity | Lag of Event | The time difference between an event occurring and the event being described [16]. |
| 11 | Variability | Data Differentiation | Data arrives constantly from different sources; variability concerns how efficiently noisy data can be differentiated from important data [5-6, 17]. |
| 12 | Venue | Different Platform | Various types of data arrive from different sources via different platforms, like personnel systems, private and public clouds, etc. [6]. |
| 13 | Vocabulary | Data Terminology | Data terminology, like data models, data structures, etc. [6]. |
| 14 | Vagueness | Indistinctness of existence in a Data | Vagueness concerns the reality in information that suggested little or no thought about what each might convey [6]. |
| 15 | Complexity | Correlation of Data | Data comes from different sources, and it is necessary to figure out the changes, whether small or large, with respect to previously arrived data so that information can be obtained quickly [4-5]. |
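
Several of the definitions in Table 1 describe measurable quantities. As a minimal sketch, assuming each record carries a size and three timestamps (the Record class and its field names are hypothetical and do not come from the cited sources), velocity, viscosity and volatility could be computed as follows in Python:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    """Hypothetical record; the field names are assumptions for illustration."""
    size_bytes: int
    occurred_at: datetime    # when the underlying event happened
    described_at: datetime   # when the event was recorded or reported
    useful_until: datetime   # last moment the record is still worth keeping


def velocity(records, window_seconds):
    """Velocity: bytes transferred per second over an observation window."""
    return sum(r.size_bytes for r in records) / window_seconds


def viscosity(record):
    """Viscosity: lag between an event occurring and being described."""
    return record.described_at - record.occurred_at


def volatility(record):
    """Volatility: how long the stored record remains useful (assumed to
    start counting from the moment the event was described)."""
    return record.useful_until - record.described_at


# Made-up sample data.
t0 = datetime(2016, 10, 1, 12, 0, 0)
records = [
    Record(2_000, t0, t0 + timedelta(seconds=5), t0 + timedelta(days=30)),
    Record(3_000, t0, t0 + timedelta(seconds=20), t0 + timedelta(days=1)),
]

print(velocity(records, window_seconds=10))  # 500.0 bytes per second
print(viscosity(records[1]))                 # 0:00:20
print(volatility(records[0]))
```

The other characteristics in the table (for example value or vagueness) are qualitative and are not reduced to a formula here.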

III. NEED FOR MORE EXPLORATION OF BIG DATA

S. Vikram Phaneendra and Madhusudhan Reddy observed that when data volumes were small they could easily be handled by an RDBMS, but that nowadays big data cannot be managed with RDBMS tools, because big data differs from other data in five characteristics: volume, velocity, variety, value and complexity [7]. Kiran Kumar Reddi and Dnvsl Indira explained that big data comes in structured and unstructured, massive, homogeneous and heterogeneous forms. They also advised using a better, modified model to handle and transfer big data over the network [8]. Wei Fan and Albert Bifet explored big data mining as the capability of extracting useful information from large data sets, which was not possible before because of characteristics such as volume, variability and velocity [9]. So, researchers and practitioners have explored big data in terms of volume, velocity, variety, variability, value, virality, volatility, visualization, viscosity and validity [16-17].

In 2016, Experian reported that 97% of US businesses are seeking to achieve a complete view of their customers, but that the biggest problem organisations face is big data management [10]. The US Government Accountability Office has shown that five of the six most damaging data thefts of all time happened in the last two years [11]. In November 2015, the New York Times published an article on DNA that discusses a genomics company in China. The company has been generating so much data that it could not be transmitted electronically. Hence, instead of sending the data online, the company has been transferring it on discs.
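
A back-of-the-envelope calculation shows why shipping discs can beat a network link. The Python sketch below estimates transfer time as data size divided by bandwidth; the 100 TB data set, the 100 Mbit/s link and the two-day courier time are illustrative assumptions, not figures from the cited article.

```python
def transfer_days(size_terabytes, link_megabits_per_s):
    """Days needed to push size_terabytes through a link of the given speed."""
    bits = size_terabytes * 1e12 * 8              # decimal terabytes to bits
    seconds = bits / (link_megabits_per_s * 1e6)  # bits / (bits per second)
    return seconds / 86_400                       # seconds per day


# Illustrative numbers only; none of them come from the article.
dataset_tb = 100
link_mbps = 100
courier_days = 2

network_days = transfer_days(dataset_tb, link_mbps)
print(f"network: {network_days:.1f} days, courier: {courier_days} days")
```

Under these assumptions the network transfer takes roughly three months, so physical shipment is faster by a wide margin. The comparison changes with link speed and data size.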
On the basis of the above discussion, it can be inferred that big data research exploring only these characteristics (fourteen V's and a C) is not sufficient. In addition, because of the huge volume of big data, traditional methods for managing, extracting and analyzing it are not very useful, as they may not provide accurate results for decision making. To use big data in a managed way, more of its characteristics must be explored and defined. Since the available research on big data is not sufficient to manage and use it, big data must be explored further to identify more of its characteristics. Research on newly identified characteristics may provide simple and effective management and use of big data.

IV. RESEARCH WORK IN BIG DATA CHARACTERISTICS

Most organisations in today’s world deal with huge amounts of data. The sheer size of this data raises several issues. Though many solutions have been offered, problems still exist, as discussed in the section above. This motivates developing an in-depth understanding of big data, which will help to solve the related issues effectively. While going through the literature on big data, the study presented in this paper has identified three more characteristics. These characteristics are defined as follows:

A. Verbosity:
Big data is massive data that comes from different sources and may be structured or unstructured, good or bad. Bad data refers to information that is wrong, out of date or incomplete. The consequences of storing such information may sometimes be dangerous. So, it is recommended to check that the stored data is secure, relevant, complete and trustworthy. If a suitable technique is applied at the initial stage to decide whether the information is useful or not, then storage space as well as processing time can be saved. Keeping in mind the verbose nature of big data, we have identified ‘verbosity’ as one of the characteristics of big data, defined as “The redundancy of the information available at different sources.”

B. Voluntariness:
Big data is a huge set of data that can be used voluntarily by different organisations without any interference. Big data voluntarily helps numerous enterprises. It assists retailers by giving them knowledge of customer preferences; urban planners by visualizing environment modelling and traffic patterns; manufacturers by predicting product issues to optimize their productivity and to improve equipment and customer performance; energy companies in meeting energy demand during peak times and consequently increasing production and improving efficiency by reducing losses; healthcare professionals in preventing diseases and improving patient health [2]; research organisations in obtaining quality research and revolutionizing life science, physical science, medical science and scientific research [18,19]; financial service organizations in identifying and preventing fraud; and government agencies in improving services in their respective fields. Keeping in mind the voluntary behaviour of big data, ‘voluntariness’ has been defined as one of the characteristics of big data: “The willful availability of big data to be used according to the context.”

C. Versatility:
Big data is evolving to satisfy the needs of many organisations, researchers and governments. It facilitates urban planning, environment modelling, visualization, analysis, quality classification, securing environments, computational analysis, biological understanding, the design and manufacturing processes required by organisations, and cost-effective models, as well as elegant exploration of results. Keeping in mind the resourceful, adaptable nature of big data, we have identified ‘versatility’ as one of the characteristics of big data, defined as “The ability of big data to be flexible enough to be used differently for different contexts.”

The three characteristics obtained in this research may prove to be a milestone for research if explored properly. Further research on these characteristics may resolve many issues related to big data. It may also help to differentiate the nature of big data.

V. CONCLUSION

Big data is a collection of data sets that is growing day by day because data is created by everyone and for everything, from mobile devices, call centres, etc. [1]. This paper revolves around big data and its characteristics in terms of V’s such as volume, velocity, value, variety, veracity, validity, visualization, virality, viscosity, variability, volatility, venue, vocabulary and vagueness, plus complexity. The issues reported day by day show that the available research is not sufficient to manage and process big data. Hence the research presented in this paper has explored big data further and identified three new ‘V’ characteristics, i.e. Verbosity, Voluntariness and Versatility. It is expected that research on the newly identified characteristics may provide simple and effective management of big data, which can be used in value-added applications and research environments.

REFERENCES

1. Available: https://www.youtube.com/watch?v=S89o3INzIJc
2. Oguntimilehin A., Ademola E.O., “A Review of Big Data Management, Benefits and Challenges,” Journal of Emerging Trends in Computing and Information Sciences, vol. 5, pp. 433-437, June 2014.
3. Stephen Kaisler, Frank Armour, J. Alberto Espinosa and William Money, “Big Data: Issues and Challenges Moving Forward,” 46th Hawaii International Conference on System Sciences, pp. 995-1003, 2013.
4. Mark Troester (2013), “Big Data Meets Big Data Analytics”, www.sas.com/resources/.../ WR46345.pdf, retrieved 10/02/14.
5. Oracle (2013), “Information Management and Big Data: A Reference Architecture”, www.oracle.com/.../ infomgmt-big-data-r..., retrieved 20/03/14.
6. Available: http://www.datasciencecentral.com/profiles/blogs/top-10-list-the-v-s-of-big-data
7. S. Vikram Phaneendra & E. Madhusudhan Reddy, “Big Data- solutions for RDBMS problems- A survey,” in 12th IEEE/IFIP Network Operations & Management Symposium (NOMS 2010) (Osaka, Japan, Apr 19-23 2013).
8. Kiran Kumara Reddi & Dnvsl Indira, “Different Technique to Transfer Big Data: survey,” IEEE Transactions on 52(8) (Aug. 2013) 2348-2355.
9. Harshawardhan S. Bhosale, Devendra P. Gadekar, “A Review Paper on Big Data and Hadoop,” International Journal of Scientific and Research Publication, vol. 4, 2014.
10. Available: http://www.inc.com/bill-carmody/biggest-problem-with-big-data-management-in-2016.html
11. Available: http://www.kdnuggets.com/2015/08/ieg-big-data-innovation-boston-4-problems.html
12. K. V. S. N. Rama Rao, M. Pranava and A. Mounika, “Effect of Big Data Characteristics on Security leveraging Existing Security Mechanisms for Protection,” ARPN Journal of Engineering and Applied Sciences, vol. 10, pp. 2023-2026, 2015.
13. Umasri M.L., Shyamalagowri D., Suresh Kumar S., “Mining Big Data: Current status and forecast to the future,” vol. 4, issue 1, January 2014, ISSN: 2277 128X.
14. Available: http://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/
15. Available: http://data-magnum.com/how-many-vs-in-big-data-the-characteristics-that-define-big-data/
16. Available: http://data-magnum.com/how-many-vs-in-big-data-the-characteristics-that-define-big-data/
17. Suhail Sami Owais, Nada Sael Hussein, “Extract Five Categories CPIVW from the 9V’s Characteristics of the Big Data,” International Journal of Advanced Computer Science and Applications, vol. 7, pp. 254-258, 2016.
18. Available: http://www.ibmbigdatahub.com/presentation/current-challenges-and-opportunities-big-data-and-analytics-emergency-management
19. Available: Global Pulse, http://www.unglobalpulse.org/sites/default/files/BigDataforDevelopment-UNGlobalPulseJune2012.pdf
20. Gang Zhao, “A Query Processing Framework based on Hadoop,” International Journal of Database Theory and Application, vol. 7, pp. 261-272, 2014.
21. Available at: https://software.intel.com/sites/default/files/article/402274/etl-big-data-with-hadoop.pdf
22. Available: http://www.nytimes.com/2012/03/29/technology/new-us-research-will-aim-at-flood-of-digital-data.html
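
The verbosity characteristic defined in Section IV (redundancy of the information available at different sources) suggests filtering redundant records at the initial stage to save storage space and processing time. The paper does not prescribe a technique; as one possible illustration, the Python sketch below drops exact duplicates across sources by hashing a lightly normalised form of each record. The source names, records and helper functions are hypothetical.

```python
import hashlib


def fingerprint(record):
    """Hash of a record after trivial normalisation (case and whitespace)."""
    normalised = " ".join(record.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def drop_redundant(sources):
    """Keep the first copy of each record seen across all sources."""
    seen = set()
    kept = []
    for source_name, records in sources.items():
        for record in records:
            key = fingerprint(record)
            if key not in seen:
                seen.add(key)
                kept.append((source_name, record))
    return kept


# Hypothetical feeds; the source names and records are made up.
sources = {
    "web_logs": ["Order 17 shipped", "Order 18 cancelled"],
    "call_center": ["order 17  shipped", "Order 19 returned"],
}
kept = drop_redundant(sources)
total = sum(len(records) for records in sources.values())
print(f"kept {len(kept)} of {total} records")  # kept 3 of 4 records
```

Exact hashing only catches copies that are identical after normalisation; near-duplicates or conflicting versions of the same fact would need a similarity measure, which is outside this sketch.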
