Big Data - Current Challenges and Future Scope
Abstract: Big Data encompasses huge amounts of raw material that influence a multitude of research fields as well as the performance of different industries, such as business, marketing, social network analysis, educational systems, healthcare, IoT, meteorology, and fraud detection. It aims to uncover hidden trends and has prompted a shift from a model-driven perspective to a data-driven approach. Among the numerous properties of Big Data, its datasets are identified primarily by the 3Vs attributes: high variety, velocity, and volume. These provide invaluable insight and assist in making precise decisions. Analyzing this information and summarizing the outcome into useful data is the method for extracting value from these enormous volumes of datasets. Nevertheless, Big Data contains unique features that cannot be handled and processed using conventional methods. This has presented a significant challenge to the industry. This research paper presents a general outline of the characteristics of Big Data and expounds on the present challenges and limitations in this area. It further discusses the future scope, in particular the future direction for Big Data research.

II. RELATED WORKS
Even though numerous studies have been conducted in this field, it is still essential to keep conducting new research so that the latest findings are included.
A bibliometric study of Big Data reveals phenomenal growth in the number of studies on Big Data. Based on an analysis of publication trends, fewer than 38 publications on ‘Big Data’ had appeared up to 2011; however, by the year 2017, the number had grown rapidly to 3890 publications [4]. The time trend of Big Data is shown in Fig. 1.
Authorized licensed use limited to: University of Exeter. Downloaded on June 09,2020 at 15:15:01 UTC from IEEE Xplore. Restrictions apply.
some of the major challenges of Big Data including storage, heterogeneity, security, etc.
The rising number of studies on Big Data demonstrates both the importance of Big Data and the difficulties in this area. Nevertheless, if the issues identified with Big Data are addressed, it will be a great contribution to various sectors including healthcare, banking, telecommunication, food, and fraud detection [7].

III. KEY CHARACTERISTICS OF BIG DATA
In order to identify the challenges of Big Data, it is necessary to understand its characteristics. There are different definitions of Big Data; in general, Big Data refers to substantially large sets of unstructured and structured complex data that conventional processing systems are not able to manage. It aims to uncover hidden trends and has prompted a shift from a model-driven perspective to a data-driven approach [8].
The major characteristics of Big Data are clearly described by Gartner [9]. According to their definition, Big Data is a high-volume, high-velocity, and high-variety data resource that requires new forms of processing to enable improved decision making, knowledge discovery, and process optimization. The three Vs (volume, velocity, and variety) are the fundamental characteristics of Big Data. However, various organizations and institutes have come out with different definitions.
SAS (Statistical Analysis System) has added two extra components, Variability and Complexity, to Gartner's three Vs [10].
In addition, Oracle has characterized Big Data based on four Vs, namely Volume, Velocity, Variety, and Value [11].
In 2014, Data Science Central characterized Big Data in 10 Vs: Volume, Variety, Velocity, Veracity, Validity, Value, Variability, Venue, Vocabulary, and Vagueness [12].
Researchers from IBM mentioned that Big Data contains four components. In addition to the three Vs referenced by Gartner, IBM included Veracity as the fourth Big Data characteristic [13].
The longest list of Big Data characteristics was discussed by several researchers in 2016 [14]; these 15 characteristics and a brief description of each are given in Table I.

TABLE I: CHARACTERISTICS OF BIG DATA
Big Data Characteristics    Elucidation
Volume                      Size of Data
Velocity                    Speed of Data
Variety                     Type of Data
Veracity                    Data Quality
Validity                    Data Authenticity
Value                       Importance of Data
Volatility                  Duration of Usefulness
Visualization               Data Process / Data Act
Vitality                    Spread Speed
Viscosity                   Lag of Event
Variability                 Data Differentiation
Venue                       Different Platform
Vocabulary                  Data Terminology
Vagueness                   Indistinctness of existence in a Data

IV. CURRENT CHALLENGES OF BIG DATA
Opportunities are commonly followed by challenges. Big Data has some noteworthy challenges, which can be divided into three fundamental classes according to the life cycle of data, namely data, process, and management challenges [5].
• Data challenges relate to the characteristics of the data itself (for example volume, variety, and velocity of data).
• Processing challenges relate to a set of "how" techniques: how to capture data, how to transform data, how to integrate data, how to choose the correct model for analysis, and how to present the findings.
• Management challenges include issues such as security, privacy, governance, and ethical aspects.

A. Data Complexity:
The common characteristics of Big Data can be grouped into diverse types and patterns, complex relationships, and highly variable data quality. The inherent complexity of Big Data (such as complicated types, complex structures, and complex patterns) makes its perception, representation, understanding, and computation considerably more difficult. As a result, computational complexity increases sharply compared with traditional computing models based on complete data. Traditional data analysis and mining tasks, for example retrieval, topic discovery, and semantic and sentiment analysis, become extremely time-consuming when applied to Big Data [16].

B. Data Heterogeneity:
Another significant challenge that researchers are facing is integrating data from various sources to maximize its value. A large amount of data is generated by social media, blogospheres, and websites, and each source differs in format, semantics, and origin. The structure of data from these sources varies from highly structured data (databases) to unstructured data (heterogeneous reports) [17].

C. Quality of Data:
Data quality is an important criterion for deciding on the dependability of Big Data for decision making. Data quality relies heavily on four criteria [18]:
• Complete: all pertinent data is available, for instance vendor details such as name, address, bank account, etc.
• Accurate: data is free of misspellings, typos, and wrong or abbreviated terms.
• Available: data is accessible when requested and easy to find.
• Timely: information is current and ready to support decision making.
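As an illustration only, the record-level criteria above (complete, accurate, timely) could be checked with a few lines of Python. This is a minimal sketch and not a method taken from [18]: the vendor schema in REQUIRED_FIELDS, the abbreviation list, the one-year freshness threshold, and the function name quality_report are all hypothetical assumptions. Availability is a property of the serving system rather than of a single record, so it is not modeled here.

```python
from datetime import datetime, timedelta

# All three constants below are hypothetical, chosen only for illustration.
REQUIRED_FIELDS = ("name", "address", "bank_account")  # assumed vendor schema
KNOWN_ABBREVIATIONS = {"st.", "rd.", "ave."}            # assumed accuracy rule
MAX_AGE = timedelta(days=365)                           # assumed timeliness limit


def quality_report(record, now):
    """Return pass/fail flags for one vendor record (a plain dict)."""
    # Complete: every required field is present and non-empty.
    complete = all(record.get(field) for field in REQUIRED_FIELDS)

    # Accurate (simplified): the address contains no known abbreviation.
    words = str(record.get("address", "")).lower().split()
    accurate = not any(word in KNOWN_ABBREVIATIONS for word in words)

    # Timely: the record was updated within the allowed age.
    updated = record.get("last_updated")
    timely = updated is not None and now - updated <= MAX_AGE

    return {"complete": complete, "accurate": accurate, "timely": timely}
```

A real pipeline would replace the abbreviation lookup with domain-specific validation; the point of the sketch is only that each criterion becomes an explicit, testable rule.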
D. Scalability:
Scalability is another challenge in Big Data, especially in the analysis stage. Incremental strategies have good scalability properties for the analysis of Big Data. Since the size of data scales faster than CPU speed, this challenge has prompted the advancement of parallel computing. Real-time applications such as social networks, navigation, finance, and web search, where timeliness matters, require parallel processing [19].

E. Storage and Management:
The large amount of data generated on a daily basis should be stored for analytical reasons as well as in line with laws and service level agreements to secure and preserve data. Management and storage are the two key challenges in the Big Data field [20]. The capacity of storage devices to scale in order to keep up with data growth, and to improve access time and transfer rate, is equally challenging. These variables, to a great degree, determine the overall performance of data storage and management.

F. Privacy and Security:
In 2015, a comprehensive review concluded that it is extremely difficult to store and analyze Big Data with conventional applications. Additionally, there are privacy and security problems. Encryption schemes, firewalls, access permissions, and transport layer security can be insecure; the provenance of data can be obscure; and even anonymized data can be re-identified. The review concluded that privacy and security of Big Data are issues that should be researched further. Despite the increasing amount of R&D, a recent analysis in 2018 demonstrated that privacy and security are still major problems for Big Data and its implementation [21].

G. Preprocessing:
Preprocessing is an important part of data mining. This is mainly because real-world databases are strongly affected by noise, missing values, and inconsistent and redundant data. Preprocessing is the set of techniques (feature selection, handling of imperfect data, imbalanced learning, etc.) applied before a data mining method is used, and it is one of the significant concerns within the well-known process of Knowledge Discovery from Data [22].

H. Visualization:
One of the major tasks of Big Data analysis is visualizing the results. Due to the huge size of the data, it is extremely hard to create user-friendly visualizations. The aim of data visualization is to present data more effectively using techniques from graph theory. Graphical visualization creates the link between data and its correct interpretation. Online marketplaces such as Amazon and eBay have a huge number of users and billions of goods. This produces a great deal of data. As such, some organizations use a tool called Tableau to visualize Big Data. It can transform large amounts of complex data into intuitive pictures. This assists in:
• visualizing search relevance
• monitoring the latest customer feedback
• analyzing customer sentiment.
Current Big Data visualization tools generally perform poorly in scalability, functionality, and response time [23].

I. Analytics:
Big Data brings with it large analytical challenges [24]. Big Data analytics is the process of examining Big Data. It helps reveal hidden trends, unknown relationships, and other valuable information that can be used for better decision making. A high level of technical skill is needed to carry out these types of analysis on vast amounts of unstructured, semi-structured, and structured data. Moreover, it is still unclear how an ideal analytics architecture should handle historical data and real-time data simultaneously.

J. Real-time analysis:
Real-time analytical services determine the root causes of business and operational situations and exceptions. Real-time analytical processing may include single or multiple integrated analytical services. It relies on real-time information passed from the situation discovery process and on other offline stored Big Data such as maps, previous transactions, situations, and decisions. Dealing with stored Big Data is usually challenging [25]. These services need to deploy fast algorithms that offer alternative choices within a limited time. These alternatives can be optimal or semi-optimal because of the constrained time and the available resources.

The Big Data challenges mentioned above are summarized in Table II.

TABLE II: CHALLENGES OF BIG DATA
Category            Big Data Challenges
Data                Complexity of Data
                    Heterogeneity of Data
                    Quality of Data
                    Scalability of Data
Management          Storage
                    Management
                    Security
                    Privacy
Analysis process    Preprocessing
                    Visualization
                    Analytics
                    Real-time analysis

V. FUTURE SCOPE OF BIG DATA
Big Data can be used in various applications and fields, for example healthcare, marketing, telecommunication, and education. These are the most researched areas [26]. Using tools to evaluate and understand Big Data will help organizations in different areas understand the advantages of integrating Big Data. This helps industries move toward a broader technological shift as integrating Big Data becomes part of standard business practice [27]. As indicated by Susan Hauser, Strategic Advisor and former corporate VP of Microsoft, “Big Data can potentially transform the way governments, associations, and scholastic establishments carry out businesses and make disclosures, and it is liable to change how everybody experiences their everyday lives”.
The rapid expansion of data has brought about a rapid growth of digitized information and has likewise attracted vast attention to research opportunities in Big Data analytics. Big Data has no value in a vacuum. Its potential value is realized only when it is used to make informed decisions [28].
At present, numerous technologies are being used to manage, aggregate, visualize, and analyze Big Data. These tools and technologies are drawn from several fields such as statistics, software engineering, artificial intelligence, applied mathematics, and so forth. Big Data needs exceptional technologies to efficiently process large amounts of data within a tolerable elapsed time [24]. Using existing tools and methods for Big Data processing results in productivity loss and creates numerous complexities. Thus, present technologies are not able to resolve Big Data issues completely.
New storage, processing, analytics, and efficient data-intensive technologies, from the software to the hardware perspective, are needed [3]. Big Data analytics is progressively becoming a trending practice that numerous firms are adopting with the aim of deriving meaningful information from Big Data.

VI. CONCLUSIONS
In this paper we examined the fundamental attributes of Big Data. Among the numerous properties referenced, volume, velocity, and variety have been identified as the primary attributes of Big Data. In addition, we discussed that Big Data is an important source of significant insights, ultimately supporting better-informed decisions in various areas, such as business and marketing, education systems, and healthcare. Analyzing this information and summarizing the outcome into useful data is the method for extracting value from these enormous volumes of datasets.
Likewise, the present and future challenges in this area were discussed. Despite the important improvements in the Big Data analytics field, there are still many shortcomings. Mostly, these issues relate to present methods that are not adapted to larger, more varied, and increasingly complex datasets, which may affect the performance and precision of the analysis. Further research should be conducted in several domains, including data organization, analysis techniques, tools, and platforms, to develop next-generation Big Data technology. Consequently, technological issues exist in numerous Big Data fields, notably Big Data analytics systems, and these constitute a vital research subject.

REFERENCES
[1] W. A. Günther, M. H. Rezazade Mehrizi, M. Huysman, and F. Feldberg, “Debating big data: A literature review on realizing value from big data,” J. Strateg. Inf. Syst., vol. 26, no. 3, pp. 191–209, 2017.
[2] SAS, “What Is Big Data?,” SAS Big Data Insights, 2018. [Online]. Available: https://www.sas.com/en_my/insights/big-data/what-is-big-data.html. [Accessed: 19-Nov-2018].
[3] I. Yaqoob et al., “Big data: From beginning to future,” Int. J. Inf. Manage., vol. 36, no. 6, pp. 1231–1247, 2016.
[4] T. P. Liang and Y. H. Liu, “Research Landscape of Business Intelligence and Big Data analytics: A bibliometrics study,” Expert Syst. Appl., vol. 111, no. 128, pp. 2–10, 2018.
[5] U. Sivarajah, M. M. Kamal, Z. Irani, and V. Weerakkody, “Critical analysis of Big Data challenges and analytical methods,” J. Bus. Res., vol. 70, pp. 263–286, 2017.
[6] P. V. Desai, “A survey on big data applications and challenges,” 2018 Second Int. Conf. Inven. Commun. Comput. Technol., no. Icicct, pp. 737–740, 2018.
[7] S. Mukherjee and R. Shaw, “Big Data – Concepts, Applications, Challenges and Future Scope,” Int. J. Adv. Res. Comput. Commun. Eng., vol. 5, no. 2, pp. 66–774, 2016.
[8] K. Taylor-Sakyi, “Understanding big data,” arXiv:1601.04602, no. January, p. 166, 2016.
[9] M. Beyer and D. Laney, “The Importance of ‘Big Data’: A Definition,” Gartner, Analysis Report G00235055, 2012. [Online]. Available: https://www.gartner.com/doc/2057415/importance-big-data-definition. [Accessed: 02-Sep-2018].
[10] M. Troester, “Big Data Meets Big Data Analytics,” SAS White Pap., 2015.
[11] D. Cackett, “Information Management and Big Data A Reference Architecture,” Oracle White Pap., no. February, p. 28, 2013.
[12] K. Borne, “10 V’s of Big Data,” Data Science Central, 2014. [Online]. Available: https://www.datasciencecentral.com/profiles/blogs/top-10-list-the-v-s-of-big-data. [Accessed: 20-Nov-2018].
[13] IBM, “The Four V’s of Big Data,” IBM Big Data & Analytics Hub, 2018. [Online]. Available: https://www.ibmbigdatahub.com/infographic/four-vs-big-data. [Accessed: 20-Nov-2018].
[14] G. Kapil, A. Agrawal, and R. A. Khan, “A study of big data characteristics,” 2016 Int. Conf. Commun. Electron. Syst., pp. 1–4, 2016.
[15] R. I. Jony, R. I. Rony, A. Rahat, and M. Rahman, “Big Data Characteristics, Value Chain and Challenges,” 1st Int. Conf. Adv. Inf. Commun. Technol. 2016, Chittagong Indep. Univ. Bangladesh., no. May, pp. 1–6, 2016.
[16] X. Jin, B. W. Wah, X. Cheng, and Y. Wang, “Significance and Challenges of Big Data Research,” Big Data Res., vol. 2, no. 2, pp. 59–64, 2015.
[17] P. Murali K, M. Salehi Amini, K. Jayasimha R., Y. Xie, and V. Raghavan, “Massive Data Analysis: Tasks, Tools, Applications, and Challenges,” Big Data Anal. Methods Appl., pp. 1–276, 2016.
[18] N. T. Tariq RS, “Big Data Challenges,” Comput. Eng. Inf. Technol., vol. 04, no. 03, 2015.
[19] R. S. K. Althaf, R. K. Sai, and R. K. Girija, “Challenging tools on Research Issues in Big Data Analytics,” Int. J. Eng. Dev. Res., vol. 6, no. 1, pp. 637–644, 2018.
[20] R. Agrawal and C. Nyamful, “Challenges of big data storage and management,” Glob. J. Inf. Technol., vol. 6, no. 1, 2016.
[21] R. Milan, K. Kumar Pandey, and D. Shukla, “Security and Privacy Challenges in Big Data Environment,” in National Conference on “Data Analytics, Machine Learning and Security” 15 – 16 February 2018, 2018, no. February, pp. 315–325.
[22] S. García, S. Ramírez-Gallego, J. Luengo, J. M. Benítez, and F. Herrera, “Big data preprocessing: methods and prospects,” Big Data Anal., vol. 1, no. 1, p. 9, 2016.
[23] D. P. and K. Ahmed, “A Survey on Big Data Analytics: Challenges, Open Research Issues and Tools,” Int. J. Adv. Comput. Sci. Appl., vol. 7, no. 2, 2016.
[24] S. Mishra, V. Dhote, G. S. Prajapati, and J. P. Shukla, “Challenges in Big Data Application: A Review,” Int. J. Comput. Appl., vol. 121, no. 19, pp. 42–46, 2015.
[25] N. Mohamed and J. Al-Jaroodi, “Real-time big data analytics: Applications and challenges,” Proc. 2014 Int. Conf. High Perform. Comput. Simulation, HPCS 2014, no. October, pp. 305–310, 2014.
[26] J. Akoka, I. Comyn-Wattiau, and N. Laoufi, “Research on Big Data – A systematic mapping study,” Comput. Stand. Interfaces, vol. 54, no. April 2016, pp. 105–115, 2017.
[27] I. Lee, “Big data: Dimensions, evolution, impacts, and challenges,” Bus. Horiz., vol. 60, no. 3, pp. 293–303, 2017.
[28] A. Gandomi and M. Haider, “Beyond the hype: Big data concepts, methods, and analytics,” Int. J. Inf. Manage., vol. 35, no. 2, pp. 137–144, 2015.