Big Data Methodology
Introduction
Big Data means exposing most of an organization's data to a diverse variety of users: it provides easy access to data sources, attaches security to the database, and opens private information up for analysis. Big Data itself is certainly a challenge, but not as great a challenge as the use, control, and security of that data. Cyber security is one of the most pressing concerns on the rise. An example is the security attack on Target: hackers stole all the data kept inside Target's database, including customer information such as card details, names, addresses, identities, social security numbers, and much more. So regardless of how large the data is, the issue of cyber security and control remains extremely significant. Data is expensive, but it also helps you make decisions faster, speeding up the business more than ever before.
Wrong data can be a major problem, since decisions depend on it: inaccurate data can lead to large-scale failure of the system. It is like having the correct name but the wrong address. Suppose a product is delivered in such a case; the buyer can complain that he never received it. Another example: one customer has paid all of his bills, but the payment is mistakenly recorded against another customer, which could lead to the first customer being cut off from his assets. Every decision made on faulty data leads to a failure.
Data is also expensive to maintain, and it is never going to stop growing. The data grows every day, and so does the cost of upkeep. The organization also needs to plan policies for who may access the data, who may modify it, and how it is organized. The data can come from various sources in various languages and can be redundant as well, so the organization should have mechanisms to handle huge amounts of data.
Low-quality data, for example redundant copies of the same item, increases the cost and effort needed to maintain it. Poorly organized data likewise leads to wasted effort and unreliable analysis.
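Faulty records, like the wrong-address example above, can be caught early with simple record-level checks before any decision is made on them. A minimal sketch in Python; the field names and rules here are illustrative assumptions, not part of any standard schema:

```python
# Minimal record-validation sketch. The field names and rules are
# illustrative assumptions, not a standard schema.

REQUIRED_FIELDS = ("name", "address", "card")

def validate(record):
    """Return a list of problems found in a customer record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    return problems

records = [
    {"name": "A. Buyer", "address": "12 Main St", "card": "****1111"},
    {"name": "B. Buyer", "address": "", "card": "****2222"},  # bad address
]

for r in records:
    issues = validate(r)
    print(f"{r['name']}: {'ok' if not issues else ', '.join(issues)}")
```

Rejecting or flagging such records at ingestion time is far cheaper than undoing a delivery or a billing action made on them.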
Data Sets
Big data includes several types of data: structured, unstructured, and semi-structured. Big data is further characterized by variety, velocity, and volume.
Structured Data
Structured data is data that can be stored, processed, and retrieved in a fixed format. It refers to data that is already organized, so it can be used without any further processing.
Unstructured Data
Unstructured data refers to data that does not possess a predefined structure. Traditional databases and spreadsheets only answer questions about what happened; they give insight into a problem only at a small scale. To improve an organization's capability and gain deeper insight, it must also analyze the unstructured data gathered from various sources such as customers, audiences, or subscribers.
Semi-structured Data
Semi-structured data combines elements of both arrangements: strictly speaking, it is data that does not conform to the fixed schema of structured data, yet it contains important markers or tags that separate the individual elements within the data.
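As an illustration, JSON is a common semi-structured format: there is no fixed schema, but keys act as the tags that separate individual elements within the data. A small sketch; the record contents are invented for illustration:

```python
import json

# A semi-structured record: no fixed schema, but keys act as the
# "tags that separate individual elements" within the data.
raw = """
{
  "customer": "C-1001",
  "orders": [
    {"item": "laptop", "qty": 1},
    {"item": "mouse", "qty": 2}
  ],
  "note": "deliver after 5pm"
}
"""

record = json.loads(raw)

# Elements can be addressed by their tags even without a schema.
print(record["customer"])
for order in record["orders"]:
    print(order["item"], order["qty"])
```

Fields like "note" can appear in some records and not others, which is exactly what distinguishes semi-structured data from a fixed-schema table.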
Variety
Variety refers to data accumulated from different sources. Data once had to be gathered from spreadsheets and databases; today it arrives in a variety of forms, for example emails, PDFs, photos, videos, audio, social media posts, and much more.
Velocity
Velocity refers to the speed at which data is generated and collected, often continuously. In a broader sense, it covers the rate of change and the linking of incoming data sets.
Volume
Volume is one of the defining attributes of big data. As noted, big data denotes huge volumes of data generated constantly through channels such as social media platforms, business processes, machines, networks, human interactions, and so on, and this large amount of data is stored in data warehouses. This concludes the characteristics. Efficient storage of such large amounts of data reduces storage costs and helps provide business intelligence.
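To make the volume point concrete, here is a back-of-the-envelope growth estimate; the event rate and record size below are assumed figures for illustration, not measurements from any real system:

```python
# Back-of-the-envelope storage-growth estimate.
# Assumed inputs -- purely illustrative:
events_per_second = 5_000   # e.g. posts, transactions, sensor readings
bytes_per_event = 2_000     # ~2 KB per stored record

seconds_per_day = 24 * 60 * 60
bytes_per_day = events_per_second * bytes_per_event * seconds_per_day

gb_per_day = bytes_per_day / 1e9
print(f"~{gb_per_day:.0f} GB/day, ~{gb_per_day * 365 / 1000:.0f} TB/year")
```

Even at these modest assumed rates the warehouse grows by hundreds of gigabytes a day, which is why storage cost and upkeep were flagged as ongoing concerns earlier.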
Research Methodology
The research presented here is secondary, and the methodology used is both qualitative and quantitative: the first involves non-numerical data and information, while the latter uses numerical data, analyzed to find out what problems organizations face and how they deal with them.
Research Philosophy
The philosophy of this research rests on two types of data. First, theoretical information collected from the work of others, in which the data challenges and security risks of big data are explained in detail. Second, numeric (quantitative) data gathered through primary research on companies: a questionnaire was filled in by a manager of every firm, the data was organized on that basis, and pie charts were drawn to present the information.
Research Approach
The approach of the research follows from the above: data gathered from the literature review is used for the qualitative part and is termed secondary data, while on the other hand quantitative data is gathered through questionnaires, collected by surveying organizations that use big data.
Research Strategy
Several strategies can be used in a research process; the one used here is the questionnaire. Questions related to big data security were asked of high-level managers of the organizations, and responses were collected through this channel. The data comprises both primary and secondary data: primary data is gathered first-hand, here with the help of the questionnaires, while secondary data is collected from the work of others.
Data Analysis
In this phase, data is collected by the method discussed above: about 10 organizations were selected and the research was conducted with them. The necessary information was gathered using a survey built from closed-ended questions. This allows for less ambiguity and ensures that the data collected is accurate and therefore fit for the purpose of the study. By contrast, the research design of the paper consulted for this study has an empirical methodology, and only a small pool of cases was analyzed.
To gather the necessary data, particular SMEs were chosen, typically those who have already implemented big data analytics in their business processes. The questions were aimed at understanding the decrease or increase in their spending on the security and privacy upkeep of their data systems. The data so gathered was then analyzed and presented using pie charts.
Survey Questions
Yes = 36.8
No = 19.8

Hadoop = 21
Cloud computing = 54
Monitoring = 18
Auditing = 7

Q3. What are the challenges of big data faced by your organization?
Integrating data = 37
Securing data = 19
Privacy problem = 35

Yes = 45
No = 55

No = 45
Maybe = 20

Yes = 75
No = 25

Q8. Keeping in view the threats to big data, is this really the future?
Yes = 85
No = 10
Maybe = 5
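Percentages like those above can be reproduced from raw closed-ended responses with a simple tally. The sketch below uses the Q8 split reported here (Yes = 85, No = 10, Maybe = 5, assuming 100 respondents) and prints a plain-text bar per option; a pie chart, as used in the study itself, would need a plotting library:

```python
from collections import Counter

# Raw closed-ended answers for Q8; counts chosen to match the
# reported 85 / 10 / 5 split, assuming 100 respondents.
responses = ["Yes"] * 85 + ["No"] * 10 + ["Maybe"] * 5

tally = Counter(responses)
total = sum(tally.values())

for option in ("Yes", "No", "Maybe"):
    pct = 100 * tally[option] / total
    print(f"{option:<6}{pct:5.1f}%  {'#' * round(pct / 2)}")
```

With real survey data, `responses` would be read from the collected questionnaires instead of being constructed to match the reported split.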