Big Data Research Paper
INTRODUCTION:
Data is a collection of values and variables that are related in some respects and differ in others. In recent years the sizes of databases have increased rapidly, which has led to a growing interest in the development of tools capable of automatically extracting knowledge from data [1]. Data are gathered and analyzed to produce information suitable for decision making; data thus provide a rich resource for knowledge discovery and decision support. A database is an organised collection of data arranged so that it can easily be accessed, managed and updated. Data mining is the process of discovering interesting knowledge, for example associations, patterns, changes, anomalies and significant structures, from large amounts of data stored in databases, data warehouses or other data stores.
LITERATURE REVIEW
Puneet Singh Duggal, Sanchita Paul, "Big Data Analysis: Challenges and Solutions", International Conference on Cloud, Big Data and Trust 2013, Nov 13-15, RGPV.
This paper presents various methods for handling the problems of big data analysis through the MapReduce framework over the Hadoop Distributed File System (HDFS). The MapReduce techniques implemented for Big Data analysis using HDFS are studied in this paper.
This paper presents a review of various algorithms from 1994-2014 that are important for handling Big Data sets. It gives an overview of the architectures and algorithms used on large data sets. These algorithms define various structures and methods implemented to handle Big Data, and the paper lists various tools that have been developed for analyzing them. It also explains the security issues, applications and trends followed by large data sets [9].
Wei Fan, Albert Bifet, "Mining Big Data: Current Status, and Forecast to the Future", SIGKDD Explorations, Volume 14, Issue 2
The paper presents a broad overview of the topic of Big Data mining, its current status, controversy, and a forecast of the future. It also covers various interesting and state-of-the-art topics on Big Data mining.
Priya P. Sharma, Chandrakant P. Navdeti, "Securing Big Data Hadoop: A Review of Security Issues, Threats and Solution", IJCSIT, Vol 5(2), 2014, 2126-2131
This paper discusses big data security at the environment level along with a probing of the built-in protections. It also presents some security issues that we are dealing with today and proposes security solutions and commercially available techniques to address them. The paper also covers the security solutions for securing the Hadoop ecosystem.
Richa Gupta, Sunny Gupta, Anuradha Singhal, "Big Data: Overview", IJCTT, Vol 9, Number 5, March 2014
This paper gives an overview of Big Data, its importance in our lives and some technologies for handling Big Data. It also states how Big Data can be applied to self-organizing websites, which can be extended to the field of marketing in companies.
Big data analysis is the process of applying advanced analytics and visualization techniques to large data sets to uncover hidden patterns and unknown correlations for effective decision making. The analysis of Big Data involves multiple distinct phases, which include data acquisition and recording, information extraction and cleaning, data integration, aggregation and representation, query processing, data modeling and analysis, and interpretation. Each of these phases introduces challenges. Heterogeneity, scale, timeliness, complexity and privacy are among the main challenges of Big Data mining.
Timeliness
As the size of the data sets to be processed increases, it takes more time to analyze them. In some situations the results of the analysis are required immediately. For instance, if a fraudulent credit card transaction is suspected, it should ideally be flagged before the transaction completes, by preventing the transaction from taking place at all. Obviously a full analysis of a customer's purchase history is not likely to be feasible in real time, so partial results need to be computed in advance so that a small amount of incremental computation with new data can be used to arrive at a quick determination, as sketched below.
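As an illustration of this idea, the minimal Java sketch below keeps a small running summary per card that is updated incrementally as transactions arrive, so a new transaction can be checked in constant time instead of re-scanning the full purchase history. The class names, the running-mean summary and the flagging rule are invented for illustration and are not taken from the paper.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: maintain a tiny per-card summary ahead of time so that each
// new transaction needs O(1) incremental work rather than a scan of all history.
public class IncrementalFraudCheck {

    // Hypothetical per-card summary kept up to date incrementally.
    static class CardProfile {
        long count = 0;
        double mean = 0.0;                      // running mean of transaction amounts

        void update(double amount) {            // incremental update with new data
            count++;
            mean += (amount - mean) / count;    // running-mean update, no history needed
        }

        boolean looksSuspicious(double amount) {
            // Toy rule: flag amounts far above the historical mean.
            return count >= 10 && amount > 5 * mean;
        }
    }

    private final Map<String, CardProfile> profiles = new HashMap<>();

    // Called for every incoming transaction before it is allowed to complete.
    public boolean checkAndRecord(String cardId, double amount) {
        CardProfile p = profiles.computeIfAbsent(cardId, id -> new CardProfile());
        boolean suspicious = p.looksSuspicious(amount);
        p.update(amount);
        return suspicious;
    }

    public static void main(String[] args) {
        IncrementalFraudCheck checker = new IncrementalFraudCheck();
        for (int i = 0; i < 20; i++) checker.checkAndRecord("card-1", 40 + i);
        System.out.println(checker.checkAndRecord("card-1", 5000)); // true: flagged
    }
}
```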
Given a large data set, it is often necessary to find elements in it that meet a specified criterion, and in the course of data analysis this kind of search is likely to occur repeatedly. Scanning the entire data set to find suitable elements is clearly impractical. In such cases index structures are created in advance to permit finding qualifying elements quickly, as in the sketch below. The problem is that each index structure is designed to support only certain classes of criteria.
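The following minimal Java sketch illustrates an index built ahead of time: a hash index over one attribute turns repeated equality lookups into constant-time probes instead of full scans, but, as noted above, it supports only that one class of criterion. The record type, field names and sample data are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hash index built once over the "category" attribute so that
// repeated equality lookups avoid scanning the whole data set. A range query on
// "amount" would need a different structure (e.g. a sorted index such as a TreeMap).
public class PrebuiltIndexDemo {

    record RecordRow(String id, String category, double amount) {}  // hypothetical record

    public static void main(String[] args) {
        List<RecordRow> dataset = List.of(
                new RecordRow("r1", "electronics", 199.0),
                new RecordRow("r2", "groceries", 23.5),
                new RecordRow("r3", "electronics", 549.0));

        // Build the index ahead of time: one pass over the data.
        Map<String, List<RecordRow>> byCategory = new HashMap<>();
        for (RecordRow r : dataset) {
            byCategory.computeIfAbsent(r.category(), c -> new ArrayList<>()).add(r);
        }

        // Each later query is now a direct lookup instead of a full scan.
        List<RecordRow> hits = byCategory.getOrDefault("electronics", List.of());
        hits.forEach(r -> System.out.println(r.id() + " " + r.amount()));
    }
}
```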
Big data refers to collections of data sets whose size is beyond the ability of commonly used software tools, such as database management tools or traditional data processing applications, to capture, manage and analyze within an acceptable elapsed time. Big data sizes are constantly increasing, ranging from a few dozen terabytes in 2012 to many petabytes in a single data set today.
Big data creates tremendous opportunities for the world economy, both in the field of national security and in areas ranging from marketing and credit risk analysis to medical research and urban planning. The extraordinary benefits of big data are lessened by concerns over privacy and data protection. As big data expands the sources of data it can use, the trustworthiness of each data source needs to be verified, and techniques should be explored to identify maliciously inserted data. Information security is becoming a big data analytics problem in which massive amounts of data will be correlated, analyzed and mined for meaningful patterns. Any security control used for big data must meet the following requirements:
It must not compromise the basic functionality of the cluster.
It should scale in the same manner as the cluster.
It should not compromise essential big data characteristics.
It should address a security threat to big data environments or to data stored within the cluster.
Unauthorized release of information, unauthorized modification of information and denial of resources are the three categories of security violation. The following are some of the security threats:
An unauthorized user may access files and could execute arbitrary code or carry out further attacks.
An unauthorized user may eavesdrop on or sniff data packets being sent to a client.
An unauthorized client may read or write a data block of a file.
An unauthorized client may gain access privileges and may submit a job to a queue, or delete or change the priority of a job.
Security of big data can be enhanced by using the techniques of authentication, authorization, encryption and audit trails. There is always a possibility of security violations through unintended, unauthorized access or through inappropriate access by privileged users, so these protections should be applied together across the cluster; a minimal client-side sketch is given below.
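As an illustration of the authentication and authorization techniques mentioned above, the sketch below shows a Hadoop client configured for Kerberos authentication and service-level authorization before accessing a secured cluster. The principal name and keytab path are placeholders that a real cluster administrator would supply; this is a sketch, not a complete security setup.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative client-side sketch: enable Kerberos authentication and service-level
// authorization before talking to a secured Hadoop cluster.
public class SecureHadoopClient {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos"); // authentication mode
        conf.set("hadoop.security.authorization", "true");      // service-level authorization

        UserGroupInformation.setConfiguration(conf);
        // Authenticate this process using a keytab instead of an interactive password.
        // Principal and keytab path below are placeholders.
        UserGroupInformation.loginUserFromKeytab("analyst@EXAMPLE.COM",
                "/etc/security/keytabs/analyst.keytab");

        // All subsequent HDFS access happens as the authenticated principal.
        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println("Home directory: " + fs.getHomeDirectory());
        }
    }
}
```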
Big data has great potential to produce useful information for companies, which can benefit the way they manage their problems. Big data analysis is becoming indispensable for automatically discovering the intelligence contained in frequently occurring patterns and hidden rules.
These massive data sets are too large and complex for humans to extract useful information from effectively without the aid of computational tools. Emerging technologies such as the Hadoop framework and MapReduce offer new and exciting ways to process and transform big data, defined as complex, unstructured, or large amounts of data, into meaningful knowledge.
Hadoop
Hadoop is a scalable, open source, fault-tolerant Virtual Grid operating system architecture for data storage and processing. It runs on commodity hardware and uses HDFS, a fault-tolerant, high-bandwidth clustered storage architecture. It runs MapReduce for distributed data processing and works with both structured and unstructured data [11]. For handling the velocity and heterogeneity of data, tools like Hive, Pig and Mahout are used, which are parts of the Hadoop and HDFS framework. Hadoop and HDFS (Hadoop Distributed File System) by Apache are widely used for storing and managing big data; a minimal client example follows.
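The following minimal client sketch, assuming the standard Hadoop FileSystem API and a placeholder NameNode address, shows how an application writes a file into HDFS and reads it back through the same unified file-system view.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch: write and read a file through the HDFS client API.
// The fs.defaultFS address and the /user/demo path are placeholders; in practice
// the cluster address comes from the site configuration.
public class HdfsReadWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // assumed address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/user/demo/hello.txt");

            // Write: the file is split into blocks and replicated across data nodes.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello from HDFS".getBytes(StandardCharsets.UTF_8));
            }

            // Read it back through the same unified file-system view.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
                System.out.println(in.readLine());
            }
        }
    }
}
```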
Hadoop consists of a distributed file system, data storage and analytics platforms, and a layer that handles parallel computation, workflow and configuration administration [6].
HDFS runs across the nodes in a Hadoop cluster and connects the file systems on many input and output data nodes together into one big file system. The present Hadoop ecosystem, as shown in Figure 1, consists of the Hadoop kernel, MapReduce, the Hadoop Distributed File System (HDFS) and a number of related components such as Apache Hive, HBase, Oozie, Pig and Zookeeper. These components are explained below:
HDFS: A highly fault-tolerant distributed file system that is responsible for storing data on the clusters.
MapReduce: A powerful parallel programming technique for distributed processing of vast amounts of data on clusters.
HBase: A column-oriented distributed NoSQL database for random read/write access.
Pig: A high-level data-flow programming language for analyzing data in Hadoop computations.
Hive: A data warehousing application that provides SQL-like access and a relational model (see the sketch after this list).
Sqoop: A project for transferring/importing data between relational databases and Hadoop.
Oozie: An orchestration and workflow management service for dependent Hadoop jobs.
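As a sketch of the SQL-like access that Hive provides, the example below issues a query through the standard HiveServer2 JDBC driver. The server address, credentials and the sales table are placeholders, and the hive-jdbc driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Illustrative sketch: a SQL-like query against Hive via the HiveServer2 JDBC driver.
// Host, port, database, user and table name are placeholders.
public class HiveQueryDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");       // load the Hive JDBC driver
        String url = "jdbc:hive2://hiveserver.example.com:10000/default"; // assumed endpoint
        try (Connection con = DriverManager.getConnection(url, "analyst", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT category, COUNT(*) FROM sales GROUP BY category")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```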
MapReduce:
MapReduce is a programming model for processing large data sets with a
parallel, distributed algorithm on a cluster. Hadoop MapReduce is a
programming model and software framework for writing applications that
rapidly process vast amounts of data in parallel on large clusters of compute
nodes [11].
MapReduce consists of two functions, map() and reduce(). The mapper performs the tasks of filtering and sorting, and the reducer performs the task of summarizing the results; there may be multiple reducers to parallelize the aggregations [7]. Users can implement their own processing logic by specifying customized map() and reduce() functions. The map() function takes an input key/value pair and produces a list of intermediate key/value pairs. The MapReduce runtime system groups together all intermediate pairs based on the intermediate keys and passes them to the reduce() function for producing the final results. MapReduce is widely used for the analysis of big data; the classic word-count job below illustrates the model.
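The word-count job below is a compact sketch of this model using the standard Hadoop MapReduce API: the mapper emits an intermediate (word, 1) pair for every token, the runtime groups the pairs by word, and the reducer sums the counts for each word. Input and output paths are taken from the command line.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The classic word-count example: map() emits (word, 1) pairs, the framework groups
// them by word, and reduce() sums the counts for each word.
public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);
                context.write(word, ONE);          // intermediate (word, 1) pair
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();  // aggregate counts per word
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on each mapper
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```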
Large-scale data processing is a difficult task; managing hundreds or thousands of processors, and handling parallelization and distributed environments, makes it more difficult. MapReduce provides a solution to these issues, since it supports distributed and parallel I/O scheduling, is fault tolerant, supports scalability, and has built-in facilities for status reporting and monitoring of heterogeneous and large data sets as in Big Data [11].
CONCLUSION
The amount of data worldwide is growing exponentially due to the explosion of social networking sites, search and retrieval engines, media sharing sites, stock trading sites, news sources and so on. Big Data is becoming the new area for scientific data research and for business applications. Big data analysis is becoming indispensable for automatically discovering the intelligence contained in frequently occurring patterns and hidden rules. Big data analysis helps companies to take better decisions, to predict and identify changes and to identify new opportunities. In this paper we discussed the issues and challenges related to big data mining, as well as Big Data analysis tools such as MapReduce over Hadoop and HDFS, which help organizations to better understand their customers and the marketplace, to take better decisions, and which help researchers and scientists to extract useful knowledge from Big Data. In addition, we introduced some big data mining tools and how to extract significant knowledge from Big Data, which will help research scholars to choose the best mining tool for their work.
REFERENCES