
A

MINI PROJECT REPORT


on
HEALTH CARE ANALYTICS USING BIGDATA

BACHELOR OF TECHNOLOGY
in

COMPUTER SCIENCE AND ENGINEERING


Submitted by
CH. Likhil Kumar Goud

(197Y1A0521)

Y. Navadeep Reddy
(197Y1A0526)

Under the Guidance of

Mrs. K. Jaysri (Assistant Professor)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
MARRI LAXMAN REDDY

INSTITUTE OF TECHNOLOGY AND MANAGEMENT


(AUTONOMOUS)

(Affiliated to JNTU-H, Approved by AICTE New Delhi and Accredited by NBA & NAAC with ‘A’ Grade)
CERTIFICATE

This is to certify that the project report titled “Health Care Analytics using Bigdata” is being submitted by CH. Likhil Kumar Goud (197Y1A0521), IV B.Tech I Semester, Computer Science & Engineering, as a record of bonafide work carried out by him. The results embodied in this report have not been submitted to any other University for the award of any degree.

Internal Guide HOD

Principal External Examiner


DECLARATION

I hereby declare that the Minor Project Report entitled “Health Care Analytics using Bigdata”, submitted for the B.Tech degree, is entirely my own work, and all ideas and references have been duly acknowledged. It does not contain any work submitted for the award of any other degree.

Date:

CH.Likhil Kumar Goud

(197Y1A0521)

Y. Navadeep Reddy

(197Y1A0526)

ACKNOWLEDGEMENT

I am happy to express my deep sense of gratitude to the principal of the college, Dr. K. Venkateswara Reddy, Professor, Department of Computer Science and Engineering, Marri Laxman Reddy Institute of Technology & Management, for having provided me with adequate facilities to pursue my project.

I would like to thank Mr. Abdul Basith Khateeb, Assoc. Professor and Head, Department of
Computer Science and Engineering, Marri Laxman Reddy Institute of Technology &
Management, for having provided the freedom to use all the facilities available in the
department, especially the laboratories and the library.

I am very grateful to my project guide, Mrs. K. Jaysri, Asst. Prof., Department of Computer Science and Engineering, Marri Laxman Reddy Institute of Technology & Management, for her extensive patience and guidance throughout my project work.

I sincerely thank my seniors and all the teaching and non-teaching staff of the Department of
Computer Science for their timely suggestions, healthy criticism and motivation during the
course of this work.

I would also like to thank my classmates for always being there whenever I needed help or
moral support. With great respect and obedience, I thank my parents and brother who were the
backbone behind my deeds.

Finally, I express my immense gratitude to the other individuals who have directly or indirectly contributed at the right time to the development and success of this work.


CONTENTS

Certificate
Declaration
Acknowledgement
Abstract

1. INTRODUCTION
1.1 Bigdata 3V’s
1.2 Ecosystem
    HDFS
    MapReduce
    Pig
    Hive
    Sqoop
    Impala
1.3 Applications of Bigdata
1.4 Cloudera
1.5 Hue
2. LITERATURE SURVEY
2.1 Existing system
2.2 Proposed system
3. REQUIREMENT ANALYSIS
3.1 Hardware requirements
3.2 Software requirements
4. IMPLEMENTATION
4.1 Problem Definition
4.2 System Architecture
    Get to the Source
    Ingestion Strategy and Acquisition
    Storage
    Data processing
    Export Data sets
    Reporting and visualization
    Data Exploration
    Adhoc Querying
5. METHODOLOGY
5.1 How HDFS is used in our project
5.2 How Hive is used
5.3 How Cloudera is used
5.4 How Hue is used
5.5 How Sqoop is used
6. SCREENSHOTS
    To create database
    To create table
    To display fields
    Loading data into MySQL
    To import data from MySQL to HDFS
    Compilation time

LIST OF FIGURES
4.2 System Architecture
5.1 How HDFS is used in our project


LIST OF TABLES

6. SCREENSHOTS


ABSTRACT
In today's modern world, healthcare also needs to be modernized. This means that healthcare data should be properly analyzed so that it can be categorized into groups by gender, disease, city, symptoms and treatment.
Big data is used to predict epidemics, cure diseases, improve quality of life and avoid preventable deaths. With the world's population increasing and everyone living longer, models of treatment delivery are rapidly changing, and many of the decisions behind those changes are being driven by data.
The drive now is to understand as much about a patient as possible, as early in their life as possible, hopefully picking up warning signs of serious illness at an early enough stage that treatment is far simpler and less expensive than if it had been spotted later. Analytics at this gigantic scale needs large-scale computation, which can be done with the help of distributed processing in Hadoop.
The framework used will provide multiple beneficial outputs, including presenting the healthcare data analysis in various forms. The groups made by the system would be symptom-wise, age-wise, gender-wise, season-wise, disease-wise, etc. As the system displays the data group-wise, it helps to get a clear idea about diseases and their rate of spreading, so that appropriate treatment can be given at the proper time.


1. INTRODUCTION
1.1 Bigdata 3V’s:

The 3 Vs that define Big Data are Volume, Velocity and Variety.
Volume
We currently see exponential growth in data storage, since data is now much more than plain text: we find videos, music and large images on our social media channels. It is very common for enterprises to have storage systems of terabytes, even petabytes. As the data grows, the applications and architecture built to support it need to be re-evaluated quite often. Sometimes the same data is re-evaluated from multiple angles, and even though the original data is the same, the newfound intelligence creates an explosion of data. This sheer volume represents Big Data.

Velocity
Data growth and the social media explosion have changed how we look at data. There was a time when we believed that yesterday's data was recent; as a matter of fact, newspapers still follow that logic. However, news channels and radio changed how fast we receive the news. Today, people rely on social media to keep them updated with the latest happenings, often discarding old messages and paying attention only to recent updates. Data movement is now almost real time, and the update window has shrunk to fractions of a second. This high-velocity data represents Big Data.


Variety
Data can be stored in multiple formats: for example in a database, Excel, CSV or Access files, or in a simple text file. Sometimes the data is not even in a traditional format as we assume; it may be video, SMS, PDF or something we have not thought about yet. It is the organization's task to arrange it and make it meaningful. This would be easy if all data arrived in the same format, but that is rarely the case. The real world has data in many different formats, and that is the challenge we need to overcome with Big Data. This variety of data represents Big Data.

1.2 Ecosystem
HDFS:
HDFS is built to support applications with large data sets,
including individual files that reach into the terabytes. It uses a
master/slave architecture, with each cluster consisting of a single
Namenode that manages file system operations and supporting
Datanodes that manage data storage on individual compute
nodes.
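As an illustration, the basic HDFS shell commands for moving files in and out of the cluster look like the following (all paths and file names are examples only, not taken from this project):

    # create a directory in HDFS and copy a local file into it
    hdfs dfs -mkdir -p /user/demo/input
    hdfs dfs -put records.csv /user/demo/input/

    # list the directory and read the file back
    hdfs dfs -ls /user/demo/input
    hdfs dfs -cat /user/demo/input/records.csv

    # copy a file from HDFS back to the local file system
    hdfs dfs -get /user/demo/input/records.csv ./records_copy.csv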
MapReduce:
MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. The MapReduce model consists of two separate routines, namely the Map function and the Reduce function. The computation on an input in the MapReduce model occurs in three stages:
In the map stage, the mapper takes a single (key, value) pair as input and produces any number of (key, value) pairs as output.
The shuffle stage is handled automatically by the MapReduce framework: the underlying system routes all of the values that are associated with an individual key to the same reducer.
In the reduce stage, the reducer takes all of the values associated with a single key k and outputs any number of pairs.
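The three stages can be imitated on a single machine with an ordinary shell pipeline. The sketch below counts words in a file (the file name is a placeholder), with sort standing in for the shuffle stage:

    # map stage: emit a (word, 1) pair for every word of the input;
    # sort groups equal keys together, as the shuffle stage would;
    # the final awk plays the reducer, summing the counts per key
    cat input.txt \
      | awk '{for (i = 1; i <= NF; i++) print $i "\t1"}' \
      | sort \
      | awk -F'\t' '{sum[$1] += $2} END {for (w in sum) print w "\t" sum[w]}'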
Pig:
Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation that makes MapReduce programming high level, similar to that of SQL for an RDBMS.
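For a flavour of the notation, a short Pig Latin script over an illustrative comma-separated file could look like this (the file path and field names are assumptions):

    -- load a CSV file and declare a schema for it
    records = LOAD '/user/demo/patients.csv' USING PigStorage(',')
              AS (name:chararray, age:int, disease:chararray);
    -- keep only adult patients
    adults = FILTER records BY age >= 18;
    -- group by disease and count the records in each group
    grouped = GROUP adults BY disease;
    counts = FOREACH grouped GENERATE group, COUNT(adults);
    DUMP counts;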

Hive:
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and it makes querying and analyzing easy. Hive gives an SQL-like interface to query data stored in the various databases and file systems that integrate with Hadoop; without it, traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data.
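As an illustration, a Hive table can be declared over a delimited file already sitting in HDFS and then queried with ordinary SQL; the schema below is a made-up example, not this project's actual table:

    -- external table over a CSV directory in HDFS
    CREATE EXTERNAL TABLE patients (
      name    STRING,
      age     INT,
      disease STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/demo/patients';

    -- Hive compiles this query into MapReduce jobs behind the scenes
    SELECT disease, COUNT(*) AS cases FROM patients GROUP BY disease;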
Sqoop:
Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. It is a big data tool that offers the capability to extract data from non-Hadoop data stores, transform the data into a form usable by Hadoop, and then load the data into HDFS. This process is called ETL: Extract, Transform and Load.
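A minimal Sqoop import looks like the following; the connection string, credentials, table name and target directory are all placeholders:

    # pull a relational table into HDFS
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username demo --password demo \
      --table orders \
      --target-dir /user/demo/orders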
Impala:
Cloudera Impala is Cloudera's open source massively
parallel processing (MPP) SQL query engine for data stored in
a computer cluster running Apache Hadoop. Impala brings
scalable parallel database technology to Hadoop, enabling users
to issue low-latency SQL queries to data stored
in HDFS and Apache HBase without requiring data movement
or transformation. Impala is integrated with Hadoop to use the
same file and data formats, metadata, security and resource
management frameworks used by MapReduce, Apache
Hive, Apache Pig and other Hadoop software.
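For instance, once a table is visible in the shared metastore, an interactive query can be issued from the Impala shell (the table name here is illustrative):

    # low-latency SQL over data in HDFS, without launching MapReduce jobs
    impala-shell -q "SELECT city, COUNT(*) AS cases FROM patients GROUP BY city;"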
1.3 Applications of Bigdata:
Healthcare contributions

Banking sectors and fraud detection

The public sector uses big data in traffic management, route planning, intelligent transportation systems and congestion management.

The private sector uses big data in revenue management, manufacturing improvements, logistics and for competitive advantage.

1.4 Cloudera:
Cloudera's open-source Apache Hadoop distribution, CDH
(Cloudera Distribution Including Apache Hadoop), targets
enterprise-class deployments of that technology. Cloudera says
that more than 50% of its engineering output is donated
upstream to the various Apache-licensed open source projects
(Apache Hive, Apache Avro, Apache HBase, and so on) that
combine to form the Hadoop platform.
1.5 Hue:
Hue is an open-source web interface for analyzing data with Apache Hadoop. Hue allows technical and non-technical users to take advantage of Hive, Pig and many of the other tools that are part of the Hadoop ecosystem.
You can load your data, run interactive Hive queries, develop and run Pig scripts, work with HDFS, check on the status of your jobs, and more. Hue's File Browser also allows you to browse Amazon Simple Storage Service (S3) buckets, and you can use the Hive editor to run queries against data stored in S3.
2. LITERATURE SURVEY
2.1 Existing system:
The existing systems are built using an RDBMS, which stores data in the form of tables and can hold only structured data.
When a user wants basic information about a disease, they have to contact the hospital concerned, and to take an appointment they have to go to the hospital in person. If the user is unable to reach the hospital at that particular time, they are unable to take an appointment instantly.

2.2 Proposed system:


The proposed system will group together the disease and symptom data and analyze it to provide cumulative information. After the analysis, algorithms can be applied to the result, and groupings can be made to show a clear picture of the analysis.

3. REQUIREMENT ANALYSIS
3.1 Hardware requirements
Processor
16 GB Memory
4 TB Disk


3.2 Software requirements


VMware
Linux OS

4. IMPLEMENTATION
4.1 Problem Definition:
Health care analytics using big data and Hadoop.
4.2 System Architecture

Get to the Source!


Source profiling is one of the most important steps in deciding
the architecture. It involves identifying the different source
systems and categorizing them based on their nature and type.
Points to be considered while profiling the data sources:
 Identify the internal and external sources systems
 High Level assumption for the amount of data ingested
from each source
 Identify the mechanism used to get data – push or pull
 Determine the type of data source – Database, File, web
service, streams etc.
 Determine the type of data – structured, semi structured or
unstructured.
Ingestion Strategy and Acquisition


Data ingestion in the Hadoop world means ELT (Extract, Load and Transform), as opposed to ETL (Extract, Transform and Load) in traditional warehouses.
Points to be considered:
 Determine the frequency at which data would be ingested
from each source
 Is there a need to change the semantics of the data (append, replace, etc.)?
 Is there any data validation or transformation required
before ingestion (Pre-processing)?
 Segregate the data sources based on mode of ingestion –
Batch or real-time
Storage:
The Hadoop Distributed File System is the most commonly used storage framework in the Big Data world; others are the NoSQL data stores, such as MongoDB, HBase and Cassandra. One of the salient features of Hadoop storage is its capability to scale, self-manage and self-heal.
Things to consider while planning storage methodology:
 Type of data (Historical or Incremental)
 Format of data (structured, semi-structured and unstructured)
 Compression requirements
 Frequency of incoming data
 Query pattern on the data
 Consumers of the data
Data processing:
Earlier, frequently accessed data was stored in dynamic RAM, but now, due to the sheer volume, it is stored on multiple disks across a number of machines connected via the network. Instead of bringing the data to the processing, in the new approach the processing is taken closer to the data, which significantly reduces network I/O. The processing methodology is driven by business requirements, and can be categorized into batch, real-time or hybrid based on the SLA.
 Batch Processing – Batch is collecting the input for a
specified interval of time and running transformations on it
in a scheduled way. Historical data load is a typical batch
operation.
Technology Used: MapReduce, Hive, Pig
 Real-time Processing – Real-time processing involves
running transformations as and when data is acquired.
Technology Used: Impala, Spark, Spark SQL.
 Hybrid Processing – It’s a combination of both batch and
real-time processing needs.
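As a small illustration of the batch and real-time modes described above, the same aggregation can be scheduled as a nightly Hive job or served interactively through Impala (the script path, schedule and table name are examples only):

    # batch: run a Hive transformation script every night at 1 a.m. (crontab entry)
    0 1 * * * hive -f /opt/jobs/daily_aggregate.hql

    # real-time/interactive: the same aggregation answered with low latency
    impala-shell -q "SELECT disease, COUNT(*) FROM patients GROUP BY disease;"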
Data consumption:
Different users, such as administrators, business users, vendors and partners, can consume the data in different formats. The output of the analysis can be consumed by a recommendation engine, or business processes can be triggered based on the analysis. Different forms of data consumption are:
 Export Data sets – There can be requirements for third-party data set generation. Data sets can be generated using Hive export or directly from HDFS.


 Reporting and visualization – Different reporting and visualization tools can connect to Hadoop using JDBC/ODBC connectivity to Hive.
 Data Exploration – Data scientists can build models and perform deep exploration in a sandbox environment. The sandbox can be a separate cluster (the recommended approach) or a separate schema within the same cluster that contains a subset of the actual data.
 Adhoc Querying – Ad hoc or interactive querying can be supported by using Hive, Impala or Spark SQL, as sketched below.
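For example, a reporting tool or the beeline CLI can reach Hive through JDBC; a hypothetical connection for an ad hoc query looks like:

    # ad hoc query over JDBC against a HiveServer2 instance (URL is a placeholder)
    beeline -u jdbc:hive2://localhost:10000/default \
      -e "SELECT disease, COUNT(*) AS cases FROM patients GROUP BY disease;"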
5. METHODOLOGY
5.1 How HDFS is used in our project:
HDFS holds very large amounts of data and provides easy access. To store such huge data, the files are stored across multiple machines, in a redundant fashion, to rescue the system from possible data losses in case of failure. HDFS also makes applications available for parallel processing. HDFS mainly consists of two node types:
 Namenode
 Datanode
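In this project, that simply means copying the healthcare data set into HDFS before analysis; the commands would be along the following lines (the directory and file names are assumptions):

    # copy the healthcare CSV into HDFS, where it is split into blocks
    # and replicated across Datanodes for fault tolerance
    hdfs dfs -mkdir -p /user/cloudera/healthcare
    hdfs dfs -put healthcare_data.csv /user/cloudera/healthcare/

    # check block health and replication of the stored file
    hdfs fsck /user/cloudera/healthcare/healthcare_data.csv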


5.2 How Hive is used:
Hive gives an SQL-like interface to query data stored in the various databases and file systems that integrate with Hadoop, and it supports easy portability of SQL-based applications to Hadoop. SQL statements are broken down by the Hive service into MapReduce jobs and executed across the Hadoop cluster.
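A representative query for this project, assuming a hypothetical healthcare.patients table over the imported data, would be:

    hive -e "
      SELECT gender, disease, COUNT(*) AS cases
      FROM healthcare.patients
      GROUP BY gender, disease
      ORDER BY cases DESC;
    "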

5.3 How Cloudera is used:
The project runs on Cloudera's CDH distribution (inside a VMware virtual machine, per the software requirements), which packages HDFS, Hive, Sqoop and Hue together, so the individual components do not have to be installed and configured separately.

5.4 How Hue is used:
Hue provides the web interface through which the Hive queries of this project are composed and run, and through which the files stored in HDFS are browsed and verified.

5.5 How Sqoop is used:
Sqoop imports the healthcare records from the MySQL database into HDFS so that Hive can analyze them; the corresponding commands are shown in the screenshots section.


6. SCREENSHOTS
 To create database
 To create table

 To display fields
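The original screenshots are not reproduced here. A sketch of the MySQL statements they illustrate might look like the following; the database name and table layout are assumptions, not taken from the screenshots:

    -- to create database
    CREATE DATABASE healthcare;
    USE healthcare;

    -- to create table
    CREATE TABLE patients (
      name    VARCHAR(50),
      age     INT,
      gender  VARCHAR(10),
      city    VARCHAR(50),
      disease VARCHAR(50)
    );

    -- to display fields
    DESCRIBE patients;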


 Loading data into mysql

 To import data from mysql to hdfs
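Again as a sketch, with illustrative file, credential and table names: the data is first loaded into MySQL, then imported into HDFS with Sqoop.

    -- loading data into MySQL from a local CSV file
    LOAD DATA LOCAL INFILE 'patients.csv'
    INTO TABLE patients
    FIELDS TERMINATED BY ',';

    # importing the table from MySQL into HDFS with a single mapper
    sqoop import \
      --connect jdbc:mysql://localhost/healthcare \
      --username demo --password demo \
      --table patients \
      -m 1

    # verifying that the imported files landed in HDFS
    hdfs dfs -ls patients
    hdfs dfs -cat patients/part-m-00000 | head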


COMPILATION TIME


