83% found this document useful (6 votes)

3K views50 pages

Project Report Final

This document is a project report on using big data analytics for advertising promotions of restaurants in Delhi. It includes an introduction, methodology, design, development, results, and conclusion. The report details using Hadoop for clickstream analysis to display targeted advertisements to customers based on their profiles.

Uploaded by

Masum Hossain

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

83% found this document useful (6 votes)

3K views50 pages

Project Report Final

Uploaded by

Masum Hossain

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 50

APPENDIX 1

Advertising promotions for Restaurants of Delhi using Big

Data Analytics

By
Md. Masum Hossain
Lakshmi
Anugya Saraswat
Pranjal Sinha

A PROJECT REPORT
Submitted to the
Department of Computer Sciences & Engineering
In partial fulfillment of the requirements
for the award of the degree
of

Bachelor of Technology

April, 2016

DECLARATION

I hereby declare that this project work submission is my own work and that, to the best of
my knowledge and belief, it contains no material previously published or written by
another person nor material which has been accepted for the award of any other degree or
diploma of the university or other institute of higher learning, except where due
acknowledgment has been made in the text.

Place:

Signature of the Student

Date

Name: Md .Masum Hossain

Lakshmi
Anugya Saraswat
Pranjal Sinha

iii

CERTIFICATE
This is to certify that the report entitled Advertising promotions for Restaurants of
Delhi using Big Data Analytics By Mr. Md. Masum Hossain (Roll No.120102019),
Ms.Lakshmi (Roll No.120102015), Anugya Saraswat (Roll No.130102801),Pranjal Sinha
(Roll 110101177) to Sharda University, towards the fulfillment of requirements of the
degree of Bachelor of Technology is record of bonafide final year Project work carried out
by him/her in the Department of Computer Science, School of Engineering and
Technology, Sharda University. The results/findings contained in this Project have not
been submitted in part or full to any other University/Institute for award of any other
Degree/Diploma.

Signature of Head of Department

Name: Dr. Ishan Ranjan
(Office seal)

Signature of Supervisor
Name: Ms. Supriya Khaitan
Designation: Asst. Professor

Place:

Date:

Abstract
This project deals for for Advertisement promotion for the restaurants of Delhi applying
the Big Data Analytics. In the current era, Usage of Internet is rapidly increasing. The users
are involved in Distributed processing of mass data through composed by many machines
and personalized search services, based on the user profile have been the hotspots of
research and development.
Hadoop is a software platform which is easy for development and processing mass data. It
is written by Java. Hadoop is scalable, economical, efficient and reliable. It can be
deployed to a big cluster composed of hundreds of low-cost machines.
Main purpose of analysis is extraction of food menu as per user requirement. The system
finds out the users interested field of information through receiving, organizing and
collating the user's information of b browsing, or mining data from the history, such as
browser temporary files, personal favourites and many more.
Choosing a food from online stores is quite confusing for most of the customers.
Customers are interested in buying a food that has been widely acclaimed or available. On
the other side, the owners of restaurants are also interested in knowing where their foods
stand in competition. Both these issues tackled by the analysis of click-stream. Analysis of
clickstreams show how a website is navigated and used by visitors. Click-stream data of
online food stores of Delhi contains information which is useful for understanding the
effectiveness of marketing and merchandising efforts, such as how customers find the store
the food the environment, what food they see, and what food they purchase and finally
what is their feedback.
In our project, are tried to make an effort to help the customers in finding popular, largely
sold food items by the restaurants of Delhi much faster then the normal available methods.
For this purpose intend to create a platform that will maintain user profile and also food
product advertisements. intend to use Hadoop for click-stream analysis based on the user
profile and click-stream analysis; our website will display only that advertisement which
helps the customer in arriving decisions. In other words, contents of bpage displayed to
customer will be determined on the basis of user profile. With the advancement of
technology, large number of people are buying and selling food online. There are
commonly used techniques for online marketing such as use of banner cards, email
campaigns. Being one of the biggest cosmopolitan cities in India Delhi restaurants and its
sole websites are very busy sites. So the effective marketing depends largely on success of

online advertising and how fast is resonse is been given . Analysing the effectiveness of
website is a matter to concern for corporates that rely on b marketing. b purchasing of food
activities involves attracting and retaining customers. Traditional database technology is
indeed useful in managing the online stores. Hover, it has serious limitations. The data
generated by mouse clicks and corresponding logs are too large to be analysed by
traditional technology. New technology such as big data is being explored for finding
solution to above problems. In our project, have decided to use open source technology
Hadoop. Today the term big data draws a lot of attention, but behind the hype theres a
simple story. For decades, companies have been making business decisions based on
transactional data stored in relational databases. Beyond that critical data, hover, is a
potential treasure delightful things of non-traditional, less structured data: blogs, social
media, email, sensors, and photographs that can be mined for useful information.
Decrease in the cost of storage and increase in computing por have made it possible to
collect large data. As a result, more and more companies are now compelled to include
non-traditional yet potentially valuable data with their traditional enterprise data and using
it for their business intelligence analysis. To derive real business value from big data, need
the right tools to capture and organize a wide variety of data types from different sources,
and to be able to easily analyse it within the context of all enterprise data.
As the world is turning towards use of internet for every day-to-day activity, need for
viewing and selecting food of ones choice is kind of prime importance to the
restaurants.Same goes for the customers who are buying foods online the list of irrelevant
advertisements frustrates the user, slow or delay in response to the queries proves to be the
main reason for failures of most sites . But in our project have tried to demosntrate it
smooth for the users to select products by filtering available products based on individual
customers interests. The e-commerce field is emerging rapidly. Advertisers need a way to
promote their products in market. This way is provided by personalized websites like this
one. The reports provided by our website makes it easier for them to know the status of
their products and hence take necessary measures in order to come up for the faced losses.
To summarize, our project is a demonstration of using new technology like Big Data as an
adapter between the advertiser and customer for saving money and time.

ACKNOWLEDGEMENT

A major project is a golden opportunity for learning and self-development. I consider

myself very lucky and honored to have so many wonderful people lead me through in
completion of this project.
First and foremost would like to thank Dr. Ishan Ranjan HOD CSE who gave us an
opportunity to undertake this project.
My grateful thanks to Asst. Prof. Ms. Supriya Khaitan for her guidance in project work.
Advertising promotions for Restaurants of Delhi using Big Data Analytics, who in
spite of being extraordinarily busy with academics, took time out to hear, guide and keep
us on the correct path. We do not know where would have been without her help.
CSE department monitored our progress and arranged all facilities to make life easier. We
choose this moment to acknowledge their contribution gratefully. We are also grateful to
Maxican students for their kind help.We really appreciate all the restaurants from
Chittagong & Delhi that frequently provided us their official data.

Name and signature of Students

Md.Masum Hossain (120102019)
Lakshmi (120102015)
Anugya Saraswat (130102801)
Pranjal Sinha (110101177)

Table of Contents
Appendix 2
Chapter-1 : Project Introduction

Page no.

1.1: Motivation .. 10
1.2: Overview . 10
1.3: Expected outcome 11
1.4: Gantt Chart . 12
1.5: Possible risks .. 13
Chapter-2: Methodology
2.1: System view .. 17
2.2: System components & Functionalities ... 18
2.3: Data & relational views . 19
Chapter-3: Design Criteria
3.1: System Design 21
3.2: Design Diagrams . 22
3.3: Existing System ... 23
3.4: Application areas . 25
3.5: Advantages of proposed system .. 25
3.6 System analysis 26
Chapter-4: Development & Implementation
4.1: Developmental feasibility 30
4.2: Implementation Specifications 31

Chatper-5: Results & Testing

5.1: Result . 36
5.1.1: Success cases .. 39
5.1.2: Failure cases ... 41
5.2: Testing .. 41
5.2.1: Test results of various stages .. 42
Chapter-6: Conclusion & Future Improvements
6.1: Performance Estimation 44
6.3: Limitations 46
6.4: Scope of Improvement ... 46
6.5 Conclusion .. 49

References

List of Tables

Page no.

Table 2.1: Features of HDFS. 28

Table 2.2: Showing Hadoop response time 45
Table 2.3: Showing Data modeling comparison in RDBMS and Hadoop. 46

List of Figures
2.1: Showing the work progress over time on the project . 12
2.2: Showing the work progress over time . 13
2.3: Showing Big data insights 21
2.4: Architecture of online food promotion system .. 22
2.5: Working of a general restaurant ... 23
2.6: How HDFS makes incoming file stores across cluster 27
2.7: How map-reduce software frame work works . 29
2.8: Outlook of the website where customers will be looking for food .. 36
2.9: Section in the website where customers can see other customers feedback 37
2.10: Showing section in website for customer feedback ... 37
2.11: Database for menu and other details entry ... 38
2.12: Adding an item to the database 38
2.13: Customer feedback 39
2.14: Cluster Summary ... 39
2.15: Hadoop Task tracker .... 40
2.16: Failure in connecting localhost: 50070 . 41
2.17: Showing jdk1.8.0_77 installed in the system ... 42
2.18: Hadoop Version ... 42
2.19: Successful startup of MapReduce 43
2.20: Showing starting of Database file system in System ... 43

Appendix 2
CHAPTER-1
Introduction
1.1 Motivation
In restaurants all around the country, it seems to have become acceptable for customers to
send back a dish because they dont like it. Not because it was cold, or too salty, or because
of an untimely delay in the delivery, but purely because they happened to make the wrong
call. We have some sympathy. It happens to us lot of times when order for some food then
later change our mind, want to see foods that of similar kinds together in one place but too
tired to search each restaurant . Were all familiar with that pang of envy when our dining
companions choice looks better than our own. And, with the not inconsiderable costs of
eating out in some places, it is only human to begrudge paying for something weve
decided are not going to enjoy. However, what people dont tend to think about is that the
moment we complain to a server about a dish or food are certainly just wasting our
valuable time, subtly alter the very experience are paying for. Any restaurant will tell us
this. Everything changes; we tried to see how to reduce the hassle of customers by not
making them wait for food or ordering food for long time. So we are trying to make the
restaurants introduced with the new technology Hadoop Big-Data, the use of which will
make the whole process of ordering food much more easier for the customers and it wont
be necessary for the customers to wait and search in the different restaurants.

1.2 Overview of Project

The main target of our project is to demonstrate the benefit of using Hadoop Big Data over
normal databases in the market. Doing this want to help people to choose the right food for
themselves; for this instance are going to create a database using Big Data Analytics
where a user has to create a profile and mark his/her interests according to which the foods
available from nearby restaurants will be shown to him/her. Then they can also know about
the offer or price of the particular food item and the same time they will get to know about
the offers related to a particular food item so its beneficial for the customer as well as the
restaurants. So, ultimately we can say our project is to help people of a particular area to
choose the right food for their health and taste. Every day the amount of people using
internet to order products are increasing so is it happening on food items .people are

looking for food from different restaurants and trying to order the right food of their choice.
General database system that is available in market has a slow response time in comparison
to the query made to each database so ultimately as the pressure is building up on databases
its getting slow to response to the queries made to each database.[3] But with the recently
evolved technology Big Data we can make sure that the response is much faster as the
database works in parallaly.so our criteria is to make a database with faster response rate to
reduce the congestion of customers ordering food online .we assume that by getting faster
responses and reducing the congestion can make the online food purchasing much more
easier and convenient. Though we have few limitations as a Big data database requires a
high capacity system such as (8-10GB RAM) also of course, Real-Time Big Data Analytics
is not only positive as it also offers some challenges. It requires special computer
programing: The standard version of Hadoop is, at the moment, not yet suitable for realtime analysis. New tools need to be bought and used. There are hover quite some tools
available to do the job and Hadoop will be able to process data in real-time in the future.
Using real-time insights requires a different way of working within any workflow: for
example if any organization normally only receives insights once a week, which is very
common in a lot of organizations, receiving these insights every second will require a
different approach and way of working. Insights require action and instead of acting on a
weekly basis this action is now in real-time required. This will have an effect on the
culture. The objective should be to making a place it can be an organization
/restaurant/office an information-centric place.[7,8]

1.3 Expected outcome

As our project was mainly based on Big-Data Hadoop, the main purpose of the project was
to introduce the very new technology that is evolving every day and help the very crowded
sites that is restaurant sites to be hassle free and can work without pressure so in order to
that expect the restaurants to understand the working of Big-Data where normal databases
which are working sequentially in the retrieval process whereas Big- data will work
parallel. So, theoretically it shows that the retrieval process should be fast. Big Data has
become a buzz phrase for data production, accumulation and analytic activities that have
emerged as a result of accelerated data production triggered by declining computing por
and storage costs. Big Data is perceived as a major key in addressing food security
challenges and productivity improvements in a future era of significant resource constraints
and climate change. Indeed, some agribusiness firms are positioning themselves to exploit
data by developing systems to collect, store and analyze them. Therefore, while data and
analytics are going to continue to be important resources for the food agribusiness industry,
the real source of sustained superior performance will be stakeholders and their

employees sharper insights. So, we expect our system to show a path to the restaurants to
make it one step ahead for the overall betterment and development and progress in the
business and try to reduce the hassle on the customers.

1.4 Gantt Chart

Figure 2.1: Showing the work progress over time on the project
This Gantt chart is showing the time taken by each works done in this project such as
creating website, collecting website details, adding different closures to the website, user
details, database, price details, offers and user tactics, cluster 1size, cluster 2 size, success
rate, response rate among clusters, bug rate.

Figure 2.2: Showing the work progress over time

1.5: Possible Risks
As with any business initiative, a big data project involves an element of risk. Any project
can fail for any number of reasons: bad management, under-budgeting, or a lack of relevant
skills. Hover, big data projects bring their own specific risks. Due to the advanced
technology often needed and the relative newness of the skillsets required to truly think
big (or as I prefer to say, think smart) with data, Businesspeople are used to taking risks
assessing those risks and safeguarding against them comes naturally, or dont stay in
business for very long. So theres no need to be scared of big data. But of course we always
need to be aware of dangers that could potentially arise if we fail to cover all of it. The
phenomenon of Big Data exacerbates the tension between potential benefits and privacy
risks by upping the ante on both sides of the equation. On the one hand, big data unleashes
tremendous benefits not only to individuals but also to communities and society at large,
including breakthroughs in health research, sustainable development, energy conservation
and personalized marketing. On the other hand, big data introduces new privacy and civil
liberties concerns including high-tech profiling, automated decision-making,
discrimination, and algorithmic inaccuracies or opacities that strain traditional legal

protections
The key problems has been:[4,5]
I. Cost:
Data collection, aggregation, storage, analysis, and reporting all cost money. On top of this,
there will be compliancy costs to avoid falling foul on the issues, raising in the previous
point. These costs can be mitigated by careful budgeting during the planning stages, but
getting it wrong at that point can lead to spiraling costs, potentially negating any value
added to our bottom line by data-driven initiative. This is why starting with strategy is so
vital. A well-developed strategy will clearly set out what intend to achieve and the benefits
that can be gained so they can be balanced against the resources allocated to the project.
One of the restaurants that coordinating with us was worried about the costs of storing and
maintaining all the data it was collecting to the point that it was considering pulling the
plug on one particular analytics project, as the costs looked likely to exceed any potential
savings. By identifying and eliminating irrelevant data from the project, the restaurants re
able to bring costs back under control and achieve its objectives.
II. Bad Data:
We have come across many big data projects that start off on the wrong foot by collecting
irrelevant, out of date, or erroneous data. This usually comes down to insufficient time
being spent on designing the project strategy. The big data gold rush has led to a collect
everything and think about analyzing it later approach at many organizations. This not
only adds to the growing cost of storing the data and ensuring compliance, it leads to large
amounts of data that can become outdated very quickly. The real danger here is falling
behind competition. If some restaurants are not analyzing the right data, we wont be
drawing the right insights that will provide value. Meanwhile, competitors most likely will
be running their own data projects. And if they are getting it right, theyll take the lead.
Working with them restaurants, were able to show them how to cut the data down mostly
infographics, which clearly shod the relevant data while omitting a lot of the noise. Thats
just a simple checklist of the risks that every big data project needs to account for before
one cent is spent on infrastructure or data collecting. Businesses of all sizes should engage
wholeheartedly with big data projects. If they dont, they run the serious risk of being left
behind. But they also should be aware of the risks and enter into big data projects with their
eyes wide open. In one example, NBC used test audiences but paid for it when many in the
audience ranked successful shows such as Seinfeld poorly, and cheap copycats better.
Eventually, the marketers discovered people were only responding to familiarity, not
quality.[22]

III. Data Security:

This risk is obvious and often uppermost in our minds when we are considering the
logistics of data collection and analysis. Data theft is a rampant and growing area of crime
and attacks are getting bigger and more damaging. In fact five of the six most damaging
data thefts of all time (eBay, JP Morgan Chase, Adobe, Target, and Evernote) re carried out
within the last two years. The bigger the data, the bigger the target it presents to criminals
with the tools to steal and sell it. In the case of Target, hackers stole credit and debit card
information of 40 million customers, as well as personal identifying information such as
email and geographical addresses of up to 110 million people. In March, a federal judge
approved a settlement in which Target would pay $10 million into a settlement fund, from
which payments of up to $10,000 would be made to everyone affected by the breach.so
while developing anything in such big area have to keep in mind about this so that
restaurants doesnt face such kind of problems .[15,25]

1.5 Software Requirement specification:

Functional Requirements:
Database (Big-Data),

Hadoop compatible version installed,

Non-Functional Requirements:
ing food menus on the website,
Collecting data from database,
Establishment of Cluster connection
Users finding it easy to find the right food easily.

b) External interfaces.
When people will visit the website they will be making an ID for individual service to
provide them the right food from all the possible restaurants of Delhi.
System will generate an inquiry via Hiveql and fetch queries from Big data database.
c) Performance.
The main target to choose Big-data using Hadoop is because its speed the same query
being asked by both RDBMS and in HDFS the result was showing the difference
d) Attributes:
Our software is easily portable once the systems are enabled with or installed with the
HDFS. (It contains Hive QL method).
To maintain the software owners or supervisors has to update the database time to time
also users must put their feedback regarding which food they prefer and appreciate to have
on.
To ensure security for users separate user id will be made so users can see only the food
they are looking for.
We value customers personal security no bank details or any other details will be taken
during the time of registering oneself for the first time.
e) Design constraints:
The most difficult part of the project is to maintain a big database which includes 1000
Gb data of different restaurants that comprises of customers need.
HiveQL is used for fetching data HDFS is the base for making a Big-data. To make the
website HTML5, javascript,css being used.
Limitation of resources are there as since all the restaurants doesnt agree to share their
informations.
Operating systems should have windows 2000/ME onwards.
SAS/ACESS engine.
We were able to run our test queries through the SAS interface, which is executed in the
Hive environment within our Hadoop cluster.[19,21]

Chapter-2
Methodology

2.1 System View

The Online Product Promotion System is executed on Ubuntu platform. The Database is
created for Actors to interact with system. This website provide to every user who logs in.
The customer is shown with product of his interest (i.e. Product which matches to his
profile). The Advertiser is provided with reports with any of his products. The processing
in background to achieve these targets is as follows:[1,7]
o The Database will be connected to website which will work as the interface to the
user and database for real time updating.
o The data stored in Click-dump will be transferred to Hadoops file system (HDFS)
using Sqoop. ( Sqoop is a command-line interface application for transferring data
between relational databases and Hadoop. It supports incremental loads of a single
table or a free form SQL query as ll as saved jobs which can be run multiple times
to import updates made to a database since the last import. Imports can also be used
to populate tables in Hive Exports can be used to put data from Hadoop into a
relational database.
o The data from each restaurant will be stored on the HDFS in date-wise folder. The
table is stored in HDFS will help to calculate the targets.
o This stored click-dump will be processed by Pig Script (written in Pig Latin). This
pig script will return number of clicks to each product and send this information to
hive by loading it into it.
o The hive will provide SQL like command to execute queries. These queries will
produce target results.
o This process will be scheduled for whole day. The cronjob will be applied to
execute whole process one time in day.
A second configuration involves building a big data system separately and in parallel
(rather than integrated with) the restaurants existing production and enterprise systems. In
this model, most companies still take advantage of the cloud for data storage but develop
and experiment with enterprise-held big data applications themselves. This two-state

approach allows the restaurants to construct the big data framework of the future, while
building valuable resources and proprietary knowledge within the company. That provides
complete internal control in exchange for the duplication of much of the functionality of
the current system and allows for a future migration to a full-fledged big data platform that
will eventually allow both systems (conventional and big data) to merge.[23]

2.2 System components & Functionalities:

As Big-Data requires a huge number of data to be arranged and retrieved in short amount
of time the processor of the system should have a minimum of following components:
RAM 4Gb and above
Virtualization Activated else virtual machine installed or virtual workstation
installed in the system.
Ubuntu 14 or above versions installed.
JDK 7 or above installed in the system.
Hadoop compatible version installed.

Functionalities:
The US computer software company is the latest to develop its products to cope with the
increasingly complex world of big data. The latest version of Hadoop Marketing Suite is
designed to allow those within the marketing functions of a business to perform analysis on
masses of historical data to predict future trends. The company states that the new
developments will allow digital marketers to improve a variety of digital marketing
strategies, including personalized engagement, multi-channel campaign execution, and
media monetization. This will be achieved through the ability to forecast campaign results,
performing risk analysis all through a predictive marketing dashboard. Brad Rencher,
senior vice president and general manager of Digital Marketing Business at Hadoop,
said:[5,9]
In the early days of digital marketing, analytics emerged to tell us what happened and, as
analytics got better, why it happened. Then solutions emerged to make it easier to act on
data and optimize results.

But the sheer amount of available data presents a challenge to quickly extract insights and
act while those insights are still valuable. The new predictive capabilities within the Digital
Marketing Suite address these challenges and help marketers turn big data into a big
opportunity.
The announcement is indicative of something hear a lot at the Big Data Insight Group
that big data analytics is going to become an invaluable tool in different teams throughout
all departments of a restaurant . Rather than being controlled by the IT department alone.
Finance, business, marketing and product development will all be using masses of data to
gain insights into various aspects of their organization and improve their planning and
performance accordingly.
To learn more about big data and the opportunities it could present to our organization,
regardless of size or sector, may wish to attend the 1st Big Data Insight Group Forum. So
more than anything our project will be focusing on navigating a path to insight and
business value using technologys hottest new trend.

2.3: Data & relational views:

The high cost of rent, licensing and personnel are daunting. But restaurants face another
obstacle: critics, not only professional journalists but amateurs who offer their opinions on
social media. A bad review on Google or Yelp can undercut even the best-planned
ventures.
The importance of quality is why a number of restaurants has started using big data to
develop a better understanding of consumer preferences and to improve their food and
service. In some cases, these businesses have already achieved revenue gains as a result of
their efforts.
Using external data to improve a menu:
Some restaurants are using outside software providers to gauge which dishes are likely to
succeed and reduce the uncertainty of making menu changes. Food Genius aggregates data
from restaurant menus around the country to better understand pricing, food and marketing
trends. Indeed thats the option they have in hand to stay in competition in the market but
in the end when it comes to discussing to be the best in the market it surely is to do
something or choose something which will make a difference to be the best in the market
and stay to hold the position for the time while other companies or restaurants are still on
general data, For example, a restaurant customer can see what types of food-related

keywords and phrases are trending online, the average price of a certain dish, and which
menu items are growing or shrinking in popularity.[15,18]
The food industry for far too long has made decisions based on its gut, says Justin Massa,
CEO and founder of Food Genius. The Chicago-based company tracks menu items at more
than 350,000 locations and has partnerships with food delivery services Seamless and Grub
Hub.Massa said that the data can help restaurants seize opportunities in their niches. The
data is going to tell with something and give with important context but the thing it comes
down to is the identity of brand. Thats going to tell us how were going to explore that
data.
Increasing customer satisfaction with internal data some technical companies are helping
restaurants improve operational efficiency. Avero, a restaurant software company, tracks
purchases and voided items at point of sales. Restaurants use the data to improve server
performance, develop tactics to increase sales and even identify thieving employees, says
Sandhya Rao, vice president of marketing and products. Rao says that restaurants may
target promotions to certain days or times of the month. According to a company case
study among the 30-plus upscale casual restaurants that Avero works with, the average
sales increase was five percent, or 250,000 each, over the course of a year. In future mobile
apps that allow customers to leave reviews, sign up for loyalty programs, take surveys, and
order food through their devices.
Operators can know customers better and customers can enjoy better experiences, which
is an encouraging environment to keep them coming back and bringing their friends,
said Jitendra Gupta, CEO.Another company, TapSavvy, is also using customer insights to
assist restaurants. After customers eat at one of the restaurants that TapSavvy serves, they
receive a tablet to fill out a survey and express criticisms or compliments.By letting
customers give feedback while theyre still in the restaurant, theyre less likely to take out
their aggression online, says TapSavvy co-founder Yaniv Tal.
If a customer leaves unhappy, word spreads very quickly, Tal says.
To be sure, many restaurants are still not using big data. Yet Massa says that they are
missing a potential opportunity to improve performance. He suggests that these businesses
might begin by collecting information themselves. For our tests, we simulated a typical
data warehouse-type workload where data is loaded in batch, and then queries are executed
to answer strategic (not operational) business questions.

Chapter-3
Design Criteria

3.1: System Design:

With the advancement of technology, large number of people are buying and selling food
online. There are commonly used techniques for online marketing such as use of banner
cards, email campaigns. Being one of the biggest cosmopolitan cities in India, Delhi
restaurants and its sole websites are very busy sites. So the effective marketing depends
largely on success of online advertising. Analyzing the effectiveness of website is a matter
to concern for corporates that rely on web marketing. Web purchasing of food activities
involves attracting and retaining customers. Traditional database technology is indeed
useful in managing the online stores. Hover, it has serious limitations, when it comes to
analyzing effectiveness of online ads[9,10]

Figure 2.3: Showing Big data insights

have decided to use open source technology Hadoop. Today the term big data draws a lot
of attention, but behind the hype theres a simple story. For decades, companies have been
making business decisions based on transactional data stored in relational databases.
Beyond that critical data, hover, is a potential treasure delightful things of non-traditional,
less structured data: blogs, social media, email, sensors, and photographs that can be mined
for useful information.
Decrease in the cost of storage and increase in computing por have made it possible to
collect large data. As a result, more and more companies are now compelled to include
non-traditional yet potentially valuable data with their traditional enterprise data and using
it for their business intelligence analysis. To derive real business value from big data, need
the right tools to capture and organize a wide variety of data types from different sources,
and to be able to easily let the customer find the right food in the possible short time.
3.2: Design Diagrams

Figure 2.4: Architecture of Online Food Promotion

Food marketing and advertisements uses banner and/or referral sites to attract customers
from other sites to an online store. The online food websites uses hyperlinks and image
links within the store site for leading the customers to relevant pages.

Classifying hyperlinks by their purpose

Tracking and measuring traffic on hyperlinks.
Analyzing effectiveness (revenue generated profit etc.)

Figure 2.5: working of a general restaurants

3.3 Existing System

Most of the companies are not completely satisfied with the current level of data capture
and analysis, most companies considering a move toward adopting big data technologies
already have a well-staffed and relatively modern IT framework based on relational
database (RDB) management systems and conventional data warehousing. Any company
already managing a large amount of structured data with enterprise systems and data
warehouses is therefore fairly well versed in the day-to-day issues of large-scale data
management. It would seem natural for those companies to assume that, as big data is the
next big thing happening in the evolution of information technology, it would make sense
for them to simply build a NoSQL-type/Hadoop-type of infrastructure themselves,
incorporated directly into their current conventional framework. In fact, ESG, the advisory
and IT market research firm, estimated that at the beginning of 2014, more than half of

large organizations will have begun this type of do-it-with yourself approach. As we've
seen, as open source software, the price of a Hadoop-type framework (free) is attractive,
and it is relatively easy, providing the company has employees with the requisite skills to
begin to work up Hadoop applications using in-house data or data stored in the cloud.
Currently many organizations are trying to implement Big Data technology in there
database system which is a big step towards fast data transfer. The existing systems in the
market has shifted its database systems into Big-Data analytics Hadoop based work once
they got to know about its working methodology .Today big companies in the world such
as Twitter and Facebook uses Big-Data to handle or manage their database . But
experimenting with some Hadoop/NoSQL applications for the marketing department is a
far cry from developing a fully integrated big data system capable of capturing, storing and
analyzing large, multi-structured data sets. In fact, successful implementation of enterprisewide Hadoop frameworks is still relatively uncommon, and mostly the domain of very
large and experienced data-intensive companies in the financial services or the
pharmaceutical industries. As we have seen, many of those big data projects still primarily
involve structured data and depend on SQL and relational data models. Large-scale
analysis of totally unstructured data, for the most part, still remains in the rarified realm of
powerful Internet tech companies like Google, Yahoo, Facebook and Amazon, or massive
retailers like Wal-Mart.
Although cloud-based tools have obvious advantages, every company has different data
and different analytical requirements. Because so many big data projects are still largely
based on structured or semi structured data and relational data models that complement
current data management operations, many companies turn to their primary support
vendors -- like Oracle or SAP -- to help them create a bridge between old and new and to
incorporate Hadoop-like technologies directly into their existing data management
approach. Oracle's Big Data Appliance, for example, asserts that its preconfigured offering
-- once various costs are taken into account -- is nearly 40% less expensive than an
equivalent do-it-with itself built system and can be up and running in a third less time. And,
of course, the more fully big data technologies are incorporated directly into a company's
IT framework, the more complexity and potential for data sprawl grows. Depending on
configurations, full integration into a single, massive data pool (as advocated by big data
purists) means pulling in unstructured, unclean data to a company's central data reservoir
(even if that data is distributed) and potentially sharing it out to be analyzed, copied and
possibly altered by various users throughout the enterprise, often using different
configurations of Hadoop or NoSQL written by different programmers for different
reasons. Add to that the need to hire expensive Hadoop programmers and data scientists.
For traditional RDB managers, that type of approach raises the specter of untold additional
data disasters, costs and rescue work requests to already overwhelmed IT staff.

3.4 Application Areas

Our project is largely based on how the data can be extracted in faster and efficient way so
that can let the customers of various restaurants have their queries about foods in much
faster rate.so in a way are trying to open a path for the restaurants which are trying to grasp
number of audience online to get their desired food from one place rather than visiting
different websites and also the best part is the response time .This is the point where it all
makes the difference where the customers doesnt have to wait for long time because of
data congestion. Some key application areas of our project are:
t will let the customer get the right food in shortest possible time.
nowing about the foods of ones choice is far easier than ever expected.
reating a database in real time and with the help of Big Data analytics will help users
get real time results.
Quick response will also hold the customers from the restaurant perspective.

3.5

Advantages of Proposed System

The advantages of processing data in Big Data in real-time are many:

Errors within the menu are known instantly. Real-time insight into errors helps
restaurants to react quickly to mitigate the effects of an operational problem. This
can save the operation from falling behind or failing completely or it can save
customers from having to stop ordering or eating a particular food item or even the
service.
New strategies of any restaurants are noticed immediately. With Real-Time Big
Data Analytics we can stay one step ahead of the competition or get notified the
moment restaurants direct competitor is changing strategy or reducing its prices for
example.
Service improves dramatically, which could lead to higher conversion rate and extra
revenue. When restaurants monitor the products that are used by its customers, it
can pro-actively respond to upcoming failures. For example, cars with real-time
sensors can notify before something is going wrong and let the driver know that the
car needs maintenance.

Fraud can be detected the moment it happens and proper measures can be taken to
limit the damage. The financial world is very attractive for criminals. With a realtime safeguard system, attempts to hack into any restaurants website are notified
instantly. So IT security department of any restaurant can take immediate
appropriate action.
Cost savings: The implementation of a Real-Time Big Data Analytics tools may be
expensive, it will eventually save a lot of money. There is no waiting time for
business leaders and in-memory databases (useful for real-time analytics) also
reduce the burden on a restaurants overall IT landscape, freeing up resources
previously devoted to responding to requests for reports.
Better sales insights, which could lead to additional revenue. Real-time analytics
tell exactly how well sales are doing and in case an internet retailer sees that a
product is doing extremely well, it can take action to prevent missing out or losing
revenue.
Keep up with customer trends: Insight into competitive offerings, promotions or
customer movements provides valuable information regarding coming and going
customer trends. Faster decisions can be made with real-time analytics that better
suit the (current) customer.
3.6 System Analysis
Hadoop (Hadoop is an open-source software framework for storing data and running
applications on clusters of commodity hardware. It provides massive storage for any kind
of data, enormous processing por and the ability to handle virtually limitless concurrent
tasks or jobs.) Big Data is a shift to scalable, elastic computing infrastructure; an explosion
in the complexity and variety of data available; and the por and value that come from
combining disparate data for comprehensive analysis make Hadoop a critical new platform
for data-driven enterprises like restaurants.
Our Database consists of two main components:
1. HDFS (Hadoop Distributed File System).
2. MapReduce

1. HDFS
The file store is called the Hadoop Distributed File System, or HDFS. HDFS provides
scalable fault-tolerant storage at low cost. The HDFS software detects and compensates for
hardware issues, including disk problems and server failure. HDFS stores files across a
collection of servers in a cluster. Files are decomposed into blocks, and each block is
written to more than one (the number is configurable, but three is common) of the servers.
This replication provides both fault-tolerance (loss of a single disk or server does not
destroy a file) and performance (any given block can be read from one of several servers,
improving system throughput).HDFS ensures data availability by continually monitoring
the servers in a cluster and the blocks that they manage. Individual blocks include
checksums. When a block is read, the checksum is verified, and if the block has been
damaged it will be restored from one of its replicas. If a server or disk fails, all of the data it
stored is replicated to some other node or nodes in the cluster, from the collection of
replicas. As a result, HDFS runs very ll on commodity hardware. It tolerates, and
compensate for, failures in the cluster. As clusters get large, even very expensive faulttolerant servers are likely to fail. Because HDFS expects failure, organizations can spend
less on servers and let software compensate for hardware issues.

Figure 2.6: How HDFS makes incoming file stores across cluster

Feature
Rack awareness

Description
Considers a nodes physical location when
allocating storage and scheduling tasks
Minimal data motion Hadoop moves compute processes to the data
on HDFS and not the other way around.
Processing tasks can occur on the physical node
where the data resides, which significantly
reduces network I/O and provides very high
aggregate bandwidth.
Utilities
Dynamically diagnose the health of the file
system and rebalance the data on different
nodes
Rollback
Allows operators to bring back the previous
version of HDFS after an upgrade, in case of
human or systemic errors
Standby NameNode Provides redundancy and supports high
availability (HA)
Operability
HDFS requires minimal operator intervention,
allowing a single operator to maintain a cluster of
1000s of nodes
Table 2.1: Features of HDFS

2. MapReduce
HDFS delivers inexpensive, reliable, and available file storage. That service alone, though,
would not be enough to create the level of interest, or to drive the rate of adoption, that
characterize Hadoop over the past several years. The second major component of Hadoop
is the parallel data processing system called MapReduce. Conceptually, MapReduce is
simple. MapReduce includes a software component called the job scheduler. The job
scheduler is responsible for choosing the servers that will run each user job, and for
scheduling execution of multiple user jobs on a shared cluster. The job scheduler consults
the NameNode for the location of all of the blocks that make up the file or files required by
a job. Each of those servers is instructed to run the users analysis code against its local

block or blocks. The MapReduce processing infrastructure includes an abstraction called an

input split that permits each block to be broken into individual records. There is special
processing built in to reassemble records broken by block boundaries. The user code that
implements a map job can be virtually anything. MapReduce allows developers to write
and deploy code that runs directly on each Data Node server in the cluster. That code
understands the format of the data stored in each block in the file, and can implement
simple algorithms (count the number of occurrences of a single word, for example) or
much more complex ones (e.g. natural language processing, pattern detection and machine
learning, feature extraction, or face recognition). At the end of the map phase of a job,
results are collected and filtered by a reducer. MapReduce guarantees that data will be
delivered to the reducer in sorted order, so output from all mappers is collected and passed
through a shuffle and sort process. The sorted output is then passed to the reducer for
processing. Results are typically written back to HDFS. Because of the replication built
into HDFS, MapReduce is able to provide some other useful features. For example, if one
of the servers involved in a MapReduce job is running slowly most of its peers have
finished, but it is still working the job scheduler can launch another instance of that
particular task on one of the other servers in the cluster that stores the file block in
question. This means that overloaded or failing nodes in a cluster need not stop, or even
dramatically slow down, a MapReduce job.

Figure 2.7: How map-reduce software frame work works

Chapter-4:
Development & Implementation

4.1: Developmental feasibility

A computer controlled vending machine selling snack foods on credit at the Stanford
Artificial Intelligence Laboratory, became one of the first Internet connected appliances.
There began the saga of pervasive connectivity where every device is plugged into
everything else creating the defining trend of 2010 to 2020. In fact, the Internet of Things
is anticipated to burgeon to about of 26 billion units excluding PCs, smartphones and
tablets by 2020 and perhaps several categories of these items, that will be connected in
2020, don't even exist at present. The Internet of Things will explode connectivity, and it
will also create value as much as US$ 6.2 trillion in annual revenue by 2025 says a global
consulting company. But it will also create massive, massive amounts of data 40
zettabytes by 2020, according to one estimate. And as all know, the bulk over 80% - of
big data is unstructured, and in motion, existing in a variety of forms and formats both
inside and outside company walls.
Gathering this data is a huge challenge, but one that technology today is capable of. Its
what comes next - extracting accurate insights in real time and creating foresight from it that, enterprises are yet to nail. The difficult areas of this project was to collect huge
amount of data from each restaurants, which in terms of time and size is bit of bulky so this
involves much of burden on the system that is handling the work the system has to be
capable of handling such amount of data and has to be large enough to be compatible with
the data handling .minimum requirement of a system would be following:

Dual Quad-core CPUs or greater that have Hyper-Threading enabled. We had to

estimate our computing workload, needed to consider using a more powerful CPU.

Use High Availability (HA) and dual power supplies for the master node's host
machine.

4-8 GBs of memory per processor core, with 6% overhead for virtualization.

Use a 1 Gigabit Ethernet interface or greater to provide adequate network

bandwidth.

Though Big-data takes powerful systems to process but it is not impossible to assemble
handful number of systems then using so many systems with limited or low power.

4.2: Implementation Specifications

When re beginning with the Big Data Extensions deployment tasks, we made sure that our
system meets all of the prerequisites.
Big Data Extensions requires that needed to install and configure VMware, and that our
environment meets minimum resource requirements. We had to also make sure that have
licenses for the VMware components of deployment.
VMware Requirements
Before with can install Big Data Extensions, with must have set up the following VMware
products.

Installing VMware 10.0 (or later) Enterprise or Enterprise Plus.

Note
The Big Data Extensions graphical user interface is only supported when using VMware b
Client 10.0 and later. If install Big Data Extensions on vSphere 9.0, must perform all
administrative tasks using the command-line interface. So had to install the latest version of
VMware workstation.

When installing Big Data Extensions must use VMware vCenter Single Sign-On to
provide user authentication. When logging in can pass authentication to the VMware
Single Sign-On server, which can configure with multiple identity sources such as Active
Directory and OpenLDAP(OpenLDAP is a free, open source implementation of the
Lightweight Directory Access Protocol (LDAP) developed by the OpenLDAP Project.) On
successful authentication, with username and password is exchanged for a security token
which is used to access VMware components such as Big Data Extensions.

Enable the vSphere Network Time Protocol on the ESXi hosts. The Network Time
Protocol (NTP) daemon ensures that time-dependent processes occur in sync across hosts.
Cluster Settings
We had to configure our cluster with the following settings.

Enable Hyper V or Virtualization enabled from BIOS setup (Windows 10)

Enabled Host Monitoring.

Enabled Admission Control and set desired policy. The default policy is to tolerate one
host failure.

The virtual machine restart priority was set to High

Set the virtual machine monitoring to virtual machine and Application Monitoring.

Set the Monitoring sensitivity to High.

Enabled vMotion and Fault Tolerance Logging.

All hosts in the cluster have Hardware VT enabled in the BIOS.

The Management Network VMkernel Port has vMotion and Fault Tolerance Logging
enabled.
Network Settings
Big Data Extensions deploys clusters on a single network. Virtual machines are deployed
with one NIC, which is attached to a specific Port Group. The environment determines how
this Port Group is configured and which network backs the Port Group.

Either a vSwitch or vSphere Distributed Switch can be used to provide the Port Group
backing a Serengeti cluster. vDS acts as a single virtual switch across all attached hosts
while a vSwitch is per-host and requires the Port Group to be configured manually.
When configuring network for use with Big Data Extensions, the following ports must be
open as listening ports.

Ports 8080 and 8443 are used by the Big Data Extensions plug-in user interface and the
Serengeti Command-Line Interface Client.

Port 9000 is used by SSH clients.

To prevent having to open a network firewall port to access Hadoop services, log into the
Hadoop client node, and from that node which can access cluster.

To connect to the Internet (for example, to create an internal Yum repository from which to
install Hadoop distributions), with may use a proxy.
Direct Attached Storage
Direct Attached Storage should be attached and configured on the physical controller to
present each disk separately to the operating system. This configuration is commonly
described as Just a Bunch of Disks (JBOD).We had to create VMFS Data-stores on Direct
Attached Storage using the following disk drive recommendations.

6-8 disk drives per host. The more disk drives per host, the better the performance.

1-1.5 disk drives per processor core.

7,200 RPM disk Serial ATA disk drives.

Resource Requirements for the vSphere Management Server and Templates

Resource pool with at least 3.5GB RAM

40GB or more (recommended) disk space for the management server and Hadoop template
virtual disks.
Resource Requirements for the Hadoop Cluster

Data-store free space is not less than the total size needed by the Hadoop cluster, plus swap
disks for each Hadoop node that is equal to the memory size requested.

Network configured across all relevant hosts, and has connectivity with the network in use
by the management server.

HA is enabled for the master node if HA protection is needed. We have used shared storage
in order to use HA or FT to protect the Hadoop master node.
Hardware Requirements
Host hardware is listed in the VMware Compatibility Guide. To run at optimal
performance, install our vSphere and Big Data Extensions environment on the following
hardware.

Dual Quad-core CPUs or greater that have Hyper-Threading enabled. If we can estimate
our computing workload, consider using a more powerful CPU.

Used High Availability (HA) and dual power supplies for the master node's host machine.

4-8 GBs of memory per processor core, with 6% overhead for virtualization.

Use a 1 Gigabit Ethernet interface or greater to provide adequate network bandwidth.

Tested Host and Virtual Machine Support
The following is the maximum host and virtual machine support that has been confirmed to
successfully run with Big Data Extensions.

We have used my visual database 64 bit

Virtual hosts deployed on 4 physical hosts, running 3 virtual machines.

Licensing
With had to use a vSphere Enterprise license or above in order to use VMware High
Availability (HA) and VMware Distributed Resources Scheduler (DRS). VMware's
products predate the virtualization extensions to the x86 instruction set, and do not require
virtualization-enabled processors. On newer processors, the hypervisor is now designed to
take advantage of the extensions. However, unlike many other hypervisors, VMware still
supports older processors. In such cases, it uses the CPU to run code directly whenever
possible (as, for example, when running user-mode and virtual 8086 mode code on x86).
When direct execution cannot operate, such as with kernel-level and real-mode code,
VMware products use binary translation (BT) to re-write the code dynamically. The
translated code gets stored in spare memory, typically at the end of the address space,
which segmentation mechanisms can protect and make invisible. For these reasons,
VMware operates dramatically faster than emulators, running at more than 80% of the
speed that the virtual guest operating-system would run directly on the same hardware. In
one study VMware claims a slowdown over native ranging from 06 percent for the
VMware ESX Server. VMware's approach avoids some of the difficulties of virtualization
on x86-based platforms. Virtual machines may deal with offending instructions by
replacing them, or by simply running kernel-code in user-mode. Replacing instructions
runs the risk that the code may fail to find the expected content if it reads itself; one cannot
protect code against reading while allowing normal execution, and replacing in-place
becomes complicated. Running the code unmodified in user-mode will also fail, as most
instructions which just read the machine-state do not cause an exception and will betray the
real state of the program, and certain instructions silently change behavior in user-mode.

Chatper-5
Results & Testing

5.1: Result:
After starting collecting the menu of different restaurants around Delhi and Chittagong city
finally collected around 135Gb data which includes pictures, video, menu, restaurant
details .Our project has a vast area of exploration as began with creating a database that
will hold the menu, delivery details of the food item, time, price, picture of the food, are
have also added mail as feedback from the user.

Figure 2.8: Outlook of the website where customers will be looking for food
We tried to make the outlook as better as we can so that customers feels it as comfortable
as possible and spend some time looking for the items, its also user friendly as all the
options are nearby for a new user even we are advancing to add online immediate help for
the customer so that the customer can get the necessary help regarding their order and all
this will allow them to search and find the right food in much easier and faster way.

Figure 2.9: Section in the website where customers can see other customers feedback

Figure 2.10: Showing section in website for customer feedback

Figure 2.11: Database for menu and other details entry

Figure 2.12: Adding an item to the database

Adding items refers to adding food details such as photo of the food, cost, delivery date,
code of the food category and the seller restaurant which dispatched the item.

Figure 2.13: Customer feedback

5.1.1: Success cases
Our primary target is to make Hadoop single node cluster successfully made 3 clusters one
of it (mirror) is the ACER aspire E-11. This system has 4GB of RAM with a 2.67 Ghz
processing speed which can run initial programs of a Hadoop cluster.

Figure 2.14: Cluster Summary

Figure 2.15: Hadoop Task tracker

It shows that have successfully initiated Hadoop single node cluster in our system .as its
showing the summary of our work where it includes total running nodes, running map
reduce tasks, occupied map reduced tasks capacity, average task per node .In the task
tracker status can see Hadoop running tasks and its status, non-running tasks and its
status, tasks from running jobs and local logs. Hadoop being successfully installed required
successful installation of JDK .There was three primary steps that had to start before testing
successful integration of Hadoop cluster in the system:
Starting all database filesystem admin dfs
Starting all mapreduce functions in Hadoop mapred
These attempts will allow the user or admin to start the logging in permission for the user
to access to the database made by Hadoop. After Starting the dfs and mapred in the
command prompt it will show the process that will confirm that it is successfully loading
the both dfs and mapred in the memory and start the local host for response to the system
for any query by the user and create the possible entry made by admin or request for
inquiry by the user.

5.1.2: Failure cases

had three localhost address to check on the status of our single node cluster where have
two address http://localhost:50060, http://localhost:50030 working properly as their screen
shot has been added on the success cases. But for the address http://localhost:50070 its not
responding:

Figure 2.16: Failure in connecting localhost:50070

5.2: Testing
have tried to test the single node cluster that made and successfully achieved two
addresses working properly which need to at-least make sure that a Hadoop single node
cluster has been developed in the system and can initiate data entry in it, it provides
multiple steps as have installed Hadoop version 1.2.1 which involves the system to work
map-reduce terminology that will allow the user to access the database gradually once the
database is of the size of estimated 135GB. Testing case 1, 2 successful as the local host
responded after starting MapReduce and database file system and formatting namenode. It
provided the information off detecting single node status in the system are currently
working on.

5.2.1: Test results of various stages

First tasted if the java JDK version is successfully installed in the system

Figure 2.17: showing jdk1.8.0_77 installed in the system

Next we need check if the Hadoop is being installed in the system

Figure 2.18: Hadoop Version

Now testing startup of successful Map reduce in the system if it cannot load it will show an
error message .If successful it will ask for user password and start MapReduce connection .

Figure 2.19: Successful startup of MapReduce

Now test Hadoop database file system, to load it start the dfs load it in the system with the
code: Start-dfs.sh

Figure 2.20: Showing starting of Database file system in System

Chapter-6:
Conclusion & Future Improvements

6.1: Performance Estimation

As per the capability of the system the smooth running of the process depends largely on
the capacity of the system the more the capacity is better the performance will be good we
re trying to compare the result of Hadoop and RDBMS

Table 2.2: Showing Hadoop response time

The important takeaway is to understand at a high level how data is stored in HDFS and
managed in the Hive environment. The physical data modeling experiments that performed
ultimately affect how the data is stored in blocks in HDFS and in the nodes where the data
is located and how the data is accessed. This is particularly true for the tests in which
partitioned the data using the Partition statement to redistribute the data based on the
buckets or ranges defined in the partitions. We began our experiments without indexes,
partitions, or statistics in both schemas and in both environments. The intent of the first
experiment was to determine whether a star schema or flat table performed better in Hive

or in the RDBMS for our queries. During subsequent rounds of testing, used compression
and added indexes and partitions to tune the data, Data Modeling Considerations in Hadoop
and Hive structures. As a final test, ran the same queries against our final data structures
using Impala. Impala bypasses the MapReduce layer used by Hive

Table Name

RDBMS

PAGE_CLICK_FACT 573.18
GB

Hadoop
(Text
File)

Hadoop
(Compressed
Sequence
File)

328.30
GB

42.28 GB

PAGE_CLICK_FLAT 1001.11 991.47

GB
GB

124.59 GB

Table2.2: Showing Data modeling comparison in RDBMS and Hadoop

6.2: Limitations
Due to limited amount of time could not collect ample amount of data also the created 3
clusters out of which had to shift to a single mirror cluster as carrying 3 clusters with huge
capabilities is not an easy task. Moreover collecting different restaurant menus also
involves the restaurant authority to approve and accept the proposal wanted to know the
following questions:
What is the percentage of viewers who clicks on the advertisement?
How many of the visitors actually purchase food from the store?
How much revenue/profit is generated by advertisement this menu will be shown in a
project work .But it was never easy to find out the actual benefit .As due to security issues
not all companies or restaurants will to provide the information. While installing Hadoop in
the system had to also keep in mind about the capacity of the system as it has to be above
3.5 GB of RAM and above 2.00 Ghz of processing speed .Which was not easy to get in
order to make different clusters ,In our computer lab at SET I -214 most of the computers
has 1 GB RAM ,also processing speed was not up to standard ,combining three computer
RAM could make one, If a single cluster g5ets down the whole system becomes
unresponsive which is a difficult situation to solve .
6.4: Scope of Improvement
There is ample amount of scope for us to improve this project as are covering only the
restaurants of Delhi and not sales support. It focuses on food products, advertisers
and customers. More specifically the system will be deigned to manage the product
information. System also used to provide:
1. Statistical analysis to advertiser, offers.
2. This statistical analysis is limited to provide mostly clicked food product, showing food
of users interest, report generation about food position in market.
3. Food displayed on dashboard must match to customers profile and it has a benefit of
collecting the delivery report as feedback which can be used to improve food matters.
With the advancement of technology, large number of people are buying and selling food
online. There are commonly used techniques for online marketing such as use of banner
cards, email campaigns. Being one of the biggest cosmopolitan cities in India Delhi
restaurants and its sole websites are very busy sites. So the effective marketing depends
largely on success of online advertising. Analyzing the effectiveness of website is a matter

to concern for corporates that rely on b marketing, purchasing of food activities involves
attracting and retaining customers. Traditional database technology is indeed useful in
managing the online stores. Hover, it has serious limitations, when it comes to analyzing
effectiveness of online ads. Here, need to find answers to daunting questions such as:
wers who clicks on the advertisement?

From this point of view, study of online food product promotion for Delhi restaurants
becomes an important aspect of b marketing. The data generated by mouse clicks and
corresponding logs are too large to be analyzed by traditional technology. New technology
such as big data is being explored for finding solution to above problems. In the paper,
have decided to use open source technology Hadoop. Today the term big data draws a lot
of attention, but behind the hype theres a simple story. For decades, companies have been
making business decisions based on transactional data stored in relational databases.
Beyond that critical data, hover, is a potential treasure delightful things of non-traditional,
less structured data: blogs, social media, email, sensors, and photographs that can be mined
for useful information. Decrease in the cost of storage and increase in computing power
have made it possible to collect large data. As a result, more and more companies are now
compelled to include non-traditional yet potentially valuable data with their traditional
enterprise data and using it for their business intelligence analysis. To derive real business
value from big data, we need the right tools to capture and organize a wide variety of data
types from different sources, and to be able to easily analyses it within the context of all
our enterprise data.

Conclusion:
As the world is turning towards use of internet for every day-to-day activity, need for
viewing and selecting food of ones choice is kind of prime importance to the restaurants.
The list of irrelevant advertisements frustrates the user, which proves to be the main reason
for failures of most sites. But our website makes it smooth for the users to select products
by filtering available products based on individual customers interests. The e-commerce
field is emerging rapidly. Advertisers need a way to promote their products in market. This
way is provided by personalized websites like this one. The reports provided by our
website makes it easier for them to know the status of their products and hence take
necessary measures in order to come up for the faced losses. So, our project is an effort to
minimize the hard work of people and restaurants and get the things they want in shortest
possible time from one place we tried to explore the possibilities that can be used in regards
to handles peoples food matters that relates the restaurants with evolving technology that
can reduce the hassle and make customers feel a wonderful experience ordering food
online while we try to make sure customers dont need to visit different restaurants online
in one place they are getting all their necessary items ,price, menu and can also view the
feedback from other customers .although databases dont solve all aspects of the big data
problem, several tools some based on databases get part-way there. Whats missing
is two side folded: First, we must improve statistics and machine learning algorithms to be
more robust and easier for unsophisticated users to apply, while simultaneously training
students in their intricacies. Second, we need to develop a data management ecosystem
around these algorithms so that users can manage and evolve their data, enforce
consistency properties over it, and browse, visualize, and understand their algorithms

References
[1] Running Hadoop on Ubuntu Linux,Windows(single-node cluster). http://www.michealnoll.com/tutorials/ running-Hadoop-on-Ubuntu-Linux-single-node-cluster, December
2012. [page 32,39,49,120-140]
[2] Hadoop: The Denitive Guide. OReilly Media. From Avro to ZooKeeper, May 2012.
[pqge 3-7]
[3] The Unied Modeling Language User Guide. Addison sley, October 1998. [page 9-20]
[4] Hortonworks Ari Zilka, CTO. Hadoop. 2011. [page,3,6,19,27]
[5] Jeffrey Dean and Sanjay Ghemawat. The google le system. IEEE, 2004. [page 40-60]
[6] ZHAI Yan-dong YANG Bin HUANG Lan*, WANG Xiao-i. Extraction of user prole
based on the hadoop framework. IEEE, 2009. [page 19-29]
[7] LI Chao-qing LI Xiang-yang. Several technical problems and solutions of mass data
processing. Journal China College of Insurance Management. [page 4,29,33,40,52]
[8] MIKE2.0. Big data denition. [page 1-7]
[9] Roger S. Pressman. Software Engineering: A Practitioners Approach. 7th edition, McGrawHill, 2012. [page 9]
[10] Howard Gobioff Sanjay Ghemawat and Shun-Tak Leung. Mapreduce: Simplied data
processing on large clusters. IEEE, 2004.[page 9,12,33]
[11] Pig Programming. OReilly Media inc., Alan gates, Octomeber 2011. [page 3-100]
[12] Apache Sqoop Cookbook, OReilly Media, Inc.,Kathleen Ting and Jarek Jarcec Cecho,July,2013.
[13].Indian
restaurants
scenario
in
current
days
over
time
http://india.blogs.nytimes.com/2012/05/01/in-india-more-food-and-more-suffering/?_r=0
[14] BUSINESS INTELLIGENCE AND ANALYTICS: FROM BIG DATA TO BIG IMPACT
by Hsinchun Chen, Roger H. L. Chiang, Veda C. Storey [ page 19-65]
[15] Big Data: A Revolution That Will Transform How we Live, Work and Think by Viktor
Mayer-Schonberger , Kenneth Cukier (John Murray Publishers Ltd). [page 1-7]
[16] From Databases to Big Data by Sam Madden Massachusetts Institute of Technology

[17] Benefit-Risk Analysis for Big Data Projects by Jules Polonetsky ,Omer Tene, Joseph
Jerome [page 33-50]
[18] Data Modeling Considerations in Hadoop and Hive by Clark Bradley, Ralph
Hollinshead, Scott Kraus, Jason Lefler, Roshan Taheri October 2013[page17,40,49]
[19]How big data is changing the database scenario for good
http://www.infoworld.com/article/3003647/database/how-big-data-is-changing-thedatabase-landscape-for-good.html

[20] Analytics and Big Data: The Davenport Collection (6 Items);

By Thomas H. Davenport, Jeanne G. Harris, Jinho Kim, Robert Morison [page 234-250]
[21] Big Data & Analytics: Bangladesh on a Parallel World - Boomerang Blog
www.boomerangbd.com/blog/.../big-data-analytics-bangladesh-on-a-parallel-world/
[22] IBM - PureData Big Data Analytics - Data Warehouse - Bangladesh
www.ibm.com/ibm/puresystems/bd/en/big-data/
[23] Big Data & Analytics: Bangladesh on a Parallel World by Narmin Tartila
[24] Big Data: A Revolution That Will Transform How We Live, Work, and Think by by
Kenneth Cukier and Viktor Mayer Schonberger [page 9-135]
[25] Hadoop For Dummies by Dirk Deroos page [23,50,99]
[27] Hadoop Operations by Eric Sammer [page 90,124,223]
[28] Hadoop in Practice by Alex Holmes . [page 23,29,57,180]
[29] MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop
by Donald Miner. [page 40,43,49,67,97]
[30] Professional Hadoop Solutions by by Boris Lublinsky, Kevin T Smith, Alexey
Yakubovich [page 34,45,77,89]
[31] The Second Machine Age: Work, Progress and Prosperity in a Time of Brilliant
Technologies by Erik Brynjolfsson, Andrew McAfee and Jeff Cummings.[page 33,34,49]

Software Development: Exposys Data Labs - (Project Sheet)
No ratings yet
Software Development: Exposys Data Labs - (Project Sheet)
10 pages
3rd Year Sanskar Bba Project (Dinshaws)
No ratings yet
3rd Year Sanskar Bba Project (Dinshaws)
43 pages
Healthureum White Paper
No ratings yet
Healthureum White Paper
41 pages
JSP-Servlet Interview Questions You'll Most Likely Be Asked
From Everand
JSP-Servlet Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
4.loop Control Instructions
0% (2)
4.loop Control Instructions
5 pages
Big Data Final Report
100% (3)
Big Data Final Report
34 pages
Hadoop Final Docment
100% (1)
Hadoop Final Docment
79 pages
Computer Science and Engineering: Under The Supervision of MR - Nikunj Kumar
No ratings yet
Computer Science and Engineering: Under The Supervision of MR - Nikunj Kumar
11 pages
Joint Venture, Franchising, Licensing: and Case Study of Circle K
No ratings yet
Joint Venture, Franchising, Licensing: and Case Study of Circle K
25 pages
Online Transaction Processing
No ratings yet
Online Transaction Processing
17 pages
BlackBook PDF
0% (1)
BlackBook PDF
45 pages
Content
No ratings yet
Content
36 pages
Final Document
No ratings yet
Final Document
73 pages
Project Report On To Evaluate Consumer Perceptions To Various Promotional Offers
50% (2)
Project Report On To Evaluate Consumer Perceptions To Various Promotional Offers
55 pages
Bachelor of Computer Application BCA
No ratings yet
Bachelor of Computer Application BCA
149 pages
Growth in IT Sector
No ratings yet
Growth in IT Sector
15 pages
Business Intelligence Masters Program Curriculum
No ratings yet
Business Intelligence Masters Program Curriculum
36 pages
A Study of Cloud Computing and Its Security
No ratings yet
A Study of Cloud Computing and Its Security
25 pages
BigData Research Paper
No ratings yet
BigData Research Paper
22 pages
A Complete Beginners Guide On Power BI - Arvind
No ratings yet
A Complete Beginners Guide On Power BI - Arvind
4 pages
Enterprise Cloud Computing
No ratings yet
Enterprise Cloud Computing
7 pages
Speaking Club 9 - Higherlevels
No ratings yet
Speaking Club 9 - Higherlevels
1 page
A Study On Leadership Styles in Work Force: Submitted To University of Madras
No ratings yet
A Study On Leadership Styles in Work Force: Submitted To University of Madras
68 pages
JECRC University Prospectus 2016
No ratings yet
JECRC University Prospectus 2016
61 pages
Big Data Approach For Secure Traffic Data Analytics Using Hadoop
No ratings yet
Big Data Approach For Secure Traffic Data Analytics Using Hadoop
4 pages
Capital Markets Kunjal
No ratings yet
Capital Markets Kunjal
36 pages
Knowledge Management Life Cycle & Challenges in Building Knowledge Management System
No ratings yet
Knowledge Management Life Cycle & Challenges in Building Knowledge Management System
19 pages
Omninos Solutions: "Recruitment and Selection Process of It Personnel" IN '
100% (1)
Omninos Solutions: "Recruitment and Selection Process of It Personnel" IN '
82 pages
Data Storage Security Challenges in Cloud Computing
100% (1)
Data Storage Security Challenges in Cloud Computing
10 pages
Factors Affecting IT Implementation
100% (2)
Factors Affecting IT Implementation
132 pages
A Study On Impact of Employee Motivation On Job Performance With Special Reference To Asianet Fashion, Madurai
100% (1)
A Study On Impact of Employee Motivation On Job Performance With Special Reference To Asianet Fashion, Madurai
107 pages
Test 8
No ratings yet
Test 8
6 pages
Minor Project: Acceptance of Packed and Unpacked Milk
No ratings yet
Minor Project: Acceptance of Packed and Unpacked Milk
26 pages
HRD in Wipro
100% (1)
HRD in Wipro
13 pages
ICO Guide To Data Protection Audits
No ratings yet
ICO Guide To Data Protection Audits
27 pages
Introduction To Data Warehousing and Business Intelligence
No ratings yet
Introduction To Data Warehousing and Business Intelligence
72 pages
Project Report On Bakery Management System PDF
No ratings yet
Project Report On Bakery Management System PDF
65 pages
A Study of Cybercrimes in India Using Digital Forensics
No ratings yet
A Study of Cybercrimes in India Using Digital Forensics
15 pages
Unit 4 Innovation and Entrepreneurship
No ratings yet
Unit 4 Innovation and Entrepreneurship
50 pages
Avantika Trivedi Mini Project Mba 1
No ratings yet
Avantika Trivedi Mini Project Mba 1
56 pages
Introduction To Cloud Computing
No ratings yet
Introduction To Cloud Computing
108 pages
Big Data Analytics: Challenges and Applications For Text, Audio, Video, and Social Media Data
No ratings yet
Big Data Analytics: Challenges and Applications For Text, Audio, Video, and Social Media Data
11 pages
Big Data Hadoop
No ratings yet
Big Data Hadoop
34 pages
Project Report: Bachelor of Computer Application
No ratings yet
Project Report: Bachelor of Computer Application
78 pages
Final Report-IAMAI 40
No ratings yet
Final Report-IAMAI 40
60 pages
Pssi
100% (2)
Pssi
46 pages
A Three Layer Privacy Preserving Cloud Storage Scheme Based On Computational Intelligence in Fog Computing
No ratings yet
A Three Layer Privacy Preserving Cloud Storage Scheme Based On Computational Intelligence in Fog Computing
10 pages
Anuraag Rath MBA Dissertation
100% (1)
Anuraag Rath MBA Dissertation
74 pages
Unit 38 DatabaseManagementSyst
No ratings yet
Unit 38 DatabaseManagementSyst
27 pages
Software Project Management
No ratings yet
Software Project Management
55 pages
Explain Three
No ratings yet
Explain Three
5 pages
Database and SQL Concepts
No ratings yet
Database and SQL Concepts
14 pages
Brand Extension of Google Inc
100% (3)
Brand Extension of Google Inc
102 pages
It Project
No ratings yet
It Project
16 pages
TOP 30 Cybersecurity: Companies in Silicon Valley
No ratings yet
TOP 30 Cybersecurity: Companies in Silicon Valley
12 pages
SAUMYA SHUKLA MBA MINI PROJECT PDF - Docx 1
No ratings yet
SAUMYA SHUKLA MBA MINI PROJECT PDF - Docx 1
56 pages
Impact of Technological Innovation in Commercial Banks in Kenya.... An Evaluation of Customer Satisfaction
100% (7)
Impact of Technological Innovation in Commercial Banks in Kenya.... An Evaluation of Customer Satisfaction
42 pages
Customer 360: How Data, AI, and Trust Change Everything
From Everand
Customer 360: How Data, AI, and Trust Change Everything
Martin Kihn
No ratings yet
Hadoop BIG DATA Interview Questions You'll Most Likely Be Asked
From Everand
Hadoop BIG DATA Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
Analytics in a Business Context: Practical guidance on establishing a fact-based culture
From Everand
Analytics in a Business Context: Practical guidance on establishing a fact-based culture
Frank Vella
No ratings yet
Making Big Data Work for Your Business: A guide to effective Big Data analytics
From Everand
Making Big Data Work for Your Business: A guide to effective Big Data analytics
Sudhi Sinha
No ratings yet
Database Creation
No ratings yet
Database Creation
12 pages
9.5.2.6 Packet Tracer - Configuring IPv6 ACLs Instructions IG
No ratings yet
9.5.2.6 Packet Tracer - Configuring IPv6 ACLs Instructions IG
7 pages
How To Write An Industry-Standard EEPROM (24C04) Using The MAX2990 I C Interface
No ratings yet
How To Write An Industry-Standard EEPROM (24C04) Using The MAX2990 I C Interface
4 pages
Memory Technology: CS1251 Computer Organization Carl Hamacher
No ratings yet
Memory Technology: CS1251 Computer Organization Carl Hamacher
20 pages
Ai Art
No ratings yet
Ai Art
7 pages
Hl7 Ccow Subjects CM 1 2
No ratings yet
Hl7 Ccow Subjects CM 1 2
32 pages
VB Control Structures
No ratings yet
VB Control Structures
62 pages
Guide For Digital Forensic
No ratings yet
Guide For Digital Forensic
69 pages
Full-Time MSBA Curriculum: Summer Semester: Business and Management Fundamentals
No ratings yet
Full-Time MSBA Curriculum: Summer Semester: Business and Management Fundamentals
2 pages
Oxford Semiconductor PCIe
No ratings yet
Oxford Semiconductor PCIe
2 pages
01 Introduction
No ratings yet
01 Introduction
18 pages
Map Reduce
No ratings yet
Map Reduce
40 pages
Lesson Plan FOR MICROPROCESSOR
No ratings yet
Lesson Plan FOR MICROPROCESSOR
3 pages
1Z0-062 Exam Dumps With PDF and VCE Download (1-30) PDF
No ratings yet
1Z0-062 Exam Dumps With PDF and VCE Download (1-30) PDF
17 pages
Web Interface Admin
100% (1)
Web Interface Admin
263 pages
T.Y. B.C.a. Syllabus
No ratings yet
T.Y. B.C.a. Syllabus
6 pages
Entity/Relationship Modelling
No ratings yet
Entity/Relationship Modelling
39 pages
R 2008 It Syllabus
No ratings yet
R 2008 It Syllabus
89 pages
Barracuda NG Firewall Product Guide
No ratings yet
Barracuda NG Firewall Product Guide
64 pages
Little Angels Higher Secondary School, Mhow, (M.P.) : Session-2017-18 Computer Project Work
No ratings yet
Little Angels Higher Secondary School, Mhow, (M.P.) : Session-2017-18 Computer Project Work
11 pages
Lovely Professional University, Punjab
No ratings yet
Lovely Professional University, Punjab
7 pages
012078085F
No ratings yet
012078085F
8 pages
PHP Notes
No ratings yet
PHP Notes
68 pages
Sitambas Patel UI Developer Resume New PDF
No ratings yet
Sitambas Patel UI Developer Resume New PDF
1 page
SpinalHDL: A Comprehensive Hardware Description Language
No ratings yet
SpinalHDL: A Comprehensive Hardware Description Language
1 page
Attacking SMS
No ratings yet
Attacking SMS
66 pages
Exception Handling
No ratings yet
Exception Handling
10 pages
Compiler (Very Imp.)
No ratings yet
Compiler (Very Imp.)
10 pages
ProtoNode Startup Guide For Eaton Cooper
No ratings yet
ProtoNode Startup Guide For Eaton Cooper
60 pages

Project Report Final

Uploaded by

Project Report Final

Uploaded by

APPENDIX 1

Advertising promotions for Restaurants of Delhi using Big

Signature of the Student

Name: Md .Masum Hossain

Signature of Head of Department

A major project is a golden opportunity for learning and self-development. I consider

Name and signature of Students

Chatper-5: Results & Testing

Table 2.1: Features of HDFS. 28

1.2 Overview of Project

1.3 Expected outcome

1.4 Gantt Chart

Figure 2.2: Showing the work progress over time

III. Data Security:

1.5 Software Requirement specification:

Hadoop compatible version installed,

2.1 System View

2.2 System components & Functionalities:

2.3: Data & relational views:

3.1: System Design:

Figure 2.3: Showing Big data insights

Figure 2.4: Architecture of Online Food Promotion

Classifying hyperlinks by their purpose

Figure 2.5: working of a general restaurants

3.3 Existing System

3.4 Application Areas

Advantages of Proposed System

The advantages of processing data in Big Data in real-time are many:

block or blocks. The MapReduce processing infrastructure includes an abstraction called an

Figure 2.7: How map-reduce software frame work works

4.1: Developmental feasibility

Dual Quad-core CPUs or greater that have Hyper-Threading enabled. We had to

Use a 1 Gigabit Ethernet interface or greater to provide adequate network

4.2: Implementation Specifications

Installing VMware 10.0 (or later) Enterprise or Enterprise Plus.

Enable Hyper V or Virtualization enabled from BIOS setup (Windows 10)

Enabled Host Monitoring.

The virtual machine restart priority was set to High

Set the Monitoring sensitivity to High.

Enabled vMotion and Fault Tolerance Logging.

All hosts in the cluster have Hardware VT enabled in the BIOS.

Port 9000 is used by SSH clients.

1-1.5 disk drives per processor core.

7,200 RPM disk Serial ATA disk drives.

Resource Requirements for the vSphere Management Server and Templates

Resource pool with at least 3.5GB RAM

Use a 1 Gigabit Ethernet interface or greater to provide adequate network bandwidth.

We have used my visual database 64 bit

Virtual hosts deployed on 4 physical hosts, running 3 virtual machines.

Figure 2.10: Showing section in website for customer feedback

Figure 2.11: Database for menu and other details entry

Figure 2.12: Adding an item to the database

Figure 2.13: Customer feedback

Figure 2.14: Cluster Summary

Figure 2.15: Hadoop Task tracker

5.1.2: Failure cases

Figure 2.16: Failure in connecting localhost:50070

5.2.1: Test results of various stages

Figure 2.17: showing jdk1.8.0_77 installed in the system

Figure 2.18: Hadoop Version

Figure 2.19: Successful startup of MapReduce

Figure 2.20: Showing starting of Database file system in System

6.1: Performance Estimation

Table 2.2: Showing Hadoop response time

PAGE_CLICK_FLAT 1001.11 991.47

Table2.2: Showing Data modeling comparison in RDBMS and Hadoop

[20] Analytics and Big Data: The Davenport Collection (6 Items);

You might also like