A Mini Project Report Submitted in Partial Fulfillment of the Requirements for the Award of the Degree of
by
[A18CSDB28] [A18CSDB29]
Dr. S. MUTHUKUMARAN, M.Sc.(IT)., M.Phil., Ph.D.,
(Assistant Professor, P.G. and Research Department of Computer Science)
(AUTONOMOUS)
CUDDALORE-607001
APRIL - 2021
CERTIFICATE
Being submitted to
By
[A18CSDB28] [A18CSDB29]
Examiners:
1. --------------------------------
2. --------------------------------
ACKNOWLEDGEMENT
It is our earnest and sincere desire and ambition to acquire profound knowledge in the study of the Bachelor of Computer Science. We are grateful to God, the Almighty, who has blessed us abundantly and guided us to complete this task.

We express our sincere thanks to our beloved Rev. Fr. G. PETER RAJENDIRAM, M.A., M.Sc., M.Ed., M.Phil., Secretary, St. Joseph's College of Arts and Science (Autonomous), Cuddalore, for providing such a congenial environment to enrich our knowledge.

We express our sincere gratitude to Dr. M. Arumai Selvam, M.Sc., M.Phil., Ph.D., Principal and Head, Post Graduate and Research Department of Computer Science, St. Joseph's College of Arts and Science (Autonomous), Cuddalore, for his support and encouragement.

We take immense pleasure in conveying our sincere and heartfelt gratitude to our internal guide, Dr. S. Muthukumaran, M.Sc.(IT)., M.Phil., Ph.D., for the valuable guidance, inspiration, and motivation given throughout our project. This untiring encouragement helped us complete the project successfully.

At the outset, we extend our sincere gratitude to all our teachers in the Department of Computer Science, who have taken pains to shape and mould us in all our endeavours.

We would fail in our duty if we did not put on record our heartfelt gratitude to our beloved parents for their financial and material support, as well as their words of encouragement and motivation to complete this project successfully.

Last but not least, we place on record our deepest sense of gratitude to all our friends who inspired and supported us in completing this project successfully.
CONTENTS

CERTIFICATE
ABSTRACT
ACKNOWLEDGEMENT

1. INTRODUCTION
   1.1 ABOUT THE PROJECT
2. PROBLEM DEFINITION
   2.1 EXISTING SYSTEM
   2.2 LIMITATIONS OF EXISTING SYSTEM
   2.3 PROPOSED SYSTEM
   2.4 FEATURES OF PROPOSED SYSTEM
3. SOFTWARE REQUIREMENTS SPECIFICATION
   3.1 INTRODUCTION
   3.2 MODULE DESCRIPTION
   3.3 HARDWARE REQUIREMENTS
   3.4 SOFTWARE REQUIREMENTS
4. SYSTEM ANALYSIS
   4.1 INTRODUCTION
   4.2 UML DIAGRAMS
       4.2.1 USE CASE DIAGRAM
5. SYSTEM DESIGN
   5.1 INTRODUCTION
   5.2 USER INTERFACE DESIGN
   5.3 DATABASE DESIGN
6. SYSTEM TESTING
   6.1 INTRODUCTION
   6.2 UNIT TESTING
   6.3 INTEGRATION TESTING
   6.4 VALIDATION TESTING
   6.5 ALPHA AND BETA TESTING
   6.6 SYSTEM TESTING
7. CONCLUSION
   7.1 CONCLUSION
   7.2 FUTURE ENHANCEMENTS
8. BIBLIOGRAPHY
   8.1 BOOKS
   8.2 WEBSITE
APPENDICES
   APPENDIX-A: SOFTWARE DESCRIPTION
   APPENDIX-B: SAMPLE SOURCE CODE
   APPENDIX-C: SCREENSHOTS
CHAPTER 1
INTRODUCTION
1.1 ABOUT THE PROJECT
MapReduce is a distributed programming model for processing large-scale datasets in parallel, and it has shown outstanding effectiveness in many existing applications [3], [4], [5]. Since the original MapReduce model is not optimized for deployment across datacenters [6], aggregating distributed data into a single datacenter for centralized processing is a widely used approach. However, waiting for such centralized aggregation causes significant delays due to the heterogeneous and limited bandwidth of the user-to-cloud links. Since inter-datacenter links usually run over dedicated, relatively high-bandwidth lines [7], moving the data to multiple datacenters for the map operation in parallel and then aggregating the intermediate data to a single datacenter for the reduce operation over inter-datacenter links has the potential to reduce this latency. Furthermore, different kinds of cost (e.g., those incurred by moving data or renting VMs) can also be optimized by considering the heterogeneity of link speeds, the dynamism of data generation, and resource prices. Therefore, distributing data from multiple sources onto multiple datacenters and processing it with distributed MapReduce is an ideal way to deal with large volumes of dispersed data. The most important questions to be solved are: 1) how to optimally place large-scale datasets from various locations onto a geo-distributed datacenter cloud for processing, and 2) how many resources, such as computing resources, should be provisioned to guarantee performance and availability while minimizing cost. The fluctuating, multi-source nature of the generated data, combined with the dynamic utility-driven pricing model of cloud resources, makes this a very challenging problem.
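To make the idea concrete, the following is a minimal sketch in Python, not the system's actual implementation: data placed at several datacenters is mapped in parallel, and the intermediate pairs are then shipped to a single datacenter for the reduce phase. The datacenter names and the word-count map function are illustrative stand-ins.

from collections import defaultdict

def map_phase(records):
    # Example map function: word counting; stands in for any map operation.
    for record in records:
        for word in record.split():
            yield word, 1

def geo_distributed_mapreduce(source_data_by_datacenter):
    # Each datacenter runs the map phase on the data placed at it
    # (sequential here; in parallel across datacenters in practice).
    intermediate = []
    for dc, records in source_data_by_datacenter.items():
        intermediate.extend(map_phase(records))

    # Intermediate pairs travel over inter-datacenter links to one
    # datacenter, which performs the reduce phase.
    reduced = defaultdict(int)
    for key, value in intermediate:
        reduced[key] += value
    return dict(reduced)

if __name__ == "__main__":
    placement = {
        "dc_europe": ["big data big cloud", "cloud data"],
        "dc_america": ["big cloud", "data data"],
    }
    print(geo_distributed_mapreduce(placement))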
CHAPTER 2
PROBLEM DEFINITION
2.1 EXISTING SYSTEM
Measurements from this site were collected over a three-month period. During
this time the site received 1.35 billion requests, making this the largest Web
workload analyzed to date. By examining this extremely busy site and through
comparison with existing characterization studies, we are able to determine how
Web server workloads are evolving. We find that improvements in the caching
architecture of the World Wide Web are changing the workloads of Web servers, but
major improvements to that architecture are still necessary. In particular, we uncover
evidence that a better consistency mechanism is required for World Wide Web
caches.
Based on the data and information gathered on online websites, the following pitfalls or drawbacks were found in the current system:

SECURITY: This is one of the biggest drawbacks of online shopping.
LACK OF PRIVACY: Many websites do not use strong encryption to keep user data secure.
TAX ISSUES.
FEAR.
PRODUCT SUITABILITY.
LEGAL ISSUES.
CHAPTER 3
SOFTWARE REQUIREMENTS SPECIFICATION
3.1 INTRODUCTION
The product of the requirements phase of the software development process is the Software Requirements Specification (SRS), also called a requirements document. This report lays a foundation for software engineering activities. The SRS is a formal report that acts as a representation of the software, enabling customers to review whether it matches their requirements. It comprises the user requirements for the system as well as detailed specifications of the system requirements.

The SRS is a specification for a specific software product, program, or set of applications that performs particular functions in a specific environment. The SRS may be written by the client of a system or by a developer of the system.
Preliminaries
System Model
Problem Formulation
(1) Data allocation variable: $\lambda_r^d(t)$ denotes the amount of data allocated to datacenter $d$ from data location $r$ at time slot $t$, which means that the data generated at each location can be moved to any datacenter for analysis. Let $a_r(t)$, $A_r^{\max}$, and $U_r^d$ be the amount of data generated in the $r$-th region at time slot $t$, the maximum volume of data generated in location $r$, and the upload capacity between region $r$ and datacenter $d$, respectively.

Constraint (2) ensures that the sum of the data allocated to all datacenters in one time slot equals the total amount of data generated in that time slot:

$$\sum_{d \in D} \lambda_r^d(t) = a_r(t), \quad \forall r \in R \qquad (2)$$

Constraint (3) ensures that the total amount of data uploaded via link $\langle r, d \rangle$ does not exceed the upload capacity of that link:

$$\lambda_r^d(t) \leq U_r^d, \quad \forall r \in R,\ \forall d \in D \qquad (3)$$

The variable set is denoted as $\lambda(t) = \{\lambda_r^d(t), \forall r \in R, \forall d \in D\}$.

(2) VM provisioning variables: $m_d^k(t)$ and $n_d^k(t)$, $\forall d \in D, \forall k \in K$, denote the number of type-$k$ VMs rented from datacenter $d$ at time slot $t$ for the Map and Reduce operations, respectively. They can be scaled up and down over time slots. Since the computation resources in a datacenter are limited, we let $N_d^{k,\max}$ be the maximum number of type-$k$ VMs in datacenter $d$.
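The following is a minimal sketch, under made-up toy values, of how the allocation variables $\lambda_r^d(t)$ and the feasibility checks behind constraints (2) and (3) could be expressed. The regions, datacenters, and capacities are assumptions for illustration only, not values from the system.

R = ["r1", "r2"]            # data-generating regions
D = ["d1", "d2", "d3"]      # candidate datacenters

a = {"r1": 10.0, "r2": 6.0}                        # a_r(t): data generated at slot t
U = {("r1", "d1"): 8, ("r1", "d2"): 8, ("r1", "d3"): 8,
     ("r2", "d1"): 4, ("r2", "d2"): 4, ("r2", "d3"): 4}   # U_r^d: upload capacity

# lam[(r, d)] is lambda_r^d(t): data moved from region r to datacenter d.
lam = {("r1", "d1"): 5.0, ("r1", "d2"): 5.0, ("r1", "d3"): 0.0,
       ("r2", "d1"): 2.0, ("r2", "d2"): 0.0, ("r2", "d3"): 4.0}

def feasible(lam, a, U):
    # Constraint (2): allocations from region r sum to the data generated there.
    for r in R:
        if abs(sum(lam[(r, d)] for d in D) - a[r]) > 1e-9:
            return False
    # Constraint (3): no link <r, d> carries more than its upload capacity.
    return all(lam[(r, d)] <= U[(r, d)] for r in R for d in D)

print(feasible(lam, a, U))   # True for the toy values above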
3.3 HARDWARE REQUIREMENTS
(The original hardware requirements table is only partially recoverable; it lists a 17-inch monitor and server-class machines.)
CHAPTER 4
SYSTEM ANALYSIS
4.1 INTRODUCTION
System analysis is the process of collecting and interpreting facts, identifying problems, and decomposing a system into its components. It is conducted to study a system in order to identify its objectives. It is a problem-solving technique that improves the system and ensures that all of its components work efficiently to accomplish their purpose.
CHAPTER 5
SYSTEM DESIGN
5.1 INTRODUCTION
Dataset Description. Since analyzing the huge volume of log data of large web sites (e.g., YouTube, Facebook) is increasingly important for market decisions, we take user log analytics as the example in the experiment. Because the trace logs of well-known large-scale web sites (e.g., Facebook, LinkedIn) are not openly accessible, we use the WorldCup98 web trace dataset [33] instead to evaluate our algorithm. We believe this does not affect the evaluation result, given the dataset's dynamism and unpredictability. The dataset records the information of all requests to the 1998 World Cup web site between April 30, 1998 and July 26, 1998, collected from 30 servers distributed across four locations (i.e., 4 servers in Paris, France; 10 servers in Herndon, Virginia; 10 servers in Plano, Texas; and 6 servers in Santa Clara, California). Each trace record includes detailed information such as the request time, the requesting client, the requested object, and the server that handled the request. We extract one week of data, June 21 to June 27, 1998, for the experiment. In particular, to simulate a large-scale web site, we scale the original request volume by 1000x. By aggregating the requests every 30 minutes and setting each record's content to 100 bytes, we obtain the corresponding data volume shown in Fig. 2.

Experiment Setting. In the experiment, we simulate a DSP with 4 datasources corresponding to the servers in the four geographic locations (i.e., Santa Clara, Plano, Herndon and Paris) that served the WorldCup 1998 web site, and a cloud with 12 datacenters in 12 locations corresponding to those of Amazon EC2 in Europe and America (i.e., Ashburn, Dallas, Los Angeles, Miami, Newark, Palo Alto, Seattle, St. Louis, Amsterdam, Dublin, Frankfurt and London) [34]. Five types of VM instances provided by EC2 (i.e., c3.large, c3.xlarge, c3.2xlarge, c3.4xlarge and c3.8xlarge) are considered in this paper. Geographic distances between datacenters and datasources are obtained using the online tool in [35] and can be seen in Fig. 4(b). Link delays are set based on the Round Trip Time (RTT) between the datasources and datacenters, computed from their geographic distance (e.g., RTT (ms) = 0.02 × Distance (km) + 5) [36]. The prices of the VM instances ($p_d^k(t)$) and storage ($s_d(t)$) follow the prices of Amazon EC2 Spot Instances and S3 from their respective web sites [37], [38]. To stimulate consumption, we assume that the more VMs a customer buys from the cloud service providers, the cheaper the unit price is. To simulate how VM prices vary across datacenters, we use the average electricity price of the city where each datacenter is located as a price factor. Table 1 shows the electricity price in each city.
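As a concrete illustration of the two stated rules, the sketch below computes a slot's data volume under the 30-minute aggregation with 100-byte records and 1000x scaling, and the link delay under the RTT model above. The request count and distance are placeholder values for illustration, not figures from the dataset.

RECORD_BYTES = 100
SCALE = 1000          # augment the original request volume by 1000x

def slot_volume_bytes(requests_in_slot):
    # Data volume contributed by one 30-minute aggregation slot.
    return requests_in_slot * SCALE * RECORD_BYTES

def rtt_ms(distance_km):
    # Link delay between a datasource and a datacenter, per the stated model.
    return 0.02 * distance_km + 5

print(slot_volume_bytes(42_000))   # bytes for a slot with 42,000 raw requests
print(rtt_ms(6000))                # an illustrative transatlantic distance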
CHAPTER 6
SYSTEM TESTING
6.1 INTRODUCTION
Testing is the process by which the developer generates a set of test data that gives the maximum probability of finding all types of errors that can occur in the software. Testing is an important phase in the software development cycle. For any newly developed software, primary importance is given to testing the system. It is the responsibility of the developer to detect all possible errors in the software before handing it over to the user or customer.

Being a website, the "ONLINE SHOPPING SYSTEM" has to undergo the following tests:
6.2 UNIT TESTING
Unit testing focuses on the smallest unit of a software component or module. For each module interface, local data structures, boundary conditions, independent paths and all error-handling paths are tested. In this project, there are different modules such as registration, login, and the admin module. Each one is tested separately and the errors are rectified. For example, in a registration form, each element (for example, a text box, list box, radio button and other elements) should be tested.
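As an illustration of such element-level tests, the sketch below shows what a unit test for the registration module might look like. It is written in Python for brevity, and validate_registration is a hypothetical stand-in for the project's actual validation logic.

import unittest

def validate_registration(username, email):
    # Boundary conditions a registration form would enforce (illustrative).
    if not (3 <= len(username) <= 20):
        return False
    return "@" in email and "." in email.split("@")[-1]

class TestRegistrationModule(unittest.TestCase):
    def test_valid_input(self):
        self.assertTrue(validate_registration("alice", "alice@example.com"))

    def test_username_too_short(self):
        # Boundary condition: below the minimum username length.
        self.assertFalse(validate_registration("al", "alice@example.com"))

    def test_malformed_email(self):
        self.assertFalse(validate_registration("alice", "alice-at-example"))

if __name__ == "__main__":
    unittest.main()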
6.3 INTEGRATION TESTING
Integration testing is a systematic technique for uncovering errors associated with interfacing. The program is constructed and tested in small increments. In this project, the login module and registration module are integrated and tested; similarly, all the other modules are integrated and tested.
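A minimal sketch of this incremental integration follows, with a hypothetical in-memory user store standing in for the project's actual database-backed modules; the two modules are exercised together through their shared interface.

import unittest

class UserStore:
    # Shared interface between the registration and login modules.
    def __init__(self):
        self._users = {}

    def register(self, username, password):
        if username in self._users:
            return False
        self._users[username] = password
        return True

    def login(self, username, password):
        return self._users.get(username) == password

class TestRegistrationLoginIntegration(unittest.TestCase):
    def test_registered_user_can_login(self):
        store = UserStore()
        self.assertTrue(store.register("alice", "s3cret"))
        self.assertTrue(store.login("alice", "s3cret"))

    def test_unregistered_user_cannot_login(self):
        store = UserStore()
        self.assertFalse(store.login("bob", "nope"))

if __name__ == "__main__":
    unittest.main()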
6.3.1 REGRESSION TESTING
After the successful integration of modules, regression testing checks for any side effects (i.e., any new errors) that the integration may have introduced in other modules.
6.4 VALIDATION TESTING

Validation testing demonstrates that the software functions in a manner that can reasonably be expected by the customer. It is typically achieved through a series of alpha and beta tests.

6.5 ALPHA AND BETA TESTING

The alpha test is conducted at the developer's site by a customer, with the developer present to record errors and usage problems. The beta test is conducted at one or more end-user sites. Unlike alpha testing, the developer is generally not present. Therefore, the beta test is a "live" application of the software in an environment that cannot be controlled by the developer. The customer records all problems (real or imagined) that are encountered during beta testing and reports them to the developer at regular intervals.

6.6 SYSTEM TESTING

System testing is actually a series of different tests whose primary purpose is to fully exercise the computer-based system.
CHAPTER 7
CONCLUSION & FUTURE ENHANCEMENTS
7.1 CONCLUSION
With the high velocity and high volume of big data generated from geographically dispersed sources, processing big data across geographically distributed datacenters is becoming an attractive and cost-effective strategy for many big data companies and organizations. In this paper, a methodical framework for effective data movement, resource provisioning and reducer selection with the goal of cost minimization is developed. We balance five types of cost between the two MapReduce phases across datacenters: bandwidth cost, storage cost, computing cost, migration cost, and latency cost. This complex cost optimization problem is formulated as a joint stochastic integer nonlinear optimization problem that minimizes the five cost factors simultaneously. By employing the Lyapunov technique, we transform the original problem into three independent subproblems that can be solved by an efficient online algorithm, MiniBDP, which minimizes the long-term time-averaged operation cost. We conduct theoretical analysis to demonstrate the effectiveness of MiniBDP in terms of cost optimality and worst-case delay, and we perform an experimental evaluation using a real-world trace dataset to validate the theoretical results and the superiority of MiniBDP by comparing it with existing typical approaches and an offline method.
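For readers unfamiliar with the technique, the sketch below illustrates the generic Lyapunov drift-plus-penalty step that this family of online algorithms builds on: at each time slot, choose the action minimizing V × cost plus the queue-weighted drift, trading current cost against queue stability. This is a simplified illustration of the pattern, not MiniBDP itself, and all values are toy placeholders.

def drift_plus_penalty_step(queue, actions, cost, service, arrival, V):
    # queue: current backlog; V: cost-vs-delay trade-off parameter.
    def objective(a):
        return V * cost(a) + queue * (arrival(a) - service(a))
    best = min(actions, key=objective)
    # Standard queue update: backlog shrinks by service, grows by arrivals.
    new_queue = max(queue - service(best), 0) + arrival(best)
    return best, new_queue

# Toy usage: two candidate allocations with made-up costs and rates.
actions = ["cheap_slow", "costly_fast"]
cost = {"cheap_slow": 1.0, "costly_fast": 3.0}.get
service = {"cheap_slow": 2.0, "costly_fast": 6.0}.get
arrival = lambda a: 4.0
print(drift_plus_penalty_step(queue=10.0, actions=actions,
                              cost=cost, service=service,
                              arrival=arrival, V=5.0))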
7.2 FUTURE ENHANCEMENTS

One direction is to take data replication into account in the model. Data replication is well known as an effective solution for high availability and high fault tolerance. Given that our goal is cost minimization, introducing data replication would add the additional cost of replicating data across datacenters; this factor is therefore not considered in our current cost minimization algorithm, and we leave it as one of our ongoing research efforts. In addition, we will concentrate on deploying the proposed algorithm in real systems such as Amazon EC2 to further validate its effectiveness.
CHAPTER 8
BIBLIOGRAPHY
8.1 BOOKS
[8] W. Yang, X. Liu, L. Zhang, and L. T. Yang, "Big data real-time processing based on Storm," in Proceedings of the IEEE TrustCom'13, 2013.
[9] Y. Zhang, S. Chen, Q. Wang, and G. Yu, "i2MapReduce: Incremental MapReduce for mining evolving big data," IEEE Transactions on Knowledge and Data Engineering, vol. 27, pp. 1906–1919, 2015.
[10] D. Lee, J. S. Kim, and S. Maeng, "Large-scale incremental processing with MapReduce," Future Generation Computer Systems, vol. 36, no. 7, pp. 66–79, 2014.
[11] B. Heintz, A. Chandra, R. K. Sitaraman, and J. Weissman, "End-to-end optimization for geo-distributed MapReduce," IEEE Transactions on Cloud Computing, 2014.
[12] C. Jayalath, J. Stephen, and P. Eugster, "From the cloud to the atmosphere: Running MapReduce across data centers," IEEE Transactions on Computers, vol. 63, no. 1, pp. 74–87, 2014.
[13] P. Li, S. Guo, S. Yu, and W. Zhuang, "Cross-cloud MapReduce for big data," IEEE Transactions on Cloud Computing, 2015, DOI: 10.1109/TCC.2015.2474385.
[14] A. Sfrent and F. Pop, “Asymptotic scheduling for many task computing in big
data platforms,” Information Sciences, vol. 319, pp. 71–91, 2015.
[15] L. Zhang, Z. Li, C. Wu, and M. Chen, “Online algorithms for uploading
deferrable big data to the cloud,” in Proceedings of the IEEE INFOCOM, 2014, pp.
2022–2030.
[16] Q. Zhang, L. Liu, A. Singh et al., "Improving Hadoop service provisioning in a geographically distributed cloud," in Proceedings of IEEE Cloud'14, 2014.
[17] A. Vulimiri, C. Curino, P. B. Godfrey, K. Karanasos, and G. Varghese,
“Wanalytics: Analytics for a geo-distributed data-intensive world,” in Proceedings
of the CIDR’15, 2015.
[18] K. Kloudas, M. Mamede, N. Preguica, and R. Rodrigues, “Pixida: Optimizing
data parallel jobs in wide-area data analytics,” Proceedings of the VLDB
Endowment, vol. 9, no. 2, pp. 72–83, 2015.
[19] Q. Pu, G. Ananthanarayanan, P. Bodik, S. Kandula, A. Akella, P. Bahl, and I.
Stoica, “Low latency geo-distributed data analytics,” in Proceedings of the ACM
SIGCOMM’15, 2015.
[20] “Facebook’s prism project,” http://www.wired.com/wiredenterprise/2012/
08/facebook-prism/.
[21] J. C. Corbett, J. Dean, M. Epstein et al., "Spanner: Google's globally-distributed database," in Proceedings of the OSDI'12, 2012.
[22] “Connecting geographically dispersed datacenters,” HP, 2015.
[23] "Interconnecting geographically dispersed data centers using VPLS: Design and system assurance guide," Cisco Systems, Inc., 2009.
[24] L. Tassiulas and A. Ephremides, “Stability properties of constrained queueing
systems and scheduling policies for maximum throughput in multihop radio
networks,” IEEE Transactions on Automatic Control, vol. 37, no. 12, pp. 1936–
1948, 1992.
[25] R. Urgaonkar, U. C. Kozat, K. Igarashi, and M. J. Neely, "Dynamic resource
allocation and power management in virtualized data centers,” in Proceedings of the
IEEE NOMS, 2010, pp. 479–486.
APPENDIX A
SOFTWARE DESCRIPTION
VISUAL STUDIO
The most basic edition of Visual Studio, the Community edition, is available free of charge. The slogan for the Visual Studio Community edition is "Free, fully-featured IDE for students, open-source and individual developers".
Visual Studio does not support any programming language, solution or tool intrinsically; instead, it allows the plugging in of functionality coded as a VSPackage. When installed, the functionality is available as a Service. The IDE provides three services: SVsSolution, which provides the ability to enumerate projects and solutions; SVsUIShell, which provides windowing and UI functionality (including tabs, toolbars, and tool windows); and SVsShell, which deals with registration of VSPackages. In addition, the IDE is also responsible for coordinating and enabling communication between services. All editors, designers, project types and other tools are implemented as VSPackages. Visual Studio uses COM to access the VSPackages. The Visual Studio SDK also includes the Managed Package Framework (MPF), which is a set of managed wrappers around the COM interfaces that allow packages to be written in any CLI-compliant language. However, MPF does not provide all the functionality exposed by the Visual Studio COM interfaces. The services can then be consumed for the creation of other packages, which add functionality to the Visual Studio IDE.
The Visual Studio code editor also supports setting bookmarks in code for quick navigation. Other navigational aids include collapsing code blocks and incremental search, in addition to normal text search and regex search. The code editor also includes a multi-item clipboard and a task list. The code editor supports code snippets, which are saved templates for repetitive code that can be inserted into code and customized for the project being worked on. A management tool for code snippets is built in as well. These tools are surfaced as floating windows which can be set to automatically hide when unused or docked to the side of the screen. The Visual Studio code editor also supports code refactoring, including parameter reordering, variable and method renaming, interface extraction, and encapsulation of class members inside properties, among others.
DEBUGGER

DESIGNER
APPENDIX-B
SAMPLE SOURCE CODE
APPENDIX-C
SCREENSHOTS:
SAMPLE OUTPUT: