Project in Computer IV: Submitted By: Paunil Bruce Ronald Roy S. Submitted To: Ms. Jacquelyn Pasturan
2013-2014
History
Zuckerberg wrote a program called Facemash on October 28, 2003, while attending Harvard as a sophomore. According to The Harvard Crimson, the site was comparable to Hot or Not and "used photos compiled from the online facebooks of nine houses, placing two next to each other at a time and asking users to choose the 'hotter' person". To accomplish this, Zuckerberg hacked into protected areas of Harvard's computer network and copied private dormitory ID images. Harvard did not have a student "Facebook" (a directory with photos and basic information) at the time, although individual houses had been issuing their own paper facebooks since the mid-1980s. Facemash attracted 450 visitors and 22,000 photo-views in its first four hours online. The site was quickly forwarded to several campus group list-servers, but was shut down a few days later by the Harvard administration. Zuckerberg faced expulsion and was charged by the administration with breach of security, violating copyrights, and violating individual privacy. Ultimately, the charges were dropped. Zuckerberg expanded on this initial project that semester by creating a social study tool ahead of an art history final. He uploaded 500 Augustan images to a website, and each image was featured with a corresponding comments section. He shared the site with his classmates and people started sharing notes.
In January 2004, the following semester, Zuckerberg began writing code for a new website. He said he was inspired by an editorial about the Facemash incident in The Harvard Crimson. On February 4, 2004, Zuckerberg launched "Thefacebook", originally located at thefacebook.com. Six days after the site launched, three Harvard seniors (Cameron Winklevoss, Tyler Winklevoss, and Divya Narendra) accused Zuckerberg of intentionally misleading them into believing he would help them build a social network called HarvardConnection.com. They claimed he was instead using their ideas to build a competing product.
The three complained to The Harvard Crimson, and the newspaper began an investigation. They later filed a lawsuit against Zuckerberg, subsequently settling in 2008 for 1.2 million shares (worth $300 million at Facebook's IPO). Membership was initially restricted to students of Harvard College; within the first month, more than half the undergraduates at Harvard were registered on the service. Eduardo Saverin (business aspects), Dustin Moskovitz (programmer), Andrew McCollum (graphic artist), and Chris Hughes joined Zuckerberg to help promote the website. In March 2004, Facebook expanded to the universities of Columbia, Stanford, and Yale. It later opened to all Ivy League colleges, Boston University, New York University, MIT, and gradually most universities in Canada and the United States.[31][32] In mid-2004, entrepreneur Sean Parker (an informal advisor to Zuckerberg) became the company's president. In June 2004, Facebook moved its operations base to Palo Alto, California. It received its first investment later that month from PayPal co-founder Peter Thiel. In 2005, the company dropped "the" from its name after purchasing the domain name facebook.com for $200,000.
Database/s
MYSQL
Facebook primarily uses MySQL for structured data storage such as wall posts, user information, and the timeline. This data is replicated between its various data centers.
MEMCACHED
It is also important to note that Facebook makes heavy use of Memcached, a memory caching system that is used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce reading time. Memcached is Facebook's primary form of caching and greatly reduces the database load. Having a caching system allows Facebook to be as fast as it is at recalling your data. If it does not have to go to the database, it will just fetch your data from the cache based on your user ID.
HAYSTACK
The Photos application is one of Facebook's most popular features. To date, users have uploaded over 15 billion photos, which makes Facebook the biggest photo sharing website. For each uploaded photo, Facebook generates and stores four images of different sizes, which translates to a total of 60 billion images and 1.5PB of storage. The current growth rate is 220 million new photos per week, which translates to 25TB of additional storage consumed weekly. Facebook implements an HTTP-based photo server which stores photos in a generic object store called Haystack.
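The figures above can be checked with simple arithmetic. The short Python sketch below assumes decimal units (1 TB = 10^12 bytes), which the text does not state, and the variable names are chosen only for this example.

```python
TB = 10**12  # assumption: decimal terabytes

photos_uploaded = 15_000_000_000
sizes_per_photo = 4
stored_images = photos_uploaded * sizes_per_photo   # four images per photo

total_storage_bytes = 1_500 * TB                    # 1.5 PB
avg_kb_per_stored_image = total_storage_bytes / stored_images / 1000

new_photos_per_week = 220_000_000
new_storage_bytes_per_week = 25 * TB
# Storage used by one newly uploaded photo, counting all four sizes.
kb_per_new_photo = new_storage_bytes_per_week / new_photos_per_week / 1000

print(stored_images)                 # 60000000000
print(avg_kb_per_stored_image)       # 25.0
print(round(kb_per_new_photo, 1))    # 113.6
```

Under these assumptions, each stored image averages about 25 KB, and each new photo adds roughly 114 KB across its four sizes.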
CASSANDRA
The Apache Cassandra database is the right choice when you need scalability and high availability without
SCRIBE
Scribe is a flexible logging system that Facebook uses for a multitude of purposes internally. It has been built to handle logging at the scale of Facebook, and it automatically handles new logging categories as they show up.
VARNISH
Varnish is an HTTP accelerator which can act as a load balancer and also cache content which can then be served
lightning-fast. Facebook uses Varnish to serve photos and profile pictures, handling billions of requests every day.
source in early 2010. To date, Facebook has achieved more than a 6x reduction in CPU utilization for the site using HipHop as compared with Apache and Zend PHP. Facebook is able to move fast and maintain a high number of engineers who are able to work across the entire codebase.
History
Google began in March 1996 as a research project by Larry Page and Sergey Brin, Ph.D. students at Stanford University[2] who were working on the Stanford Digital Library Project (SDLP). The SDLP's goal was "to develop the enabling technologies for a single, integrated and universal digital library" and was funded through the National Science Foundation, among other federal agencies.[3][4][5][6] In search of a dissertation theme, Page considered, among other things, exploring the mathematical properties of the World Wide Web, understanding its link structure as a huge graph.[7] His supervisor, Terry Winograd, encouraged him to pick this idea (which Page later recalled as "the best advice I ever got"[8]) and Page focused on the problem of finding out which web pages link to a given page, based on the consideration that the number and nature of such backlinks was valuable information for an analysis of that page (with the role of citations in academic publishing an additional consideration).[7] In his research project, nicknamed "BackRub", Page was soon joined by Brin, who was supported by a National Science Foundation Graduate Fellowship.[3] Brin was already a close friend, whom Page had first met in the summer of 1995; Page was part of a group of potential new students that Brin had volunteered to show around the campus.[7] Page's web crawler began exploring the web in March 1996, with Page's own Stanford home page serving as the only starting point.[7] To convert the backlink data that it gathered for a given web page into a measure of importance, Brin and Page developed the PageRank algorithm.[7] While analyzing BackRub's output (which, for a given URL, consisted of a list of backlinks ranked by importance), the pair realized that a search engine based on PageRank would produce better results than existing techniques (existing search engines at the time essentially ranked results according to how many times the search term appeared on a page).[7][9]
A small search engine called "RankDex" from IDD Information Services (a subsidiary of Dow Jones), designed by Robin Li, had been exploring a similar strategy for site-scoring and page ranking since 1996.[10] The technology in RankDex would be patented[11] and used later when Li founded Baidu in China.[12][13] Convinced that the pages with the most links to them from other highly relevant Web pages must be the most relevant pages associated with the search, Page and Brin tested their thesis as part of their studies, and laid the foundation for their search engine. By early 1997, the BackRub page described the state as follows:[14]
Some Rough Statistics (from August 29th, 1996)
Total indexable HTML urls: 75.2306 Million
Total content downloaded: 207.022 gigabytes
BackRub is written in Java and Python and runs on several Sun Ultras and Intel Pentiums running Linux. The primary database is kept on an Sun Ultra II with 28GB of disk. Scott Hassan and Alan Steremberg have provided a great deal of very talented implementation help. Sergey Brin has also been very involved and deserves many thanks.
Originally the search engine used the Stanford website with the domain google.stanford.edu. The domain google.com was registered on September 15, 1997. They formally incorporated their company, Google, on September 4, 1998, in a friend's garage in Menlo Park, California.
Database/s
Bigtable
A Distributed Storage System for Structured Data
Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.
Architecture
Bigtable is not a relational database. It supports neither joins nor rich SQL-like queries. Each table is a multidimensional sparse map. Tables consist of rows and columns, and each cell has a time stamp. There can be multiple versions of a cell with different time stamps. The time stamp allows for operations such as "select 'n' versions of this Web page" or "delete cells that are older than a specific date/time." In order to manage the huge tables, Bigtable splits tables at row boundaries and saves them as tablets. A tablet is around 200 MB, and each machine saves about 100 tablets. This setup allows tablets from a single table to be spread among many servers. It also allows for fine-grained load balancing. If one tablet is receiving many queries, its machine can shed other tablets or move the busy tablet to another machine that is not so busy. Also, if a machine goes down, its tablets may be spread across many other servers so that the performance impact on any given machine is minimal. Tables are stored as immutable SSTables and a tail of logs (one log per machine). When a machine runs out of system memory, it compresses some tablets using Google proprietary compression techniques (BMDiff and Zippy). Minor compactions involve only a few tablets, while major compactions involve the whole table system and recover hard-disk space.
Some features
- fast and extremely large-scale DBMS
- a sparse, distributed, multi-dimensional sorted map, sharing characteristics of both row-oriented and column-oriented databases
- designed to scale into the petabyte range
- works across hundreds or thousands of machines
- easy to add more machines to the system, which automatically starts taking advantage of those resources without any reconfiguration
- each table has multiple dimensions (one of which is a field for time, allowing versioning)
- tables are optimized for GFS (Google File System) by being split into multiple tablets, which are segments of the table split along a row chosen so that each tablet is about 200 megabytes in size
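The idea of a sparse map whose cells carry time stamps can be sketched in Python. This is only a toy model to illustrate the two operations quoted in this section ("select 'n' versions" and "delete cells older than a date"). The class VersionedTable and its method names are made up for this example and are not Bigtable's real API, and plain integers stand in for time stamps.

```python
from collections import defaultdict


class VersionedTable:
    """Toy sparse map: (row, column) -> {timestamp: value}."""

    def __init__(self):
        # Only cells that were actually written take up space (sparse).
        self.cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        """Store one version of a cell under the given timestamp."""
        self.cells[(row, column)][timestamp] = value

    def latest(self, row, column, n=1):
        """Return up to n (timestamp, value) pairs, newest first."""
        versions = self.cells.get((row, column), {})
        return sorted(versions.items(), reverse=True)[:n]

    def delete_older_than(self, cutoff):
        """Drop every cell version with a timestamp before cutoff."""
        for key in list(self.cells):
            kept = {ts: v for ts, v in self.cells[key].items() if ts >= cutoff}
            if kept:
                self.cells[key] = kept
            else:
                del self.cells[key]


table = VersionedTable()
table.put("com.example/index", "contents", 1, "<html>v1</html>")
table.put("com.example/index", "contents", 2, "<html>v2</html>")
table.put("com.example/index", "contents", 3, "<html>v3</html>")

# "Select 2 versions of this Web page": the two newest versions come back.
print(table.latest("com.example/index", "contents", n=2))

# "Delete cells that are older than a specific time": only version 3 survives.
table.delete_older_than(3)
print(table.latest("com.example/index", "contents", n=5))
```

A real Bigtable also keeps rows sorted and splits them into tablets across machines, which this small sketch leaves out.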
My observation is that the two companies use databases differently. Facebook uses many databases because it separates the storage for photos, statuses, likes, friends, chat history, and so on, which I think keeps the data organized. Google, on the other hand, is covered here by only one database, Bigtable, which it made itself. It is amazing that Google made its own database, which it can upgrade and repair by itself when something breaks. I do not think the two can be compared directly, because they use databases for different purposes.
Recommendation
I recommend both approaches: using a self-made database, as Google does, and dividing the data into separate databases, as Facebook does. I think it is faster to search for and find files when the data is divided, and it is easier to do maintenance when you made the database yourself, because you can fix it and upgrade it as new devices and chips appear.
Conclusion
I conclude that the databases Google and Facebook use are each the best choice for the needs they serve. What you need is a database that is organized and self-made.