Video Streaming Web
Contents
1 Introduction
2 System Architecture
4 Normalisation
1. Introduction
YouTube is an American online video-sharing platform headquartered in San Bruno, California.
Three former PayPal employees, Chad Hurley, Steve Chen, and Jawed Karim, created the service in
February 2005. Google bought the site in November 2006 for US$1.65 billion; YouTube now operates
as one of Google's subsidiaries.
Since YouTube operates at such a massive scale, we used it as the inspiration for this project.
YouTube's database schema is among the most complex in production use today, serving billions
of users. YouTube allows users to upload, view, rate, share,
add to playlists, report, comment on videos, and subscribe to other users. It offers a wide variety of
user-generated and corporate media videos. Available content includes video clips, TV show clips,
music videos, short and documentary films, audio recordings, movie trailers, live streams, and other
content such as video blogging, short original videos, and educational videos. Most content on
YouTube is uploaded by individuals, but media corporations including CBS, the BBC, Vevo, and Hulu
offer some of their material via YouTube as part of the YouTube partnership program. Unregistered
users can only watch (but not upload) videos on the site, while registered users are also permitted
to upload an unlimited number of videos and add comments to videos. Videos that are
age-restricted are available only to registered users affirming themselves to be at least 18 years old.
YouTube and selected creators earn advertising revenue from Google AdSense, a program that
targets ads according to site content and audience. The vast majority of its videos are free to view,
but there are exceptions, including subscription-based premium channels, film rentals, as well as
YouTube Music and YouTube Premium, subscription services respectively offering premium and
ad-free music streaming, and ad-free access to all content, including exclusive content
commissioned from notable personalities. As of February 2017, there were more than 400 hours of
content uploaded to YouTube each minute, and one billion hours of content being watched on
YouTube every day. As of August 2018, the website was ranked as the second-most popular site in the
world, according to Alexa Internet, just behind Google. As of May 2019, more than 500 hours of
video content were uploaded to YouTube every minute. Based on reported quarterly advertising
revenue, YouTube is estimated to have US$15 billion in annual revenues.
2. System Architecture
Since YouTube's original architecture is complex and implementing it is out of scope for this
project, we discuss the differences between the two systems.
2.1.1 Statistics
• 4 billion views a day
• The number of videos has gone up 9 orders of magnitude and the number of developers has
only gone up two orders of magnitude.
• Linux - The benefit of Linux is there’s always a way to get in and see how your system is
behaving. No matter how bad your app is behaving, you can take a look at it with Linux tools.
• MySQL - is used a lot. When you watch a video, you are getting data from MySQL.
Sometimes it is used as a relational database, sometimes as a blob store. It's about tuning and
making choices about how you organize your data.
These are some of the technologies YouTube uses in its system, along with other templating
engines that keep the app up and running.
2.2 Proposed system
2.2.1 Features
In our implementation of the popular video streaming service, we have incorporated the following
features into the web application.
1. Logged-in users must be able to upload and delete videos.
2. Logged-in users must be able to add comments to videos.
3. Users must be able to watch videos whether or not they are logged in.
4. Users must be able to search for videos, users, and groups whether or not they are logged in.
7. Users can like or dislike videos; the system keeps counts of likes, dislikes, comments, and
views and displays these numbers to users.
9. Users can see trending videos, ranked by number of views.
10. Users can subscribe to channels and have a separate page listing the videos uploaded to the
channels they subscribe to.
The application is built with the following technologies:
• React - A lightweight JavaScript library for building the user interface from reusable
components. By default, React DOM escapes values embedded in JSX before rendering them,
which helps guard against cross-site scripting (XSS) attacks.
• MongoDB - The document database used by the application. Unlike the original YouTube
architecture, which relies on MySQL, our application keeps its data in MongoDB.
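The following minimal in-memory sketch covers the counters and views behind features 7, 9 and 10 listed above. It is written in Python for illustration and is not the application's React/MongoDB code. The `Video` class, its field names and the helper functions are hypothetical. The sketch does not prevent a user from liking the same video twice.

```python
from dataclasses import dataclass, field


@dataclass
class Video:
    video_id: str
    channel_id: str  # hypothetical: channel the video was uploaded to
    title: str
    views: int = 0
    likes: int = 0
    dislikes: int = 0
    comments: list[str] = field(default_factory=list)


def record_view(video: Video) -> None:
    """Feature 7: count one more view each time the video is watched."""
    video.views += 1


def react(video: Video, like: bool) -> None:
    """Feature 7: count a like or a dislike (one reaction per call)."""
    if like:
        video.likes += 1
    else:
        video.dislikes += 1


def add_comment(video: Video, text: str) -> None:
    """Features 2 and 7: store the comment; its count is len(video.comments)."""
    video.comments.append(text)


def trending(videos: list[Video], k: int = 10) -> list[Video]:
    """Feature 9: the k most-viewed videos, highest view count first."""
    return sorted(videos, key=lambda v: v.views, reverse=True)[:k]


def subscription_feed(videos: list[Video], subscribed: set[str]) -> list[Video]:
    """Feature 10: videos uploaded to any channel the user subscribes to."""
    return [v for v in videos if v.channel_id in subscribed]


if __name__ == "__main__":
    intro = Video("v1", "c1", "Intro")
    demo = Video("v2", "c2", "Demo")
    for _ in range(3):
        record_view(demo)
    record_view(intro)
    react(intro, like=True)
    add_comment(intro, "Nice!")
    print([v.video_id for v in trending([intro, demo], k=2)])  # ['v2', 'v1']
    print([v.video_id for v in subscription_feed([intro, demo], {"c1"})])  # ['v1']
```

In a deployed system these counters would live in the database rather than in Python objects. For example, a MongoDB `$inc` update would increment a view count atomically.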
3. Entity Relationship Diagram
Entities may be characterized not only by relationships, but also by additional properties
(attributes), which include identifiers called "primary keys". Diagrams created to represent
attributes as well as entities and relationships may be called entity-attribute-relationship diagrams,
rather than entity–relationship models.
ER/data models are traditionally built at two or three levels of abstraction: conceptual, logical,
and physical. Note that this conceptual-logical-physical hierarchy is also used in other kinds of
specification, and is different from the three-schema approach to software engineering.
3.2 Entities and Relationships in our App
3.3 Class Diagram for Online Streaming Platform
3.4 Use Case Diagram
3.5 Conversion of ER Diagram to Relational Schema
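The relational schema that results from the conversion can be sketched as SQL DDL. Here it is executed through Python's built-in `sqlite3` module so the constraints can be checked. The columns follow the functional dependencies listed in Section 4. The following are assumptions, not taken from the original design:

- the snake_case names;
- the `users.username` column;
- modelling a channel as the user who owns it, so `channel_id` references `users`.

```python
import sqlite3

# Table layout follows the functional dependencies in Section 4. Assumed:
# snake_case names, the users.username column, and channel_id referencing
# users (a channel is modelled as the user who owns it).
SCHEMA = """
CREATE TABLE users (
    user_id      TEXT PRIMARY KEY,
    username     TEXT NOT NULL
);
CREATE TABLE videos (
    video_id     TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    description  TEXT,
    path         TEXT NOT NULL,
    user_id      TEXT NOT NULL REFERENCES users(user_id),
    upload_date  TEXT NOT NULL
);
CREATE TABLE comments (
    comment_id   TEXT PRIMARY KEY,
    comment_text TEXT NOT NULL,
    user_id      TEXT NOT NULL REFERENCES users(user_id),
    video_id     TEXT NOT NULL REFERENCES videos(video_id),
    commented_at TEXT NOT NULL
);
CREATE TABLE subscriptions (
    subscription_id TEXT PRIMARY KEY,
    user_id         TEXT NOT NULL REFERENCES users(user_id),
    channel_id      TEXT NOT NULL REFERENCES users(user_id)
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the four relations in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = create_schema()
    conn.execute("INSERT INTO users VALUES ('u1', 'alice')")
    conn.execute(
        "INSERT INTO videos VALUES "
        "('v1', 'Intro', NULL, '/videos/v1.mp4', 'u1', '2021-01-01')"
    )
    try:
        # Rejected: the comment refers to a video that does not exist.
        conn.execute(
            "INSERT INTO comments VALUES ('c1', 'Hi', 'u1', 'v404', '2021-01-02')"
        )
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

Each foreign key mirrors an ID attribute on the right-hand side of a functional dependency. The database can therefore reject dangling references such as a comment on a deleted video.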
4. Normalisation
4.1 Introduction
Normalization is the process of organizing the data in a database. It is used to minimize
redundancy in a relation or set of relations. Normalization divides larger tables into smaller ones
and links them through relationships. Normal forms are used to reduce redundancy in database
tables.
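The following small illustration shows the redundancy that normalisation removes. A flat table repeats each video's title on every comment row, so it is decomposed into a videos relation and a comments relation linked by the video ID. The rows and field names are hypothetical, and Python dictionaries stand in for tables.

```python
# Hypothetical flat table: every comment row repeats its video's title,
# so renaming a video means updating each of its comment rows.
flat_rows = [
    {"comment_id": "c1", "text": "Nice!", "video_id": "v1", "video_title": "Intro"},
    {"comment_id": "c2", "text": "Thanks", "video_id": "v1", "video_title": "Intro"},
    {"comment_id": "c3", "text": "Cool", "video_id": "v2", "video_title": "Demo"},
]


def decompose(rows):
    """Split into a videos relation keyed by video_id and a comments relation
    that refers to it, so each title is stored exactly once. Assumes the
    dependency video_id -> video_title holds (one title per video)."""
    videos = {r["video_id"]: {"video_title": r["video_title"]} for r in rows}
    comments = [
        {"comment_id": r["comment_id"], "text": r["text"], "video_id": r["video_id"]}
        for r in rows
    ]
    return videos, comments


def rejoin(videos, comments):
    """Join the two relations on video_id to rebuild the original rows."""
    return [{**c, **videos[c["video_id"]]} for c in comments]


videos, comments = decompose(flat_rows)
print(len(videos))  # 2: each title is now stored once
print(rejoin(videos, comments) == flat_rows)  # True: no information was lost
```

The join recovers the original rows because `video_id` is the key of the videos relation. This is the property that makes such a decomposition lossless.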
The functional dependencies in our relations are:
• Video ID → Video Title, Video Description, Video Path, User ID, Upload Date
• Comment ID → Comment Text, User ID, Video ID, Comment Date/Time
• Subscription ID → User ID, Channel ID
• All attribute values in all relations are atomic, so the relations are in First Normal Form.
• Since the key of every relation is a single attribute, no relation has a partial functional
dependency, so the relations are in Second Normal Form.
• No non-prime attribute is transitively dependent on the key in any relation, so the relations are
in Third Normal Form.
• In every functional dependency X → A in each relation schema, X is a superkey of that relation,
so the relations are in Boyce-Codd Normal Form.
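The Boyce-Codd check in the last point can be made mechanical with the standard attribute-closure algorithm. A relation is in BCNF if, for every non-trivial dependency X → A stated for it, the closure of X contains every attribute of the relation. The Python sketch below applies this to the Video relation from the dependencies above; the snake_case attribute names are our own. For contrast, it also tests a hypothetical flat comments table, which fails.

```python
def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derivable, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(relation, fds):
    """True if the left side of every non-trivial FD is a superkey of `relation`."""
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        if not closure(lhs, fds) >= relation:
            return False
    return True


VIDEO = frozenset({"video_id", "title", "description", "path", "user_id", "upload_date"})
VIDEO_FDS = [(frozenset({"video_id"}), VIDEO - {"video_id"})]

# Hypothetical flat table that mixes comment and video attributes.
FLAT = frozenset({"comment_id", "text", "video_id", "video_title"})
FLAT_FDS = [
    (frozenset({"comment_id"}), frozenset({"text", "video_id"})),
    (frozenset({"video_id"}), frozenset({"video_title"})),  # video_id is not a key of FLAT
]

print(is_bcnf(VIDEO, VIDEO_FDS))  # True
print(is_bcnf(FLAT, FLAT_FDS))    # False
```

Checking only the listed dependencies is sufficient when they are stated for the relation itself. Dependencies projected onto a sub-relation would require reasoning over the full closure.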
5. Implementation and Screenshots
Figure 5.10: Your watch history
6. Conclusion
Designing and implementing a large video streaming service from the ground up taught us a great
deal, and this project helped us understand how its various parts fit together. Working through a
complex database design, and deciding how to normalise it while still keeping latency low, was
challenging and thought-provoking.