
Python Problem Statement

This document provides a problem statement and dataset information for analyzing YouTube video statistics. The goal is to perform exploratory data analysis on daily trending YouTube video data from November 2017 to March 2018. The dataset includes variables like video ID, category, title, tags, descriptions, counts of trends, comments, likes, dislikes, and views. The scope involves sentiment analysis, video categorization, comment generation, and analyzing popularity factors. The learning outcome is gaining insights into variable relationships through exploratory data analysis.

Uploaded by

sreekanth
Copyright
© All Rights Reserved

Python Problem Statement

YouTube Video Statistics

Abstract:
YouTube (the world-famous video sharing website) maintains a list of the top trending
videos on the platform. According to Variety magazine, to determine the year's
top-trending videos, YouTube uses a combination of factors, including measures of user
interactions (number of views, shares, comments and likes). Note that these are not the
most-viewed videos overall for the calendar year.

Problem Statement:
Read the YouTube data and perform exploratory data analysis (EDA).
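
A minimal sketch of a first pass at this task, assuming the data is available as a CSV file and is loaded with pandas. The inline sample rows are invented for illustration, the function name first_look is hypothetical, and the column names follow the variable description in this document; the real file's name and exact headers may differ.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file; all values are invented.
SAMPLE_CSV = """Video_id,Category_id,Likes,Dislikes,Comment_count,Views
a1,10,1200,30,150,50000
b2,24,300,12,,9000
c3,10,9800,400,2100,410000
"""


def first_look(source):
    """Load the data and return basic EDA summaries as a dict."""
    df = pd.read_csv(source)
    return {
        "shape": df.shape,                     # (rows, columns)
        "missing": df.isna().sum().to_dict(),  # missing values per column
        "summary": df.describe(),              # statistics for numeric columns
    }


report = first_look(io.StringIO(SAMPLE_CSV))
print(report["shape"])
print(report["missing"])
print(report["summary"])
```

With the real data, the io.StringIO object would be replaced by the path to the dataset file.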

Dataset Information:
This dataset is a daily record of the top trending YouTube videos: the top 200 trending
videos for each day. The original data was collected between 14th November 2017 and 5th
March 2018 (though data for January 10th and 11th of 2018 is missing). The original
dataset was collected using the YouTube API.
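
As a rough sanity check on the size of the original data, the collection window can be counted with the standard library. This sketch assumes the two missing days fall in January 2018 (inside the collection window) and that exactly 200 videos were recorded on every other day; the names below are illustrative only.

```python
from datetime import date, timedelta

START = date(2017, 11, 14)
END = date(2018, 3, 5)
# Assumption: the two missing days are in January 2018, inside the window.
MISSING = {date(2018, 1, 10), date(2018, 1, 11)}
VIDEOS_PER_DAY = 200


def covered_days(start, end, missing):
    """Return every date from start to end inclusive, skipping missing days."""
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in all_days if d not in missing]


days = covered_days(START, END, MISSING)
print(len(days))                   # days with data
print(len(days) * VIDEOS_PER_DAY)  # upper bound on daily records
```

Under these assumptions the window covers 110 days, which gives at most 22,000 daily records. A file aggregated per video (as the Trend_day_count column suggests) may contain fewer rows.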

Variable Description:

Column                  Description

Video_id                Unique identifier of each video
Category_id             Unique number assigned to each category of video
Channel_title           Title of the channel that published the video
Subscriber              Count of all subscribers for the respective video
Title                   Title of the video
Tags                    Descriptive keywords added to a video to help viewers find
                        the content
Description             Description of the respective video
Trend_day_count         Count of days the respective video was trending
Tag_count               Tag count for the respective video
Trend_tag_count         Tag count for the respective trending video
Comment_count           Count of comments for the particular video
Comment_disabled        Boolean value. True represents comments are enabled and
                        False represents comments are disabled
Like dislike disabled   Boolean value. True represents likes and dislikes are
                        enabled and False represents likes and dislikes are disabled
Likes                   Number of likes for the particular video
Dislikes                Number of dislikes for the particular video
Tag appeared in title   Boolean value. True represents that the respective tag
                        appears in the title of the video. False represents that
                        it does not.
Views                   Target variable. Number of views for the particular video

Scope:
● Sentiment analysis in a variety of forms
● Categorising YouTube videos based on their comments and statistics.
● Training ML algorithms like RNNs to generate their own YouTube comments.
● Analysing what factors affect how popular a YouTube video will be.
● Statistical analysis over time
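
One simple starting point for the popularity question is to rank the numeric columns by their correlation with the target variable Views. The sketch below uses pandas and a small invented data frame; the values are made up, the function name rank_by_correlation is hypothetical, and the column names are assumed to match the variable description. Pearson correlation only captures linear association, so it is a first screen rather than a full analysis.

```python
import pandas as pd

# Invented values for illustration only.
df = pd.DataFrame({
    "Likes": [120, 900, 4500, 15000, 300],
    "Dislikes": [10, 40, 200, 900, 25],
    "Comment_count": [15, 80, 600, 2500, 30],
    "Tag_count": [12, 5, 20, 8, 15],
    "Views": [4000, 30000, 160000, 520000, 11000],
})


def rank_by_correlation(frame, target="Views"):
    """Pearson correlation of each column with the target, strongest first."""
    corr = frame.corr()[target].drop(target)
    # Sort by absolute value so strong negative correlations also rank high.
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


ranking = rank_by_correlation(df)
print(ranking)
```

On real data, skewed counts such as views and likes are often log-transformed before computing correlations, and a high correlation does not by itself show that one variable drives the other.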

Learning Outcome:
Students will gain a better understanding of how the variables relate to each other
and how the EDA approach helps them draw more insight and knowledge from the
available data.
