YouTube_Trending_Video_Spark_Data_Analysis

Exploratory Data Analysis of YouTube Trending Videos (2020-21)

Ramesh Chandra Vuppala, Srihari Das S G, Animesh Kumar, Ronak Dedhiya Dinesh

Email: {rameshcv, sriharidas, animeshkumar, ronakdedhiya}@iisc.ac.in

What are YouTube Trending Videos?

YouTube is the most popular video platform in the world, and it maintains a list of trending videos. Google describes the trending list as follows: “Trending helps viewers see what’s happening on YouTube and in the world. Trending aims to surface videos that a wide range of viewers would find interesting. Some trends are predictable, like a new song from a popular artist or a new movie trailer. The list of trending videos is updated roughly every 15 minutes”.

Goals and Motivation

We present an exploratory data analysis of YouTube trending videos in the US during 2020-21. The goal is to extract key insights from trending videos and to find interesting facts and patterns by exploring the data with effective visualizations. Titles, descriptions, thumbnails, tags, views, likes/dislikes, and comments were analyzed to produce the results shown in this document. All trending videos for the period August 2020 to October 2021 were analyzed (more than 90,000 trending entries).

We also compare trending videos across multiple countries (US vs Great Britain vs India), looking at which videos trend in more than one country and at their distribution across video categories.

We also present a video recommendation method based on the trending video currently being watched. One may think that having many options is a good thing, as opposed to having very few, but an excess of options can lead to what is known as “decision paralysis”. Recommendation systems make the choice simpler, which is why they have become a crucial component of platforms where users have a myriad of options available. Their success depends heavily on their ability to narrow down the set of options, making it easier for users to make a choice.

Technical Problem Being Solved

The exploratory data analysis above helps people understand which categories of videos viewers are interested in, country by country. By showing the correlation between various video metrics, it also helps estimate how long a video may take to trend after publishing and how much advertisement may be needed to make it trend. It helps creators prepare TAGS, in terms of both the number of words and the choice of words, so that a video can trend faster. The proposed solution also recommends videos based on the currently watched video, its category, and its TAGS.

Architecture Details

Tools Used

This analysis was performed using Python and a set of Python libraries including Spark DataFrames, Spark ML, Pandas, Matplotlib, Seaborn, and WordCloud. The analysis was run in a Google Colab notebook.

Data Source

The data used in this analysis is a freely available dataset from Kaggle: https://www.kaggle.com/rsrishav/youtube-trending-video-dataset/version/430.

Github: Link

Data Size and Description

This analysis mainly considers trending videos from the US. There are ~92K trending records in total for the period Aug 2020 – Nov 2021 (15 months), because the same video can trend on multiple days; among them are ~16K unique videos. Trending videos from Great Britain and India for the same period are also used to compare the distribution across video categories.

The following fields are available for each video:

Video_id – Unique ID of the video.

Title – Name of the video.

PublishedAt – Date and time when the video was uploaded.

ChannelTitle – Name of the channel that uploaded the video.

Trending date – Date on which the video was trending. The same video can trend on multiple days, so it can appear on different days with different numbers of views, comments, likes, etc.

Tags – Tags used for the video.

The dataset also includes likes, dislikes, views, comments, and the category of each video, along with a few more fields.
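A minimal plain-Python sketch of the difference between trending records and unique videos is shown below. It is not the Spark code used in this analysis: the CSV column names (`video_id`, `publishedAt`, `channelTitle`, `trending_date`, `tags`, `view_count`, `likes`, `dislikes`, `comment_count`) are assumptions about the Kaggle file's layout, and the three rows are made up.

```python
import csv
import io

# Hypothetical three-row excerpt in the assumed Kaggle CSV layout (made-up values).
# Column names are assumptions; check them against the downloaded file.
SAMPLE_CSV = """video_id,title,publishedAt,channelTitle,trending_date,tags,view_count,likes,dislikes,comment_count
a1,Song A,2020-08-10T14:00:00Z,Channel X,2020-08-12T00:00:00Z,music|pop|new song,1500000,90000,1200,8000
a1,Song A,2020-08-10T14:00:00Z,Channel X,2020-08-13T00:00:00Z,music|pop|new song,2100000,110000,1500,9500
b2,Trailer B,2020-08-11T09:30:00Z,Studio Y,2020-08-13T00:00:00Z,trailer|movie,800000,20000,3000,2500
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, casting the numeric metrics to int."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        for col in ("view_count", "likes", "dislikes", "comment_count"):
            row[col] = int(row[col])
    return rows

rows = load_rows(SAMPLE_CSV)
# One video can trend on several days, so unique IDs are fewer than records.
unique_ids = {row["video_id"] for row in rows}
print(len(rows), "trending records,", len(unique_ids), "unique videos")
# → 3 trending records, 2 unique videos
```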

Qualitative and Quantitative Evaluation

EDA of trending video metrics:

What were the views, comments, and likes of videos when they first became trending?

How long does it take a video to become trending for the first time?

Videos from which category took the longest to trend?

What percentage of trending videos have more dislikes than likes? (Negative publicity inference)

Which video category has the most trending videos?

Videos from which category do users like the most?

Which channels published the most trending videos?

Which videos appeared on the trending list on the most days?
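Several of the questions above reduce to simple per-video aggregations. The sketch below uses plain Python on made-up records with assumed field names: it takes each video's earliest trending record as its "first trending" snapshot, measures the delay from `publishedAt`, checks dislikes against likes at that snapshot (one possible reading of the question), and counts distinct trending days per video. The analysis itself used Spark DataFrames; this only illustrates the logic.

```python
from datetime import datetime

# Hypothetical records with assumed Kaggle field names (made-up values).
records = [
    {"video_id": "a1", "publishedAt": "2020-08-10T14:00:00Z", "trending_date": "2020-08-12T00:00:00Z",
     "view_count": 1500000, "likes": 90000, "dislikes": 1200, "comment_count": 8000},
    {"video_id": "a1", "publishedAt": "2020-08-10T14:00:00Z", "trending_date": "2020-08-13T00:00:00Z",
     "view_count": 2100000, "likes": 110000, "dislikes": 1500, "comment_count": 9500},
    {"video_id": "b2", "publishedAt": "2020-08-11T09:30:00Z", "trending_date": "2020-08-13T00:00:00Z",
     "view_count": 800000, "likes": 2000, "dislikes": 3000, "comment_count": 2500},
]

def parse_ts(value):
    """Parse an ISO-8601 timestamp that ends in a literal 'Z' (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

def first_trending(records):
    """Keep, for each video, the record with the earliest trending date."""
    first = {}
    for rec in records:
        vid = rec["video_id"]
        if vid not in first or parse_ts(rec["trending_date"]) < parse_ts(first[vid]["trending_date"]):
            first[vid] = rec
    return first

def days_to_trend(rec):
    """Days between publishing and this trending record."""
    delta = parse_ts(rec["trending_date"]) - parse_ts(rec["publishedAt"])
    return delta.total_seconds() / 86400

def trending_day_counts(records):
    """Number of distinct trending dates per video."""
    days = {}
    for rec in records:
        days.setdefault(rec["video_id"], set()).add(rec["trending_date"][:10])
    return {vid: len(dates) for vid, dates in days.items()}

first = first_trending(records)
lag = {vid: round(days_to_trend(rec), 2) for vid, rec in first.items()}
# Measured at the first-trending snapshot of each video.
negative = [vid for vid, rec in first.items() if rec["dislikes"] > rec["likes"]]
pct_negative = 100 * len(negative) / len(first)

print(lag)
print(pct_negative)
print(trending_day_counts(records))
```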

Video Recommendation Based on the Currently Watched Video:

List of categories in the US trending video dataset:

|Film & Animation|Entertainment|Documentary|
| :- | :- | :- |
|Autos & Vehicles|News & Politics|Drama|
|Music|Howto & Style|Family|
|Pets & Animals|Education|Foreign|
|Sports|Science & Technology|Horror|
|Short Movies|Nonprofits & Activism|Sci-Fi/Fantasy|
|Travel & Events|Movies|Thriller|
|Gaming|Anime/Animation|Shorts|
|Videoblogging|Action/Adventure|Shows|
|People & Blogs|Classics|Trailers|
|Comedy|Comedy| |

Steps followed to build the recommender system:

  1. TAGS and CATEGORY are converted to upper case.
  2. Spaces are removed from TAGS.
  3. The category and the first 3 TAGS of each video are selected.
  4. A unique, monotonically increasing ID is assigned to each video.
  5. A vector is generated for each video using a TF-IDF (Term Frequency - Inverse Document Frequency) vectorizer.
  6. The similarity between each pair of videos is calculated using the dot product.
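The steps above can be sketched in plain Python as follows. This is not the project's Spark ML pipeline: the video list is made up, the IDF formula log((n + 1) / (df + 1)) follows the smoothed form used by Spark ML's `IDF`, and the vectors are L2-normalised (an added assumption) so that the dot product equals cosine similarity. The list position plays the role of the unique video ID.

```python
import math
from collections import Counter

# Hypothetical input: (title, category, tags) with tags already split into a list.
videos = [
    ("Song A", "Music", ["new song", "pop", "live", "tour"]),
    ("Song B", "Music", ["pop", "new song", "album"]),
    ("Match Recap", "Sports", ["football", "highlights", "recap"]),
    ("Game Trailer", "Gaming", ["trailer", "gameplay", "pop"]),
]

def tokens(category, tags):
    """Upper-case everything, remove spaces inside tags, keep the category plus the first 3 tags."""
    cleaned = [t.upper().replace(" ", "") for t in tags[:3]]
    return [category.upper()] + cleaned

def tfidf_vectors(docs):
    """TF-IDF weights per document as sparse dicts, L2-normalised."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0  # avoid dividing by zero
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def dot(u, v):
    """Sparse dot product of two term-weight dicts."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def recommend(index, vectors, k=2):
    """Rank the other videos by similarity to the video at `index`, highest first."""
    scores = [(dot(vectors[index], vec), j) for j, vec in enumerate(vectors) if j != index]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

# The list position serves as the video ID here.
docs = [tokens(cat, tags) for _, cat, tags in videos]
vectors = tfidf_vectors(docs)
for j in recommend(0, vectors):
    print(videos[j][0])
```

For video 0 ("Song A") this ranks "Song B" first, since the two share the category and two of their first three tags, and "Game Trailer" second, since it shares only the common POP tag. Computing all pairs this way grows quadratically with the number of videos.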

The recommendation system was built for 1,000 videos, because computing the similarity between every pair of videos in the full dataset takes too long.

The output for each search is shown below:

Correlation between various trending video metrics

What is the correlation (ratio) between likes, dislikes, views, and comments in different categories?
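A minimal sketch of how the correlation between metrics could be computed for one category, using the Pearson coefficient in plain Python. The numbers and the `music` dictionary are made up for illustration; the analysis itself worked with Spark DataFrames.

```python
import math
from itertools import combinations

# Hypothetical per-video metrics for a single category (made-up numbers).
music = {
    "views":    [1_500_000, 2_100_000, 800_000, 3_000_000],
    "likes":    [90_000, 110_000, 40_000, 150_000],
    "dislikes": [1_200, 1_500, 3_000, 2_000],
    "comments": [8_000, 9_500, 2_500, 12_000],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Print the coefficient for every pair of metrics in this category.
for m1, m2 in combinations(music, 2):
    print(f"{m1:>8} vs {m2:<8} r = {pearson(music[m1], music[m2]):+.2f}")
```

In PySpark, `DataFrame.corr(col1, col2)` returns the same Pearson coefficient for two numeric columns, so the per-category version would filter by category first.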

Does a video trending in one country also trend in other countries?
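One simple way to measure cross-country overlap is set intersection over unique video IDs. The sketch below uses made-up IDs; the percentage is directional (the share of the first country's trending videos that also trended in the second), which is one possible way to frame the question.

```python
from itertools import combinations

# Hypothetical unique trending video IDs per country (made-up).
trending = {
    "US": {"a1", "b2", "c3", "d4"},
    "GB": {"a1", "b2", "e5"},
    "IN": {"a1", "f6", "g7"},
}

def overlap_pct(src, dst):
    """Percentage of src's trending videos that also trended in dst."""
    return 100 * len(trending[src] & trending[dst]) / len(trending[src])

for a, b in combinations(trending, 2):
    print(f"{a} -> {b}: {overlap_pct(a, b):.1f}%")

# Videos that trended in all three countries.
common_to_all = set.intersection(*trending.values())
print(common_to_all)
```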


Summary:

Comparison of the number of trending videos per category across the 3 countries.
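A sketch of the per-category comparison: count unique trending videos per category in each country and convert the counts to shares, so countries with different numbers of trending videos can be compared. Counting unique videos rather than trending records is an assumption here, and the rows are made up.

```python
from collections import Counter

# Hypothetical (country, video_id, category) rows; a video may repeat across trending days.
rows = [
    ("US", "a1", "Music"), ("US", "a1", "Music"), ("US", "b2", "Gaming"),
    ("US", "c3", "Entertainment"), ("GB", "a1", "Music"), ("GB", "e5", "Sports"),
    ("IN", "f6", "Entertainment"), ("IN", "g7", "Entertainment"),
]

def category_counts(rows):
    """Count unique trending videos per category, separately for each country."""
    unique = {(country, vid, cat) for country, vid, cat in rows}
    counts = {}
    for country, _, cat in unique:
        counts.setdefault(country, Counter())[cat] += 1
    return counts

counts = category_counts(rows)
for country in sorted(counts):
    total = sum(counts[country].values())
    # Share of the country's trending videos that fall in each category, in percent.
    shares = {cat: round(100 * n / total, 1) for cat, n in counts[country].most_common()}
    print(country, shares)
```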



TAGS and titles are an important part of every video: they describe the video to people before they decide whether to click on it. Because of that, TAGS and titles are among the crucial factors in a video's success. Here are some interesting facts about the TAGS and titles of trending videos.

What should the average number of TAGS be, and how many TAGS do most trending videos have? What should the average number of words in a video title be, and how many words do the titles of most trending videos have?

The average number of TAGS per video in the dataset is 16, and the average number of words per video title is 8.
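The averages above and the distribution of TAGS per video can be computed as sketched below. The tag format (`|`-separated, with `[None]` for videos without tags) is an assumption about the Kaggle dataset, titles are simply split on whitespace, and the rows are made up.

```python
from collections import Counter

# Hypothetical rows; tags are assumed to be '|'-separated, with '[None]' when a video has no tags.
videos = [
    {"title": "Official Music Video Song A", "tags": "music|pop|new song|official"},
    {"title": "Match Recap Highlights", "tags": "football|highlights"},
    {"title": "Daily Vlog", "tags": "[None]"},
]

def tag_list(raw):
    """Split the raw tag string into individual tags, treating '[None]' as no tags."""
    return [] if raw == "[None]" else [t for t in raw.split("|") if t]

tag_counts = [len(tag_list(v["tags"])) for v in videos]
word_counts = [len(v["title"].split()) for v in videos]
avg_tags = sum(tag_counts) / len(videos)
avg_words = sum(word_counts) / len(videos)
# Number of tags -> number of videos with that many tags (input for a histogram).
distribution = Counter(tag_counts)
print(avg_tags, avg_words, sorted(distribution.items()))
```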

The distribution of the number of TAGS per video is shown below.

What are the 50 most common words used in the TAGS of videos with more than 1M views?

Do some words occur in trending video TAGS more often than others? To answer this, we analyzed the TAGS of all trending videos and counted the occurrences of each word in them, after removing stopwords. Here is a word cloud of the 50 most common words in trending TAGS; the size of each word reflects how common it is:
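A minimal sketch of the word counting behind such a word cloud, restricted to videos with more than 1M views as in the question above. The small stopword set and the rows are hypothetical (the analysis used a fuller STOPWORDS list), and tags are assumed to be `|`-separated.

```python
import re
from collections import Counter

# Small hypothetical stopword set, for illustration only.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "with"}

videos = [
    {"views": 2_500_000, "tags": "the best of pop|pop music|new music video"},
    {"views": 1_200_000, "tags": "football highlights|best goals of the week"},
    {"views": 400_000, "tags": "pop|music"},  # not above 1M views, so it is skipped
]

def top_tag_words(videos, min_views=1_000_000, n=50):
    """Count non-stopword words in the tags of videos with more than `min_views` views."""
    counts = Counter()
    for v in videos:
        if v["views"] <= min_views:
            continue
        for tag in v["tags"].split("|"):
            words = re.findall(r"[a-z0-9']+", tag.lower())
            counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

print(top_tag_words(videos))
```

The resulting (word, count) pairs could be passed as a dict to `WordCloud().generate_from_frequencies` from the `wordcloud` package, so that word size reflects frequency.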

The following facts can be considered when uploading a video to improve its chances of trending.

Total time taken to run the exploratory analysis and recommendation model

The complete code, including the graphical analysis, takes approximately 20 minutes to run.

Challenges faced and Gaps from Proposal

Calculating the similarity between every pair of videos for the entire dataset of 16,009 videos was very time-consuming, so the video search recommendations were limited to 1,000 videos.
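The scale of the problem follows from counting pairs. Comparing every video with every other video needs one similarity per unordered pair, so for $n$ videos:

$$
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{16009}{2} = 128{,}136{,}036, \qquad
\binom{1000}{2} = 499{,}500
$$

Restricting the search to 1,000 videos therefore reduces the number of pairwise similarities by a factor of roughly 256.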
