Extracting YouTube Channel Statistics in Python Using YouTube Data API

9 Months Agousmanmalik57 4 99 Views

Are you interested in finding out what a YouTube channel mostly discusses? Do you want to analyze YouTube videos of a specific channel? If yes, we are in the same boat.

YouTube video titles are a great way to determine the channel's primary focus. Plotting a word cloud or a bar plot of the most frequently occurring words in YouTube video titles can give you precise insight into the nature of a YouTube channel. I will do exactly this in this tutorial using the Python programming language.

So, let's begin without ado.

Getting YouTube Data API Key

You can get information about a YouTube channel in Python programming via the YouTube Data API. However, to access the API, you must create a new project in Google Cloud Platform. You can do so for free.

Once you create a new project, click the Go to APIs overview link, as shown in the screenshot below.

Next, click the ENABLE APIS AND SERVICES link.

Search for youtube data api v3.

Click the ENABLE button.

You will need to create credentials. To do so, click the CREDENTIALS link.

If you have any existing credentials, you will see them. To create new credentials, click the + CREATE CREDENTIALS link and select API key.

Your API key will be generated. Copy and save it in a secure place.

Now, you can access the YouTube Data API in Python.

Installing and Importing Required Libraries

To begin our analysis, we must set up our Python environment by installing and importing the necessary libraries. The main libraries we will use are google-api-python-client for accessing the YouTube Data API, wordcloud for generating word clouds, and nltk (Natural Language Toolkit) for text processing.

 !pip install google-api-python-client !pip install wordcloud !pip install nltk

Next, import the required libraries. These include googleapiclient.discovery for accessing the YouTube API, re for regular expressions to clean text, Counter from the collections module to count word frequencies, matplotlib.pyplot for plotting, WordCloud from the wordcloud library, and stopwords from nltk.corpus for removing common English words that do not contribute much to the analysis.

 import googleapiclient.discovery import re from collections import Counter import matplotlib.pyplot as plt from wordcloud import WordCloud import nltk from nltk.corpus import stopwords

Extracting YouTube Channel Statistics

This section will extract video titles from a specific YouTube channel using the YouTube Data API. To do this, we need to set up our API client with the appropriate API key and configure the request to fetch video data.

First, specify the API details and initialize the YouTube API client using the API key you generated earlier. Replace "YOUR_API_KEY" with your actual API key.

 # API information api_service_name = "youtube" api_version = "v3" # API key DEVELOPER_KEY = "YOUR_API_KEY" # API client youtube = googleapiclient.discovery.build( api_service_name, api_version, developerKey=DEVELOPER_KEY)

Next, define the channel ID of the YouTube channel you want to analyze. Here, we are using the channel ID UCYNS-_653RIE9x_BINefAMA as an example. You can use any other channel if you want.

We will then create a request to retrieve video titles for the specified channel. The request fetches up to 50 video titles per call. We use pagination to fetch additional results if available to handle more videos.

 # Channel ID of the channel you want to search channel_id = "UCYNS-_653RIE9x_BINefAMA" # Request to retrieve all video titles for the specified channel request = youtube.search().list( part="snippet", channelId=channel_id, maxResults=50, type="video" ) # Initialize an empty list to store the video titles video_titles = [] # Execute the request and retrieve the results while request is not None: response = request.execute() for item in response["items"]: video_titles.append(item["snippet"]["title"]) request = youtube.search().list_next(request, response)

Finally, print the total number of extracted video titles and display the first 10 titles to ensure our extraction process is working correctly.

 # Print the video titles print("Total extracted videos:", len(video_titles)) print("First 10 videos") print("===============") for title in video_titles[:10]: print(title)

Output:

Plotting a Bar Plot with Most Frequently Occurring Words

This section will process the extracted video titles to identify the most frequently occurring words. We will then visualize these words using a bar plot.

First, download the NLTK stop words and define an additional set of stop words to exclude from our analysis. Stop words are common words like "the", "is", and "in" that does not provide significant meaning in our context.

nltk.download('stopwords') stop_words = set(stopwords.words('english')) | {'sth', 'syed', 'talat', 'hussain'}

Next, filter the video titles to include only those containing Latin characters. This step ensures we focus on titles written in English or similar languages.

# Filter videos with Latin text only latin_titles = [title for title in video_titles if re.search(r'[a-zA-Z]', title)]

We then clean the titles by removing special characters and tokenizing them into individual words. Stop words are excluded from the analysis.

# Remove special characters and tokenize words = [] for title in latin_titles: cleaned_title = re.sub(r'[^\w\s]', '', title) # Remove special characters for word in re.findall(r'\b\w+\b', cleaned_title): if word.lower() not in stop_words: words.append(word.lower())

Subsequently, we count the frequency of each word and identify the ten most common words. To avoid redundancy, exclude additional common words specific to the dataset, such as "sth", "syed", "talat", and "hussain".

 # Count the frequency of words word_counts = Counter(words) # Get the 10 most common words most_common_words = [word for word, count in word_counts.most_common(10) if word not in {'sth', 'syed', 'talat', 'hussain'}]

Finally, create a bar plot to visualize the frequency of the most common words.

 # Create bar plot plt.figure(figsize=(10, 5)) plt.bar(range(len(most_common_words)), [word_counts[word] for word in most_common_words]) plt.title('10 Most Common Words in Video Titles') plt.xlabel('Words') plt.ylabel('Frequency') plt.xticks(range(len(most_common_words)), most_common_words, rotation=45) plt.show()

Output:

The output will display a bar plot showing the 10 most common words in the video titles, providing a clear insight into the primary topics discussed on the channel.

Word Cloud of Video Titles

We will create a word cloud to visualize the word frequency data further. A word cloud presents the most common words in a visually appealing way, and the size of each word represents its frequency.

First, generate the word cloud using the WordCloud class. The word cloud is generated from the list of words we compiled earlier. Next, we display the word cloud using Matplotlib.

 # Generate word cloud wordcloud = WordCloud(width=800, height=400, background_color='white').generate(' '.join(words)) # Display the word cloud plt.figure(figsize=(12, 8)) plt.imshow(wordcloud, interpolation="bilinear") plt.axis("off") plt.title("Word Cloud of Video Titles") plt.show()

Output:

The resulting word cloud provides a visual representation of the most frequently occurring words in the video titles, allowing for quick and intuitive insights into the channel's content.

Conclusion

Analyzing YouTube video titles allows us to gain valuable insights into a channel's primary topics. Using Youtube Data API and libraries such as google-api-python-client, nltk, matplotlib, and wordcloud enables us to extract video data, process text, and visualize the most common words. This approach reveals the core themes of a YouTube channel, offering a clear understanding of its focus and audience interests. Whether for content creators or viewers, these techniques effectively uncover the essence of YouTube video discussions.

data-science python

Salem commented: Interesting post+16

Be the first to reply

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.