Simple parser for youtube in google tables

Prehistory

There was a task to collect data on 2000+ youtube videos, from which it was necessary to extract the name of the video, channel and language.

The IMPORTXML function, with which you can perform parsing from sites, did not help, because, as it turned out later, youtube stores data in JSON files. I didn’t want to sort through such a number of videos and I began to look for ways to automate or at least simplify this task.

I found an old article from 15 on Habré , which describes how to get the number of video views, but since its publication youtube updated the API and I needed to get some other data.

In the process of solving the problem with the help of youtube API and Google tables, it turned out not only to extract data from a previously prepared list of video URLs, but also to parse the youtube output by keyword.
')
Being in the wildest delight of the result, I decided to share my experience and describe the process in a little more detail.

The method itself is certainly not perfect. It can be described as a very quick and simple solution. But until I started learning Python (I hope this day will come sooner or later) I have to do everything the old-fashioned way in the tables.

Training

For the parser to work, as in the original article, we need the ImportJSON script and youtube API.

Importjson

Download the script from github and install it in the table through the tools >> script editor.

Youtube API

We get our API - the Youtube Data API V3 key in the library and create credentials for it. Previously, of course, we make an account if you are not already registered in the google cloud.

Information about the video is stored in a JSON file, which is available at the link:

www.googleapis.com/youtube/v3/videos?id= {Video_ID} & key = {API_Key} & part = {Part_Parametr}

Where
Video_ID - video id
API_Key - the API key that you get in the console
Part_Parametr - the part of the file in which information is stored
Parts of the file can be of several types:
snippet - contains almost all the basic information about the video:
1. publication date
2. title of video
3. channel name
4. channel identifier
5. description under video (Description)
6. video category (number)
7. tags
contentDetails - contains the duration and resolution of the video.
topicDetails - contains a category of video in the form of a link to Wikipedia with the name of the category, but this section is not always full.
statistics
1. views
2. number of comments
3. likes
4. dislikes
In general, these are the main parts that you may need, but a complete list, if anything, is in the help .

Retrieving URL list data

Consider the work of the parser for example review badcomedian . In order to get information about the video, we need only its identifier ( Video_ID ), in this case it is EOWa0fmSGs8 .

We get and structure information using the ImportJSON formula and XPath query language.

This is how the JSON file on the server looks like:

The formula for getting the number of views in the tables:

= ImportJSON ('' https://www.googleapis.com/youtube/v3/videos?id=EOWa0fmSGs8&key= {API_Key} & part = statistics'; "/ items / statistics / viewCount;" '' noHeaders' ')

In principle, XPath can not be used in the request, but then you get the entire contents of the JSON file, without segmentation by parameters. Having created the required number of formulas with XPath parameters, you will receive only the necessary data, which will be easier to process.

Getting a list of videos by key request

The principle of issuing parsing is the same as in the example above, but the query structure changes slightly.

In this case, the JSON file with the results of the video output is located by reference.

www.googleapis.com/youtube/v3/search?part=snippet&q= {Your_Query} & type = video & key = {API_Key}

where the type parameter indicates what exactly we get at the output:

type = video - list of videos relevant to the request
type = channel - channel list
type = playlist - list of playlists
You can also specify in the request:

1. publication date or date range
2. region and search language
3. video length
4. number of results (default 5, maximum 50), etc.

A complete list of parameters and query designer is available here .

To get the list of ID on request badcomedian use the formula:

= ImportJSON (" www.googleapis.com/youtube/v3/search?part=snippet&q=badcomedian&type=video&key= {API_Key}"; "/ items / id / videoId"; '' noHeaders '')

ImportJSON works well with the CLUTCH function, which allows us to perform mass analysis or video parsing by changing various parameters.

An example of how a table is organized:

Minus method

A serious disadvantage of this method of parsing information is the rapid exhaustion of the daily limit of API requests. I did not find a way to limit the number of requests or create a sequential call to the API. That is, an attempt to parse info on 10+ keys at the same time is guaranteed to lead to an error - the end of the daily limit of requests. Therefore, it is necessary to break the keywords into small groups and save the results as values.

Source: https://habr.com/ru/post/443506/

All Articles