
Product case: Should YouTube let users label training data as a micropayment instead of watching an ad mid-video?

Assumptions
  1. I assume that an ad mid-video is a mid-roll video ad played in the middle of the content. Mid-roll ads are either non-skippable video ads of 6 to 20 seconds or video ads that can be skipped after 5 seconds.
  2. I assume that labeling training data as a micropayment means that YouTube lets third parties create labeling tasks and pay YouTube when its visitors complete them.

Answer
I do not recommend replacing advertising on YouTube with data labeling, for the following reasons:
  1. Data labeling on YouTube cannot guarantee the quality and accuracy of the annotated data.
  2. Labeling training data will not generate revenue comparable to YouTube's ad revenue.
  3. Labeling tasks will change the visitor experience and can lead to churn and revenue loss.
  4. Data labeling jobs on YouTube will be limited to small, simple tasks that can be completed in a few seconds without prior instructions, which limits potential demand from AI companies.
I justify my answer in detail below. 

YouTube business goals
Mid-roll ads have the highest completion rate and make the most money for YouTube and video creators, because the audience is already engaged by the middle of a video. 2 billion people visit YouTube to watch 5 billion videos monthly. YouTube ads generate 5 billion dollars in revenue per quarter.
If YouTube changes its monetization model, the new model (displaying labeling tasks) should generate revenue comparable to the ad-based model.

YouTube customers' goals
The main goal of a person who visits YouTube is to watch videos. If visitors cannot get to a video easily, they will likely abandon YouTube in favor of other platforms. Tracking the Watch Time metric will help YouTube measure how data annotation tasks affect visitors' satisfaction with the YouTube experience.

Visitors watch ads in exchange for free access to content. It is crucial that ads are targeted, short and skippable. Therefore YouTube shows either short video ads of up to 15-20 seconds or long video ads that can be skipped after 5 seconds. If YouTube wants to use labeling tasks, their expected completion times have to be similar to the duration of YouTube ads. Only simple data annotation tasks that do not require prior instructions fit this format, which limits the number of companies that would be interested in YouTube's offering.

Goals of companies that want to label data using YouTube
YouTube can be an attractive platform for AI companies:
  1. YouTube has a large audience (2 billion monthly active users);
  2. YouTube has a strong targeting mechanism that helps match tasks to specific audiences.

Companies that build computer vision or natural language processing (NLP) models require high labeling accuracy, a consistent format for labeled data and the ability to contact the same labelers repeatedly.
Professional data labeling services like Amazon Mechanical Turk satisfy these requirements.

YouTube cannot meet these requirements:
  1. YouTube cannot guarantee the quality of labeled data. AI companies require correctly labeled data in a consistent format. Professional data labeling services train people to label data consistently and can therefore guarantee quality. The ad replacement on YouTube has to be short and unobtrusive, so YouTube cannot add detailed instructions and examples on how to label training data. This lowers the quality of the labeled data.
  2. YouTube has to make tasks skippable, otherwise people will leave the platform. However, if tasks are skippable, viewers will skip them unless completing a task is substantially faster and easier than skipping it. This limits the kinds of tasks AI companies can supply. In addition, when people have no solid motivation to label data correctly, they tend to label it randomly, which lowers the quality of the labeled data.
  3. A small monetary reward could motivate people to label data correctly. But rewarding people for labeling data on YouTube will lead them to watch videos just to receive tasks and rewards. Video views will then no longer reflect the actual preferences of YouTube visitors. Creators will not understand which content visitors like. Advertisers will not trust YouTube statistics and will stop advertising on YouTube. Therefore we cannot use rewards for labeling tasks on YouTube, and without rewards YouTube cannot guarantee the quality of labeled data.
  4. AI companies that need annotated data can have additional requirements, such as keeping their data private, that YouTube cannot guarantee.
  5. YouTube cannot ensure that the people who receive tasks have the required skills (for example, speaking a given language at the desired level of proficiency).

Revenue for YouTube
Even if YouTube decides to display short tasks instead of ads, the revenue it can generate from these tasks is lower than the revenue from ads.
The average earnings on the Amazon MTurk platform are $2 per hour, or about 0.06 cents per second of work. That means roughly 0.05 - 0.1 cents per task for tasks that take only a second or two. By comparison, YouTube's average cost per view for ads is $0.05 – $0.30. In addition, revenue from data labeling tasks will be lower than revenue from advertising because:
  1. People come to watch videos, so they will tend to skip tasks. A video ad, by contrast, may get watched if it is relevant to the viewer.
  2. Demand from advertisers is larger than demand from AI companies that need to label data.
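
The arithmetic behind this comparison can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function names are hypothetical, the task durations are assumed values chosen to resemble ad lengths, and MTurk worker pay is used as a rough proxy for what a labeling task is worth.

```python
# Back-of-the-envelope comparison of labeling-task value and ad revenue.
# The hourly rate and cost-per-view figures are the ones quoted in the text;
# the task durations are illustrative assumptions, not measured values.

SECONDS_PER_HOUR = 3600


def pay_per_task_usd(hourly_rate_usd: float, task_seconds: float) -> float:
    """Value of one task in dollars, assuming pay scales linearly with time."""
    return hourly_rate_usd * task_seconds / SECONDS_PER_HOUR


def tasks_to_match_ad_view(cost_per_view_usd: float,
                           hourly_rate_usd: float,
                           task_seconds: float) -> float:
    """Number of completed tasks worth as much as one ad view."""
    return cost_per_view_usd / pay_per_task_usd(hourly_rate_usd, task_seconds)


MTURK_HOURLY_RATE_USD = 2.0  # average MTurk earnings quoted in the text
AD_CPV_LOW_USD = 0.05        # low end of the quoted cost per view
AD_CPV_HIGH_USD = 0.30       # high end of the quoted cost per view

# Assumed task durations in seconds, from "instant" up to a long mid-roll ad.
for task_seconds in (1, 2, 6, 20):
    cents = pay_per_task_usd(MTURK_HOURLY_RATE_USD, task_seconds) * 100
    low = tasks_to_match_ad_view(AD_CPV_LOW_USD, MTURK_HOURLY_RATE_USD, task_seconds)
    high = tasks_to_match_ad_view(AD_CPV_HIGH_USD, MTURK_HOURLY_RATE_USD, task_seconds)
    print(f"{task_seconds:>2}s task: {cents:.3f} cents, "
          f"{low:.1f} to {high:.1f} tasks per ad view")
```

Under these assumptions even a 20-second task is worth only about 1.1 cents, so several completed tasks would be needed to match a single ad view at the low end of the range.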

Summary
I do not recommend that YouTube replace mid-roll ads with data labeling tasks. There are many services that offer professional data labeling, and they already satisfy the requirements of AI companies that need it.

