A workshop on Youtube algorithm analysis


* * *

available at https://youtube.tracking.exposed/slides/workshop

(tracking.exposed) (publication on collaborative Youtube observation)


We analyze platforms

* * *

NOT PEOPLE

The TRex browser extension collects data as .json and .csv in order to decipher how proprietary algorithms work, in the public interest. (git, AGPL-3)

ANONYMIZATION PROCESS

  • 01. Unique and secret token

    Every participant is assigned a unique code, used to download his/her evidence

  • 02. Your data, Your choice

    With the token, participants can manage the data provided: visualize, download or delete

  • 03. Not our customer

    We are not obsessed with you ;) We don't collect any data about your location, friends or similar

  • 04. WEstudy YOUtube

    We collect evidence about the algorithm's suggestions, like recommended videos

THE DATA FORMAT

* * *

JSON|CSV simplified structure

Each entry represents a recommended video from Youtube.
A few are topic-related, a few are personalized, and others are a mix of the two.
      {
          "savingTime": "2021-08-31T17:27:06.213Z",
          "watcher": "muffin-rhubarb-cheese",
          "blang": "en-US",
          "recommendedVideoId": "hFISmpbEg1g",
          "recommendedPubtime": "2018-08-31T17:50:59.000Z",
          "recommendedForYou": "YES",
          "recommendedTitle": "What A Difference A Day Made",
          "recommendedAuthor": "Jamie Cullum",
          "recommendedVerified": true,
          "recommendedViews": 3737849,
          "watchedId": "q-lPwo1GUKw",
          "watchedAuthor": "Jamie Cullum",
          "watchedTitle": "But For Now",
          "watchedViews": 3572985
      }
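A minimal Python sketch of a first statistical pass over entries shaped like the one above. It assumes the export is a JSON list of such entries and that non-personalized recommendations carry "recommendedForYou": "NO"; both are assumptions, and the second record and its video id are invented for illustration.

```python
import json

# Hypothetical two-entry sample using field names from the entry above.
# The "NO" value and the second record are assumptions for illustration.
SAMPLE = """[
  {"watcher": "muffin-rhubarb-cheese", "recommendedVideoId": "hFISmpbEg1g",
   "recommendedForYou": "YES", "recommendedAuthor": "Jamie Cullum",
   "watchedId": "q-lPwo1GUKw", "watchedAuthor": "Jamie Cullum"},
  {"watcher": "muffin-rhubarb-cheese", "recommendedVideoId": "example-id-2",
   "recommendedForYou": "NO", "recommendedAuthor": "Another Channel",
   "watchedId": "q-lPwo1GUKw", "watchedAuthor": "Jamie Cullum"}
]"""


def summarize(entries):
    """Return the share of personalized and same-author recommendations."""
    total = len(entries)
    if total == 0:
        return {"total": 0, "for_you_share": 0.0, "same_author_share": 0.0}
    for_you = sum(1 for e in entries if e.get("recommendedForYou") == "YES")
    same_author = sum(
        1 for e in entries
        if e.get("recommendedAuthor") == e.get("watchedAuthor")
    )
    return {
        "total": total,
        "for_you_share": for_you / total,
        "same_author_share": same_author / total,
    }


entries = json.loads(SAMPLE)
summary = summarize(entries)
print(summary)
```

With a real export, `json.loads(SAMPLE)` would be replaced by `json.load()` on the downloaded file.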
          

What we analyze

You can retrieve data on three different algorithms:

  • Homepages

    What are the trending videos for you? They are not the same for everybody...

  • Search results

    The same query on "corona virus" can lead you to completely different framings depending on your past behavior. More details on the CHIARO page.

  • Recommended videos

    Some research suggests that most of the videos watched on the platform (78%) are suggested by the algorithm. Are we sure everything is fine here?
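One way to check that recommendations "are not the same for everybody" is to compare what two participants were shown next to the same video. The sketch below uses the Jaccard index as an overlap measure; that choice, the watcher names, and the video ids other than the watched one are assumptions made for illustration.

```python
from collections import defaultdict


def recommendations_by_watcher(entries, watched_id):
    """Group recommended video ids by watcher for one watched video."""
    by_watcher = defaultdict(set)
    for entry in entries:
        if entry["watchedId"] == watched_id:
            by_watcher[entry["watcher"]].add(entry["recommendedVideoId"])
    return dict(by_watcher)


def jaccard(a, b):
    """Jaccard index of two sets: 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical entries: two watchers who opened the same video.
sample_entries = [
    {"watcher": "watcher-a", "watchedId": "q-lPwo1GUKw", "recommendedVideoId": "v1"},
    {"watcher": "watcher-a", "watchedId": "q-lPwo1GUKw", "recommendedVideoId": "v2"},
    {"watcher": "watcher-a", "watchedId": "q-lPwo1GUKw", "recommendedVideoId": "v3"},
    {"watcher": "watcher-b", "watchedId": "q-lPwo1GUKw", "recommendedVideoId": "v2"},
    {"watcher": "watcher-b", "watchedId": "q-lPwo1GUKw", "recommendedVideoId": "v3"},
    {"watcher": "watcher-b", "watchedId": "q-lPwo1GUKw", "recommendedVideoId": "v4"},
]

groups = recommendations_by_watcher(sample_entries, "q-lPwo1GUKw")
overlap = jaccard(groups["watcher-a"], groups["watcher-b"])
print(f"overlap between the two watchers: {overlap:.2f}")
```

A low overlap for the same watched video hints at personalization, although differences in time of collection can also play a role.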

Today's collective observation:

* * *

pad.cisti.org/p/algo.workshop

Today's collective observation:

* * *

Possible types of analysis:

  • 1. Statistical analysis with Python/d3js

  • 2. Network analysis with Gephi

  • 3. Manual qualitative analysis
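For the network analysis option, a possible starting point is to turn the entries into an edge list that Gephi can import as a CSV with Source, Target and Weight columns. The sketch below treats each watched video as the source and each recommended video as the target; this modelling choice and the sample ids are assumptions, not the workshop's prescribed method.

```python
import csv
import io
from collections import Counter


def build_edges(entries):
    """Count how often each watched video led to each recommended video."""
    return Counter((e["watchedId"], e["recommendedVideoId"]) for e in entries)


def write_edge_csv(edges, handle):
    """Write edges with the Source/Target/Weight header used for Gephi import."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])


# Hypothetical entries; only the two id fields matter here.
sample_entries = [
    {"watchedId": "w1", "recommendedVideoId": "r1"},
    {"watchedId": "w1", "recommendedVideoId": "r1"},
    {"watchedId": "w1", "recommendedVideoId": "r2"},
]

edges = build_edges(sample_entries)
buffer = io.StringIO()  # swap for open("edges.csv", "w", newline="") to save a file
write_edge_csv(edges, buffer)
csv_lines = buffer.getvalue().splitlines()
print("\n".join(csv_lines))
```

The Weight column records how many times the same recommendation was observed, which Gephi can use for edge thickness.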

For new tests or ideas, or to participate in the FLOSS project, email: info(@)tracking.exposed