Quick tour of ytTREX technology

The reference manual for researchers and analysts

This page is a work in progress: the content needs to be completed and the English needs a full revision.

Methodologies

The collection methodology is crucial for judging, using, and analyzing evidence of algorithmic personalization.

Comparing blindly and uncritically how YouTube serves different related content is fine as a simple exercise, but this approach would not produce any meaningful observation about how YouTube works. The table below summarizes the different approaches and the qualities of the data collected.

# Approach name Notes Experiment
1 True, genuine profiles
  1. An actual person using their own browser.
  2. The asset they share is the uniqueness of their observation.
  3. Personal data is processed; it should be accessible only if the data subject opts in.
  4. Access happens at different times of the day.
  5. Even data portability tools would not give the researcher a clear picture of how the person has been profiled by the platform.
DSSG-berlin Datathon,
First YouTube class,
European Election Campaign.
2 Fresh profiles under researcher control
  1. Control of most of the variables
  2. The researcher can try to isolate one variable and infer insights about the personalization algorithm.
  3. of reality.
  4. Social media platforms tend to hunt down automated accounts and limit their actions.
Argentinian 2017 Analysis,
Italian 2018,
Brazilian 2019, ...
3 True profiles acting under direction
  1. Potential hype: a temporarily enlarged audience beyond the algorithmic accountability community.
  2. It is an event that leads to outreach, literacy, research and open data.
  3. If the researcher wants additional demographic data, they should collect and verify it independently.
  4. Planning a test might be even harder.
potest#1, wetest#1,
Workshop.

Silicon Valley's exploitative business model taught people that “the more data, the better”. This page also intends to state why this blind and uncritical data collection makes no sense for us, for research, and for our message.


A researcher who wants to understand personalization and algorithmic discrimination should give more importance to the methodology and the data collection, and keep both simple and explainable.


Experiments and Experiences on algorithmic testing

— The first workshop — 3 working days — 10 students & 2 facilitators

Ten students, each using their own computer, open the same video at the same time. Here we compare the 20 related videos suggested to each of them.

— The blue circles represent the related content; they are all in the center because they are shared among profiles.

— The small groups of individually selected videos are represented with green circles. Profiles 3 and 4 have such dedicated content because of the language configured on their computers (Korean and French, while the test was performed from Amsterdam in English).

— Below, the same test performed by the same students, in the same room, with the same IP address and the same computers, but with their browsers logged in to Google/YouTube:

It is visually clear how the data points linked to the profiles cause personalized suggestions.
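The split between shared and individually selected videos can also be computed without a visualization. The following is a minimal sketch, not the code used in the workshop: the profile names, the video IDs, and the function name `split_shared_and_individual` are made up for illustration, and "shared" is assumed here to mean "shown to every profile".

```python
from collections import Counter

# Hypothetical observations: profile name -> related video IDs it was shown.
# Profile names and video IDs are invented for this example.
observations = {
    "profile1": ["vidA", "vidB", "vidC"],
    "profile2": ["vidA", "vidB", "vidD"],
    "profile3": ["vidA", "vidB", "vidE"],
}


def split_shared_and_individual(observations):
    """Count how many profiles were shown each video, then return the videos
    shown to every profile and the videos shown to exactly one profile."""
    counts = Counter()
    for videos in observations.values():
        counts.update(set(videos))  # set(): count a video once per profile
    total = len(observations)
    shared = {video for video, n in counts.items() if n == total}
    individual = {video for video, n in counts.items() if n == 1}
    return shared, individual


shared, individual = split_shared_and_individual(observations)
print("shown to all profiles:", sorted(shared))
print("shown to one profile only:", sorted(individual))
```

Running the same function on the logged-out and on the logged-in observations would give two comparable pairs of sets, one per test.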

Regardless of the YouTube API

We verified which videos YouTube declares as related by using the official API, and then compared those with what is actually displayed in people's interface.

Red circles represent the videos declared by YouTube as related. Yellow and green circles represent the videos actually suggested to watchers.
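The comparison between declared and displayed videos comes down to set operations. The sketch below assumes the two lists of video IDs have already been collected; the IDs, the variable names, and the function `compare_declared_and_observed` are hypothetical, and the Jaccard index is only one possible way to summarize the overlap.

```python
def compare_declared_and_observed(declared, observed):
    """Compare the videos an API declares as related with the videos
    actually observed in the interface."""
    declared, observed = set(declared), set(observed)
    both = declared & observed
    union = declared | observed
    return {
        "declared_only": declared - observed,  # declared but never displayed
        "observed_only": observed - declared,  # displayed but never declared
        "both": both,
        # Jaccard index: 1.0 means identical sets, 0.0 means no overlap.
        "jaccard": len(both) / len(union) if union else 1.0,
    }


# Invented example data, not real measurements.
api_declared = ["vidA", "vidB", "vidC", "vidD"]
seen_in_browser = ["vidC", "vidD", "vidE", "vidF"]

result = compare_declared_and_observed(api_declared, seen_in_browser)
print(sorted(result["both"]), round(result["jaccard"], 2))
```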

We knew that the API can’t be considered a reliable method. I want to list why this misunderstanding is

Achievements: a rudimentary method to compare profiles with reduced personalization against highly personalized profiles.

— The second workshop

Work in progress TODO trexit

— Collaborative, Worldwide, time restricted, guided observation (potest and wetest)


The collaborative observation is a new experiment in this regard. It allows us to guide people through a few steps. We did it on PornHub, where it was named poTEST; the YouTube collaborative test is a March 2020 experiment.

— Variable comparison

TODO explain from first test the CNN/FOX test.

A network visualization and analysis tool

— Automated access approach

TODO: link the methodology and put some of the results here (if unique and if possible only with automated access)

Tools for analysts

TODO: make a list of the tools we used, with their limits and perks:

Tools and resources for wannabe algorithm analysts

API integration

Data scientists or third-party integrators should use this

Our API allows an individual to retrieve their data. A researcher who has a list of personal tokens from participants can collect all the data of those participants.

The API is also handy if you want to fetch, process, or store the updates daily.
API docs
The CSV format

Data analysts and researchers should read this

The CSV format is readable by spreadsheets such as Excel and Google Docs, and by most data analysis tools.

Each line of these CSV files is one of the suggested videos. The documentation explains what distinguishes the personal API, which requires a personal access token.
CSV docs
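As an illustration of the "one line per suggested video" layout, here is a minimal sketch that groups suggestions by watcher using only the Python standard library. The column names (`watcher`, `watchedVideoId`, `relatedVideoId`, `position`) and the sample rows are assumptions made for this example; check the CSV docs for the real field names.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a downloaded CSV file. Column names and rows are hypothetical.
sample_csv = """watcher,watchedVideoId,relatedVideoId,position
p1,vid0,vidA,1
p1,vid0,vidB,2
p2,vid0,vidA,1
p2,vid0,vidC,2
"""


def suggestions_by_watcher(csv_file):
    """Read a file-like CSV object and return a dict mapping each watcher
    to the list of suggested video IDs, in file order."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        grouped[row["watcher"]].append(row["relatedVideoId"])
    return dict(grouped)


grouped = suggestions_by_watcher(io.StringIO(sample_csv))
print(grouped)
```

With a real file, `io.StringIO(sample_csv)` would be replaced by `open(path, newline="")`, as recommended by the `csv` module documentation.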
Bokeh analyst dashboard

repo on git