It creates a cryptographic key pair. This is how we ensure that you can access your data, and that only you can claim the data received as “yours.” It is necessary because we do not know your email address, Google profile, or YouTube username: your official identity is not linked to our data at all. This works for every human or bot opening the youtube.com website. Every installed copy of the browser extension has its own unique cryptographic key.
It copies the HTML of every youtube.com video page once YouTube has finished sending the suggested videos. The HTML is sent to the tracking.exposed server, hosted in Germany and administered by the technical staff of our team. The system administrators are the three technologists of Tracking Exposed; get in touch at youtube-team at tracking dot exposed.
Apart from very rare maintenance operations, the code that processes the collected data is available here for review.
Technical detail: the extension cryptographically signs the HTML you send with your private key, so that the signature can be checked against your public key. We differentiate supporters by the public key they use, and you can create a new key, or download or import an existing one, whenever you want. Each time a new supporter shows up, you’ll see it in the first graph.
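The roles of the two keys can be illustrated with a minimal pure-Python sketch of Ed25519, following the reference algorithm in RFC 8032. The text above does not name the signature algorithm or library the extension uses, so Ed25519 is an assumption here, and the helper names (`generate_keypair`, `public_key`, `sign`, `verify`) are hypothetical. This code is slow and not constant-time: it only shows that signing needs the private seed while checking needs only the public key, and it must not be used to protect real data.

```python
import hashlib
import secrets

# Ed25519 constants (RFC 8032, section 5.1).
P = 2**255 - 19                                          # field prime
ORDER = 2**252 + 27742317777372353535851937790883648493  # order of the base point
D = -121665 * pow(121666, P - 2, P) % P                  # curve constant d
SQRT_M1 = pow(2, (P - 1) // 4, P)                        # a square root of -1 mod P

def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha512(data).digest(), "little")

def _add(p1, p2):
    """Point addition in extended coordinates (X, Y, Z, T)."""
    a = (p1[1] - p1[0]) * (p2[1] - p2[0]) % P
    b = (p1[1] + p1[0]) * (p2[1] + p2[0]) % P
    c = 2 * p1[3] * p2[3] * D % P
    d = 2 * p1[2] * p2[2] % P
    e, f, g, h = b - a, d - c, d + c, b + a
    return (e * f % P, g * h % P, f * g % P, e * h % P)

def _mul(s: int, pt):
    q = (0, 1, 1, 0)  # neutral element; double-and-add, NOT constant time
    while s > 0:
        if s & 1:
            q = _add(q, pt)
        pt = _add(pt, pt)
        s >>= 1
    return q

def _recover_x(y: int, sign: int):
    if y >= P:
        return None
    x2 = (y * y - 1) * pow(D * y * y + 1, P - 2, P) % P
    if x2 == 0:
        return None if sign else 0
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P != 0:
        x = x * SQRT_M1 % P
    if (x * x - x2) % P != 0:
        return None
    return P - x if (x & 1) != sign else x

_GY = 4 * pow(5, P - 2, P) % P
_GX = _recover_x(_GY, 0)
G = (_GX, _GY, 1, _GX * _GY % P)  # base point

def _encode(pt) -> bytes:
    zinv = pow(pt[2], P - 2, P)
    x, y = pt[0] * zinv % P, pt[1] * zinv % P
    return (y | ((x & 1) << 255)).to_bytes(32, "little")

def _decode(data: bytes):
    y = int.from_bytes(data, "little")
    sign, y = y >> 255, y & ((1 << 255) - 1)
    x = _recover_x(y, sign)
    return None if x is None else (x, y, 1, x * y % P)

def _expand(seed: bytes):
    """Hash the 32-byte private seed into a clamped scalar and a nonce prefix."""
    digest = hashlib.sha512(seed).digest()
    a = int.from_bytes(digest[:32], "little")
    return (a & ((1 << 254) - 8)) | (1 << 254), digest[32:]

def public_key(seed: bytes) -> bytes:
    return _encode(_mul(_expand(seed)[0], G))

def generate_keypair():
    """Return (private seed, public key); only the public key is ever shared."""
    seed = secrets.token_bytes(32)
    return seed, public_key(seed)

def sign(seed: bytes, message: bytes) -> bytes:
    """Sign with the PRIVATE seed; returns a 64-byte signature R || S."""
    a, prefix = _expand(seed)
    r = _h(prefix + message) % ORDER
    r_enc = _encode(_mul(r, G))
    k = _h(r_enc + public_key(seed) + message) % ORDER
    return r_enc + ((r + k * a) % ORDER).to_bytes(32, "little")

def verify(public: bytes, message: bytes, signature: bytes) -> bool:
    """Check a signature using only the PUBLIC key."""
    if len(public) != 32 or len(signature) != 64:
        return False
    a_pt, r_pt = _decode(public), _decode(signature[:32])
    s = int.from_bytes(signature[32:], "little")
    if a_pt is None or r_pt is None or s >= ORDER:
        return False
    k = _h(signature[:32] + public + message) % ORDER
    lhs, rhs = _mul(s, G), _add(r_pt, _mul(k, a_pt))
    # Projective equality: X1*Z2 == X2*Z1 and Y1*Z2 == Y2*Z1 (mod P).
    return ((lhs[0] * rhs[2] - rhs[0] * lhs[2]) % P == 0
            and (lhs[1] * rhs[2] - rhs[1] * lhs[2]) % P == 0)

seed, pub = generate_keypair()
html = b"<html>...video page...</html>"      # hypothetical payload
signature = sign(seed, html)
print(verify(pub, html, signature))          # True
print(verify(pub, html + b"x", signature))   # False: tampered content
```

In this sketch the private seed never leaves the machine that created it; anyone holding the public key can confirm that a submission came from that key and was not altered, without learning who the supporter is.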
In our manifest.json, the browser extension specifies which privileges it needs; here is a summary of what they are and why.
```json
"permissions": [ "storage", "alarms", "https://*.youtube.com/", "https://youtube.tracking.exposed/" ]
```
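The permissions list mixes two kinds of entries. In Chrome-style extensions, a plain name such as `storage` or `alarms` grants access to an extension API (`chrome.storage` for local persistence, `chrome.alarms` for scheduling code to run later or periodically), while an entry containing a URL is a host permission that limits which sites the extension may interact with. The small sketch below, with a hypothetical `split_permissions` helper, separates the two so you can see that the only hosts involved are youtube.com and youtube.tracking.exposed.

```python
import json

# The permissions fragment from manifest.json, wrapped in braces to form valid JSON.
MANIFEST = json.loads("""
{"permissions": ["storage", "alarms",
                 "https://*.youtube.com/", "https://youtube.tracking.exposed/"]}
""")

def split_permissions(manifest):
    """Separate API permissions from host permissions (URL match patterns)."""
    api, hosts = [], []
    for entry in manifest.get("permissions", []):
        # Host permissions are URL patterns; everything else names an API.
        (hosts if "://" in entry else api).append(entry)
    return api, hosts

api, hosts = split_permissions(MANIFEST)
print("API permissions: ", api)    # ['storage', 'alarms']
print("Host permissions:", hosts)  # ['https://*.youtube.com/', 'https://youtube.tracking.exposed/']
```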
There is one entry for each submission received from the supporters.
| Field | Example | Description |
|---|---|---|
| id | e1895eed23ffcb8a0b5d1221c28a712b379886fe | The unique identifier of the evidence: every observation has a different ID |
| title | Salmo - 90MIN (Official Live Performance) \| Vevo X | The title of the video: under certain conditions the same video might display a different title; for example, it can appear translated |
| videoId | U7OarstN2GU | The YouTube videoId. If you compose the URL https://youtube.com/watch?v=$videoId, it will display the video. The videoId is unique on the YouTube platform. |
| authorName | Confused Bi-Product of a Misinformed Culture | The name of the author as YouTube displays it |
| authorChannel | /channel/UCSsz5GO1rQjzp1RND7QtEjg | The unique ID of the content producer: by opening https://youtube.com$authorChannel you'll see the producer's page |
| savingTime | 2019-08-01 22:46:41.355Z | The date and time when the evidence was collected |
| watcher | caramel-macaroon-succotash | A pseudonym assigned to every browser submitting data. It is linked to the authentication material, and therefore it is personal data. |
| Related | [ list ] | A list of related videos. The size of this list is the number in the field RelatedN. See the details below. |
| Metadata | [ list ] | A collection of additional metadata about what YouTube sent you during the video, such as advertising banners and advertising videos. This set grows as we add support for new types, and it is limited to what YouTube sends to the supporter. |
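To show how the fields fit together, here is a minimal sketch of one evidence entry as a Python dataclass, filled with the example values from the table. The class name `Evidence`, the lowercase attribute names `related` and `metadata`, and the helper methods are illustrative assumptions and may not match the server's actual JSON keys.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    id: str
    title: str
    videoId: str
    authorName: str
    authorChannel: str
    savingTime: str
    watcher: str  # pseudonym linked to the key: personal data
    related: list = field(default_factory=list)
    metadata: list = field(default_factory=list)

    @property
    def relatedN(self) -> int:
        # RelatedN is simply the length of the related list.
        return len(self.related)

    def video_url(self) -> str:
        return f"https://youtube.com/watch?v={self.videoId}"

    def channel_url(self) -> str:
        # authorChannel already starts with "/", e.g. "/channel/UC..."
        return f"https://youtube.com{self.authorChannel}"

sample = Evidence(
    id="e1895eed23ffcb8a0b5d1221c28a712b379886fe",
    title="Salmo - 90MIN (Official Live Performance) | Vevo X",
    videoId="U7OarstN2GU",
    authorName="Confused Bi-Product of a Misinformed Culture",
    authorChannel="/channel/UCSsz5GO1rQjzp1RND7QtEjg",
    savingTime="2019-08-01 22:46:41.355Z",
    watcher="caramel-macaroon-succotash",
)
print(sample.video_url())    # https://youtube.com/watch?v=U7OarstN2GU
print(sample.channel_url())  # https://youtube.com/channel/UCSsz5GO1rQjzp1RND7QtEjg
print(sample.relatedN)       # 0
```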
Each related video has its own data structure, described below. The list of related videos is part of the evidence. We use related and suggested as synonyms: we mean the videos displayed in the right column of the YouTube interface.
| Field | Example | Description |
|---|---|---|
| Index | 1 | Order of the related video (also called suggestionOrder). The counter starts from 1, and we can't guarantee that all videos have the same number of related videos. |
| Title | Platero y Tú - Vamos tirando (1993) (Álbum completo) | The name of the suggested video |
| Verified | false | true or false: whether the verified marker ✔ appears next to the channel name |
| foryou | true | true or false: whether the related video was explicitly marked 'recommended for you' |
| source | Kirstin Leticia | The display name of the channel owning the recommended video |
| displayTime | 03:14 | Video length, as declared in the preview |
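A related entry can be modeled the same way. The sketch below is an illustration under assumptions: the `Related` class, its lowercase attribute names, and the helpers `duration_seconds` and `foryou_share` are hypothetical. It converts the `displayTime` string into seconds and computes which share of the suggestions was explicitly marked 'recommended for you'.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Related:
    index: int        # 1-based position in the right column (suggestionOrder)
    title: str
    verified: bool
    foryou: bool      # True if marked 'recommended for you'
    source: str       # display name of the channel
    displayTime: str  # e.g. "03:14"

    def duration_seconds(self) -> Optional[int]:
        """Convert "MM:SS" or "H:MM:SS" into seconds; None if not a duration."""
        parts = self.displayTime.split(":")
        if not all(p.isdigit() for p in parts):
            return None
        total = 0
        for p in parts:
            total = total * 60 + int(p)
        return total

def foryou_share(related: list) -> float:
    """Fraction of suggestions explicitly marked 'recommended for you'."""
    return sum(r.foryou for r in related) / len(related) if related else 0.0

row = Related(1, "Platero y Tú - Vamos tirando (1993) (Álbum completo)",
              False, True, "Kirstin Leticia", "03:14")
print(row.duration_seconds())  # 194
print(foryou_share([row]))     # 1.0
```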
Whatever can be linked to an individual's activity is personal data. In our view, two pieces of metadata deserve special attention and protection: the watcher pseudonym, and the sequence of videos seen by the same watcher.
The primary goal is to enable algorithm analysis. The influence of the personalization algorithm emerges by comparing individualized experiences with other people’s experiences. This can be done with three broad approaches:
A person collects evidence of how YouTube personalizes their experience. If this person decides to share their evidence (or a portion of it) exclusively with someone else, this second person can accept or decline to share their own evidence in return. This allows the two people to compare their personalized experiences.
The privacy model is an explicit and revocable opt-in by both parties, allowing a granular selection of the shared content.
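As an example of what such a comparison could look like, the sketch below takes the suggestion lists two people received on the same video and computes their overlap (Jaccard similarity) and how the shared suggestions moved in position. These metrics are a swapped-in illustration, not necessarily the analysis Tracking Exposed performs, and the helper names and video titles are hypothetical.

```python
def overlap(mine, theirs) -> float:
    """Jaccard similarity of two suggestion lists: |A & B| / |A | B|."""
    a, b = set(mine), set(theirs)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def rank_shifts(mine, theirs):
    """For suggestions both people received, how many positions each moved."""
    pos_theirs = {title: i for i, title in enumerate(theirs, start=1)}
    return {title: pos_theirs[title] - i
            for i, title in enumerate(mine, start=1) if title in pos_theirs}

# Hypothetical suggestions shown to two people on the same video.
alice = ["video A", "video B", "video C"]
bob = ["video B", "video D", "video A"]
print(overlap(alice, bob))      # 0.5
print(rank_shifts(alice, bob))  # {'video A': 2, 'video B': -1}
```

A low overlap, or large shifts, hints that personalization played a role; either way, only the two people who agreed to share see each other's lists.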
A researcher coordinates tests among people or by using puppet profiles. The experiments might differ broadly, mainly depending on the research question and methodology. In this case, the researcher should ask the data subjects to share their data, through a private agreement expressed with informed consent. This is the approach used by the research team that published our first report: exposing YouTube, an ALEX project at the DMI Summer School at UvA.
The privacy model is an explicit opt-in for people joining the research group. If dummy profiles are used, the researcher likely owns these profiles and has full control of the data collected with them.
Tracking Exposed might run an analysis on the dataset as long as the logic to look into the database is:
Our privacy model aims to anonymize and aggregate data so that no contributor can be recognized. Again, we should investigate phenomena without exposing any individual.
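One common way to aggregate without singling anyone out is a minimum-group-size threshold, in the spirit of k-anonymity. The sketch below counts how many distinct watchers received each suggestion and drops entries seen by fewer than `k` watchers; the function name, the threshold value, and the sample data are hypothetical. Such thresholds reduce, but do not eliminate, re-identification risk.

```python
from collections import defaultdict

def aggregate(observations, k=5):
    """Count distinct watchers per suggested video, hiding rare entries.

    observations: iterable of (watcher, suggested_title) pairs.
    Entries seen by fewer than k distinct watchers are dropped, so that a
    small count does not single out one or two contributors. The output
    contains counts only, never watcher pseudonyms.
    """
    watchers = defaultdict(set)
    for watcher, title in observations:
        watchers[title].add(watcher)
    return {title: len(ws) for title, ws in watchers.items() if len(ws) >= k}

# Hypothetical observations: three watchers saw "video A", one saw "video B".
obs = [("w1", "video A"), ("w2", "video A"), ("w3", "video A"),
       ("w1", "video B")]
print(aggregate(obs, k=3))  # {'video A': 3}
```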
This method has a problem: the three declarations above are not enough to produce data that is 100% safe and useful for the public. The approach above is a general indication; our procedure for accepting a process like this is:
At the moment, this is not yet happening: we are only experimenting with aggregated queries that have privacy-preserving properties. When external researchers were given access to a selected portion of the dataset, they signed an agreement with us that requires them to:
If a research project follows this methodology:
Researchers might decide to publish their collected data so that others can replicate and validate the research. A few such cases exist so far, for example in the context of Facebook algorithm analysis, as documented in “the invisible curation of content”, and the report “Italian political election and digital propaganda”, whose data was released in a repository.
Collaborative tests like poTEST#1 or weTEST#1 do not comply with point n.1 above, but we release their data anyway, because the pseudonym released as part of the test is different from the one associated with the profile, so the two cannot be correlated.
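One way to obtain pseudonyms that cannot be correlated with the profile is to derive them with a keyed hash (HMAC) under a fresh random key for each release, and then discard the key. The sketch below is an assumption about how this could be done, not a description of the actual procedure; `release_pseudonyms` is a hypothetical helper. Note that a consistent pseudonym still groups a watcher's videos within one release, which is why the sequence of videos remains sensitive.

```python
import hashlib
import hmac
import secrets

def release_pseudonyms(watchers, release_key=None):
    """Map each watcher to a pseudonym valid for a single data release.

    A fresh random key is drawn per release and then thrown away, so the
    released pseudonyms cannot be recomputed from the profile pseudonym
    nor matched with pseudonyms from another release.
    """
    key = release_key if release_key is not None else secrets.token_bytes(32)
    return {w: hmac.new(key, w.encode(), hashlib.sha256).hexdigest()[:16]
            for w in watchers}

test_a = release_pseudonyms(["caramel-macaroon-succotash"])
test_b = release_pseudonyms(["caramel-macaroon-succotash"])
print(test_a != test_b)  # True: two releases, unlinkable pseudonyms
```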
The purpose of this dataset is research on personalization algorithms. The dataset contains no personal data, even though YouTube's personalization depends on personal data (which is why supporters are legally acknowledged as data subjects). De-anonymization attacks, such as re-linking by searching for known patterns, are not considered feasible because:
Last but not least: we are worried about the centralization of power in the hands of Google Chrome, upon which Brave and Edge run. We have already suffered a few takedowns of our extension(s).