Prigozhin audio files, transcribed

dataset
automatic transcription
Telegram
Russian language
Russian media
An automatic transcription of all the audio messages posted on Prigozhin’s official Telegram channel
Author

Giorgio Comai

Published

August 22, 2023

Work-in-progress

This is an early release of the dataset. Only limited quality checks have been conducted, so if you intend to use it, make sure it is fit for purpose.

A full release in a proper data repository with better documentation is forthcoming.

Prigozhin started to post audio messages on his official Telegram channel - the press service of his holding company - in late 2022. He abruptly stopped after his mutiny in late June 2023. This dataset includes an automatic transcription in both Russian and English, created using the Whisper models (more specifically, the large model) through a dedicated R package. Find more details and context about the process in the dedicated post.

Read the full post with more context: “From the ‘battle of Bakhmut’ to the ‘march of justice’: Prigozhin’s audio files, transcribed”

The same contents available here for download can more conveniently be consulted at the following pages:

Accuracy of the dataset

The contents presented here are the result of automatic transcription and translation. The Russian transcription is mostly accurate, but the spelling of names of persons and organisations is inconsistent. The automatic translation into English is mostly usable, but inaccuracies are more frequent.

Summary statistics

Total number of posts including audio messages: 408
Earliest post with audio: 26 December 2022
Most recent post with audio: 26 June 2023

Downloads

About the identifier

The id column included in the dataset reflects the identifier of a given post on Telegram. The original post can be seen by adding the relevant id to the base address of the Telegram channel: https://t.me/concordgroup_official/

The id column can hence also be used to match this dataset with export files generated by Telegram itself.
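
As a minimal sketch of how the id column can be used, the Python snippet below builds the address of the original post and pairs dataset rows with exported messages that share the same id. The function names are hypothetical, and the snippet assumes that both the dataset and the Telegram export have already been loaded as lists of dictionaries that each carry an "id" key; the actual layout of an export file may differ.

```python
# Base address of the channel, as given in the text above.
BASE_URL = "https://t.me/concordgroup_official/"


def post_url(post_id):
    """Return the address of the original Telegram post for a given id."""
    return f"{BASE_URL}{int(post_id)}"


def match_by_id(dataset_rows, export_messages):
    """Pair each dataset row with the exported message that has the same id.

    Both arguments are assumed to be lists of dicts with an "id" key.
    Rows without a counterpart in the export are skipped.
    """
    export_index = {int(msg["id"]): msg for msg in export_messages}
    pairs = []
    for row in dataset_rows:
        msg = export_index.get(int(row["id"]))
        if msg is not None:
            pairs.append((row, msg))
    return pairs
```

For example, post_url(1234) gives https://t.me/concordgroup_official/1234.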

This dataset also includes a prigozhin_id column. Many (but not all) of the messages posted by Prigozhin’s press service start with a hash sign followed by a numeric identifier, e.g. “#1234”. The prigozhin_id column reports this identifier, when available.
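
The snippet below illustrates how such an identifier could be pulled from the text of a message. It is only a sketch: the function name is hypothetical, the pattern tolerates leading whitespace as an assumption, and it is not necessarily how the prigozhin_id column in this dataset was produced.

```python
import re
from typing import Optional

# A hash sign followed by digits at the very start of the message
# (leading whitespace is tolerated; this is an assumption).
LEADING_ID = re.compile(r"^\s*#(\d+)")


def extract_prigozhin_id(text: str) -> Optional[str]:
    """Return the numeric identifier that opens a message, or None.

    The identifier is returned as a string so that any leading zeros
    are preserved.
    """
    match = LEADING_ID.match(text)
    return match.group(1) if match else None
```

A message that mentions a hash sign and number later in the text, rather than at the start, yields None with this pattern.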