Prigozhin audio files, transcribed
This is an early release of the dataset. Only limited quality checks have been conducted, so if you intend to use it, make sure it is fit for purpose.
A full release in a proper data repository with better documentation is forthcoming.
Prigozhin started to post audio messages on his official Telegram channel - the press service of his holding company - in late 2022. He abruptly stopped after his mutiny in late June 2023. This dataset includes automatic transcriptions in both Russian and English, created using the Whisper models (more specifically, the large model) through a dedicated R package. Find more details and context about the process in the dedicated post.
Read full post with more context: “From the ‘battle of Bakhmut’ to the ‘march of justice’: Prigozhin’s audio files, transcribed”
The same contents available here for download can more conveniently be consulted at the following pages:
Accuracy of the dataset
Contents presented here are the result of automatic transcription and translation. The Russian transcription is mostly accurate, but the spelling of names of persons and organisations is inconsistent. The automatic translation into English is mostly usable, but inaccuracies are more frequent.
Summary statistics
Total number of posts including audio messages: 408
Earliest post with audio: 26 December 2022
Most recent post with audio: 26 June 2023
Downloads
- Russian version: prigozhin_audio_files_ru.csv
- English version: prigozhin_audio_files_en.csv
About the identifier
The id column included in the dataset reflects the identifier of a given post on Telegram. The original post can be seen by adding the relevant id to the base address of the Telegram channel: https://t.me/concordgroup_official/
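As a minimal sketch of that rule, the helper below appends an id to the base address given above. The function name post_url is hypothetical and used only for illustration.

```python
# Base address of the Telegram channel, as given in the text above.
BASE_URL = "https://t.me/concordgroup_official/"


def post_url(post_id):
    """Return the address of the original Telegram post for a given id.

    Accepts the id as an integer or as a string of digits, since values
    read from a CSV file usually arrive as strings.
    """
    return f"{BASE_URL}{int(post_id)}"
```

For a row whose id is 1234 (an arbitrary example value), post_url(1234) gives the base address followed by 1234.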
The id column can hence also be used to match this dataset with export files generated by Telegram itself.
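One possible way to do this matching is sketched below, using only the Python standard library. It assumes a JSON export that has already been parsed into a dictionary with a top-level "messages" list whose items carry a numeric "id" field; check this against your own export file, as the layout is an assumption here. The function names and the sample values are hypothetical.

```python
import csv
import io


def index_export_by_id(export):
    """Map message id -> message for a parsed Telegram export.

    Assumes the export is a dict with a "messages" list (an assumption
    about the export layout, not something stated in this dataset).
    """
    return {int(m["id"]): m for m in export.get("messages", []) if "id" in m}


def match_rows(csv_file, export):
    """Yield (dataset_row, export_message) pairs.

    export_message is None when the export has no message with that id.
    """
    by_id = index_export_by_id(export)
    for row in csv.DictReader(csv_file):
        yield row, by_id.get(int(row["id"]))


# Hypothetical in-memory stand-ins for the CSV file and the export file.
sample_csv = io.StringIO("id,prigozhin_id\n101,1234\n102,\n")
sample_export = {"messages": [{"id": 101, "type": "message"}]}
pairs = list(match_rows(sample_csv, sample_export))
```

With real data, csv_file would be the opened CSV download and export the result of json.load on the export file.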
This dataset also includes a prigozhin_id column. Many (but not all) of the messages posted by Prigozhin’s press service start with a hash sign followed by a numeric identifier, e.g. “#1234”. The prigozhin_id column reports this identifier, when available.
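For readers working with the raw post text, the pattern described above can be sketched as follows. This is an illustration of the "hash sign followed by a number" rule, not necessarily how the prigozhin_id column was produced; the function name is hypothetical.

```python
import re

# A hash sign followed by digits at the very start of the message,
# allowing for leading whitespace.
_HASH_ID = re.compile(r"^\s*#(\d+)")


def extract_prigozhin_id(text):
    """Return the leading numeric identifier as an int, or None if absent."""
    match = _HASH_ID.match(text or "")
    return int(match.group(1)) if match else None
```

Messages that mention a hash sign later in the text, rather than at the start, are deliberately not matched.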