Text as data & data in the text
Studying conflicts in post-Soviet spaces through structured analysis of textual contents available on-line
A project led by Giorgio Comai, researcher and data analyst at OBCT/CCI, carried out with the support of the Italian MFA (see below for details and disclaimers).
Title | Description | Categories |
Russian state institutions 2024 | This is a collection of full-text datasets based on contents extracted from the websites of Russian institutions. | |
transcript.duma.gov.ru_ru_2024 | Corpus based on the website of Russia’s Duma (in Russian, 2006-2023) | dataset, Russian institutions, Russian parliament, Russian language |
archive.premier.gov.ru_ru_2024 | Corpus based on the archived version of the website of Russia’s prime minister (in Russian, 2008-2012) | dataset, Russian institutions, Russian government, Russian language |
duma.gov.ru_ru_2024 | Corpus based on the website of Russia’s Duma (in Russian, 2006-2023) | dataset, Russian institutions, Russian government, Russian language |
government.ru_ru_2024 | Corpus based on Russia’s government website (in Russian, 2013-2023) | dataset, Russian institutions, Russian government, Russian language |
archive.government.ru_ru_2024 | Corpus based on the archived version of Russia’s government website (in Russian, 2008-2013) | dataset, Russian institutions, Russian government, Russian language |
mid.ru_en_2024 | Corpus based on the website of Russia’s MFA (in English, 2003-2023) | corpus, full corpus, Russian institutions, Russia’s MFA, English language |
mid.ru_ru_2024 | Corpus based on the website of Russia’s MFA (in Russian, 2003-2023) | corpus, full corpus, Russian institutions, Russia’s MFA, Russian language |
kremlin.ru_en_2024 | Corpus based on the website of Russia’s president (in English, 1999-2023) | corpus, full corpus, Russian institutions, Russia’s president, English language |
zavtra.ru_ru_2024 | Corpus based on the website of Russian weekly newspaper ‘Zavtra’ (in Russian, 1996-2023) | corpus, full corpus, Russian media, Russian language |
kremlin.ru_ru_2024 | Corpus based on the website of Russia’s president (in Russian, 1999-2023) | dataset, Russian institutions, Russian language |
rg.ru_ru | All items published on Rossiiskaya Gazeta | dataset, Russian media, Russian language |
novostipmr.com_ru | All items published on the website of Transnistria’s news agency Novosti PMR | dataset, Russian language, Transnistria |
patriarhia.ru_ru | All items published on the official website of the Moscow Patriarchate | dataset, Russian language |
Prigozhin audio files, transcribed | An automatic transcription of all the audio messages posted on Prigozhin’s official Telegram channel | dataset, automatic transcription, Telegram, Russian language, Russian media |
mid.ru_en | All English-language news items published on the website of the Russian Ministry of Foreign Affairs | dataset, Russian institutions, English language |
mid.ru_ru | All Russian-language news items published on the website of the Russian Ministry of Foreign Affairs | dataset, Russian institutions, Russian language |
duma.gov.ru_ru | All news items published on the website of the Russian Duma | dataset, Russian institutions, Russian language |
tsargrad.tv_ru | All textual items published on the website of the Russian TV broadcaster ‘Tsargrad’ | dataset, Russian media, Russian language |
kp.ru_ru | All items published in the politics section of Komsomolskaya Pravda | dataset, Russian media, Russian language |
ng.ru_ru | All items published on Nezavisimaya Gazeta | dataset, Russian media, Russian language |
zavtra.ru_ru | All items published on the website of the Russian weekly newspaper ‘Zavtra’ | dataset, Russian media, Russian language |
1tv.ru_ru | All items published on the website of Pervy Kanal (1tv.ru) | dataset, Russian media, Russian language |
kremlin.ru_en | All items published on the English language version of the Kremlin’s website | dataset, Russian institutions, English language |
kremlin.ru_ru | All items published on the Russian language version of the Kremlin’s website | dataset, Russian institutions, Russian language |
The tutorials are mostly based on castarter, the Content Analysis Starter Toolkit for the R programming language, and will target users with beginner or beginner-intermediate coding skills. As the package gains new features, the tutorials will become more accessible; eventually, some of them will be usable by people with no coding experience at all.
A draft version of the documentation for the castarter package is already available online. Both the documentation and the functionality of the package will mature in the coming months.
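As a rough illustration of the kind of structured analysis these datasets are meant to support, the sketch below counts how often a keyword is mentioned per year in a small collection of dated texts. It is written in plain Python rather than with castarter (which is an R package), and it does not reproduce the package's API. The sample records and the field names `date` and `text` are assumptions made for the example, not the actual schema of the datasets listed above.

```python
from collections import Counter
import re

# Hypothetical records standing in for items from one of the datasets.
# The field names "date" and "text" are assumptions, not the project's schema.
corpus = [
    {"date": "2021-03-01", "text": "Statement on Transnistria and regional security."},
    {"date": "2021-07-15", "text": "Talks on Transnistria; Transnistria settlement discussed."},
    {"date": "2022-02-10", "text": "Remarks on energy policy."},
]


def mentions_per_year(records, keyword):
    """Count case-insensitive whole-word occurrences of keyword, grouped by year."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", flags=re.IGNORECASE)
    counts = Counter()
    for record in records:
        year = record["date"][:4]  # assumes ISO-style dates (YYYY-MM-DD)
        counts[year] += len(pattern.findall(record["text"]))
    return dict(sorted(counts.items()))


print(mentions_per_year(corpus, "Transnistria"))
# → {'2021': 3, '2022': 0}
```

Years with no matches are kept with a count of zero, so that the absence of a term remains visible when the results are plotted over time.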
Funding and disclaimers
This project is hosted by OBCT/CCI. It is carried out with the support of the Italian Ministry of Foreign Affairs and International Cooperation under art. 23 bis, D.P.R. 18/1967. All opinions expressed within the scope of this project are those of their author and not of the Ministry.
“The positions contained in this report are solely those of the authors and do not necessarily represent the positions of the Ministry of Foreign Affairs and International Cooperation”