Text as data & data in the text

Studying conflicts in post-Soviet spaces through structured analysis of textual contents available on-line

A project led by Giorgio Comai, researcher and data analyst at OBCT/CCI, carried out with the support of the Italian MFA (see below for details and disclaimers).


About this project

Posts and updates


Review of literature

Datasets

Title | Description | Categories
Russian state institutions 2024 | This is a collection of full-text datasets based on contents extracted from the websites of Russian institutions. |
transcript.duma.gov.ru_ru_2024 | Corpus based on Russia’s Duma website (in Russian, 2006-2023) | dataset, Russian institutions, Russian parliament, Russian language
archive.premier.gov.ru_ru_2024 | Corpus based on the archived version of the website of Russia’s prime minister (in Russian, 2008-2012) | dataset, Russian institutions, Russian government, Russian language
duma.gov.ru_ru_2024 | Corpus based on Russia’s Duma website (in Russian, 2006-2023) | dataset, Russian institutions, Russian government, Russian language
government.ru_ru_2024 | Corpus based on Russia’s government website (in Russian, 2013-2023) | dataset, Russian institutions, Russian government, Russian language
archive.government.ru_ru_2024 | Corpus based on the archived version of Russia’s government website (in Russian, 2008-2013) | dataset, Russian institutions, Russian government, Russian language
mid.ru_en_2024 | Corpus based on the website of Russia's MFA (in English, 2003-2023) | corpus, full corpus, Russian institutions, Russia's MFA, English language
mid.ru_ru_2024 | Corpus based on the website of Russia's MFA (in Russian, 2003-2023) | corpus, full corpus, Russian institutions, Russia's MFA, Russian language
kremlin.ru_en_2024 | Corpus based on the website of Russia's president (in English, 1999-2023) | corpus, full corpus, Russian institutions, Russia's president, English language
zavtra.ru_ru_2024 | Corpus based on the website of Russian weekly newspaper ‘Zavtra’ (in Russian, 1996-2023) | corpus, full corpus, Russian media, Russian language
kremlin.ru_ru_2024 | Corpus based on the website of Russia’s president (in Russian, 1999-2023) | dataset, Russian institutions, Russian language
rg.ru_ru | All items published on Rossiiskaya Gazeta | dataset, Russian media, Russian language
novostipmr.com_ru | All items published on the website of Transnistria’s news agency Novosti PMR | dataset, Russian language, Transnistria
patriarhia.ru_ru | All items published on the official website of the Moscow Patriarchate | dataset, Russian language
Prigozhin audio files, transcribed | An automatic transcription of all the audio messages posted on Prigozhin’s official Telegram channel | dataset, automatic transcription, Telegram, Russian language, Russian media
mid.ru_ru | All Russian-language news items published on the website of the Russian Ministry of Foreign Affairs | dataset, Russian institutions, Russian language
mid.ru_en | All English-language news items published on the website of the Russian Ministry of Foreign Affairs | dataset, Russian institutions, English language
duma.gov.ru_ru | All news items published on the website of the Russian Duma | dataset, Russian institutions, Russian language
tsargrad.tv_ru | All textual items published on the website of the Russian TV broadcaster ‘Tsargrad’ | dataset, Russian media, Russian language
kp.ru_ru | All items published in the politics section of Komsomolskaya Pravda | dataset, Russian media, Russian language
ng.ru_ru | All items published on Nezavisimaya Gazeta | dataset, Russian media, Russian language
zavtra.ru_ru | All items published on the website of the Russian weekly magazine ‘Zavtra’ | dataset, Russian media, Russian language
1tv.ru_ru | All items published on the website of Pervy Kanal (1tv.ru) | dataset, Russian media, Russian language
kremlin.ru_en | All items published on the English-language version of the Kremlin’s website | dataset, Russian institutions, English language
kremlin.ru_ru | All items published on the Russian-language version of the Kremlin’s website | dataset, Russian institutions, Russian language

Tutorials

The tutorials are mostly based on castarter - Content Analysis Starter Toolkit for the R programming language, and will target users with beginner or beginner-intermediate coding skills. As the package gains new features, the tutorials will become more accessible; eventually, some of them will be accessible to users with no coding experience at all.

A draft version of the documentation for the castarter package is already available online. Both the documentation and the functionality of the package will mature in the coming months.


Funding and disclaimers

This project is hosted by OBCT/CCI. It is carried out with the support of the Italian Ministry of Foreign Affairs and International Cooperation under art. 23 bis, D.P.R. 18/1967. All opinions expressed within the scope of this project represent the opinion of their author and not that of the Ministry.

The positions contained in this report are solely those of the authors and do not necessarily represent the positions of the Ministry of Foreign Affairs and International Cooperation.