mid.ru_en_2024
Explore in an interactive web interface
Links for download: compressed csv / ods
Scope of this corpus
This corpus includes all news items published on the English-language version of the website of Russia’s MFA.
Summary statistics
Dataset name: mid.ru_en_2024
Dataset description: all news items published on the English-language version of mid.ru
Start date: 2003-01-04
End date: 2023-12-31
Total items: 25 943
Available columns: doc_id; text; date; datetime; title; internal_id; url_id; translations; url
License: Permissive (see details)
Link for download: mid.ru_en_2024
field | present | missing | missing_share |
---|---|---|---|
doc_id | 25 943 | 0 | 0.0% |
text | 25 938 | 5 | 0.0% |
date | 25 943 | 0 | 0.0% |
datetime | 25 943 | 0 | 0.0% |
title | 25 943 | 0 | 0.0% |
internal_id | 25 856 | 87 | 0.3% |
url_id | 25 943 | 0 | 0.0% |
translations | 25 917 | 26 | 0.1% |
url | 25 943 | 0 | 0.0% |
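The present/missing counts in the table above can be recomputed from the downloaded CSV. The following is a minimal sketch using only the Python standard library. It assumes that an empty or whitespace-only cell counts as missing, which the source does not state. The sample rows and the helper name `missing_summary` are made up for illustration; only the column names and the example identifier "1383-22-09-2011" come from this page.

```python
import csv
import io


def missing_summary(rows, fields):
    """Count present and missing (empty) values per field."""
    total = len(rows)
    summary = {}
    for field in fields:
        # Assumption: an empty or whitespace-only value means "missing".
        missing = sum(1 for row in rows if not (row.get(field) or "").strip())
        share = missing / total if total else 0.0
        summary[field] = {
            "present": total - missing,
            "missing": missing,
            "missing_share": f"{share:.1%}",
        }
    return summary


# Tiny invented sample standing in for the real CSV export.
sample_csv = io.StringIO(
    "doc_id,text,internal_id\n"
    "a1,Some text,1383-22-09-2011\n"
    "a2,,\n"
    "a3,More text,\n"
    "a4,Other text,77-01-02-2012\n"
)
rows = list(csv.DictReader(sample_csv))
summary = missing_summary(rows, ["doc_id", "text", "internal_id"])
for field, counts in summary.items():
    print(field, counts)
```

With the real file, the same function could be applied to the rows read from the uncompressed CSV, passing the column names listed under "Available columns".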
Narrative explanation of how this textual dataset was built
The website of Russia’s MFA makes it possible to search its news section by date. All index pages for each date, starting with the earliest publications, have been retrieved. On the few occasions when more than 20 items were published on the same day, a second page for the relevant day was also retrieved. Here is an example of such an index page:
Direct links to news items were extracted from these pages.
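The link extraction step can be illustrated with a short sketch. It is not the code used to build this corpus: the URL pattern is inferred from the item URLs cited on this page (for example https://mid.ru/en/foreign_policy/news/1711875/), the HTML snippet is invented, and the class name `NewsLinkParser` is hypothetical. The actual markup of the index pages may differ.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: every news item lives under /en/foreign_policy/news/<digits>/,
# as in the item URLs quoted in this document.
NEWS_PATH = re.compile(r"^/en/foreign_policy/news/(\d+)/?$")


class NewsLinkParser(HTMLParser):
    """Collect unique absolute links to news items, in page order."""

    def __init__(self, base_url="https://mid.ru"):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Keep only links whose path matches the news item pattern.
        if NEWS_PATH.match(urlparse(absolute).path) and absolute not in self.links:
            self.links.append(absolute)


# Invented markup standing in for a real index page.
sample_html = """
<ul>
  <li><a href="/en/foreign_policy/news/1711875/">Item</a></li>
  <li><a href="/en/foreign_policy/news/1653435/">Another item</a></li>
  <li><a href="/en/foreign_policy/news/1711875/">Repeated link</a></li>
  <li><a href="/en/press_service/">Not a news item</a></li>
</ul>
"""
parser = NewsLinkParser()
parser.feed(sample_html)
links = parser.links
print(links)
```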
The corpus includes the limited metadata available through the website, namely:
- title
- date and time of publication
- an internal id, which is included in almost all posts (see note below)
- a list of the languages in which a given post has been published
Notes
This section lists some issues that may be of interest to users of this corpus.
- Many items include the string: “Unofficial translation from Russian”
- Along with news, the MFA publishes items that detail the timing and accreditation rules for press briefings, see for example: https://mid.ru/en/foreign_policy/news/1927386/. As these do not include substantive content, they are not included in the dataset.
- Almost all news items are published with an identifier, e.g. “1383-22-09-2011” for this item. In a few dozen instances the identifier is missing, and in a handful it is not unique. As a consequence, the numeric component of the url is likely preferable as the main unique identifier.
- The Russian-language version of this corpus has a significantly larger number of publications.
- There are 5 items with empty text fields, listed below. They either include no text besides the title or include just a link to an external file (not included in this corpus).
date | title | url |
---|---|---|
2009-05-05 | Statement by H.E. Ambassador Anatoly Antonov, Head of the Delegation of the Russian Federation at the Third Session of the Preparatory Committee for the 2010 Review Conference of the Parties to the Treaty on the Non–Proliferation of Nuclear Weapons, New York, 4 May 2009 | https://mid.ru/en/foreign_policy/news/1711875/ |
2012-12-06 | Report on the human rights situation in the European Union | https://mid.ru/en/foreign_policy/news/1653435/ |
2013-05-21 | Speech of and answers to questions of mass media by Russian Foreign Minister Sergey Lavrov during joint press conference summarizing the results of negotiations with Secretary General of the Council of Europe Thorbjørn Jagland, Sochi, 20 May 2013 | https://mid.ru/en/foreign_policy/news/1587812/ |
2014-03-28 | The Hague Nuclear Security Summit Communiqué | https://mid.ru/en/foreign_policy/news/1709087/ |
2015-08-05 | The Ministry of Foreign Affairs of the Russian Federation on certain legal issues highlighted by the action of the Arctic Sunrise against Prirazlomnaya platform | https://mid.ru/en/foreign_policy/news/1512649/ |
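The note above on identifiers suggests relying on the numeric component of the url rather than on the internal identifier. A minimal sketch of how one might extract that component and check either identifier for duplicates follows. The helper names are hypothetical, and the pairing of URLs with internal ids in the sample records is invented to demonstrate a duplicate; it does not reflect the real data.

```python
import re
from collections import Counter

# Assumption: the numeric id is the last path segment after /news/.
URL_ID = re.compile(r"/news/(\d+)/?$")


def url_numeric_id(url):
    """Return the numeric component of a news item URL, or None."""
    match = URL_ID.search(url)
    return match.group(1) if match else None


def duplicated_values(values):
    """Return the sorted non-empty values that occur more than once."""
    counts = Counter(v for v in values if v)
    return sorted(v for v, n in counts.items() if n > 1)


# Invented records: the internal ids are deliberately repeated or empty.
records = [
    {"url": "https://mid.ru/en/foreign_policy/news/1711875/",
     "internal_id": "1383-22-09-2011"},
    {"url": "https://mid.ru/en/foreign_policy/news/1653435/",
     "internal_id": "1383-22-09-2011"},
    {"url": "https://mid.ru/en/foreign_policy/news/1587812/",
     "internal_id": ""},
]

url_ids = [url_numeric_id(r["url"]) for r in records]
dup_internal = duplicated_values(r["internal_id"] for r in records)
dup_url = duplicated_values(url_ids)
print(url_ids, dup_internal, dup_url)
```

In this sample the internal id is repeated once and missing once, while the URL-derived ids are all present and distinct, which is the situation the note describes.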
License information
At the time the contents were retrieved, the page on the conditions for the use of website contents made clear that contents can be used for research purposes and can be re-published, as long as reference is always made to the website of the MFA.
Materials on the website of the Russian Ministry of Foreign Affairs are generally accessible and open for non-commercial use (personal, family, education, research, etc.).
Their reprinting, as well as any quoting in the mass media is allowed only with a reference to the website of the Russian Ministry of Foreign Affairs as a source of the information.
No specific license is mentioned, however.
The contents of this dataset (“mid.ru_en”) are distributed within the terms set out above. To the extent possible, the dataset itself is also distributed by its creator, Giorgio Comai, under the same conditions, as well as under the Open Data Commons Attribution license (ODC-BY).