archive.premier.gov.ru_ru_2024
Explore in an interactive web interface
Links for download: compressed csv / ods
Scope of this corpus
This corpus is based on all contents published in the “news” section of the website archive.premier.gov.ru as it was available online in early 2024.
Users should be aware that for broadly the same period (specifically, while Vladimir Putin was prime minister) a separate website was maintained for the government; its archived version is still available online at archive.government.gov.ru.
Summary statistics
Dataset name: archive.premier.gov.ru_ru_2024
Dataset description: all news items published on archive.premier.gov.ru
Start date: 2008-05-07
End date: 2012-05-07
Total items: 3 323
Available columns: doc_id; text; title; date; datetime; section; internal_id; url
License: Creative Commons Attribution 3.0 International
Link for download: archive.premier.gov.ru_ru_2024
field | present | missing | missing_share |
---|---|---|---|
doc_id | 3 323 | 0 | 0.0% |
text | 3 272 | 51 | 1.5% |
title | 3 323 | 0 | 0.0% |
date | 3 323 | 0 | 0.0% |
datetime | 3 323 | 0 | 0.0% |
section | 3 323 | 0 | 0.0% |
internal_id | 3 323 | 0 | 0.0% |
url | 3 323 | 0 | 0.0% |
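
The figures in the table above can be recomputed from the downloaded csv file. The following is a minimal sketch, assuming the file has a header row with the column names listed under "Available columns" and that an empty or whitespace-only cell counts as missing. The function name `missing_summary` and the two sample rows are made up for illustration and are not part of the dataset.

```python
import csv
import io

COLUMNS = ["doc_id", "text", "title", "date", "datetime",
           "section", "internal_id", "url"]


def missing_summary(rows, columns=COLUMNS):
    """Return {column: (present, missing, missing_share)} for a list of dict rows."""
    total = len(rows)
    summary = {}
    for column in columns:
        # Assumption: absent keys, None, and whitespace-only strings are "missing".
        missing = sum(1 for row in rows if not (row.get(column) or "").strip())
        share = missing / total if total else 0.0
        summary[column] = (total - missing, missing, share)
    return summary


# Tiny made-up sample standing in for the downloaded csv file.
SAMPLE_CSV = """doc_id,text,title,date,datetime,section,internal_id,url
a_1,Some text,First title,2008-05-08,2008-05-08T10:00:00,news,1,https://example.org/news/1/
a_2,,Second title,2008-05-09,2008-05-09T11:00:00,news,2,https://example.org/news/2/
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column, (present, missing, share) in missing_summary(rows).items():
    # Same layout as the table: field | present | missing | missing_share
    print(f"{column} | {present} | {missing} | {share:.1%}")
```

To run this against the real file, replace `io.StringIO(SAMPLE_CSV)` with the opened, decompressed csv file.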
Narrative explanation of how this corpus has been created
This corpus has been built based on index pages of the event “news” section, retrieving links starting with the earliest publication.
Links to photo, video, and audio pages have been removed; only textual contents have been kept.
Text and metadata have been extracted from the resulting pages.
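The link-filtering step described above could look roughly like the following sketch. The actual URL structure of the website is not documented here, so the path segments `photo`, `video`, and `audio`, the example URLs, and the function names are all hypothetical; the real pipeline may have identified media pages differently.

```python
from urllib.parse import urlparse

# Hypothetical path segments marking media pages; the real site may differ.
MEDIA_SEGMENTS = {"photo", "video", "audio"}


def is_media_link(url, media_segments=MEDIA_SEGMENTS):
    """Return True if any path segment of the url names a media type."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return any(s in media_segments for s in segments)


def keep_textual_links(urls):
    """Drop media links, preserving the original order of the remaining ones."""
    return [u for u in urls if not is_media_link(u)]


# Made-up links standing in for those collected from the index pages.
links = [
    "https://example.org/events/news/100/",
    "https://example.org/events/news/100/photo/",
    "https://example.org/events/news/101/video/",
    "https://example.org/events/news/102/",
]
textual_links = keep_textual_links(links)
```

Matching on whole path segments rather than substrings avoids discarding a text page whose address merely contains a word such as "video" inside a longer segment.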
Duplicates
Some items have been posted on the same date, with the same title, and with the same text under different urls (but the same numeric component in the url, here recorded as internal_id). In such cases, duplicates have been removed.
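As an illustration of the rule above, the sketch below keeps the first item for each combination of date, title, text, and internal_id. It is an assumption that the first occurrence is the one retained; the helper names, the regular expression used to pull the numeric component from a url, and the sample items are hypothetical.

```python
import re


def numeric_component(url):
    """Return the first all-digit path segment of a url, or None if there is none."""
    match = re.search(r"/(\d+)(?:/|$)", url)
    return match.group(1) if match else None


def drop_duplicates(items):
    """Keep the first item for each (date, title, text, internal_id) combination."""
    seen = set()
    unique = []
    for item in items:
        key = (item["date"], item["title"], item["text"], item["internal_id"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


# Made-up items: the first two differ only in their url.
raw_urls = [
    "https://example.org/events/news/1234/",
    "https://example.org/visits/news/1234/",
    "https://example.org/events/news/1235/",
]
items = [
    {"date": "2010-01-15", "title": "A title", "text": "Same text",
     "internal_id": numeric_component(url), "url": url}
    for url in raw_urls
]
deduplicated = drop_duplicates(items)
```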
Items with title but no text
There are 51 items with a title but no text (the summary statistics show 51 missing values for text and none for title). These are kept in the dataset, as the title may still offer relevant content.
License information
At the time the contents were retrieved, the footer of the website made clear that all contents available are published under a Creative Commons Attribution 3.0 license.
The contents of this dataset - “archive.premier.gov.ru_ru” - are distributed under the terms of this license. To the extent possible, the dataset itself is also distributed by its creator, Giorgio Comai, under the same CC-BY license, as well as under the Open Data Commons Attribution license (ODC-BY).