De facto stats
Finding and contextualising socio-economic indicators about post-Soviet de facto states
ZOiS, Berlin - 17 October 2023
Theoretical starting point
Like other small dependent jurisdictions, post-Soviet de facto states…
- receive a large share of their budget incomes from external sources
- have an unusually high number of residents whose income depends directly on the (de facto) state (e.g. pensioners, public sector employees, military)
- have an economy strongly influenced by migration and remittances
As a consequence…
Even if much of the subsistence and informal economy goes unrecorded, a sizeable part of the economy is fully formalised and leaves some record:
- external assistance from the patron is largely officially recorded in budgets
- pensions, and salaries for some categories of employees, are either published or publicly debated
- relevant indicators are often publicly discussed (e.g. number of pensioners, number of teachers, health sector workers, etc.)
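Publicly discussed indicators like these allow a quick bottom-up reality check on budget figures. A minimal sketch, assuming the number of pensioners and the average monthly pension are known and that pensions are paid monthly; the function names and all figures are hypothetical, for illustration only:

```python
# Bottom-up, back-of-the-envelope estimate of a budget item.
# All figures below are hypothetical, for illustration only.

def annual_pension_bill(pensioners: int, avg_monthly_pension: float, months: int = 12) -> float:
    """Yearly pension spending in local currency, assuming `months` payments per year."""
    return pensioners * avg_monthly_pension * months


def share_of_spending(amount: float, spending: float) -> float:
    """Share of a spending item that a given amount (e.g. external assistance) would cover."""
    return amount / spending


bill = annual_pension_bill(pensioners=40_000, avg_monthly_pension=8_000)
print(f"Estimated yearly pension bill: {bill:,.0f}")
print(f"Share covered by (hypothetical) assistance: {share_of_spending(2_500_000_000, bill):.0%}")
```

If such an estimate diverges widely from what the budget reports, either the inputs or the budget deserve a closer look.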
Socio-economic indicators in post-Soviet de facto states
Why do we care?
- questions of political economy
- degree of external dependency
- economic opportunities structure
- but also, more pragmatically:
- salaries & migration choices
- possibly, political or even status preferences
- e.g. how much would Transnistria need if assistance in the form of gas were cut?
Getting the data
- sometimes, readily available
- sometimes, it’s really many needles scattered across many haystacks
Trusting the data
- to what extent should we trust the data?
- or rather, which data can we trust, and what is left out?
Making the data comparable:
- exchange rates, inflation, etc.
- “per capita” and demography
The (relatively) easy part
- when data is mostly readily available
- but still needs to be adjusted for exchange rate and inflation
- per capita and purchasing power parity
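The adjustment steps can be sketched as follows, assuming annual figures in nominal roubles, an annual average RUB/USD exchange rate, a USD price index with the base year set to 100, and a population estimate. The `YearData` structure and all values are hypothetical; note how the same nominal amount shrinks sharply in dollar terms after a devaluation.

```python
from dataclasses import dataclass


@dataclass
class YearData:
    year: int
    assistance_rub: float   # nominal budget assistance, in roubles
    rub_per_usd: float      # annual average exchange rate
    usd_price_index: float  # USD price index, base year = 100
    population: int         # (estimated) resident population


def per_capita_constant_usd(d: YearData) -> float:
    """Nominal RUB -> nominal USD -> constant (base-year) USD -> per resident."""
    nominal_usd = d.assistance_rub / d.rub_per_usd
    constant_usd = nominal_usd * 100 / d.usd_price_index
    return constant_usd / d.population


# Hypothetical values, for illustration only
series = [
    YearData(2013, 9_000_000_000, 32.0, 100.0, 240_000),
    YearData(2015, 9_000_000_000, 61.0, 103.0, 240_000),
]
for d in series:
    print(d.year, round(per_capita_constant_usd(d), 2))
```

Deflating in roubles (with a local or patron price index) before converting, or in dollars after converting, gives different results; which order fits depends on the comparison at hand.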
Budget assistance (in RUB)
Budget assistance (in USD)
Exchange rates matter
But there’s more…
Inflation-adjusted and “per capita”
For established economies, data in constant local currency units (LCU) or constant USD are easy to find. What are the alternatives here?
- statistics on inflation released or made public by de facto authorities
- data from the patron or neighbouring region
- find historical prices for selected goods (e.g. electricity, diesel at the pump)
- demography is an even bigger issue
This matters both for domestic dynamics and for comparison
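Where no official inflation series exists, historical prices for a handful of goods can be combined into a rough price index. The sketch below uses an unweighted geometric mean of price relatives (a Jevons-type index), which is one possible choice among several; the goods and prices are hypothetical (here all three prices rise by 25%, so the index comes out at 125).

```python
import math


def jevons_index(base_prices: dict[str, float], current_prices: dict[str, float]) -> float:
    """Unweighted geometric mean of price relatives, base period = 100.

    Only goods observed in both periods are used.
    """
    common = base_prices.keys() & current_prices.keys()
    if not common:
        raise ValueError("no goods observed in both periods")
    mean_log_relative = sum(math.log(current_prices[g] / base_prices[g]) for g in common) / len(common)
    return 100 * math.exp(mean_log_relative)


def deflate(nominal: float, index: float) -> float:
    """Express a nominal value in base-period prices."""
    return nominal * 100 / index


# Hypothetical prices in local currency (per kWh, per litre of diesel, per kg of bread)
prices_2015 = {"electricity_kwh": 0.8, "diesel_litre": 12.0, "bread_kg": 6.0}
prices_2020 = {"electricity_kwh": 1.0, "diesel_litre": 15.0, "bread_kg": 7.5}

index_2020 = jevons_index(prices_2015, prices_2020)
print(round(index_2020, 1))
print(round(deflate(5_000, index_2020), 1))  # a 2020 salary of 5,000 in 2015 prices
```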
Plus, local peculiarities
e.g. Transnistria
- we know how much gas Transnistria receives
- we know how much that would cost if Moldova were to pay for it
- but what is actually its value for the Transnistrian economy?
- and how is that value distributed, besides the budget?
In brief
- getting data good enough for analysis or comparison may require extra work, but may still be feasible, depending on the research question
- alternatively, we may prefer to look for disaggregated data
- about some sectors of the economy
- about some groups of people whose incomes we may realistically know and compare
Some bottom-up stats
A part of the story, of varying size
Salary of teachers in Abkhazia
For different reasons, mostly reliable data
(even if partial)
Workers by sector in South Ossetia
- Which data do we trust?
- Is the data we can trust really so different from what we’d see in comparable jurisdictions?
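One way to frame the second question is to compute the share of employment in sectors paid directly by the (de facto) state and set it against the same share in a comparable jurisdiction. A minimal sketch; the sector names, which sectors count as state-funded, and all figures are hypothetical assumptions:

```python
def sector_shares(workers: dict[str, int]) -> dict[str, float]:
    """Share of total recorded employment by sector."""
    total = sum(workers.values())
    return {sector: count / total for sector, count in workers.items()}


def public_share(workers: dict[str, int], public_sectors: set[str]) -> float:
    """Combined share of the sectors treated as state-funded."""
    shares = sector_shares(workers)
    return sum(share for sector, share in shares.items() if sector in public_sectors)


PUBLIC = {"public_administration", "education", "health"}  # assumption

# Hypothetical counts of workers by sector
de_facto = {"public_administration": 3_000, "education": 2_500, "health": 1_500, "trade": 2_000, "other": 1_000}
reference = {"public_administration": 2_000, "education": 2_000, "health": 1_000, "trade": 3_000, "other": 2_000}

print(f"De facto state: {public_share(de_facto, PUBLIC):.0%}")
print(f"Reference:      {public_share(reference, PUBLIC):.0%}")
```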
Finding the needles in the haystacks
First, get the haystacks
Corpora of relevant online sources
Textual datasets of relevant online sources
- websites of de facto authorities
- local news websites
- possibly, websites of selected commercial activities
- selected sources in the patron state
From website to corpus
Dataset name: novostipmr.com_ru
Dataset description: All items published on the website of Transnistria’s news agency Novosti PMR
Start date: 2012-07-31
End date: 2023-10-09
Total items: 128 048
Available columns: id; url; title; date; datetime; subtitle; description; section; section_link; tags; tags_links; text
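Once a website is stored as a dataset with columns like those listed above, selecting the relevant items is straightforward. A sketch assuming the corpus is exported as a UTF-8 CSV file with (at least) the `id`, `url`, `title`, `date` and `text` columns and ISO-formatted dates; the export format, the sample rows and the example.org URLs are hypothetical:

```python
import csv
import io
from datetime import date

# Hypothetical sample in the assumed CSV layout (a real export would have all columns)
SAMPLE = """id,url,title,date,text
1,https://example.org/1,Индексация пенсий,2019-12-30,Пенсии будут проиндексированы с января.
2,https://example.org/2,Поставки газа,2020-01-15,Поставки газа продолжаются.
3,https://example.org/3,Начало учебного года,2020-09-01,На линейке присутствовали учителя и пенсионеры.
"""


def load_items(f, start=None, end=None):
    """Read corpus rows, keeping those published between `start` and `end` (inclusive)."""
    items = []
    for row in csv.DictReader(f):
        published = date.fromisoformat(row["date"])
        if (start is None or published >= start) and (end is None or published <= end):
            items.append(row)
    return items


def mentioning(items, stem):
    """Items whose title or text contains `stem` (case-insensitive substring match)."""
    stem = stem.lower()
    return [row for row in items if stem in row["title"].lower() or stem in row["text"].lower()]


items = load_items(io.StringIO(SAMPLE))
print([row["id"] for row in mentioning(items, "пенси")])
```

For a real export, `open(path, encoding="utf-8", newline="")` would replace `io.StringIO(SAMPLE)`. Matching a stem such as `пенси` catches inflected Russian forms (пенсии, пенсий, пенсионеры) without a morphological analyser, at the cost of some false positives.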
Then, parse them for the needles
Structured analysis of online contents
For example
Data about pensioners in Transnistria
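Parsing for figures such as the number of pensioners can start from simple regular expressions applied to the text of each item. The sketch assumes Russian-language phrasing like «115 234 пенсионера» or «115 тыс. пенсионеров»; the patterns are illustrative, would miss many formulations, and every match still needs checking against its context:

```python
import re

# A number with space-separated thousands ("115 234"), or a plain/decimal number ("115", "115,2")
NUMBER = r"(\d{1,3}(?:[ \u00a0]\d{3})+|\d+(?:[.,]\d+)?)"
# ...optionally followed by "тыс." / "тысяч(а/и)", then a form of "пенсионер"
PATTERN = re.compile(NUMBER + r"\s*(тыс\.|тысяч[аи]?)?\s+пенсионер", re.IGNORECASE)


def extract_pensioners(text: str) -> list[int]:
    """Return all pensioner counts found in `text`, as integers."""
    counts = []
    for match in PATTERN.finditer(text):
        raw, thousands = match.group(1), match.group(2)
        value = float(raw.replace(" ", "").replace("\u00a0", "").replace(",", "."))
        if thousands:
            value *= 1000
        counts.append(round(value))
    return counts


print(extract_pensioners("В 2023 году в республике насчитывалось 115 тыс. пенсионеров."))
```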
For example
When a given issue received attention
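Counting how many items mention a topic each year gives a rough measure of attention. Because the number of items published per year also changes, the sketch reports both the raw count and the share of all items; it assumes items are dictionaries with `date` (ISO format), `title` and `text` keys, as in the dataset described above, and the sample items are hypothetical:

```python
from collections import Counter


def attention_by_year(items: list[dict], stem: str) -> dict[str, tuple[int, float]]:
    """Per year: (number of items mentioning `stem`, share of all items that year)."""
    stem = stem.lower()
    totals: Counter = Counter()
    hits: Counter = Counter()
    for item in items:
        year = item["date"][:4]
        totals[year] += 1
        if stem in (item["title"] + " " + item["text"]).lower():
            hits[year] += 1
    return {year: (hits[year], hits[year] / totals[year]) for year in sorted(totals)}


sample = [
    {"date": "2019-03-01", "title": "Тарифы на газ", "text": ""},
    {"date": "2019-06-12", "title": "Спорт", "text": "Итоги турнира."},
    {"date": "2020-01-20", "title": "Энергетика", "text": "Поставки газа продолжаются."},
]
print(attention_by_year(sample, "газ"))
```

Substring matching on `газ` would also match `газета` (newspaper), which is one reason such counts need spot checks.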
This technique can be used to…
- get data, across a wide range of issues
- triangulate
- put in context
- reality check
By the way… is it OK to do this?
- “Text and data mining for the purposes of scientific research” is explicitly allowed by a 2019 EU directive (Directive (EU) 2019/790, Article 3)
- even beyond research, the same conditions that apply to, e.g., search engines remain valid
- the full corpus can be shared publicly only when its licence allows it
Taking care of the haystacks:
- keeping them updated
- but also, long-term preservation
- consider what can be shared
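Keeping a corpus updated without losing what is already stored can be as simple as adding only unseen items and flagging, rather than overwriting, items whose text has changed since they were first collected. A sketch assuming each item has a stable `id` and a `text` field; the function names are hypothetical:

```python
import hashlib


def fingerprint(item: dict) -> str:
    """SHA-256 hash of an item's text, used to detect later edits."""
    return hashlib.sha256(item["text"].encode("utf-8")).hexdigest()


def update_corpus(corpus: dict[str, dict], fetched: list[dict]) -> tuple[list[str], list[str]]:
    """Add unseen items to `corpus` (keyed by id); return (added ids, changed ids).

    Changed items are only reported: the stored version is kept, so that the
    corpus preserves what was originally published.
    """
    added, changed = [], []
    for item in fetched:
        stored = corpus.get(item["id"])
        if stored is None:
            corpus[item["id"]] = item
            added.append(item["id"])
        elif fingerprint(stored) != fingerprint(item):
            changed.append(item["id"])
    return added, changed


corpus = {"1": {"id": "1", "text": "Original statement."}}
new_batch = [{"id": "1", "text": "Edited statement."}, {"id": "2", "text": "New item."}]
print(update_corpus(corpus, new_batch))
```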
Even established online sources disappear
- a new president deletes statements by previous presidents (e.g. Georgia, Ukraine)
- institutions, organisations, or entities change or disappear
- selective removal of old content (less common, but recorded e.g. in Transnistria post-Shevchuk)
- old content is not moved to new websites
- some items may survive on the Internet Archive’s Wayback Machine, but many (most?) are likely lost
Some may remain in personal archives, but are effectively lost (or, are they?)
How much can we trust these data points?
Context and corroboration
- overall statistics may be problematic, but local data are more likely to be accurate
- formal or informal corroboration from multiple sources, online and offline (fieldwork still matters)
- hacked and dumped materials can be used for triangulation (e.g. #SurkovLeaks), but consider ethics
- acknowledge uncertainty
- but don’t be overwhelmed by it: official stats from around the world are often a (misleading) mess
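When the same indicator is reported by several sources, a simple summary makes the uncertainty explicit instead of hiding it: the median as a central value, the relative spread, and the sources that deviate by more than a chosen tolerance. A sketch; the source names, figures and the 10% tolerance are arbitrary assumptions:

```python
from statistics import median


def corroborate(reported: dict[str, float], tolerance: float = 0.10) -> dict:
    """Summarise figures reported by different sources for the same indicator and period."""
    values = list(reported.values())
    centre = median(values)
    return {
        "median": centre,
        "relative_spread": (max(values) - min(values)) / centre,
        "outliers": [src for src, v in reported.items() if abs(v - centre) / centre > tolerance],
    }


# Hypothetical figures for one indicator (e.g. number of pensioners) in one year
print(corroborate({"official_statistics": 100_000, "news_report": 102_000, "budget_law": 130_000}))
```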
An example
- inflation-adjusted estimates
- demographic estimates corroborated with local administrative documents
- geolocation and data retrieved through structured analysis of online contents
Some things I have not discussed
- trade
- much of the above applies
- military assistance and spending
- this is difficult, both conceptually and practically
What’s next
- more easily accessible socio-economic indicators
- we can get better at sharing them
- more easily accessible textual datasets
- publicly (not without obstacles)
- among peers (formal or informal data sharing)
- (not discussed here)
- land-use change satellite data
- geocoding and named entity recognition
Lots can be done as a community of scholars
Giorgio Comai (OBCT / CCI)