De facto stats

Finding and contextualising socio-economic indicators about post-Soviet de facto states

Giorgio Comai

(OBCT/CCI)

ZOiS, Berlin - 17 October 2023

Theoretical starting point

Similarly to other small dependent jurisdictions, post-Soviet de facto states…

  1. receive a large share of their budget incomes from external sources
  2. have an unusually high number of residents whose income depends directly on the (de facto) state (e.g. pensioners, public sector employees, military)
  3. have an economy strongly influenced by migration and remittances

As a consequence…

Even if a lot of subsistence and informal economy goes unrecorded, a sizeable part of the economy is fully formalised in a way that leaves some record:

  • external assistance from the patron is largely officially recorded in budgets
  • pensions and salaries for some categories are either public or the subject of public debate
  • relevant indicators are often publicly discussed (e.g. number of pensioners, number of teachers, health sector workers, etc.)

Socio-economic indicators in post-Soviet de facto states

Why do we care?

  • questions of political economy
  • degree of external dependency
  • structure of economic opportunities
  • but also, more pragmatic:
    • salaries & migration choices
    • possibly, political or even status preferences
    • e.g. how much does Transnistria need if assistance through gas is cut?

Getting the data

  • sometimes, readily available
  • sometimes, it’s really many needles scattered around many haystacks

Trusting the data

  • to what extent should we trust them?
  • or: which data can we trust, and what is excluded?

Making the data comparable:

  • exchange rates, inflation, etc.
  • “per capita” and demography

The (relatively) easy part

  • when data is mostly readily available
  • but still needs to be adjusted for exchange rate and inflation
  • per capita and purchasing power parity
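
The adjustment steps above can be sketched in a few lines of Python. This is only an illustration: all figures are hypothetical placeholders (not actual budget, exchange rate, or population data), and the names `assistance_rub`, `rub_per_usd`, `population`, and `to_usd_per_capita` are invented for this example.

```python
# Hypothetical placeholder values, for illustration only (not real data).
assistance_rub = {2013: 1_000_000_000, 2015: 1_000_000_000}  # nominal RUB
rub_per_usd = {2013: 32.0, 2015: 64.0}      # assumed yearly average rate
population = {2013: 50_000, 2015: 50_000}   # assumed resident population


def to_usd_per_capita(amount_rub, rate, residents):
    """Convert a nominal RUB amount to USD per resident."""
    return amount_rub / rate / residents


usd_per_capita = {
    year: to_usd_per_capita(
        assistance_rub[year], rub_per_usd[year], population[year]
    )
    for year in assistance_rub
}
```

With these placeholder inputs, the same nominal amount in RUB is worth half as much per capita in USD once the assumed exchange rate doubles, which is why the two series can tell different stories.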

Budget assistance (in RUB)

Budget assistance (in USD)

Exchange rates matter

But there’s more…

Inflation-adjusted and “per capita”

For established economies, it’s easy to find data in constant LCU or constant USD. Alternatives here?

  • statistics on inflation released or made public by de facto authorities
  • data from the patron or neighbouring region
  • find historical prices for selected goods (e.g. electricity, diesel at the pump, etc.)
  • demography is an even bigger issue

This matters both for domestic dynamics and comparison
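
When no official deflator is available, one option among those listed above is to build a rough price index from historical prices of selected goods. The sketch below uses an unweighted geometric mean of price relatives (a Jevons-type index), which is a simplifying choice made for this example and not necessarily the approach used for the estimates presented here. The prices are hypothetical placeholders and the function names are illustrative.

```python
from math import prod

# Hypothetical prices in local currency (placeholders, not observed data).
prices = {
    "electricity_kwh": {2015: 1.0, 2020: 1.5},
    "diesel_litre": {2015: 10.0, 2020: 15.0},
}


def price_index(prices, base_year, year):
    """Unweighted geometric mean of price relatives (Jevons-type index)."""
    relatives = [series[year] / series[base_year] for series in prices.values()]
    return prod(relatives) ** (1 / len(relatives))


def to_constant(nominal, prices, base_year, year):
    """Express a nominal amount from `year` in `base_year` prices."""
    return nominal / price_index(prices, base_year, year)


index_2020 = price_index(prices, 2015, 2020)
real_2020 = to_constant(300.0, prices, 2015, 2020)
```

A basket this small is only a rough proxy: it ignores expenditure weights, so it is best cross-checked against inflation figures from the patron or a neighbouring region.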

Plus, local peculiarities

e.g. Transnistria

  • we know how much gas Transnistria receives
  • we know how much that would cost, if Moldova were to pay for it
  • but what is actually its value for the Transnistrian economy?
  • and how is that value distributed, besides the budget?

In brief

  • getting data that is good enough for analysis or comparison may require extra work but may still be feasible, depending on the research question
  • depending on the research question, we may instead prefer looking for disaggregated data
    • about some sectors of the economy
    • about some groups of people whose incomes we may realistically know and compare

Some bottom-up stats

A part of the story, of varying size

Workers by sector in South Ossetia

  • Which data do we trust?
  • Is the data we can trust really so different from what we’d see in comparable jurisdictions?

Workers in South Ossetia in 2016

Finding the needles in the haystacks

First, get the haystacks

Corpora of relevant online sources

Textual datasets of relevant online sources

  • websites of de facto authorities
  • local news websites
  • possibly, websites of selected commercial activities
  • selected sources in the patron state

From website to corpus

Dataset name: novostipmr.com_ru

Dataset description: All items published on the website of Transnistria’s news agency Novosti PMR

Start date: 2012-07-31

End date: 2023-10-09

Total items: 128 048

Available columns: id; url; title; date; datetime; subtitle; description; section; section_link; tags; tags_links; text
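
A minimal sketch of how such a dataset could be loaded and filtered by date, assuming it has been exported as CSV with the columns listed above. The two rows are invented stand-ins (only a subset of columns is shown), and `load_items` is a hypothetical helper name.

```python
import csv
import io
from datetime import date

# Two invented rows standing in for a CSV export of the corpus.
sample_csv = io.StringIO(
    "id,url,title,date,text\n"
    "1,https://example.org/a,First item,2014-03-01,Some text\n"
    "2,https://example.org/b,Second item,2021-06-15,Other text\n"
)


def load_items(handle, start, end):
    """Return rows whose `date` falls within [start, end], inclusive."""
    rows = []
    for row in csv.DictReader(handle):
        published = date.fromisoformat(row["date"])
        if start <= published <= end:
            rows.append(row)
    return rows


items_2021 = load_items(sample_csv, date(2021, 1, 1), date(2021, 12, 31))
```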

Then, parse them for the needles

Structured analysis of online contents

This technique can be used to…

  • get data, across a wide range of issues
  • triangulate
  • put in context
  • reality check
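
As a rough illustration of "parsing for the needles", the sketch below scans corpus items for a number that appears directly before a keyword and keeps the date and the matching snippet for manual checking. The sentences are invented, the keyword is in English for readability (a Russian-language corpus would need Russian patterns), and `find_mentions` is a hypothetical helper.

```python
import re

# Invented sentences standing in for corpus items (not real quotations).
items = [
    {"date": "2016-02-10", "text": "There are 12 345 pensioners registered this year."},
    {"date": "2017-05-03", "text": "The ministry discussed teachers' salaries."},
]

# A number (digits, optionally grouped by spaces) directly before the keyword.
PATTERN = re.compile(r"(\d{1,3}(?:[ \u00a0]\d{3})*|\d+)\s+pensioners")


def find_mentions(items):
    """Return (date, value, snippet) for every keyword match with a number."""
    hits = []
    for item in items:
        for match in PATTERN.finditer(item["text"]):
            value = int(re.sub(r"\D", "", match.group(1)))
            hits.append((item["date"], value, match.group(0)))
    return hits


mentions = find_mentions(items)
```

Keeping the date and snippet alongside the extracted value makes it easier to read each match in context before treating it as a data point.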

By the way… is it OK to do this?

  • “Text and data mining for the purposes of scientific research” is explicitly allowed by a 2019 EU directive (Directive (EU) 2019/790 on copyright in the Digital Single Market)
  • even beyond research, the same conditions that apply e.g. to search engines remain valid
  • the full corpus can be shared publicly only when the licence allows it

Taking care of the haystacks:

  • keeping them updated
  • but also, long-term preservation
  • consider what can be shared

Even established online sources disappear

  • new president deletes statements of previous presidents (e.g. Georgia, Ukraine, etc.)
  • institutions, organisations, or entities change or disappear
  • selective removal of old contents (less common, but recorded e.g. in Transnistria post-Shevchuk)
  • old contents are not moved to new websites
  • some things may remain on the Internet Archive’s Wayback Machine, but many (most?) are likely lost
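
Whether a given page survives on the Wayback Machine can be checked programmatically through the Internet Archive's availability endpoint. The sketch below only builds the query and parses a response; it does not perform the network request. The sample response follows the documented JSON shape but its values are invented, and `availability_url` and `closest_snapshot` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode


def availability_url(page_url, timestamp=None):
    """Build a query for the Wayback Machine availability endpoint."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # e.g. YYYYMMDD of the wanted snapshot
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived URL from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Response shaped like the documented format; the values are invented.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20150101000000/http://example.org/",
            "timestamp": "20150101000000",
            "status": "200",
        }
    }
})
query = availability_url("example.org", "20150101")
snapshot = closest_snapshot(sample)
```

An empty `archived_snapshots` object means no capture was found, which is a useful signal for prioritising what to preserve locally.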

How much can we trust these data points?

Context and corroboration

  • overall statistics may be problematic, but local data are more likely to be accurate
  • formal or informal corroboration from multiple sources, online and offline (fieldwork still matters)
  • hacked and dumped materials can be used for triangulation (e.g. #SurkovLeaks), but consider ethics
  • acknowledge uncertainty
  • but don’t be overwhelmed by it: official stats from around the world are often a (misleading) mess

An example

  • inflation-adjusted estimates
  • demographic estimates corroborated with local administrative documents
  • geolocation and data retrieved through structured analysis of online contents

Some things I have not discussed

  • trade
    • much of the above applies
  • military assistance and spending
    • this is difficult, both conceptually and practically

What’s next

  • more easily accessible socio-economic indicators
    • we can get better at sharing them
  • more easily accessible textual datasets
    • publicly (not without obstacles)
    • among peers (formal or informal data sharing)
  • (not discussed here)
    • land-use change satellite data
    • geocoding and named entity recognition

Lots can be done as a community of scholars