Guest Post: The World by Numbers

The following is a guest post written by Daniel Scott (no relation), a friend from my undergrad days. I stumbled onto it on his Facebook page and asked if I could repost it here. I think it does a great job of showing how data analysis is becoming more decentralized as more and more data becomes available on the internet. Daniel is studying water treatment in Canada, but he was still able to produce deep and interesting insights into international development and economic growth in his spare time. Only a few years ago, this type of thing might have been the exclusive realm of full-time researchers and development academics, who had the time and the knowledge to access otherwise exclusive databases. Let the new era of decentralized information access begin. (Aside: my own rough analysis indicates that CIDA will be joining this new era sometime around 2025-2050.)

——————————–————————————

The World by Numbers – Introduction

A lot of data is freely available online about almost every country in the world. I decided to compile some of this data and do some basic analyses. I was interested in correlations between different parameters that might suggest factors that could influence whether a particular country is more or less successful, as well as in finding links between parameters that seem unrelated at first.

This is a rough, first-cut type of analysis. I didn’t use any statistical techniques more rigorous than simple correlations. However, I was pleasantly surprised and excited to see just how much information could be gleaned from such a broad-strokes approach.

Read on to see what I found. The first section contains my observations, including some tables and graphs, along with some suggested interpretations (which are tentative, since correlations don’t indicate whether there is a causal relationship or which way it might work). Next, I discuss some of the limitations of the work. For those who want to dig deeper, the section after that presents the methods I used, including descriptions of the variables. I finish with some suggested take-away ideas and related links.

Observations

Table 1 shows correlations between the different variables I compiled. Cells are colour-coded to make the stronger correlations stand out. Some of these correlations are quite intuitive, such as higher GDP correlating with higher education spending and electricity usage. The correlations between things like civil rights, political liberties, and freedom of the press indicate that rights and freedoms tend to come as a package deal. One connection that I wasn’t expecting was the relatively strong correlation between property rights and perceived corruption. However, it made sense when I thought about it; in a country where corruption is commonplace, it is easy to imagine an authority figure groundlessly expropriating someone’s land, or demanding a share of imported/exported goods. These two variables of property rights and corruption perception also had a strong link to GDP. In turn, GDP had the strongest correlation to net migration rate; it seems that economic factors are the strongest driver behind global migration.

Table 1 – Correlations between different variables.
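For readers who want to build a correlation table like this outside a spreadsheet, here is a minimal sketch in Python. It is not the method used for the post (that was done in Excel); the function names and the toy numbers are assumptions made up for illustration, and the variables are chosen so the expected correlations are obvious.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_table(columns):
    """Return {(name_a, name_b): r} for every pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Made-up values for four imaginary countries (NOT the data from the post).
toy = {
    "gdp": [1.0, 2.0, 3.0, 4.0],
    "electricity": [2.0, 4.0, 6.0, 8.0],  # rises in step with gdp
    "press_mark": [8.0, 6.0, 4.0, 2.0],   # falls as gdp rises
}

table = correlation_table(toy)
for (a, b), r in table.items():
    print(f"{a:12s} {b:12s} {r:+.2f}")
```

A value near +1 or -1 marks the kind of strong relationship that the colour-coding highlights; a value near 0 means no linear relationship.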

Table 2 shows the results of a linear regression analysis. I made net migration the dependent variable since it seemed like the best proxy for how desirable a country is to live in (I looked at it as people voting with their feet). I included only some of the factors from the whole set as independent variables. GDP was removed because I didn’t want it to dominate the regression to the point of overshadowing other factors. From the rest, I tried to pick a subset of factors that weren’t highly correlated with each other (i.e. redundant). Here, security of property rights was the biggest determinant of migration rate, followed by civil liberties (but only property rights was significant at a 95% confidence level).

Table 2 – Linear regression analysis.
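To make the idea of a regression with net migration as the dependent variable concrete, here is a rough sketch of ordinary least squares in plain Python. It only estimates coefficients; it does not compute the significance levels mentioned above. The function name `ols`, the two predictors, and all numbers are hypothetical, built so that the true coefficients are known in advance.

```python
def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Build the augmented system (X'X | X'y).
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Made-up (property_rights, civil_liberties) scores for five imaginary countries.
predictors = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)]
# Toy response constructed so that y = 1 + 2*x1 - 0.5*x2 exactly.
net_migration = [1 + 2 * p - 0.5 * c for p, c in predictors]

intercept, b_property, b_civil = ols(predictors, net_migration)
print(f"intercept={intercept:.3f} property={b_property:.3f} civil={b_civil:.3f}")
```

With real data the fit will not be exact, and a statistics package would be the sensible way to get standard errors and confidence levels.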

I also put together some graphs: fig. 1 shows GDP and the Gini index, which measures [in]equality; fig. 2 shows GDP with the corruption perception index; and fig. 3 shows the civil liberties score with the net migration rate. In fig. 1, a very weak trend appears, while fig. 2 shows a much stronger relationship. In fig. 3 it is hard to see any trend, as civil liberties scores are discrete. However, a closer examination reveals that the median net migration rate appears to be negative except for countries with the strongest civil liberties (score of 1). Some of the most prominent outliers on these graphs are Persian Gulf states, where I presume oil revenue gives them a pass on inequality, corruption, and weak civil liberties that would not be tolerated in states not swimming in cash.

Figure 1 – Gini Index and GDP

Figure 2 – GDP and Corruption Perception Index

Figure 3 – Net Migration and Civil Liberties
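Because civil liberties scores are discrete, one way to check the "median net migration by score" observation is to group countries by score and take the median of each group. The sketch below assumes the data is available as (score, net migration) pairs; the function name and the numbers are made up for illustration and are not the values behind fig. 3.

```python
from collections import defaultdict
from statistics import median

def median_by_score(pairs):
    """Group (score, value) pairs by score and return the median value per score."""
    groups = defaultdict(list)
    for score, value in pairs:
        groups[score].append(value)
    return {score: median(values) for score, values in sorted(groups.items())}

# Hypothetical (civil_liberties_score, net_migration_per_1000) pairs.
toy = [
    (1, 2.5), (1, 0.4), (1, 5.0),
    (2, -0.3), (2, -1.2),
    (3, -0.8), (3, 0.1), (3, -2.0),
]

medians = median_by_score(toy)
for score, value in medians.items():
    print(f"score {score}: median net migration {value:+.2f}")
```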

The remaining figures were made during a second round of analysis, after adding a few parameters and normalizing a couple of others by population.

Fig. 4 made me very disappointed. It shows that Official Development Assistance (ODA) is not being allocated to the regions of most desperate need as far as water is concerned. That’s not to say that the countries receiving lots of aid have no need; "access to improved water supply" includes things as basic as public standpipes, so this aid could be used to upgrade to household plumbing, for example. However, countries with no access at all appear left out of ODA allocations. Note that a number of countries receiving even higher amounts of ODA for water do not appear on this graph, either because they didn’t have data available for improved water access or because they have populations under 1 million.

I thought fig. 5, which compares cell phone use with access to improved sanitation, was very interesting. It appears that increased access to sanitation and mobile communication go quite well together except for some spread at the top. The point in the top left is Cuba, which makes a lot of sense considering the government there has heavily promoted public health but is wary of dissidents.

Fig. 6 shows a loose relationship between press freedom (recall that a high press freedom mark indicates more attacks or barriers against press freedom in a country) and internet access. It appears that, while a lack of internet access does not imply poor freedom of the press, good press freedom and high levels of internet access certainly go together. From this analysis, we can only speculate which way a causal relationship might run.

Finally, I’m sure that no one will be surprised by the link shown between GDP and electricity use in fig. 7.

Figure 4 – Official Development Assistance for Water and Access to Improved Water Supply

Figure 5 – Access to Improved Sanitation and Cell Phone Usage

Figure 6 – Press Freedom and Internet Usage

Figure 7 – Electricity Consumption and GDP Per Capita

I know lots of readers of this blog have in-the-field experience in development. I’d love to hear comments about ways in which these broad trends concur or conflict with your experience.

Discussion of Strengths and Weaknesses

Of course this analysis is very rough, so I want to make its limitations clear. As this is just a blog post, readers are referred to the original sources and to encyclopaedias for full descriptions of how each parameter is defined and determined, and the associated limitations. I will just highlight some of the key limitations here.

  • These statistics are for countries as a whole and say nothing about internal variations. Just looking at Canada, we can see that while we have one of the highest per capita GDPs in the world, we still have poverty, notably on First Nations’ reserves.
  • Each of the parameters that I’ve used here has its own limitations resulting from how it is calculated and how it is defined. By necessity, a single number is at best a proxy for what we really want to know about. For example, GDP ignores grey- and black-market economic activity.
  • Correlations do not show a causal relationship. In the graphs, I’ve placed the variables on the axes in a way that seems plausible to me, but I have no statistical evidence that the axes couldn’t be flipped.
  • In the regression, I assumed that net migration rate could be a proxy for how desirable a certain country is to live in. Of course not everyone has the capacity to "vote with their feet". And the magnitude of the migration rate will be influenced not only by how desirable a certain move is, but also how easy. For example, moving from Mexico to the US is economically favourable and the border is somewhat porous, so millions of people have made the move. In contrast, I’m sure escaping from North Korea would be nice for many of its citizens, but they face a high risk of death (extending even to their families) for making the attempt.
  • Some of the water data was incomplete, especially in Europe, which could skew these analyses.

Methods

Sources and Descriptions of Variables

Here I list each parameter I included in this study and briefly describe it. The sources are the CIA World Factbook (WFB); Freedom House International (FH); Transparency International (TI); the Economic Freedom Network (EFN), a consortium of free-market think-tanks such as the Fraser and Cato Institutes; Reporters Without Borders (RSF); and the World’s Water Report (WW). When I describe a variable as subjective, I mean that it is ranked by expert opinion, like the Freedom House scores, rather than having a definite numerical value, like electricity usage.

  • Net Migration—(WFB) the population gain/loss from {immigration – emigration}, per thousand people
  • Population—(WFB) the total population
  • Education Spending—(WFB) reported as a percentage of GDP, but I converted to a per capita value, as that made more sense in my view
  • Electricity Consumption—(WFB) total annual electricity consumption in kilowatt-hours; I normalized it to a per capita figure for this analysis
  • Gini Index—(WFB) a measure of [in]equality of income within a country, where 0 indicates perfect equality (everyone has the same income) and 100 indicates perfect inequality (where all the income is controlled by one person).
  • GDP—(WFB) the per capita purchasing power parity GDP
  • Political Rights—(FH) an index of political rights ranging from most free=1 to least free=7; subjective
  • Civil Liberties—(FH) an index of civil liberties ranging from most free=1 to least free=7; subjective
  • Press Freedom Mark—(RSF) a mark indicating the number and severity of press freedom violations in the past year; higher mark=more violations; subjective
  • Corruption Perception Index—(TI) an index of the perceived government corruption in a certain country; higher score=perceived to be less corrupt; subjective
  • Security of Property Rights—(EFN) a score of how secure property rights are in a certain country; higher=more secure; subjective
  • Freedom to Trade Internationally—(EFN) a score of how much freedom there is to trade internationally; higher=more freedom; subjective
  • Business Regulation—(EFN) a score of how good business regulation is in a certain country, from the perspective of free-market think-tanks; higher="better" regulation (it looks like less red tape is considered good, but complete lack of regulation is not favoured as far as I can tell); subjective
  • Withdrawal of Water—(WW) annual per capita use of water in cubic meters; excludes rainfall, I think; this will be heavily influenced by the presence of water-thirsty crops and industries more than by domestic consumption in most cases, and waste will also play a role
  • Access to Improved Water Supply—(WW) the percentage of the population in a country that has access to improved water supply; the threshold for "improved" is set very low, so even things like a protected spring or rainwater collection will qualify
  • Access to Improved Sanitation—(WW) the percentage of the population in a country that has access to improved sanitation
  • Annual ODA for Water—(WW) the average annual Official Development Assistance (per capita, in USD) a country receives that is targeted at water and sanitation projects; it excludes things like flood control and power supply, as well as private aid
  • Cell Phone Users—(WFB) the number of cell phone users in a country; I normalized it by population from the total figure given in the World Factbook
  • Internet Users—(WFB) the number of internet users in a country; I normalized it by population from the total figure given in the World Factbook
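Two of the definitions in this list are easy to illustrate with a few lines of Python: the per capita conversions and the Gini index. This is only a sketch under stated assumptions. The post does not say exactly how education spending was converted, so multiplying the percentage by per capita GDP is an assumption, and all function names and numbers are hypothetical.

```python
def per_capita(total, population):
    """Normalize a national total (e.g. kWh, cell phone users) by population."""
    return total / population

def education_spending_per_capita(percent_of_gdp, gdp_per_capita):
    """ASSUMED conversion: share of GDP times per capita GDP."""
    return percent_of_gdp / 100 * gdp_per_capita

def gini_index(incomes):
    """Gini index on a 0-100 scale, computed from a list of individual incomes."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    # Weight each income by its rank (1 = poorest).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 100 * (2 * weighted / (n * total) - (n + 1) / n)

equal = gini_index([5, 5, 5, 5])      # everyone has the same income
one_rich = gini_index([0, 0, 0, 10])  # one person holds all the income

print(f"equal incomes: {equal:.1f}")
print(f"one person has everything (4 people): {one_rich:.1f}")
```

For a finite group of n people the one-person-has-everything case gives 100 × (n − 1) / n rather than exactly 100, so the value only approaches 100 as the population gets large.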

Data Manipulation

I downloaded the data from each source, imported it into Excel where necessary, and cleaned up HTML artifacts and the like. (The exception, to my chagrin, was the World’s Water Report: I typed the data in from my copy of the book and only later found that it is available on their website.) I also had to change the decimal separator to a period in the Press Freedom Index, since it comes from a French-based organization. One of the key steps in compiling the data was making sure the country names were consistent. South Korea, for example, was listed as "South Korea", "Korea, South", "Korea, Republic of", and "Korea Rep.". Once all the names were reconciled, Excel’s lookup functions pulled everything into a nice, aligned table. From there I could do the analyses shown above. For some of them, I had to remove countries that didn’t have complete records available. I should also mention that countries with populations less than 1 million were excluded entirely.
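The same clean-up steps can be sketched in Python for anyone who prefers a script to a spreadsheet. This is a rough illustration, not the Excel workflow itself. The alias table contains only the South Korea spellings mentioned above, "Tinyland" is an invented country, and every number is made up.

```python
# Map the name variants seen in different sources to one canonical spelling.
ALIASES = {
    "Korea, South": "South Korea",
    "Korea, Republic of": "South Korea",
    "Korea Rep.": "South Korea",
}

def canonical(name):
    """Return the canonical country name for any known variant."""
    name = name.strip()
    return ALIASES.get(name, name)

def parse_number(text):
    """Parse a number written with a decimal comma, e.g. '15,67'.

    Assumes the text has no thousands separators.
    """
    return float(text.strip().replace(",", "."))

def merge_sources(sources, min_population=1_000_000):
    """Join per-source tables on country name, keeping only complete rows
    for countries at or above the population threshold."""
    merged = {}
    for column, table in sources.items():
        for raw_name, value in table.items():
            merged.setdefault(canonical(raw_name), {})[column] = value
    columns = set(sources)
    return {
        country: row
        for country, row in merged.items()
        if set(row) == columns and row["population"] >= min_population
    }

# Illustrative, made-up inputs (one dict per source column).
sources = {
    "population": {"Korea, South": 48_000_000, "Tinyland": 500_000, "Canada": 33_000_000},
    "press_mark": {"Korea Rep.": parse_number("15,67"), "Tinyland": parse_number("3,00")},
}

result = merge_sources(sources)
print(result)
```

In this toy run, Tinyland is dropped for being under 1 million people and Canada is dropped because its record is incomplete, which mirrors the two exclusion rules described above.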

Take Away Ideas

This analysis doesn’t prove anything, as I was only checking for correlations. However, it is suggestive of some things, such as the link between [a lack of] corruption, property rights, and GDP; and the role that free communication has in securing other freedoms. The emergence of expected correlations, such as the ones between civil rights and political freedoms, or between water and sanitation access, or between internet and cell phone usage, suggests that this analysis probably has some validity. Finally, I’d like everyone to realize that there is a wealth of data freely available and interesting patterns can show up if we take the time to look.

Related Links
  • Hans Rosling is much better at this sort of thing than I am. He’s developed Gapminder, a beautiful and intuitive interactive website that lets you look at global data like I’ve shown in many different ways, and they have a lot more data than I compiled for this post.
  • Information is Beautiful: this blog from the UK’s Guardian newspaper uses graphics to try to make sense of statistics in the news.
  • I also find Wolfram|Alpha to be a very useful site for quickly looking up various statistics and generating graphs on the fly. This site excels at interpreting plain language searches such as "5 poorest countries" or "GDP of Canada vs GDP of China".

The Fine Print

Copyright notices: the data I used did not originate with me, so I’m including the following copyright notices from the sources.

  • WFB: Public domain; source = CIA
  • FH: copyright Freedom House, Inc.; All rights reserved
  • TI: "All content on this website … is the property of Transparency International e.V. unless otherwise noted, and is protected by German, United States, and international copyright laws. Transparency International grants you a limited license to access, use, print, and copy this site for personal, informational, or academic purposes…"; Full notice here.
  • EFN: The authors request that this citation appear: "Gwartney, James and Robert Lawson with Herbert Grubel, Jakob de Haan, Jan-Egbert Sturm, and Eelco Zandberg (2009). Economic Freedom of the World: 2009 Annual Report. Vancouver, BC: The Fraser Institute. Data retrieved from www.freetheworld.com."
  • RSF: no copyright notice appears on their website.
  • WW: copyright Pacific Institute for Studies in Development, Environment, and Security, 2009; All rights reserved; This data was compiled from various agencies (e.g. WHO) which are enumerated in the introductions to each table.

I would be remiss if I didn’t thank Owen for hosting this guest post on his blog. I’m glad for the opportunity to share this analysis with a wider audience, and I look forward to the discussion to follow.

Anyone wishing to get in touch with me (e.g. to request a copy of this spreadsheet) may do so in the comments below or through my LinkedIn profile.

1 Response to Guest Post: The World by Numbers

  1. Excellent information, Daniel. I was about to start the same kind of research myself, and you’ve already given me a flying start!
