A Tutorial for Analysing the Survey of the Afghan People in R
Jun 5, 2018 00:00 · 713 words · 4 minute read
1 About the Survey of the Afghan People
Since 2006, The Asia Foundation has conducted the Survey of the Afghan People, a nationally representative annual survey. The Survey reflects perceptions about a broad range of topics including security, development, governance, service delivery, women’s rights, and migration. The Survey is widely used by policy makers, academics, and non-governmental organizations working in and on Afghanistan. The Foundation makes the Survey’s data public on its website.
2 Survey Weight
National surveys such as the Survey of the Afghan People claim their data represent the general population. This claim is backed by the random selection of individuals who represent the true population. The Survey of the Afghan People collects data from all 34 provinces of Afghanistan, and includes men and women, all ethnic groups and languages. The survey uses a multistage systematic sampling method: first the country is divided into 34 strata (here, provinces), and then districts are selected within strata using probability proportional to size (PPS) systematic sampling. The end product, however, might not fully represent the provincial, urban/rural, or gender proportions. Therefore, to ensure representativeness, survey weights are applied in the analysis. Survey weights are commonly used in national and other types of surveys that claim to represent the population.
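To make the idea of a weight concrete, here is a minimal base R sketch. The numbers are invented for illustration and are not Survey data, and the simple "population share divided by sample share" rule is only an assumption for this example, not necessarily how the Foundation constructs its weights.

```r
# Hypothetical illustration: all numbers below are invented, not Survey data.
# Assume the population is 50% women, but the achieved sample is only 40% women.
sample_df <- data.frame(
  gender = c(rep("women", 40), rep("men", 60)),
  agrees = c(rep(1, 30), rep(0, 10),   # 75% of sampled women agree
             rep(1, 30), rep(0, 30))   # 50% of sampled men agree
)

g            <- as.character(sample_df$gender)
pop_share    <- c(women = 0.5, men = 0.5)   # assumed population shares
sample_share <- table(g) / length(g)        # shares actually observed in the sample

# Weight = population share / sample share of the respondent's group
sample_df$wgt <- as.numeric(pop_share[g]) / as.numeric(sample_share[g])

unweighted <- mean(sample_df$agrees)                          # ignores the imbalance
weighted   <- weighted.mean(sample_df$agrees, sample_df$wgt)  # corrects for it
```

Under-represented women get a weight above 1 and over-represented men a weight below 1, so the weighted estimate moves towards what a balanced sample would have shown.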
Applying survey weights in R requires special treatment of the general estimation commands, which is available in packages such as survey. This package provides tools for applying survey weights to various types of estimation commands. The survey package applies weights by creating a weighted dataset (a survey design object). The Survey data comes in tabular form and includes weight variables. The main weight variable used for the analyses in the Survey report is MergeWgt10. We use the svydesign() function from the survey package to produce the weighted data.
# install.packages("survey") # install first if you haven't
library(survey)
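# load() restores objects into the workspace and returns their names, not the data,
# so get() fetches the object itself (assumes sap.Rda holds a single data frame)
sap <- get(load("sap.Rda"))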
sap.w <- svydesign(id = ~1, data = sap[!is.na(sap$MergeWgt10), ], weights = ~MergeWgt10) # simple random sampling
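As a quick sanity check of what a design object does, the sketch below builds one on a tiny invented data frame (the names `toy`, `agrees` and `wgt` are hypothetical) and compares the weighted and unweighted proportions. It follows the same pattern as the call above, including dropping rows with a missing weight.

```r
library(survey)

# Invented toy data: five respondents, one with a missing weight
toy <- data.frame(
  agrees = c(1, 1, 0, 0, 1),
  wgt    = c(2, 2, 1, 1, NA)
)

# Keep only rows with a weight; id = ~1 declares no clustering
toy_complete <- toy[!is.na(toy$wgt), ]
toy_design   <- svydesign(id = ~1, data = toy_complete, weights = ~wgt)

est        <- svymean(~agrees, design = toy_design)  # weighted proportion with standard error
weighted   <- unname(coef(est))                      # (2 + 2) / (2 + 2 + 1 + 1)
unweighted <- mean(toy_complete$agrees)              # 2 / 4
```

The two respondents who agree carry twice the weight of the two who do not, so the weighted proportion is higher than the plain mean.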
2 Survey Analysis with R
I have broken down this post into (1) reading data, (2) tabulations and cross-tabulations, (3) descriptive statistics, (4) hypothesis testing, and (5) inferential statistics.
2.1 Reading Data
The survey data is made available in .Rda (R), .dta (Stata) and .sav (SPSS) file formats. In this tutorial, we go over how to carry out some of the data manipulation and analysis used in the Survey report using R. Ideally, if you are using R, you would download the .Rda version of the data, in which case you use the `load()` function to open it in R or RStudio.
load(file = "file.Rda")
If you have the .dta (Stata) file format, the haven, foreign, and readstata13 packages are available, and I have used all three. I suggest using readstata13, because it preserves the labelled variables as factors. Keeping the order of the categories becomes very important when you produce tables or graphs; otherwise, you will get categories sorted in alphabetical order, which is not desirable in this case.
Let’s start by installing the readstata13 package, then attaching it.
install.packages("readstata13")
library(readstata13)
sap <- read.dta13("file.dta") # assign the result so the data can be used later
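The point about category order can be seen with a small base R sketch. The answer labels are taken from the Survey's direction question, but the vector itself is invented for illustration.

```r
# Invented answers stored as plain text
answers <- c("wrong direction", "right direction", "don't know", "right direction")

# A character vector is tabulated in alphabetical order
alphabetical <- names(table(answers))

# A factor keeps whatever level order it was given
answers_f <- factor(answers,
                    levels = c("right direction", "wrong direction", "don't know"))
ordered <- names(table(answers_f))
```

With plain text, "don't know" comes first in the table; with a factor, the categories appear in the order of the questionnaire.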
If you have the .sav (SPSS) file format, there are the haven and foreign packages, both of which I have used. As far as I know, both do the job just fine.
library(haven)
sap <- read_spss("file.sav") # assign the result so the data can be used later
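One thing to keep in mind with haven is that it imports labelled variables as labelled vectors rather than factors. A minimal sketch of converting one with `as_factor()` follows; the vector and its value codes are invented for illustration and are not taken from the Survey files.

```r
library(haven)

# Invented labelled vector, similar in shape to what read_spss() returns
x <- labelled(
  c(1, 2, 1, 9),
  labels = c("right direction" = 1, "wrong direction" = 2, "don't know" = 9)
)

# as_factor() turns the value labels into factor levels, ordered by value
x_f <- as_factor(x)
levels(x_f)
```

The resulting factor keeps the coded order of the categories, which is what you want for tables and graphs.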
2.2 Tabulations and Cross-tabulations
Preparing a table of frequencies or percentages, known as tabulation, is the most widely used method in the Survey report, given the categorical nature of most of the survey questions. For example, we look at the percentage of people who say the country is moving in the right or the wrong direction, by survey year.
library(dplyr)
svytable(formula = ~x4+m8, design = sap.w) %>% # weighted cross-tabulation of x4 by m8
  prop.table(2) %>% # column percentages
  {.*100} %>% # multiply values by 100
  round(1) %>% # round the values to one decimal point
  knitr::kable() # present in a nice table
| | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| right direction | 44.3 | 42.3 | 37.5 | 42.3 | 46.7 | 46.2 | 51.5 | 57.6 | 54.7 | 36.7 | 29.3 | 32.8 |
| wrong direction | 21.1 | 23.7 | 32.0 | 29.4 | 27.0 | 34.6 | 31.3 | 37.5 | 40.5 | 57.5 | 65.9 | 61.2 |
| some in right, some in wrong direction | 29.4 | 25.3 | 23.0 | 20.6 | 21.6 | 17.1 | 15.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| refused | 1.0 | 1.2 | 1.1 | 0.9 | 0.4 | 0.1 | 0.2 | 0.1 | 0.4 | 0.6 | 0.6 | 0.8 |
| don’t know | 4.3 | 7.5 | 6.4 | 6.8 | 4.3 | 2.0 | 1.6 | 4.8 | 4.5 | 5.3 | 4.3 | 5.3 |
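To see what each step of the pipeline does, here is the same sequence on a tiny invented dataset, written without pipes. The data frame `toy` and its columns are hypothetical stand-ins for the Survey variables.

```r
library(survey)

# Invented toy data: two years, two answers, unequal weights
toy <- data.frame(
  direction = factor(c("right", "wrong", "right", "wrong"),
                     levels = c("right", "wrong")),
  year      = c(2016, 2016, 2017, 2017),
  wgt       = c(3, 1, 1, 1)
)
toy_design <- svydesign(id = ~1, data = toy, weights = ~wgt)

tab <- svytable(~direction + year, design = toy_design)  # weighted counts
pct <- round(prop.table(tab, 2) * 100, 1)                # margin 2: each column sums to 100
pct
```

Because the percentages are taken within columns, each year adds up to 100, which is how the table above should be read: the distribution of answers within each survey year.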