A Tutorial for Analysing the Survey of the Afghan People in R

Jun 5, 2018 · AfghanSurvey

1 About the Survey of the Afghan People

Since 2006, The Asia Foundation has conducted the Survey of the Afghan People, a nationally representative annual survey. The Survey captures perceptions on a broad range of topics, including security, development, governance, service delivery, women’s rights, and migration. It is widely used by policymakers, academics, and non-governmental organizations working in and on Afghanistan. The Foundation makes the Survey’s data public on its website.

1.1 Survey Weights

National surveys such as the Survey of the Afghan People claim that their data represent the general population. This claim rests on selecting respondents at random, so that the sample reflects the true population. The Survey of the Afghan People collects data from all 34 provinces of Afghanistan and includes men and women from all ethnic and language groups. It uses a multistage systematic sampling method: first, the country is divided into 34 strata (here, the provinces), and then districts are selected within each stratum using probability proportional to size (PPS) systematic sampling. However, the final sample might not exactly match the provincial, urban/rural, or gender proportions of the population. Therefore, to ensure representativeness, survey weights are applied in the analysis. Survey weights are commonly used in national and other surveys that claim to represent the population.
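To make the idea concrete, here is a minimal sketch of the arithmetic behind a simple post-stratification weight for a single variable, gender. The population shares and sample counts are made-up numbers, and this only illustrates the principle; it is not the procedure The Asia Foundation uses to build its weights.

```r
# Hypothetical example: population vs. sample composition by gender (made-up numbers)
pop_share    <- c(male = 0.51, female = 0.49)  # assumed population proportions
sample_count <- c(male = 600,  female = 400)   # respondents actually interviewed

# Weight for each group = population share / sample share
sample_share <- sample_count / sum(sample_count)
wgt <- pop_share / sample_share
print(round(wgt, 3))   # male 0.850, female 1.225

# Applying the weights brings the sample back to the population proportions
weighted_share <- sample_count * wgt / sum(sample_count * wgt)
print(weighted_share)  # male 0.51, female 0.49
```

Men are over-represented in this toy sample, so each man counts for a bit less than one respondent and each woman for a bit more.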

Applying survey weights in R requires estimation functions that take the weights into account, such as those provided by the survey package, which supplies weighted versions of many common estimation functions. Rather than altering the data itself, the survey package combines the data and the weights into a survey design object, which is then passed to its estimation functions. The Survey data comes in tabular form and includes weight variables; the main weight variable used for the analyses in the Survey report is MergeWgt10. We use the svydesign() function from the survey package to create this weighted design object.

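# install.packages("survey") # install first if you haven't
library(survey)
sap <- get(load("sap.Rda")) # load() returns the object's name; get() fetches the data (assumes the file holds one data frame)
# rows with a missing weight are dropped before creating the design
sap.w <- svydesign(id = ~1, data = sap[!is.na(sap$MergeWgt10), ], weights = ~MergeWgt10) # id = ~1: no cluster identifiers, only sampling weights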
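Since the Survey file isn't bundled with R, the sketch below uses the apistrat example data that ships with the survey package (its weight column is pw) to show what a design created this way gives you. It mirrors the id = ~1, weights-only call above. apistrat is actually a stratified sample, so declaring its strata would change the standard error, though not the point estimate.

```r
library(survey)
data(api)  # example datasets shipped with the survey package (apistrat, apisrs, ...)

# Same set-up as above: no clusters (id = ~1), only sampling weights
des <- svydesign(id = ~1, weights = ~pw, data = apistrat)

# Weighted mean of the API score, with a design-based standard error
est <- svymean(~api00, design = des)
print(est)

# The point estimate equals the weighted mean sum(w * x) / sum(w)
print(weighted.mean(apistrat$api00, apistrat$pw))

# The unweighted mean ignores the weights and will generally differ
print(mean(apistrat$api00))
```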

2 Survey Analysis with R

I have broken this post down into (1) reading data, (2) tabulations and cross-tabulations, (3) descriptive statistics, (4) hypothesis testing, and (5) inferential statistics.

2.1 Reading Data

The survey data is available in .Rda (R), .dta (Stata), and .sav (SPSS) file formats. In this tutorial, we go over how to carry out, in R, some of the data manipulation and analysis used in the Survey report. If you are using R, ideally you would download the .Rda version of the data, in which case you use the `load()` function to open it in R or RStudio.

load(file = "file.Rda") # restores the saved object(s) into your workspace under their original names
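One thing that trips people up: load() does not return the data. It restores the saved objects under their original names and returns those names as a character vector. The self-contained sketch below saves a toy data frame (the name toy_survey is hypothetical) to a temporary file and loads it back, showing how to combine load() with get() if you want to choose your own name.

```r
# Save a small data frame to a temporary .Rda file (a stand-in for the Survey file)
toy_survey <- data.frame(id = 1:3, answer = c("yes", "no", "yes"))
path <- tempfile(fileext = ".Rda")
save(toy_survey, file = path)
rm(toy_survey)

# load() recreates 'toy_survey' in the workspace and returns its name, not the data
loaded <- load(file = path)
print(loaded)  # "toy_survey"

# To assign the data to a name of your choice, pass that name to get()
sap <- get(loaded[1])
str(sap)
```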

If you have the .dta (Stata) file format, the haven, foreign, and readstata13 packages can read it, and I have used all three. I suggest using readstata13, because it converts labelled variables to factors and keeps the order of their value labels. Keeping that order is very important when you produce tables or graphs; otherwise, the categories are sorted alphabetically, which is not desirable here.

Let’s start by installing the readstata13 package, then loading it.

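install.packages("readstata13") # only needed once
library(readstata13)
sap <- read.dta13("file.dta") # labelled variables are converted to factors (convert.factors = TRUE by default)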
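To see why the order of categories matters, here is a small base R illustration with made-up responses. readstata13 builds the factor levels from the Stata value labels for you; the sketch only shows what happens when the levels are left to R's default.

```r
# Made-up answers to a "right or wrong direction" style question
answers <- c("wrong direction", "right direction", "don't know",
             "right direction", "refused")

# Default factor(): levels are sorted alphabetically
f_alpha <- factor(answers)
print(levels(f_alpha))
# "don't know" "refused" "right direction" "wrong direction"

# Supplying the levels keeps the questionnaire order in tables and plots
f_ordered <- factor(answers, levels = c("right direction", "wrong direction",
                                        "refused", "don't know"))
print(table(f_ordered))
```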

If you have the .sav (SPSS) file format, I have used the haven and foreign packages; as far as I know, both do the job just fine.

library(haven)
sap <- read_spss("file.sav")
# value labels are kept as labelled columns; as_factor(sap) converts them to factors
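haven, by contrast, reads SPSS value labels into labelled vectors rather than factors. Here is a minimal sketch, assuming a reasonably recent version of haven, with a toy vector standing in for a Survey question; the codes and labels (for example, 98 for refused) are invented for illustration.

```r
library(haven)

# A toy labelled vector, similar to what read_spss() returns for labelled variables
x <- labelled(c(2, 1, 1, 98),
              labels = c("right direction" = 1, "wrong direction" = 2,
                         "refused" = 98))

# as_factor() turns value labels into factor levels, ordered by the underlying codes
f <- as_factor(x)
print(levels(f))
print(table(f))
```

Because the levels follow the codes rather than the alphabet, the questionnaire order survives into tables and graphs.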

2.2 Tabulations and Cross-tabulations

Preparing a table of frequencies or percentages, known as tabulation, is the most widely used method in the Survey report, given that most of the survey questions are categorical. For example, we look at the percentage of people who think things in Afghanistan are going in the right or wrong direction, by survey year (x4 by m8).

library(dplyr) # provides the %>% pipe
svytable(formula = ~x4+m8, design = sap.w) %>% # weighted cross-tabulation of x4 by m8
  prop.table(2) %>% # column proportions: each year's column sums to 1
  {.*100} %>% # multiply values by 100
  round(1) %>% # round the values to one decimal place
  knitr::kable() # present in a nice table

	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016	2017
right direction	44.3	42.3	37.5	42.3	46.7	46.2	51.5	57.6	54.7	36.7	29.3	32.8
wrong direction	21.1	23.7	32.0	29.4	27.0	34.6	31.3	37.5	40.5	57.5	65.9	61.2
some in right, some in wrong direction	29.4	25.3	23.0	20.6	21.6	17.1	15.4	0.0	0.0	0.0	0.0	0.0
refused	1.0	1.2	1.1	0.9	0.4	0.1	0.2	0.1	0.4	0.6	0.6	0.8
don’t know	4.3	7.5	6.4	6.8	4.3	2.0	1.6	4.8	4.5	5.3	4.3	5.3
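svytable() returns weighted counts, that is, the sum of the survey weights of the respondents in each cell, and prop.table(2) then divides each column by its total. The same arithmetic can be reproduced in base R with xtabs(), putting the weight on the left-hand side of the formula. The respondents and weights below are hypothetical, and this reproduces only the table itself; for standard errors or tests you still need the survey package's functions (such as svychisq()).

```r
# Hypothetical respondent-level data: answer, survey year and a survey weight
resp <- data.frame(
  answer = factor(c("right", "wrong", "right", "wrong", "right", "wrong"),
                  levels = c("right", "wrong")),
  year   = c(2016, 2016, 2016, 2017, 2017, 2017),
  wgt    = c(1.2, 0.8, 1.0, 0.5, 1.5, 1.0)
)

# Weighted counts: the weight on the left-hand side is summed within each cell
tab <- xtabs(wgt ~ answer + year, data = resp)

# Column percentages (margin = 2), rounded to one decimal place
pct <- round(prop.table(tab, margin = 2) * 100, 1)
print(pct)
```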