Session 1: Overview of CSPro, Dictionary and Forms
Identify different CSPro modules and tools and their roles in the survey workflow
Create a simple data entry application including dictionary and forms
Run a data entry application on Windows
Run a data entry application on Android and retrieve the data entered
Understand the differences between the new CSPro DB format and the old text format for data
files.
CSPro Overview
CSPro is a suite of software tools for census and survey data processing that includes modules for data
collection, editing, tabulation, and dissemination.
CSPro has a long history. It was first released in 2000 and has been used in over 100 countries
worldwide. It has been used for censuses all over the world as well as for many large and complex
household surveys including the Demographic and Health Survey (USAID), Multiple Indicator Cluster
Survey (UNICEF) and Living Standards Measurement Study (World Bank). The first Android version was
released in 2014. CSPro Android has already been used in production for household surveys and
population censuses in multiple countries.
CSPro is free software developed by the US Census Bureau and funded by USAID. The Census Bureau
provides free email customer support. You can send questions to [email protected].
CSPro can be used for both the traditional PAPI (pencil and paper interview) workflow and the
computer aided personal interview (CAPI) workflow. In this workshop, we will focus on data collection
using a CAPI workflow.
Group exercise
Split into groups of 3-4 people and use the provided tablets to interview each other using the “Getting
to Know You” application. Interview each member of the group so that we have data for all workshop
participants. When you are done, tap the sync button to upload your results to the server.
When you launch CSPro you are given the choice of “Data Entry Application” for keying from paper (PAPI)
and “CAPI Data Entry Application” for electronic data collection using phones/tablets/laptops. The
differences are:
Since we are creating a CAPI application we will choose “CAPI Data Entry Application”. We will name the
application “Popstan2020” and we will use the same name for the dictionary. Since we will eventually
add other applications such as the listing questionnaire and the menu, we will create the following
folder structure:
The first step in creating the application is to define the data dictionary. The data dictionary lists all the
data items and possible responses that will be in the application and organizes them in records and
levels. The dictionary has the following hierarchy:
Dictionary
  Level
    Record
      Item
        Value Sets
        Sub-item
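To make the nesting concrete, here is a minimal sketch in Python (not CSPro syntax) that models the
hierarchy as nested data classes. The class layout and all example names (SEX, PERSON_REC,
HOUSEHOLD_LEVEL, POPSTAN2020_DICT) are assumptions made for illustration; CSPro stores its dictionary in
its own file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    name: str
    length: int
    # Each value set maps a code to its label; an item may have several.
    value_sets: List[Dict[int, str]] = field(default_factory=list)
    # Subitems reuse a subset of the parent item's digits.
    subitems: List["Item"] = field(default_factory=list)


@dataclass
class Record:
    name: str
    max_occurrences: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Level:
    name: str
    records: List[Record] = field(default_factory=list)


@dataclass
class Dictionary:
    name: str
    levels: List[Level] = field(default_factory=list)


# Hypothetical example content, for illustration only.
sex = Item("SEX", length=1, value_sets=[{1: "Male", 2: "Female"}])
person = Record("PERSON_REC", max_occurrences=30, items=[sex])
household = Level("HOUSEHOLD_LEVEL", records=[person])
dictionary = Dictionary("POPSTAN2020_DICT", levels=[household])


def item_names(d: Dictionary) -> List[str]:
    """Walk the hierarchy top-down and collect every item name."""
    return [i.name for lv in d.levels for r in lv.records for i in r.items]
```

Walking from the dictionary down to the items, as item_names does, mirrors the order in which the
dictionary tree is shown on the left side of the screen.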
Before defining the record and items we first create ID-items. ID-items uniquely define each case (each
questionnaire). Usually these are geographic codes.
Why not include GPS, interview date, start and end time since they are in the same section of the
questionnaire? Because they are not part of the codes needed to uniquely identify the questionnaire.
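The idea that the ID-items alone identify a case can be sketched in a few lines of Python. This is an
illustration, not how CSPro stores cases internally; the ID-item names and widths (PROVINCE, DISTRICT,
HOUSEHOLD_NUMBER) are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical ID-items: (name, number of digits).
ID_ITEMS: Tuple[Tuple[str, int], ...] = (
    ("PROVINCE", 2),
    ("DISTRICT", 2),
    ("HOUSEHOLD_NUMBER", 3),
)


def case_key(ids: Dict[str, int]) -> str:
    """Concatenate the zero-padded ID-items into a single case key."""
    return "".join(str(ids[name]).zfill(width) for name, width in ID_ITEMS)


def add_case(cases: Dict[str, dict], ids: Dict[str, int], data: dict) -> bool:
    """Store a case; return False if a case with the same key already exists."""
    key = case_key(ids)
    if key in cases:
        return False
    cases[key] = data
    return True
```

Fields such as GPS or interview date would live in the data part of the case. Changing them does not
change the key, which is why they do not belong among the ID-items.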
Tip: Note that you can toggle showing names or labels in the dictionary tree on the left side of the
screen using the View menu. You can also select “Append names to labels in tree” to show both at the
same time.
Note that we could have separate records for education and fertility but instead we will combine them
with the person record. This will simplify analysis later on since we will not have to link the records
together. Later on, we will see that even with the records combined we can still put education and
fertility into separate rosters on our forms.
Properties of records:
Type value: B
Required: no (we can have empty households)
Max: 30 (questionnaire has limit of 10 but no penalty for adding a few extra just in case)
Different people have different styles of naming dictionary variables. Some use a descriptive name such
as “PLACE_OF_BIRTH”, others prefer to use the question number such as “B07”, and others prefer a
combination such as “B07_PLACE_OF_BIRTH”. Whichever approach you choose, just make sure that it
will be easy for users of your application and your data to understand. Will everyone working on the
logic for your application know what B07 is?
For each of our variables we need to add the possible responses (value sets). The value set lists all valid
responses along with their corresponding labels for coded variables. Without a value set, the
interviewer can enter any value (except blank), but with a value set they are limited to the options
defined in the value set. Without a value set, users can even enter negative numbers. For this reason, it
is good practice to use a value set for all numeric variables.
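The acceptance rule described above can be sketched in Python as follows. This is only an illustration
of the behavior as stated in the text, not CSPro code; the function name and the SEX value set are
hypothetical.

```python
from typing import Dict, Optional


def accepts(value: Optional[int], value_set: Optional[Dict[int, str]]) -> bool:
    """Return True if the entered value would be accepted for the field."""
    if value is None:
        return False  # blank is never accepted
    if value_set is None:
        return True  # no value set: any number passes, even a negative one
    return value in value_set  # with a value set: only the listed codes pass


# Hypothetical value set for a coded variable.
SEX = {1: "Male", 2: "Female"}
```

With no value set, accepts(-5, None) is True, which is the situation the value set is meant to
prevent.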
Define the value sets for some of our variables based on the response codes on the questionnaire:
1 The line number is not needed in CSPro itself as there are ways to determine the row number using logic;
however, when exporting the data to other packages it is often useful to have it. We will see later how to fill this in
automatically during data entry.
Dictionary Macros
There are some useful functions for working with dictionaries that you can access by right-clicking on
the dictionary in the tree on the left side of the screen and choosing “Dictionary Macros”. In particular
you can copy/paste all value sets or all item names/labels from the dictionary to/from Excel. This can be
used to create codebooks to share with people who do not have access to CSPro. It can also be used to
do bulk modifications on dictionary items such as renumbering values in value sets or adding prefixes to
item names.
Forms
Before we can enter data, we need to create data entry forms. To start, click on the yellow stack of
forms on the toolbar. To follow the look of the paper questionnaire we will create one form for each
page of the paper questionnaire.
Create a form for section A: Identification. Drag and drop the id-items onto the form. Note that we can
drag and drop individual items or entire records. By right clicking on the form in the forms tree on left
side of the screen we can change the label and name of the form. Let’s make the label “A: Identification”
and make the name “IDENTIFICATION_FORM”.
Create a form for section B: Demographics. Drag and drop the items from the person record. Let’s give the
form the label “B: Demographics” and the name “DEMOGRAPHICS_FORM”. Note that when we drop the record
we have the option to put the items in a roster or a repeating form. If we drop the items on the
household identification form, we can only use a roster since the household identification isn’t repeated. For
our example let’s use a roster.
When we create the rosters, CSPro automatically gives them a name that ends in “000”, for example
“PERSON000”. You can see this in the forms tree on the left side of the screen. We can change this by
right clicking on the roster in the forms tree and choosing properties. Let’s name our roster
“DEMOGRAPHICS_ROSTER”.
Create a form for section F: Housing Characteristics. Drag and drop the items from the housing record.
For paper and pencil surveys, we would spend a lot of time on the layout, adding additional labels and
frames to make the form look exactly like the questionnaire. However, when rendered on Android, the
form is rendered one question at a time so making the form look like the paper form is not as important.
The csdb extension is new in CSPro 7.0 and represents a new file format, the CSPro Data Base File. This
file not only contains the data itself but it also contains the notes, the index, the partial save status and
metadata used for data synchronization. In earlier versions of CSPro, the notes, index and partial save
were stored in extra files that accompanied the data file itself. It was unwieldy to deal with all of these
files so the CSPro DB file combines them all into one file. Unlike the text file used by earlier versions of
CSPro, this is a binary file that cannot be viewed using TextViewer. For the final CSPro 7.0 release we will
have a data viewer tool to see the data contained in a CSDB file which will play the role that TextViewer
does for text files.
In CSPro 7.0, when you launch a CSPro data entry application you can select the type of data file that
you want to use:
While it is still possible to use text files in CSPro 7.0, it is highly recommended to use CSPro DB instead.
There are new features such as smart sync and case labels that are not supported using text files.
When we copied the application to the tablet we copied the pen and the pff file. The pff file contains
various parameters about how to launch the data entry application including the data file to use. You
can modify the pff file by right clicking on it in Windows Explorer and choosing “Edit”. This allows you to
modify the name of the data file, force the application to start in add or modify mode and to lock
various parts of the user interface.
On Android, the list of available applications on the device is constructed by finding all the pff files in the
CSEntry directory and subdirectories so a pen file without a pff will not show up in the list.
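The discovery rule can be illustrated with a short Python sketch that searches a directory tree for pff
files. This is not CSEntry's actual implementation; the function name and the case-insensitive match on
the extension are assumptions.

```python
from pathlib import Path
from typing import List


def find_applications(csentry_dir: Path) -> List[Path]:
    """Return every pff file under csentry_dir, including subdirectories."""
    return sorted(
        p
        for p in csentry_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pff"
    )
```

A folder that contains only a pen file contributes nothing to the result, which matches the behavior
described above: without a pff file the application is not listed.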
It can be a pain when testing a question that is on the third form of a survey to have to reenter all of the
data up to the question you are testing. We can enable partial save under the data entry options so that
we can exit data entry, modify the application, and come back right to where we left off. While we are in
the data entry options we can also enable the case tree on both Windows and Android to make it easier
to navigate around the questionnaire while we are testing. The case tree is enabled by default on
Android since Android only shows one question at a time but it is off by default on Windows. Note that
on Android phones there is not enough space to display the case tree and the questions at the
same time, so you need to tap on the big green “CS” in the top left corner of the screen to bring up the
case tree.
Group exercise
Add a new record to the dictionary for section E of the questionnaire (deaths). Name this record
“Deaths”. How many occurrences should it have? Don’t include E01 and E02 in the new record as they
do not have the same number of occurrences as the other variables in this section. Instead, add them to
the housing record. Create a new form for section E and add the fields onto it to create a roster. Use E02
as an occurrence control field to the roster to limit the number of rows to the number of deaths in the
household. Test the application on both Windows and on Android.
Subitems
Let’s add the Date of Birth (B06) to the application. In order to be able to look at both the date as an
8-digit number and look at the day, month and year individually we can create an item with subitems.
Subitems are items that are made up of a subset of the digits of their parent item. Add the item for
date of birth and the following subitems:
We are putting year first then month and day because this format will work better with other CSPro
features that we will see later. Click on Layout in the toolbar to ensure that the item and subitem
overlap. Add the subitems to the form, add the value sets for each subitem and test the application.
Note that when we add the subitems to the form we do not need to keep the same order that we have in
the dictionary. On the form we can put the day, month then the year.
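The relationship between an item and its subitems can be sketched in Python as slices of the same
digits. This is an illustration only; the helper name and the sample date are hypothetical, and the
start positions assume the year, month, day layout described above.

```python
def subitem(parent: str, start: int, length: int) -> str:
    """Return `length` digits of `parent`, beginning at 1-based position `start`."""
    return parent[start - 1 : start - 1 + length]


date_of_birth = "19850417"  # hypothetical value, stored as YYYYMMDD

year = subitem(date_of_birth, 1, 4)
month = subitem(date_of_birth, 5, 2)
day = subitem(date_of_birth, 7, 2)

# The form may show the parts in a different order than the dictionary.
displayed = f"{day}/{month}/{year}"
```

Because the subitems overlap the parent item, no digits are stored twice: the parent and its subitems
are two views of the same eight characters.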
Occurrence Labels
With this approach, our form does not show the housing unit types but we can fix that by using
occurrence labels. Select the housing units variable in the dictionary and choose “Occurrence Labels…”
from the Edit menu. Add the names of the five types of housing units in the grid that comes up. Note
that you can copy from Excel and paste into this dialog. Now when we drag the variable to the form the
roster shows the type of housing unit for each row.
Checkboxes
We could also use a multiply occurring item for question B10, disabilities, but that can be implemented
more easily using checkboxes. Checkboxes offer a friendly interface for multiple response questions by
presenting a single screen with a checkbox for each option rather than presenting the options one by
one.
In CSPro multiple response questions are implemented as alpha variables whose length is the same as
the number of options that can be selected at the same time. The value set has a value for each option
which is usually a single letter. The resulting value is a string containing the values for each of the
selected items.
Visual A
Hearing B
Speech C
Physical D
Mental E
Self-care F
If the interviewer checks the boxes for Visual, Physical and Mental the value for the variable will be
“ADE”. We will see later how we can convert the alpha value into a series of yes/no values to simplify
analysis.
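As a rough preview of that conversion, here is a Python sketch (not CSPro logic) that turns the alpha
value into one yes/no flag per option. The function name is hypothetical; the codes and labels are the
ones listed above.

```python
from typing import Dict

# Codes and labels from the disabilities value set above.
DISABILITIES: Dict[str, str] = {
    "A": "Visual",
    "B": "Hearing",
    "C": "Speech",
    "D": "Physical",
    "E": "Mental",
    "F": "Self-care",
}


def to_yes_no(value: str) -> Dict[str, bool]:
    """Map each option label to True if its code appears in the alpha value."""
    return {label: code in value for code, label in DISABILITIES.items()}


flags = to_yes_no("ADE")
```

The membership test also works should the stored string be padded with blanks, since a blank never
matches any of the option codes.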
When you drag an item onto the form, CSPro sets the capture type based on the value set for that item. If
there is no value set when you drop the item, the capture type will be set to text box. You can always
change the capture type using the Field Properties dialog.
Date Fields
Let’s add the interview date to the identification section. Which record should it go on? It could go on
any singly occurring record such as housing but let’s create a new record called INTERVIEW_REC to hold
the section A items that are not part of the id-items and add it there. We can add an eight-digit item for
the interview date. We can also add sub-items for the year, month and day of the interview. Drag the
date item onto the form. When dragging don’t use the sub-items, just the items. Now change the
capture type for the interview date to be Date and set the date format to be YYYYMMDD to match the
order of the sub-items year, month and day.
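One practical benefit of a single eight-digit item in a fixed YYYYMMDD order is that the whole value
can be checked as a calendar date. The Python sketch below illustrates this idea; it is not how CSPro
itself handles date fields, and the function name is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_yyyymmdd(value: str) -> Optional[date]:
    """Return the date for an 8-digit YYYYMMDD string, or None if it is invalid."""
    # Require exactly eight digits so that shorter strings are not accepted.
    if len(value) != 8 or not value.isdigit():
        return None
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        return None  # for example, month 13 or 30 February
```

For instance, "20200229" is a valid date because 2020 is a leap year, while "20200230" is rejected.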
Tip: When dragging items onto a form with an existing roster drop the items inside the roster or you will
end up with a second roster for the new item.
1. Add the remaining fields from the housing section of the questionnaire to the dictionary and the
housing form (F5 through F12). Make sure to add the appropriate value sets.
2. Add the remaining fields from the demographics section (B) of the questionnaire to the
dictionary and to the demographics form. Make sure to add the appropriate value sets.
a. For Occupation (B17) use only the 4-digit occupation codes. This is a hierarchical coding
scheme and we only want the last level (level 4) codes. Copy the occupation codes
from the Excel spreadsheet “QuestionnaireAnnexes.xlsx” to the value set.
b. For Language (B18) use checkboxes.
3. Add the fields for section C, Education, to the dictionary. Add them to the person record. Create
a new form for section C and drop the education items onto it to create a roster. Set the
occurrence control field for the roster to the number of household members.
4. Add the start and end times of the interview to the identification form after the interview date.
Use subitems for the hours and minutes. Add appropriate value sets.
5. Add a new record and form for section G, household possessions. Since the G01 items (quantity and value)
repeat, put these items in their own repeating record but do not include G02. Set the
occurrence labels in the roster for G01 to the names of the possessions. Make G02 a singly
occurring checkbox field and put it in the housing record since it does not repeat. The value set
for G02 should have the possession names as labels with codes “A”, “B”, “C”…
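For the occupation codes in exercise 2a, keeping only the last level of a hierarchical coding scheme
amounts to keeping the codes with the full number of digits. The Python sketch below illustrates the
selection; the function name, the codes and the labels are hypothetical placeholders and are not taken
from “QuestionnaireAnnexes.xlsx”.

```python
from typing import Dict


def last_level_codes(codes: Dict[str, str], digits: int = 4) -> Dict[str, str]:
    """Keep only the codes that have exactly `digits` digits."""
    return {code: label for code, label in codes.items() if len(code) == digits}


# Hypothetical hierarchical list: shorter codes are the higher-level groups.
ALL_CODES = {
    "1": "Group 1",
    "11": "Group 11",
    "111": "Group 111",
    "1111": "Occupation 1111",
    "1112": "Occupation 1112",
}

value_set = last_level_codes(ALL_CODES)
```

Codes are kept as text here so that any leading zeros in the spreadsheet are preserved when counting
digits.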