1 - Statistical Programming 101
Before we dig deep into specific aspects of research reproducibility, this session
will introduce you to 7 general principles that you should always keep in mind:
Principle 1: Your code is an output
The main reason why we code
• In Excel you make changes directly to the data and save new versions of
the dataset.
• In Stata and R you make changes to the instructions on how to get from the
original data to the final analysis and save new versions of the instructions.
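The contrast above can be sketched in code. The example below is written in Python only for illustration (the same idea applies to a Stata do-file or an R script), and the data, variable names, and function name are hypothetical. The original data is never edited. Every change lives in the instructions, so rerunning them reproduces the final result.

```python
import csv
import io

# Hypothetical raw data; in a real project this would be a file that is never edited.
RAW_CSV = """household_id,income
1,1200
2,
3,950
"""

def build_analysis_data(raw_text):
    """Recipe: every step from raw data to analysis data is written down here."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    # Step 1: drop observations with missing income
    rows = [r for r in rows if r["income"] != ""]
    # Step 2: convert income from text to a number
    for r in rows:
        r["income"] = float(r["income"])
    return rows

analysis_data = build_analysis_data(RAW_CSV)
mean_income = sum(r["income"] for r in analysis_data) / len(analysis_data)
print(mean_income)
```

If a step turns out to be wrong, you edit the function and run it again; the raw data stays exactly as it was received.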
Create recipes and not meals
Principle 2: Know your data
• To write a good recipe you need to know your ingredients very well
• The ingredients for a data work recipe are contained in the datasets
• Let’s discuss a framework to understand and communicate how your data is
structured
Exploring a new dataset
ID variables
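A minimal sketch of how an ID variable could be checked, assuming (this is not stated on the slide) that an ID variable should have no missing values and should uniquely identify each observation. The example is in Python for illustration only, and the data and the helper name check_id are hypothetical.

```python
from collections import Counter

def check_id(rows, id_var):
    """Return (missing_count, duplicated_values) for a candidate ID variable."""
    values = [row.get(id_var) for row in rows]
    missing = sum(1 for v in values if v is None or v == "")
    counts = Counter(v for v in values if v not in (None, ""))
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    return missing, duplicates

# Hypothetical data for illustration only
households = [
    {"household_id": "A1", "village": "North"},
    {"household_id": "A2", "village": "North"},
    {"household_id": "A2", "village": "South"},
]
print(check_id(households, "household_id"))
print(check_id(households, "village"))
```

Under this assumption, a variable only works as an ID when the check reports zero missing values and an empty list of duplicates. Here neither candidate passes, which is the kind of finding worth raising with the team before any analysis.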
Understand project data
• It is easy to remember information about one or two datasets while you are
working with them
• However, in your role as a research assistant, you will need to keep track of
multiple datasets, explain to other team members how they are organized,
and hand them to other researchers
• To communicate our understanding of datasets, we use data maps. We will
learn about this tool in the next session
Principle 3: Track your changes
How can you track changes?
Recommended practices for version control
• DIME projects are required to use Git for version control of code
• Anything can be version-controlled through Git, but it is only well suited to
code and outputs in plain text formats such as .csv, .do, .R, and .tex
• The World Bank does not allow us to store data on GitHub, but you can track
changes to it by saving metadata, such as codebooks, in plain text format
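The last point can be sketched as follows: a small plain-text codebook that describes a dataset without containing any of its values. This is an illustration in Python under stated assumptions, not a DIME tool; the data, column choices, and the function name make_codebook are hypothetical.

```python
import csv
import io

def make_codebook(rows):
    """Summarize each variable without storing any of the actual values."""
    variables = list(rows[0].keys())
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["variable", "n_obs", "n_missing"])
    for var in variables:
        n_missing = sum(1 for r in rows if r[var] in (None, ""))
        writer.writerow([var, len(rows), n_missing])
    return out.getvalue()

# Hypothetical data for illustration only
survey = [
    {"household_id": "A1", "income": "1200"},
    {"household_id": "A2", "income": ""},
]
codebook_text = make_codebook(survey)
print(codebook_text)
```

Saving this text to a file and committing it lets Git show, line by line, when a variable is added, dropped, or gains missing values, while the data itself stays out of the repository.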
Principle 4: Write code that others can read
How to write good recipes
Is this slide easy to read?
White Space. Stata does not distinguish between one empty space and many empty spaces,
or one line break or many line breaks. It makes a big difference to the human eye and we
would never share a Word document, an Excel sheet or a PowerPoint presentation without
thinking about white space - although we call it formatting.
White Space
• Stata does not distinguish between one empty space and many empty
spaces, or one line break or many line breaks
• It makes a big difference to the human eye and we would never share a Word
document, an Excel sheet or a PowerPoint presentation without thinking
about white space – although we call it formatting
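A small illustration of the same point, sketched in Python rather than Stata. Unlike Stata, Python treats indentation as meaningful, but it likewise ignores extra spaces inside a line and blank lines between statements. The two hypothetical functions below do exactly the same thing, and only one of them is easy to read.

```python
# Cramped version: valid code, but hard for a person to scan
def summary_cramped(values):
    n=len(values);total=sum(values);mean=total/n
    return {"n":n,"total":total,"mean":mean}

# Spaced version: identical logic, with white space used as formatting
def summary_spaced(values):
    n     = len(values)
    total = sum(values)

    mean  = total / n

    return {"n": n, "total": total, "mean": mean}

print(summary_cramped([1, 2, 3]) == summary_spaced([1, 2, 3]))
```

The computer sees no difference between the two versions; the spacing is there only for the human reader.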
Vertical spacing
Horizontal spacing
Style Guides
Style guides are common in most programming languages. Following a style guide
will make your code much more readable, and it will reduce the risk of errors.
Code linters
Linters are tools that flag style errors and possible bugs in software.
• Stata: Install the Stata linter (proudly developed by DIME Analytics!) from
SSC with: ssc install stata_linter. More information is available in the
package documentation.
• R: Use the package lintr, available on CRAN. More information is available in
the package documentation.
Don’t repeat yourself
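A minimal sketch of this principle, in Python for illustration only (in Stata or R the equivalent would typically be a loop or a function). The data and variable names are hypothetical.

```python
# Hypothetical survey data for illustration only
data = {
    "income":  [1200, 950, 1100],
    "savings": [300, 120, 250],
    "debt":    [80, 400, 0],
}

# Repetitive version: the same logic copied three times.
# A fix to the formula would have to be made in three places.
mean_income_copy  = sum(data["income"]) / len(data["income"])
mean_savings_copy = sum(data["savings"]) / len(data["savings"])
mean_debt_copy    = sum(data["debt"]) / len(data["debt"])

# Non-repetitive version: the logic is written once and applied to every variable
def mean(values):
    return sum(values) / len(values)

means = {name: mean(values) for name, values in data.items()}
print(means)
```

Both versions give the same numbers, but in the second one a correction or a new variable requires changing a single line, which lowers the risk of copy-and-paste errors.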
Principle 5: Think critically about the data work
Critical thinking about data work
Principle 6: Ask for help
Help file usage and coding knowledge
Asking for help
This is always the case, no matter who you ask: DIME Analytics, Stack Overflow, a
friend from grad school, etc.
How to ask for help
• You will never get a good answer if you only say “my code is not working”
• Good etiquette for code questions is to include at least:
  • The error message or a description of the unexpected behavior
  • The software or language you are using, and the part of your code that breaks
  • What you have tested so far and what you have learned
More details and advice on this topic are available at https://git.io/JtQTb and http://tinyurl.com/stack-hints
Principle 7: Keep improving your skills
When your code works you are only half done.
- Ancient proverb
Re-write your own code
• Read your own code as a recipe. Would you be able to follow the instructions
if you were a new person joining the team?
Read other people's code
• Google code, but before using it, ask yourself critical questions about the
code you found
• Why did this person code this way?
• Does this apply to my context?
Have someone else read your own code
• Swap code with someone and discuss differences in coding style. Think of
each other’s code as recipes: can you follow the instructions?
• Have you ever asked someone to help you proofread your Word document?
Ask people to proofread your code, too
Wrapping up
We will see these principles in practice during the rest of this training.
Thank you! Gracias!