The goal of social science research is to produce results and make them publicly available in order to advance knowledge, deepen understanding and, ultimately, improve well-being. That broad goal is advanced when scientists keep up with their colleagues' recent work by attending conferences and reading the literature, when they make well-reasoned arguments, and when they write and publish research papers to communicate their own empirical findings and interpretations. Good workflow also contributes to this goal: it makes us more efficient, gives us and our colleagues greater confidence in our work, reduces error, and creates stronger foundations for future research. By workflow I mean the day-to-day practices that organize and document our analysis of empirical data while we conduct research. Any empirical research, whether quantitative or qualitative, involves workflow, but the focus of this document is on workflow for the analysis of secondary quantitative data sources.
Your workflow should:
- Organize and document your research results.
- Link results with the processes that produce them.
- Help you find what you were doing the last time you worked on the project.
- Document known errors and inconsistencies in data.
- Allow collaboration. Your workflow needs to accommodate different work styles and computing systems.
- Provide opportunities to find errors.
- Allow you to build on past results in future studies, but also archive materials to replicate past results.
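One way to link a result with the process that produced it is to save a small provenance record next to each output file. The sketch below is illustrative only: the function names `sha256_of` and `write_provenance`, the `.provenance.json` suffix, and the recorded fields are assumptions made for this example, not an established standard.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_provenance(result_path: Path, script_name: str, inputs: list[Path]) -> Path:
    """Write a JSON sidecar next to a result file recording how it was made.

    The sidecar name and its fields are hypothetical conventions.
    """
    record = {
        "result": result_path.name,
        "script": script_name,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        # A checksum of each input shows later whether the data have changed.
        "inputs": {p.name: sha256_of(p) for p in inputs},
    }
    sidecar = result_path.with_name(result_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

A script that writes a table could call `write_provenance` immediately afterwards, so that anyone opening the output folder can see which script and which version of the data produced each file.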
Elements of a good workflow (click on the links for tips on implementing each element):
- File structure – directory and sub-directory structure, file naming conventions.
- Documentation – research notebook, project document, code documentation, data documentation (e.g. date and method of access).
- Code structure – organization of the data-processing scripts so that the function of each script, and its role in the project, is clear.
- Archive – record of the code and data that produced published results, record of supplementary analyses, old versions of code, documentation, and papers. For archiving code, you might use GitHub and GitHub Desktop.
- Automation – the code produces the empirical results as they appear in the publications and supplements, with minimal human intervention.
- Collaboration – good workflow facilitates work in teams, and teamwork in turn encourages good workflow.
- GitHub – GitHub is a location for archiving, sharing, and keeping a version history of your files. There are many introductions to Git and GitHub, but this one is especially good.
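As a rough illustration of the file structure, code structure, and automation elements listed above, the following sketch creates a project skeleton and runs numbered scripts in order. The folder names, the `NN_name.py` naming convention, and the functions `create_skeleton`, `ordered_scripts`, and `run_all` are all hypothetical choices for this example; adapt them to your own project and software.

```python
from __future__ import annotations

import re
import subprocess
import sys
from pathlib import Path

# Hypothetical layout: raw data are kept apart from derived data and output.
SUBDIRS = ["data/raw", "data/derived", "code", "output", "docs", "archive"]


def create_skeleton(root: Path) -> list[Path]:
    """Create the project folders under root and return their paths."""
    created = []
    for sub in SUBDIRS:
        folder = root / sub
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def ordered_scripts(code_dir: Path) -> list[Path]:
    """Return scripts named like '01_clean.py', sorted by numeric prefix."""
    pattern = re.compile(r"^(\d+)_.+\.py$")
    numbered = []
    for path in code_dir.iterdir():
        match = pattern.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    numbered.sort(key=lambda item: (item[0], item[1].name))
    return [path for _, path in numbered]


def run_all(code_dir: Path) -> None:
    """Run every numbered script in order, stopping at the first failure."""
    for script in ordered_scripts(code_dir):
        subprocess.run([sys.executable, str(script)], check=True)
```

With this convention, calling `run_all(Path("code"))` regenerates every result from the raw data in one step, and the numeric prefixes document the order in which the scripts depend on one another.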