...

When working solo, most people keep all of a project’s files in a single subdirectory named for the project. For collaborative projects, and for projects whose code you plan to share so that others can replicate your work, I recommend separate directories for your scripts, your original data, and your analytical products (results, created data, and log files). I explain why below.
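As a minimal sketch of that layout, the Python snippet below creates one directory per category. The directory names (`scripts`, `data_original`, `products/...`) and the function `create_project_layout` are illustrative assumptions, not a required convention; rename them to suit your project.

```python
from pathlib import Path

# Hypothetical directory names; adjust to your project's conventions.
PROJECT_SUBDIRS = [
    "scripts",                 # code
    "data_original",           # original data, never edited by hand
    "products/results",        # tables, figures, model output
    "products/created_data",   # data sets derived from the originals
    "products/logs",           # log files
]


def create_project_layout(project_root):
    """Create the separate directories under project_root and return them."""
    root = Path(project_root)
    created = []
    for sub in PROJECT_SUBDIRS:
        path = root / sub
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_layout("my_project")` is safe to repeat, since directories that already exist are left untouched.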

For collaborations, keep your original data in a location that is accessible to everyone on the project. Each person could instead copy the original data into their own research space, but then whenever the project data are updated (e.g., you add a new variable to the IPUMS extract, a new batch of survey responses comes in, or the federal agency that produced the data releases a version that corrects an earlier error), every project member has to remember to copy the new data into their space. Otherwise, different members will be working with different versions of the data, a potential source of confusion. To avoid this, have your code read the data from a shared project directory. When the data file is updated, the project manager should archive the old version in a way that preserves information about its origins, contents, and version. The new working data file should have the same name as the old file so that existing scripts can access it without modification.
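The update step described above can be sketched in Python as follows. This is one possible approach under stated assumptions, not a prescribed procedure: the function `update_shared_data`, the date-stamped archive naming, and the `ARCHIVE_LOG.txt` file are all hypothetical choices made for illustration.

```python
import shutil
from datetime import date
from pathlib import Path


def update_shared_data(new_file, working_file, archive_dir, note, archived_on=None):
    """Archive the current working file, then install new_file under the same name.

    note is free text describing the archived version (origin, contents, version).
    Returns the path of the archived copy, or None if there was nothing to archive.
    """
    working = Path(working_file)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    stamp = (archived_on or date.today()).isoformat()

    archived = None
    if working.exists():
        # Hypothetical naming scheme: extract.csv -> extract_2024-01-31.csv
        archived = archive / f"{working.stem}_{stamp}{working.suffix}"
        if archived.exists():
            raise FileExistsError(f"{archived} already exists")
        shutil.move(str(working), str(archived))
        # Append a line describing the archived version to a plain-text log.
        with open(archive / "ARCHIVE_LOG.txt", "a", encoding="utf-8") as log:
            log.write(f"{stamp}\t{archived.name}\t{note}\n")

    # The new file takes over the old name, so existing scripts keep working.
    shutil.copy2(str(new_file), str(working))
    return archived
```

Because scripts refer only to the working file's fixed name in the shared directory, none of them need to change when the data are updated; the dated copies and the log preserve what each earlier version was.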

...