Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

I also recommend that you keep your scripts in a directory that is linked to a code repository that is backed up to the cloud, like github. I discuss the benefits of using a code versioning system elsewhere. Having your scripts in a repository has important implications related to file structure. First, you will want to avoid putting your data in the code repository, unless your data file is very small and will not change. To remove the temptation, you could either add the data to the git ignore list or keep your data separate from the git repository. Second, keep the products of your code out of the repository as well.  One reason why you want to keep the original data and analytical products out of the repository has to do with how versioning system systems store your changes. In short, git compares the contents of the old version with the new and stores the difference. If the files are binary (e.g. a stata data file, excel file, word document, image) git is unable to easily make the comparison and will store the whole file. Overtime this will make your repository unwieldy. Another reason is that the data products can get out of sync with the code if some members of the research team are committing only the scripts, which I’m arguing is the better practice. Trust the process. If you need to go back to an earlier research product, you can checkout the version of the repository that produced that product and re-run the code to reproduce it. If your code cannot produce the data product, then the product isn’t documented and the system isn’t working as designed.

...