Text-based Collaboration in Scientific Projects

Improving Workflows with Distributed Version Control

Gerald Senarclens de Grancy, gerald@senarclens.eu

Relevance for Scientific Workflows

How Would We Design a System to Suit Our Needs?

  1. Ability to work on more than one system
  2. Allow for cooperation (branching)
    • avoid version chaos
    • worst case: overwriting changes
  3. Snapshots: ability to diff against any past state
    • code doesn't compile anymore
    • infinite loop for new sample data
    • results are flawed
    • bisection to locate bugs
  4. Allow working offline (park, travel, ...)
  5. Support distributed backups and encourage making them
  6. Use hard disk space economically
    • allow for use of HDDs and SSDs
    • avoid duplication of data (just keep one copy); use checksums for files and directories
    • automatically compress data
  7. Ease of use
    • be fast (should not slow down workflow)
    • "easy" to use and to get started with
    • good integration with common tools
  8. Freely available, ideally Open Source
  9. Independence from UniIT ;) et al.
    • provide option for online repo(s)
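
Sketch for point 6 (one copy per content, checksums, automatic compression). This is only a minimal illustration, loosely in the spirit of how Git addresses stored content by its hash. The class name ObjectStore and its methods put/get are hypothetical, and everything is held in memory rather than on disk.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy content-addressed store (hypothetical): one compressed copy per content."""

    def __init__(self):
        # Maps hex checksum -> zlib-compressed bytes (in memory for simplicity).
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store data and return its checksum; identical data is stored only once."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self._objects:
            self._objects[key] = zlib.compress(data)
        return key

    def get(self, key: str) -> bytes:
        """Return the original bytes after verifying them against the checksum."""
        data = zlib.decompress(self._objects[key])
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored object is corrupted")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
draft = b"Results: the method converges.\n" * 100
first = store.put(draft)
second = store.put(draft)  # same content saved again, e.g. in a later snapshot
other = store.put(b"Results: the method diverges.\n")
```

Saving the same draft twice yields the same key and does not add a second copy; the checksum also serves as an integrity check when the content is read back.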

Description of DVC

Mercurial

Git

Branches etc.

Bitbucket

Further Reading

Git Documentation, http://git-scm.com/doc (2016)
Bryan O'Sullivan, Mercurial: The Definitive Guide, O'Reilly Media, 1st ed. (June 2009)

German

Valentin Haenel and Julius Plenz, Git – Verteilte Versionsverwaltung für Code und Dokumente, Open Source Press (2011)