Internal Reality

Tuesday, January 18

Hypothetical DVCS (git) workflow for VDrift - Part II, Developer Trees

AKA. The Post That Firefox Ate And I Had To Rewrite. Thanks, Firefox.

Welcome to part two of my article on workflows. Part one and its corollary dealt with the project administration side of a DVCS. Here I will explain working with the system outlined from the point of view of a developer.

Starting out

First up, you're going to need a copy of the working tree. This sounds a bit strange if you've only ever used a centralised VCS before, but it's exactly what it sounds like, and explains why the command to get the source code is:
$ git clone git://github.com/VDrift/vdrift.git vdrift
It will literally clone the source tree into the vdrift directory, giving you a copy of all the history available in the main tree. You should probably make sure you're in a suitable location (I use ~/Source, this command would copy the tree into a ~/Source/vdrift directory). It'll set you up with a local branch master that tracks the project's, and you're set to start making changes. Some people recommend creating a new branch to work in and keeping master pure, but the only people who really need to do this are those who will be pushing directly to the main repository. I would recommend you rename your branch to avoid confusion during merges, eg. if you were going to be working on fancy gui buttons, you might run:
$ git branch -M master my-fancy-gui
Now if you want to have a look at the current state of the remote master branch without affecting your current work, it's pretty easy to do, by committing your work (more on that later) and issuing the commands:
$ git fetch origin
$ git checkout origin/master
Similarly, if you want the latest stable, you could replace origin/master with origin/stable (You can't do it right now though, because the tree is not set up yet). I think this is a cleaner approach because it makes it clear that you're definitely looking at what's available to everyone, not something that might have your local changes in it also. When you want to work on your fancy gui stuff again, you issue:
$ git checkout my-fancy-gui
and continue on your way.
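
If you want to try the whole sequence without touching the network, here is a minimal sketch. It assumes a throwaway local repository standing in for git://github.com/VDrift/vdrift.git; the temporary paths, the README file and the commit message are made up for illustration.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for the project's main repository (hypothetical content).
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo "VDrift" > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" commit -q -m "Initial import"

# Clone it, then rename the local master so it can't be confused with upstream's.
git clone -q "$work/upstream" "$work/vdrift"
cd "$work/vdrift"
git branch -M master my-fancy-gui

# Look at the current upstream state without touching your own branch...
git fetch -q origin
git checkout -q origin/master   # detached HEAD at upstream's master

# ...then go back to work.
git checkout -q my-fancy-gui
git branch -vv
```

The renamed branch keeps tracking origin/master, which is why git branch -vv still shows the upstream next to it.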

Making Changes

There's an adage you might have heard other variants of somewhere before. It goes like this: “Commit early, commit often.” If you look through the history of many projects that use systems such as subversion or CVS (and indeed some that use something like git, Hg, or Bzr), you'll often notice a commit that has a dot point list of changes and a ridiculously large diff. This is considered by many to be bad practice, as it makes it harder to discover the point of changes and to track down regressions accurately. Despite this, the nature of centralised version control systems promotes large single commits, because most people don't want to expose changes to the internet before they're complete. In a DVCS like git, though, commits are made to your own tree, so you're free to make them small and structured. So every time you make a discrete change, add the files it applies to with:
$ git add [filename...]
then commit with:
$ git commit
Save your commit message, and keep working (You can just commit all tracked files that have been changed with the -a flag to commit, but you must add new files if you have any). The only large commits should be merges.
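
As a rough sketch of what "small and structured" looks like in practice, the following runs in a scratch repository. The file names and commit messages are invented for the example, and -m is used only so that no editor opens.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q "$work/repo"
cd "$work/repo"

# First discrete change: a new file has to be added explicitly.
echo "button" > gui.txt
git add gui.txt
git commit -q -m "Add GUI button stub"

# Second discrete change: an edit to a tracked file can go in with -a.
echo "button: rounded" > gui.txt
git commit -q -a -m "Round the button corners"

# -a does not pick up new files: notes.txt stays out of this commit.
echo "notes" > notes.txt
echo "button: rounded, blue" > gui.txt
git commit -q -a -m "Make the button blue"

git status --short    # notes.txt is still listed as untracked
git log --oneline
```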

Updating Your Tree

A few guides in circulation recently have advised using what's known as rebasing to update your local code, so that your commits stay ordered against the main tree. This makes sense for preparing distinct patches manually, but isn't the best idea if you'll be preparing a series of changes. Have a look at what Linus Torvalds has to say about it. So, rebasing from origin/master probably isn't too bad if you're working on something once and won't touch it again, or will start with a fresh branch every time. Or if you don't have your own remote tree, and will just be sending patches in the mail (urgh). Admittedly, the Linux kernel is also a much larger project than VDrift (at least, the source code part is), and VDrift probably won't have subsystem maintainers per se, but the point stands: if you ever expect to be pulling changes from multiple sources, or you ever expect your work to be pulled into the main tree, using git rebase will mess it up. You should also read this mailing list post if you're curious.

That said, it's important to keep the influence of the main development tree up-to-date in your local one. You can do this quite simply with a merge, for which you have two options:
$ git pull
will update origin/master and merge it with the current branch, but my preferred option is the two-staged:
$ git fetch origin
$ git merge origin/master
This method means that, among other things, between fetching and merging you can check the changes that have been made on origin/master and see where there might be conflicts with your own changes.
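
One way to do that checking between the two stages is sketched below, again against a throwaway local stand-in for the main repository. The file names and commit messages are hypothetical; the log and diff commands are the part worth remembering.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in upstream with one commit (hypothetical content).
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo "physics v1" > "$work/upstream/physics.txt"
git -C "$work/upstream" add physics.txt
git -C "$work/upstream" commit -q -m "Add physics"

# Your clone, with one local commit on your own branch.
git clone -q "$work/upstream" "$work/vdrift"
cd "$work/vdrift"
git branch -M master my-fancy-gui
echo "button" > gui.txt
git add gui.txt
git commit -q -m "Add GUI button stub"

# Meanwhile, upstream moves on.
echo "physics v2" > "$work/upstream/physics.txt"
git -C "$work/upstream" commit -q -a -m "Tune physics"

# Stage one: fetch only. Your branch and working tree are untouched.
git fetch -q origin

# Inspect what would come in: commits on origin/master that you lack...
git log --oneline HEAD..origin/master
# ...and the files they touch, measured from the common ancestor.
git diff --stat HEAD...origin/master

# Stage two: merge once you're happy with what you saw.
git merge -q --no-edit origin/master
```

Note the two dots for log and three dots for diff: the first lists commits you don't have yet, the second shows only upstream's side of the changes since the two histories diverged.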

In summary, if you ever intend to make your tree public instead of sending a Big Patch That Does Everything (And Is Hard To Review) upstream, don't ever rebase your tree.

Making Your Code Public

If you broke your branch with git rebase, the correct thing to do here is to generate a patch with:
$ git diff origin/master
and get it to the developers somehow. Probably the issue tracker. If you're making one or two simple changes this probably isn't so bad anyway, and you're now done with this section.
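
If you go the patch route, it's worth checking that the file applies before you send it. This is a minimal sketch under the same assumptions as before (a local stand-in for the main repository, invented file names); git apply --check only tests the patch and changes nothing.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in upstream (hypothetical content).
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo "menu" > "$work/upstream/gui.txt"
git -C "$work/upstream" add gui.txt
git -C "$work/upstream" commit -q -m "Add menu"

# Your clone with a local change.
git clone -q "$work/upstream" "$work/vdrift"
cd "$work/vdrift"
echo "fancy button" >> gui.txt
git commit -q -a -m "Add fancy button"

# Everything that differs from upstream's master, as one patch file.
git diff origin/master > "$work/fancy-gui.patch"

# Sanity check: does it apply cleanly to an untouched upstream tree?
git -C "$work/upstream" apply --check "$work/fancy-gui.patch"
```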

Now if you have a big list of changes or just plain preferred the idea of publishing your changes as you made them, and you never rebased, you can make a remote and push to it. As VDrift is (or will be) using GitHub, we can make an account there, and fork the VDrift repository using their handy dandy interface. This is very simple, as having signed up you need simply to navigate to https://github.com/VDrift/vdrift and press the fork button.

NOTE: You're going to need an SSH keypair. If you haven't generated one already and don't know how, GitHub has a guide you should follow (The last part of this also applies if you merely haven't attached it to your GitHub account yet).

Once you have an SSH keypair and a forked repository, navigate to that repository (in my case, because my username is fjwhittle and my repository name is vdrift, I'd go to https://github.com/fjwhittle/vdrift) and copy the SSH URL (HTTP should work as well, and you won't need a keypair for it, but I've never tried to use it). You need to add this as a remote to your local repository. In my case, I'd issue:
$ git remote add github git@github.com:fjwhittle/vdrift.git
for a remote named github (the actual name used here is relatively immaterial). Use the URL you copied in place of the git@github.com:fjwhittle/vdrift.git part. Now if you run:
$ git remote show remote-name
You should get output a bit like:

* remote remote-name
  Fetch URL: Your-SSH-URL
  Push  URL: Your-SSH-URL
  HEAD branch: misc
  Remote branches:
    master          tracked
    staging         tracked
    stable          tracked

(There may also be something about "Local ref configured for 'git push'" but don't worry about it just now).

Now for the fun part (ie. pushing your changes to github). First, make sure your tree is up to date with origin/master as in the above section. As a rule on small to medium sized projects, there's probably something upstream that will conflict with the changes you've made, so do the best you can to resolve these before pushing any changes. Then, you push your changes to your GitHub repository with:
$ git push remote-name branch-name:branch-name
Substituting your remote name and branch name appropriately. If you've done anything bad like rebasing, git will refuse this step. You can force the issue, but 99% of the time you really shouldn't.
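
To see both the normal push and the refusal without a GitHub account, here is a sketch that uses a local bare repository as a stand-in for your fork. The paths, file names and commit messages are assumptions for the example, and history is rewritten with commit --amend rather than rebase purely to keep it short; as far as the remote is concerned the effect is the same.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# A bare repository plays the part of your GitHub fork (hypothetical path).
git init -q --bare "$work/fork.git"

# Your working repository with one commit on your branch.
git init -q "$work/vdrift"
cd "$work/vdrift"
git symbolic-ref HEAD refs/heads/my-fancy-gui
echo "button" > gui.txt
git add gui.txt
git commit -q -m "Add GUI button stub"

# Add the remote and publish the branch.
git remote add github "$work/fork.git"
git push -q github my-fancy-gui:my-fancy-gui

# Rewrite a commit that has already been published...
git commit -q --amend -m "Add GUI button stub, reworded"

# ...and the same push is now refused, because it is not a fast-forward.
if git push -q github my-fancy-gui:my-fancy-gui 2>/dev/null; then
    echo "push accepted"
else
    echo "push refused: not a fast-forward"
fi
```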

You're now ready to make a pull request. Again, hit the button on GitHub, and follow the instructions. This is really where GitHub shines, because this request will open an issue on the upstream project's tracker where your pull request can be reviewed and discussed before integration happens. I won't go into this feature in depth, but if you want to check it out, more info is available on GitHub's relevant help page.

That's it! Feel free to ask questions if you need anything clarified.

Tuesday, December 21

Hypothetical DVCS (git) workflow for VDrift - Part I Corollary

So, you don't want a staging branch after all?

Part one dealt with a tree that has a separate major branch strictly for bugfixes during a release cycle, but your project has only the stable and development branches. This represents a fundamental difference in workflows, but what does it mean?

In the first instance, staging more or less allowed development to continue unhindered during the entire release cycle. You don't really have any other options than this in centralised version control systems, but in DVCS everyone can have a repository that they individually commit to and can publish, meaning that the repository that mainline releases are – well, released – from is free to be dedicated to that purpose.

Differences in master

In the three branch model, closing the merge window for a release involved performing an actual merge from master into staging. In the two branch model, the close is more a change of state for one branch: essentially, the difference between open and closed master is in what is allowed to be pushed to it. While the merge window is open, this includes more or less anything that will go into the next release. After it closes, this should be limited to code changes that fix bugs only.

This does not mean that no one can work on new features in this time. On the contrary, it's as good a time as any to be working on the following release so your feature can be merged early on. Everyone is still capable of sharing code with each other, and extra branches in the main repository for specific feature sets for an as-yet-undefined release are possible. In effect, master has become staging.

Differences in stable

One of the benefits (probably the main benefit) you lose in this model is the clarity of merging bug fix commits from staging in sets. After release, bug fixes for that release have to be applied directly to stable, and if a different solution must be applied to master then reverts have to be made directly to stable also, although hopefully this is a rare occurrence. A side effect of this is that fixes that cause a problem elsewhere become more devastating, but that's not a problem because you wouldn't apply a fix without checking it first now, would you? ;)

That's more or less all for now. See you soon for part two.

Thursday, December 16

Hypothetical DVCS (git) workflow for VDrift - Part I, Main Tree.

First post in a while! It's a tech article, too. Moving on....

There's been some discussion on the VDrift forums recently of moving to a distributed version control system, most likely 'git', and speculation that the project should have a concrete workflow to best utilise this technology. Here's my theory on how the project (and others like it) might best work based on the constraints given so far.

Part One deals with how the main development tree could be handled.

The project leaders have expressed that they wish to have three major branches: A "current release" branch, a development branch, and a bugfix/staging branch for the next release — for a DVCS I personally do not see the wisdom in differentiating between development and staging at this level, however this path has the support of consensus. For the purposes of this article, I will refer to these branches as stable, master and staging respectively. The reasons for this are largely related to what an experienced git using developer would expect to see, especially in the case of making master the development branch.

The master Branch

The purpose of master is to provide a centralised source with which developers can synchronise their local repositories and ensure their code applies to the most recent changes. This is the trunk of the project, if you prefer that terminology. The version of the source in this branch is, ultimately, where new features go to be tested by the wider community.

Any changes which land in this branch (normally via a pull request) should first be reviewed by at least one peer for validity and style before being added. Given a pull request like the following (Ignoring GitHub features for the moment):

The following changes since commit d2662ab2f0c70994c7bcef903b08fc5245e5e4a9:

SOME AWESOME SUPER DOOPER CHANGES

are available in the git repository at:

git://git.example.com/~uberman/vdrift.git asdooper

A maintainer could, after making sure their copy of master is up to date, add uberman's tree as a remote with:
$ git remote add -t asdooper uberman git://git.example.com/~uberman/vdrift.git
Then retrieve a copy of the branch:
$ git fetch uberman
$ git checkout uberman/asdooper
Which will land them in a "Detached HEAD" state in which they can review the code. Ideally the code is correctly styled, there are no obvious bugs and the last action that "uberman" performed was to merge the most recent version of master, resolving merge conflicts as he did so. In this case, if the reviewer is happy with "SOME AWESOME SUPER DOOPER CHANGES" (which they probably shouldn't be if that commit message is in there), they can simply switch to the master branch:
$ git checkout master
Then perform a merge:
$ git merge uberman/asdooper
Where uberman's diligent merge conflict resolution will prevent the reviewer from having to do any further surgery before pushing it to the main repository:
$ git push origin master
If there is some major problem with uberman's request, such as a failure to compile, changes being against too old a version of master, or unreadable code, the preferred option should be to refer the matter back to uberman, asking him to address the problem before requesting another pull. However, simple corrections such as to minor formatting issues could be performed by checking out to a new local branch (it may be preferable to some to perform this step anyway):
$ git checkout -b asdooper uberman/asdooper
Then performing the required changes before merging the newly created branch into master; I recommend this to maintain clarity as to why the changes had to be made, though it is technically just as valid to perform the changes between merging to master and pushing the changes.
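
Put together, the review-on-a-local-branch variant might look like the sketch below. Everything is local: a bare repository stands in for the main repository and a clone stands in for uberman's published tree, and the file names, the correction and the commit messages are invented for illustration.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Stand-in for the main repository, plus the maintainer's working copy.
git init -q --bare "$work/main.git"
git -C "$work/main.git" symbolic-ref HEAD refs/heads/master
git init -q "$work/maintainer"
cd "$work/maintainer"
git symbolic-ref HEAD refs/heads/master
echo "core" > core.txt
git add core.txt
git commit -q -m "Add core"
git remote add origin "$work/main.git"
git push -q origin master

# Stand-in for uberman's tree, with his work on the asdooper branch.
git clone -q "$work/main.git" "$work/uberman"
git -C "$work/uberman" checkout -q -b asdooper
echo "dooper" > "$work/uberman/dooper.txt"
git -C "$work/uberman" add dooper.txt
git -C "$work/uberman" commit -q -m "Add dooper feature"

# Maintainer: fetch only the requested branch.
git remote add -t asdooper uberman "$work/uberman"
git fetch -q uberman

# Review on a local branch, so a small correction gets its own commit.
git checkout -q -b asdooper uberman/asdooper
echo "dooper (reformatted)" > dooper.txt
git commit -q -a -m "Fix formatting in dooper.txt"

# Merge the reviewed branch and publish it.
git checkout -q master
git merge -q --no-edit asdooper
git push -q origin master
git log --oneline
```

Because the correction is its own commit on top of uberman's, the log shows exactly what the maintainer changed and why.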

The staging Branch

This branch, while not strictly necessary in a DVCS model, facilitates avoiding code freeze during the release process. If you imagine a release cycle where new features are added at the beginning of the cycle, or while the "merge window" is open, a virtual feature freeze is put into effect merely by merging master into staging:
$ git checkout staging
$ git merge master
At this point, development can continue in master, but only bugfixes should be applied to staging (and the majority of those should also be applied to master). Importantly in a given release cycle this branch should work with precisely one revision of the data repository — in fact it should be possible to provide a data snapshot to be used for the release at the point of closing the merge window.

Periodically during this phase of the release cycle, candidates for release may be tagged, for example to tag the first release candidate for a hypothetical January 2011 release (this is not a formal recommendation of versioning scheme):
$ git tag 11.1-rc1
It is one of these tagged release candidates which should eventually be merged into stable.
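
The first half of a release cycle can be sketched in a scratch repository as follows. It assumes staging starts out at the same commit as master, and the file contents and commit messages are made up.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo "feature A" > features.txt
git add features.txt
git commit -q -m "Add feature A"
git branch staging            # assumption: staging starts level with master

# The merge window is open: features land on master.
echo "feature B" >> features.txt
git commit -q -a -m "Add feature B"

# Closing the merge window is just a merge of master into staging.
git checkout -q staging
git merge -q --no-edit master

# From here on only bugfixes go to staging; tag candidates as you go.
echo "feature B, fixed" >> features.txt
git commit -q -a -m "Fix feature B"
git tag 11.1-rc1
git tag -l '11.1*'
```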

The second role of staging is to hold post-release bug fixes before applying them to stable. Any bug fixes that are applied during the feature freeze phase of the release cycle should thus be retained in staging until the merge window for the next release closes. It is important to note, however, that the commit for any bugfix for which a different solution was applied to master for any reason should be removed from staging with git revert before master is merged into staging for the next release — this reduces the number of dirty merges that are applied to staging and stable.
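
Here is a small sketch of that revert-then-merge step, with a hypothetical config.txt and invented fixes. The staging-only workaround is reverted first, so the merge from master brings in master's solution without the two fixes fighting each other.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
printf 'speed=10\n' > config.txt
git add config.txt
git commit -q -m "Add config"
git branch staging

# staging gets a quick workaround during the freeze...
git checkout -q staging
printf 'speed=9\n' > config.txt
git commit -q -a -m "Work around speed bug"
fix=$(git rev-parse HEAD)

# ...while master solves the same bug differently.
git checkout -q master
printf 'speed=10\nlimiter=on\n' > config.txt
git commit -q -a -m "Fix speed bug with a limiter"

# Before the next merge window closes: undo the staging-only fix, then merge.
git checkout -q staging
git revert --no-edit "$fix" >/dev/null
git merge -q --no-edit master
cat config.txt
```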

The stable Branch

There are exactly two types of update to this branch that should occur, and both are performed by merging. Using the hypothetical January 2011 release as above, the "final" release could be derived from 11.1-rc3 by first applying the version tag:
$ git tag 11.1 11.1-rc3
Merging it to the stable branch:
$ git checkout stable
$ git merge 11.1
Post-release bugfixes should be applied using a similar method, with a tag derived from the "final" release. For example, for the first set of bugfixes first tag the contents of staging:
$ git tag 11.1.1 staging
Then merge into stable:
$ git checkout stable
$ git merge 11.1.1
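
Both kinds of update to stable can be tried end to end with the sketch below. The repository, file and fixes are hypothetical; the tag names follow the January 2011 example used above.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo "release content" > game.txt
git add game.txt
git commit -q -m "Initial content"
git branch stable

# staging carries the release candidates.
git checkout -q -b staging
echo "rc fix" >> game.txt
git commit -q -a -m "Fix found during release candidates"
git tag 11.1-rc3

# Update type one: the final release, derived from a candidate tag.
git tag 11.1 11.1-rc3
git checkout -q stable
git merge -q 11.1

# Update type two: a set of post-release bugfixes collected on staging.
git checkout -q staging
echo "post-release fix" >> game.txt
git commit -q -a -m "Fix found after release"
git tag 11.1.1 staging
git checkout -q stable
git merge -q 11.1.1
git log --oneline --decorate
```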

That's pretty much it for maintaining the main tree. The general guidelines are that almost every change to master (the development branch) should be a merge or a minor correction to merged changes, staging should have master merged into it once every release cycle and further changes should be cherry-picked or merged from dedicated bugfixing branches, and every change to stable must be performed through a merge. NO REBASING EVER!!!

Friday, September 26

Python

Well, I've had a couple of goes at python now. Once with someone else's project and once writing a small application from scratch (well, bridging an input device to an application, anyway). I have to say I'm not really impressed. It's moderately faster to write than C, but lacks the definition. Runtime does seem to be significantly faster than perl when using external libraries, and I assume that has more to do with how much less overhead there seems to be with symbol lookups and the like. On the other hand writing a package/module/class/whatever you want to call it in perl feels like much less of a hassle. I can see where the whitespace scoping is a good idea (make those lazy programmers indent their code), but I miss my braces. I also miss switch and static.

Saturday, July 19

HD and HD! Yay!

Yay! I got High Distinctions for both Uni subjects (Introduction to Information Technology – D'uh; and Discrete Maths – a little more surprising) last session! (marks finally came through after nearly a month of waiting) So, uh, since I last posted, ironically I drove my other car into a pole, destroying the driver's side door and seriously damaging the sill on the same side, and someone shot a red light and killed my mum's car while I was driving it. I've been to Brisbane and back, which wasn't terribly interesting, and just recently I got my full license, which means no more plates! Yay! Also eight more points and an alcohol limit of 0.05% blood concentration, but that matters to me not (I have never even had a parking ticket, and don't drive when I've been drinking).