  • Setting up svnsync-ed (mirrored) SVN repositories on Ubuntu (part 2 of 2)

    Note: if you haven't already, you may want to read part 1 of this article first.

    Phew, and that was all the work you needed to do to migrate your Subversion repository to another server. We've barely touched that other server that we wanted to use for mirroring the repository!

    "Bootstrapping" your mirror SVN server for svnsync

    This mirror SVN server also needs Subversion 1.4.x installed, so go ahead and do the (almost) same things we've done in part 1 to get Subversion installed. You should be able to use the .deb package generated by checkinstall on your main Subversion server to install Subversion 1.4.x on the mirror SVN server. Just scp it to the mirror SVN server and install it (dpkg -i subversion-1.4.3.deb).

    With Subversion installed, create a new Subversion repository (using svnadmin create, remember?), but don't load the repository dump like we did on the main SVN server in part 1 (of course, since we are going to be mirroring the repository!).

    Setting up svnsync to mirror your repository

    First, create an SVN user for svnsync to use - let's call this the 'svnsync user'. The easiest (and best) way to do this is to edit the svnserve.conf and passwd files in the repository's conf/ directory:

    conf/svnserve.conf

    
    # Uncomment this line.
    password-db = passwd
    

    conf/passwd

    
    svnsync = secret
    

    This gives read and write access to the 'svnsync user'. The svnsync program will authenticate with our repositories as this user via the svn:// protocol (i.e. via svnserve).
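    For reference, here is a minimal sketch of what the two files could look like in full. The [general] and [users] section headers are the ones svnserve looks for; the auth-access line is an assumption on my part (write is already the default), and the temporary directory only stands in for your real repository path.

```shell
#!/bin/sh
# Stand-in for the repository directory (assumption: replace with your own,
# e.g. /var/svn/repositories/destination_repos).
REPO="$(mktemp -d)"
mkdir -p "$REPO/conf"

# svnserve.conf: point svnserve at the passwd file in the same directory.
cat > "$REPO/conf/svnserve.conf" <<'EOF'
[general]
auth-access = write
password-db = passwd
EOF

# passwd: one "user = password" line per user, under [users].
cat > "$REPO/conf/passwd" <<'EOF'
[users]
svnsync = secret
EOF

echo "wrote $REPO/conf/svnserve.conf and $REPO/conf/passwd"
```

    Pick a better password than 'secret', obviously, since this file is stored in plain text.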

    Next, we need to create a pre-revprop-change hook for the destination repository. The svnsync documentation has a detailed explanation. Create a hooks/pre-revprop-change file under your destination repository's directory.

    
    #!/bin/sh
    USER="$3"
    
    if [ "$USER" = "svnsync" ]; then exit 0; fi
    
    echo "Only the svnsync user can change revprops" >&2
    exit 1
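    If you want to convince yourself the hook behaves before trusting it, you can call it by hand. Subversion passes the hook the repository path, revision, user, property name and action, in that order, so a rough check might look like this (the temporary file and the user 'alice' are just stand-ins for illustration):

```shell
#!/bin/sh
# Write the same hook to a scratch file so it can be exercised directly.
HOOK="$(mktemp)"
cat > "$HOOK" <<'EOF'
#!/bin/sh
USER="$3"

if [ "$USER" = "svnsync" ]; then exit 0; fi

echo "Only the svnsync user can change revprops" >&2
exit 1
EOF
chmod +x "$HOOK"

# Arguments: REPOS-PATH REVISION USER PROPNAME ACTION
if "$HOOK" /var/svn/repositories/destination_repos 0 svnsync svn:sync-lock A; then
  echo "svnsync: allowed"
fi
if ! "$HOOK" /var/svn/repositories/destination_repos 0 alice svn:log M 2>/dev/null; then
  echo "alice: rejected"
fi
```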

    Make it executable, and then initialize the sync:

    
    chmod +x hooks/pre-revprop-change
    svnsync init --username svnsync file:///var/svn/repositories/destination_repos svn://source.host/source_repos

    Don't worry, this only sets up the sync - there's no actual data copying yet. Syncing your repository data may take a long time if you have a big source repository, so I suggest using nohup to run the sync overnight (or something), or at least saving the output in a log. Either way, the command to start the sync is (note that it takes the destination repository URL):

    svnsync sync --username svnsync file:///var/svn/repositories/destination_repos
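    Here's a rough sketch of the nohup-plus-log idea. The helper name run_logged and the log path in the usage comment are my own inventions, not anything svnsync provides:

```shell
#!/bin/sh
# Start a command in the background, immune to hangups, appending both
# stdout and stderr to a log file. The child's PID is left in LAST_PID.
run_logged() {
  log="$1"; shift
  nohup "$@" >> "$log" 2>&1 &
  LAST_PID=$!
}

# Intended use (log path is an assumption):
# run_logged /var/log/svnsync-initial.log \
#   svnsync sync --username svnsync file:///var/svn/repositories/destination_repos
# tail -f /var/log/svnsync-initial.log
```

    You can then log out and check the log file in the morning.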

    You should start seeing svnsync committing in changes from your source repository. Instant gratification (well, almost)! Should your svnsync process get aborted or killed, you can remove the hanging lock by running:

    svn propdel svn:sync-lock --revprop -r 0 --username svnsync file:///var/svn/repositories/destination_repos

    Setting up 'on-the-fly' syncing

    So now you have your source and destination repositories synced, but what happens when you start committing changes to your source repository? Nothing! That's because svnsync is merely a passive syncing tool (meaning you have to run it to sync, instead of it knowing when to sync automatically).

    There are two ways you can set up 'real-time' syncing:

    1. Use cron (or a similar scheduler) on the destination repository server. Add something like this to your crontab:

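      * * * * * /usr/local/bin/svnsync --non-interactive sync --username svnsync file:///var/svn/repositories/destination_repos

      This basically runs svnsync on your destination repository server every minute to pull down any changes from your source repository. Note that svnsync sync takes the destination repository URL; it already knows the source from the earlier svnsync init.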

    2. Add a post-commit hook to the source repository. I found this svnsync entry by Paul Querna that has a sample post-commit hook. If I recall correctly I tried it but it didn't work for me, so I settled on using cron to sync up my repositories.
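    One thing to watch with the cron approach: if a sync takes longer than a minute, the next cron run starts while the previous one is still going. svnsync does take its own lock on the destination, but if you'd rather have the later run simply skip, a small wrapper along these lines might do. This is only a sketch; the function name and the lock path are assumptions of mine.

```shell
#!/bin/sh
# Run a command only if no earlier run still holds the lock directory.
# mkdir either creates the directory or fails, so two overlapping runs
# cannot both acquire the lock.
run_exclusive() {
  lockdir="$1"; shift
  if mkdir "$lockdir" 2>/dev/null; then
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
  fi
  echo "previous run still active, skipping" >&2
  return 0
}

# Intended use from a script called by cron (paths are assumptions):
# run_exclusive /tmp/svnsync.lock \
#   /usr/local/bin/svnsync --non-interactive sync --username svnsync \
#   file:///var/svn/repositories/destination_repos
```

    If the wrapper itself gets killed mid-run, the lock directory is left behind and has to be removed by hand.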

    Things that I skipped

    There're some things that I skipped over while writing this, mainly to do with SVN authentication.

    • If you're accessing your repository via the svn+ssh:// protocol, you have to manage the (group) permissions of the repository files in the filesystem appropriately (basically the repository should be group-writable by your users). chmod and chown are your friends, as is NIS (or something similar) to manage your users. I use these steps to create a new SVN repository that is accessed via the svn+ssh:// protocol:

      
      sudo mkdir /var/svn/repositories/funky_project
      mkdir /tmp/funky_project
      mkdir /tmp/funky_project/trunk
      mkdir /tmp/funky_project/branches
      mkdir /tmp/funky_project/tags
      sudo svnadmin create /var/svn/repositories/funky_project
      sudo svn import /tmp/funky_project file:///var/svn/repositories/funky_project -m "Initial import."
      rm -rf /tmp/funky_project
      sudo chown -R www-data:www-data /var/svn/repositories/funky_project
      sudo chmod -R g+w /var/svn/repositories/funky_project
          

      As you can see, my SVN users are part of the www-data group, and the repository directory is made group-writable.

    • The svn:// protocol has authentication configuration files in the conf/ directory of your repository. The SVN book has a section explaining how to configure authentication for svnserve.
    • Apache httpd can be used to expose your SVN repositories via the WebDAV protocol. This allows for the very commonly seen http:// repository URLs (especially for Open Source projects). Configuration is a little more involved and you would probably have to install Apache from source as well. The SVN book has the details.

    Wrapping up

    I hope someone found this entry useful - I know I could have used one when I was setting up Subversion and svnsync.

  • Testing rescue_action_in_public with RSpec

    After overriding rescue_action_in_public in the ApplicationController to deal with ActiveRecord::RecordNotFound exceptions (a very common exception to rescue in the canonical 'show' actions of your controllers), I decided to test it. I've been getting used to BDD with RSpec (and the Spec::Rails plugin), so I stumbled a bit when writing the spec.

    I finally settled on this:

    
    class DummyController < ApplicationController
      def index
      end
    end
    
    context 'A child class of ApplicationController' do
      controller_name :dummy
    
      specify 'should render a 404 error for ActiveRecord::RecordNotFound,
        ActionController::UnknownController,
        ActionController::UnknownAction,
        ActionController::RoutingError exceptions (in public)' do
    
        exceptions_404 = [
          ActionController::RoutingError.new('test'),
          ActiveRecord::RecordNotFound.new,
          ActionController::UnknownController.new,
          ActionController::UnknownAction.new]
    
        exceptions_404.each do |exception|
          controller.eigenclass.send(:define_method, :index) do
            raise exception.class, 'some message'
          end
    
          lambda {
            get 'index'
          }.should_raise(exception.class)
          controller.send :rescue_action_in_public, exception
    
          response.should be_missing
        end
      end
    end

    Notice the use of a dummy controller so that we can actually make a request to it (and get all the Rails magic and environment set up ready for testing). Also, I had to use instances of the exceptions rather than their classes because rescue_action_in_public expects an exception instance, and the exceptions can't all be instantiated the same way (for example, ActionController::RoutingError has a constructor which requires at least 1 argument). So I create the exceptions first.

    The eigenclass method simply returns Ruby's canonical singleton class or metaclass, depending on who you talk to (i.e. class << self; self; end;) and I modify the dummy 'index' action to raise the exception. And here's the stinky part:

    
    lambda {
      get 'index'
    }.should_raise(exception.class)
    controller.send :rescue_action_in_public, exception

    Make a GET to the 'index' action, make sure it raises the exception and catch it (with should_raise - the assertion is unnecessary since I did override 'index' to raise the exception), and then force rescue_action_in_public to be called. Something's fishy here - why isn't the exception caught by default by rescue_action_in_public? I've set these to make sure that rescue_action_in_public is called but it seems like it never is called:

    
    ActionController::Base.consider_all_requests_local = false
    controller.eigenclass.send(:define_method, :local_request?) do
      false
    end

    I traced the code into ActionController::Rescue and everything seems to be in order. I'm stumped and weary, I think I'll look at this again tomorrow. Anyone see any obvious mistakes?

  • Setting up svnsync-ed (mirrored) SVN repositories on Ubuntu (part 1 of 2)

    This is a 2-part journal on migrating and upgrading a Subversion repository, and then using svnsync to mirror the newly created repository. (Part 2)

    Initial setup

    Ever since Subversion 1.4 was released, I'd been eyeing the new svnsync tool because we had a single repository that was not, erm, really backed up (we had daily server backups and occasional manual repository dumps but that was it). svnsync promised to make repository mirroring simple, and after doing some repository migration and upgrading, I can assure you it really does make things easier than any other (more manual) repository backup solution I had seen before. This is a walkthrough of how you can upgrade your pre-1.4 SVN repositories to 1.4.x, and set up svnsync to mirror your repositories. It's going to be very biased towards Ubuntu but I'm sure you can translate any Ubuntu-specific steps to your favorite distro.

    Here's our initial setup:

    • A pre-1.4 (it was version 1.2.3) SVN repository that needed to be upgraded and migrated to another server.
    • 2 cleanly installed Ubuntu 6.06 LTS VPSs, one of which is the intended target for the repository migration. The other would mirror the new 1.4.x repository (using svnsync).

    An un-installable Subversion 1.4.x?

    I wish I could have simply run sudo apt-get install subversion and had Ubuntu pull down the latest 1.4.x .debs. Unfortunately, the version of Subversion in the Ubuntu apt-get repository is still 1.3.1 (which doesn't have svnsync). If anyone knows a reliable way to install Subversion 1.4.x via apt-get, let me know! I looked around for a good edge sources.list but came back empty-handed.

    I balk at installing stuff from source because I never did figure out how to easily clean out the stuff that gets installed. All thanks to this reluctance, I went digging around and found checkinstall. This thing is awesome - I wonder why I didn't manage to find it earlier.

    What checkinstall basically allows you to do is, instead of running the usual make install after the usual configure and make steps, it creates a Debian package (it also does RPMs and Slackware packages) for you that is easily un-installable with dpkg, and then proceeds to install the files just as it would have for any other deb.

    On Ubuntu it's really easy to install checkinstall, just:

    sudo apt-get install checkinstall

    Now, you no longer should type 'make install' - always use the 'checkinstall' command instead:

    
    ./configure
    make
    sudo checkinstall # instead of "make install"

    checkinstall will ask you a bunch of stuff but you can just go with the defaults for most of them - I did name my packages 'XXX from source', like 'Subversion 1.4.3 from source' so it's easier to check which packages are checkinstall-generated with a simple grep to dpkg -l.
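    The "simple grep" I have in mind is nothing fancy. As a sketch (the helper name is made up, and it assumes you put 'from source' in the package description like I did):

```shell
#!/bin/sh
# Filter a `dpkg -l` listing (read from stdin) down to the packages whose
# description mentions "from source", i.e. the checkinstall-generated ones.
checkinstalled() {
  grep -i 'from source'
}

# Intended use:
# dpkg -l | checkinstalled
```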

    checkinstall generates a .deb (Debian) package before it actually installs your software (Subversion, in this case). It should tell you right at the end of its installation process about where to find this .deb and how to uninstall your newly installed (from source!) package (something like dpkg -r subversion-1.4.3). Don't delete this .deb yet as we will be using it to install Subversion 1.4.x on our mirror SVN server.

    Installing an un-installable Subversion 1.4.x on Ubuntu

    Now, blessed with our new checkinstall-granted powers, we can install Subversion from source without any qualms. Before we start, be sure to purge any existing Subversion packages you may have installed (do note that if you're using any packages that depend on the official Ubuntu Subversion packages, you may run into library version problems).

    
    dpkg -l | grep svn
    dpkg -l | grep subversion
    
    sudo dpkg --purge subversion
    sudo dpkg --purge libsvn0

    Now, it's time to get the source. Get it from the official Subversion website. Look for the source code download - the file should be something like this: http://subversion.tigris.org/downloads/subversion-1.4.3.tar.gz. Remember to get the SVN dependencies as well (something like this: http://subversion.tigris.org/downloads/subversion-deps-1.4.3.tar.gz). You will need them (more specifically, the Neon library) if you want to use Subversion to connect to a server via a http:// or https:// URL.

    Use something efficient like wget, curl or Axel (love Axel) to get the sources on the server where you want to install Subversion. Unpack them to the same directory. configure. make. checkinstall.

    
    tar zxf subversion-1.4.3.tar.gz
    tar zxf subversion-deps-1.4.3.tar.gz 
    cd subversion-1.4.3
    ./configure  # Be sure to read the INSTALL file for any options you may want to set (such as SSL)
    make
    sudo checkinstall

    If you get a warning "configure: WARNING: we have configured without BDB filesystem support" during your configure step, you'll get by just fine. Unless you specifically want your Subversion repositories in Berkeley DB format, we can ignore the warning (Subversion will use FSFS filesystem for your repositories) - see FSFS notes and Choosing a Data Store if you want to make an educated decision.

    Anyway, now with a brand new Subversion 1.4.x installed, we are finally ready for the real work - migrating your Subversion repository!

    Dumping and importing a repository

    Dumping a Subversion repository is dead easy:

    
    svnadmin dump /path/to/repository > repository_name.dump

    Depending on how big your repository is, you could end up with a pretty large dump file. gzip it, then scp it over to your new server, then gunzip it. Use svnadmin to load the repository dump.

    
    cd /var/svn  # I like to keep my svn repositories under /var/svn
    mkdir repository_name
    svnadmin create repository_name
    svnadmin load repository_name < /path/to/repository_name.dump
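    Since the dump file gets gzipped, copied and gunzipped along the way, it doesn't hurt to compare checksums on both ends before loading. A small sketch (the helper name is my own; cksum and gunzip are the standard tools):

```shell
#!/bin/sh
# Print a checksum of the *uncompressed* dump content, whether the file
# given is the plain dump or its gzipped copy. Both branches feed cksum
# via stdin so the output carries no file name and can be compared as-is.
dump_checksum() {
  case "$1" in
    *.gz) gunzip -c "$1" | cksum ;;
    *)    cksum < "$1" ;;
  esac
}

# Intended use:
#   old server:  dump_checksum repository_name.dump
#                gzip repository_name.dump
#                scp repository_name.dump.gz new.server:/path/to/
#   new server:  dump_checksum /path/to/repository_name.dump.gz
# The two checksum lines should be identical.
```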

    If you have a good pipe between the source and destination servers, you can do this in a one-liner:

    svnadmin dump /path/to/repository | ssh -C [IP/domain of destination server] svnadmin load /path/to/new_repository

    Of course, all this dumping requires temporarily suspending writes to the repository, otherwise any commits made after the dump has started won't make it to the new server - just send out an email to your fellow developers and disable svn access.

    Setting up access to your new repository

    Now, you have a Subversion repository that is only accessible via the local filesystem (file:// 'protocol'), which isn't very useful. We'll need to set up remote access. Your Subversion repository can be accessed in a variety of ways, including:

    • svnserve standalone daemon (svn://)
    • svnserve with inetd (svn://)
    • svnserve over a SSH tunnel (svn+ssh://)
    • over the HTTP protocol (http:// and https://)

    The svnserve documentation details how to deal with the first 3, and setting up http:// and https:// access to your repository is really a subject that deserves its own tutorial. Try the SVN book or Google.

    Personally I prefer svn+ssh:// access for internal projects since it allows me to unify authentication for my Subversion repositories with UNIX user accounts. Be wary of an angry cadre of Windows developers though, since they need to take quite a good number of steps to set up public key authentication and integrate it with their svn clients on Windows machines. Integration with TortoiseSVN is quite a pain, though my Windows-using colleague at work found these useful: Putty and TortoiseSVN, Using Cygwin, Keychain, SVN+SSH and TortoiseSVN in Windows.

    svn:// access

    I also expose my repositories via svn:// (as we'll see later, this is useful for allowing access to a svnsync user without messing around with any UNIX user accounts) and use the xinetd daemon (apt-get install xinetd on Ubuntu to install) to launch the svnserve process. If you're taking this path, create a file (I name it 'svn') in /etc/xinetd.d to tell xinetd about svnserve.

    In /etc/xinetd.d/svn:

    
    service svn
    {
            port                    = 3690
            socket_type             = stream
            protocol                = tcp
            wait                    = no
            user                    = www-data
            server                  = /usr/local/bin/svnserve
            server_args             = -i -r /var/svn
    }

    Notice that I needed to use the full path to svnserve (do a which svnserve to get the full path, making sure this is the 1.4.x version that you just installed). The server_args parameter also bears some explanation. The -i option tells svnserve to use inetd (xinetd is a variant of inetd, sorta). The -r /var/svn option tells svnserve to only expose repositories below that path. This basically translates your repository at /var/svn/my_cool_project to be accessible via svn://your.hostname/my_cool_project.

    svn+ssh:// access

    Accessing your repository this way basically logs in to the host server of your repository over SSH, invokes the svnserve process, and accesses your repository in a very file://-like manner. What this means is that your repository path is taken from the root of your filesystem. An example: a repository located in /var/svn/my_cool_project would be available at svn+ssh://your.hostname/var/svn/my_cool_project. For this reason I often symlink /svn to /var/svn (to get repository URLs like svn+ssh://your.hostname/svn/my_cool_project instead).
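    To make the difference between the two URL styles concrete, here's a tiny sketch that prints the URL for a given repository directory under each access method. The function names are made up for illustration; the paths and hostname are the ones used in the examples above.

```shell
#!/bin/sh
# svn:// URLs are relative to the root given to svnserve with -r.
svn_url() {            # args: svnserve root, repository path, host
  printf 'svn://%s/%s\n' "$3" "${2#"$1"/}"
}

# svn+ssh:// URLs carry the full filesystem path of the repository.
svn_ssh_url() {        # args: repository path, host
  printf 'svn+ssh://%s%s\n' "$2" "$1"
}

svn_url /var/svn /var/svn/my_cool_project your.hostname
# prints svn://your.hostname/my_cool_project
svn_ssh_url /var/svn/my_cool_project your.hostname
# prints svn+ssh://your.hostname/var/svn/my_cool_project
```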

    Relocating working copies

    Now, all your working copies are still pointing to the old Subversion server - no need to fret, a simple svn switch fixes things:

    svn switch --relocate [from] [to]

    Replace '[from]' and '[to]' with the source and destination Subversion repository URLs.

    Remember to stop access to your old server so no one is making commits to the wrong place.

    Setting up svnsync

    I'd intended to write this entire piece in one blog post, but I'm running out of steam at this point. In Part 2, we'll actually set up svnsync for some repository mirroring goodness!

  • Checking for duplicate ActiveRecord objects

    I've been writing a database importer plugin for a Rails application that needs to import data from some "legacy" production databases (well, not really legacy, but the schema differs from ActiveRecord conventions) with the intention of scheduling a cron job to run the imports. Why not connect the Rails app to the legacy databases? Hmm, let's see:

    • the records don't have to be up to date (so I can afford to, say, import yesterday's records today),
    • less jumping through hoops molding ActiveRecord models to the legacy databases,
    • the production database schema is liable to change - but this should not affect my Rails application,
    • there will be lower loads on the legacy databases which are in full-blown production use, and
    • most importantly, it gives me an excuse to figure out writing a data importer for a Rails application.

    And I am surprised that it actually was rather fun writing the importer plugin (data importing stuff is normally one of the most unexciting things a programmer can do, right next to writing lengthy requirements documentation and any kind of contact sport). It's basically a plugin that defines ActiveRecord models on the source (legacy) databases and then creates our Rails app's models from these. Importer classes allow me to then run the imports using script/runner like so:

    
    script/runner "HotelsImporter.import :start => 2.days.ago.to_date, :end => 1.day.ago.to_date" -e production

    Put that in a cron job and there you go, scheduled daily (or hourly, whatever) imports.

    But I digress. What was I actually going to talk about? Oh yes, checking for duplicate ActiveRecord objects. Now, the importers I wrote were run daily but there was the risk of re-importing the same data again (due to failed cron jobs, running the same job twice, acts of god, etc.). To be defensive, I needed to check that there were no existing records before importing them from the legacy databases.

    At this point I could decide to run uniqueness checks on any natural keys of each table (and Rails makes this really easy with AR validations, as we all know), or rely on a more convenient "the whole hog" field-by-field comparison. I settled on doing a field-by-field comparison after realizing that:

    • it's easier and I don't have to specify which fields constitute the natural keys, and
    • there are some tables which don't really have natural keys (these generally belong to the has_many side of an association).

    Update: As choonkeat pointed out in a comment below, I can simply use Post.find(:all, :conditions => new_post.attributes), which stands out very clearly as the way to do it. This was actually the first way I tried to do this but it didn't work in the importer - I must have been doing something stupid! Doh! Thanks choonkeat for pointing out my blooper. Anyway you can mostly ignore what follows below but I'll keep it here to remind myself of my error.

    So I went looking for an easy way or a Railism to check whether a new ActiveRecord object already exists in the database. Hmm, I couldn't find anything helpful - I guess everyone is relying on AR validations. Still, I went ahead and mixed in a to_conditions instance method to ActiveRecord::Base - looks like my answer to everything nowadays is to re-open existing classes.

    
    module Bezurk #:nodoc:
      module ActiveRecord #:nodoc:
        module Extensions
          def to_conditions
            attributes.inject({}) do |hash, (name, value)|
              hash.merge(name.intern => value)
            end
          end
          alias :to_conditions_hash :to_conditions
        end
      end
    end
    
    # ...
    ActiveRecord::Base.send(:include, Bezurk::ActiveRecord::Extensions)

    So now in my importers I can easily check for potential duplicate entries:

    
    new_post.save! if Post.find(:all, :conditions => new_post.to_conditions).empty?

    Now, I just have this nagging suspicion that there is a better way to do this...

  • irb and script/console tab-completion

    Ugh, I wish I had found this earlier: Tab Completion in IRb. I only went googling for this after I realized I had been tabbing to get auto-completion on script/console for a bit but it never sank in that tab-completion wasn't ever working. Useful stuff, go set it up if you haven't already.
