Tech and Business Pearls

A Clean Python Shell Script

Guido van Rossum, the creator of Python, recently wrote a post on his blog about how Python makes great shell scripts, even (especially?) compared to shell scripts traditionally created in Bash and using purely shell commands.

Guido is absolutely correct.  Shell scripts birth themselves painfully from my fingertips.  Bash’s kludgy syntax irks my orderly sensibilities.  Typos frequent my unreadable scripts.  iffi?  Who invented this?

On the other hand, Python sticks to just a handful of language constructs, so the language does not force me to google how to create else statement every time I need to do it.  Python just makes sense.

One of the comments asks if Guido can show him some “really beautiful” python shell scripts.  I don’t mean to brag, and by no means do I think that my scripts in particular invite the light of the heavens to shine upon them, but I think I follow PEP 8 fairly closely and I pay attention to the brevity and clarity of my language.  I find that I can quickly debug my scripts because of their transparency at run-time and their concise self-annotating source code.

So, below I show a script which I call merge_branch.py. Do

chmod +x merge_branch.py 

so that you can run it from the command line with a simple ./merge_branch.py.  I alias it in ~/.bashrc.

alias mb='../path/to/merge_branch.py'

The script simplifies what I would have to do manually in git many times a day.  You see, git exemplifies a great version control system for keeping track of source code.  I create branches instantly.  This encourages me to work on bug fixes and new features separate from the main code base.  As others make changes, I can easily integrate their changes by merging the changed branch into my branch.  Git’s merge algorithm embarrasses any other I have ever used.  Some of Subversion’s merges still haunt my nightmares to this day.

So, git rocks.  Unfortunately, conflicts sometimes do occur.  So, the proper procedure for merging a branch needs to be followed carefully.  People are bad at doing things carefully. That’s okay.  We should spend more time on making mistakes being creative.  That is why we invented computers to do work for us.

For best results, merge the latest changes people made to master into your branch as often as possible.  Small changes incrementally will probably mean small merge fixes.  One big change will probably cause you major pains.  So, merge from master into your branch often.

If you use GitHub or another central repository with a number of other people, then you must do a number of things.  First, make sure that all the commits that you wanted to make are commited to your branch and that you didn’t leave any files out that you have not explicitly git-ignored.  This happens a lot in SVN and Git if you are not careful, and it is the greatest source of frustration for anyone who uses a system like this.  To human is to err, c’est la vie.  Then, git-checkout master.  Make sure that origin has the latest changes from your master.  Then git-pull the latest changes form origin (github) into master.  Then git-checkout your branch again.  Then git-merge master into your branch.  If there are any errors, fix them and commit, otherwise you are done.

Also, when you complete all the changes in your branch and all the tests pass, then you need to git-merge your branch into master and git-push it back up to origin (possibly github).  You need to follow the procedure above to ensure the latest master changes are included in your branch (preferably before you run the tests).  Then, you check out master, git-merge master into the branch (this will be clean since it should just be a fast forward because you already have the latest master).  git-push the changes to origin.  Finally, delete the branch that you just completed.

This tedium rotted my brain for weeks.  Finally, I resolved to write a script to solve the tedious parts, but bring possible errors to my attention if they occur.

Please let me describe to you a few features of the script.  First, I try to follow PEP 8 as much as I can.  I have read it at least times; you should too. Also, recite the Zen of Python every night before you go to bed.

Notice how I start the script with a shebang line which says /usr/bin/env python, the preferred way to start Python since it is most flexible.  For example, I can use Python 2.6, or my own local version of Python.

I use the logging module which is part of Python’s very large standard library.  Logging gives you so much for free.  For example, instead of commenting out print statements, just change the default logging level threshold.  Always use the logging module instead of using print for everything.  Always.  It’s as easy as import logging; logging.error(’…’).  Also, the logging.basicConfig(…) I use here is the same one I use everywhere.  Logging the time that a message appeared saves hours and hours when I debug long-running scripts.

Use the optparse module in every Python shell script you write (getopt is too weak and will end up being much, much less simple by the end of a non-trivial program).  Again, you get so much for free, like a -h help command.  The documentation for optparse explains how to do everything in detail.  Make sure you set the usage parameter.  Also, make sure you call parser.error() for input option errors instead of raising an exception yourself.

Write utility functions.  Use the power and simplicity of Python to your favor.  Here, I use call_command().  I use it throughout the script and it makes the code so clean and clear.

Finally, I like to put the main() function of scripts at the top.  That makes the most sense to me.  If I open a file, I want to instantly read what it does, not read what its utility functions do.  I put the utility functions below.  Of course, at the bottom, after everything else has loaded, I place the if __name__=”__main__” code and then call main().   This way, I can import this as a module (in, for example, py.test or iPython) to test the utility functions without running the actual script.  (warning:  Do not put anything except the call to main() at the bottom.  Otherwise, you may not realize under what circumstances you call main() and with what parameters.)

Here is the script:

#!/usr/bin/env python
import os
import re
import subprocess
import logging
import optparse

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')

def main():
    usage = "usage: %prog [options]"
    parser = optparse.OptionParser(usage)
    parser.add_option("-m", "--merge-master", dest="merge_master",
                    action="store_true",
                    default=False,
                    help="Merges the latest master into the current branch")
    parser.add_option("-B", "--merge-branch", dest="merge_branch",
                    action="store_true",
                    default=False,
                    help="Merge the current branch into master; forces -m")
    options, args = parser.parse_args()

    if not options.merge_master and not options.merge_branch:
        parser.error('Must choose one-- try -m or -B')

    # Merging branch requires latest merged master
    if options.merge_branch:
        options.merge_master = True

    if options.merge_master:
        output,_ = call_command('git status')
        match = re.search('# On branch ([^\s]*)', output)
        branch = None
        if match is None:
            raise Exception('Could not get status')
        elif match.group(1) == 'master':
            raise Exception('You must be in the branch that you want to merge, not master')
        else:
            branch = match.group(1)
            logging.info('In branch %s' % branch)

        if output.endswith('nothing to commit (working directory clean)\n'):
            logging.info('Directory clean in branch: %s' % branch)
        else:
            raise Exception('Directory not clean, must commit:\n%s' % output)

        logging.info('Switching to master branch')
        output,_ = call_command('git checkout master')

        output,_ = call_command('git pull')
        logging.info('Pulled latest changes from origin into master')
        logging.info('Ensuring master has the latest changes')
        output,_ = call_command('git pull')
        if 'up-to-date' not in output:
            raise Exception('Local copy was not up to date:\n%s' % output)
        else:
            logging.info('Local copy up to date')

        logging.info('Switching back to branch: %s' % branch)
        output,_ = call_command('git checkout %s' % branch)

        output,_ = call_command('git merge master')
        logging.info('Merged latest master changes into branch: %s' % branch)
        logging.info('Ensuring latest master changes in branch: %s' % branch)
        output,_ = call_command('git merge master')
        if 'up-to-date' not in output:
            raise Exception('Branch %s not up to date:\n%s' % (branch, output))
        else:
            logging.info('Branch %s up to date' % branch)

        logging.info('Successfully merged master into branch %s' % branch)

    if options.merge_branch:
        logging.info('Switching to master branch')
        output,_ = call_command('git checkout master')

        output,_ = call_command('git merge %s' % branch)
        logging.info('Merged into master latest branch changes: %s' % branch)

        output,_ = call_command('git branch -d %s' % branch)
        logging.info('Deleted safely branch: %s' % branch)

        call_command('git push')
        logging.info('Pushed master up to origin')
        logging.info('Ensuring that origin has latest master')
        stdout,stderr = call_command('git push')
        if stderr == 'Everything up-to-date\n':
            logging.info('Remote repository up to date: %s' % branch)
        else:
            raise Exception('Remote repository not up to date:\n%s' % output)

        logging.info('Successfully merged branch %s into master and pushed to origin' % branch )
def call_command(command):
    process = subprocess.Popen(command.split(' '),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    return process.communicate()
if __name__ == "__main__":
    main()

over 8 years ago on November 17 at 11:00 pm by Joseph Perla in hacks, technology


blog comments powered by Disqus

Hi, my business card says Joseph Perla. Former VP of Technology, founding team, Turntable.fm. My first college startup was in the education space. My second was Labmeeting, a cross between Google, LinkedIn, and Facebook for scientists. I dropped out of Princeton (twice).

I love to advise and help startups. My code on Github powers many websites and iPhone apps. I give talks about startup tech around the US and also internationally at conferences in Florence. incubators in Paris, and startups in Budapest.

Twitter: @jperla

Subscribe to my mailing list

* indicates required

Favorite Posts

Y Combinator Application Guide
What to do in Budapest
How to hack Silicon Valley, meet CEO's, make your own adventure
Your website is unviral
The Face that Launched a Thousand Startups
Google Creates Humanoid Robot, Programs Itself

Popular Posts

How to launch in a month, scale to a million users
Weby templates are easier, faster, and more flexible
Write bug-free javascript with Pebbles
How to Ace an IQ Test
Capturing frames from a webcam on Linux
A Clean Python Shell Script
Why Plant Rights?

Recent Posts

Working Copy is a great git editor
Venture Capital is broken
The nature of intelligence: brain bowls, cogniphysics, and prochines
Bitcoin: A call-to-arms for technologists
Stanford is startups
Today is Internet Freedom Day! DRM-free book about Aaron Swartz's causes

More...