Joseph Javier Perla

The site design looks balanced when this sentence is long

Archive for November, 2008

Log Reader 3000

with one comment

I wanted to show off another python script today.  I think it’s pretty cool.  It’s kind of like a very rudimentary version of something you might see in Iron Man.  And, of course, anything in Iron Man is cool.

I dub it Log Reader 3000.  It’s purpose?  It helps me monitor logs.  How?  Well, sometimes I need to follow a log in real time as it is written, but I can quickly get bored.  The log scrolls by endlessly while, very often, little new information spits itself out.  I can quickly lose focus, or at the very least, damage my vision after staring at a screen intently for extended periods.

Ideally, I want the log to simply flow through me, and if my subconscious notices something odd, then I can act on it.  If the log is read aloud to me, then I can work on other tasks and let my auditory memory and auditory processing take note of oddities on which I need to act.

So, I made a python script to read the log out to me as it is written. It is my first Python 2.6 script.  I take advantage of the new multiprocessing module built into the standard library.  I also use the open-source festival text-to-speech tool.

First, install festival.  sudo apt-get install festival in Ubuntu.  You probably want to set it up to work with ALSA or ESD sound.  By default, festival uses /dev/dsp, which means that you can’t use festival and any other program that uses audio (like Skype) at the same time.   Fortunately, and as usual, Ubuntu provides detailed, simple instructions to set up festival with ALSA: https://help.ubuntu.com/community/TextToSpeech .

Finally, just find an appropriate use case.  Note that most log monitoring applications would not be improved with Log Reader 3000.  If you just want to be notified of errors, you should have a program email you when an error appears in a log.  If you want to understand the log output of a program that has already run, understand that Log Reader 3000 is meant for live-running programs.  Yes, Log Reader 3000 can be modified to read any text file line-by-line.  But, you will find that reading ends up being much faster than listening to a slow automated voice, so I recommend that you just try to skim a completed program’s output with VIM.

So then why ever use Log Reader 3000?  It is useful for applications which fit all of the following criteria:

  1. you want to monitor a live running program
  2. and the debugging information is nuanced and you need a human to interpret it (i.e. it cannot be filtered programmatically) and/or you want to be able to intervene while the program is running to keep it doing what it ought to be doing in real time.

Applications:

  • Say that you are spidering the web, and what the spider should and should not be spidering is not yet well-defined, but a human knows, then the Log Reader 3000 can read aloud where the spider is, and the human can correct course as he or she notices the spider going astray.
  • Or, say that you are working on some kind of artificial intelligence.  Perhaps, the AI program can reason aloud and a human can correct or redirect the machine’s reasoning as it goes along.  I have no idea how or why an AI would do that.
  • Maybe you want to protect against bot attacks, but your aggressor is particularly clever and seems to avoid looking like a bot in all of the obvious ways.  You can pipe the output of your log into Log Reader 3000 and notice new kinds of suspicious patterns live while reclining in your chair or surfing the web.
  • You run a securities trading program.  You have numerous checks and double-checks to ensure that everything works correctly.  Nevertheless, you need to have a human monitoring the system as a whole continuously anyway, so you have Log Reader 3000 read aloud total portfolio value, or live trades, or trading efficiency, or fast-moving securities, or all of the above.
  • The lobby of your startup has a TV screen with graphs of user growth and interaction on the site.  You want to increase the coolness factor by having a computer voice read aloud some of the searches or conversations happening on your site live.
  • You make a living by selling cool techy art projects which blend absurdity with electronics.  You read aloud live google searches, or live wikipedia edits, or inane YouTube comments out of what looks like a spinning vinyl record.  Passersby whisper of your genius.

Once you have the application, just tail -f the log, parse out the parts you want the log reader to read (you can use awk for that, for example, or maybe a simple python script), and pipe that into the Log Reader 3000.

tail -f output.log | awk “{ print $1 }” | ./log_reader_3000.py

How does Log Reader 3000 work?  The main process reads in one line at a time.  As it reads in each line from stdin, it sends it to the processing queue.  The child process reads the last item in the queue (it discards the items at the top of the queue because those are old and we need to catch up with the latest output line) and then calls a function to say() the line.  The say() function simply uses the subprocess module to call festival in a separate process and then blocks until it is done saying it aloud.

Because having a computer voice read aloud a sentence takes a while, the log probably outputs many more lines than can be read aloud.  That is why a multiprocess queue is needed, and that is why Log Reader 3000 only reads out the most recent line which has been output, which is why it is most useful for specific applications.

Here is the script, log_reader_3000.py:

#!/usr/bin/env python2.6
import sys
import subprocess
from multiprocessing import Process, Queue
def say(line):
    say = '(SayText "%s")' % line
    echo = subprocess.Popen(['echo', say], stdout=subprocess.PIPE)
    subprocess.call(['festival'], stdin=echo.stdout)
def listen_to_lines(queue):
    line = 'I am Log Reader 3000.  The world is beautiful.'
    while True:
        while not queue.empty():
            line = queue.get()
        say(line)
queue = Queue()
p = Process(target=listen_to_lines, args=(queue,))
p.start()
while True:
    line = sys.stdin.readline()
    sys.stdout.write(line)
    queue.put(line)

Written by Joseph Perla

November 21st, 2008 at 4:33 am

Posted in Art, Hacks, Technology

A Clean Python Shell Script

without comments

Guido van Rossum, the creator of Python, recently wrote a post on his blog about how Python makes great shell scripts, even (especially?) compared to shell scripts traditionally created in Bash and using purely shell commands.

Guido is absolutely correct.  Shell scripts birth themselves painfully from my fingertips.  Bash’s kludgy syntax irks my orderly sensibilities.  Typos frequent my unreadable scripts.  iffi?  Who invented this?

On the other hand, Python sticks to just a handful of language constructs, so the language does not force me to google how to create else statement every time I need to do it.  Python just makes sense.

One of the comments asks if Guido can show him some “really beautiful” python shell scripts.  I don’t mean to brag, and by no means do I think that my scripts in particular invite the light of the heavens to shine upon them, but I think I follow PEP 8 fairly closely and I pay attention to the brevity and clarity of my language.  I find that I can quickly debug my scripts because of their transparency at run-time and their concise self-annotating source code.

So, below I show a script which I call merge_branch.py. Do

chmod +x merge_branch.py 

so that you can run it from the command line with a simple ./merge_branch.py.  I alias it in ~/.bashrc.

alias mb='../path/to/merge_branch.py'

The script simplifies what I would have to do manually in git many times a day.  You see, git exemplifies a great version control system for keeping track of source code.  I create branches instantly.  This encourages me to work on bug fixes and new features separate from the main code base.  As others make changes, I can easily integrate their changes by merging the changed branch into my branch.  Git’s merge algorithm embarrasses any other I have ever used.  Some of Subversion’s merges still haunt my nightmares to this day.

So, git rocks.  Unfortunately, conflicts sometimes do occur.  So, the proper procedure for merging a branch needs to be followed carefully.  People are bad at doing things carefully. That’s okay.  We should spend more time on making mistakes being creative.  That is why we invented computers to do work for us.

For best results, merge the latest changes people made to master into your branch as often as possible.  Small changes incrementally will probably mean small merge fixes.  One big change will probably cause you major pains.  So, merge from master into your branch often.

If you use GitHub or another central repository with a number of other people, then you must do a number of things.  First, make sure that all the commits that you wanted to make are commited to your branch and that you didn’t leave any files out that you have not explicitly git-ignored.  This happens a lot in SVN and Git if you are not careful, and it is the greatest source of frustration for anyone who uses a system like this.  To human is to err, c’est la vie.  Then, git-checkout master.  Make sure that origin has the latest changes from your master.  Then git-pull the latest changes form origin (github) into master.  Then git-checkout your branch again.  Then git-merge master into your branch.  If there are any errors, fix them and commit, otherwise you are done.

Also, when you complete all the changes in your branch and all the tests pass, then you need to git-merge your branch into master and git-push it back up to origin (possibly github).  You need to follow the procedure above to ensure the latest master changes are included in your branch (preferably before you run the tests).  Then, you check out master, git-merge master into the branch (this will be clean since it should just be a fast forward because you already have the latest master).  git-push the changes to origin.  Finally, delete the branch that you just completed.

This tedium rotted my brain for weeks.  Finally, I resolved to write a script to solve the tedious parts, but bring possible errors to my attention if they occur.

Please let me describe to you a few features of the script.  First, I try to follow PEP 8 as much as I can.  I have read it at least times; you should too. Also, recite the Zen of Python every night before you go to bed.

Notice how I start the script with a shebang line which says /usr/bin/env python, the preferred way to start Python since it is most flexible.  For example, I can use Python 2.6, or my own local version of Python.

I use the logging module which is part of Python’s very large standard library.  Logging gives you so much for free.  For example, instead of commenting out print statements, just change the default logging level threshold.  Always use the logging module instead of using print for everything.  Always.  It’s as easy as import logging; logging.error(’…’).  Also, the logging.basicConfig(…) I use here is the same one I use everywhere.  Logging the time that a message appeared saves hours and hours when I debug long-running scripts.

Use the optparse module in every Python shell script you write (getopt is too weak and will end up being much, much less simple by the end of a non-trivial program).  Again, you get so much for free, like a -h help command.  The documentation for optparse explains how to do everything in detail.  Make sure you set the usage parameter.  Also, make sure you call parser.error() for input option errors instead of raising an exception yourself.

Write utility functions.  Use the power and simplicity of Python to your favor.  Here, I use call_command().  I use it throughout the script and it makes the code so clean and clear.

Finally, I like to put the main() function of scripts at the top.  That makes the most sense to me.  If I open a file, I want to instantly read what it does, not read what its utility functions do.  I put the utility functions below.  Of course, at the bottom, after everything else has loaded, I place the if __name__=”__main__” code and then call main().   This way, I can import this as a module (in, for example, py.test or iPython) to test the utility functions without running the actual script.  (warning:  Do not put anything except the call to main() at the bottom.  Otherwise, you may not realize under what circumstances you call main() and with what parameters.)

Here is the script:

#!/usr/bin/env python
import os
import re
import subprocess
import logging
import optparse

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')

def main():
    usage = "usage: %prog [options]"
    parser = optparse.OptionParser(usage)
    parser.add_option("-m", "--merge-master", dest="merge_master",
                    action="store_true",
                    default=False,
                    help="Merges the latest master into the current branch")
    parser.add_option("-B", "--merge-branch", dest="merge_branch",
                    action="store_true",
                    default=False,
                    help="Merge the current branch into master; forces -m")
    options, args = parser.parse_args()

    if not options.merge_master and not options.merge_branch:
        parser.error('Must choose to merge master into the current branch, or merge the current branch into the latest master; try -m or -B')

    # Merging branch requires latest merged master
    if options.merge_branch:
        options.merge_master = True

    if options.merge_master:
        output,_ = call_command('git status')
        match = re.search('# On branch ([^\s]*)', output)
        branch = None
        if match is None:
            raise Exception('Could not get status')
        elif match.group(1) == 'master':
            raise Exception('You must be in the branch that you want to merge, not master')
        else:
            branch = match.group(1)
            logging.info('In branch %s' % branch)

        if output.endswith('nothing to commit (working directory clean)\n'):
            logging.info('Directory clean in branch: %s' % branch)
        else:
            raise Exception('Directory not clean, must commit:\n%s' % output)

        logging.info('Switching to master branch')
        output,_ = call_command('git checkout master')

        output,_ = call_command('git pull')
        logging.info('Pulled latest changes from origin into master')
        logging.info('Ensuring master has the latest changes')
        output,_ = call_command('git pull')
        if 'up-to-date' not in output:
            raise Exception('Local copy was not up to date:\n%s' % output)
        else:
            logging.info('Local copy up to date')

        logging.info('Switching back to branch: %s' % branch)
        output,_ = call_command('git checkout %s' % branch)

        output,_ = call_command('git merge master')
        logging.info('Merged latest master changes into branch: %s' % branch)
        logging.info('Ensuring latest master changes in branch: %s' % branch)
        output,_ = call_command('git merge master')
        if 'up-to-date' not in output:
            raise Exception('Branch %s not up to date:\n%s' % (branch, output))
        else:
            logging.info('Branch %s up to date' % branch)

        logging.info('Successfully merged master into branch %s' % branch)

    if options.merge_branch:
        logging.info('Switching to master branch')
        output,_ = call_command('git checkout master')

        output,_ = call_command('git merge %s' % branch)
        logging.info('Merged into master latest branch changes: %s' % branch)

        output,_ = call_command('git branch -d %s' % branch)
        logging.info('Deleted safely branch: %s' % branch)

        call_command('git push')
        logging.info('Pushed master up to origin')
        logging.info('Ensuring that origin has latest master')
        stdout,stderr = call_command('git push')
        if stderr == 'Everything up-to-date\n':
            logging.info('Remote repository up to date: %s' % branch)
        else:
            raise Exception('Remote repository not up to date:\n%s' % output)

        logging.info('Successfully merged branch %s into master and pushed to origin' % branch )
def call_command(command):
    process = subprocess.Popen(command.split(' '),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    return process.communicate()
if __name__ == "__main__":
    main()

Written by Joseph Perla

November 17th, 2008 at 11:00 pm

Posted in Hacks, Technology

Mark Zuckerberg Wears My Sandals

without comments

mark zuckerberg

Valleywag reports that Facebook still has not started firing employees as every other Silicon Valley startup has. But, the most interesting part of the article lies hidden within the picture.

Mark Zuckerberg (the billionaire) wears the same sandals I do: old-school Adidas sandals .  I don’t know anyone else who wears them.

I just bought these Rainbows in Miami which are pretty nice.

Written by Joseph Perla

November 15th, 2008 at 4:15 am

Posted in News, Personal