Wednesday, November 30, 2011

Firefox add-ons for the occasional web developer

I'm not a hardcore web developer but I do some occasional web-based work, and one of the issues I have is that - because web applications exist in an environment which spans both browser and server (and which often seems to hide the workings of its components) - it can be quite difficult to see under the hood when there are problems.

Fortunately there are a number of add-ons for Firefox (my browser of choice) that can help. These are the ones that I like to use:
  • Firebug http://getfirebug.com/ is possibly one of the most essential add-ons for web development. I first came across it as a Javascript logger and debugger, but it's far more than that: describing itself as a complete web development tool, its functionality extends to HTML and CSS in addition to script profiling and network analysis capabilities. As an occasional user I've found the Javascript debugging functions invaluable, and the ability to edit CSS in-place and see the results immediately has also been really helpful in debugging style sheets - in fact its biggest downside from my perspective is that it's not immediately obvious how to use many of its functions.
  • Live HTTP Headers http://livehttpheaders.mozdev.org/ is a great tool for exposing the interactions between your browser and a server. I found this invaluable when I was debugging some website functionality that I was developing earlier this year, as it enabled me to follow a redirect that I'm sure I couldn't have seen otherwise.
  • QuickJava http://quickjavaplugin.blogspot.com/ is a utility that allows support for Java, Javascript, Flash, Silverlight, images and others to be toggled on and off within your browser, enabling you to check how a page behaves when viewed by someone who doesn't have these enabled.
  • I really like the HTML Validator http://users.skynet.be/mgueury/mozilla/ for ensuring that my HTML markup is actually W3C compliant; the main issue with this is that it's only available for Windows platforms. Provided you have the "Add-on Bar" visible in Firefox (toggle via "Options" in the Firefox main menu, or do ctrl+/), this displays a little icon at the bottom of the screen indicating the goodness or otherwise of your markup.

There are a few other useful add-ons for working with design elements like colours and images:

  • Colorzilla http://www.colorzilla.com/ is a tool that allows you (among other things) to pick colours from the current webpage and get the corresponding hex codes or RGB values.
  • Measureit http://frayd.us/ creates a ruler that lets you measure the size of page elements in pixels - particularly helpful when sizing images for web display.
  • In the past I've found the in-browser screen capture utility Fireshot http://screenshot-program.com/fireshot/ quite handy for taking screenshots of an entire webpage including the off-screen portions. I have to admit I haven't used it for a while though. There's a paid "pro" version which offers a lot of additional functionality.

Although I've given URLs, the easiest way to install any of these is via the "Get Add-ons" tab accessed via the "Add-ons" option in Firefox's main menu (I'm using Firefox 8.0 at the time of writing). Once installed, the individual add-ons appear in various places; for example, by default Firebug's icon can be found in the top right-hand corner. If an add-on's icon doesn't appear automatically (as seems to happen for Measureit) then you might have to add it manually: go to "Options"/"Toolbar layout", locate the item and drag it to the toolbar.

I wouldn't try to argue that this is a definitive list, but for an occasional user like myself these tools work well and (with the exception of Firebug) are easy enough to remember how to use even after several months away from them. However if these don't meet your needs then I'd recommend checking out the "Web Development" category of Mozilla's add-ons site for many more options.

Sunday, November 13, 2011

Creative Commons overview

A while ago I came across an interesting overview of the Creative Commons licence for digital content by Jude Umeh in the BCS "IT Now" newsletter ("Flexible Copyright", also available via the BCS website at http://www.bcs.org/content/conBlogPost/1828 as "Creative Commons: Addressing the perils of re-using digital content"), which I felt gave a very clear and concise introduction to the problem that Creative Commons (CC) is trying to solve, how it works in practice, and some of the limitations.

Essentially, anyone who creates online content - whether a piece of writing (such as this blog), an image (such as a photo in my Flickr stream), or any other kind of media - automatically has "all rights reserved" copyright on that content. This default position means that the only way someone else can (legally) re-use that content is by explicitly seeking and obtaining the copyright owner's permission (i.e. a licence) to do so. As you might imagine this can present a significant barrier to re-using online content.

The aim of the Creative Commons is to enable content creators to easily pre-emptively grant permissions for others to re-use their work, by providing a set of free licences which bridge the gap between the "all rights reserved" position (where the copyright owner retains all rights on their work) and "public domain" (where the copyright owner gives up those rights, and allows anyone to re-use their work in any way and for any purpose).

These licences are intended to be easily understood and provide a graduated scale of permissiveness. According to the article the six most common are:

  • BY ("By Attribution"): this is the most permissive, as it grants permission to reuse the original work for any purpose - including making "derived works" - with no restrictions other than that it must attributed to the original author.
  • BY-SA ("By Attribution-Share Alike"): the same as BY, with the additional restriction that any derived work must also be licensed as BY-SA.
  • BY-ND ("By Attribution-No Derivatives"): the original work can be freely used and shared with attribution, but derivative works are not allowed.
  • BY-NC ("By Attribution-Non-Commerical"): as with BY, the original work can be used, shared and used in derived works, provided attribution is made to the original author; however the original work cannot be used for commercial purposes.
  • BY-NC-SA ("By Attribution-Non-Commercial-Share Alike"): similar to BY-SA, in that any derived work must use the same BY-NC-SA licence, and to BY-NC, in that commercial use of the original work is not permitted.
  • BY-NC-ND ("By Attribution-Non-Commercial-No Derivatives"): the most restrictive licence (short of "all rights reserved"), as this only allows re-use of the original work for non-commercial purposes, and doesn't permit derivative works to be made. Umeh states that BY-NC-ND is "often regarded as a 'free advertising' licence".

As Umeh points out, "CC is not a silver bullet", and his article cites examples of some of its limitations and potential pitfalls. Elsewhere I've also come across some criticisms of using the non-commercial CC licences in certain contexts: for example, the scientist Peter Murray-Rust has blogged about what he sees as the negative impact of CC-NC licensing in science and teaching (see "Suboptimal/missing Open Licences by Wiley and Royal Society" http://blogs.ch.cam.ac.uk/pmr/2011/10/27/suboptimalmissing-open-licences-by-wiley-and-royal-society/ and "Why you and I should avoid NC licences" http://blogs.ch.cam.ac.uk/pmr/2010/12/17/why-i-and-you-should-avoid-nc-licences/).

However it's arguable that these are special cases, and that more generally CC-based licensing has a significant and positive impact on enabling the legal re-use of online material that would otherwise not be possible: indeed, even the posts cited above only criticise its NC aspects, and otherwise see the CC as greatly beneficial. Certainly it's worth investigating if you're interested in allowing others to reuse digital content that you've produced (there's even a page on the CC website to help choose the appropriate CC licence based on answers to plain English questions: http://creativecommons.org/choose/).

As I'm not an expert on CC (or indeed on copyright law or content licensing), I'd recommend Umeh's article as the next step for a more comprehensive and expert overview; and beyond that of course more information can be found at the Creative Commons website http://www.creativecommons.org/ (with the UK-specific version due to become available at http://www.creativecommons.org.uk later this month).

Monday, October 24, 2011

Day Camp 4 Developers: Project Management

Just over three weeks ago I attended the third online Day Camp 4 Developers event, which this time focused on the subject of project management. The DC4D events are aimed at filling the "soft skills" gap that software developers can suffer from, and rather than being a "how-to" on project management (arguably there are already plenty of other places you can learn the basics) the six speakers covered a range of topics around the subject - some of which I wouldn't initially have thought of in this context (for example, dealing with difficult people). However as one of the speakers noted, fundamentally project management is as much about people as it is about process, and all of them delivered some interesting insights.

The first talk (which unfortunately I missed the start of through my own disorganisation) by Brian Prince about "Hands-on Agile Practices" covered the practical implementation of Agile processes in a lot of detail. I've never worked with Agile myself, but I have read a bit about it in the past and Brian's presentation reminded me of a few Agile concepts that sound like they could be usefully adopted elsewhere. For example: using "yesterday's weather" (i.e. statistically the weather tomorrow is likely to be the same as today's) as a way to plan ahead by considering recent performance; and the guidelines for keeping stand-up meetings concise could also be applied to any sort of status meeting (each person covers the three points "what I did yesterday", "what I'm doing today and when it will be done", and "what issues I have"). The idea of focusing on "not enough time" rather than "too much to do" also appealed to me.

The next presentation, "Dealing with Difficult People" by Elizabeth Naramore, turned out to be an unexpected delight. Starting by asking what makes people you know "difficult" to interact with, she identified four broad types of behaviour:
  • "Get it done"-types are focused on getting information and acting on it quickly, so their style is terse and to-the-point,
  • "Get it right"-types are focused on detail, so their style is precise, slow and deliberate,
  • "Get along"-types are focused on making sure others are happy, so their style is touchy-feely and often sugar-coated, and
  • "Get a pat on the back"-types are focused on getting their efforts recognised by others, so their style is more "person-oriented".
The labels are instantly memorable and straight away I'm sure we can all think of people that we know who fit these categories (as well as ourselves of course). Elizabeth was at pains to point out that most people are a mixture of two or more, and that none of them are bad (except when someone is operating at an extreme all of the time). The important point is that they affect how people communicate, so if you can recognise and adapt to other people's styles, and learn to listen to them, then you'll stand a better chance of reducing your difficult interactions.

Rob Allen was next up with "Getting A Website Out Of The Door" (subtitled "Managing a website project"), and covered the process used at Big Room Internet for spec'ing, developing and delivering website projects for external clients. Rob included a lot of detail on each part of the process, what can go wrong, and how they aim to manage and reduce the risks of that happening. One specific aspect that I found interesting was the change control procedure, which is used for all change requests from their clients regardless of the size of the change, essentially:
  • Write down the request
  • Understand the impact
  • Decide whether to do it
  • Do the work!
I think that the second point here is key: you need to understand what the impact will be, and how much work it's really going to be (I'm sure we've all agreed at one time or another to make "trivial" changes to code which have turned out in practice to be far more work than anyone first imagined). A more general point that Rob made was the importance of clear communication, particularly in emails (which should have a subject line, summary, action and deadline).

Rob was followed by Keith Casey talking about how "Project Management is More Than Todo Lists". One of the interesting aspects of Keith's talk was that he brought an open source perspective to the subject. In open source projects the contributors are unpaid, so understanding how their motivations differ from those of paid workers is important if the projects are to be successful: as Keith said early on in his talk, in this case "it's about people".

He argued that people managing open source projects should pay attention to the uppermost levels of Maslow's Hierarchy of Needs (where the individual focuses on "esteem" and "self-actualisation"), but there was also a lot of practical advice: for example, having regular and predictable releases; ensuring that bugs and feature requests are prioritised regularly; and letting development be driven by input and involvement from the community. I particularly liked the practical suggestion that frequently asked questions can be used to identify areas of friction that need to be simplified or clarified. He also recommended Karl Fogel's book "Producing Open Source Software", which looks like it would be a good read.

Thursday Bram's presentation "Project Management for Freelancers" was another change of direction (and certainly the subtitle "How Freelancers Can Use Project Management to Make Clients Happier than They've Ever Been Before" didn't lack ambition). She suggested that for freelancers, project management is at least in part about helping clients to recognise quality work - after all they're not experts in coding (that's why they hired you), so inevitably they have an issue with knowing "what does quality look like?". (If you've ever paid for a service such as car servicing or plumbing then I'm sure you can relate to this.) So arguably one function of project management is to provide a way to communicate the quality of your work. The key message that I took away from Thursday's talk was that "what makes people happy is 'seeing progress' on their projects". Again I felt this was an idea that I could use in my (non-freelancer) work environment.

The last session of the day was Paul M. Jones talking about "Estimating and Expectations". Essentially we (i.e. everyone) are terrible at making estimates, as illustrated by his "laws of scheduling and estimates":
  • Jones' Law: "if you plan for the worst, then all surprises are good surprises"
  • Hofstadter's Law: "it always takes longer than you expect - even when you take Hofstadter's Law into account"
  • Brooks' Law: "adding more people to a late software project will make it even later"
However there are various strategies and methods we can use to try and make our estimates better: for example, using historical data and doing some design work up front can both provide valuable knowledge for improved estimates. In this context Paul also had my favourite quote of the day: "It's not enough to be smart; you actually have to know things" (something that I think a lot of genuinely clever people can often forget, especially when they move into a domain that's new to them).

It felt like Paul packed an immense amount of material into this talk, covering a wide range of different areas and offering a lot of practical advice drawn from various sources (Steve McConnell's Code Complete, Fred Brooks' The Mythical Man Month and Tom DeMarco and Timothy Lister's Peopleware: Productive Projects & Teams were all mentioned) both for estimation techniques and for expectation management - where ultimately communication and trust are key (a message that seemed to be repeated throughout the day).

In spite of a few minor technical issues (the organisers had opted to use a new service called Fuzemeeting, which I guess was still ironing out some wrinkles), overall everything ran smoothly, and at the end I felt I'd got some useful ideas that I can apply in my own working life - in the end surely that's the whole idea. It was definitely worth a few hours of my weekend, and I'm looking forward to being able to see some of the talks again when the videocasts become available. In the meantime if any of this sounds interesting to you then I'd recommend checking out the DC4D website and watching out for the next event!

Sunday, October 16, 2011

Using Fedora 15 & Gnome 3: an update

Following up on my previous posting about the Fedora 15/Gnome 3 user experience, I've now been using it as a day-to-day working environment for the last 4 1/2 months and thought it was time to post a brief update.

Overall the experience has been pretty good (although I gather a lot of other commentators on the web wouldn't agree). For me the least satisfying aspect is still the automated workspaces/virtual desktops, closely followed by the default left-click behaviour of icons in the favourites sidebar. Both of these continue to catch me out from time to time, but I'd class their deficiencies as merely irritating rather than unusable.

Another aspect that I complained about in my previous post was the limited set of customisations that seemed to be available. However I've since discovered the gnome-tweak-tool, which provides access to a much wider range of customisations than is offered via the "Preferences" options. (This and many other useful features are covered in Fedora's release notes for desktop users, which I should probably have read right at the start.)

It's likely that you'll need to explicitly install it as it doesn't appear to be there by default, i.e.:

% yum install gnome-tweak-tool

(NB this requires superuser privileges.) To launch the tool, run it from the command line (or go to the "Applications" desktop view and use the search box to look for "tweak"). The tool itself looks like this:

Fedora 15: gnome-tweak-tool: "Fonts" tab

Figure 1: gnome-tweak-tool displaying the "Fonts" tab

There are various categories ("Fonts", "Interface" etc) with a set of options for each, and at first glance there don't seem to be that many options available. However if the one you want doesn't appear to be there then it's worth typing in some search terms to see if something comes up (for example, this is how I found the option for displaying the full date next to the time at the top of the screen).

Another useful utility is gnome-session-properties (again, seems easiest to launch from the command line), which really doesn't have many options but does allow you to customise which applications start up on login:

Fedora 15: gnome-session-properties

Figure 2: gnome-session-properties dialog

As you can see by the fact that I'm still using the default desktop wallpaper, I'm not big on customisations (in fact my needs are basic: web browser, email client, terminal window, Emacs and some development tools are usually sufficient), however these additional tools have helped make me feel a little more at home, and generally I'm pretty happy with the setup now.

Finally I thought I'd give the GnomeShell Cheatsheet page at http://live.gnome.org/GnomeShell/CheatSheet a quick mention. It covers similar ground to my previous post but from a more expert perspective and with some useful extra detail.

Wednesday, July 27, 2011

Book review: “Python Testing Cookbook” by Greg L. Turnquist

Disclosure: a free e-copy of this book was received from the publisher for review purposes. The opinions expressed here are entirely my own; a copy of this review has also been posted at Amazon.

Greg L. Turnquist’s “Python Testing Cookbook” explores automated testing at all levels, with the intention of providing the reader with the knowledge needed to implement testing using Python tools to improve software quality. To this end the book presents over 70 “recipes” in its nine chapters (ranging from the basics of unit testing, through test suites, user acceptance and web application testing, continuous integration, and methods for smoke- and load-testing), covering both tools for testing Python, and Python tools for testing. It also delivers advice about how to get the most from automated testing, which is as much an art as a science.

The first three chapters introduce the fundamentals: writing, organising and running unit tests, comprehensively covering unittest (Python’s built-in unit testing library), nose (a versatile tool for discovering, running and reporting tests) and doctest (which turns Python docstrings into testable code – a sample of this chapter can be downloaded from http://www.packtpub.com/python-testing-cookbook/book). Having established a solid foundation, subsequent chapters look at increasingly broader levels of automated testing using the appropriate relevant Python tools: for example, the “lettuce” and “should_DSL” libraries for “behaviour driven development” (an extension of “test driven development” which aims to produce human-readable test cases and reports), and the “Pyccuracy” and “Robot” frameworks for end-user acceptance testing of web applications. Later chapters cover higher level concepts and tools, such as using nose to hook Python tests into “continuous integration” servers (both Jenkins and TeamCity are covered in detail), and assessing test coverage using the “coverage” tool (both as a metric, and to identify areas that need more tests). A detailed chapter on smoke- and load-testing includes practical advice on developing multiple test suites for different scenarios, and methods for stress-testing (for example, by capturing and replaying real world data) to discover weaknesses in a system before going to production. The final chapter distils the author’s experience into general advice on making testing a successful part of your code development methodology, both for new and legacy projects.
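
Since doctest is one of the tools covered, here's a minimal sketch of the general idea - my own toy example rather than one taken from the book - showing how examples embedded in a docstring double as both documentation and tests:

# doctest_sketch.py - a toy doctest example (mine, not the book's)
def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add(-1, 1)
    0
    """
    return a + b

if __name__ == "__main__":
    # Run the examples embedded in the docstrings above as tests
    import doctest
    doctest.testmod(verbose=True)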

There’s a lot of good stuff in this book: the initial chapters on unittest and nose are particularly strong, and I can imagine returning to these in future as a reference. There is also a lot of excellent and hard-won practical advice from the author’s own experience – not only in these early chapters but throughout the book – which is consistently valuable (in this regard the final chapter is a real highlight and could easily stand alone – I will definitely be re-reading it soon). Elsewhere the various tools and topics are presented clearly with plenty of useful detail, and in some cases have demystified things that I’d always assumed were quite esoteric and difficult to do (nose in particular was a revelation to me, but also setting up continuous integration servers and measuring test coverage).

There are a few disappointments: the section on mock objects left me feeling baffled as to how to actually implement them in practice – a shame as it was something that I’d looked forward to learning. I’d also have liked something about approaches for handling difficult testing scenarios such as software which interacts with the file system or with large files – a few hints here would have been invaluable for me. There are typos in some commands and code in a few recipes (e.g. for nose), which meant I had to look up the correct syntax elsewhere – perhaps not so bad, but annoying (especially in a cookbook) – and since the recipes themselves aren’t numbered, this sometimes made it difficult to navigate between them.

However these are fairly minor quibbles, and in conclusion I was impressed with both the breadth of material covered by the book and the level of detail for many topics. Moreover I enjoyed reading it and was often left feeling excited at the prospect of being able to apply the ideas to my own projects, which is I think was one of the author’s aims (and no mean feat for a technical book). I think that the combination of the detail together with the author’s practical advice make this book both an excellent introduction to testing with Python, and a valuable resource to refer back to subsequently.

(Addendum: Greg Turnquist's blog about the book can be found at http://pythontestingcookbook.posterous.com/ and features some interesting supplementary material.)

Sunday, July 10, 2011

Fedora 15 and Gnome 3: user basics

I've been using Fedora 15 for about a month now and thought it was time to write up some of my experiences with the new Gnome 3 desktop, since certain aspects are quite a bit different from the previous version. I know other people have posted details about the Fedora 15 desktop (for example Xavier Claessens' One Week with Gnome 3) but when I first installed it there didn't seem to be much from a "user basics" perspective. So this is my take, hope it's useful.

Getting started

When you first start up Gnome 3 the desktop looks pretty empty - in fact there are no desktop icons in this new version (even when you put them in your "Desktop" subdirectory):

Fedora 15: Gnome 3 desktop
Figure 1: "Empty" Gnome 3 desktop on startup. No desktop icons allowed!


To get started, move the mouse to the word "Activities" in the top left-hand corner of the screen (the so-called "hot corner") - this immediately changes the desktop to the "Windows" view:

Fedora 15: "Activities" hot corner: "Windows" view
Figure 2: "exploded view" of the Gnome 3 desktop, accessed either by moving the mouse over the "Activities" hot corner (top-left of the screen), or by hitting the "Gnome" (i.e. Windows) key on the keyboard. The Favourites sidebar sits on the left edge of the screen, and the edge of the Workspaces sidebar peeks out on the right.

In this view (figure 2) you can see the Favourites sidebar on the left side, and just the very edge of the Workspaces sidebar on the right (more about those below). I call this the exploded view of the current workspace, since (as in this example) it features miniatures of any windows in the workspace. The exploded view can also be toggled by pressing the "Super" key (i.e. the Windows key).

From this view you can toggle to the "Applications" view, by clicking on "Applications" near the top left (figure 3):

Fedora 15: "Activities" hot corner: "Applications" view
Figure 3: "Applications" view of the Gnome 3 desktop


This shows all the applications installed on the system, with a search box and category groupings on the right to help you find the one you want.
  • Drag icons from this view to the "Favourites" bar to make them more easily accessible in future.
  • The "Add/remove software" application is a graphical front end to yum for installing and managing additional packages that are weren't included by default.

The Favourites sidebar: launching and navigating applications

The Favourites sidebar is the strip down the left-hand side which holds various application icons. These icons do "double-duty": if you've dragged an icon there from elsewhere (essentially "favouriting" it), then it initially acts as a launcher for that application; also the icons for any running applications (favourited or otherwise) will appear here.

If an application already has one or more instances running then clicking on its icon takes you to the "nearest" running instance; clicking-and-holding (or right clicking) gives you more options (e.g. to start a new instance, or move to any of the currently open instances) - behaviour that is probably already familiar to users of Windows 7 and Mac OS X (figure 4):

Fedora 15: "Favourites" sidebar
Figure 4: The "Favourites" sidebar with dialogue (i.e. the black bubble) opened for Firefox after right-clicking on its icon. This gives options to move to a running Firefox window, or to start a new Firefox instance.


Another way to navigate between running applications is to use Alt+Tab to cycle between them (figure 5):

Fedora 15: Alt-Tab cycle through applications
Figure 5: Alt+Tab moves between running applications...


Repeated Alt+Tabbing moves between the applications; if there's more than one instance running then these are also shown when you Alt+Tab to it, and you can use the arrow keys to navigate to the specific one you want (figure 6):

Fedora 15: Alt+Tab cycle through applications (multiple windows)
Figure 6: ... and arrow keys allow you to select a specific instance if there are multiple instances of a particular application.

The Workspaces sidebar: navigating multiple desktops

Workspaces provide a way to manage applications, by giving the user multiple virtual desktops. These should already be familiar to seasoned Gnome users, but they operate somewhat differently in Gnome 3: there are no longer a fixed number of workspaces, instead they are created and destroyed automatically by the system as required.

You can move between workspaces in at least two different ways. Firstly, you can access the workspaces sidebar on the right-hand side of the screen, by moving the mouse over it in the "exploded view" of the desktop and causing it to "pop out" (figure 7):

Fedora 15: Workspaces "popped out"
Figure 7: the Workspaces sidebar "popped out" on the right of the screen in the exploded view of the Gnome 3 desktop.

The sidebar shows miniatures of each workspace, with the current workspace highlighted with a white outline. Clicking on one of the images takes you to that workspace; you can also drag application windows between the different workspaces.

Note the sidebar also shows an extra "empty" workspace at the bottom: if an application is opened or moved into this workspace then a new empty workspace is automatically created underneath. Furthermore, there's only ever one empty workspace - so if a workspace "empties" (e.g. because you've closed all the applications it contains) then Gnome automatically removes it. This can be quite disconcerting, and is probably the feature that causes me the most confusion in practice as it often upsets my sense of where I am in the workspace order.

The other way to navigate between workspaces is Alt+Ctrl+Up/Down Arrows, which I find myself using quite a lot (although I frequently overshoot into the empty workspace by accident) (figure 8):

Fedora 15: alt+ctrl+arrows to navigate workspaces
Figure 8: moving between multiple desktops using Alt+Ctrl+Up/Down keys

Other observations
  • Resizing windows: windows can be maximised by double-clicking on their title bar (double-click again to restore to the original size). There's no "minimise" button on the window frame, so you now have to right-click and then select the "Minimise" menu option. Also, note that dragging a window to the top of the screen automatically causes it to maximise (again similar to Windows 7, and not always what you intend). Manual resizing is also possible as always, by dragging the window edges - but this can be fiddly, as the area where an edge can be "caught" for dragging seems to be quite small.
  • System notifications: these now pop up rather discreetly at the bottom of the screen, but interacting with them can be frustrating at times - often they disappear before you have a chance to read them, and sometimes (counterintuitively) disappear when clicked.
  • Customisation: while some preference-type options are available via the "username" menu (top right-hand corner of the screen) under "System Settings", overall the customisation options feel quite limited (for example, no screen-savers). However as a number of interfaces to system tools currently only seem to be accessible by launching from a command line, it's not clear if this is a conscious design decision or whether more customisation options will be exposed in future versions.
  • Fallback mode: this is a half-way house between Gnome 2 and Gnome 3, and is started by default on systems which can't support the full Gnome 3 experience (which appears to include virtual machines). However as it's much more like the old Gnome, if you really don't like the new version then you could try using fallback mode instead.

Conclusion

Having been using Fedora 15 and Gnome 3 day-to-day for a few weeks now, I'm largely used to its quirks and find it a perfectly serviceable working environment overall - for me the new workspaces model and the rather random system notification mechanism have proved to be the most challenging differences from previous versions. So while it may not suit everyone's tastes it's definitely worth trying (and hopefully the more egregious foibles will be ironed out in future versions).

Monday, May 30, 2011

Mac OS X: new user tips

Over the past couple of weeks I've been using an Apple iMac, and as a Windows/Linux user I've found that navigating the desktop has been something of a learning experience for me.

As different as they are, in many ways the standard Windows and Linux desktops are idiomatically quite similar these days, and both support the standard PC three-button mouse. By contrast the Mac OS X desktop environment (and its use of the infamous one-button mouse) has a number of differences which can turn even basic operations (for example, cut-and-paste) initially into something of a challenge.

However some basic knowledge should go a long way in helping. First, there are the three essential keys you need to know about:
  • Command [⌘]
  • Option [⌥]
  • Control [ctrl]
(You can think of the option key as being the same as the "Alt" key on Windows/Linux.)

Then:
  • Emulating the right-hand mouse button: [ctrl] + mouse click (essential for desktop and web applications that use this to activate context menus and so on)
Basic text editing operations:
  • Cut: [⌘] + [x]
  • Copy: [⌘] + [c]
  • Paste: [⌘] + [v]
Basic keys for navigating within text documents:
  • Home: [↖]
  • End: [↘]
  • Page up: [⇞]
  • Page down: [⇟]
Useful shortcuts for navigating the desktop:
  • Cycle between open windows: [⌘] + [tab]
  • Zoom out (pulls back to show all open windows): [F9]
  • Show desktop (hides all open windows): [F11]
And finally (and essential if you're programming and find your Apple keyboard is missing a hash key):
  • Hash symbol ("#"): [⌥] + [3]
These all work on OS X 10.4.11 ("Tiger"), which is admittedly no longer a very recent release, but hopefully they're also applicable to later Mac OSes. I can't say that I've fallen in love with Apple as a result, but they have enabled me to operate at an acceptably functional level (until I can get my Linux workstation up and running!).

Saturday, April 9, 2011

Managing Python packages: virtualenv, pip and yolk

I've recently been playing with the Python virtualenv package - along with pip and yolk - as a way of managing third-party packages. This post is my brief introduction to the basics of these three tools.

virtualenv lets you create isolated self-contained "virtual environments" which are separate from the system Python. You can then install and manage the specific Python packages that you need for a particular application - safe from potential problems due to version incompatibilities, and without needing superuser privileges - using the pip package installer. yolk provides an extra utility to keep track of what's installed.

1. virtualenv: building virtual Python environments

virtualenv can either be installed via your system's package manager (for example, synaptic on Ubuntu), or by using the easy_install tool, i.e.:

$ easy_install virtualenv

(If you don't have the SetupTools package which provides easy_install then you can download the "bootstrap" install script from http://peak.telecommunity.com/dist/ez_setup.py. Save as ez_setup.py and run using /path/to/python ez_setup.py.)

Once virtualenv is installed you can create a new virtual environment (called in this example, "myenv") as follows:

$ virtualenv --no-site-packages myenv

This makes a new directory myenv in the current directory (which will contain bin, include and lib subdirectories) based on the system version of Python. The --no-site-packages option tells virtualenv not to include any third-party packages which might have been installed into the system Python (see the virtualenv documentation for details of other options).

To start using the new environment, run the environment's "activate" command e.g.:

$ source myenv/bin/activate

The shell command prompt will change from e.g. $ to (myenv)$, indicating that the "myenv" environment (and any packages installed in it) will be used instead of the system Python for applications run in this shell. (Note that the Python application code doesn't need to be inside the virtual environment directory; in fact this directory is just used for the packages associated with the virtual environment.)

Finally, when you've finished working with the virtual environment you can leave it by running the deactivate command (also in the bin directory).

(On Windows you may have to specify the full path to the "Scripts" directory of your Python installation when invoking the easy_install and virtualenv commands above, e.g. C:\Python27\Scripts\virtualenv. Also, note that when a virtual environment is created it won't contain a "bin" directory - instead it's activated by invoking the Scripts\activate batch file in the virtual environment directory. Invoking the deactivate command exits the environment as before.)

2. pip: installing Python packages

Once you've created a virtual environment you can start to add packages (which is really the point of doing this in the first place). virtualenv automatically includes both easy_install and an alternative package installer called pip (at least, for virtualenv 1.4.1 and up; earlier versions only have easy_install, so you'll need to run easy_install pip within the virtual environment in order to get it).

Most packages that are easy_installable can also be installed using pip, and it's designed to work well with virtualenv. However I think its main advantage is that it offers some useful functionality that's missing from easy_install - most significantly, the ability to uninstall previously installed packages. (Other useful features include the ability to explicitly control and export versions of third-party package dependencies via "requirements files" - see the pip documentation for more details.)

Basic pip usage looks like this:

(myenv)$ pip install python-dateutil # install latest version of a package

(myenv)$ pip uninstall python-dateutil # remove package

(myenv)$ pip install python-dateutil==1.5 # install specific version


(As an aside, the python-dateutil package is illustrative of one of the advantages of using pip over easy_install: after installing the latest version of python-dateutil, I discovered that it's only compatible with Python 3 - an earlier 1.* version is required to work with Python 2. pip let me uninstall the newer version and reinstall the older one.)
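
As a quick sanity check after this kind of to-and-fro, you can ask Python itself which version is active in the environment - a minimal sketch, assuming setuptools is present (virtualenv installs it by default):

# confirm which python-dateutil version the environment will actually use
import pkg_resources

dist = pkg_resources.get_distribution("python-dateutil")
print("python-dateutil %s installed at %s" % (dist.version, dist.location))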

3. yolk: checking Python packages installed on your system

The final utility I'd recommend is yolk, which provides a way of querying which packages (and versions) have been installed in the current environment. It also has options to query PyPI (the Python Package Index). Installing it is easy:

(myenv)$ pip install yolk

Running it with the -l option (for "list") then shows us what packages are available:
(myenv)$ yolk -l
Python - 2.6.4 - active development (/usr/lib/python2.6/lib-dynload)
pip - 1.0 - active
python-dateutil - 1.5 - active
setuptools - 0.6c9 - active
wsgiref - 0.1.2 - active development (/usr/lib/python2.6)
yolk - 0.4.1 - active
(See the yolk documentation to learn more about its other features.)

Summary

Obviously the above is just an introduction to the basics of virtualenv, pip and yolk for managing and working with third-party packages - but hopefully it's enough to get started. If you're interested in using virtualenv in practice then Doug Hellman's article about working with multiple virtual environments (and his virtualenvwrapper project, which provides tools to help) is recommended as a starting point for further reading.

Monday, April 4, 2011

Richard Stallman: "A Free Digital Society?"

About a month ago I was fortunate to attend an IET-hosted lecture by Richard Stallman, entitled "A Free Digital Society?". Probably most famous as the originator of the GNU project (out of which came GNU/Linux) and initiator of the free software movement, Stallman has for many years been an active and vocal advocate for free software, and has campaigned against the excessive extension of copyright laws.

He began the talk with the observation that there is an implicit assumption in the recent movement towards "digital inclusion": that using computers and the internet is inherently good and beneficial. However, as the question mark in the title of his talk indicated, this assumption merits closer attention, as (in his opinion) there are various issues and threats associated with these technologies. These include:
  • Surveillance: technology now makes it possible for ISPs, websites and other organisations to monitor and analyse what individuals do online (e.g. the sites that they visit, things they buy, search terms they use etc) to an extent of which (in Stallman's words) "Stalin could only dream".
  • Censorship: for example, governments or corporations blocking access to particular websites (think Google in China), or even forcing them to close.
  • Restrictions on users imposed by data formats: both proprietary (e.g. Silverlight) and patented data formats (e.g. MP3) restrict what the end user is able to do with the data they encode.
  • Non-free software: here "free" is in the sense of "freedom", rather than price. Non-free software is essentially software that isn't under the control of you, the user - in the case of proprietary software, it's controlled by the owner (for example Microsoft, Apple, Amazon), who is able to insert features (e.g. to track user behaviour) that serve their interests rather than those of the user. By contrast, free software - which by the way you can still charge money for - gives the user four basic freedoms: 0. to run the software for any purpose; 1. to study how the software works, and make changes to it; 2. to redistribute the software as-is; 3. to redistribute the software with your changes (see the free software definition). In this way malicious features can be detected and removed, and control is returned to the user.
  • "Software as a service" (SaaS): in Stallman's definition, "software as a service" is anything where the computation is done by programs that you can't control - this is like non-free software above, because someone else has control and can change how your computing is done at any time without your permission. He made a distinction between things like e-commerce, online storage storage (e.g. Dropbox), publishing (e.g. Twitter) and search (which are about "data" or "communication", and so are not SaaS), and e.g. Google Docs (which does do computation for you, and so is SaaS). (See Stallman's article Who does that server really serve?)
  • Misuse of an individual's data: essentially doing something with your data without your permission, or even your knowledge - for example, passing on personal data to the authorities, unilaterally modifying your data, or even (for example in the case of Facebook) using it for commercial purposes.
  • "The War on Sharing": according to Stallman, sharing is "using the internet for what it's best at", and the war on sharing - whether digital rights management (DRM) technology or threatening internet users with disconnection (as under the UK's Digital Economy Act) - is an attempt by commercial interests to unfairly restrict what users are allowed to do (see Stallman's article Ending the War on Sharing).
  • Users don't have a positive right to do things on the internet: essentially, all the activities that users perform on the internet - communications, payment etc - are dependent on organisations who have no obligation to continue providing those services to you.
This is a pretty long list of issues (hopefully I've accurately captured the essence of each), and while many of them can be mitigated by moving to free software, others (for example, monitoring by ISPs) require other solutions - and Stallman admitted that he's quite pessimistic about the future. Aside from that, it was a fascinating and entertaining talk (including the auctioning of a GNU gnu soft toy to raise funds for the Free Software Foundation) and the subsequent audience Q&A session provided opportunities for elaboration and clarification on many of the issues.

I'm still mulling over many of the issues raised. On the one hand there is a fundamental question about what moral rights you believe individuals should have, both generally and with specific regard to the digital world; and on the other there is the question of what you should do if you feel those rights are not being upheld. Stallman's position is clear and uncompromising: for example, not owning a mobile phone and not using a key card to enter his office (to avoid the possibility of being tracked), and using a netbook that allows him to run 100% free software (down to the BIOS level). It's certainly given me plenty to think about, and I'm looking forward to reading his book of collected essays "Free Software, Free Society" - which might be a good place to start if you're also interested in learning more.

Sunday, April 3, 2011

Book review: "Python 2.6 Text Processing: Beginner’s Guide" by Jeff McNeil

Jeff McNeil’s “Python 2.6 Text Processing: Beginner’s Guide” is a practical introduction to a wide range of methods for reading, processing and writing textual data from a variety of structured and unstructured data formats. Aimed primarily at novice Python programmers who have some elementary knowledge of the language basics but without prior experience in text processing, the book offers hands-on examples for each of the techniques it discusses – ranging from Python’s built-in libraries for handling strings, regular expressions, and formats such as JSON, XML and HTML, through to more advanced topics such as parsing custom grammars, and efficiently searching large text archives. In addition it contains a great deal of general supporting material on working with Python, including installing packages and third-party libraries, and working with Python 3.

The first three chapters lay the foundations, covering a number of Python basics including a crash course in file and URL I/O, and the essentials of Python’s built-in string handling functions. Useful background topics – such as installing packages with easy_install, and using virtualenv – are also introduced here. (A sample of the first chapter can be freely downloaded from the book’s website at https://www.packtpub.com/python-2-6-text-processing-beginners-guide/book). The next three cover: using the standard library to work with simple structured data formats (delimited “CSV” data, “ini”-style configuration files, and JSON-formatted data); working with Python regular expressions (a stand out chapter for me); and handling structured markup (specifically, XML and HTML). Subsequent chapters on using the Mako templating package (the default system for the Pylons web framework) to generate emails and web pages, and on writing more advanced data formats (PDF, Excel and OpenDocument), are separated by an excellent overview of understanding and working with Unicode, encodings and application internationalization (“i18n”).
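
To give a flavour of the kind of thing the regular expressions chapter deals with, here's my own toy example (not taken from the book) using named groups, which make the extracted values much easier to work with than bare indexes:

# regex_sketch.py - a toy named-group example, not from the book
import re

LOG_LINE = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>\w+) (?P<message>.*)")

match = LOG_LINE.match("2011-04-03 ERROR something went wrong")
if match:
    print(match.group("level"))    # ERROR
    print(match.group("message"))  # something went wrong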

The remaining two chapters cover more advanced topics, with some good background theory supplementing the practical examples: using the PyParsing package to create parsers for custom grammars (with a brief nod to the basics of natural language processing using the Natural Language Toolkit, NLTK); and the Nucular package for indexing large quantities of textual data (not necessarily just plain text) to enable highly efficient searching. Finally, an appendix offers a grab-bag of general Python resources, references to some more advanced text processing tools (such as Apache’s Lucene/Solr), and an excellent overview of the differences between Python 2 and 3 (including a hands-on example of migrating code from 2 to 3).
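
For the curious, PyParsing builds grammars from composable objects rather than regex-style strings - the sketch below (my own minimal example, closely modelled on the library's well-known "Hello, World!" demo) gives an idea of the style:

# pyparsing_sketch.py - a minimal taste of PyParsing (requires the pyparsing package)
from pyparsing import Word, alphas

greeting = Word(alphas) + "," + Word(alphas) + "!"
print(greeting.parseString("Hello, World!"))
# -> ['Hello', ',', 'World', '!']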

The book covers a lot of ground and moves fairly quickly; however it adopts a largely successful hands-on approach, engaging the reader with working examples at each stage to illustrate the key points, and this certainly helped me keep up. I was also impressed by the clear and concise quality of code in the examples, and the very natural way that general Python concepts and principles – generators, duck typing, packaging and so on – were introduced as asides. (One very minor criticism is that the layout of the example code could have been improved, as the indentation levels weren’t always immediately obvious to me.) Aside from a surprisingly unsatisfying chapter on structured markup (reluctantly, I would recommend looking elsewhere for an introduction to XML processing with Python) and a few niggling typos, there’s a lot of excellent material in this book, and the author has a knack for presenting some tricky concepts in a deceptively easy-to-understand manner. I think that the chapter on regular expressions is possibly one of the best introductions to the subject that I’ve ever seen; other chapters on encodings and internationalization, advanced parsing, and indexing and searching were also highlights for me (as was the section on Python 3 in the appendix).

Overall I really enjoyed working through the book and felt I learned a lot. I think it’s fair to say that given the rather ambitious range of techniques presented, in many cases (particularly for the more advanced or specialised topics) that the chapters are inevitably more introductory than definitive in nature: the reader is given enough information to grasp the background concepts and get started, with pointers to external resources to learn more. In conclusion, I think this is a great introduction to a wide range of text processing techniques in Python, both for novice Pythonistas (who will undoubtedly also benefit from the more general Python tips and tricks presented in the book) and more experienced programmers who are looking for a place to start learning about text processing.

Disclosure: a free e-copy of this book was received from the publisher for review purposes; this review has also been submitted to Amazon.

Friday, March 18, 2011

Day Camp 4 Developers: Telecommuting

About two weeks ago I took part in the second online Day Camp 4 Developers, on the topic of telecommuting. The idea behind the Day Camp events is to provide software developers with practical knowledge and advice in the area of "soft" skills, to complement their expertise with "hard" skills (i.e. actual coding). In this case five speakers gave consistently excellent web presentations (slides and audio) with different perspectives on remote working, while an IRC chatroom gave all participants a forum to discuss the issues behind the scenes.

Lorna Jane Mitchell started off by asking "Could You Telecommute?". As a teleworker herself, Lorna Jane looked at the environmental, organisational and personal factors that influence the happiness and productivity of the remote worker: for example, ensuring you have a good home working space, and setting clear boundaries between work and personal life (both for yourself and for others). In particular you have to be aware of the tendency for other people to think that working from home is easy, and that your time is infinitely flexible. She also noted that there are some big differences between being part of a distributed team and being a telecommuting member of a co-located team (where you risk feeling isolated), and further differences between employees and freelancers. Particularly for lone telecommuters, it's important to build professional and social support networks that might otherwise be taken for granted in more conventional work settings.

Next, self-described "entreprenerd" Ivo Jansch talked about "The Business Case For Telecommuting". Ivo's company Egeniq is built around a distributed team (essentially using remote working as an organisational model) - so in addition to benefiting individual workers, he suggested ways that telecommuting could positively impact the company's bottom line, for example enabling access to a bigger talent pool and increasing its geographical reach (if providing consultancy services). He acknowledged that this distributed model won't suit every company or industry however, and success requires (amongst other things) a results-driven culture where individuals are trusted to self-manage and have a sense of shared responsibility. Ultimately good communication between team members is paramount.

After the lunch break, Jack G. Ford gave a manager's perspective on setting up a telecommuting programme in his presentation "Can I Work From Home Tomorrow?". Jack introduced himself as an ex-coder who is now the manager for 17 developers in a more conventional environment than Ivo's, but in spite of that his key points seemed remarkably similar: beyond asking whether the company infrastructure can support remote working, the main issues are trust (both with the manager and with the team) and good communication between the manager and the individual. Jack emphasised that as a manager, when you telecommute, "I can't see you," so the telecommuter must stay connected, keep the manager informed, and must not only act professionally but be seen to do so. Although it might seem obvious, this was a fascinating insight into telecommuting from the other side of the management chain.

Ligaya Turmelle's presentation on "Managing the Work/Life Balance" emphasised the challenges of balancing work and home life, with her lists of "the good, the bad and the ugly" of remote working from a teleworker perspective. Ligaya focused especially on balancing family commitments with work commitments, and among some interesting observations (for example, no longer doing the daily commute means you lose some "me time" to yourself), I was most struck by the admission that if you love your work then it can mean sometimes that you want to go on working, and are in danger of not respecting your own ground rules. While noting that situations can differ both for individuals and companies, her advice was: clarify everyone's expectations (e.g. policies for "on-call" hours, weekends, and holidays); set up ground rules and limits (and be disciplined in adhering to them); and try to be flexible and imaginative in how you approach your work.

The final presentation was Avdi Grimm talking about "The Well-Equipped Remote Worker". Avdi is a freelance software developer who is also a "dispersed teams facilitator" and runs the Wide Teams blog. As might be expected from the title, some of the focus was on the hardware and software tools that can help with remote working, but there was just as much information on practices that can support distributed teams. Once again promoting communication is key, and using tools and practices that help team members create good working relationships (for example, utilising social media like Twitter and Facebook, and holding regular face-to-face meetings) can really contribute to this.

Looking back over all the talks, a few common themes had emerged for me:
  • Good communication (both with managers and with other team members) to build trust, keep people informed and avoid misunderstandings;
  • Clarify expectations on all sides, and establish well-defined boundaries between work and personal life. Set ground rules to ensure that those boundaries are respected by others (your boss, your family and friends) and have the discipline to also respect them yourself;
  • Build and maintain your social and professional support networks for when there are problem times;
  • Provide yourself with a good working environment and (software and hardware) tools.
I was also able to relate some points to my own experiences: when I worked briefly as a remote member of a co-located team, I did feel a real sense of isolation; another time as a home teleworker I got the impression from some people that they assumed (not maliciously) that I only did a few hours work a day; and previous experience as part of a large organisation makes me feel that there was some truth in Ivo's comment that "co-location is over-rated", in that it doesn't automatically lead to great communication between individuals or groups.

Overall it was an excellent event and a good use of 8 hours of my Saturday - although the time difference (coincidentally another telecommuting issue) meant that it didn't finish until 10pm UK time, I surprised myself by staying with it to the end. Hats off to Cal and Kathy Evans for organising the day and to the speakers for their excellent presentations. Here's to the next Day Camp 4 Developers!

Sunday, February 27, 2011

MadLab: pancake café and the Omniversity of Manchester

Yesterday I dropped into the Manchester Digital Laboratory (aka MadLab) in Edge Street for the MadLab Café Pancake Day, and enjoyed a couple of hours chatting to various friendly people while eating an extremely tasty pancake and drinking cups of tea (one of my favourite pastimes), and at one point even discussing Outkast's back catalogue.

MadLab describes itself as "a community space for people who want to do and make interesting stuff - a place for geeks, artists, designers, illustrators, hackers, tinkerers, innovators and idle dreamers; an autonomous R&D laboratory and a release valve for Manchester's creative communities." I'm not sure precisely where I'd put myself in that list - I've only been there a couple of times before, for the Python Northwest user group meetings - but the folks I met seemed to be a representative cross section of the target community.

There's a packed and eclectic schedule of (mostly free) events hosted there, which is well worth checking out (see http://madlab.org.uk/events/), but their most recent development is the Omniversity of Manchester - a programme of professional-level training courses that so far have covered experimental film making and physical computing with Arduino, with plans to extend to topics as diverse as web design, Ruby on Rails, writing workshops and urban gardening. These courses won't be free, but the fees will go towards keeping MadLab sustainable and supporting the other free events.

If you're interested in learning more then you can watch their video, and register the subjects you'd like to see covered by taking a moment to fill in their survey.
Personally I think it's a really exciting idea - I'm generally a fan of courses, and many of the proposed workshops are things I'd love to learn more about. It would be great to see the Omniversity take off and help MadLab expand and flourish as a focal point for Manchester's digital community - the more people who find out about it and get involved, the better. And in the meantime I'll be looking forward to the next (undoubtedly tasty) MadLab café event.

Friday, February 25, 2011

Book review: "Simply SQL" by Rudy Limeback

Rudy Limeback's "Simply SQL" (Sitepoint) is an overview of SQL targeted at web application developers, and intended to fill a gap between the basic "SQL 101"-type tutorials (seemingly compulsory in just about every introductory article or book about web programming) and more advanced texts covering topics which at first glance don't seem so relevant to the straightforward day-to-day requirements of many web applications.

The chapters are grouped into two main sections. The first deals with the details of the SQL language and comprises the bulk of the book. It starts with a short introduction to the SQL commands most commonly needed by web developers to create and modify data within the database (all the usual suspects - CREATE, ALTER, INSERT, UPDATE, DELETE and so on - are quickly dealt with here). The rest of this section focuses on the SELECT command (the one used to retrieve information), with each chapter covering one specific clause - FROM, WHERE, GROUP BY and so on - in quite extensive detail, and illustrated with examples from sample applications.
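For anyone (like me) who learns best from runnable snippets, here's a tiny sketch - not taken from the book - of a SELECT built up clause by clause, using Python's built-in sqlite3 module and a completely made-up table:

import sqlite3

# In-memory database with a made-up table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (category TEXT, author TEXT, words INTEGER)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [("python", "alice", 350), ("sql", "bob", 500), ("python", "carol", 420)],
)

# A SELECT assembled in the order the clauses are covered:
# FROM (which table), WHERE (filter rows), GROUP BY (aggregate), ORDER BY (sort).
query = """
    SELECT category, COUNT(*) AS n, AVG(words) AS avg_words
      FROM entries
     WHERE words > 300
     GROUP BY category
     ORDER BY n DESC
"""
for row in conn.execute(query):
    print(row)   # e.g. ('python', 2, 385.0)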

The second section of the book has three chapters covering some basic database design concepts, specifically SQL data types, relational integrity, and the use of "special structures" (such as tables that refer to themselves) for particular situations. The appendices then outline the basics of using some specific SQL implementations, along with details of the sample applications and scripts used in the main part of the book.
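To give a flavour of what a "special structure" might look like (this is my own made-up example rather than one of the book's), here's a table whose rows refer to other rows in the same table, queried with a self-join - again just a sketch using sqlite3:

import sqlite3

conn = sqlite3.connect(":memory:")
# A self-referencing table: manager_id points back at this table's own id column.
conn.executescript("""
    CREATE TABLE staff (
        id         INTEGER PRIMARY KEY,
        name       TEXT,
        manager_id INTEGER REFERENCES staff(id)
    );
    INSERT INTO staff VALUES (1, 'Dana', NULL);   -- top of the tree
    INSERT INTO staff VALUES (2, 'Eli',  1);
    INSERT INTO staff VALUES (3, 'Finn', 1);
""")

# A self-join: the same table appears twice under different aliases.
query = """
    SELECT s.name AS employee, m.name AS manager
      FROM staff AS s
      LEFT JOIN staff AS m ON s.manager_id = m.id
"""
for row in conn.execute(query):
    print(row)   # ('Dana', None), ('Eli', 'Dana'), ('Finn', 'Dana')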

The heavy emphasis on the SELECT statement might seem odd, but it makes a lot of sense in the context of web applications, where data is typically read from the database far more often than it's written. The detailed examples are also excellent - at times invaluable - for clarifying things like the nuances of the different types of JOINs, the subtleties of the GROUP BY and HAVING clauses (useful for aggregating data from subsets of rows in conjunction with summing and averaging functions), and the issues involved in working with time data. I certainly learnt a few things - the GROUP BY clause was completely new to me, as were the distinctions between the FLOAT and DECIMAL data types (DECIMALs are exact - within certain limits - while FLOATs are approximate). I found the brief sections on views, derived tables and subqueries extremely enlightening, as was the discussion of foreign keys in the chapter on relational integrity, and the clear writing style throughout made the book a pleasure to read.
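That exact-versus-approximate distinction is easy to demonstrate outside the database too - Python's decimal module makes the same trade-off as SQL's DECIMAL type (this is just my own illustration, assuming a reasonably recent Python, not an example from the book):

>>> from decimal import Decimal
>>> 0.1 + 0.2                # binary floating point (like SQL's FLOAT) is approximate
0.30000000000000004
>>> Decimal('0.1') + Decimal('0.2')    # DECIMAL-style arithmetic is exact within its precision
Decimal('0.3')
>>> Decimal('0.1') + Decimal('0.2') == Decimal('0.3')
True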

It's important to note that "Simply SQL" is based on the SQL standard, rather than the syntax of specific implementations (although in places it does indicate where there are notable deviations from the standard, particularly for MySQL) - also it doesn't cover any of the programming APIs, so it's not really a reference text (admittedly it doesn't claim to be). However, with its clear and detailed explanations it looks like it would be a useful companion to more traditional reference books or cookbooks, and it will definitely reward re-reading - at any rate, I'm sure I'll be squeezing plenty more juice out of it in the future. So overall: highly recommended.

Friday, February 11, 2011

Don Knuth: BCS/IET Turing Lecture

Earlier this week was the annual Manchester BCS/IET Turing Lecture, and this year's guest speaker was Don Knuth. Possibly he's best known (at least to me) as the author of the seminal "The Art of Computer Programming" (a multi-volume book which he began in 1962, and continues to work on to this day - subvolume 4A is the most recently published, with another 5 sections still to come), and the typesetting system TeX (pronounced "tek", and used for typesetting countless Ph.D theses - including mine). However Knuth's contributions to computer science throughout his long career (he's now in his seventies) are staggering - as are his "extra-curricular" activities, which include writing novels and playing the pipe organ.

So it was quite an opportunity to be able to listen to this giant of computing first hand - even more so since rather than a straightforward lecture, this was actually a Q&A, with Knuth taking questions from the audience. After opening with a concise explanation of the significance of the number 885205232 (which I won't spoil by revealing here, since it's a puzzle in his book "Selected Papers on Fun & Games", other than noting that it involves Alan Turing's manual for programming the Ferranti Mk. I computer), Knuth fielded questions on various topics including: elegance in programming languages, the public's fear of computers, "busy beaver" numbers, the best way to teach programming to elementary schoolchildren, and whether an aptitude for programming is an art or a "genetic defect".

Throughout, his answers were thoughtful, often surprising (for example, making a case for pointers in C as an elegant language feature), consistently interesting, and delivered with characteristic humour (Knuth was once published in MAD magazine, and is famous for the quote "Beware of bugs in the above code; I have only proved it correct, not tried it", amongst others). In response to a question about "what are we 'enabling the information society' to do" (a reference to the BCS's current mission statement), Knuth initially replied "to have jobs", before more seriously reflecting that "there's a long way to go improving what we already have at the moment."

Although Knuth's world of computing feels like it's a long way from the one I inhabit, it was a great privilege to see and hear such a legendary figure - in spite of his age he seems as lively as ever, both physically and intellectually, and still enjoying it - and his career is truly inspiring: when asked what he'd do differently if he had his time again, his reply was that he wouldn't change anything. "In my case," he said, "Murphy's Law hasn't worked - so many things that could have gone wrong didn't."

Monday, January 31, 2011

The What, Where and How of Open Data

Last week I attended a seminar at the Cathie Marsh Centre for Census and Survey Research, given by Rufus Pollock of the Open Knowledge Foundation (OKFN) on the topic of "open data".

Rufus started by showing two example applications built using open data. Yourtopia makes use of data from the World Bank that measures individual nations' progress towards the Millennium Development Goals. Visitors to the site balance the relative importance of different factors (for example, "health", "economy" and "education"), and their preferences are matched with the data in order to suggest which country meets them most closely. Where Does My Money Go? offers various breakdowns of UK government spending and presents these in a way that allows the site visitor to see (for example) how much of the tax they pay is used for things such as defence, environment, culture and so on.
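I'd guess the matching boils down to something like a weighted score per country - the following sketch is purely my own speculation, with made-up factor names and numbers, not Yourtopia's actual method:

# A rough guess at the kind of weighted matching a Yourtopia-style site might do;
# the countries, factors, weights and scores below are entirely invented.
indicators = {
    "Utopia":   {"health": 0.9, "economy": 0.4, "education": 0.8},
    "Dystopia": {"health": 0.3, "economy": 0.7, "education": 0.2},
}
# The visitor's weights, e.g. taken from three sliders that sum to 1.0.
weights = {"health": 0.5, "economy": 0.2, "education": 0.3}

def score(country):
    # Weighted sum of the country's indicators.
    return sum(weights[k] * indicators[country][k] for k in weights)

best = max(indicators, key=score)
print(best, round(score(best), 2))   # Utopia 0.77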

Both sites are eye-catching and fun (and can provide some surprising insights), while at the same time serving more serious purposes. In the context of the seminar Rufus noted that building the two sites also highlighted some key issues when working with these kinds of datasets:
  • Completeness: i.e. the data are not always complete
  • Correctness: i.e. the data are not always correct
  • Ease-of-use: it can take a lot of effort to put the data into a format where it can actually be used (for example, an estimated 90% of the development time for Where Does My Money Go? went on this, as opposed to 10% on actually building the site)
These issues can largely be mitigated by "open data", which has two key characteristics:
  • Legal openness: the data must be provided with a licence that allows anyone to use, reuse and redistribute the data, for any purpose. ("Reuse" in this context can include combining it with other datasets and redistributing that.) An explicit open licence is required (such as those offered at Open Data Commons) because the default legal position for any data - even data posted "openly" on the web - doesn't entitle anyone else to reuse or redistribute it.
  • Technical openness: the data should be in a format that makes it easy to access and work with - in particular, it should be possible to obtain the data in bulk, in a machine-readable, open format. These are pre-requisites for the data to be useful in a practical sense: for example, it's not sufficient to provide the data only via a website that returns subsets of it through a form submission (there's a small sketch below of what bulk, machine-readable access makes possible).
(See the official definition at http://www.opendefinition.org/.)
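As a small illustration of why the "bulk, machine-readable" part matters: if a dataset can simply be downloaded as a CSV file then a few lines of Python are enough to start asking your own questions of it. The file name and column names here are hypothetical:

import csv
from collections import defaultdict

# Hypothetical bulk download of spending data in an open, machine-readable format.
totals = defaultdict(float)
with open("spending.csv", newline="") as f:
    for row in csv.DictReader(f):            # assumes columns: department, amount
        totals[row["department"]] += float(row["amount"])

# Print each department's total, largest first.
for department, total in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{department}: {total:,.2f}")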

The data itself can be about almost anything: geographical (for example, mapping postcodes to a latitude and longitude), statistical, electoral, legal, financial - the OKFN's CKAN (Comprehensive Knowledge Archive Network) site has many examples. The key point is that the data should not be personal - that is, it shouldn't enable individuals to be identified, either directly or indirectly.

The motivation for making data open goes back to the initial issues of completeness, correctness and ease-of-use - it can take a lot of time to assemble a dataset (for example, the Government already collects a lot of data), but once the effort has been made the added cost of releasing it is small, and sharing it reduces the cost of merging, filling gaps and correcting errors. To make an analogy with open source software, it's essentially Linus' Law for data: "given enough eyeballs, all bugs are shallow". Rufus also talked about a corollary to this, the "many minds" principle: the best use of the data you produce will probably be thought of by someone else (and vice versa).

One argument against openness is that it precludes the possibility of commercially exploiting the data in order to offset the costs of compiling it - a topical point given the current economic climate. Rufus's counter-argument is that there are many other ways to fund the creation of data aside from making it proprietary: by considering the data as a platform (rather than as a product), and building on that platform to sell extensions or complementary services (such as consultancy - again there are parallels with open source software). (Some of the audience also expressed concerns that, in principle at least, open data might be used irresponsibly - but arguably if the data is available to all then others can challenge such an interpretation.)

The final point that Rufus's talk addressed is how to actually build the open data ecosystem. To some degree it's up to the people who hold the data, but his suggestions are:
  • Start small and simple (which I took to mean, start with small sets of data rather than doing everything all at once).
  • If you're using someone else's dataset then you can make an enquiry via the OKFN website to find out what the licensing situation is.
  • If you have your own datasets then put them under an open data licence and register them at CKAN so that others can find them.
  • "Componentize" your data to make it easier to reuse (which I took to mean, divide the datasets up into sensible subsets).
  • Make the case to whoever holds the data you want (government, business etc) for releasing it openly.
For me as a "lay person", this was a fascinating introduction to the world of open data. Not unreasonably the seminar didn't go into details of actually working with such data (I think many of the seminar audience members were researchers already familiar with the available tools). However afterwards Rufus made the point that writing a paragraph of text after looking at the data is just as valid as the slick visualisations provided by Where Does My Money Go? and other sites. Ultimately it's having open access to the data in the first place that counts.

Sunday, January 23, 2011

Python North-West: The Python Challenge

Last week I went to my first-ever Python North-West meeting, at the Manchester Digital Laboratory (aka MadLab). The webpage describes it as a "user group for Pythoneers and Pythonistas of all levels and ages, open to everyone coding 'the way Guido indented it'", and meetings alternate between talks and coding dojos (group coding sessions where people get to share code and ideas with the aim of improving their knowledge and skills - see http://codingdojo.org/cgi-bin/wiki.pl?CodingDojo for more information).

This particular meeting was a coding dojo and so as a group we worked through The Python Challenge (http://www.pythonchallenge.com/), which is a series of puzzles that can be solved using Python programming combined with some imagination and lateral thinking. While most people had come with their own laptops, the format that developed was for one person to "drive" the laptop connected to the overhead projector, typing in code and taking suggestions from the others.

Although I'd already looked at the first two challenges earlier in the day to get an idea of what was involved, the group setting provided a great opportunity to see how other people worked, and to learn about bits of Python that I was unfamiliar with - one example for me was being introduced to list comprehensions, which are concise ways to generate lists, e.g.:

>>> vec = [2, 4, 6]
>>> [[x,x**2] for x in vec]
[[2, 4], [4, 16], [6, 36]]

(although there were several other examples which I won't write about here so as not to spoil the challenges for others). Also, as many of the challenges began with having to figure out what the programming problem actually was, it meant that collectively we didn't get stuck for too long on any particular puzzle - I know that at least a couple would have had me completely stumped if I'd been on my own. For me personally it was also an opportunity to play with IDLE - Python's IDE - under Windows (not an environment that I've used much in the past, but quite handy for this kind of exploratory programming process).

Overall it was great to get out and interact with other Python developers in an enthusiastic and friendly atmosphere, while at the same time broadening my knowledge of the language - and now I've had a taste I'll definitely be back for future meetings.