Thursday, December 2, 2010

City Tickets

Recently someone on the MySociety mailing list posted a link to Mayo Nissen's blog entry about an interesting project called City Tickets.

While it's not immediately obvious from the post (especially given the very convincing pictures), City Tickets is Mayo's final thesis project at the Copenhagen Institute of Interaction Design, rather than an actual existing system. In it he proposes augmenting existing parking ticket machines so they can also dispense so-called "city tickets": small receipt-like forms specific to the local area around the machine, to be filled in with details of repairs or suggested improvements (e.g. potholes in the road, or providing a bench to sit on). Completed tickets would be submitted to the local authority for action, and tickets would also show the "to-do list" of pending works.

The principle of City Tickets is very similar to MySociety's existing FixMyStreet website. FixMyStreet provides a simple but efficient way for people to report problems such as graffiti, fly-tipping, broken pavements and so on to their local council: problems can be pinpointed on a street-level map (making them easier for council representatives to locate), the details are forwarded to the local council by the website on your behalf (removing a significant barrier to reporting problems), and lists of unresolved problems can be viewed and discussed (making it easier to follow up).

FixMyStreet is a great idea (as are all the MySociety projects) and does get results. The interesting twist with City Tickets is the possibility of extending this beyond the web, removing another potential barrier to participation (the reverse of the trend of moving access to government services and information online, which arguably excludes the significant proportion of the UK population without internet access). More generally, the idea of co-opting an existing technology infrastructure for social good feels like it should have potential. While using parking meters might not work everywhere, what about ATMs or payphones, for example? Or perhaps it could be a very worthwhile target application of ubiquitous computing technologies?

It seems unlikely that we'll see anything like City Tickets in real life any time soon, which is perhaps a shame. But one of the great things about projects like this is that they can suggest exciting possibilities we might not otherwise have imagined. And in the meantime, keep supporting FixMyStreet!

Tuesday, November 30, 2010

MySQL: MyISAM versus InnoDB

Last week I saw an interesting article by Craig Buckler on one of the Sitepoint blogs: "Top 10 MySQL Mistakes Made By PHP Developers". I was already aware of a few of them (for example sanitizing user input, avoiding * in SQL SELECT statements) - and I've been using the PDO (PHP Data Objects) interface for database access pretty much since I started with PHP, which made recently switching an application from an SQLite to MySQL backend almost trivially straightforward.

However the first mistake listed was something I hadn't previously come across: "#1: Using MyISAM rather than InnoDB" - which prompted me to do a bit of background research.

MyISAM and InnoDB are two of several database "storage engines" offered by MySQL (a storage engine is the underlying software that implements the low-level database operations to create, read, update and delete data - you can list all the supported engines for a MySQL installation using SHOW ENGINES). Surprisingly (to me at least) the engine type is specified at table- rather than database-level, and can be done when first creating the table:

CREATE TABLE myTable (
...
) ENGINE = MyISAM | InnoDB;


If the engine isn't explicitly specified then MySQL uses the default (MyISAM for MySQL versions before 5.5.5, InnoDB since).
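If you're not sure which default your own installation uses, there seem to be a couple of quick ways to check - SHOW ENGINES (mentioned above) flags the default engine as DEFAULT in its Support column, and you can also query the relevant server variable directly:

SHOW ENGINES;                               -- look for DEFAULT in the Support column
SHOW VARIABLES LIKE '%storage_engine%';     -- storage_engine or default_storage_engine, depending on version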

The key differences between the two are:
  • InnoDB supports transactions, but MyISAM doesn't. Transactions enable several SQL statements to be grouped together and then either accepted or discarded as a single atomic unit. This helps maintain database integrity - the database state is only changed if all statements in the transaction are executed successfully. (There's a rough sketch of this after the list.)
  • InnoDB uses row-level locking, whereas MyISAM uses table-level locking. This means that InnoDB offers better concurrency for tables with a high volume of updates at the same time as reads. (See another Sitepoint blog post for a real-world example of where this can make a difference: "Free Performance With MySQL Table Types".)
  • InnoDB supports foreign-key constraints, which again can help maintain database integrity - operations on one table will fail if some condition (say, a matching row in another table) is not met. (This is also shown in the sketch after the list.)
  • MyISAM supports full-text indexing; InnoDB doesn't.
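To make the first and third points a bit more concrete, here's a rough sketch of both features in action (the table and column names are made up for illustration; both tables need to use InnoDB for this to work):

CREATE TABLE customers (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE orders (
    id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT NOT NULL,
    -- foreign-key constraint: every order must reference an existing customer
    FOREIGN KEY (customer_id) REFERENCES customers(id)
) ENGINE = InnoDB;

-- A transaction: either both inserts take effect, or neither does
START TRANSACTION;
INSERT INTO customers (name) VALUES ('Alice');
INSERT INTO orders (customer_id) VALUES (LAST_INSERT_ID());
COMMIT;    -- use ROLLBACK instead to discard both changes

-- This insert fails because customer 999 doesn't exist,
-- so the foreign-key constraint keeps the data consistent:
INSERT INTO orders (customer_id) VALUES (999);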
From what I understand, in the past MyISAM's principal advantage over InnoDB has been its speed, operating significantly faster especially for applications where there is a high volume of reads compared with updates. However more recent implementations of InnoDB seem to have closed the performance gap enough that this is no longer an issue (which is presumably why MySQL now uses it by default).

New tables created using the latest MySQL will automatically use InnoDB, but for existing databases you can find out which engine your tables are using (along with other information) with:

SHOW TABLE STATUS;

The storage engine can be changed using the ALTER TABLE command, e.g.

ALTER TABLE myTable ENGINE = InnoDB;

Clearly it's possible to mix tables with different storage engines, but presumably great care needs to be taken to understand the implications (for example, what happens if a transaction that involves InnoDB and MyISAM tables fails?). There's more about storage engines in the MySQL reference manual.
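Regarding that mixed-engine question: as far as I can tell, only the changes to the InnoDB tables get rolled back - statements against MyISAM tables take effect immediately and can't be undone, and MySQL just issues a warning. A minimal sketch, with made-up table names and assuming inno_log uses InnoDB and my_log uses MyISAM:

START TRANSACTION;
INSERT INTO inno_log (message) VALUES ('hello');   -- transactional, can be rolled back
INSERT INTO my_log (message) VALUES ('hello');     -- non-transactional, applied immediately
ROLLBACK;
-- The inno_log row disappears, but the my_log row remains;
-- SHOW WARNINGS reports that non-transactional tables couldn't be rolled back.
SHOW WARNINGS;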

Returning to the original posting, Craig's rather blunt statement about favouring InnoDB over MyISAM seemed to generate plenty of controversy amongst the commenters, but from what I've read InnoDB does look to be the better choice even if you're not currently taking advantage of transactions or foreign key constraints (which I'm not). I think there's also a wider point (as hinted at in his mistake #5: "Favouring PHP over SQL"): to use something effectively, you need to take time to understand it and the context in which it's operating. So this post is my contribution to the cause - hope it helps.

Thursday, October 21, 2010

nXhtml: an Emacs mode for web development

I'm a fan of Emacs but I've often been frustrated when using it to edit PHP files. Although the default Emacs major mode for PHP generally did an okay job with "pure" PHP files (i.e. those only containing PHP code), I would still find myself occasionally struggling against Emacs automatically insisting on mis-formatting my code. And when the files contained a mixture of PHP and HTML (commonplace in web applications), the auto-indentation could get quite deranged.

Programming is difficult enough sometimes without having your editor working against you as well - in fact for a while on Windows I even switched to using Notepad++ (which solved the problem by not having any auto-indentation at all). The problem with Emacs is two-fold: first, I felt I needed a better PHP mode than the default; and second, it needed to be able to deal with "mixed modes" (when two major modes - in this case PHP and HTML - are used in a single file).

Initially php-mode looked promising, working really nicely with pure PHP - but unfortunately the solution for handling embedded HTML (to toggle manually between PHP and HTML modes as required using the M-x html-mode/M-x php-mode sequences) didn't feel that practical to me.

However the documentation for php-mode also suggested nXhtml, which describes itself as "an Emacs mode for web development". nXhtml includes a version of php-mode along with support for mixed modes, and it does a great job of handling PHP both on its own and with embedded HTML. The automatic formatting is nicely behaved so it works for rather than against me, and the syntax highlighting also distinguishes PHP code sections from those containing HTML. In addition there are some other useful-looking features (such as tag completion) that I haven't really explored yet.

So it looks like nXhtml was the solution I was looking for all along. It's a straightforward install: on Linux, download the nXhtml.zip file, extract the contents to e.g. $HOME/nxhtml/, then edit $HOME/.emacs to include:

(load "your-path-to/nxhtml/autostart.el")

For Windows it's even easier: you can download and install a version of Emacs bundled with nXhtml (though note that I couldn't get nXhtml to work with an existing Xemacs 21.4 Windows install).

Although I haven't worked with it extensively yet, so far I've found nXhtml a great improvement for working with my web application code, and I'd recommend that anyone interested in using Emacs for PHP development give it a try.

Wednesday, October 13, 2010

Learning Eclipse: Total Beginners Tutorials

A couple of weeks ago I decided to start learning about the Eclipse IDE, which looked both powerful and daunting on the previous occasions I'd seen it. Luckily around the same time I also stumbled across the free video tutorial Eclipse and Java for Total Beginners by Mark Dexter, which gives an excellent hands-on introduction to Eclipse by leading the viewer through the development of a simple Java application.

There's a substantial amount of material in the 16 videos - each is around 12 to 16 minutes long, but pausing playback to follow along on my own machine effectively doubled the running time, so the total length was more like 8 hours for me (roughly a day's worth of training). Although the tutorial is based on Eclipse 3.3 (it's dated 2007) and is running on Windows, I didn't see any significant differences compared to using Eclipse 3.5 (the version fetched via Synaptic) on Ubuntu Linux 9.10.

The tutorials begin with Mark outlining the main features of the Eclipse workbench (i.e. main Eclipse window containing various subwindows, such as the editor, package explorer, console and so on). He quickly progresses to writing Java code, showing how Eclipse helps by giving rapid feedback as you type, and then introduces two powerful features, code assist and quick fix:
  • Code assist is invoked using ctrl+space and provides a list of possible completions based on the current context, from which the programmer can select the appropriate one. (Code assist can also insert templates - for example when creating new methods - containing fields and hints for the programmer to fill in.)
  • Quick fix is invoked using ctrl+1 (or by right-clicking) on an error that has been identified by Eclipse. It then provides a set of suggestions (or "proposals") as to how that error could be fixed (for example, offering to correct a mis-typed variable, or to create new classes or methods). Accepting a proposal performs the correction automatically.
These two features made me feel like I had my very own Java expert to help me whenever I needed, and with practice I found in some cases it really sped up coding - especially when combined with JUnit, the next major component that Mark talks about.

JUnit is Java's unit testing framework, and its tight integration into Eclipse makes it easy to adopt a test-driven development (TDD) approach (where the test cases for new classes and methods are written before the classes and methods themselves are implemented). As well as providing a way of checking that the initial implementations are correct, the test cases also provide a detailed specification of how the implementation should behave.

Writing the test cases first means that they actually get written, which is significant enough on its own (writing tests after the code always feels like a major chore). However the whole test driven approach steps up a gear once it's coupled with the quick fix mechanism, as it enables the following workflow for rapid development:
  • Write the initial JUnit test cases for a new class and its methods.
  • Eclipse will identify the missing class and methods as errors, and stubs can be automatically created using quick fix.
  • Fill out the stubs with real code, running the unit tests as required to verify that the code is working properly.
This might not sound like much but when put into practice it felt like something of a revelation to me - essentially once the tests are written you can quickly generate the "scaffolding" code for the implementation and then spend most of your time writing the code that's specific to your application.

Eclipse makes it easy to rerun the tests and identify errors, and Mark shows how it also helps to catch regressions later on when the code is modified or extended (if the tests suddenly stop working then the programmer knows that either they broke the code, or that the tests need to be updated). The process continues in the same way: first update or extend the test cases, then update the code (using quick fix whenever possible to make it easier). At this point he also introduces some of Eclipse's refactoring tools to help (for example, by automatically making local method variables into class fields, or extracting blocks of code into new methods).

There are a lot of other useful extras that Mark throws in along the way but which I won't mention here. However I will make an exception for one gem: the scrapbook feature essentially allows you to evaluate snippets of Java code without having to write a full program (with one caveat: code assist doesn't always work in the scrapbook). It's a useful but oddly well-hidden feature (right-click on the current project, select "New..." then "Other...", expand the "Java" group and then the "Java Run/Debug" group - you get the idea), giving you the Java equivalent of firing up an interactive interpreter to test out fragments of code in Python.

In conclusion, I really enjoyed working through these tutorials and definitely learned a lot more by performing the steps alongside watching the videos (especially with test-driven development, an unexpected high-value bonus). There's a lot of material but the pacing and structure is spot-on, Mark's delivery is relaxed and engaging, and Eclipse turns out to be fun to use too. So although there's a lot more to learn about Eclipse (and Java!), I'd really recommend this if you're new to the system and want help getting started (and then follow it up with the much shorter Eclipse Workbench Tutorial, to learn more about manipulating the IDE itself). Great stuff.

Monday, October 11, 2010

PHPNW10

Last weekend I was in Manchester at PHPNW10, the annual conference for the north-west of England PHP community. It was a fairly last minute decision to attend but looking at the conference programme persuaded me that I'd be rewarded with lots of good material, and I wasn't wrong - it's going to take me a while to process the volume of quality information from the talks I attended.

With that in mind I won't try to do more here than just summarise the sessions I was in on Saturday, kicking off with the keynote talk by Lorna Mitchell on professional development ("Teach a man to fish"). While at first this might have seemed a little out of place amidst all the technical content, it was entirely appropriate given that many people (including me) come to these events to learn. Much of the focus was on benchmarking and improving the skills of your team as a whole (making the point that skills gaps only exist in the context of knowing what skills are actually needed), but emphasis was also placed on individuals taking the responsibility for their own professional development. There was some solid practical advice for making the business case for training to managers, and suggested alternatives to formal training courses (for example, allowing developers "study days" on company time) if you have "a training budget of zero." And if you're working on your own then the suggestion to make your own "team" sounds like a good one to try.

After the keynote the sessions split into three parallel "tracks" for the remainder of the day, and in general I chose to focus on those talks on basic development tools and methodology (with a plan to catch the talks I missed about geolocation, REST interfaces and so on later online).

Things were off to a great start with Robert Mortimer's talk "Let your toolchain set you free", which dealt with choosing and using appropriate tools for setting up your development environment (Linux-Apache-MySQL-PHP i.e. LAMP), source code control (subversion), performing code validation (PHP_codesniffer), enforcing coding styles (PHP_Beautifier), generating documentation (docblockgen), unit testing, debugging (Xdebug) and developing using an IDE (NetBeans). Accompanied by live demos and full of little tricks to automate things, it was a dizzying talk.

I followed that with Ian Barber's talk "Debugging - Rules and Tools". At the beginning Ian confessed to being something of a debugging nerd and described the satisfaction he felt when tracking down bugs in an application - something I identified with from my former life in software maintenance. His presentation was structured around the nine rules outlined in David J. Agans' book "Debugging", giving practical advice and suggesting useful tools and resources for each rule. As with the earlier talk there was an almost overwhelming number of tips and tricks (a small number of examples: JMeter for load-testing; Selenium for simulating user interactions; MySQL Proxy to isolate your database; Tamperdata to modify http requests...). I also learned a new word, "heisenbug", to describe an intermittent fault.

After lunch and a chat with some fellow attendees I dropped into Marco Tabini's talk describing the development of the php|architect website ("The curious case of php|architect"). Marco focused on the higher-level design decisions - and the reasoning behind them - that had been made at various stages in the site's evolution, reminding the audience that real-world projects don't always follow a smooth path.

Marco was followed by "Developing Easily Deployable PHP Applications", given by John Mertic of SugarCRM. At the beginning John made the important point that you need to define the "support matrix" (i.e. the set of environments defined by combinations of operating system, web server, database and PHP version) for your application, as knowing this will inform your design and test options. He then went on to describe how this is managed for SugarCRM, raising a number of interesting points - for example, providing hooks and other mechanisms to make customising, configuring and extending the application as easy and as maintainable as possible for the end user.

The last real talk was Harrie Verveer's "Database version control without pain", which aimed to address the issues with keeping changes to the database schema in sync with changes to your application code. The problem is that the mechanism used for patching PHP code as part of an update can't be used to update the database - in this case the patches are in the form of SQL code rather than code differences - and Harrie conceded that (contrary to the title) there isn't a magic bullet for painless database version control. Nevertheless he did a great job of outlining the pros and cons of the various options, ranging from a simple patching strategy (probably the way to go for me at the moment) through to tools like Phing, Liquibase, Akrabat DB Schema Manager and Doctrine Migrations.

I was feeling a little brain-dead by then, and as entertaining as it was, much of the substance of the closing "framework shoot-out" was lost on me (particularly as I'm not a framework user). But no matter - because I'd had a great day at a well-run conference, met some great fellow software developers, and been exposed to far more useful information than I could have hoped to find elsewhere on my own. I know it's going to take me some time to follow up on the various tools and techniques - in the meantime thanks to all the organisers and speakers for making my first PHPNW conference such a memorable and enjoyable one.

Friday, October 8, 2010

Open Source Content and Document Management in the Public Sector

Yesterday I attended another of the Manchester BCS's open lectures, this time an excellent talk by Graham Oakes entitled "Open Source Content and Document Management in the Public Sector". As the title implies, the talk focused on issues associated with using open source solutions for content management systems within public sector organisations and essentially reported the conclusions from a meeting held by the BCS Open Source Specialist Interest Group at the start of the year.

That meeting in turn had been inspired by a Guardian article reporting on a website project by Birmingham City Council ("Why can't local government and open source be friends?", 7th August 2009). The article suggested that the spiralling cost of the project could have been significantly reduced if the council had opted for open source software (with the defining characteristic of being free to install and use for any purpose) for the content management system (CMS) behind the website, rather than a costly proprietary solution.

Certainly there is now plenty of mature and robust open source software that is recognised by industry analysts as being "enterprise-ready" (possibly the most famous example being the Apache webserver, "the most popular web server on the Internet since April 1996"), and this includes open source CMS's being used successfully to run the websites of prominent public bodies (Graham cited several notable examples including various UK police forces, the C.I.A. and even the White House). So open source is certainly a viable option for these applications.

Also there are many apparent benefits over proprietary software:
  • Lower front-end costs: the software is free to install and use; also it's cheap to experiment with before committing to it wholesale.
  • Easier to work with: the source code is visible and can be modified as required; also there are no licensing issues for virtualization or cloud computing applications (which can be a real problem when using proprietary software).
  • Favours incremental delivery: you can start small and build up over time, rather than having to deliver one huge project all at once.
Another potential benefit for public sector organisations is that use of open source can demonstrate a commitment to openness by the organisation. However Graham was keen to point out that there are also a number of risk factors which counterbalance these benefits:
  • Misunderstood costs: probably the greatest misconception about open source: just because the software is free to download doesn't mean it costs nothing to run - someone still needs to support the system day to day. But perhaps more significantly, the biggest costs associated with the implementation of any CMS - whether open source or proprietary - are for things like content migration and training. (It's probably in these areas that the aforementioned city council website project actually went wrong, as the CMS software is likely to have been around only 10% of the total budget.)
There are other "misperceptions" (for example, having access to the source code is not an intrinsic benefit - it also requires that your organisation has the expertise to exploit it). Graham also cited some cultural factors that might work against the successful use of open source by public organisations:
  • Mismatch of scale: generally open source operates on a much smaller scale than government organisations, which are better at interacting with large corporations.
  • Broken procurement models: government procurement is geared towards buying licences rather than buying services, and this bias tends to favour proprietary solutions over open source. (An interesting later observation was that this model tends also to favour large programmes over smaller ones - but that in his opinion the larger a project is the greater the chances of failure are.)
(A further point was that people can have a "philosophical bias" towards or against open source, leading to unreasoned decisions about which technology to use, regardless of how well it matches the requirements.)

In conclusion, while there are potential benefits to public sector organisations using open source, it's certainly not a given. In many ways using open source isn't that different from using proprietary software. He closed with a few conclusions:
  • All open source is not the same: there are variations in quality, capabilities and levels of community support, so these should be assessed before committing to a particular solution.
  • Focus on your problem and not on the technology.
  • Consider the total life-cycle costs: look beyond just the start-up costs when comparing open source and proprietary software.
  • It's the team and not the technology that creates success: good people will still succeed using mediocre tools.
Hopefully I've accurately communicated the key points of Graham's fascinating talk. I'd add that many of his conclusions chimed with those from "Open Source for the Enterprise" (Dan Woods & Gautam Guliani, O'Reilly Media), which makes the case for open source in the private sector - and reiterates the point that informed consideration of the benefits and risks of any technology is vital in making good choices.

Monday, October 4, 2010

Devministrators and DevOps

Yesterday evening while reading a free sample of Jeff Barr's "Host Your Website In the Cloud" I downloaded from Sitepoint, I happened upon a term I'd never seen before: devministrator.

The way the word is used in the book seems to indicate a software developer who also does system administration tasks, so on that basis I guess I could describe myself as a devministrator - I'm the sole sys admin supporting my personal Ubuntu system used for my Linux development work - but a Google search turns up surprisingly little in the way of a concrete definition.

Google does pull up a few blog postings however, including one by Kris Buytaert, who in turn traces it to a 2008 Cloud Cafe podcast (about an open source configuration management tool called Puppet) which might even be the source of the word; there it describes system administrators who apply software development best practices (e.g. version control, continuous integration testing and automated scripting) to solve their administration problems.

However in looking these up I came across another term which aims to encapsulate a similar idea and which seems to have gained more traction: the DevOp (apparently a contraction of "developer" and "sysop" - itself a contraction of "system operator"). As described in this recent article from IT World, "The New Type of Programmer: DevOp" [warning: link opens with an ad], a DevOp brings together coding expertise with a detailed understanding of how to manage and configure the environments that the code operates in. And bringing me full circle, according to the article, it's the brave new world of cloud computing that's creating the need for this "new type of programmer": "A cloud developer needs to understand the operating environment ... as well as the development environment."

On that basis I'm probably more devministrator than DevOp. But anyway, there's an interesting overview of the DevOp movement (almost a manifesto) by Patrick Debois, What Is This DevOps Thing, Anyway, which offers a much broader perspective than just cloud development - though I'd also recommend the free sample of Jeff Barr's book if (like me) you're a non-expert looking for a reasonably detailed introductory overview of the technical aspects of developing for the cloud.

Saturday, October 2, 2010

MySQL user administration basics

Last week I spent a frustrating few hours setting up a script to copy a MySQL-backed web application from CVS into a local test environment. In the process I somehow broke the database access settings in a way I didn't understand, and - despite the fact that it was probably quite a trivial error - all my attempts at manually resetting the MySQL user and password failed. Finally I resorted to using phpMyAdmin to sort out the immediate problem, but afterwards I felt I needed to learn more about the basics of MySQL user administration.

First, a bit of background: it's typical for applications such as mine to access the database via a dedicated MySQL user explicitly created for the purpose, and with access rights restricted to only the data that it actually needs. This acts as a security measure to limit the potential damage to the database as a whole if the application is compromised. (Any application that uses MySQL for its database backend should also implement its own independent application-specific user management system, but this is a separate issue not covered here.)

The first point is that MySQL's users are entirely distinct and separate from the users defined on the host system - so they also need to be managed separately. The essential SQL commands for doing this are:
  • GRANT: creates a new user and sets its privileges (if the username/hostname pair specified doesn't already exist); modifies the privileges for an existing user (if it does exist):

    GRANT privileges ON data TO user IDENTIFIED BY password;

    For example:

    GRANT SELECT,INSERT,UPDATE,DELETE ON myappdb.* TO 'myappdbuser'@'localhost' IDENTIFIED BY 'quitesecret';

    gives the user myappdbuser@localhost a limited set of privileges for all tables in the myappdb database. (To grant all permissions, privileges can be specified as ALL; data can be specified as *.* to grant the rights to all tables in all databases. Also, note that more recent versions of MySQL seem to support an explicit CREATE USER command in addition to GRANT - there's a sketch of this after the list.)

  • REVOKE: the opposite of GRANT; removes a user's privileges:

    REVOKE privileges ON data FROM user;

    For example:

    REVOKE INSERT,UPDATE,DELETE ON myappdb.* FROM 'myappdbuser'@'localhost';

  • DROP USER: removes a user, for example:

    DROP USER 'myappdbuser'@'localhost';

  • SET PASSWORD sets or changes the password for a user, for example:

    SET PASSWORD FOR 'myappdbuser'@'localhost' = PASSWORD('newsecret');

  • To get a list of users as username/host pairs: execute a SELECT query on the user table of MySQL's mysql administrative database (where the user data is stored):

    SELECT User,Host FROM mysql.user;
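As mentioned under GRANT above, newer MySQL versions also provide an explicit CREATE USER statement, which separates creating the account from granting its privileges. A quick sketch using the same hypothetical account as the earlier examples:

CREATE USER 'myappdbuser'@'localhost' IDENTIFIED BY 'quitesecret';
GRANT SELECT,INSERT,UPDATE,DELETE ON myappdb.* TO 'myappdbuser'@'localhost';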
These commands are actually very straightforward, however based on recent experience two particular points are worth emphasizing:
  1. Each user is uniquely identified by a username and host pair (not just by the username) - so to MySQL fred@localhost, fred@example.com and fred@% are all different and distinct users.
  2. Each user has its own set of associated privileges, specifying which database operations the user is allowed to perform and on which data - so it's possible for each of fred@localhost, fred@example.com and fred@% to have different privileges.
These are significant when considering connection requests where there is an ambiguity in the user specification (for example, omitting the hostname) which results in more than one potential match in the user table. In this case MySQL uses the most specific match, which might not be the one that was intended. The connection might then be denied (if the intended and actual users have different passwords), or have subsequent problems executing SQL queries (if the two users have different privileges). So I'd also recommend always specifying users explicitly whenever possible with the full username@hostname pair.
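To make this concrete, here's a contrived sketch of the kind of situation that can cause confusion (the account names are made up for illustration):

-- Two distinct accounts that happen to share a username:
GRANT ALL ON myappdb.* TO 'fred'@'localhost' IDENTIFIED BY 'secret1';
GRANT SELECT ON myappdb.* TO 'fred'@'%' IDENTIFIED BY 'secret2';

-- Connecting as "fred" from the local machine matches the more specific
-- 'fred'@'localhost' entry, so the password must be 'secret1' and the session
-- gets full rights on myappdb; the same login from any other host matches
-- 'fred'@'%' instead, needs 'secret2', and can only run SELECTs.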

Hopefully this overview has given some insight into the basics of MySQL user administration, although obviously there's a lot more than I've outlined here (for example I've skipped over many of the details of the GRANT command - see the section Account Management Statements in the MySQL manual for more comprehensive information) - and I'd still recommend using phpMyAdmin or similar for database user management (especially if you don't do it that often). However I hope it's still useful - and I'd welcome any comments or corrections.

Sunday, September 26, 2010

Basic CVS from Emacs

I've been a long-time user of Emacs both on UNIX/Linux and on Windows, and I'm reasonably proficient now I suppose - I have a small set of keyboard shortcuts that I use regularly for things like switching between buffers and doing query-replace operations - but I'm aware that I've only scratched the surface of what the program is capable of (after all, I still think it's "just" an editor).

So my most recent discovery is probably not news to a lot of people, but I still think it's pretty neat: specifically, the ability to access CVS version control operations on a file directly from within Emacs using its standard version control interface, vc.el.

A quick summary of the basics: to load vc.el and activate the functions, first do Ctrl-x Ctrl-q, then:
  • Use the sequence Ctrl-x v = to see differences between the current buffer and the last CVS version (i.e. cvs diff), and
  • Use the sequence Ctrl-x v v to commit your changes - this opens a new buffer to write the commit message, and when you're done use Ctrl-c Ctrl-c to finish the commit.
Other operations are available but I think these are the ones that will be most useful to me, as it means I won't need to leave the editor in order to commit changes from the command line. (Also with this method I don't need to "revert" the buffer after CVS has updated any keywords in the file, which I always have to do when using editor-plus-command-line.)

So far I've been using this with GNU Emacs 22.2.1 and CVS 1.12.3 on Ubuntu Linux. I'm guessing that Emacs works out that the file is under version control by detecting the CVS subdirectory, and that the same (or similar) key sequences might also work for SVN - but I have no idea yet if this will work on Windows using Xemacs and TortoiseCVS (I will report back once I've had a chance to investigate further).

Thursday, September 23, 2010

Stormy weather? Cloud Computing caveats

The concept of "cloud computing" suddenly seems to be going mainstream - as well as receiving several emails advertising Sitepoint's latest book "Host Your Website in the Cloud" (about how to use Amazon's cloud computing platform AWS), I've seen a whole bunch of articles and events about various aspects of the Cloud - not all of them uncritical (for example Dai Davis' BCS talk, which I reported on previously).

Today I came across another article, "Cloud Computing caveats" by James Hayes (on the IET website), which draws attention to some of the issues that businesses should perhaps be considering before rushing to use cloud-based services. Ignoring the first caveat ("Is it new?"), which really just gives a brief history of how today's cloud evolved from what went before (summary: internet connections have only recently achieved the capacity and resilience required to make cloud services possible), the remaining concerns are much the same as those expressed in Dai's talk:
  1. Data ownership: once you put your data into the cloud, you've given up a degree of control. Do you understand the implications, and are they acceptable both legally and operationally?
  2. Service level agreements: if the service becomes unavailable for some reason, what reassurances do you have from your cloud provider about how long it will take to put it right? What would the impact of downtime be on your business?
  3. Risk appraisal: cloud services might have a low up-front cost that is particularly attractive to smaller enterprises, but have they properly assessed the risks (and potential costs - not just financial) if the service encounters problems?
  4. "Cloud governance": in the traditional model of enterprise software procurement, a company's IT department could exercise a high degree of control over what software was used there. The ease of access to cloud computing services offering equivalent functionality threatens to bypass these controls (including assessments of longer term costs and risks).
The software geek inside me finds the technical aspects of cloud computing technology absolutely fascinating (which is why I want to get the Sitepoint book, even though I don't really need it), and as a consumer I'm using all kinds of cloud-based services. But my internal project manager recognises that it's also vitally important to understand any wider implications of a particular technology used in a specific context - best summed up by a quote from one of the people interviewed in James's article: "None of the risks associated with Cloud are 'showstoppers' for all enterprises and for all specific use cases within any single enterprise ... but they will be for some."

Wednesday, September 22, 2010

Simple distributed version control using Dropbox

Over the last 2 years I've been developing a few personal software projects, and while I'd happily been using CVS for version control on one machine (a laptop running Windows XP), there were two significant friction points: 1) how to transfer code to my other machines (I'm running two other separate operating systems - Ubuntu Linux and Windows XP - on a single dual-booting desktop PC), and then merge any changes or fixes back again, and 2) how to ensure that the CVS repositories were being regularly and reliably backed up.

One of the issues with the first problem is that while it's irritating and error-prone to work around (and has a deterrent effect on performing cross-system testing), it doesn't cause enough real pain to force you to deal with it properly. The second problem is that hardy perennial: making backups never really feels that urgent (until immediately after your hard drive fails).

So I had been limping along for a while until a few months ago when I read an article called "Easy Version Control" by Ryan Taylor (havocinspired.co.uk) in August's .net magazine. Ryan's article covers a lot of other things, but for me the key suggestion is to use the free Dropbox file synchronisation service (which also provides you with up to 2GB of online storage) to back up and share the version control repositories.

The actual set up process is simple:
  1. Install the Dropbox client on the first machine,
  2. Put your repositories inside the special Dropbox directory/folder that is created when the client is installed,
  3. Install Dropbox clients on each of the other machines where you want to access the repositories.
The Dropbox clients do the rest: they automatically synchronise the Dropbox directories/folders - including the repositories - across all machines. You can then check out working copies of your code and commit changes back to the repository on any machine, which will be automatically reflected on all the others. Additionally, a synchronised copy of the repository is also held on the Dropbox servers (thus taking care of the back up issue).

In Ryan's article he works with SVN rather than CVS but this doesn't appear to be a problem - neither does sharing repositories between Linux and Windows (where I'm using TortoiseCVS). (He also suggests that several people could work with the same repository at once by enabling sharing in Dropbox for the appropriate directories/folders.) So all in all this seems like an ideal solution for simple distributed version control for personal projects like mine.

Tuesday, September 21, 2010

Social and Legal Aspects of Cloud Computing

Last week I attended one of the public talks organised by the Manchester branch of the BCS, by Dai Davis of Brooke North LLP, entitled "Social and Legal Aspects of Cloud Computing".

Dai's definition of "cloud computing" was quite broad - essentially it's the delivery of a service over the internet, encompassing everything from "software as a service" through to "storage as a service" and "platform as a service". You could think of it as "renting" hardware, software and/or data storage. The most obvious example is web mail - the service provider typically gives you access to an email client and also handles storage and retrieval of your email.

A key characteristic of cloud computing is that the hardware and data storage could be physically located anywhere in the world, and as an end user you have no idea where they are - you're leaving the service provider to deal with the technical details - and for this reason, cloud computing services have undeniable attractions at the point of entry: they usually have low start-up costs for the end user, both financially and in terms of ease-of-use.

However Dai suggested that there are other factors to consider before opting to use these services, and central to this is control of data - your data. As already noted, once you've entered your data into the system you have no idea where it is in the world. Do you know who else might have access to it? If you try to delete it, how do you know if it's really gone? And what if you want to get your data out again - can you get it in the format you need? The first three of these are potential issues under the EU Data Protection legislation, which forbids export of personal data outside the EU, only allows it to be held for as long as is necessary to process it, and stipulates that you must take appropriate measures to ensure its security.

Unfortunately you are unlikely to have any legally-binding guarantees from the service provider as regards any of these - you're simply asked to trust that the provider won't abuse their position. Dai pointed to last year's incident when Amazon unilaterally removed copies of "1984" from customers' Kindles as an indication of what could happen, but there are other issues with people losing control of their data - posting to Facebook being one example (another interesting aside was Dai's observation that although it is possible to delete your Facebook account, it's very difficult to erase all trace of yourself from it).

Ultimately the choice about using cloud computing-based services is then a risk-reward analysis, and the problem is really that although the benefits are usually obvious, the risks only become evident further down the line. It's possible that we're yet to realise the full implications of these things. I don't think that Dai is saying that we shouldn't use these services, only that we should go in with our eyes open. Overall, a fascinating and slightly worrying overview of the issues.

Sunday, September 19, 2010

ironic_cog.init()

This is my blog dedicated specifically to my interests in software development. It's a bit of an experiment and I'm not sure how I'll get on with it, however if you're reading this then welcome!