I don’t blog for a week, and look what happens! In the past week, Kevin Cunnington was announced as the new head of GDS, taking over from Stephen Foreshew-Cain. Naturally, there had been intense speculation about the move in the days beforehand, much of it vastly overblown but nevertheless unsettling to GDS staff.
Thankfully, both for us and for the British public, who have come to expect high-quality government services online, Kevin has announced that it’s business as usual for GDS. It is not being broken up, dismembered or disbanded. Personally, I think GDS has done more than anyone to bring user-centred services to the core of government, and it’s great news that this will be continuing.
In the meantime, with all of this happening in the background, I was getting on with a very interesting week-long exercise with the other developers in my team.
We have a bit of a problem - there are 300,000 pages on GOV.UK and the number is growing, but only 5% of them are tagged. Now, we could get a group of people to give up a year of their time to sit down and tag all of them by hand. If that’s not a candidate for the world’s most boring job then I’d like to know what is. Instead, why not get computers to do the difficult job for us?
I have to admit that I’ve never worked on machine learning before, but that week taught me so much about how algorithms can be trained to understand the contents of documents and suggest tags for them. This is something I never thought I’d be learning as a developer on GOV.UK.
I worked on supervised learning, specifically the naive Bayes algorithm. I took a sample set of already-tagged documents and trained the algorithm on it, then gave it a set of untagged documents and got it to tag them. Measured against the tags humans would give those same documents, the algorithm scored 70% accuracy at first pass, which is very respectable.
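To give a flavour of the idea, here is a minimal sketch of a naive Bayes tagger in plain Python. It is not the code from that week: the class name, the tag names and the four training documents are all made up for illustration, and it assumes each document gets exactly one tag, which is a simplification.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would do far more."""
    return text.lower().split()


class NaiveBayesTagger:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, tags):
        self.tag_counts = Counter(tags)          # documents seen per tag
        self.word_counts = defaultdict(Counter)  # word frequencies per tag
        self.vocabulary = set()
        for text, tag in zip(documents, tags):
            words = tokenize(text)
            self.word_counts[tag].update(words)
            self.vocabulary.update(words)
        self.total_docs = len(documents)
        return self

    def predict(self, text):
        # Words never seen in training carry no information, so skip them.
        words = [w for w in tokenize(text) if w in self.vocabulary]
        best_tag, best_score = None, -math.inf
        for tag, doc_count in self.tag_counts.items():
            # Work in log space: log P(tag) + sum of log P(word | tag).
            score = math.log(doc_count / self.total_docs)
            denominator = sum(self.word_counts[tag].values()) + len(self.vocabulary)
            for word in words:
                score += math.log((self.word_counts[tag][word] + 1) / denominator)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


def accuracy(model, documents, human_tags):
    """Fraction of documents where the model agrees with the human tag."""
    correct = sum(model.predict(d) == t for d, t in zip(documents, human_tags))
    return correct / len(documents)


# Made-up training data: (document text, tag a human gave it).
training = [
    ("apply for a passport online", "passports"),
    ("renew or replace your adult passport", "passports"),
    ("tax your vehicle", "driving"),
    ("book your driving test", "driving"),
]
model = NaiveBayesTagger().fit(
    [text for text, _ in training], [tag for _, tag in training]
)

# Made-up "untagged" documents, with the tags a human would choose.
test_docs = ["renew a passport", "book a driving test"]
human_tags = ["passports", "driving"]

for doc in test_docs:
    print(doc, "->", model.predict(doc))
print("accuracy:", accuracy(model, test_docs, human_tags))
```

The `accuracy` function mirrors the measurement described above: run the tagger over documents it has not seen and count how often it agrees with a human. With only four training documents the result here says nothing about real performance; it just shows the shape of the train, predict and compare loop.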
Now, this was purely an experiment to work out the potential of such algorithms. Given the results (both from my supervised learning and from the unsupervised learning my fellow developers experimented with), there is real hope that they could be built into the tools we provide to content editors across government. This would make editors’ lives much easier by suggesting tags from the new taxonomy that we’re also working on, which they could then approve or change.
Exciting, and all in a week’s work at GDS.
If this sounds like a good place to work, take a look at Working for GDS - we’re usually in search of talented people to come and join the team.