Monday, July 14, 2008

hybrid meaning making: human / computer cognition

Basil: Can't we get you on Mastermind, Sybil? Next contestant Sybil Fawlty from Torquay, special subject the bleeding obvious.

Back in April, I posted about the Powerhouse Museum's use of human tagging along with automatic indexing.

A flood of things have triggered some further thoughts on this topic. I finally got round to reading Clay Shirky's stuff on gin, sitcoms & cognitive surplus. Then CapitalD sent out a tweet with this video presentation by Luis von Ahn on human cognition. Big Lou is running a game called ESP that generates metadata on photographs under the guise of providing entertainment. Sneaky huh?
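The clever bit of the ESP Game is the agreement mechanic: two players, shown the same image, independently type labels, and only a label both players produce becomes metadata (previously-agreed "taboo" words are off-limits, which forces fresh tags). A minimal sketch of that matching step, with hypothetical function and parameter names of my own invention:

```python
def agreed_labels(guesses_a, guesses_b, taboo=()):
    """Return labels both players typed independently, minus taboo words.

    In the real game play is live and the round ends at the first match;
    this sketch just intersects the two guess lists after the fact.
    """
    taboo_set = {w.lower() for w in taboo}
    matches = ({g.lower() for g in guesses_a}
               & {g.lower() for g in guesses_b})
    return matches - taboo_set


# Two strangers label the same photo; their overlap becomes its tags.
tags = agreed_labels(["dog", "beach", "running"],
                     ["Beach", "sand", "dog"],
                     taboo=["photo"])
# tags == {"dog", "beach"}
```

Because a tag only sticks when two people who can't communicate both volunteer it, the metadata comes out surprisingly clean, and the players never feel like they're doing data entry.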

Meanwhile on the other side of town...

I am doing some research around photography & images at the moment, and encountered the Second International Photo Metadata Conference held in Malta a month ago. One presentation that caught my eye was by Chris Town on Imense.

Imense is a search engine that uses automated image processing to generate results.

Time for a showdown: Imense vs Google Images.

Round One. Let's start with something simple: a "Sydney sunset". Here is Imense & here is Google. Really not that much in it. Some good stuff and nothing wildly off-beam. A tie.

Round Two. How about "an elderly lady holding a broom"? Here's Google. The top image is a palpable hit; number 3, however, ain't even close. Over to you Imense. Er, Imense? What's that? "Sorry, no results found. Please change the query or search options and try again." Nul points there.

Round Three. Time for something metaphysical - how about "sadness"? Google comes back with these and Imense comes back with these. On the whole, both pretty darn miserable. Another tie. Keep going.

Round Four. OK - I like the work of Powell & Pressburger, so how about something from The Red Shoes? In goes "red shoes movie" and out comes? Well, Google's first link is for this, but by image number 4 it gets there. And Imense? Not even close.

It's 4-2 to Google. This may simply be down to the number of images that Imense is dealing with - a larger pool will give better results. But it seems that while Imense may work with the obvious stuff ("I need a picture with 3 people being chased down a beach by a dog"), it doesn't handle the non-obvious stuff as well.

How could Big Lou's approach dovetail with Chris's?

My suspicion is that our attempts to deal with the vast amounts of stuff (words, pictures, sounds) that we are producing will require a human/computer hybrid approach. This will make two groups unhappy - those that believe that raw computing power can solve any problem and those that believe that machines have no role to play in human meaning making.
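To make the hybrid idea concrete, here is one way the two sources could dovetail when ranking an image: machine vision supplies broad coverage, while scarcer human tags (which tend to be more precise, and better at the metaphysical stuff) carry more weight. This is purely my own sketch - the function name, weights, and scoring scheme are all assumptions, not anything Imense or von Ahn describe:

```python
def hybrid_score(query, machine_labels, human_tags,
                 machine_weight=1.0, human_weight=3.0):
    """Score an image for a one-word query using both label sources.

    Machine-generated labels give recall across millions of images;
    human-agreed tags, being rarer but more reliable, count for more.
    """
    q = query.lower()
    score = 0.0
    if q in {label.lower() for label in machine_labels}:
        score += machine_weight
    if q in {tag.lower() for tag in human_tags}:
        score += human_weight
    return score


# An image the software labelled and humans also tagged ranks highest.
hybrid_score("sunset", ["sky", "sunset", "water"], ["sunset", "harbour"])
# -> 4.0, versus 1.0 for a machine-only match
```

Something like "sadness" would score zero from an object detector but could still surface via human tags - which is exactly where the pure-computing camp falls down.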

For the rest of us, it's all very promising.
