googlefusion-4Yesterday was the announcement from Google of their new “Fusion Tables” offering, a web based data tool that promised easy visualization, collaboration, and sharing.  Well, I’ve tried it and found it .. well “lacking” is one way of saying it I suppose.  I loaded up what I thought would be an easy dataset, Robert Kosara’s modified version of the UK Met Office Climate data.  I’ve wanted to play with that for a while now, and this seemed the perfect chance.  Unfortunately, I was very underwhelmed with Google’s latest offering.

Read my review after the break.

Startup

Simply visit http://tables.googlelabs.com/ and login with your usual Google credentials, and you can browse the public tables, or upload your own. New tables can be uploaded from your own machine, and they accept data in Excel Spreadsheets (up to 1 Meg) and CSV files (up to 100Meg).  The UK Met Office dataset from Robert Kosara is an 84Meg CSV file, so I used that. While uploading it took mere seconds, after that begins the lengthy “importing” phase.

Google doesn’t use any database you’re probably familiar with like SQLite or MySQL, instead they use their own proprietary database called ‘Bigtable‘.  Bigtable was specifically designed for the massively distributed and parallel systems that make Google work, but in this case it’s a huge problem.  After an hour, it had imported 21% of the data.  After six hours, it had imported 80% of the data.  I finally gave up and checked in the next day to find it had finished, although I don’t know the specifics of when (somewhere in the 8-10 hour range I believe).  The resulting dataset is 1.4 million rows and 11 columns.

Usage

So once the data is finally imported, you can begin looking around.  To Google’s credit, you can actually visualize and edit the data as it’s being imported, albeit in partial form.  The default view is, of course, a basic table view.  To keep from crashing your browser (and reducing bandwidth as well), the table view is restricted to viewing only 100 rows at a time.

The Starting Table View

The Starting Table View

googlefusion-2The first order of business was to check the datatypes “autoselected” during the import.  That’s done by going to the “Edit” menu and selecting “Modify Table Info” .  I immediately ran into a few hiccups:

  • Data Types – Only a few data types are recognized: Text, Number, Decimal (Number but Floating Point), Date/Time, and Location.
  • Data Restrictions – Only a single field can be “Location”
  • No calculated fields – This is particularly annoying in this case, as you can’t combine the “Month” and “Year” fields into a single date, nor can you combine the “Latitude” and “Longitude” fields into a single location.

By default, the “Year” was selected as a Date and the “Station Name” was selected as Location.  Google is smart enough to automatically geocode location names (As best it can anyway), so that’s actually a pretty good selection for a Location.  Using the Lat/Long pair would be better, but then I would have to build a new CSV file and go through another 8+ hour import.

So, with our data loaded, what to do?  Visualize.

Visualizations

This is where Google Fusion both shines & falls flat.  Ironic, huh?

Simply select the “Visualize” menu and select “Intensity Map”.  It automatically loads up a google map and chooses to map Station Number across the world.  Not too useful, I know, but a quick click and that’s changed to Temperature, and Voila! Look at the results:

Visualizing Temperature Across the World

Visualizing Temperature Across the World

If you’re like me, then you’re probably scratching your head thinking something’s not right.  The clue, hard as it is to see, is in the upper right corner.  Google Fusion Tables will only visualize 250 rows at a time. That makes this type of visualization effectively useless.

So, we’ll limit the data, right?  So, I select the “Filter” options, and enter the following criteria:

  • Continent = Europe
  • Year = 1950
  • Month = 8

So, this trims the data down to the temperature of Europe during August 1950, which reduces it down to a nice 152 rows, which creates the following broken map:

googlefusion-4The problem here is that the “Station Names” is mapped to the wrong locations.  Again, a problem that would be easily fixed by using the Lat/Long pairs, but I can’t do that.  I modified the Table structure to indicate that “Country” is a Location, hoping to correct it, and redrew the map which I’ve embedded below for you to try out:

Much better, but still with some errors:

  • The US is being used to show country “I”.. Dunno what that is.
  • “Yugoslavia” is somewhere in Mexico.

Note: I don’t know why, but when I embedded the map here, those problems went away.

Also, the Colormap can not be changed, it’s stuck on a light to dark green ramp.  While this certainly quells the “Death to the Rainbow” coalition, a red or orange colormap would probably fit better with the data I’m visualizing.

Conclusions

Well, Google Fusion Tables has promise, but only for looking at small-ish datasets.  The 100 or 250 row restriction, while improving interactivity, makes large-scale visualization impossible.  Also, there’s not much I can see in good interactive widgets.  The ability to embed a slider & map it to the “Year” field would make this much more useful and interactive, but doesn’t seem to exist.

I’m hoping that many of the problems I’ve listed here (extreme import times, low column limits, inability to compute fields) is largely due to the “Labs” or Beta status of the product.  Hopefully Google will continue to invest in it and grow it into a more serious competitor.

If you want to try it, I’ve made the table public so that you can try it yourself.  Simply go to this link and try it out for yourself, minus the 8 hours of import.