I don't lend a lot of credence to most statistics, as it is widely understood that up to 68% of them are bogus to start with, but there is one that certainly has a lot of staying power in mapping and GIS circles. It goes something like "80 percent of data stored in databases has a spatial component." It's usually used in the context of selling a geocoding solution to add coordinates to these tables, or a visualization platform to help you spot trends in your precious 80% that isn't living up to its potential. I've heard it stated with confidence by analysts and seen it used in press releases dozens of times over the years. What are the origins of the statement? Is it even plausible? I chuckle every time I read it, not because I don't believe it to be accurate but because I actually know the origins of it. The most recent example I saw was just the other day, buried in the announcement of the EPA's choice of mapping platforms:
Because roughly 80 percent of the world's business data has a location component, the Virtual Earth tool adds significant value to the EPA's efforts to unlock, view and act upon the location component inherent in its data.
Before I reveal the origins of the stat, let's think about its validity. I've always interpreted 'data' in this context to mean 'records'. In other words, if any field or fields of a table are spatial in nature, its records qualify. What's spatial? I think of that as anything that can be geocoded, addresses being the most common type of data. A postcode, phone number, or IP address can all be geocoded, as can dozens of other types of data. It's also important to consider that data in tables keyed to another table containing location should also count; if an order record is tied by customer ID to a customer table with a delivery address, that order has location. So I'll buy it; the vast majority of data in a database has a spatial component. 80 percent of it, in fact, if that makes you feel better.
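For the curious, here is a rough sketch of how that counting rule could be applied to a toy dataset. Everything in it is an assumption made up for illustration: the field names in `SPATIAL_FIELDS`, the `customers` and `orders` tables, and the helper functions. It is not how the 80% figure was derived, only one way to express "a record counts if it has a geocodable field, directly or through a keyed table."

```python
# Hypothetical toy data; table names, field names, and values are illustrative only.
SPATIAL_FIELDS = {"address", "postcode", "phone", "ip_address"}

customers = {
    1: {"name": "Acme", "address": "123 Main St, Springfield"},
    2: {"name": "Initech", "address": None},  # no geocodable field filled in
}

orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 2, "total": 40.0},
    {"order_id": 102, "customer_id": 1, "total": 12.5},
]


def has_spatial_field(record):
    """True if the record itself carries a non-empty geocodable field."""
    return any(record.get(field) for field in SPATIAL_FIELDS)


def order_has_location(order, customers):
    """True if the order, or the customer it is keyed to, has a spatial field."""
    if has_spatial_field(order):
        return True
    customer = customers.get(order["customer_id"])
    return customer is not None and has_spatial_field(customer)


def spatial_share(orders, customers):
    """Fraction of orders with a direct or inherited spatial component."""
    if not orders:
        return 0.0
    hits = sum(order_has_location(o, customers) for o in orders)
    return hits / len(orders)


print(f"{spatial_share(orders, customers):.0%} of orders have a location")
```

In this made-up example none of the orders has an address of its own, yet two of the three inherit one through the customer ID, which is the point of counting keyed tables.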
So who came up with it? I've often heard GIS Hall of Famer Roger Tomlinson credited with it. But there's a 74% chance that isn't true. I trace it back to a Product Manager and a Product Marketer at MapInfo. To protect the innocent, I won't reveal their names here, but one sounds like a sneeze if you say it fast and the other is often confused with a member of the Pixies. MapMarker was the first mapping product I ever worked on at MapInfo, around 1994/95. MapMarker could rip through a table of 100k street addresses or postcodes in an hour on a 486 and assign coordinates, something we take for granted today. To support MapMarker's value proposition, Sneeze and Pixie came up with the 80% figure and used it in fact sheets and other collateral. From there it just spread and was recycled endlessly until it became accepted as fact. I have no idea what kind of research went into coming up with the 80% figure; it could have simply been a wild guess that sounded reasonable and defensible at the time (probably the origin of many stats). At any rate, I think it's probably in the ballpark and, if nothing else, serves to remind us of the value in treating location as a first-class citizen in our data stores. If anyone has a source for this stat from before 1994, share it in the comments. And if Pixie or Sneeze happen to be reading this and want to reveal their identities and take credit for their stat, jump in.
UPDATE: a couple of my ex-MapInfo colleagues contacted me to let me know that the stat was used at MapInfo before Sneeze was at the company. Further, one of them attributes its origins to MapInfo founders Laszlo Bardos and Sean O'Sullivan, with Pixie later referencing it in MapMarker's marketing materials.