In Defense of OpenStreetMap's Data Model
Who's the customer, anyway?
I was at a hotel recently with one of those slots at the door to put your key card in to power up the room. The hotel had already put a special card in it, not to be removed. This keeps the lights and air conditioning on 24/7.
One might imagine the satisfied engineers and/or product people multiple time zones away, happy with a great success: a hotel design that would cut costs and save energy, hundreds of little keycard boxes designed into the rooms. They probably got an award or pay rise or some other recognition.
All the users had to do at the actual hotel, which the designers would probably never visit, is follow the rules.
But, life finds a way.
A file format is a representation of some data as a collection of characters which can be kept in memory, saved on disk or sent over a network.
Some formats are very simple, like txt files, where what you see is what’s on the disk. Some are a little bit more complicated, like HTML, which takes text and then wraps things around it that say “this bit is bold” or “this bit is a link”.
Some formats are very complicated and the data and the disk representation of the data can be very different from each other. For example if you play a music file you can hear it but the file itself would look like impenetrable nonsense to most people.
File formats used to be very important 20-40 years ago because computers were basically terrible back then. They were quite literally millions of times worse than now in every aspect, so formats had to be small and fast.
But still, the simpler it is, the more people will use it generally speaking.
There comes a disconnect when something's designer and user are different people. The canonical example (from The Design of Everyday Things) is a stove top with four burners arranged in a square, but the controlling knobs arranged in a line. This requires you to think. Requiring you to think is the opposite of good design.
Design gets more important the more your life might depend on it. For example, in Cryptonomicon, the American protagonist comes across a Russian mortar:
Shaftoe consults the instructions. It does not matter that these are printed in Russian, because they are made for illiterates anyway. A series of parabolas is plotted out, the mortar supporting one leg and exploding Germans supporting the opposite.
The disconnect between a designer and the user is called the principal-agent problem in economics. What it means is this: heads I win and tails you lose. It means if someone else pays for your mistakes, you’re going to make mistakes. It means I can make slightly cheaper stove tops and millions of people can turn the wrong knob on every day.
The answer, as any product owner will tell you, is to get close to the customer. To talk to them. To understand them. To feel their pain. From The Big Short:
Deutsche Bank had a program it called KYC (Know Your Customer), which, while it didn't involve anything so radical as actually knowing their customers, did require them to meet their customers, in person, at least once.
The problem with talking to the customer for anything you are making is that you might be wrong. This can be embarrassing. It’s hard work. It brings negative feedback, which nobody wants to hear. So finding a product person who will actually talk to a customer is a little like finding a six-legged starfish. Possible, but you’ll need specialist equipment and time.
This is why bad design is everywhere.
There is a cheap defense, and that’s simplicity. If you won’t or can’t talk to customers then at least make the thing simple. The simpler something is, the more people will use it.
And that’s what the OpenStreetMap (OSM) model and format are - simple. In the geographic map model, roughly speaking, there are points - nodes - in space. If they’re on their own they can be restaurants and mountain peaks. If you link them together in a line they can be roads and rivers. If you loop them together in a ring they can be lakes or islands. And that’s kind of it, and it’s scaled from nothing to millions of users.
These nodes and other things are glued together with something called tags, which are an unconstrained and open way to describe anything on Earth.
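To make that concrete, here is a minimal sketch of the model in Python. It is an illustration only, not OSM's actual code or API: the class names, the `is_closed` helper, the IDs and the coordinates are all made up for the example, and it leaves out relations entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A single point in space, optionally described by tags."""
    id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """An ordered list of node IDs: a line, or a ring if it loops back."""
    id: int
    node_ids: list[int]
    tags: dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A ring starts and ends on the same node (hypothetical helper).
        return len(self.node_ids) >= 4 and self.node_ids[0] == self.node_ids[-1]


# Illustrative data only; IDs and coordinates are invented.
peak = Node(1, 46.0, 7.0, {"natural": "peak"})             # a node on its own
road = Way(10, [2, 3, 4], {"highway": "residential"})      # nodes linked in a line
lake = Way(11, [5, 6, 7, 5], {"natural": "water"})         # nodes looped in a ring
```

Everything that distinguishes a road from a lake lives in the tags and in whether the line closes on itself, which is the whole point: there is very little to learn before you can start editing.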
This simplicity was - and is - necessary because the customers are largely working for free. OSM’s community are volunteers. Volunteers writing the software and volunteers wandering around to collect data. The harder you make it for them to edit, the fewer volunteers you’ll get.
I’m leaving out all the technical reasons this is great, like how OSM’s core API is extremely fast, scalable, atomic and maintainable. And all the people who hardened it after I designed it.
OSM’s design is simple but it’s also a little chaotic for the same reason that it’s simple: It’s a mirror of the human world, which after all is chaotic. This has been a sore spot for the types of people who want to design something far away and have rule-abiding users do what they’re told.
For (nearly) decades some of these people have complained about the downsides of the data model. It’s a little bit inconsistent and for many use cases you need to process the data.
You see, OSM isn’t stored strictly in a geographic way. This is on purpose because it’s so simple, but it means that nodes in Finland might somehow link to nodes in Barbados sometimes. You need to check for wacky things like this if you process the raw data into a map. It’s not really a big deal, but things do happen which break the data.
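The kind of check a data processor might run looks roughly like this. It is a sketch under assumptions, not how any particular OSM tool does it: the function names, the plain-dictionary inputs and the 100 km threshold are all hypothetical, and the coordinates are approximate.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def long_segments(nodes, ways, max_km=100.0):
    """Yield (way_id, from_node, to_node, km) for segments longer than max_km.

    nodes maps node ID -> (lat, lon); ways maps way ID -> list of node IDs.
    The 100 km default is an arbitrary threshold chosen for illustration.
    """
    for way_id, node_ids in ways.items():
        for a, b in zip(node_ids, node_ids[1:]):
            if a not in nodes or b not in nodes:
                continue  # dangling reference; that would be a separate check
            km = haversine_km(*nodes[a], *nodes[b])
            if km > max_km:
                yield way_id, a, b, km


# Invented example: two nodes in Helsinki and one in Bridgetown, Barbados.
nodes = {1: (60.17, 24.94), 2: (60.18, 24.95), 3: (13.10, -59.61)}
ways = {100: [1, 2], 200: [2, 3]}

flagged = list(long_segments(nodes, ways))
print([way_id for way_id, *_ in flagged])
```

Note that this only flags a suspicious way for someone to look at. It doesn't forbid anything, which matters for what comes next.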
Now you could fix this all with rules.
You could have rules that say you can’t link Finland to Barbados. That one-way streets can’t end at each other’s ends. That countries can’t be renamed. That roads should know where they are without interrogating their individual nodes.
And you’d be wrong every time. Because countries do change their names; the-country-formerly-known-as-Turkey just did. And maybe there’s some undersea cable that links Finland and Barbados, who knows.
And that’s the point: rules and complexity have completely unknowable downsides. Downsides like the destruction of the whole project.
With each rule and added complexity you make the system less human and less fun. You make it a computer scientist’s Rube Goldberg machine while sterilizing it of all the joy of life.
So of course this is exactly what they want to do.
The OSM Foundation which oversees the OSM project has grown with its funding. The Engineering Working Group (EWG) of the OSMF has “commissioned” (I think that’s OSMF language for paid) a longstanding proponent of rules and complexity to, uh, investigate how to add rules and complexity to OSM.
I’m tempted here to see if I could task DALL·E 2 with “photorealistic image of someone throwing out a baby with the bathwater”.
While it may be predictable that the person who wants to do this is German, or ironic that apparently the EWG skipped their own process to commission this, or funny that the EWG threw out its own dictum (“Simpler formats are better.”), the sad thing is writ large in the blog post itself:
<the problem with OSM’s data model/format is that it>… makes processing OSM data extremely cumbersome and resource-intensive.
The problem is that processing OSM data is cumbersome and resource-intensive.
So the problem has nothing to do with people viewing OSM data, which is billions of people now. Nothing to do with volunteers mapping and editing, which is millions of people. Nothing to do with making it easier for volunteers to build software to edit the map, which is thousands of people.
No, the problem is the dozens of people processing the data. The processing here, by the way, is almost entirely automated for most things; it just takes time.
Let us pray that the EWG is just throwing Jochen a bone to go play in the corner and stop annoying the grownups.
The image that comes to mind is of The Fountainhead. What a nice building, but, we need to add something baroque and Greek on the front of it. Thus missing the entire point.
Of course this has all been done before. There are much more complicated geographical formats and internet editing systems than OSM. They have wonderful consistency, powerful editing tools that require training to use and 200-page documents just on the data model. And of course, nobody uses them.
The question is: what to do instead?
Talk to the customer.
This isn’t complicated but the OSMF has all the problems of a product owner at a company, together with no profit motive, unfireable volunteers, too much un-spendable money, lack of leadership and a loudest-voice-wins election system.
I only met one person from OSMF who ever talked to customers (“end-users” if you don’t like the word “customer”) and he’s gone.
On the other hand I talk to customers all the time, all over the world.
They break down into two types: the volunteer map editors and the service providers who process the OSM data into services for end users like you.
For years they’ve been saying essentially the same thing - we need to get more data (especially addresses and points of interest) into the map so that end users can search OSM for “100 main street, cityville” and it finds what they’re looking for.
It’s unclear to me how making the system more complicated and rewriting all the tools will expedite that. Knock yourself out, I guess?
Facebook talked to the customer too, and some of them complained about consistency in OSM data like Jochen does. Facebook solved this in a beautifully OSM-like way: Daylight. Daylight is a sanitized, consistent and cleaned-up map based on OSM that didn’t require everyone in the entire world who uses OSM to change what they’re doing.
So, before you throw support behind solutions, try to talk to customers to see if the problem is real first.