A Pokemon Case Study: Objects vs Names

Introduction

Which return value makes more sense to you?

getListOfPokemon()
// => ["Bulbasaur", "Ivysaur", "Venusaur", "Squirtle", ...]

Or:

getListOfPokemon()
// => [{ kdex: 1, ndex: 1, name: "Bulbasaur", type: ["Grass", "Poison"] }, ...]

One is a list of Pokemon names, and the other is a list of Pokemon objects with assorted fields.

What is a "Pokemon"?

In terms of code, what is a Pokemon? Should it be a name, or an object? And if it's an object, what does it contain?

I like this summary of the problem:

People often talk about a Person class representing a person. But it doesn’t. It represents information about a person.

https://lispcast.com/clojure-and-types/#types-as-concretions

Pokemon is an abstract concept that lives outside of our code. We just happened to model information about it in a context-specific way. The structure we impose on that information (e.g. with types, schemas, conventions...) is a concretion.

Looking at the bigger picture, a Pokemon is not simply the sum of its name, Pokedex number, attributes, moves, whatever. It's a category that transcends all of that.

There could be many contexts in which to convey information about a Pokemon. In another context, it may be more important to describe what a Pokemon looks like, e.g. with URLs to images or, if it's a game, perhaps 3D object data.
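To make that concrete, here's a minimal sketch of two context-specific models of the same Pokemon. The interface names, field names, and the image URL are all made up for illustration; neither shape is meant to be "the" Pokemon.

```typescript
// Hypothetical shape: what a battle simulator might care about.
interface BattleInfo {
  name: string;
  types: string[];
}

// Hypothetical shape: what an image gallery might care about.
interface GalleryInfo {
  name: string;
  imageUrl: string; // placeholder URL, not a real asset
}

const bulbasaurInBattle: BattleInfo = {
  name: "Bulbasaur",
  types: ["Grass", "Poison"],
};

const bulbasaurInGallery: GalleryInfo = {
  name: "Bulbasaur",
  imageUrl: "https://example.com/sprites/bulbasaur.png",
};

// The two records describe the same subject, yet share only the name field.
const sameSubject = bulbasaurInBattle.name === bulbasaurInGallery.name;
```

Both records are about Bulbasaur, and neither one would be much use in the other's context.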

So a hypothetical "Pokemon class" represents information about a Pokemon. But could there ever be a canonical, all-purpose Pokemon class?

Qualifiers?

One way you may think to remove ambiguity is to qualify the kind of information you're dealing with. A Pokemon what?

// Should we call it this?
let favoritePokemon = "Charmander"

// Or this?
let favoritePokemonName = "Charmander"

It borders on Hungarian notation, but it could be claimed that this convention helps with readability. Maybe it's not immediately clear if "favoritePokemon" is its name, or an object that has information about the Pokemon.

let favoritePokemon = lookupPokemonByName("Charmander")

And we're back at our initial problem. Maybe "favoritePokemon" is the Pokedex number, or it's information about the Pokemon. Unless the reader's familiar with the code's conventions (assuming they're consistent), they might run into this confusion.

Would type declarations help?

Whenever we do anything to make code easier to understand, we should consider what additional information is given to the reader.

A type declaration might nudge the reader in the right direction. If "favoritePokemon" is a string, you could reasonably infer that it's the name. Maybe.

function isGrassType(pokemon: string) {
  ...
}

Here's the problem: The fact that it's a string doesn't actually carry semantics. This might not even be a name! Lots of things can be strings. What if it's a UUID, or a shorthand ID (e.g. "001-bulbasaur")?
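Here's a rough sketch of that problem. The function body and the tiny name list are my own stand-ins, and the UUID is just a sample value. The point is that the compiler accepts every one of these calls, because all it knows is "string".

```typescript
// Assumption: a tiny in-memory list standing in for real Pokemon data.
const grassTypeNames: string[] = ["Bulbasaur", "Ivysaur", "Venusaur"];

function isGrassType(pokemon: string): boolean {
  return grassTypeNames.indexOf(pokemon) !== -1;
}

// Every call below type-checks, because every argument is a string.
const byName = isGrassType("Bulbasaur"); // a name, which this sketch expects
const byShorthandId = isGrassType("001-bulbasaur"); // a shorthand ID, treated as an unknown name
const byUuid = isGrassType("123e4567-e89b-12d3-a456-426614174000"); // a UUID, same story
```

Only the first call returns true. The other two quietly return false instead of telling the caller they passed the wrong kind of string.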

Which name?

If you're a Pokemon enthusiast, you might know that, as a Japanese invention, Pokemon also have Japanese names.

let favoritePokemon = lookupPokemonByName("Hitokage")

No wait, it's actually:

let favoritePokemon = lookupPokemonByName("ヒトカゲ")

Dangit.
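One way to picture it: a single Pokemon carries several names, and "the name" depends on who's asking. This is only a sketch; the PokemonNames shape and the isNameOf helper are hypothetical.

```typescript
// Hypothetical record: one Pokemon, several names by language and script.
interface PokemonNames {
  english: string;
  japaneseRomanized: string;
  japanese: string;
}

const charmanderNames: PokemonNames = {
  english: "Charmander",
  japaneseRomanized: "Hitokage",
  japanese: "ヒトカゲ",
};

// Hypothetical helper: true if the query equals any of the known names.
function isNameOf(names: PokemonNames, query: string): boolean {
  return (
    query === names.english ||
    query === names.japaneseRomanized ||
    query === names.japanese
  );
}

// All three spellings refer to the same Pokemon.
const matches = ["Charmander", "Hitokage", "ヒトカゲ"].map((query) =>
  isNameOf(charmanderNames, query)
);
```

Every entry in matches is true, so a lookup "by name" has to decide which of these it accepts.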

Example-driven development

We can address a lot of coding comprehension issues by providing examples.

It's unfair to not allow developers to see what data looks like. Every time a library ships documentation without examples, it's a disgrace. Autogenerated JavaDocs that only enumerate class fields and method signatures are not documentation. A command line tool's '--help' page that only enumerates accepted flags is not documentation.

No, the usage of your library is not made obvious with typing conventions. Add examples. Please.

With the following function:

function isGrassType(pokemon: string) {
  ...
}

We might provide example runs:

isGrassType("Bulbasaur")
// => true

isGrassType("Charmander")
// => false

isGrassType("Not a Pokemon")
// => false
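If we want those example runs to stay honest, one option is to keep them as data and check them against the function. This is a sketch under assumptions: the lookup table and the function body are stand-ins I invented, since the real implementation is elided above.

```typescript
// Assumption: a small lookup table standing in for real Pokemon data.
const typesByName: { [name: string]: string[] } = {
  Bulbasaur: ["Grass", "Poison"],
  Charmander: ["Fire"],
};

function isGrassType(pokemon: string): boolean {
  // Unknown names are treated as "not a Grass type".
  if (!Object.prototype.hasOwnProperty.call(typesByName, pokemon)) {
    return false;
  }
  return typesByName[pokemon].indexOf("Grass") !== -1;
}

// The example runs from the text, kept as data so they can double as checks.
const examples: { input: string; expected: boolean }[] = [
  { input: "Bulbasaur", expected: true },
  { input: "Charmander", expected: false },
  { input: "Not a Pokemon", expected: false },
];

// Collect every example whose actual result disagrees with the documented one.
const failures = examples.filter(
  (example) => isGrassType(example.input) !== example.expected
);
```

An empty failures list means the documented examples still match what the code does.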

I'd like developers to be more proactive about documentation-by-example. I personally prefer compiling static Wiki-style websites that combine conceptual documentation and usage guides.

The code is fine, actually

At a coding level, there's nothing wrong with being terse about what we're talking about. Conflating a Pokemon with its name, or an object, or whatever you need it to be is good - if you know what it is.

We only start to lose our vision once we stop understanding what sort of data is going in and out.

We need feedback on the data

My conclusion is that abstractions summarize concretions; they don't replace them. At the end of the day, we need to see what's going on. I believe that ambitious, fully qualified "everything must be unique" naming conventions at a global, top-level scope underestimate the nuances of the real world.

Should we really go back to identifying all schemas with URI namespaces, like with XML?

The real world is messy and has different ways of conceptualizing information. Information is inherently context-sensitive and is often incomplete.

My previous frustration with the "types-as-documentation" ideology is that it doesn't necessarily reflect real-world usage. Sure, "money" is a number type, but how do I use it? Which currency is it? Is it counted in dollars or cents? Should I call it "numberOfUsdCents"? Can I have a fraction of a cent?
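A small sketch of the money problem: two hypothetical formatters with the exact same signature, (money: number) => string, that read the number in different ways. Both function names are mine, and both assume non-negative US amounts.

```typescript
// Reads its argument as a whole, non-negative number of US cents.
function formatCents(money: number): string {
  const dollars = Math.floor(money / 100);
  const cents = money % 100;
  return "$" + dollars + "." + (cents < 10 ? "0" : "") + cents;
}

// Reads its argument as US dollars.
function formatDollars(money: number): string {
  return "$" + money.toFixed(2);
}

// Same input, same types, very different meaning.
const asCents = formatCents(1999); // "$19.99"
const asDollars = formatDollars(1999); // "$1999.00"
```

The signatures are identical. It's the example runs that show which reading each function expects.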

Going the other way - with global naming conventions - should have helped break the illusion, but I later realized that it, too, wasn't perfect. How verbose should names be? When a new name is introduced and creates ambiguity, should everything else be renamed? It's like saying "Michael's just not clear enough. There's millions of Michaels. This won't do. You're Michael Joseph Jackson, born on August 29, 1958… from Gary… Indiana… United States… of America."

A comment on name scope and single-letter variable names

Conventional wisdom in programming is to avoid single-letter variable names. As often happens with cargo-cult thinking, people take this to extremes and treat the "rules" of programming as either/or: ALWAYS or NEVER.

It's an argument I find deeply unsatisfying, because we use single-letter variables all the time. If you've ever written a for-loop, you've probably used single-letter variables.

for (int i = 0; i < 10; i++) {
  ...
}

Or a simple mapping with a throwaway variable:

let cents = dollars.map(x => x * 100)

Or other idiomatic constructs:

try {
  ...
} catch (e) {
  ...
}

In situations like these, we understand that names have scope. We understand that i is an index, because it's the sort of thing programmers see all the time. We're not confusing i in the for-loop with another i in another function in another file or something.

Yet there's truth to the idea that, with enough breadth, brevity starts to hurt us. I've always seen names as a "scoping" issue. As a rule of thumb, global(er) scope legitimizes longer names, and local(er) scope legitimizes shorter ones. Again, it depends on context!

A comment on "contextless" data modeling

I believe that data is always interpreted based on context. The real question is "what is the breadth of the context, and where is the floor of the context?". Even "contextless" data models are rooted in something local.

In Clojure for example, programs use fully qualified symbols and keywords to solve data aggregation problems. e.g. :com.my-application.core.pokemon/id, instead of simply :id. The kicker is that these keywords are usually based on a package namespace with authority, such as Maven Central or Clojars.

XML Namespaces and XSD do the same thing.
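The aggregation problem that qualified names solve can be sketched outside Clojure, too. Below, plain string keys loosely imitate namespaced keywords. The two source records and the trainer namespace are hypothetical; only the pokemon/id keyword comes from the example above.

```typescript
// Hypothetical records from two sources that both use a bare "id" key.
const fromPokedex = { id: 4, name: "Charmander" };
const fromTrainerDb = { id: "trainer-42", favorite: "Charmander" };

// With bare keys, the later "id" overwrites the earlier one when merged.
const bareMerge = { ...fromPokedex, ...fromTrainerDb };

// With qualified keys, both ids survive the merge.
const qualifiedPokedex = { "com.my-application.core.pokemon/id": 4 };
const qualifiedTrainer = { "com.my-application.core.trainer/id": "trainer-42" };
const qualifiedMerge = { ...qualifiedPokedex, ...qualifiedTrainer };
```

After the bare merge, id is "trainer-42" and the Pokedex number is gone. The qualified merge keeps both, but only because both keys lean on the same "com.my-application" root, which is itself a piece of context.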

I'm making a postmodern argument for data. Semantics are always relative to something.