Code Architecture: On Types

Recap

A brief reminder: please read Part 1 first. It gives an overview of the usefulness of primitives, lists and string maps in a type system, plus the 'null' concept that allows them to be optional. This post extends those ideas with further concepts that I've found useful, or missed, when using a type system.

Undefined

A null / unset option for values gets us pretty much to where we need to be in terms of storing information for use within an application, along with numerous modifications we can apply to that information; in particular, it's enough for a basic CRUD system. But if you look closely at that previous link, you might notice where it falls down: modification (PUT vs PATCH in HTTP).

Simply put, a standard update ('put') copies every value from the given item into the target one, whereas a patch lets you update only a subset. Note that you can always do a patch manually via read-modify-put, though too much of the semantics is lost, which results in concurrency problems: two clients that each read, modify and put will overwrite each other's changes. In my experience, the patch operation itself is used really frequently in applications (e.g. updating just an address for a contact), so it is definitely worth supporting, but the question is then how to distinguish which values to update.
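To make the concurrency point concrete, here is a minimal Python sketch. The in-memory store, the `get`/`put`/`patch` helpers and the contact fields are all hypothetical, and a patch is modelled simply as a dict naming the fields to change.

```python
import copy

# Hypothetical in-memory store holding one contact record.
ORIGINAL = {"name": "Ann", "address": "1 Old Rd", "phone": "555-0100"}
store = {"contact": copy.deepcopy(ORIGINAL)}

def get(key):
    """Return a private copy, as a client would receive over the wire."""
    return copy.deepcopy(store[key])

def put(key, item):
    """Standard update: the whole stored item is replaced."""
    store[key] = copy.deepcopy(item)

def patch(key, changes):
    """Partial update: only the fields named in `changes` are touched."""
    store[key].update(changes)

# Two clients read the same record, then each changes one field.
a = get("contact")
b = get("contact")
a["address"] = "2 New St"
b["phone"] = "555-0199"
put("contact", a)
put("contact", b)  # overwrites client A's address with B's stale copy
lost_update = get("contact")

# The same two changes expressed as patches do not interfere.
put("contact", ORIGINAL)
patch("contact", {"address": "2 New St"})
patch("contact", {"phone": "555-0199"})
patched = get("contact")
```

After the two read-modify-put cycles the new address is gone, whereas the patched record carries both changes.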

One common solution is to indicate 'don't update' by a null value, which leads to the inevitable problem of then being unable to say when a value should be set to null! E.g. how would you patch a node to become a leaf? The alternative is to allow field-level PUTs - and while this is a great idea, and should be supported, it results in request explosion (one request for each field, instead of one for each substantial thing) and loses the batch atomicity that's inherent in patch methods. Once you then support batched PUTs, it's (almost) equivalent to a PATCH.
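The 'node to leaf' problem can be sketched in a few lines of Python. The `Node` and `NodePatch` types and the `patch_none_means_skip` helper are made up for illustration; they assume a patch is a typed object whose `None` fields mean 'don't update'.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

@dataclass
class NodePatch:
    # Every field is optional; None is read as "don't update".
    value: Optional[int] = None
    left: Optional[Node] = None
    right: Optional[Node] = None

def patch_none_means_skip(target, changes):
    """Copy every non-None field of `changes` onto `target`."""
    for f in fields(changes):
        new = getattr(changes, f.name)
        if new is not None:
            setattr(target, f.name, new)

node = Node(1, left=Node(2), right=Node(3))

# Updating a single field works as intended.
patch_none_means_skip(node, NodePatch(value=10))

# Trying to clear the children is indistinguishable from "no change".
patch_none_means_skip(node, NodePatch(left=None, right=None))
is_leaf = node.left is None and node.right is None
```

Here `is_leaf` stays false: the second patch is silently a no-op, because the patcher cannot tell 'set to null' from 'leave alone'.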

JavaScript (and ActionScript), however, does this very well, by keeping 'null' to mean no value but adding the concept of 'undefined'; patching with null then sets a field to null, while patching with undefined leaves the field unchanged. Implementation hacks aside, 'undefined' is a great concept - consider models where a default value can be used in the absence of a value, i.e. a field that, when undefined, gets a default (possibly null) value; if in a hierarchy (think: CSS), it could then even walk up ancestors until a defined value is found. Use with care though: as mentioned above, conflation with null can lead to problems if the two need to be distinct (as with patch above).

Delete

As seen above, in a patch method there's a difference between setting a value to the literal 'null' and not setting it at all; undefined was used for the latter. The next problem is then: how can you set a value to 'undefined' in a patch method? So far we can create typed items (by setting to non-null), update and patch them, but there's no support yet for deletion (= setting to undefined, which can be different from setting to null). Essentially, what's required is a way to patch the value with the literal 'undefined', rather than the semantic version, which would be ignored. The question then is how to patch something to literal-undefined... by setting it to literal-literal-undefined? And so on, ad infinitum.

Obviously, this solution has reached a problematic point, with a never-ending list of literals required. I'm not sure of the best solution to this, so feel free to add any to the comments. Maybe the answer lies in each field being coupled with a 'defined' property, so a null value becomes {defined: true, value: null}, undefined is {defined: false, value: null} and literal undefined is {defined: true, value: undefined}, where the inner undefined is itself {defined: false, value: null}. The further literals then follow naturally, e.g. literal literal undefined is {defined: true, value: {defined: true, value: {defined: false, value: null}}}. This certainly does seem hacky; however, each layer is increasingly rare, so it's workable.
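One way to read the 'defined' idea is that every patch value carries one wrapper layer, and the patcher peels exactly one layer off before storing. The Python sketch below takes that reading as an assumption; `Field`, `literal` and `patch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Field:
    defined: bool
    value: Any = None

# Semantic undefined: {defined: false, value: null}.
UNDEFINED = Field(defined=False)

def literal(x):
    """Wrap `x` one level so the patcher treats it as a value to store."""
    return Field(defined=True, value=x)

def patch(target, changes):
    """Peel one wrapper layer off each change and store what is inside."""
    for name, change in changes.items():
        if not change.defined:
            continue  # semantic undefined: leave the field unchanged
        target[name] = change.value  # may itself be a Field (a literal)
    return target

record = {"name": "Ann", "address": "1 Old Rd", "phone": "555-0100"}
patch(record, {
    "name": UNDEFINED,             # unchanged
    "address": literal(None),      # set to null
    "phone": literal(UNDEFINED),   # set to undefined, i.e. deleted
})

# Each extra literal() adds one more layer, as in the text.
literal_literal_undefined = literal(literal(UNDEFINED))
```

With this reading the depth of nesting only grows when a patch has to carry another patch as its payload, which fits the observation that each layer is increasingly rare.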

In the wild: ML/proto/objects/JSON/...

One thing worth doing when considering something like this is to see what decades of programming have already given us in this area - after all, coders have needed types for a long time now. As mentioned, the simplest example of these types in action is classes in OOP - primitives, arrays (lists) and objects/structs (string maps) are the basis for model types in dozens of languages.

That's not all, however - these types almost precisely describe what's available in JavaScript, which I believe is one of the reasons it's as popular a language as it is: it has string map objects and arrays, and is only lacking in primitives, with its unusual number system. Its popular predecessor XML (like other markup languages) even has something similar, where an element is a string map with attributes for key-value fields, but with the limitation of a single special list property, its children; and with HTML5 data attributes gaining popularity, I can only see them merging closer.

Finally, there's another family of objects that is interesting here: those designed for language independence. For example, Google's protocol buffers and Facebook's Thrift both attempt to capture aspects of the type systems of multiple languages. Each has message/struct-based key-value string maps as the fundamental structure, repeated fields, a variety of primitive types, etc., and each adds a number of useful features to consider: optional fields/isset handling of null vs. undefined, default values, 'required'-ness, sets, enums, serialization, ... Overall, though, there's a large similarity between them and objects, JSON, Python dictionaries, etc.

Future extensions?

That feels like enough for this post, and possibly even enough for types in general - mostly, I feel that the features listed (primitives, lists, string maps, null/undefined/literals) are enough to cover a huge number of applications, while giving enough control to require only a modest amount of understandable code.

There are a few considerations provided by other systems, though, that are likely to be worth future blog posts - for example, enums are available in many of the examples listed above, but haven't yet been mentioned. In particular, functions are definitely going to be a later topic, as the examples above are split on them: JavaScript and Python both allow functions as members, Java and protocol buffers do not, and XML sits in the middle, where by default it doesn't, but XSLT provides a lot of what is required. To hint at my thoughts, I think the Uniform Access Principle is flawed, but to know why, you'll have to tune in later!

As always, questions/comments etc. are welcome on this post or on G+.

1 comment:

Fergal Daly, August 1, 2012 at 2:18 AM
You can solve the null/defined update with an escaper: some class that you agree in advance will never be copied directly into the data to be patched; instead you look at its contained value. Then you can use null to signal not updating a field and Escape(null) if you actually want to set a field to null. Or if you feel that setting a field to null will be a common operation, define a DontUpdate singleton, make that mean don't update, and use Escape(DontUpdate) to mean set it to DontUpdate and of course Escape(Escape(DontUpdate)) to mean set it to Escape(DontUpdate) etc.

With this you can also then define a Delete singleton to signal that you want to remove the field entirely.

The only downside now is that your patcher has to check for these sentinels and escapes so it goes a bit slower.
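A rough Python sketch of the escaper suggestion, using the DontUpdate variant. The `Escape` class, the `DONT_UPDATE` and `DELETE` sentinels and the dict-based `patch` are illustrative guesses at what the comment describes, not a reference implementation.

```python
class Escape:
    """Wrapper that is never stored directly; the patcher stores its content."""
    def __init__(self, value):
        self.value = value

class _Sentinel:
    """Named marker object, compared by identity."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

DONT_UPDATE = _Sentinel("DontUpdate")
DELETE = _Sentinel("Delete")

def patch(target, changes):
    """Apply `changes`, checking for sentinels and escapes on every field."""
    for name, change in changes.items():
        if change is DONT_UPDATE:
            continue                    # leave the field as it is
        if change is DELETE:
            target.pop(name, None)      # remove the field entirely
        elif isinstance(change, Escape):
            target[name] = change.value  # store the escaped content
        else:
            target[name] = change       # plain value, including None
    return target

record = {"name": "Ann", "address": "1 Old Rd", "phone": "555-0100", "note": "x"}
patch(record, {
    "name": DONT_UPDATE,
    "address": None,
    "phone": DELETE,
    "note": Escape(DONT_UPDATE),  # really store the sentinel itself
})
```

The extra `is` and `isinstance` checks in the loop are the per-field cost mentioned in the comment.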

Code Architecture

Tuesday, July 31, 2012

On Types - Part 2