Code Architecture: July 2012

Recap

Just a brief reminder to please first read Part 1 - which gives a brief overview of the usefulness of primitives, lists and string maps in a type system, plus the 'null' concept to allow them to be optional. This post extends on those ideas to add new useful concepts that I've found useful / missed when using a type system.

Undefined

Given a null / unset option for values, this gets us pretty much to where we need to be in terms of storing information to use within an application - and numerous modifications we can apply to that information; in particular, it's enough for a basic CRUD system. But if you look closely at that previous link, you might notice where it falls down: modification (PUT vs PATCH in http).

Simply put, a standard update ('put') is where you copy every value from the given item into the target one, whereas patch lets you conditionally only update a subset. Note that you can always do a patch manually via read-modify-put, though too much semantics are lost (and result in concurrency problems). In my experience, the patch operation itself is used really frequently in applications (e.g. updating just an address for a contact), so is definitely worth supporting, but the question is then how to distinguish which values to update.

One common solution is to indicate 'don't update' by a null value, to the inevitable problem of then being unable to specify when to set a value to null! e.g. how would you patch a node to become a leaf? The alternative is to allow field-level PUTs - and while this is a great idea, and should be supported, it results in request explosion (one for each field, instead of for each substantial thing) and loses the batch atomicity that's inherent in patch methods. Once you then support batched PUTs, it's (almost) equivalent to a PATCH.

Javascript (/actionscript) however does this very well - by having 'null' = no value, but adding the concept 'undefined'; patching with null then sets a field to null, patching with undefined results in an unchanged field. Implementation hacks aside, 'undefined' is a great concept - consider models where a default value can be used in the absence of a value - i.e. a field that, when undefined, gets a default (possibly null) value; if in a hierarchy (think: css), it could then even walk up ancestors until a defined value can be used. Use with care though, as mentioned above conflation with null can lead to problems if the two need to be distinct (as with patch above).

Delete

As seen above, there's a difference in a patch method to setting a value to the literal 'null' vs not having it set; and undefined was used for the latter. The problem next then is - how can you set a value to 'undefined' in a patch method? So far we can create typed items (by setting to non-null), update and patch them, but there's no support yet for deletion (= setting to undefined, which can be different from setting to null). Essentially, what's required is a way to patch the value with the literal 'undefined', rather than the semantic version, which would then be ignored. The question then is how to patch something to literal-undefined...by setting it to literal-literal-undefined? ad infinitum.

Obviously, this solution has reached a problematic point, with a never-ending list of literals required. I'm not sure the best solution to this, so feel free to add any to the comments. Maybe the answer lies in each field being coupled with a 'defined' property, so a null value becomes {defined: true, value: null}, undefined is {defined: false, value: null} and literal undefined is {defined: true, value: undefined}. The literals then follow naturally via e.g. literal literal undefined is {defined: true, value: {defined: true, value: {defined: false, value: null}}} - which certainly does seem hacky though; however each layer is increasingly rare, so it's possible.

In the wild: ML/proto/objects/JSON/...

One thing worth adding when considering something like this is to see what the decades of programming have already given us in this area - after all, coders have needed types for a long time now. As mentioned, the simplest example of these types in action are classin OOP - primitives, arrays (lists) and objects/structs (string maps) are the basis for model types in dozens of languages.

That's not all, however - these types almost precisely descibe what's available in Javascript, and one of the reasons I believe it's as popular a language as it is - it has string map objects, and arrays, but is just lacking in primitives with its unusual number system. It's popular predecessor XML (or other markup languages) even have something similar - where an element is a string map with attributes for key-value fields, but the limitation of a single special list property, being its children; and with HTML5 data attributes gaining popularity, I can only see them merging closer.

Finally, there's another family of objects interesting in this case, being those designed for language independence. For example, Google's protocol buffers and Facebook's Thrift systems, which both attempt to capture aspects from type systems of multiple languages; each have message/struct-based key-value string maps as the fundamental structure, repeated fields, a variety of primitive types, etc... plus add a number of useful additional features to consider: optional fields/isset handling of null vs. undefined, default values, 'required'-ness, sets, enums, serialization, ... but overall, there's a large similarity between them and objects, json, python dictionaries, etc.

Future extensions?

That feels enough for this post, and possibly even enough for types in general - mostly, I feel that the features listed (primitives, lists, stringmaps, null/undefined/literals) are enough to cover a huge amount of applications, which giving enough control to require a non-huge amount of understandable code.

There are a few considerations provided by other systems though, that are likely to be worth future blog posts - for example, enums are available in many of the examples listed above, but haven't yet been mentioned. In particular, functions are definitely going to be a later topic, as it's a split between a number of examples above - for instance, Javascript and python both allow functional members, but Java and protocol buffers do not and XML being in the middle where by default it doesn't, but XSLT provides a lot of what is required. To hint at my thoughts, I think the Uniform Access Principle is flawed, but to know why, you'll have to tune in later!

As always, questions/comments etc. are welcome on this post or on G+.

Welcome to the first instalment of my Code Architecture blog, a place for me to document thoughts I've gained while working on multiple projects, where I think app development is headed, and how to get there.
As always, please feel free to add discussions at the bottom of each post or suggestions on topics to cover.

This initial post is going to cover what I consider one of the most fundamental topics - data types. That is, the values you can represent within an application, and how to use them.

Primitives - Numbers, Letters, ...

These are the most low-level types available - ones that map directly into binary values in hardware; For example, if you have a single bit, you can have a boolean value. This can be extended to 8-, 16-, 32- and 64-bit values (or higher), which can be interpreted as a number of types - integers, floating-point numbers, characters, enums, ... but that's about it in terms of how it's stored.

This gets us to the first area for expansion that I see, which is in terms of variable-length primitives. For example, UTF-8 is a common way to have a variable-length character set, and a few languages have arbitrary-precision integers, though there seems to be no one agreed-upon way of representing arbitrary-bit values using only 2^n-bit numbers.

Complex - Lists (and Strings), Maps ...

There are a few more concepts that appear all over the place while writing software - the most basic is a list (or array, vector, ...). Simply put, it's an ordered collection of primitives, with the most typical example being a list of characters, or a string. These are used everywhere for text, but at a higher level, lists are required to represent anything ordered, and can be combined for higher dimensions - e.g. a list of lists for a grid/table.

For unordered items however, by far the most useful type appears to be the Map, at least where model data for an application is concerned - and in particular, StringsMaps, where the values are each keyed by a string. A StringMap (or Dictionary) is the fundamental type in both JavaScript and Python, as well as representing the underlying structure of an object in O-O languages, with field names mapping to their values.

More types exist (collections, sets, binary trees, ...) but mostly these can be implemented in terms of the above, and are less common in usage - so a modern language mostly just needs primitives, lists, strings and stringmaps for its data model types. As an aside, perhaps for a later post, I feel that (nested) stringmaps are the best approximation so far of how humans index information themselves - e.g. chair = {legs: 4, colour: "black", cushion: true} etc... hence their prevalence and usefulness today.

Type extension: Nullability

Given a language with defined value families, this gives us the ability to represent a lot of real-world entities in a program - but there's still a problem in that all values are required; to properly represent optional values, a type system will also need the concept of nullability - to be used when a value is unset.

For an example of what this gives you, consider a recursively defined data structure, like a binary tree; If all nodes require two child nodes, and they require two children, and those four grandchildren require two each, and ...etc, suddenly simply can't define all the (infinite) objects required, there needs to be a way of optionally not having some. Of course, it's possible to have these without nullability - e.g. add an 'isSet' boolean to every type, and a singleton 'null' value to use when isSet is false (or in the tree example, have 'parent' and 'leaf' types) but these all degenerate into essentially equivalent scenarios, with the concept of nullability being the simplest generalisation. Linked lists, skip lists, B trees, tries etc, and anything using them, are all improved by nullability.

Nullability also assists with asynchronous data (later post coming), for a convenient default value of things before they're fetched; it does have the downside often of forcing repeated null checks, though one possible way of dealing with those will be coming later, and other techniques exist (@NotNull, or the Elvis operator).

The combination of the types above (primitives, complex, nulls) give you enough to write a lot of things in a language - but there's still more aspects to types, problems which I see appearing that are still to be solved by different constructs - that said, this post is already pretty long, so tune in next time for types part 2! And as mentioned before, please post any thoughts or requests in the comments section below.

Code Architecture

Hidden Image for Share

Tuesday, July 31, 2012

On Types - Part 2