Thursday, 24 January 2013

CrO4 - Why it didn't happen

After a few months of experimenting and tinkering with CrO4, I eventually stopped work on it. There were 3 main reasons for this (from least to most important):

  1. Developing a feature that crosses the WebKit/Chromium project boundary sucks. Turning a prototype built from a JavaScript Chrome plugin and a Node.js server into something with practical performance and scalability required hacking that functionality into the browser itself. Unfortunately, this code straddles the interface between WebCore (the DOM and HTML code in WebKit) and Chromium itself.
    Hopping this boundary meant multi-hour build times and navigating code that is heavily infested with C macros and interacts with auto-generated C++ (most of Chromium's WebKit glue is C++ generated by a few dozen Perl scripts that parse the hundreds of IDL files in WebCore). Answering a question like "who calls this?" usually involved a 45-minute build to create a scenario observable in GDB, because navigating that code manually is totally impractical.
  2. HTML is incredibly stupid. I can't express just how much legacy bullshit there is in the modern browser. Think of the most menial task you can do in a browser, like setting a font or moving a text box to the middle of the screen. Now think of all the completely different ways you can do that in HTML. Then add CSS into the mix. The amount of duplicated and obscure functionality in the modern page renderer is mind-boggling, simply because it has been so important to keep adding features to browsers without breaking existing pages. This compounds point #1, and makes synchronizing a given page across multiple browsers even more impractical.
  3. The UI of CrO4 wasn't a one-size-fits-all solution. CrO4 worked well for long reads that took multiple sittings, and for editing documents/forms on the go, but it didn't work well with stream-based services. When I look at my phone in the morning, I really don't care what was half-way down my Facebook feed, or what post #125 on my Reddit front page was last night. The effort of clearing out that cruft canceled out the value of not losing my place on other services. CrO4 is just too intrusive.
But my initial problem, lowering the barrier to seamless transitions between devices, still exists. Reading Reddit in my browser and then walking to the train and looking at my phone still presents me with duplicate information and a discontinuity in my user experience. Looking for a flight at work and then continuing my search from home still requires me to basically start from scratch.

So, what have I learned from CrO4?

  • Browsers are an engineering quagmire. There is a lot of value for this problem waiting to be extracted from within the browser, but HTML (and its accompanying technologies) has acquired so much technical debt that this approach isn't going to yield anything from a one-man personal project in a reasonable amount of time.
  • Don't build a new basement under an unstable building. CrO4 was an attempt to add new functionality to existing infrastructure by treating the browser platform as an abstraction that could be swapped out for free. This happens in networking all the time (Ethernet, WiFi, and 3G all do this quite well), but my mistake (well, one of them) was not considering the robustness and interface complexity of the abstraction layer directly above the one I was replacing.
  • Building up and out is easier than remodeling. Working on CrO4 has given me a great deal of respect for Linus Torvalds' hard-line rule that kernel fixes must not break user-space programs, even when those programs depend on "wrong" kernel features ("wrong" is different from "broken"). Rebuilding existing infrastructure means making sure all of the world's existing functionality keeps working, whereas building on top of it (e.g., a new web service) or in parallel to it (e.g., a new mobile platform) provides a huge "clean slate" degree of freedom.

And, the most important lesson:

  • I don't know how to magically make all web services multi-device friendly without changing the service. I was careful not to word this "It is impossible to magically make..." because I'm still willing to bet that there is some umbrella solution to this problem, but I've conceded that I've only grazed the edge of that solution if it does exist. As far as I'm concerned (for now) the solution to the multi-device discontinuity problem rests in the distributed hands of the developers of the services that users want to use.

So where do I go from here? 

One consequence of not being able to use the browser event model and a single central execution environment (CrO4's server-browser DOM + JavaScript VM) is that most services will need multiple executables running across many devices.

In an ideal world, this collection of devices could be treated as a composite of identical state machines that together form one simple state machine spread across multiple devices. In reality, it means we have multiple independent systems connected through variable, ever-changing time delays. In less mathematical terms, it's a bitch of a communications problem to solve.

The general problem of synchronizing multiple agents, all of whom are modifying the state of some document, where each agent's copy needs to have the changes of all the other agents applied to it (in whatever manner makes sense for that particular document format), is the problem addressed by Operational Transformation (OT).

The problem with OT is that it is difficult to get right. For most services, weighing the cost of implementing OT (substantial complexity and developer time) against the benefits it provides (multi-device synchronization and offline editing) usually makes a clear case against using it.
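To make the idea a little more concrete, here is a minimal sketch of the core OT step in Python. It assumes a plain-text document, only two kinds of operation (inserting a string and deleting a single character), and a hypothetical integer `site` ID per device that breaks ties when two inserts land at the same position. `transform(op, other)` rewrites an operation generated concurrently with `other` so that it can be applied after `other`; if each device applies its own edit and then the transformed remote edit, both copies converge. This illustrates the textbook technique only. It is not Cortex OPT, and it leaves out most of what a real system needs (server ordering, operation histories, undo).

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where `text` is inserted
    text: str
    site: int  # hypothetical per-device ID, used only to break ties


@dataclass(frozen=True)
class Delete:
    pos: int   # index of the single character to remove


Op = Union[Insert, Delete]


def apply(doc: str, op: Optional[Op]) -> str:
    """Apply one operation to a plain-text document (None means no-op)."""
    if op is None:
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op: Optional[Op], other: Optional[Op]) -> Optional[Op]:
    """Rewrite `op` so it keeps its intent when applied after `other`.

    Both operations are assumed to have been generated against the same
    document state.
    """
    if op is None or other is None:
        return op
    if isinstance(op, Insert):
        if isinstance(other, Insert):
            # Shift right if the other insert lands earlier, or at the same
            # position from a lower site ID (a deterministic tie-break).
            if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
                return Insert(op.pos + len(other.text), op.text, op.site)
            return op
        # `other` is a Delete: shift left if a character before us was removed.
        return Insert(op.pos - 1, op.text, op.site) if other.pos < op.pos else op
    # `op` is a Delete.
    if isinstance(other, Insert):
        return Delete(op.pos + len(other.text)) if other.pos <= op.pos else op
    if other.pos < op.pos:
        return Delete(op.pos - 1)
    if other.pos == op.pos:
        return None  # both sides deleted the same character; don't delete twice
    return op


if __name__ == "__main__":
    doc = "abc"
    a = Insert(1, "X", site=1)  # edit made on the laptop
    b = Insert(1, "Y", site=2)  # concurrent edit made on the phone
    laptop = apply(apply(doc, a), transform(b, a))
    phone = apply(apply(doc, b), transform(a, b))
    print(laptop, phone)  # both are "aXYbc"
```

The `site` tie-break is what keeps two devices from disagreeing when both insert at the same spot: without a deterministic rule for which insert goes first, each copy could order the two strings differently and the documents would diverge.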


If I could provide OT in a package that lowers the implementation costs to a level where app and service developers see OT as a worthwhile investment, it may solve my original problem (seamless experience over multiple devices). This is where my latest project, Cortex OPT, comes into play.

More to come on this topic soon.
