February 2013
by Robbert Broersma

The web needs “XML: The Good Parts”

Introduction

A new generation of web development frameworks is on the rise, each providing a comprehensive starting point for creating web applications. They combine a basic templating engine, 'two-way data bindings' for form elements, and automatic recalculation of calculated values. The most popular among these frameworks are AngularJS, Ember.js, Knockout, and Polymer.

Fueled by the development of a suite of new web standards, Polymer is exploring markup-based declarative web applications ("everything is an element") using a simultaneously developed JavaScript implementation of the new Web Components standards.

The increasing adoption of declarative frameworks might finally change the mindset of web developers in ways that make them appreciate the groundwork that standards like XPath, XForms and XSLT have laid.

Using the momentum of Web Components, dubbed the 'declarative renaissance', a rebound of declarative XML technologies is possible if they are re-imagined to fill the voids in the JavaScript frameworks that are widely deployed today.

What could web development look like with full-fledged templating and query engines?

Death of a browser technology

Web development is shifting towards declarative applications, and frameworks are slowly evolving towards the same feature sets that were designed for the XPath, XForms and XSLT standards a long time ago. If these shared philosophies are growing more popular, then why aren't the XML technologies themselves appealing to web developers?

XML has become an unspeakable technology, a future averted by WHATWG pragmatists, a cumbersome toolset that has no place on the web.

While few web developers have actual experience with XML, its image has been irreversibly damaged by the disappointment of a previous generation.

The decline of the technology can be seen in many things: using AJAX definitely doesn't mean you're using XML anymore. XMLHttpRequest is now mostly used for exchanging JSON, hardly anyone still uses XHTML, the new DOM specification no longer considers attributes to be nodes, and Chrome even intends to fully remove XSLT 1.0 support.

Perhaps one of the main reasons for disappointment is that this family of extensible standards has been the least extensible part of browsers: implementations cannot be extended to support new features. An XSLT 1.0 processor in the browser cannot be made to support the more recent <xsl:for-each-group>. Using variables in DOM XPath evaluators is technically impossible, and so is creating a polyfill for XPath 3.0 functions. If one feature that is essential for your use case is missing from an XML standard, on the web you cannot use the standard at all.

Not all should be considered lost: XML technologies that didn't make it into browsers, or have since disappeared, now have a second chance.

We will show what can be done to cut down on complexity, and how we can give developers more flexibility. We will argue why Web Components should not settle for less than the power of XSLT 2 templates, and how users of popular web frameworks are missing out on the good parts of the XML platform.

Custom elements reincarnated

The Extensible Web Manifesto calls for new web standards to be designed with low-level extensibility in mind, opening up possibilities for web applications that were previously reserved for browser makers and browser extensions. The new DOM standard is specifically designed so that it can be influenced and extended using JavaScript.

Web Components aim to transparently provide web applications with full control over rendering and processing unknown elements in HTML.

Enable mix and match

With the Shadow DOM, 'mix and match' of markup vocabularies has never been so close to being practical. CSS styles are contained within their respective components, so the theming of one widget library doesn't affect the layout of another. Events and focus transparently handle components as one Node in the DOM tree, even though internally they may actually consist of dozens of elements.

The future seems bright indeed.

To prevent conflicts with elements that might later be added to the HTML standard, the names of all custom elements must contain a dash, e.g. <custom-menu>. Presumably, it will not be long before someone writes a little script called AutoPrefixer that, for convenience, implements something equivalent to element namespaces, to reduce typing and improve the readability of the code.

<polymer-ui-menu>
  <polymer-ui-menu-item icon="settings" label="Settings"></polymer-ui-menu-item>
  <polymer-ui-menu-item icon="dialog" label="Dialog"></polymer-ui-menu-item>
  <polymer-ui-menu-item icon="search" label="Search"></polymer-ui-menu-item>
</polymer-ui-menu>

The above could then become:

<ui-menu inherit-prefix="polymer-">
  <ui-menu-item icon="settings" label="Settings"></ui-menu-item>
  <ui-menu-item icon="dialog" label="Dialog"></ui-menu-item>
  <ui-menu-item icon="search" label="Search"></ui-menu-item>
</ui-menu>
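
A minimal sketch of what such an AutoPrefixer might do, written against plain objects instead of the real DOM so that it runs anywhere. The inherit-prefix attribute is taken from the example above; the function name expandPrefixes and the { name, attrs, children } node shape are assumptions made purely for illustration.

```javascript
// Hypothetical helper: expands short element names using the nearest
// inherit-prefix attribute found on the element itself or on an ancestor.
function expandPrefixes(node, inherited = "") {
  const attrs = { ...(node.attrs || {}) };
  const prefix = attrs["inherit-prefix"] ?? inherited;
  delete attrs["inherit-prefix"]; // the attribute is consumed by the expansion
  return {
    name: prefix + node.name,
    attrs,
    children: (node.children || []).map((child) => expandPrefixes(child, prefix)),
  };
}

const menu = {
  name: "ui-menu",
  attrs: { "inherit-prefix": "polymer-" },
  children: [
    { name: "ui-menu-item", attrs: { icon: "settings", label: "Settings" } },
    { name: "ui-menu-item", attrs: { icon: "search", label: "Search" } },
  ],
};

const expanded = expandPrefixes(menu);
// expanded.name is "polymer-ui-menu", its children are "polymer-ui-menu-item"
```

The prefix is inherited down the tree, which is the same scoping rule a default namespace declaration follows.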

It's a trick that is remarkably similar to how XML works, where the special xmlns attribute essentially defines a default prefix for all elements. Usually, however, instead of being a notable convenience, namespace declarations make writing XML documents from scratch a bother. It needn't be so: not using hard-to-remember URLs would make namespaces as simple as writing import pickle in another language. Just as the HTML doctype declaration was simplified to <!DOCTYPE html>, and since relative URLs in xmlns aren't being resolved anyway, we might as well start using predictable namespace values and make our lives easier.

<menu xmlns="polymer-ui">
  <menu-item icon="settings" label="Settings"/>
  <menu-item icon="dialog" label="Dialog"/>
  <menu-item icon="search" label="Search"/>
</menu>

Doesn't that look handsome, compared to the first example?

The above illustrates another syntactic inconvenience of HTML compared to XML: custom elements in HTML cannot be self-closing. Within the possibilities of HTML parsing this seems like a reasonable trade-off, but compared to XML it is rather inconvenient.

Namespaces could also be useful to prevent clashes across libraries providing user interface widgets, and prevent the need for verbose element names that are required to prevent clashes. Libraries that register custom HTML elements should also register those elements in a namespace, providing maximum ease of use as well as flexibility for those who require it.

Clashes between vocabularies already exist. In this example an SVG element becomes an HTML element:

<template id="vertical-bar">
  <rect ... height="{value}"></rect>
</template>

Because of how HTML is parsed, the namespace of the <rect> element will be the HTML namespace instead of SVG, breaking the constructed image. This would make implementing templates for the following example impossible, unless XHTML syntax is used and a namespace prefix explicitly defines <svg:rect> in the SVG namespace.

<bar-graph>
  <vertical-bar value="42"></vertical-bar>
</bar-graph>

For namespaced custom elements to work, the Shadow DOM must also define document.registerElementNS(). Registering these elements must not require a dash in the element name when the namespace is not the HTML or SVG namespace.

Using XHTML imports

For reasons outlined above, Web Components might benefit from using XHTML syntax in external template files. A major reason for not using XML syntax on the web doesn't apply for these files: when a server framework uses string concatenation to create XML instead of tree serialization there is a big risk of well-formedness errors, causing the page not to load at all. HTML imports are most likely hand-coded, and syntax errors can be caught during development just as easily as one would with JavaScript files.

There are significant drawbacks though: ampersands and the less-than and greater-than signs need to be escaped as &amp;, &lt; and &gt;. When hand-coding an HTML or XML file, unlike escaping < and >, it isn't very intuitive to escape ampersands that are part of URL query strings, in the img src attribute for example. For script blocks it is an even bigger issue, because escaping renders the script completely unreadable.
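
The escaping rules themselves are mechanical, which is why they are easy to automate and easy to forget by hand. A small sketch; escapeXml is a name chosen here and not an existing API:

```javascript
// Escapes the characters that are significant in XML text and in
// double-quoted attribute values.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first, or later entities get escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const src = escapeXml("thumb.php?id=42&size=large");
// src is "thumb.php?id=42&amp;size=large"
```

The query string is the case that tends to be overlooked: nothing about the URL looks like markup, yet the bare ampersand makes the document ill-formed.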

Overcoming these disadvantages is necessary to make switching to XHTML worthwhile for developers. On the web, XML is best served with error recovery. An initiative like XML Error Recovery is essential to increasing adoption of XML.

Templates will drive the web

Templates are at the core of most web pages. Historically, templates were processed on the server, but increasingly servers send only the raw data, and scripts on the web page implement the templates that present it.

Conditionally showing validation warnings next to form inputs. A list of autocomplete suggestions. The latest tweets. Showing the number of unread e-mails between parentheses after "Inbox", or not anymore when the last unread mail is opened.

Powered by queries

Data-driven templates need to specify what data sets to iterate through, and what values to present on the screen. For tabular data like relational databases and CSV files, using SQL makes sense; for traversing DOM trees, CSS selectors and the more powerful XPath can be used.

For JSON a path-based language (like XPath, based on UNIX file paths) makes sense as well. In reality, however, queries don't traverse data that was parsed from a JSON string, which would guarantee that the structure is in fact a tree. So-called 'JSON structures' really are regular JavaScript arrays and objects, and can potentially contain cyclic references.
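
A short illustration of the difference. The object names are made up; the point is only that a structure built in JavaScript can contain a cycle, so a traversal has to remember what it has already visited.

```javascript
// A 'JSON structure' built in JavaScript can point back at itself.
const alice = { name: "Alice", friends: [] };
const bob = { name: "Bob", friends: [alice] };
alice.friends.push(bob); // alice -> bob -> alice: a cycle, not a tree

// Counts the distinct objects and arrays reachable from root,
// tracking visited ones so that the walk terminates on cyclic data.
function countReachable(root, seen = new Set()) {
  if (root === null || typeof root !== "object" || seen.has(root)) return 0;
  seen.add(root);
  let count = 1;
  for (const value of Object.values(root)) {
    count += countReachable(value, seen);
  }
  return count;
}

let serializable = true;
try {
  JSON.stringify(alice);
} catch (error) {
  serializable = false; // JSON.stringify throws a TypeError on cyclic structures
}
```

Here countReachable(alice) visits two objects and two arrays, while the same data cannot be written out as JSON at all.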

Handlebars is one example of a template engine that traverses JavaScript structures. It implements the popular Mustache template syntax, self-described as 'logicless templates': that means templates without if/else statements, and no loops. Ironically, these are in fact the only things they do offer. Most notably, these engines lack query expressions: there are no comparison operators and no arithmetic operators.

Here's the syntax, in a nutshell: {{#person}} … {{/person}} is an instruction to loop over a dataset and repeat the contained template, equivalent to <for-each select="person"> in XSLT. {{^person}}Your address book is empty{{/person}} is an if-not statement, and simply {{name}} renders text output, like <value-of select="name"/> does.

The output of any template is essentially limited to the data that can be targeted using the query language. When an advanced query language is not a significant part of a template engine, complex selecting and filtering must occur in a preprocessing step.
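
To make the preprocessing step concrete, here is a sketch based on the unread-mail counter mentioned earlier. renderLogicless is a deliberately tiny stand-in that only substitutes {{name}} values; it is not the Mustache implementation, and prepareView and the mail objects are likewise assumptions of this example.

```javascript
// Stand-in for a logic-less engine: it can only substitute {{name}} values.
// (A simplification for illustration, not the real Mustache implementation.)
function renderLogicless(template, view) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) => String(view[key] ?? ""));
}

const mails = [
  { subject: "Invoice", unread: true },
  { subject: "Hello", unread: false },
  { subject: "Reminder", unread: true },
];

// The query "count the unread mails, and hide the counter when it is zero"
// cannot be written in the template, so it moves into JavaScript.
function prepareView(mails) {
  const unread = mails.filter((mail) => mail.unread).length;
  return { counter: unread > 0 ? ` (${unread})` : "" };
}

const label = renderLogicless("Inbox{{counter}}", prepareView(mails));
// label is "Inbox (2)"
```

The template itself says nothing about what is being counted; that knowledge lives in prepareView, away from the markup it serves.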

Taking advantage of XPath

Most query languages in JavaScript template engines evolve as part of a template engine, and their semantics are adapted to support more use cases while earlier versions are already widely used in production. Unfortunately, the absence of a diligent design process leads to quirky and undesirable behavior, such as the number value 0 evaluating to the boolean true in the case of Handlebars.
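
For comparison, the boolean() rules of XPath 1.0 fit in a few lines. A sketch in JavaScript, with arrays standing in for node-sets (an assumption of this example; xpathBoolean is not an existing API):

```javascript
// Sketch of the XPath 1.0 boolean() conversion applied to JavaScript values.
function xpathBoolean(value) {
  if (typeof value === "boolean") return value;
  // A number is true unless it is zero or NaN.
  if (typeof value === "number") return value !== 0 && !Number.isNaN(value);
  // A string is true when it is not empty, so "0" and "false" are true.
  if (typeof value === "string") return value.length > 0;
  // A node-set (an array in this sketch) is true when it is not empty.
  if (Array.isArray(value)) return value.length > 0;
  return value !== null && value !== undefined;
}

const zeroIsTrue = xpathBoolean(0);       // false
const emptySetIsTrue = xpathBoolean([]);  // false, unlike Boolean([]) in JavaScript
```

None of these outcomes is left to an implementation detail of the host language; each one is written down in the specification.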

Query engines for JavaScript structures are often bootstrapped by compilation to JavaScript function bodies, relying on eval() to implement the query. This shortcut gives these scripts the advantage of a smaller code size, offloading the workload to the browser. Especially in early versions of such query engines, there is significant risk for script injection and constantly auditing security will remain of the essence.
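
The trade-off can be shown with two hypothetical path compilers. Neither is taken from an existing library: compileUnsafe mimics the eval-style shortcut using the Function constructor, and compileSafe interprets the same dotted path step by step.

```javascript
// Shortcut: compile a path into a function body. Small, but the path
// is executed as JavaScript, so any expression in it runs as code.
function compileUnsafe(path) {
  return new Function("data", "return data." + path + ";");
}

// Alternative: interpret the path one step at a time.
// Nothing in the path is ever evaluated as code.
function compileSafe(path) {
  const steps = path.split(".");
  return (data) =>
    steps.reduce(
      (current, step) =>
        current !== null &&
        typeof current === "object" &&
        Object.prototype.hasOwnProperty.call(current, step)
          ? current[step]
          : undefined,
      data
    );
}

const data = { user: { name: "Ada" }, list: ["a", "b"] };
const name = compileSafe("user.name")(data); // "Ada"
const first = compileSafe("list.0")(data);   // "a"
```

A path such as user.name, (globalThis.injected = true) makes the function returned by compileUnsafe execute the assignment when it is called, whereas compileSafe simply fails to find a matching property and yields undefined.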

Reliance on the conversion to JavaScript, and the pressure to keep the codebase small in general, has led to an unexpressive and makeshift syntax for the expressions that define what templates do. In Mustache, selecting the n-th item from an array is not even possible: only statically numbered indexes can be selected, like accessing the first item using list.0.

The meager semantics of the expression language ensure that the essence of queries will move to helper functions implemented in JavaScript, diluting the foremost value of using a template: separation of concerns. When what is to be computed can be described in a query expression, the essence of the template is captured on the spot.

The semantics of the query language should not be left to chance. Also, developers should not be so restricted in their templates that for what would be a basic comparison in XPath, they need to resort to writing dozens of lines of code in JavaScript.

When so many users of template engines could use more advanced queries, why not take advantage of an existing, fully documented language that has also been designed for tree structures? A language designed around the exact use cases of all these template frameworks, accompanied by tens of thousands of unit tests. We shouldn't hold back the web by waiting the next ten years for makeshift solutions to mature; instead we should be looking over the horizon, starting from the high point that XPath has already reached.

To the benefit of everyone

Since JavaScript on the web is mainly used to power user interfaces, it is surprising that Unicode string handling was never part of the language, or built into the core of most libraries. String lengths are inaccurate, which is especially problematic for validating form inputs. Sorting will cause ‘Motörhead’ to be listed after ‘Mott the Hoople’, reversing strings can even move diacritics to a different letter: ‘daehr̈otoM’.
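
Each of these three problems can be reproduced in a few lines. The sketch uses only standard string and array methods, plus Intl.Collator from the ECMAScript Internationalization API for the corrected sort order; the sample strings are the ones from the paragraph above.

```javascript
// Length counts UTF-16 code units, not characters as users perceive them.
const pile = "\u{1F4A9}";                  // a single emoji
const unitLength = pile.length;            // 2
const codePointLength = [...pile].length;  // 1

// The default sort compares code units, so "ö" (U+00F6) sorts after "t".
const bands = ["Mott the Hoople", "Mot\u00F6rhead"];
const naive = [...bands].sort();
const collated = [...bands].sort(new Intl.Collator("en").compare);

// Reversing code units moves a combining diaeresis (U+0308) to another letter.
const decomposed = "Moto\u0308rhead";
const reversed = decomposed.split("").reverse().join("");
// reversed is "daehr\u0308otoM": the diaeresis now sits on the r
```

Only the collated list puts Motörhead before Mott the Hoople, and it takes a locale-aware API to get there.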

Being inclusive to a wide range of cultures has always been an important aspect in the design of many W3C standards, because the web is always facing a larger audience than just John Doe. Both developers using XPath and XSLT and the visitors of their sites enjoy the benefits of having full Unicode support, whether it comes to input validation, sorting, string modifications or number formatting.

Both those who employ a library and the developers of what essentially are user interface libraries face a choice: sacrifice the belief that it is okay to have a 'lightweight' framework, or choose a policy of excluding individuals from all over the world, like from Finland or from Turkey.

Support for Unicode can add considerably to the download size of libraries. Had widely used libraries never considered it an option not to support Unicode, browsers would not have had the luxury of waiting until 2013 to deploy the ECMAScript Internationalization API and to offer their users significant bandwidth savings by introducing new Unicode APIs.

Unicode that just works, localized formatting for numbers, currency, dates and time zone corrected times should be within reach to anyone developing a template. Based on thorough research to provide a publishing platform for content around the world, XSLT and XPath are powerful tools that need a facelift but then really need to find their way back to the web.

Design for developers

Starting with what web developers already know and expect, XPath should be extended to more conveniently support common use cases. At least all basic JavaScript functions should have readily-available alternatives in XPath. The following string functions need to be introduced, for example: string-split(), string-reverse(), string-replace(), string-repeat() and trim(). To encourage consistent behavior, all functions that are available in XPath should be available from JavaScript as well.

To align with the expectations of JavaScript developers, the else-expression in if/then/else should be made optional. The API should allow adding new functions to XPath that can be used without a namespace prefix, because namespaced functions... you can't explain that! Of course the math functions must become available without the math: prefix too.
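
What such an extension API might look like is sketched below. Everything here is hypothetical: registerFunction and callFunction are names invented for this example and do not describe an existing library. The registered functions are the ones proposed above, each delegating to its JavaScript counterpart.

```javascript
// Hypothetical registry: maps unprefixed XPath function names to JavaScript.
const xpathFunctions = new Map();

function registerFunction(name, implementation) {
  xpathFunctions.set(name, implementation);
}

function callFunction(name, ...args) {
  const implementation = xpathFunctions.get(name);
  if (!implementation) throw new Error("Unknown function: " + name + "()");
  return implementation(...args);
}

// The proposed string functions.
registerFunction("string-split", (value, separator) => value.split(separator));
// Reverses by code point, so surrogate pairs stay intact.
registerFunction("string-reverse", (value) => [...value].reverse().join(""));
// Replaces every literal occurrence of search, without regular expressions.
registerFunction("string-replace", (value, search, replacement) =>
  value.split(search).join(replacement));
registerFunction("string-repeat", (value, count) => value.repeat(count));
registerFunction("trim", (value) => value.trim());

// A math function made available without the math: prefix.
registerFunction("sqrt", Math.sqrt);

const parts = callFunction("string-split", "xml,xpath,xslt", ",");
// parts is ["xml", "xpath", "xslt"]
```

Because the same implementations are reachable from both sides, a function behaves identically whether it is called from a query or from a script.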

On the template front there are idiosyncrasies too, that we needn't expose a new generation of developers to. For example: the regexp attribute on the analyze-string instruction should not allow attribute value templates by default, because those will obviously break the regular expression syntax: when { and } need to be escaped, that will render the expression unreadable and make it impossible to just copy over existing patterns.
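
The escaping that attribute value templates force onto a pattern is easy to show. In an attribute value template a literal brace is written by doubling it; escapeForAvt is a helper name made up for this sketch.

```javascript
// In an attribute value template, { and } delimit expressions,
// so literal braces in a regular expression have to be doubled.
function escapeForAvt(pattern) {
  return pattern.replace(/[{}]/g, (brace) => brace + brace);
}

const datePattern = "\\d{4}-\\d{2}-\\d{2}"; // the regular expression \d{4}-\d{2}-\d{2}
const asAttributeValue = escapeForAvt(datePattern);
// asAttributeValue is the text \d{{4}}-\d{{2}}-\d{{2}}
```

The doubled form is what has to be typed into the attribute, and it can no longer be pasted into any other regular expression tool.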

What would templates look like if they were designed for use in HTML in the first place?

<xsl:for-each-group select="blogpost/tag" group-by=".">
  <xsl:sort order="descending"/>
  <li>
    <xsl:value-of select="."/>
  </li>
</xsl:for-each-group>

In HTML the same template would involve a lot less typing:

<for-each select="blogpost/tag" sort reversed group-by=".">
  <li>{{.}}</li>
</for-each>

This is the kind of template syntax that is easy to get started with and easy to remember, and still offers the power of XSLT.
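
To make explicit what either template computes, here is roughly the same logic written out by hand in JavaScript. The shape of the data (an array of posts, each with a tags array) is an assumption of this sketch, and output escaping is left out for brevity.

```javascript
// One <li> per distinct tag, in descending order.
function renderTags(blogposts) {
  const tags = blogposts.flatMap((post) => post.tags); // select="blogpost/tag"
  const distinct = [...new Set(tags)];                 // group-by="."
  distinct.sort().reverse();                           // sort, descending
  return distinct.map((tag) => "<li>" + tag + "</li>").join("\n");
}

const html = renderTags([
  { title: "First post", tags: ["xml", "web"] },
  { title: "Second post", tags: ["web", "xslt"] },
]);
// html is "<li>xslt</li>\n<li>xml</li>\n<li>web</li>"
```

The imperative version is not long, but it has to be written, tested and kept in sync with the markup, and it sorts by code unit where the template can rely on proper collation.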

Instantly add interactivity

When developers implement the interactive parts of web pages declaratively, they hand over to the template engine not only the burden of DOM manipulation, but also that of adding event listeners and cleaning up afterwards. Being relieved of these duties is already hugely appreciated by users of libraries such as D3.js.

Implementing an actual ticking clock in SVG would become child's play:

<svg viewBox="0 0 1000 1000" width="150" height="150">
 <g transform="translate(500, 500)">

  <path stroke="black" stroke-width="20" d="M 0 0 L 0 -325"
        transform="rotate({hours-from-dateTime(current-dateTime()) * 30 +
                           minutes-from-dateTime(current-dateTime()) div 2})"/>

  <path stroke="black" stroke-width="20" d="M 0 0 L 0 -450"
        transform="rotate({minutes-from-dateTime(current-dateTime()) * 6})"/>

  <path stroke="red"   stroke-width="5" d="M 0 0 L 0 -450"
        transform="rotate({floor(seconds-from-dateTime(current-dateTime())) * 6})"/>

 </g>
</svg>
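
The arithmetic in the three transform attributes is plain clock geometry: 30 degrees per hour plus half a degree per minute for the hour hand, and 6 degrees per minute or second for the other two. As a cross-check, the same calculation in JavaScript (clockAngles is a name chosen for this sketch):

```javascript
// Rotation angles in degrees for the three hands, mirroring the
// expressions in the transform attributes above.
function clockAngles(date) {
  const hours = date.getHours();
  const minutes = date.getMinutes();
  const seconds = date.getSeconds(); // already a whole number, so no floor() needed
  return {
    hourHand: hours * 30 + minutes / 2,
    minuteHand: minutes * 6,
    secondHand: seconds * 6,
  };
}

const angles = clockAngles(new Date(2013, 1, 1, 3, 30, 15)); // 03:30:15 local time
// angles is { hourHand: 105, minuteHand: 180, secondHand: 90 }
```

What the script version lacks is everything around the arithmetic: a timer, the DOM updates and the cleanup, which are exactly the parts a reactive template engine would take over.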

The most common user interface widgets can easily be implemented declaratively using XPath-driven templates. Paging of search results can be implemented by simply applying templates to subsequence(result, $page * $pageSize + 1, $pageSize), with $page counting from zero. Not only can browsing be implemented by simply updating the $page variable, more search results will automatically be rendered when the $pageSize preference for the number of items per page changes.
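
A sketch of the paging logic in JavaScript, modelling XPath's subsequence() on arrays. Because subsequence() counts positions from 1, a zero-based page number needs an offset of one; the sketch makes that explicit. pageOf is a name invented for this example.

```javascript
// XPath's subsequence($seq, $start, $length): positions count from 1,
// and both numeric arguments are rounded.
function subsequence(items, start, length) {
  const first = Math.round(start);
  const end = first + Math.round(length); // exclusive
  return items.filter((item, index) => index + 1 >= first && index + 1 < end);
}

// Page numbers count from 0 in this sketch, hence the + 1.
function pageOf(results, page, pageSize) {
  return subsequence(results, page * pageSize + 1, pageSize);
}

const results = ["a", "b", "c", "d", "e", "f", "g"];
const firstPage = pageOf(results, 0, 3); // ["a", "b", "c"]
const lastPage = pageOf(results, 2, 3);  // ["g"]
```

Starting a subsequence at position 0 silently drops an item, since subsequence(results, 0, 3) yields only "a" and "b". It is the kind of detail a reactive template has to get right once, after which every change of page or page size follows for free.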

A list of autocomplete suggestions could highlight the search keyword matches in bold using the analyze-string templates. Using XForms-like bindings between the data model and HTML form inputs, a visitor of a webshop could narrow down the search results using slider inputs for minimum and maximum price:

<input type="range" ref="$search/price/min" min="0" max="{max($results/price)}">

With no additional code, adjusting the sliders should trigger the removal of the results that don't fall into the selected price range, as well as trigger rendering of results that were previously limited to another page by the $pageSize maximum.

Because web pages must not become unresponsive when something unexpected happens, the semantics of some XPath functions and XSLT instructions should be amended for interactive queries not to cause fatal exceptions but instead to fail more gracefully, like returning NaN or the empty sequence, or execute a fallback template.
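
In JavaScript terms, failing gracefully amounts to something like the wrapper below. It is only a sketch of the idea, with arrays standing in for XPath sequences; evaluateGracefully is not part of any existing API.

```javascript
// Runs an expression and returns a fallback value instead of throwing.
function evaluateGracefully(evaluate, fallback) {
  try {
    return evaluate();
  } catch (error) {
    return fallback;
  }
}

const order = null; // the data has not arrived yet

// Arrays stand in for XPath sequences here, so [] is the empty sequence.
const total = evaluateGracefully(() => order.total, []);
const ratio = evaluateGracefully(() => order.paid / order.total, NaN);
```

Both lookups would normally abort the script with a TypeError; here the page keeps running and the template can decide how to present a missing value.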

Automatically optimizing performance

Using an extensive and expressive query language instead of the one-dimensional queries offered by Mustache has more advantages than just increased flexibility in handling data structures.

The greatest benefit will come from programmatically analyzing expressions for powering reactive templates: they can be used to determine reasons for recalculation and to automatically create dependency trees so recalculations can be performed in optimal order. Expressions could even be split up into several parts, limiting the impact of updating one variable to recalculation of only affected expressions, always taking the most direct path to correctly updating the template output.
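
The first step of such an analysis, finding out which expressions depend on which variables, can be sketched as follows. A real implementation would walk the parsed expression tree; the regular expression used here is a shortcut for illustration (it would, for instance, also match a $name inside a string literal), and all names are invented for the example.

```javascript
// Finds the $variables an expression refers to.
function variablesIn(expression) {
  const matches = expression.matchAll(/\$([A-Za-z_][\w-]*)/g);
  return new Set(Array.from(matches, (match) => match[1]));
}

// Maps each variable to the names of the bindings that must be
// recalculated when that variable changes.
function buildDependencies(bindings) {
  const dependents = new Map();
  for (const [name, expression] of Object.entries(bindings)) {
    for (const variable of variablesIn(expression)) {
      if (!dependents.has(variable)) dependents.set(variable, []);
      dependents.get(variable).push(name);
    }
  }
  return dependents;
}

const dependents = buildDependencies({
  visibleResults: "subsequence($results, $page * $pageSize + 1, $pageSize)",
  maxPrice: "max($results/price)",
  title: "concat('Page ', $page + 1)",
});
// dependents.get("page") is ["visibleResults", "title"]
```

Updating $page then touches two bindings and leaves maxPrice alone, which is the kind of decision an opaque helper function written in JavaScript does not allow an engine to make.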

Combined with the aforementioned logic, parts of the query can also be allowed to return values asynchronously, transparently making use of Promise return values, so that rendering of the dependent templates finishes when the data arrives.

Even without reactive templates there is much to gain. There already is a lot of experience with automatically optimizing templates based on query expressions, such as fusion of tree traversal loops, branch elimination and hoisting (parts of) expressions out of loops.

Brought to you by Frameless

Since the summer of 2011 the authors of this essay have been working on a framework to make the declarative web a reality: Frameless. Designed as a multistage project, work started out with implementing the latest versions of XSLT and XPath in JavaScript, followed by repurposing the engine as an interactive template engine, while staying compliant with the tens of thousands of test cases from existing test suites.

Frameless is software implemented in JavaScript that aims to bring powerful features from existing web standards together in the browser, and explores ways to combine the declarative real-time data bindings from XForms with the powerful templates of XSLT.

We are happy to announce that all improvements and dreams that have been put forward here are already being enjoyed in the Frameless labs: the future is now.

Conclusion

Web Components are a big step forward in providing the web platform with the extensibility that was imagined for XML. Because browsers lacked APIs to extend standards from the XML platform, the technologies failed to live up to expectations of providing distributed extensibility and were abandoned.

Now that declarative programming is being appreciated more by the web development community, a lot of value and experience can be found in these sidetracked browser technologies, although some refurbishing will be in order to go from interesting to desirable.

We must re-imagine existing technologies with ease of use in mind: a leaner syntax and a powerful JavaScript API. What would XSLT look like with HTML syntax? What would XPath be like if it were designed by JavaScript developers?

Compromises have to be made: on the HTML side namespaced elements must stay first class citizens, requiring changes to the Shadow DOM working draft specification. From the XML side, versions of XForms and XSLT should be distilled that adhere to HTML syntax conventions. XPath must radically improve the extensibility by and the interaction with JavaScript.

There is a future where declarative web applications will make lives of developers much better, but only if together we start learning from the past and stop pretending things don't need to get more complex before they get easier.

W3C, Introduction to Web Components. http://www.w3.org/TR/components-intro/
Adaptive Path, Ajax: A New Approach to Web Applications. https://web.archive.org/web/20080702075113/http://www.adaptivepath.com/ideas/essays/archives/000385.php
WHATWG, DOM Standard: Attr interface. http://dom.spec.whatwg.org/#interface-attr
Adam Barth, Intent to Deprecate and Remove: XSLT. https://groups.google.com/a/chromium.org/forum/#!searchin/blink-dev/xslt/blink-dev/zIg2KC7PyH0
Alex Russell, Real Constructors & WebIDL Last Call. http://infrequently.org/2011/10/real-constructors-webidl-last-call
W3C, Web IDL: NamedConstructor. http://www.w3.org/TR/WebIDL/#NamedConstructor
W3C, Shadow DOM. http://www.w3.org/TR/shadow-dom/
W3C, Plenary Ballot on relative URI References In namespace declarations. http://www.w3.org/2000/09/xppa
Stefan Goessner, JSONPath - XPath for JSON. http://goessner.net/articles/JsonPath/
Mustache: logic-less templates. http://mustache.github.io/
Handlebars: 0 is true. https://github.com/wycats/handlebars.js/issues/608
Mario Heiderich, A wiki dedicated to JavaScript MVC security pitfalls. https://code.google.com/p/mustache-security
AngularJS filter module. http://docs.angularjs.org/api/ng.filter:filter
Mustache: Accessing Array item by index in template. https://github.com/janl/mustache.js/issues/158
Mathias Bynens, JavaScript has a Unicode problem. http://mathiasbynens.be/notes/javascript-unicode
Steven Levithan, XRegExp Unicode addon. http://xregexp.com/plugins/#unicode
Peter Beverloo, Chrome 24 beta. http://blog.chromium.org/2012/11/a-web-developers-guide-to-latest-chrome.html
ECMAScript Internationalization API. http://www.ecma-international.org/ecma-402/1.0/
W3C, Personal names around the world. http://www.w3.org/International/questions/qa-personal-names
Domenic Denicola and Brian Cavalier, Promises/A+ specification. http://promisesaplus.com/
Anne van Kesteren, XML5's Story. http://archive.xmlprague.cz/2012/files/xmlprague-2012-proceedings.pdf
WHATWG, HTML: tree construction. http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html
W3C, XML Error Recovery Community Group. http://www.w3.org/community/xml-er/
Sharon DiOrio, Angular Filters Beyond OrderBy and LimitTo. https://www.youtube.com/watch?v=L4FJ_kuO9Rc&t=4m49s
W3C, XPath: Trigonometric and exponential functions. http://www.w3.org/TR/xpath-functions-30/#trigonometry
Oona Räisänen, Wanted: Valid last name. https://twitter.com/windyoona/status/427176843158888449
Tex Texin, Internationalization for Turkish: Dotted and Dotless Letter "I". http://www.i18nguy.com/unicode/turkish-i18n.html
The Extensible Web Manifesto. http://extensiblewebmanifesto.org/
