XProc

Filename extension: .xpl
Internet media type: application/xproc+xml
Type of format: Stylesheet language / Scripting language
Extended from: XML
Standard: XProc 3.0

XProc is an XML transformation language for processing documents in pipelines: chaining conversions and other steps together to achieve the desired results. It can handle documents in XML, HTML, JSON, text and binary.

The current (stable) version is 3.0.[1] While XProc 1.0[2] is a W3C Recommendation, XProc 3.0 is a standard developed by the W3C XProc Next Community Group.[3]

Its main characteristics are:

  • XProc is a programming language, expressed in XML, in which you can write pipelines.
  • An XProc pipeline takes data as its input (often XML) and passes this through specialized steps to produce end results.
  • Steps range from simple ones, like adding attributes, to more complex operations such as splitting, combining and pruning documents, transforming them with XSLT and XQuery, and validating them against schemas.
  • Within a pipeline you can work with variables, branch, loop, catch errors, and so on. Everything is based on the data flowing through.
  • XProc pipelines are not limited to a linear succession of steps. They can fork and merge.
  • XProc allows you to create custom steps by combining other steps. These custom steps can be used just like any other. Therefore, pipelines and steps are interchangeable concepts in XProc.
  • Custom steps can be collected into libraries.
  • XProc aids in the housekeeping surrounding the processing, like inspecting directories, reading documents from zip files and writing results to disk.
  • There is software that can execute these pipelines, the so-called XProc processors.
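
As a minimal sketch of two of these characteristics (branching and custom steps), the pipeline below declares a small custom step and uses it after a p:choose. The namespace URI http://example.org/ns/steps and the step type ex:stamp are hypothetical names chosen for illustration; they are not part of the XProc standard.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.org/ns/steps"
                version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Hypothetical custom step: a small pipeline that can be used like any other step -->
  <p:declare-step type="ex:stamp">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:add-attribute attribute-name="timestamp" attribute-value="{current-dateTime()}"/>
  </p:declare-step>

  <!-- Branching: the test is evaluated against the document flowing in -->
  <p:choose>
    <p:when test="/*/@data">
      <!-- The root element has a data attribute: remove all data attributes -->
      <p:delete match="@data"/>
    </p:when>
    <p:otherwise>
      <!-- Otherwise pass the document through unchanged -->
      <p:identity/>
    </p:otherwise>
  </p:choose>

  <!-- The custom step is invoked by its type name, just like a built-in step -->
  <ex:stamp/>

</p:declare-step>
```

A step declared this way could also be moved into a p:library document and made available to other pipelines with p:import.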

Example

The following is a (very) simple XProc pipeline:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <p:add-attribute attribute-name="timestamp" attribute-value="{current-dateTime()}"/>
  <p:delete match="@data"/>

</p:declare-step>

  • It declares two ports:
    • An input port called source. This is where the original document flows in.
    • An output port called result. This is where the resulting document flows out.
  • The document that comes in through the source port automatically flows into the first step of the pipeline. This p:add-attribute step adds an attribute called timestamp with the current date and time.
  • The result of this flows through the p:delete step that removes all attributes called data.
  • Since p:delete is the last step, the resulting document flows out through the output result port.
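
The implicit connections described above can also be written out explicitly. The following sketch should behave the same as the pipeline above; the step names main, add-timestamp and remove-data are assumptions introduced here for illustration only.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0" name="main">

  <p:input port="source"/>
  <!-- The result port is explicitly fed by the result of the last step -->
  <p:output port="result" pipe="result@remove-data"/>

  <p:add-attribute name="add-timestamp"
                   attribute-name="timestamp"
                   attribute-value="{current-dateTime()}">
    <!-- Read explicitly from the source port of the pipeline itself -->
    <p:with-input port="source" pipe="source@main"/>
  </p:add-attribute>

  <p:delete name="remove-data" match="@data">
    <!-- Read explicitly from the result port of the previous step -->
    <p:with-input port="source" pipe="result@add-timestamp"/>
  </p:delete>

</p:declare-step>
```

Explicit connections like these are what allow pipelines to fork and merge, because any step can read from any named port that is in scope rather than only from the step directly before it.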

So if you supply the following XML document to this pipeline:

<example data="321">
  <item data="123">Some data...</item>
</example>

It comes out as:

<example timestamp="2024-09-11T15:05:22.82+02:00">
  <item>Some data...</item>
</example>

The exact date and time recorded in the timestamp attribute depends on when the pipeline is executed.
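
A variation on the same pipeline can illustrate variables and options. This is a sketch under stated assumptions: the option name remove and the variable name now are hypothetical, and the sketch relies on option attributes such as match being treated as attribute value templates, so that {$remove} is replaced by the option's value.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Hypothetical option: name of the attribute to remove, defaulting to 'data' -->
  <p:option name="remove" select="'data'"/>

  <!-- Hypothetical variable: the moment the variable is evaluated -->
  <p:variable name="now" select="current-dateTime()"/>

  <p:add-attribute attribute-name="timestamp" attribute-value="{$now}"/>

  <!-- With the default option value this expands to match="@data" -->
  <p:delete match="@{$remove}"/>

</p:declare-step>
```

With the default option value, this sketch is intended to produce the same kind of result as the original pipeline. How an option value is supplied when running a pipeline depends on the XProc processor used.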


Understanding and learning XProc

The learning page of the XProc website[4] contains links to all the learning and reference materials the XProc community group is aware of. There is a special 101 section with introductory learning materials.


History

Ideas for a programming language for processing XML were around from the early days of XML, at the end of the twentieth century. However, it was not until the end of 2005 that the W3C started a working group called the XML Processing Model Working Group. This resulted in the Recommendation for XProc 1.0, dated May 11, 2010.[2]

There were various attempts to create working XProc 1.0 processors. The only two currently available as open source products that implement the full 1.0 standard are XML Calabash[5] and MorganaXProc.[6]

After the release of version 1.0, the XProc working group continued debating a next version. Ideas were raised for a version 2.0 based on a non-XML syntax, but this did not gain much support from the community. Engagement in the working group waned and in 2016 it ceased to exist.

In June 2017 the XProc Next Community Group[3] was founded and started working on a new version, now completely XML-based. Because this was a completely different approach from the 2.0 initiative, the version number was increased to 3.0. A stable version was released on 12 September 2022.[1]

In 2024 the community group started work on a minor update, version 3.1.


Implementations

The following processors support the XProc 3.0 standard:

  • MorganaXProc-IIIse,[7] maintained by Achim Berndzen. Implements all required and most of the non-required parts of the XProc standard.
  • XML Calabash 3,[8] maintained by Norman Walsh. As of 2024, it is under development.


Older versions

The following processors support the XProc 1.0 standard. There were several other XProc 1.0 implementations, but these were either incomplete or are not maintained.

  • XML Calabash,[5] maintained by Norman Walsh. This processor is also integrated in the Oxygen XML Editor product.
  • MorganaXProc 1.0,[6] maintained by Achim Berndzen.


The XProc logo was created by Bethan Tovey-Walsh. The fish in the logo is called Kanava, which is Finnish for pipeline.


References