XQuery is a powerful language, which can be used not only to query XML
databases, but also in new XML applications: to set up a Web service,
for data integration, as a component of the semantic Web
infrastructure, or for plain file processing. Because it is a complex
language, and because it can be used in so many different
environments, building an XQuery processor is a challenging task. The
objective of this lecture is to explain (almost) all that you need
to know to build an XQuery implementation that is both complete and
efficient.
I will start by giving an overview of relevant compilation techniques,
for both query languages and programming languages. Then I will
describe the architecture for a complete implementation of XQuery and
walk you through the compilation of a query from parsing to
evaluation. During that process, I will identify the most significant
query processing problems and suggest some solutions. Notably, I will
argue for a very systematic approach to dealing with the details of
the XQuery semantics, for a hybrid optimizer that applies both
compiler optimizations and database optimizations, and for extending
traditional database algebras with XML streaming operations. This
presentation is strongly influenced by my experience in building
Galax, a complete and open-source XQuery 1.0 engine implemented in
Caml.