31 Oct Sql server tutorial 19 Working with XML Data
Key Skills & Concepts
- Learn the Basics of XML
- Learn About SQL/XML
- The Extensible Markup Language (XML) is a general-purpose markup language used to
describe documents in a format that is convenient for display on web pages and for
exchanging data between different parties. The specifications for storing XML data in
SQL databases were added to the SQL standard in SQL:2003 as Part 14 (arguably the
most significant enhancement to that version of the standard). Part 14, also known as
SQL/XML, was expanded in SQL:2006, and some error corrections were published in
SQL/XML is not at all the same as Microsoft’s SQLXML, which is a proprietary technology used in SQL Server.
As you can imagine, the unfortunate similarity in names has caused much confusion. Microsoft participated in
the standards proceedings for SQL/XML, but then chose not to implement it.
Learn the Basics of XML
In order to understand SQL/XML, you must first understand the basics of XML. While a
complete explanation of XML is well beyond the scope of this book, this topic provides a
brief overview. You can find a lot more information by searching on the Internet.
You may already be familiar with HTML, the markup language used to define web
pages. If so, the syntax of XML will look familiar. This is because both are based on the
Standard Generalized Markup Language (SGML), which itself is based on Generalized
Markup Language (GML), which was developed by IBM in the 1960s. A markup
language is a set of annotations, often called tags, which are used to describe how text is
to be structured, formatted, or laid out. The tagged text is intended to be human readable.
One of the fundamental differences between HTML and XML is that HTML provides a
predefined set of tags, while XML allows the author to create their own tags.
Learn About SQL/XML
As already mentioned, XML is commonly used to represent data on web pages, and that
data often comes from relational databases. However, as you have seen, the two models
are quite different in that relational data is stored and presented in tables where neither
hierarchy nor sequence have any significance, while XML is based on hierarchical trees in
which order is considered significant. The term forest is often used to refer to a collection
of XML tree structures. XML is used for web pages because its structure so closely
matches the structure that would be used to display the same data in HTML. In fact, many
web pages are a mixture of HTML for the static portions, and XML for the dynamic data.
It is perhaps this widespread implementation that has led many of the major vendors,
including Oracle and IBM, to support XML extensions.
SQL/XML can be divided into three main parts: the XML data type, SQL/XML
functions, and SQL/XML mapping rules. I cover each of these as the major topics in the
remainder of this chapter.
The XML Data Type
The XML data type is handled in the same general way as all the other data types I had
discussed in Chapter 3. While storing data in XML format directly in the database is not
the only way to use SQL and XML together, it is a very simple way to get started because
it is a logical extension of the earliest implementations where SQL developers simply
stored the XML text in a column defined with a general character data type such as
CHARACTER VARYING (VARCHAR). However, it is far better to tell the DBMS that
the column contains XML and the particular way the XML is coded so that the DBMS can
provide additional features tailored to the XML format.
The type modifier is required and must be enclosed in a pair of parentheses as shown,
while the secondary type modifier is optional, and in fact is not supported for all type
modifiers. The standard is not specific about how a particular SQL implementation should
treat the various types, but some conventions and syntax rules are specified. The valid
type modifiers are:
DOCUMENT The DOCUMENT type is intended for storage of text documents
formatted using XML. In general, the data values are expected to be composed of
human-readable characters such as letters, numbers, and symbols as they would appear
in an unstructured text document.
CONTENT The CONTENT type is intended for more complex data that can
include binary data such as images and sound clips.
SEQUENCE The SEQUENCE type is intended for XQuery documents, which are
often called XQuery sequences. XQuery is an advanced topic that is beyond the scope
of this book.An SQL/XML function (also called an XML value function) is simply a function that
returns a value as an XML type. For example, a query can be written that selects nonXML
data (that is, data stored in data types other than XML) and formats the query results
into an XML document suitable for display.