Abstract
This chapter from "The Guru's Guide to SQL Server Architecture and Internals" gives you an architectural and a practical-use overview of XML for SQL Server (SQLXML). You'll find out how the SQLXML technologies are designed and how they fit together, and you'll learn about practical applications such as using OPENXML, accessing SQL Server over HTTP, and using URL queries.
The key to everything is happiness. Do what you can to be happy
in this world. Life is short—too short to do otherwise. The deferred
gratification you mention so often is more deferred than gratifying.
H. W. Kenton
NOTE: This chapter assumes that you’re running, at a minimum, SQL Server
2000 with SQLXML 3.0. The SQLXML Web releases have changed and enhanced
SQL Server’s XML functionality significantly. For the sake of staying
current with the technology, I’m covering the latest version of SQLXML rather
than the version that shipped with the original release of SQL Server 2000.
This chapter updates the coverage of SQLXML in my last book, The Guru’s
Guide to SQL Server Stored Procedures, XML, and HTML. That book was
written before Web Release 1 (the update to SQL Server 2000’s original
SQLXML functionality) had shipped. As of this writing, SQLXML 3.0 (which
would be the equivalent of Web Release 3 had Microsoft not changed the
naming scheme) has shipped, and Yukon, the next version of SQL Server, is
about to go into beta test.
This chapter will also get more into how the SQLXML technologies are
designed and how they fit together from an architectural standpoint. As
with the rest of the book, my intent here is to get beyond the "how to" and
into the "why" behind how SQL Server’s technologies work.
I must confess that I was conflicted when I sat down to write this chapter.
I wrestled with whether to update the SQLXML coverage in my last
book, which was more focused on the practical application of SQLXML but
which I felt really needed updating, or to write something completely new
on just the architectural aspects of SQLXML, with little or no discussion of
how to apply them in practice. Ultimately, I decided to do both things. In
keeping with the chief purpose of this book, I decided to cover the architectural
aspects of SQLXML, and, in order to stay up with the current state of
SQL Server’s XML family of technologies, I decided to update the coverage
of SQLXML in my last book from the standpoint of practical use. So, this
chapter updates what I had to say previously about SQLXML and also
delves into the SQLXML architecture in ways I’ve not done before.
OVERVIEW
With the popularity and ubiquity of XML, it’s no surprise that SQL Server
has extensive support for working with it. Like most modern DBMSs, SQL
Server regularly needs to work with and store data that may have originated
in XML. Without this built-in support, getting XML to and from SQL Server
would require the application developer to translate XML data before sending
it to SQL Server and again after receiving it back. Obviously, this could
quickly become very tedious given the pervasiveness of the language.
SQL Server is an XML-enabled DBMS. This means that it can read and
write XML data. It can return data from databases in XML format, and it can
read and update data stored in XML documents. As Table 18.1 illustrates,
out of the box, SQL Server’s XML features can be broken down into eight
general categories.
We’ll explore each of these in this chapter and discuss how they work
and how they interoperate.
MSXML
SQL Server uses Microsoft’s XML parser, MSXML, to load XML data, so
we’ll begin our discussion there. There are two basic ways to parse XML data
using MSXML: using the Document Object Model (DOM) or using the Simple
API for XML (SAX). The DOM is a W3C standard; SAX is a de facto standard developed by the XML-DEV community. The DOM
method involves parsing the XML document and loading it into a tree structure
in memory. The entire document is materialized and stored in memory
when processed this way. An XML document parsed via DOM is known as a
DOM document (or just "DOM" for short). XML parsers provide a variety of
ways to manipulate DOM documents. Listing 18.1 shows a short Visual Basic
app that demonstrates parsing an XML document via DOM and querying it
for a particular node set. (You can find the source code to this app in the
CH18\msxmltest subfolder on the CD accompanying this book.)
Listing 18.1
Private Sub Command1_Click()
    Dim bstrDoc As String
    Dim sName As String
    Dim sData As String

    ' Default document, used when the Text1 box is empty
    bstrDoc = "<Songs> " & _
        "<Song>One More Day</Song>" & _
        "<Song>Hard Habit to Break</Song>" & _
        "<Song>Forever</Song>" & _
        "<Song>Boys of Summer</Song>" & _
        "<Song>Cherish</Song>" & _
        "<Song>Dance</Song>" & _
        "<Song>I Will Always Love You</Song>" & _
        "</Songs>"

    Dim xmlDoc As New DOMDocument30

    If Len(Text1.Text) = 0 Then
        Text1.Text = bstrDoc
    End If

    ' loadXML returns False if the text cannot be parsed
    If Not xmlDoc.loadXML(Text1.Text) Then
        MsgBox "Error loading document"
    Else
        Dim oNodes As IXMLDOMNodeList
        Dim oNode As IXMLDOMNode

        ' Default XPath query, used when the Text2 box is empty
        If Len(Text2.Text) = 0 Then
            Text2.Text = "//Song"
        End If

        Set oNodes = xmlDoc.selectNodes(Text2.Text)

        ' Display the name and XML of each node the query returned
        For Each oNode In oNodes
            If Not (oNode Is Nothing) Then
                sName = oNode.nodeName
                sData = oNode.xml
                MsgBox "Node <" + sName + ">:" _
                    + vbNewLine + vbTab + sData + vbNewLine
            End If
        Next

        Set xmlDoc = Nothing
    End If
End Sub
We begin by instantiating a DOMDocument object, then call its loadXML
method to parse the XML document and load it into the DOM tree. We call
its selectNodes method to query it via XPath. The selectNodes method returns
a node list object, which we then iterate through using For Each. In
this case, we display each node name followed by its contents via VB’s MsgBox
function. We’re able to access and manipulate the document as though
it were an object because that’s exactly what it is: parsing an XML document
via DOM turns the document into a memory object that you can then
work with just as you would any other object.
SAX, by contrast, is an event-driven API. You process an XML document
via SAX by configuring your application to respond to SAX events.
As the SAX processor reads through an XML document, it raises events
each time it encounters something the calling application should know
about, such as an element starting or ending, character data being read, and so on. It passes the relevant data about the event to the application’s
handler for the event. The application can then decide what to do in
response: it could store the event data in some type of tree structure, as
is the case with DOM processing; it could ignore the event; it could
search the event data for something in particular; or it could take some
other action. Once the event is handled, the SAX processor continues
reading the document. At no point does it persist the document in memory
as DOM does. It’s really just a parsing mechanism to which an application
can attach its own functionality. In fact, SAX is the underlying parsing
mechanism for MSXML’s DOM processor. Microsoft’s DOM implementation
sets up SAX event handlers that simply store the data handed to them
by the SAX engine in a DOM tree.
As you’ve probably surmised by now, SAX consumes far less memory
than DOM does. That said, it’s also much more trouble to set up and use.
By persisting documents in memory, the DOM API makes working with
XML documents as easy as working with any other kind of object.
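To make the contrast concrete, here is a minimal sketch of the same two parsing styles. It uses Python's standard xml.dom.minidom and xml.sax modules rather than MSXML, so it illustrates the two models, not MSXML's own API; the SongHandler class and the two helper functions are names invented for this example.

```python
import xml.dom.minidom
import xml.sax

DOC = ("<Songs><Song>One More Day</Song><Song>Forever</Song>"
       "<Song>Cherish</Song></Songs>")


def titles_via_dom(text):
    """DOM style: materialize the whole tree in memory, then query it."""
    doc = xml.dom.minidom.parseString(text)
    return [node.firstChild.data for node in doc.getElementsByTagName("Song")]


class SongHandler(xml.sax.ContentHandler):
    """SAX style: react to events as they are raised; no tree is built."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_song = False
        self._parts = []

    def startElement(self, name, attrs):
        if name == "Song":
            self._in_song = True
            self._parts = []

    def characters(self, content):
        # Character data may arrive in several chunks, so accumulate it.
        if self._in_song:
            self._parts.append(content)

    def endElement(self, name):
        if name == "Song":
            self.titles.append("".join(self._parts))
            self._in_song = False


def titles_via_sax(text):
    handler = SongHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles
```

Both functions return the same list of titles. The DOM version is shorter because the tree does the bookkeeping; the SAX version has to track its own state, which is the extra setup cost mentioned above.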
SQL Server uses MSXML and the DOM to process documents you
load via sp_xml_preparedocument. It restricts the virtual memory MSXML
can use for DOM processing to one-eighth of the physical memory on the
machine or 500MB, whichever is less. In actual practice, it’s highly unlikely
that MSXML would be able to access 500MB of virtual memory, even on a
machine with 4GB of physical memory. The reason for this is that, by default,
SQL Server reserves most of the user mode address space for use by
its buffer pool. You’ll recall that we talked about the MemToLeave space in
Chapter 11 and noted that the non–thread stack portion defaults to 256MB
on SQL Server 2000. This means that, by default, MSXML won’t be able to
use more than 256MB of memory—and probably considerably less given
that other things are also allocated from this region—regardless of the
amount of physical memory on the machine.
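The arithmetic behind the ceiling can be sketched in a few lines. This is only an illustration of the rule as stated above (one-eighth of physical memory or 500MB, whichever is less); the function name is hypothetical, and it does not model the separate MemToLeave constraint, which in practice is the tighter limit.

```python
MB = 1024 * 1024


def msxml_dom_ceiling(physical_bytes, cap_bytes=500 * MB):
    """Return the stated ceiling: physical memory / 8, capped at 500MB."""
    return min(physical_bytes // 8, cap_bytes)


# A 1GB machine gets one-eighth of physical memory (128MB);
# a 4GB machine would get 512MB, so the 500MB cap applies instead.
small = msxml_dom_ceiling(1024 * MB)
large = msxml_dom_ceiling(4096 * MB)
```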
The reason MSXML is limited to no more than 500MB of virtual memory
use regardless of the amount of memory on the machine is that SQL
Server calls the GlobalMemoryStatus Win32 API function to determine the
amount of available physical memory. GlobalMemoryStatus populates a
MEMORYSTATUS structure with information about the status of memory
use on the machine. On machines with more than 4GB of physical memory,
GlobalMemoryStatus can return incorrect information, so Windows returns
a -1 to indicate an overflow. The Win32 API function GlobalMemoryStatusEx
exists to address this shortcoming, but SQLXML does not call it. You can see
this for yourself by working through the following exercise.
Exercise 18.1: Determining How MSXML Computes Its Memory Ceiling
Restart your SQL Server, preferably from a console since we will be attaching
to it with WinDbg. This should be a test or development system,
and, ideally, you should be its only user.
Start Query Analyzer and connect to your SQL Server.
Attach to SQL Server using WinDbg. (Press F6 and select sqlservr.exe
from the list of running tasks; if you have multiple instances, be sure to
select the right one.)
At the WinDbg command prompt, add the following breakpoint:
bp kernel32!GlobalMemoryStatus
Once the breakpoint is added, type g and hit Enter to allow SQL Server
to run.
Next, return to Query Analyzer and run the following query:
declare @doc varchar(8000)
set @doc='
<Songs>
<Song name="She''s Like the Wind" artist="Patrick Swayze"/>
<Song name="Hard to Say I''m Sorry" artist="Chicago"/>
<Song name="She Loves Me" artist="Chicago"/>
<Song name="I Can''t Make You Love Me" artist="Bonnie Raitt"/>
<Song name="Heart of the Matter" artist="Don Henley"/>
<Song name="Almost Like a Song" artist="Ronnie Milsap"/>
<Song name="I''ll Be Over You" artist="Toto"/>
</Songs>
'
declare @hDoc int
exec sp_xml_preparedocument @hDoc OUT, @doc
The first time you parse an XML document using sp_xml_preparedocument,
SQLXML calls GlobalMemoryStatus to retrieve the amount
of physical memory in the machine, then calls an undocumented function
exported by MSXML to restrict the amount of virtual memory it
may allocate. (I had you restart your server so that we’d be sure to go
down this code path.) This undocumented MSXML function is exported
by ordinal rather than by name from the MSXMLn.DLL and was added
to MSXML expressly for use by SQL Server.
At this point, Query Analyzer should appear to be hung because your
breakpoint has been hit in WinDbg and SQL Server has been stopped.
Switch back to WinDbg and type kv at the command prompt to dump
the call stack of the current thread. Your stack should look something
like this (I’ve omitted everything but the function names):
You’ll recall from Chapter 3 that we discovered that the entry point
for T-SQL batch execution within SQL Server is language_exec. You
can see the call to language_exec at the bottom of this stack; this
was called when you submitted the T-SQL batch to the server to run.
Working upward from the bottom, we can see the call to SpXmlPrepareDocument,
the internal "spec proc" (an extended procedure implemented
internally by the server rather than in an external DLL)
responsible for implementing the sp_xml_preparedocument xproc.
We can see from there that SpXmlPrepareDocument calls LoadXMLDocument,
LoadXMLDocument calls a method named Load, Load
calls a method named DoLoad, and DoLoad calls GlobalMemoryStatus.
So, that’s how we know how MSXML computes the amount of
physical memory in the machine, and, knowing the limitations of this
function, that’s how we know the maximum amount of virtual memory
MSXML can use.
Type q and hit Enter to quit WinDbg. You will have to restart your SQL
Server.
FOR XML
Despite MSXML’s power and ease of use, SQL Server doesn’t leverage
MSXML in all of its XML features. It doesn’t use it to implement server-side
FOR XML queries, for example, even though it’s trivial to construct a
DOM document programmatically and return it as text. MSXML has facilities
that make this quite easy. For example, Listing 18.2 presents a Visual
Basic app that executes a query via ADO and constructs a DOM document
on-the-fly based on the results it returns.
Listing 18.2
Private Sub Command1_Click()
    Dim xmlDoc As New DOMDocument30
    Dim oRootNode As IXMLDOMNode

    ' Create the root element that will hold one Row element per record
    Set oRootNode = xmlDoc.createElement("Root")
    Set xmlDoc.documentElement = oRootNode

    Dim oAttr As IXMLDOMAttribute
    Dim oNode As IXMLDOMNode
    Dim oConn As New ADODB.Connection
    Dim oComm As New ADODB.Command
    Dim oRs As New ADODB.Recordset

    ' Text3 supplies the connection string; Text1 supplies the query
    oConn.Open (Text3.Text)
    oComm.ActiveConnection = oConn
    oComm.CommandText = Text1.Text
    Set oRs = oComm.Execute

    Dim oField As ADODB.Field

    ' Turn each record into a Row element with one attribute per field
    While Not oRs.EOF
        Set oNode = xmlDoc.createElement("Row")
        For Each oField In oRs.Fields
            Set oAttr = xmlDoc.createAttribute(oField.Name)
            oAttr.Value = oField.Value
            oNode.Attributes.setNamedItem oAttr
        Next
        oRootNode.appendChild oNode
        oRs.MoveNext
    Wend

    oConn.Close

    ' Return the finished document as text
    Text2.Text = xmlDoc.xml

    Set xmlDoc = Nothing
    Set oRs = Nothing
    Set oComm = Nothing
    Set oConn = Nothing
End Sub
As you can see, translating a result set to XML doesn’t require much
code. The ADO Recordset object even supports being streamed directly to
an XML document (via its Save method), so if you don’t need complete control
over the conversion process, you might be able to get away with even
less code than in my example.
As I’ve said, SQL Server doesn’t use MSXML or build a DOM document
in order to return a result set as XML. Why is that? And how do we
know that it doesn’t use MSXML to process server-side FOR XML queries?
I’ll answer both questions in just a moment.
The answer to the first question should be pretty obvious. Building a
DOM from a result set before returning it as text would require SQL Server
to persist the entire result set in memory. Given that the memory footprint
of the DOM version of an XML document is roughly three to five times as
large as the document itself, this doesn’t paint a pretty resource usage picture.
If they had to first be persisted entirely in memory before being returned
to the client, even moderately large FOR XML result sets could use
huge amounts of virtual memory (or run into the MSXML memory ceiling
and therefore be too large to generate).
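The alternative to building a DOM is to emit each row's XML as soon as the row is available. The Python generator below is a rough sketch of that idea only; it is not how SQL Server implements FOR XML, and stream_rows and numbered_rows are names made up for this illustration.

```python
from xml.sax.saxutils import quoteattr


def stream_rows(columns, rows):
    """Yield one XML element per row; nothing is kept once it is yielded."""
    for row in rows:
        attrs = " ".join(
            f"{col}={quoteattr(str(val))}" for col, val in zip(columns, row)
        )
        yield f"<row {attrs}/>"


def numbered_rows(count, log):
    """A stand-in row source that records which rows have been fetched."""
    for i in range(count):
        log.append(i)
        yield (i, f"Song {i}")


fetched = []
stream = stream_rows(["Id", "Name"], numbered_rows(1000, fetched))
first = next(stream)  # only the first row has been read at this point
```

Because each element is produced and handed off before the next row is read, memory use stays flat no matter how many rows the query returns, which is exactly what a DOM-based approach cannot offer.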
To answer the second question, let’s again have a look at SQL Server
under a debugger.
Exercise 18.2: Determining Whether Server-Side FOR XML Uses MSXML
Restart your SQL Server, preferably from a console since we will be attaching
to it with WinDbg. This should be a test or development system,
and, ideally, you should be its only user.
Start Query Analyzer and connect to your SQL Server.
Attach to SQL Server using WinDbg. (Press F6 and select sqlservr.exe
from the list of running tasks; if you have multiple instances, be sure to
select the right one.) Once the WinDbg command prompt appears, type
g and press Enter so that SQL Server can continue to run.
Back in Query Analyzer, run a FOR XML query of some type:
SELECT * FROM (
SELECT 'Summer Dream' as Song
UNION
SELECT 'Summer Snow'
UNION
SELECT 'Crazy For You'
) s FOR XML AUTO
This query unions some SELECT statements together, then queries
the union as a derived table using a FOR XML clause.
After you run the query, switch back to WinDbg. You will likely see some
ModLoad messages in the WinDbg command window. WinDbg displays
a ModLoad message whenever a module is loaded into the process being
debugged. If MSXMLn.DLL were being used to service your FOR
XML query, you’d see a ModLoad message for it. As you’ve noticed,
there isn’t one. MSXML isn’t used to service FOR XML queries.
If you’ve done much debugging, you may be speculating that perhaps
the MSXML DLL is already loaded; hence, we wouldn’t see a ModLoad
message for it when we ran our FOR XML query. That’s easy enough to
check. Hit Ctrl+Break in the debugger, then type lm in the command
window and hit Enter. The lm command lists the modules currently
loaded into the process space. Do you see MSXMLn.DLL in the list?
Unless you’ve been interacting with SQL Server’s other XML features
since you recycled your server, it should not be there. Type g in the
command window and press Enter so that SQL Server can continue
to run.
As a final test, let’s force MSXMLn.DLL to load by parsing an XML document.
Reload the query from Exercise 18.1 above in Query Analyzer
and run it. You should see a ModLoad message for MSXML’s DLL in
the WinDbg command window.
Hit Ctrl+Break again to stop WinDbg, then type q and hit Enter to stop
debugging. You will need to restart your SQL Server.
So, based on all this, we can conclude that SQL Server generates its own
XML when it processes a server-side FOR XML query. There is no memory-efficient mechanism in MSXML to assist with this, so it is not used.
USING FOR XML
As you saw in Exercise 18.2, you can append FOR XML AUTO to the end
of a SELECT statement in order to cause the result to be returned as an
XML document fragment. Transact-SQL’s FOR XML syntax is much richer
than this, though—it supports several options that extend its usefulness in
numerous ways. In this section, we’ll discuss a few of these and work
through examples that illustrate them.
SELECT…FOR XML (Server-Side)
As I’m sure you’ve already surmised, you can retrieve XML data from SQL
Server by using the FOR XML option of the SELECT command. FOR
XML causes SELECT to return query results as an XML stream rather
than a traditional rowset. On the server-side, this stream can have one of
three formats: RAW, AUTO, or EXPLICIT. The basic FOR XML syntax
looks like this:
SELECT column list
FROM table list
WHERE filter criteria
FOR XML RAW | AUTO | EXPLICIT [, XMLDATA] [, ELEMENTS]
[, BINARY BASE64]
RAW returns column values as attributes and wraps each row in a generic
row element. AUTO returns column values as attributes and wraps each row
in an element named after the table from which it came. EXPLICIT lets you
completely control the format of the XML returned by a query.
XMLDATA causes an XML-Data schema to be returned for the document
being retrieved. ELEMENTS causes the columns in XML AUTO
data to be returned as elements rather than attributes. BINARY BASE64
specifies that binary data is to be returned using BASE64 encoding.
I’ll discuss these options in more detail in just a moment. Also note that
there are client-side specific options available with FOR XML queries that
aren’t available in server-side queries. We’ll talk about those in just a moment,
too.
RAW Mode
RAW mode is the simplest of the three basic FOR XML modes. It performs
a very basic translation of the result set into XML. Listing 18.3 shows
an example.
Listing 18.3
SELECT CustomerId, CompanyName
FROM Customers FOR XML RAW
(Results abridged)
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<row CustomerId="ALFKI" CompanyName="Alfreds Futterkiste"/><row Cu
CompanyName="Ana Trujillo Emparedados y helados"/><row CustomerId=
CompanyName="Antonio Moreno Taquería"/><row CustomerId="AROUT" Com
Horn"/><row CustomerId="BERGS" CompanyName="Berglunds snabbköp"/><
CustomerId="BLAUS" CompanyName="Blauer See Delikatessen"/><row Cus
CompanyName="Blondesddsl père et fils"/><row CustomerId="WELLI"
CompanyName="Wellington Importadora"/><row CustomerId="WHITC" Comp
Clover Markets"/><row CustomerId="WILMK" CompanyName="Wilman Kala"
CustomerId="WOLZA"
CompanyName="Wolski Zajazd"/>
Each column becomes an attribute in the result set, and each row becomes
an element with the generic name of row.
As I’ve mentioned before, the XML that’s returned by FOR XML is not
well formed because it lacks a root element. It’s technically an XML fragment
and must include a root element in order to be usable by an XML
parser. From the client side, you can set an ADO Command object’s xml
root property in order to automatically generate a root node when you execute
a FOR XML query.
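You can see why the root element matters with any XML parser. The short Python sketch below uses the standard xml.etree.ElementTree module as a stand-in parser; the wrap_in_root helper and the "Root" element name are hypothetical and simply mimic what a generated root node would provide.

```python
import xml.etree.ElementTree as ET

FRAGMENT = '<row CustomerId="ALFKI"/><row CustomerId="ANATR"/>'


def parses(text):
    """Return True if the text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def wrap_in_root(fragment, root="Root"):
    """Enclose a fragment in a single root element."""
    return f"<{root}>{fragment}</{root}>"


fragment_ok = parses(FRAGMENT)               # two top-level elements
wrapped_ok = parses(wrap_in_root(FRAGMENT))  # one root containing both rows
```

The bare fragment is rejected because it has more than one top-level element; the wrapped version parses normally.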
AUTO Mode
FOR XML AUTO gives you more control than RAW mode over the XML
fragment that’s produced. To begin with, each row in the result set is named
after the table, view, or table-valued UDF that produced it. For example,
Listing 18.4 shows a basic FOR XML AUTO query.
Listing 18.4
SELECT CustomerId, CompanyName
FROM Customers FOR XML AUTO
(Results abridged)
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerId="ALFKI" CompanyName="Alfreds Futterkiste"/><
CustomerId="ANATR" CompanyName="Ana Trujillo Emparedados y helados
CustomerId="ANTON" CompanyName="Antonio Moreno Taquería"/><Custome
CustomerId="AROUT" CompanyName="Around the Horn"/><Customers Custo
CompanyName="Vins et alcools Chevalier"/><Customers CustomerId="WA
CompanyName="Wartian Herkku"/><Customers CustomerId="WELLI" Compan
Importadora"/><Customers CustomerId="WHITC" CompanyName="White Clo
Markets"/><Customers CustomerId="WILMK" CompanyName="Wilman Kala"/
CustomerId="WOLZA"
CompanyName="Wolski Zajazd"/>
Notice that each row is named after the table from whence it came:
Customers. For results with more than one row, this amounts to having
more than one top-level (root) element in the fragment, which isn’t allowed
in XML.
One big difference between AUTO and RAW mode is the way in which
joins are handled. In RAW mode, a simple one-to-one translation occurs between
columns in the result set and attributes in the XML fragment. Each
row becomes an element in the fragment named row. These elements are
technically empty themselves—they contain no values or subelements, only
attributes. Think of attributes as specifying characteristics of an element,
while data and subelements compose its contents. In AUTO mode, each
row is named after the source from which it came, and the rows from joined
tables are nested within one another. Listing 18.5 presents an example.
Listing 18.5
SELECT Customers.CustomerID, CompanyName, OrderId
FROM Customers JOIN Orders
ON (Customers.CustomerId=Orders.CustomerId)
FOR XML AUTO
I’ve formatted the XML fragment to make it easier to read—if you run
the query yourself from Query Analyzer, you’ll see an unformatted stream
of XML text.
Note the way in which the Orders for each customer are contained
within each Customers element. As I said, AUTO mode nests the rows returned
by joins. Note my use of the full table name in the join criterion.
Why didn’t I use a table alias? Because AUTO mode uses the table aliases
you specify to name the elements it returns. If you use shortened monikers
for a table, its elements will have that name in the resulting XML fragment.
While useful in traditional Transact-SQL, this makes the fragment difficult
to read if the alias isn’t sufficiently descriptive.
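The nesting and naming behavior can be mimicked outside the server. The following Python sketch groups adjacent join rows under a parent element and names the elements after whatever table name or alias you pass in. It is an approximation for illustration, not SQL Server's algorithm; the sample rows and the nest_auto function are assumptions made for this example.

```python
from itertools import groupby
from xml.sax.saxutils import quoteattr

# Illustrative join output: (CustomerID, CompanyName, OrderId), ordered by customer
ROWS = [
    ("ALFKI", "Alfreds Futterkiste", 10643),
    ("ALFKI", "Alfreds Futterkiste", 10692),
    ("ANATR", "Ana Trujillo Emparedados y helados", 10308),
]


def nest_auto(rows, parent="Customers", child="Orders"):
    """Nest child rows inside one parent element per run of equal parent values."""
    out = []
    for (cust_id, company), group in groupby(rows, key=lambda r: (r[0], r[1])):
        out.append(
            f"<{parent} CustomerID={quoteattr(cust_id)} "
            f"CompanyName={quoteattr(company)}>"
        )
        for _, _, order_id in group:
            out.append(f'  <{child} OrderId="{order_id}"/>')
        out.append(f"</{parent}>")
    return "\n".join(out)


full_names = nest_auto(ROWS)
aliased = nest_auto(ROWS, parent="c", child="o")  # what terse aliases would yield
```

Comparing full_names with aliased shows the readability cost of short aliases: elements named c and o say nothing about what they contain.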
ELEMENTS Option
The ELEMENTS option of the FOR XML AUTO clause causes AUTO
mode to return nested elements instead of attributes. Depending on your
business needs, element-centric mapping may be preferable to the default
attribute-centric mapping. Listing 18.6 gives an example of a FOR XML
query that returns elements instead of attributes.
Listing 18.6
SELECT CustomerID, CompanyName
FROM Customers
FOR XML AUTO, ELEMENTS
Notice that the ELEMENTS option has caused what were being returned
as attributes of the Customers element to instead be returned as
subelements. Each attribute is now a pair of element tags that enclose the
value from a column in the table.
NOTE: Currently, AUTO mode does not support GROUP BY or aggregate functions.
The heuristics it uses to determine element names are incompatible
with these constructs, so you cannot use them in AUTO mode queries. Additionally,
FOR XML itself is incompatible with COMPUTE, so you can’t use it in
FOR XML queries of any kind.
EXPLICIT Mode
If you need more control over the XML than RAW or AUTO mode gives you, EXPLICIT
mode is more flexible (and therefore more complicated to use) than
either of them. EXPLICIT mode queries define XML
documents in terms of a "universal table": a mechanism for returning a result
set from SQL Server that describes what you want the document to look
like, rather than composing the document itself. A universal table is just a
SQL Server result set with special column headings that tell the server how
to produce an XML document from your data. Think of it as a set-oriented
method of making an API call and passing parameters to it. You use the facilities
available in Transact-SQL to make the call and pass it parameters.
A universal table consists of one column for each table column that you
want to return in the XML fragment, plus two additional columns: Tag and
Parent. Tag is a positive integer that uniquely identifies each tag that is to be
returned by the document; Parent establishes parent-child relationships between
tags.
The other columns in a universal table—the ones that correspond to the
data you want to include in the XML fragment—have special names that actually
consist of multiple segments delimited by exclamation points (!).
These special column names pass muster with SQL Server’s parser and provide
specific instructions regarding the XML fragment to produce. They
have the following format:
Element!Tag!Attribute!Directive
We’ll see some examples of these shortly.
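As a quick way to internalize the heading format, the Python sketch below splits a universal-table column heading into its four segments. ColumnSpec and parse_column are hypothetical names used only for this illustration; the server's own parsing rules may be stricter than this.

```python
from typing import NamedTuple, Optional


class ColumnSpec(NamedTuple):
    element: str              # element the value belongs to
    tag: int                  # matches a value in the Tag column
    attribute: Optional[str]  # attribute (or subelement) name, if any
    directive: Optional[str]  # e.g. element, xml, hide, cdata


def parse_column(heading):
    """Split an Element!Tag!Attribute!Directive heading into its parts."""
    parts = heading.split("!")
    if not 2 <= len(parts) <= 4:
        raise ValueError(f"expected 2 to 4 segments, got {len(parts)}: {heading!r}")
    parts += [""] * (4 - len(parts))  # trailing segments are optional
    element, tag, attribute, directive = parts
    return ColumnSpec(element, int(tag), attribute or None, directive or None)
```

For example, parse_column("Customers!1!CustomerId") describes an attribute named CustomerId on the Customers element with tag 1, while parse_column("Customers!1") has no attribute name, so the value becomes the element's content.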
The first thing you need to do to build an EXPLICIT mode query is to
determine the layout of the XML document you want to end up with. Once
you know this, you can work backward from there to build a universal table
that will produce the desired format. For example, let’s say we want a simple
customer list based on the Northwind Customers table that returns the
customer ID as an attribute and the company name as an element. The
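XML fragment we’re after might look like this:
<Customers CustomerId="ALFKI">Alfreds Futterkiste</Customers>
<Customers CustomerId="ANATR">Ana Trujillo Emparedados y helados</Customers>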
Listing 18.7 shows a Transact-SQL query that returns a universal table that
specifies this layout.
Listing 18.7
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1]
FROM Customers
(Results abridged)
Tag    Parent   Customers!1!CustomerId Customers!1
------ -------- ---------------------- ---------------------------
1      NULL     ALFKI                  Alfreds Futterkiste
1      NULL     ANATR                  Ana Trujillo Emparedados y
1      NULL     ANTON                  Antonio Moreno Taquería
The first two columns are the extra columns I mentioned earlier. Tag
specifies an identifier for the tag we want to produce. Since we want to produce
only one element per row, we hard-code this to 1. The same is true of
Parent: there’s only one element and a top-level element doesn’t have a
parent, so we return NULL for Parent in every row.
Since we want to return the customer ID as an attribute, we specify an
attribute name in the heading of column 3 (bolded). And since we want to
return CompanyName as an element rather than an attribute, we omit the
attribute name in column 4.
By itself, this table accomplishes nothing. We have to add FOR XML
EXPLICIT to the end of it in order for the odd column names to have any
special meaning. Add FOR XML EXPLICIT to the query and run it from
Query Analyzer. Listing 18.8 shows what you should see.
Listing 18.8
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1]
FROM Customers
FOR XML EXPLICIT
As you can see, each CustomerId value is returned as an attribute, and
each CompanyName is returned as the element data for the Customers element,
just as we specified.
Directives
The fourth part of the multivalued column headings supported by EXPLICIT
mode queries is the directive segment. You use it to further control
how data is represented in the resulting XML fragment. As Table 18.2 illustrates,
the directive segment supports eight values.
Of these, element is the most frequently used. It causes data to be rendered
as a subelement rather than an attribute. For example, let’s say that,
in addition to CustomerId and CompanyName, we wanted to return ContactName
in our XML fragment and we wanted it to be a subelement rather
than an attribute. Listing 18.9 shows how the query would look.
Listing 18.9
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
ContactName AS [Customers!1!ContactName!element]
FROM Customers
FOR XML EXPLICIT
As you can see, ContactName is nested within each Customers element
as a subelement. The element directive encodes the data it returns. We can
retrieve the same data by using the xml directive without encoding, as shown
in Listing 18.10.
Listing 18.10
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
ContactName AS [Customers!1!ContactName!xml]
FROM Customers
FOR XML EXPLICIT
The xml directive (bolded) causes the column to be returned without
encoding any special characters it contains.
Establishing Data Relationships
Thus far, we’ve been listing the data from a single table, so our EXPLICIT
queries haven’t been terribly complex. That would still be true even if we
queried multiple tables as long as we didn’t mind repeating the data from
each table in each top-level element in the XML fragment. Just as the column
values from joined tables are often repeated in the result sets of
Transact-SQL queries, we could create an XML fragment that contained data
from multiple tables repeated in each element. However, that wouldn’t be
the most efficient way to represent the data in XML. Remember: XML supports
hierarchical relationships between elements. You can establish these
hierarchies by using EXPLICIT mode queries and T-SQL UNIONs. Listing
18.11 provides an example.
Listing 18.11
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
NULL AS [Orders!2!OrderId],
NULL AS [Orders!2!OrderDate!element]
FROM Customers
UNION
SELECT 2 AS Tag,
1 AS Parent,
CustomerId,
NULL,
OrderId,
OrderDate
FROM Orders
ORDER BY [Customers!1!CustomerId], [Orders!2!OrderDate!element]
FOR XML EXPLICIT
This query does several interesting things. First, it links the Customers
and Orders tables using the CustomerId column they share. Notice the
third column in each SELECT statement—it returns the CustomerId column
from each table. The Tag and Parent columns establish the details of
the relationship between the two tables. The Tag and Parent values in the
second query link it to the first. They establish that Order records are children
of Customer records. Lastly, note the ORDER BY clause. It arranges
the elements in the table in a sensible fashion—first by CustomerId and
second by the OrderDate of each Order. Listing 18.12 shows the result set.
As you can see, each customer’s orders are nested within its element.
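To see how Tag, Parent, and row order combine to produce nesting, consider the following Python sketch. It walks an already-sorted universal table and opens or closes elements based on each row's Parent value. It is a simplified model for illustration, not the server's implementation: it handles only plain attributes, element content, and the element directive, and render_explicit is a name invented here.

```python
from xml.sax.saxutils import escape, quoteattr


def render_explicit(headings, rows):
    """headings: data-column headings; rows: (tag, parent, *values), pre-sorted."""
    # Pad each heading to four segments: element, tag, attribute, directive.
    specs = [(h.split("!") + ["", ""])[:4] for h in headings]
    out, open_stack = [], []  # open_stack holds (tag, element name) pairs
    for tag, parent, *values in rows:
        # Close open elements until the innermost one is this row's parent.
        while open_stack and open_stack[-1][0] != parent:
            out.append(f"</{open_stack.pop()[1]}>")
        name, attrs, body = None, [], []
        for (element, col_tag, attribute, directive), value in zip(specs, values):
            if int(col_tag) != tag:
                continue  # column belongs to a different element
            name = element
            if value is None:
                continue  # NULLs contribute nothing
            if directive == "element":
                body.append(f"<{attribute}>{escape(str(value))}</{attribute}>")
            elif attribute:
                attrs.append(f" {attribute}={quoteattr(str(value))}")
            else:
                body.append(escape(str(value)))
        out.append(f"<{name}{''.join(attrs)}>{''.join(body)}")
        open_stack.append((tag, name))
    while open_stack:
        out.append(f"</{open_stack.pop()[1]}>")
    return "".join(out)
```

The sketch also shows why the ORDER BY clause matters: a child row is attached to whichever parent element is still open when it arrives, so child rows must immediately follow their parent in the sorted result.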
The hide Directive
The hide directive omits a column you’ve included in the universal table
from the resulting XML document. One use of this functionality is to order
the result by a column that you don’t want to include in the XML fragment.
When you aren’t using UNION to merge tables, this isn’t a problem because
you can order by any column you choose. However, the presence of
UNION in a query requires ORDER BY columns to exist in the result set. The
hide directive gives you a way to satisfy this requirement without being
forced to return data you don’t want to. Listing 18.13 shows an example.
Listing 18.13
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
PostalCode AS [Customers!1!PostalCode!hide],
NULL AS [Orders!2!OrderId],
NULL AS [Orders!2!OrderDate!element]
FROM Customers
UNION
SELECT 2 AS Tag,
1 AS Parent,
CustomerId,
NULL,
NULL,
OrderId,
OrderDate
FROM Orders
ORDER BY [Customers!1!CustomerId], [Orders!2!OrderDate!element],
[Customers!1!PostalCode!hide]
FOR XML EXPLICIT
Notice the hide directive (bolded) that’s included in the column 5 heading.
It allows the column to be specified in the ORDER BY clause without
actually appearing in the resulting XML fragment.
The cdata Directive
CDATA sections may appear anywhere in an XML document that character
data may appear. A CDATA section is used to escape characters that would
otherwise be recognized as markup (e.g., < and &). Thus CDATA
sections allow you to include sections in an XML document that might otherwise
confuse the parser. To render a CDATA section from an EXPLICIT
mode query, include the cdata directive, as demonstrated in Listing 18.14.
Listing 18.14
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
Fax AS [Customers!1!!cdata]
FROM Customers
FOR XML EXPLICIT
As you can see, each value in the Fax column is returned as a CDATA
section in the XML fragment. Note the omission of the attribute name in
the cdata column heading (bolded). This is because attribute names aren’t
allowed for CDATA sections. Again, they represent escaped document segments,
so the XML parser doesn’t process any attribute or element names
they may contain.
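The effect of a CDATA section is easy to demonstrate with any DOM implementation. This Python sketch uses the standard xml.dom.minidom module; the customer_with_cdata helper and the sample value are made up for illustration and are not output from the query above.

```python
import xml.dom.minidom


def customer_with_cdata(customer_id, text):
    """Build a Customers element whose content is a single CDATA section."""
    doc = xml.dom.minidom.Document()
    elem = doc.createElement("Customers")
    elem.setAttribute("CustomerId", customer_id)
    elem.appendChild(doc.createCDATASection(text))
    doc.appendChild(elem)
    return elem.toxml()


# The < and & characters pass through unescaped inside the CDATA section.
SAMPLE = "030-0076545 <ext & more>"
xml_text = customer_with_cdata("ALFKI", SAMPLE)

# Parsing the result hands the original text back unchanged.
round_trip = xml.dom.minidom.parseString(xml_text).documentElement.firstChild.data
```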
The id, idref, and idrefs Directives
The ID, IDREF, and IDREFS data types can be used to represent relational
data in an XML document. Set up in a DTD or XML-Data schema,
they establish relationships between elements. They’re handy in situations
where you need to exchange complex data and want to minimize the
amount of data duplication in the document.
EXPLICIT mode queries can use the id, idref, and idrefs directives to
specify relational fields in an XML document. Naturally, this approach
works only if a schema is used to define the document and identify the columns
used to establish links between entities. FOR XML’s XMLDATA option
provides a means of generating an inline schema for its XML fragment.
In conjunction with the id directives, it can identify relational fields in the
XML fragment. Listing 18.15 gives an example.
Listing 18.15
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId!id],
CompanyName AS [Customers!1!CompanyName],
NULL AS [Orders!2!OrderID],
NULL AS [Orders!2!CustomerId!idref]
FROM Customers
UNION
SELECT 2,
NULL,
NULL,
NULL,
OrderID,
CustomerId
FROM Orders
ORDER BY [Orders!2!OrderID]
FOR XML EXPLICIT, XMLDATA
Note the use of the id and idref directives in the CustomerId columns
of the Customers and Orders tables (bolded). These directives link the two
tables by using the CustomerId column they share.
If you examine the XML fragment returned by the query, you’ll see that
it starts off with the XML-Data schema that the XMLDATA option created.
This schema is then referenced in the XML fragment that follows.