XML Persistence Format

The XML format written into the stream is returned in UTF-8 format. In ADO, this is the only encoding used.

A quick look at the format will show that it is broken into two sections, a schema section followed by the data section. The following is an example XML file for the Shippers table from the Northwind database. Various parts of the XML are discussed following the example.

<xml xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" 
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" 
xmlns:rs="urn:schemas-microsoft-com:rowset" 
xmlns:z="#RowsetSchema"> 
  <s:Schema id="RowsetSchema"> 
    <s:ElementType name="row" content="eltOnly" rs:updatable="true"> 
      <s:AttributeType name="ShipperID" rs:number="1" 
        rs:basetable="shippers" rs:basecolumn="ShipperID"
        rs:keycolumn="true"> 
        <s:datatype dt:type="int" dt:maxLength="4" rs:precision="10" 
          rs:fixedlength="true" rs:maybenull="false"/> 
      </s:AttributeType> 
      <s:AttributeType name="CompanyName" rs:number="2" 
        rs:nullable="true" rs:write="true" rs:basetable="shippers" 
        rs:basecolumn="CompanyName"> 
        <s:datatype dt:type="string" dt:maxLength="40" /> 
      </s:AttributeType> 
      <s:AttributeType name="Phone" rs:number="3" rs:nullable="true" 
        rs:write="true" rs:basetable="shippers" 
        rs:basecolumn="Phone"> 
        <s:datatype dt:type="string" dt:maxLength="24"/> 
      </s:AttributeType> 
      <s:extends type="rs:rowbase"/> 
    </s:ElementType> 
  </s:Schema> 

  <rs:data> 
    <z:row ShipperID="1" CompanyName="Speedy Express" 
      Phone="(503) 555-9831"/> 
    <z:row ShipperID="2" CompanyName="United Package" 
      Phone="(503) 555-3199"/> 
    <z:row ShipperID="3" CompanyName="Federal Shipping" 
      Phone="(503) 555-9931"/> 
  </rs:data> 
</xml>

The schema shows the declarations of namespaces, the schema section, and the data section. The schema section contains definitions for row, ShipperID, CompanyName, and Phone.

Schema definitions conform to the XML-Data specification and are able to be fully validated (though validation will not occur in Internet Explorer 5). You can view this specification at http://www.w3.org/TR/1998/NOTE-XML-data/. XML-Data is the only supported schema format for Recordset persistence in the current release.

The data section has three rows containing information about shippers. For an empty rowset, the data section may be empty, but the <rs:data> tags must be present. With no data, you could write the tag shorthand as simply <rs:data/>. Any tag prefixed with "rs" indicates that it is in the namespace defined by urn:schemas-microsoft-com:rowset. The full definition of this schema is defined in the appendix to this document.

Namespaces

The XML persistence format in ADO uses the following four namespaces.

Prefix	Description
s	Refers to the "XML-Data" namespace containing the elements and attributes that define the schema of the current Recordset.
dt	Refers to the data type definitions specification.
rs	Refers to the namespace containing elements and attributes specific to ADO Recordset properties and attributes.
z	Refers to the schema of the current rowset.

A client should not add its own tags to these namespaces as defined by the specification. For example, a client should not define a namespace as "urn:schemas-microsoft-com:rowset" and then write out something like "rs:MyOwnTag." To learn more about namespaces, see http://www.w3.org/TR/REC-xml-names/.

Important The id for the schema tag must be "RowsetSchema" and the namespace used to refer to the schema of the current rowset must point to "#RowsetSchema."

Note that the prefix of the namespace, that part to the right of the colon and to the left of the equal sign, is arbitrary.

xmlns:rs="urn:schemas-microsoft-com:rowset"

The user can define this to be any name as long as this name is consistently used throughout the XML document. ADO always writes out "s", "rs", "dt", and "z," but these prefix names are not hard coded into the loading component.

Schema Section

The schema section is required. As the previous sample shows, ADO writes out detailed metadata about each column to preserve the semantics of the data values as much as possible for updating. However, to load in the XML, ADO only requires the names of the columns and the rowset to which they belong. Here is an example of a minimal schema:

<xml xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882"
    xmlns:rs="urn:schemas-microsoft-com:rowset"
    xmlns:z="#RowsetSchema">
  <s:Schema id="RowsetSchema">
    <s:ElementType name="row" content="eltOnly">
      <s:AttributeType name="ShipperID"/>
      <s:AttributeType name="CompanyName"/>
      <s:AttributeType name="Phone"/>
      <s:Extends type="rs:rowbase"/>
    </s:ElementType>
  </s:Schema>
  <rs:data>
...
  </rs:data>
</xml>

In the case above, ADO will treat the data as variable length strings since no type information is included in the schema.

Creating Aliases for Column Names

The rs:name attribute allows you to creat and alias for a column name so that a friendly name may appear in the column information exposed by the rowset but a shorter (or less friendly) name may be used in the data section. For example, the schema above could be modified to map ShipperID to s1, CompanyName to s2, and Phone to s3 as follows:

<s:Schema id="RowsetSchema"> 
<s:ElementType name="row" content="eltOnly" rs:updatable="true"> 
<s:AttributeType name="s1" rs:name="ShipperID" rs:number="1" ...> 
...
</s:AttributeType> 
<s:AttributeType name="s2" rs:name="CompanyName" rs:number="2" ...> 
...
</s:AttributeType> 
<s:AttributeType name="s3" rs:name="Phone" rs:number="3" ...> 
...
</s:AttributeType> 
...
</s:ElementType> 
</s:Schema>

Then, in the data section, the row would use the name attribute, not rs:name, to refer to that column:

"<row s1="1" s2="Speedy Express" s3="(503) 555-9831"/>

Creating aliases for column names is required whenever a column name is not a legal attribute or tag name in XML. For example, "Last Name" must have an alias because names with embedded spaces are illegal attributes. The following line won't be correctly handled by the XML parser, so you must create an alias to some other name that does not have an embedded space.

<row last name="Jones"/>

Whatever value you use for the name attribute (not rs:name) must be used consistently in each place that column is referenced in both the schema and data sections of the XML document. The following example shows the consistent use of "s1":

<s:Schema id="RowsetSchema">
  <s:ElementType name="row" content="eltOnly">
    <s:attribute type="s1"/>
    <s:attribute type="CompanyName"/>
    <s:attribute type="s3"/>
    <s:extends type="rs:rowbase"/>
  </s:ElementType>
  <s:AttributeType name="s1" rs:name="ShipperID" rs:number="1" 
    rs:maydefer="true" rs:writeunknown="true">
    <s:datatype dt:type="i4" dt:maxLength="4" rs:precision="10" 
      rs:fixedlength="true" rs:maybenull="true"/>
  </s:AttributeType>
</s:Schema>
<rs:data>
  <z:row s1="1" CompanyName="Speedy Express" s3="(503) 555-9831"/>
</rs:data>

Similarly, because there is no alias defined for CompanyName above, CompanyName must be used consistently throughout the document.

Data Types

You can apply a data type to a column with the dt:type attribute. For the definitive guide to allowable XML types, please see http://www.w3.org/TR/1998/NOTE-XML-data-0105/#Datatypes. You can specify a data type in two ways: Either specify the dt:type attribute directly on the column definition itself, or use the s:datatype construct as a nested element of the column definition. For example,

<s:AttributeType name="Phone" >
  <s:datatype dt:type="string"/>
</s:AttributeType>

is equivalent to

<s:AttributeType name="Phone" dt:type="string"/>

If you omit the dt:type attribute entirely from the row definition, by default the column's type will be a variable length string.

If you have more type information than simply the type name (for example, dt:maxLength) it makes it more readable to use the s:datatype child element. This is merely a convention, however, and not a requirement.

The following examples show further how to include type information in your schema:

<!-- 1. String with no max length -->
<s:AttributeType name="title_id"/>
<!—or -->
<s:AttributeType name="title_id" dt:type="string"/>

<!—- 2. Fixed length string with max length of 6 -->
<s:AttributeType name="title_id">
    <s:datatype dt:type="string" dt:maxLength="6" rs:fixedlength="true" />
</s:AttributeType>

<!—- 3. Variable length string with max length of 6 -->
<s:AttributeType name="title_id">
    <s:datatype dt:type="string" dt:maxLength="6" />
</s:AttributeType>

<!—- 4. Integer -->
<s:AttributeType name="title_id" dt:type="int"/>

There is a subtle use of the rs:fixedlength attribute in the second example. A column with the rs:fixedlength attribute set to "true" means that the data must have the length defined in the schema. In this case, a legal value for title_id is "123456," as is "123 ." However, "123" would not be valid because its length is 3, not 6. See the OLE DB specification for a more complete description of the fixedlength property.

Handling Nulls

Null values are handled by the rs:maybenull attribute. If this attribute is set to true, the contents of the column may contain a null value. Furthermore, if the column is not found in a row of data, the user reading the data back from the rowset will get a null status back from IRowset::GetData(). Consider the following column definitions from the Shippers table:

<s:AttributeType name="ShipperID">
  <s:datatype dt:type="int" dt:maxLength="4"/>
</s:AttributeType>
<s:AttributeType name="CompanyName">
  <s:datatype dt:type="string" dt:maxLength="40" rs:maybenull="true"/>
</s:AttributeType>

The definition allows CompanyName to be null, but ShipperID cannot contain a null value. If the data section contained the following row, the Persistence Provider would set the status of data for the CompanyName column to DBSTATUS_S_ISNULL:

<z:row ShipperID="1"/>

If the row was entirely empty, as follows, the Persistence Provider would return a status of DBSTATUS_E_UNAVAILABLE for ShipperID and DBSTATUS_S_ISNULL for CompanyName.

<z:row/>

Note that a zero-length string is not the same as null.

<z:row ShipperID="1" CompanyName=""/>

For the previous row, the Persistence Provider will return DBSTATUS_S_OK for both columns. The CompanyName in this case is simply "" (a zero-length string).

For further information about the OLE DB constructs available for use within the schema of an XML document for OLE DB, please refer to the definition of "urn:schemas-microsoft-com:rowset" and the OLE DB specification.

Data Section

The data section defines the data of the rowset along with any pending updates, insertions, or deletions. The data section may contain zero or more rows. The data section may only contain data from one rowset where the row is defined by the schema. Also as noted above, columns without any data may be omitted. If an attribute or sub-element is used in the data section and that construct has not been defined in the schema section, it is silently ignored.

String

Reserved XML characters in text data must be replaced with appropriate character entities. For example, in the company name "Joe's Garage," the single quote character must be replaced by an entity. The actual row would look like:

<z:row CompanyName="Joe&apos;s Garage"/>

The following characters are reserved in XML and must be replaced by character entities: {',",&,<,>}.

Binary

Binary data is bin.hex encoded (that is, one byte maps to two characters, one character per nibble).

DateTime

The variant VT_DATE format is not directly supported by XML-Data data types. The correct format for dates with both a data and time component is yyyy-mm-ddThh:mm:ss.

For more information about date formats specified by XML, please refer to http://www.w3.org/TR/1998/NOTE-XML-data-0105/#Datatypes.

When the XML-Data specification defines two equivalent data types (for example, i4 == int), ADO will write out the friendly name but read in both.

Managing Pending Changes

A Recordset can be opened in immediate or batch update mode. When opened in batch update mode with client-side cursors, all changes made to the Recordset are in a pending state until the UpdateBatch method is called. Pending changes are also persisted when the Recordset is saved. In XML, they are represented by the use of the "update" elements defined in urn:schemas-microsoft-com:rowset. In addition, if a rowset can be updated, the updateable property must be set to true in the definition of the row. For example, to define that the Shippers table contains pending changes, the row definition would look like the following:

<s:ElementType name="row" content="eltOnly" updatable="true">
  <s:attribute type="ShipperID"/>
  <s:attribute type="CompanyName"/>
  <s:attribute type="Phone"/>
  <s:extends type="rs:rowbase"/>
</s:ElementType>

This tells the Persistence Provider to surface a rowset that supports IRowsetUpdate and IRowsetChange so that ADO can construct an updateable Recordset object.

The following sample data shows how insertions, changes, and deletions look in the persisted file:

<rs:data>
  <z:row ShipperID="2" CompanyName="United Package" 
    Phone="(503) 555-3199"/>
<rs:update>
  <rs:original>
    <z:row ShipperID="3" CompanyName="Federal Shipping" 
      Phone="(503) 555-9931"/>
  </rs:original>
  <z:row Phone="(503) 552-7134"/>
</rs:update>
<rs:insert>
  <z:row ShipperID="12" CompanyName="Lightning Shipping" 
    Phone="(505) 111-2222"/>
  <z:row ShipperID="13" CompanyName="Thunder Overnight" 
    Phone="(505) 111-2222"/>
  <z:row ShipperID="14" CompanyName="Blue Angel Air Delivery" 
    Phone="(505) 111-2222"/>
</rs:insert>
<rs:delete>
  <z:row ShipperID="1" CompanyName="Speedy Express" Phone="(503) 555-9831"/>
</rs:delete>
</rs:data>

An update always contains the entire original row data followed by the changed row data. The changed row may contain all of the columns or only those columns that have actually changed. In the previous example, the row for Shipper 2 is not changed, while only the Phone column has changed values for Shipper 3 and is therefore the only column included in the changed row. The inserted rows for Shippers 12, 13, and 14 are batched together under one rs:insert tag. Note that deleted rows may also be batched together, though not shown above.

Hierarchical Recordsets in XML

ADO allows persistence of hierarchical Recordset objects into XML. With hierarchical Recordset objects, the value of a field in the parent Recordset is another Recordset. Such fields are represented as child elements in the XML stream rather than an attribute. The following example demonstrates this case:

Rs.Open "SHAPE {select stor_id, stor_name, state from stores} APPEND ({select stor_id, ord_num, ord_date, qty from sales} AS rsSales RELATE stor_id TO stor_id)", "Provider=MSDataShape;DSN=pubs;UID=sa;PWD=;"

The following is the XML format of the persisted Recordset:

<xml xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882"     xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"     xmlns:rs="urn:schemas-microsoft-com:rowset" 
    xmlns:z="#RowsetSchema"> 
  <s:Schema id="RowsetSchema"> 
    <s:ElementType name="row" content="eltOnly" rs:updatable="true"> 
      <s:AttributeType name="stor_id" rs:number="1" 
        rs:writeunknown="true"> 
        <s:datatype dt:type="string" dt:maxLength="4" 
          rs:fixedlength="true" rs:maybenull="false"/> 
      </s:AttributeType> 
      <s:AttributeType name="stor_name" rs:number="2" rs:nullable="true" 
        rs:writeunknown="true"> 
          <s:datatype dt:type="string" dt:maxLength="40"/> 
      </s:AttributeType> 
      <s:AttributeType name="state" rs:number="3" rs:nullable="true" 
        rs:writeunknown="true"> 
        <s:datatype dt:type="string" dt:maxLength="2" 
          rs:fixedlength="true"/> 
      </s:AttributeType> 
      <s:ElementType name="rsSales" content="eltOnly" 
        rs:updatable="true" rs:relation="010000000100000000000000"> 
        <s:AttributeType name="stor_id" rs:number="1" 
          rs:writeunknown="true"> 
          <s:datatype dt:type="string" dt:maxLength="4" 
            rs:fixedlength="true" rs:maybenull="false"/> 
        </s:AttributeType> 
        <s:AttributeType name="ord_num" rs:number="2" 
          rs:writeunknown="true"> 
          <s:datatype dt:type="string" dt:maxLength="20" 
            rs:maybenull="false"/> 
        </s:AttributeType> 
        <s:AttributeType name="ord_date" rs:number="3" 
          rs:writeunknown="true"> 
            <s:datatype dt:type="dateTime" dt:maxLength="16" 
              rs:scale="3" rs:precision="23" rs:fixedlength="true" 
              rs:maybenull="false"/> 
        </s:AttributeType> 
        <s:AttributeType name="qty" rs:number="4" rs:writeunknown="true"> 
          <s:datatype dt:type="i2" dt:maxLength="2" rs:precision="5" 
            rs:fixedlength="true" rs:maybenull="false"/> 
        </s:AttributeType> 
        <s:extends type="rs:rowbase"/> 
      </s:ElementType> 
      <s:extends type="rs:rowbase"/> 
    </s:ElementType> 
  </s:Schema> 
  <rs:data> 
    <z:row stor_id="6380" stor_name="Eric the Read Books" state="WA"> 
      <rsSales stor_id="6380" ord_num="6871" 
        ord_date="1994-09-14T00:00:00" qty="5"/> 
      <rsSales stor_id="6380" ord_num="722a" 
        ord_date="1994-09-13T00:00:00" qty="3"/> 
    </z:row> 
    <z:row stor_id="7066" stor_name="Barnum's" state="CA"> 
      <rsSales stor_id="7066" ord_num="A2976" 
        ord_date="1993-05-24T00:00:00" qty="50"/> 
      <rsSales stor_id="7066" ord_num="QA7442.3" 
        ord_date="1994-09-13T00:00:00" qty="75"/> 
    </z:row> 
    <z:row stor_id="7067" stor_name="News & Brews" state="CA"> 
      <rsSales stor_id="7067" ord_num="D4482" 
        ord_date="1994-09-14T00:00:00" qty="10"/> 
      <rsSales stor_id="7067" ord_num="P2121" 
        ord_date="1992-06-15T00:00:00" qty="40"/> 
      <rsSales stor_id="7067" ord_num="P2121" 
        ord_date="1992-06-15T00:00:00" qty="20"/> 
      <rsSales stor_id="7067" ord_num="P2121" 
        ord_date="1992-06-15T00:00:00" qty="20"/> 
    </z:row> 
...
  </rs:data> 
</xml>

The exact order of the columns in the parent Recordset is not obvious when it is persisted in this manner. Any field in the parent may contain a child Recordset. The Persistence Provider persists out all scalar columns first as attributes and then persists out all child Recordset "columns" as child elements of the parent row. The ordinal position of the field in the parent Recordset can be obtained by looking at the schema definition of the Recordset. Every field has an OLE DB property, rs:number, defined in the Recordset schema namespace that contains the ordinal number for that field.

The names of all fields in the child Recordset are concatenated with the name of the field in the parent Recordset that contains this child. This is to ensure that there are no name collisions in cases where parent and child Recordsets both contain a field that is obtained from two different tables but is named singularly.

When saving hierarchical Recordsets into XML, you should be aware of the following restrictions in ADO:

A hierarchical Recordset with pending updates cannot be persisted into XML.
A hierarchical Recordset created with a parameterized shape command cannot be persisted (in either XML or ADTG format).
ADO currently saves the relationship between the parent and the child Recordsets as a binary large object (BLOB). XML tags to describe this relationship have not yet been defined in the rowset schema namespace.
When a hierarchical Recordset is saved, all child Recordsets are saved along with it. If the current Recordset is a child of another Recordset, then its parent is not saved. All child Recordsets that form the subtree of the current Recordset on are saved.

When a hierarchical Recordset is reopened from its XML-persisted format, you must be aware of the following limitations:

If the child record contains records for which there are no corresponding parent records, these rows are not written out in the XML representation of the hierarchical Recordset. Thus, these rows will be lost when the Recordset is reopened from its persisted location.
If a child record has references to more than one parent record, then on reopening the Recordset, the child Recordset may contain duplicate records. However, these duplicates will only be visible if the user works directly with the underlying child rowset. If a chapter is used to navigate the child Recordset (that is the only way to navigate through ADO) then the duplicates are not visible.

Recordset Properties in XML

The following Recordset properties are currently persisted into the XML format:

Update Resync
Unique Table
Unique Schema
Unique Catalog
Resync Command
IRowsetChange
IRowsetUpdate
CommandTimeout
BatchSize
UpdateCriteria
Reshape Name
AutoRecalc

These properties are saved in the schema section as attributes of the Element definition for the Recordset being persisted. These attributes are defined in the rowset schema namespace and must have the prefix "rs:".