Data Transformation Services

Multiphase Data Pump Functionality

Data Transformation Services (DTS) provides advanced users the capability to add programs that customize the data pump at various phases of its operation. By customizing the operation of the data pump, you can add a wide range of functionality to a package. For example:

Data Pump Process

The multiphase data pump option allows you to access the data pump at several points during its operation and add functionality. When copying a row of data from source to a destination, the data pump follows this basic process:

After the data pump processes the last row of data, the task is finished and the data pump operation terminates.

Data Pump Phases

The following figure shows the data pump phases and how they map to the data flow.

In the figure, the data flow (detailed in the expanded, gray area of the figure):

Three additional data pump phases not directly tied to the processing of row data are shown in the figure: Pre Source, Post Source, and Pump Complete. Each of these phases covers events prior to or after the row-by-row copying of data, transformation, and commit (or rollback) process. For example, the Pre Source phase occurs before the first row of data is fetched. The Post Source phase occurs after the last row of data is processed, and the Pump Complete phase occurs at the end of the transformation task.

Accessing Data Pump Phases

To display the multiphase data pump options in DTS Designer, you must select an option in SQL Server Enterprise Manager. After you have selected the multiphase data pump option, the feature will remain accessible to any packages opened in DTS Designer, for any future sessions. By default, this option is not selected.

To activate the multiphase data pump feature

Enterprise Manager

Enterprise Manager

Configuring Data Pump Phases

By default, the data pump is set to operate so that only its Row Transform phase is available. That phase is what you configure when mapping column-level transformations in the Transform Data task, Data Driven Query task, and Parallel Data Pump task, without selecting a phase. Advanced users who want to add functionality to a package so that it supports any data pump phase can do so by:

Data Pump Phases

The following sections provide information on the data pump phases you can customize, either through ActiveX scripts or through a custom COM object.

Note  When writing multiple functions that access the same data pump phase, the return value from the last function is the one used. To preserve a return value from a prior phase function, you must return the value from the subsequent phase function.

Pre Source Phase

The pre source phase is executed before the first fetch of source data. The phase is executed one time for the entire data pump operation unless you use the DTSTransformStat_SkipFetch return code in your phase function, which creates a loop.

You can add a pre-source data pump function for a transformation to write header rows containing meta data information to a file, and initialize objects, connections, and memory for use in later data pump phases.

Row Transform Phase

This is the default data pump phase available through the Transformations mapping tab of the Transform Data task or the Data Driven Query task. The Main function placeholder supplied on the ActiveX Script Transformation Properties dialog box is the default entry point for adding custom scripting code for this phase. The Row Transform phase allows read access to source data and meta data and write access to destination data.

Post Row Transform Phase

The Post Row transform phase is executed after the row transform phase of the data pump and consists of the Transform Failure, Insert Success, and Insert Failure subphases. Of the subphases, listed below, Insert Success and Insert Failure are mutually exclusive; only one of those subphases can occur for a given row:

On Batch Complete

You can call this data pump phase on success or failure of a batch or rows, as defined by the value specified in Insert batch size in the Options tab of the Transform Data task. Setting a batch size for a Data Driven Query task or parallel data pump task can only be done programmatically; if you want to write an On batch complete function for either of those tasks, you must do so programmatically.

On Pump Complete

You can customize this phase at the end of the transformation task (after all rows have been processed). Use functions written to On pump complete to free up resources and commit data held in global variables throughout the lifetime of data pump. You cannot access the data through an On pump complete function.

Post Source Data

Access this phase to process the destination data after completion of the task. Unlike On pump complete functions, functions written to this phase allow you to access the destination data. Common uses of a post source data function include writing footer rows to a file, freeing up resources, and committing data held in global variables.

BODY>