Administering SQL Server

Copying Data Between Different Collations

When bulk copying data using native or character format, bcp, by default, converts character data to:

This can cause the loss of extended or DBCS characters during the conversion between OEM and ANSI code pages. To prevent the loss of extended or DBCS characters, bcp can create data files using:

Unicode native format and Unicode character format convert character data to Unicode during the bulk copy, resulting in no loss of extended characters.

Using the -C (code page) switch, the bcp utility can create or read data files using the code page specified by the user. For example, to bulk copy the authors2 table in the pubs database to the Authors.txt data file using code page 850, execute from the command prompt:

bcp pubs..authors2 out authors.txt -c -C850 -Sservername -Usa -Ppassword

Alternatively, using the CODEPAGE clause, the BULK INSERT statement can read data files using the code page specified by the user. For example, to bulk copy the Authors.txt data file into the authors2 table in the pubs database using code page 850, execute from a query tool such as SQL Query Analyzer:

BULK INSERT pubs..authors2 FROM 'c:\authors.txt'
WITH (
   CODEPAGE = 850
)

The following are valid values for the code page.

Code page value Description
ACP Columns of char, varchar, or text data type are converted from the ANSI/Windows code page (ISO 1252) to the SQL Server code page when importing data to an instance of SQL Server, and vice versa when exporting data from an instance of SQL Server.
OEM (default) Columns of char, varchar, or text data type are converted from the system OEM code page to the SQL Server code page when importing data to an instance of SQL Server, and vice versa when exporting data from an instance of SQL Server.
RAW This is the fastest option because no conversion from one code page to another occurs.
<value> Specific code page number (for example, 850).

Column-level Collations

In SQL Server 2000, you can specify column-level collations for bulk copy operations. These collations define how character and Unicode data is stored in the specified columns of the data file.

Users and applications specify only the collation in which the data is stored in the data file. The bulk copy components perform internally any required translations between the data file collation and the collations of the source or destination columns in the database.

On a bulk copy out operation, the column and default collation specifications define the code pages used to build all SQLCHAR data in the resulting bulk copy data file. On a bulk copy in operation, the column and default collation specifications define the code pages used to read SQLCHAR data from the source data file.

If the SORTED hint is specified on a bulk copy in operation, the collations defined for any character and Unicode columns referenced in the SORTED hint define the expected sequence of the data.

On a bulk copy in operation, you must ensure that the collation specifications you make match the collations present in the bulk copy data file.

Format files in SQL Server 2000 support an eighth column in which you can provide a collation specification that defines how the data for that column is stored in the data file:

This is an example of a format file with column collations specified:

8.0
5
1  SQLCHAR  0   4  "/t" pub_id   1   "SQL_LATIN1_General_Cp1_CI_AS_KI_WI"
2  SQLCHAR  0  40  "/t" pub_name 2   "SQL_LATIN1_General_Cp850_BIN"
3  SQLCHAR  0  20  "/t" city     3   "RAW"
4  SQLCHAR  0   2  "/t" state    4   "RAW"
5  SQLCHAR  0  30  "/t" country  5   ""

Column collation specifications are ignored for columns that do not have SQLCHAR or SQLNCHAR specified as their host data type. Collations for SQLNCHAR columns are ignored on bulk copy out operations; they apply only to bulk copy in operations where the SQLNCHAR column is referenced in a SORTED hint. Collations apply to SQLCHAR columns on both in and out operations.

On a bulk copy out operation, the collation specification controls only the code page used to store character data in the bulk copy data file. It applies to:

The rules for determining which collation is used on a bulk copy out are:

On a bulk copy in operation, the collation specification controls:

For a bulk copy in operation, code page interpretation applies only to columns stored as SQLCHAR in a data file. All columns in a character mode data file are stored as SQLCHAR in a data file. It also applies to any column for which SQLCHAR is specified in a format file or using bcp_setcolfmt:

A bulk copy in operation also uses collations to properly interpret the ORDER bulk copy hint. This applies to both SQLCHAR and SQLNCHAR columns. The data in the columns referenced by a SORTED hint must be in the sequence defined by the collation mapped to those columns.

See Also

bcp Utility

BULK INSERT.

SetCodePage Method

Unicode Character Format

Unicode Native Format