Partitioning works well if the tables in the database are naturally divisible into similar partitions where most of the rows accessed by any SQL statement can be placed on the same member server. Tables are clustered in related units. For example, suppose the entry of an order references the Orders, Customers, and Parts tables, along with all tables that record the relationships between customers, orders, and parts. Partitions work best if all the rows in a logical cluster can be placed on the same member server.
Partitioning is most effective if the tables in a database can be partitioned symmetrically:
For example, suppose a company has divided North America into regions. Each employee works in one region, and customers make most of their purchases in the state or province where they live. The region and employee tables are partitioned along regions. Customers are partitioned between regions by their state or province. While some queries require data from multiple regions, the data needed for most queries is on the server for one region. Applications route SQL statements to the member server containing the region inferred from the context of the user input.
Although symmetric partitions are the ideal goal, most applications have complex data access patterns that prevent symmetrical partitioning. Asymmetric partitions result in some member servers assuming larger roles than others. For example, only some of the tables in a database may be partitioned, with the tables that have not been partitioned remaining on the original server. Asymmetric partitions can provide much of the performance of a symmetric partition, with these important benefits:
In an asymmetric approach, the original server usually retains some tables that did not fit the partitioning scheme. The performance of these remaining tables is usually faster than in the original system because the member tables move to member servers, reducing the load on the original server.
Many databases can be partitioned in more than one way. The specific partitions chosen for implementation must be those that best meet the requirements of the typical range of SQL statements executed by the business services tier.
You should also design the partitioning in a way that produces routing rules that applications can use to determine which member server can most effectively process each SQL statement. The business services tier must be able to match a piece of user data against the routing rules to find which member server processes the SQL statement.
There are four areas to consider when designing a set of distributed partitioned views to implement a federation of database servers:
Develop a list of the SQL statements executed by the application during typical processing periods. Divide the list into SELECT, UPDATE, INSERT, and DELETE categories, and order the list in each category by frequency of execution. If the SQL statements reference stored procedures, use the base SELECT, INSERT, UPDATE, and DELETE statements from the stored procedure. If you are partitioning an existing Microsoft® SQL Server™ 2000 database, you can use SQL Profiler to get such a list.
The recommendation for using the frequency of SQL statements is a reasonable approximation in the typical online transaction processing (OLTP) or Web site database in which distributed partitioned views work best. These systems are characterized by having individual SQL statements that retrieve relatively small amounts of data when compared to the types of queries in a decision support, or OLAP, system. When each SQL statement references a small amount of data, simply studying the frequency of each statement yields a reasonable approximation of the data traffic in the system. Many systems, however, have some group of SQL statements that reference large amounts of data. You may want to take the extra step of weighting these queries to reflect their larger data requirements.
The intent is to find clusters of tables that can be partitioned along the same dimension (for example, part number or department number) so that all the rows related to individual occurrences of that dimension will end up on the same member server. For example, you may determine that one way to partition your database is by region. To support this, even tables that do not have a region number in their key must be capable of being partitioned in some manner related to a region. In such a database, even when the Customer table does not have a region number column, if regions are defined as collections of whole states or provinces, then the Customer.StateProvince column can be used to partition the customers in a manner related to region.
Because they define the relationships between tables, explicit and implicit foreign keys are the prime elements to review in looking for ways to partition data. Study the explicit foreign key definitions to determine how queries would usually use rows in one table to find rows in another table. Also study implicit foreign keys, or ways that SQL statements use values in the rows of one table to reference rows from another table in join operations, even when there is no specific foreign key definition. Because implicit foreign keys are not explicitly defined as part of the database schema, you must review the SQL statements generated by the application to understand whether there are statements that join tables using nonkey columns. These implicit foreign keys are typically indexed to improve join performance, so you should also review the indexes defined in the database.
Select the partitioning that will best support the mix of SQL statements in your application. If some sets of tables can be partitioned in more than one way, use the frequency of SQL statements to determine which of the partitions satisfies the largest number of SQL statements. The tables most frequently referenced by SQL statements are the ones you want to partition first. Prioritize the sequence in which you partition the tables based on the frequency in which the tables are referenced.
The pattern of SQL statements also influences the decision on whether a table should be partitioned: