One of the more advanced features of the data mining column structure is the ability to nest data mining columns. Data mining models use nested columns for both input and output data; the syntax used to populate a data mining model with training data allows nested columns to be represented as subqueries. Data mining cases are not always easily described by typical relational tables, because predictive analysis of a single case may depend on several groups of supporting information. To illustrate this point, consider the case of a telephone company customer: a customer may have multiple telephone lines and multiple ISP accounts.
To retrieve all of the customer information, all of the telephone lines for each customer, and all of the ISP accounts for each customer, several approaches could be used:
All three of these approaches are ungainly, involve repetitive data and repeated actions, and are highly inefficient.
However, if a single column could hold a group of columns, a single query could return one row per customer. Each row would contain all of the columns in the Customers table, plus one additional column holding all of the Telephone Lines rows for that customer and another holding all of the ISP Accounts rows for that customer, as shown in the following diagram.
As the diagram shows, there is no redundant data for the customer in the returned rowset; one row per customer is all that is needed, and the nested columns of the rowset contain the data pertinent to that customer. Rowsets constructed in this fashion, referred to as hierarchical rowsets, are fully supported by OLE DB.
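The shape of such a rowset can be sketched outside of OLE DB. The following Python example is only an illustration under stated assumptions: the sample rows, the column names (`customer_id`, `line_type`, `plan`), and the helper functions are hypothetical and do not come from the schema or any OLE DB interface. It builds one row per customer with two nested columns, and compares the result with a flat three-way inner join of the same data.

```python
from collections import defaultdict

# Hypothetical sample data; table contents and column names are assumptions.
customers = [
    {"customer_id": 1, "name": "Avery"},
    {"customer_id": 2, "name": "Blake"},
]
telephone_lines = [
    {"customer_id": 1, "line_type": "residential"},
    {"customer_id": 1, "line_type": "fax"},
    {"customer_id": 2, "line_type": "residential"},
]
isp_accounts = [
    {"customer_id": 1, "plan": "dial-up"},
    {"customer_id": 2, "plan": "dial-up"},
    {"customer_id": 2, "plan": "dsl"},
]


def group_by_customer(rows):
    """Group child rows by customer_id, dropping the key from each child row."""
    grouped = defaultdict(list)
    for row in rows:
        child = {k: v for k, v in row.items() if k != "customer_id"}
        grouped[row["customer_id"]].append(child)
    return grouped


def build_hierarchical_rowset(customers, telephone_lines, isp_accounts):
    """Return one row per customer, with the supporting rows as nested columns."""
    lines = group_by_customer(telephone_lines)
    accounts = group_by_customer(isp_accounts)
    return [
        {
            **customer,
            "telephone_lines": lines.get(customer["customer_id"], []),
            "isp_accounts": accounts.get(customer["customer_id"], []),
        }
        for customer in customers
    ]


def flat_join(customers, telephone_lines, isp_accounts):
    """Inner-join all three tables; customer columns repeat on every row."""
    return [
        {**c, "line_type": t["line_type"], "plan": a["plan"]}
        for c in customers
        for t in telephone_lines
        if t["customer_id"] == c["customer_id"]
        for a in isp_accounts
        if a["customer_id"] == c["customer_id"]
    ]


rowset = build_hierarchical_rowset(customers, telephone_lines, isp_accounts)
flat_rows = flat_join(customers, telephone_lines, isp_accounts)
```

With this sample data, `rowset` holds two rows, one per customer, while `flat_rows` holds four rows in which each customer's name is repeated once for every combination of telephone line and ISP account.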
Case information for a data mining model may not reside in a single case table; supporting tables may supply additional information that helps define the case. In the diagram, the Telephone Lines and ISP Accounts tables serve as supporting tables for the Customers case table. They provide additional information about the case, such as the number and type of ISP accounts the customer may possess, or the number of telephone lines used by the customer. The data mining model can take advantage of nested data mining columns to process this supporting information and create additional rules and patterns for the customer based on the data in the supporting tables.
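As a rough illustration of the kind of supporting information mentioned above, the following Python sketch summarizes the nested columns of a single case: the number of telephone lines, and the number and type of ISP accounts. It is not a description of how a mining algorithm actually processes nested columns; the case layout, the column names, and the `summarize_case` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical case: one customer row with two nested columns (assumed names).
case = {
    "customer_id": 1,
    "name": "Avery",
    "telephone_lines": [{"line_type": "residential"}, {"line_type": "fax"}],
    "isp_accounts": [{"plan": "dial-up"}],
}


def summarize_case(case):
    """Derive simple per-case facts from the nested supporting rows."""
    return {
        "customer_id": case["customer_id"],
        "telephone_line_count": len(case["telephone_lines"]),
        "isp_account_count": len(case["isp_accounts"]),
        # Count how many accounts of each plan type the customer holds.
        "isp_plans": dict(Counter(a["plan"] for a in case["isp_accounts"])),
    }


summary = summarize_case(case)
```

Because the supporting rows travel with the case row, these facts can be computed per customer without a second lookup against the supporting tables.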