CREATE MINING MODEL Statement

This statement creates a local data mining model on the client computer. You can create mining models from relational databases, PMML, or OLAP cubes.

BNF (CREATE MINING MODEL)

<dm_create>::=CREATE MINING MODEL <identifier> ( <col_def_list> ) USING <algorithm> [(<algo_param_list>)]

<pmml_create>::= CREATE MINING MODEL <identifier> FROM PMML <string>

<select_into>::= SELECT * INTO <identifier> USING <algorithm> FROM <identifier>

<col_def_list>::= <col_def> |<col_def_list> , <col_def>
<col_def>::= <col_def_reg> | <col_def_tbl>
<col_def_reg>::= <identifier> <col_type> [<col_distribution>] [<col_binary>] [<col_content>] [<col_content_qual>] [<col_qualif>] [<col_prediction>] [<relation_clause>]

<col_def_tbl> ::= <identifier> TABLE <col_prediction> ( <col_def_list> )

<algorithm> ::= MICROSOFT_DECISION_TREES | MICROSOFT_CLUSTERING

<algo_param>::= <identifier> = <value>

<algo_param_list>::= <algo_param>
         | <algo_param>, <algo_param_list>

<col_type>::= LONG
         | BOOLEAN
         | TEXT
         | DOUBLE
         | DATE

<col_distribution>::= NORMAL
         | UNIFORM

<col_binary>::= MODEL_EXISTENCE_ONLY
         | NOT NULL

<col_content>::= DISCRETE
         | CONTINUOUS
         | DISCRETIZED( [<disc_method> [, <numeric_const>]] )
         | SEQUENCE_TIME

<disc_method>::= AUTOMATIC
         | EQUAL_AREAS
         | THRESHOLDS
         | CLUSTERS

<col_content_qual>::= ORDERED
         | CYCLICAL

<col_prediction>::= PREDICT
         | PREDICT_ONLY

<relation_clause>::= <related_to_clause>
         | <of_clause>

<related_to_clause>::= RELATED TO <identifier>
         | RELATED TO KEY

<of_clause>::= OF <identifier>
| OF KEY
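
As an illustration of the <dm_create> form, the following sketch defines a model whose cases include a nested table. All model, table, and column names are hypothetical. The nested table carries PREDICT because <col_def_tbl> above lists a prediction flag after TABLE, and KEY is used as a content flag as described under Remarks.

```sql
-- Sketch only: every name in brackets is a made-up example.
CREATE MINING MODEL [Purchase Prediction]
(
  [Customer ID] LONG KEY,            -- case key; takes no other flags
  [Gender] TEXT DISCRETE,            -- input attribute with a fixed set of values
  [Age] DOUBLE CONTINUOUS,           -- input attribute with a continuous range
  [Product Purchases] TABLE PREDICT  -- nested table (<col_def_tbl>), predictable
  (
    [Product Name] TEXT KEY          -- key of the nested table
  )
)
USING MICROSOFT_DECISION_TREES
```

Because the nested table has a key and no attribute columns, the model predicts only which products appear for a case.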

BNF (CREATE OLAP MINING MODEL)

Use this syntax to create mining models that are based on OLAP cubes instead of on relational database tables. Each OLAP mining model contains one or more case dimensions and zero or more case measures. Columns within each case can be based on any object in the Dimension object model, such as a hierarchy, level, or property, or can be based upon the value of a measure. The flags that are used with each OLAP mining model column are the same as those used for relational mining models. OLAP mining models are trained in the same manner as relational mining models, using the same syntax.

<olap create statement> ::= CREATE OLAP MINING MODEL <dmm name>
FROM <cube name> <olap definition>
USING <dmm algorithm> [(<dmm flag list>)]

<olap definition> ::= CASE <olap dimension> [, <olap dimension list>] [, <olap measure list>]

<olap dimension> ::= DIMENSION <dimension name> <predict qualifier>
{ <olap level list> | <olap hierarchy list> }

<olap hierarchy> ::= HIERARCHY <hierarchy name> <predict qualifier> <olap level list>

<flag Name> ::= <col_type> [<col_distribution>] [<col_binary>] [<col_content>] [<col_content_qual>] [<col_qualif>]

Remarks

The CREATE MINING MODEL statement creates a new mining model based on the column definition list. Each column is described by content flags in the column definition. These flags give the mining algorithm additional information about the content of the training data or model. Flags within a flag type group are mutually exclusive, so a column can carry at most one flag from each group, and the flags must appear in the correct order. The flag type groups and the correct order for the content flags are listed in the following table.

Flag type	Flag name	Description
Distribution	NORMAL	The values of the column appear in a normal distribution.
	LOG NORMAL	The values of the column appear in a log normal distribution.
	UNIFORM	The values of the column appear in a uniform distribution.
Content Type	KEY	The column is discrete and is a key. Key columns will not have any other flags except in the case of a nested table with no attribute columns.
	CONTINUOUS	The column contains values in a continuous range, such as Age or Salary.
	DISCRETE	The column contains a discrete set of values, such as Gender.
	DISCRETIZED()	The column contains a continuous set of values that should be converted to buckets.
	ORDERED	The column contains a discrete set of values that are ordered, such as Salary Level.
	CYCLICAL	The column contains an ordered discrete set of values that are cyclical, such as Day of Week or Month.
	SEQUENCE_TIME	The column contains time measurement units.
Modeling	MODEL_EXISTENCE_ONLY	The column should be modeled as having two states, missing and nonmissing, regardless of the values in the column. This is particularly useful for columns in a nested table, where values are sparse across cases.
	NOT NULL	The column cannot accept NULL values.
Special Property	PROBABILITY	The value in this column is the probability (0-1) of the associated value.
	VARIANCE	The value in this column is the variance of the associated value.
	STD	The value in this column is the standard deviation of the associated value.
	PROBABILITY VARIANCE	The value in this column is the variance of the probability of the associated value.
	PROBABILITY STD	The value in this column is the standard deviation of the probability of the associated value.
	SUPPORT	The value in this column is the weight (case replication factor) of the associated value.
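
The flag groups in the table can be combined as in the following sketch, which places a distribution flag before the content type and passes a method and bucket count to DISCRETIZED() as the BNF allows. All names are hypothetical, the bucket count of 5 is arbitrary, and writing CYCLICAL after DISCRETE is an assumption based on the <col_content> and <col_content_qual> positions in the BNF.

```sql
-- Sketch only: names and the bucket count are illustrative assumptions.
CREATE MINING MODEL [Visit Patterns]
(
  [Customer ID] LONG KEY,                            -- key column, no other flags
  [Income] DOUBLE NORMAL CONTINUOUS,                 -- distribution flag, then content type
  [Age] DOUBLE DISCRETIZED(EQUAL_AREAS, 5) PREDICT,  -- continuous values bucketed into 5 ranges
  [Visit Day] TEXT DISCRETE CYCLICAL                 -- ordered, cyclical set such as Day of Week
)
USING MICROSOFT_DECISION_TREES
```

Each column uses at most one flag from each group, which is the exclusivity rule described above.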

<Column relation> clause	Description
OF	This form is restricted to use for columns with Special Property content flags, for example, ProbGender Double PROBABILITY OF Gender.
RELATED TO	This form indicates a value hierarchy. The target of a RELATED TO column can be a key column in a nested table, a discretely valued column on the case row, or another column with a RELATED TO clause (indicating a deeper hierarchy).
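
Both relation forms might appear in one model, as in this sketch. The names are hypothetical. [Age Prob] follows the ProbGender pattern shown for OF, and [Product Type] uses RELATED TO to classify the nested table key.

```sql
-- Sketch only: every name in brackets is a made-up example.
CREATE MINING MODEL [Age Model]
(
  [Customer ID] LONG KEY,
  [Age] DOUBLE CONTINUOUS,
  [Age Prob] DOUBLE PROBABILITY OF [Age],  -- probability (0-1) of the value in [Age]
  [Product Purchases] TABLE PREDICT
  (
    [Product Name] TEXT KEY,
    [Product Type] TEXT DISCRETE RELATED TO [Product Name]  -- value hierarchy over the key
  )
)
USING MICROSOFT_DECISION_TREES
```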

<Prediction flag> clause	Description
PREDICT	This column can be predicted by the model and it can be supplied in input cases to predict the value of other predictable columns.
PREDICT_ONLY	This column can be predicted by the model, but its values cannot be used in input cases to predict the value of other predictable columns.
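
The difference between the two prediction flags can be seen in a sketch like the following, where all names are hypothetical. [Credit Risk] can be predicted and can also serve as an input when predicting other columns, while [Churned] can only be predicted.

```sql
-- Sketch only: every name in brackets is a made-up example.
CREATE MINING MODEL [Risk Model]
(
  [Customer ID] LONG KEY,
  [Income] DOUBLE CONTINUOUS,           -- input only (no prediction flag)
  [Credit Risk] TEXT DISCRETE PREDICT,  -- predictable, and usable as input for other predictions
  [Churned] BOOLEAN PREDICT_ONLY        -- predictable, never used as input
)
USING MICROSOFT_DECISION_TREES
```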