A decision tree is a supervised learning method for classification and regression. Its purpose is to create a model that learns simple decision rules from data features to predict the value of a target variable. The advantages of this model are that it is easy to interpret, can handle irrelevant features, and has low computational complexity, which makes it the basic form of tree-based models.
Like other classifiers, decision tree classification predicts the class of samples and can be used for both binary classification and multi-classification.
The method first trains a decision tree classification model from the characteristics of the training data; the trained model is then used for prediction.
Training Dataset: required parameter. The Connection Info of the Dataset to be trained, including Data Type, Connect Parameter, Dataset name, etc. HBase data, dsf data, and Local Data can be connected.
Data Query Conditions: optional parameter. Filters out the specified data for analysis according to the Query Conditions; attribute conditions and Spatial Query are supported, e.g. SmID < 100 and BBOX(the_geom, 120, 30, 121, 31).
Explanatory Fields: required parameter. One or more fields of the training Dataset used as the explanatory (independent) variables of the model, which help predict the category.
Modeling Field: required parameter. The field used to train the model, i.e., the dependent variable. This field holds the known (training) values of the variable that will be predicted at unknown locations.
Maximum Tree Depth: optional parameter. The maximum depth of the tree, i.e., the maximum number of splits made along a branch. The value must be greater than 0; the default is 5. A larger maximum depth creates more splits, which may increase the likelihood of overfitting the model.
Leaf Node Splitting Threshold: optional parameter. The minimum number of observations required to retain a leaf (i.e., a terminal node of the tree that is not split further). The value must be greater than 0; the default is 1. For very large data, increasing this number reduces the run time of the tool.
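The effect of the maximum tree depth and the leaf node splitting threshold can be illustrated with a minimal CART-style sketch in pure Python. This is an illustration of the general technique, not the tool's actual implementation; all function and parameter names here are hypothetical:

```python
from collections import Counter

def gini(labels):
    # Gini impurity of a set of class labels
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    # search every (feature, threshold) pair for the lowest weighted Gini impurity
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if score < best_score:
                best, best_score = (f, t, left, right), score
    return best

def build_tree(X, y, max_depth=5, min_leaf=1, depth=0):
    # stop splitting at the maximum depth, on pure nodes, or when a
    # child would fall below the leaf node splitting threshold
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, t, left, right = split
    if len(left) < min_leaf or len(right) < min_leaf:
        return majority
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       max_depth, min_leaf, depth + 1),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       max_depth, min_leaf, depth + 1))

def predict(tree, row):
    # walk from the root to a leaf, following the split thresholds
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree
```

A larger max_depth lets the tree keep splitting and fit noise (overfitting), while a larger min_leaf discards small splits, which also shortens training on large data.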
Distance Explanatory Variable Dataset: optional parameter. Supports point, line, and region Datasets. The closest distance between the features of the given Dataset and the features in the training Dataset is calculated, and the resulting values are automatically added as explanatory variables.
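The distance explanatory variable amounts to a nearest-feature distance computation. A minimal sketch, which for simplicity treats features as 2-D points (the function name is hypothetical):

```python
import math

def nearest_distances(train_points, reference_points):
    # for every training feature, the distance to its closest reference
    # feature; the resulting column can serve as an extra explanatory variable
    return [min(math.dist(p, q) for q in reference_points)
            for p in train_points]
```

For example, `nearest_distances([(0, 0), (3, 4)], [(0, 1), (3, 0)])` returns `[1.0, 4.0]`.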
Model Save Directory: optional parameter. The trained model is saved to this path; an empty value means the model will not be saved.
dTRModelCharacteristics: Properties of the decision tree classification model.
Variable: The field array of the decision tree classification model, i.e., the independent-variable fields used to train the model.
Variable Importances: The importance of each field, i.e., the degree of influence each explanatory variable has on the dependent variable.
F1 Score: The weighted F1 score.
Accuracy: The weighted accuracy.
WeightedPrecision: The weighted precision.
WeightedRecall: The weighted recall.
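The weighted metrics above can be reproduced from a set of predictions: each per-class metric is averaged with weights proportional to that class's share of the true labels. A minimal sketch (the function name is hypothetical):

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    # per-class precision, recall, and F1, averaged with weights equal
    # to each class's proportion of the true labels (its support)
    n, support = len(y_true), Counter(y_true)
    w_precision = w_recall = w_f1 = 0.0
    for c in support:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = support[c] / n
        w_precision += weight * precision
        w_recall += weight * recall
        w_f1 += weight * f1
    return w_precision, w_recall, w_f1
```

For example, with y_true = [0, 0, 1, 1] and y_pred = [0, 1, 1, 1], the weighted precision is 5/6, the weighted recall is 0.75, and the weighted F1 is about 0.733.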
dTClassification Diagnostics: Classification result diagnostics, including the F1 score, precision, recall, true positive rate, and false positive rate for each classification category.