Forest-based classification prediction

Prediction is carried out with a model trained by forest-based classification, or with an existing model, for example to predict areas suitable for animals and plants or areas prone to natural disasters. The result is returned as a feature dataset (FeatureRDD). The prediction result automatically includes a generated predicted field, together with a probability field that gives the probability of each predicted value. The value with the highest probability is the final prediction result (predicted).
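
The relationship between the probability values and the predicted field can be illustrated with a minimal Python sketch. The record layout, the class labels, and the function name pick_prediction are hypothetical and not part of the product; the sketch only shows that the final prediction is the value with the highest probability.

```python
def pick_prediction(probabilities):
    """Return the (label, probability) pair with the highest probability.

    `probabilities` is a hypothetical mapping from predicted value to
    probability; the field layout of the real result dataset may differ.
    """
    if not probabilities:
        raise ValueError("at least one probability is required")
    # max() with key=probabilities.get selects the label whose probability is largest.
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Illustrative values only, not real output.
example = {"suitable": 0.72, "unsuitable": 0.28}
predicted, probability = pick_prediction(example)
print(predicted, probability)
```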

When you create a forest-based classification prediction task, you need to set the following parameters:

Prediction Dataset: required parameter. The connection information for accessing the dataset to be predicted, including the data type, connection parameters, dataset name, etc. HBase data, dsf data, and local data can be connected.
Data Query Conditions: optional parameter. Filters the data to be analyzed according to the query conditions. Attribute conditions and spatial queries are supported, e.g. SmID < 100 and BBOX(the_geom, 120,30,121,31).
Model Save Directory: required parameter. The location where the model generated during training was saved.
Mapping field of prediction data: optional parameter. Maps the fields of the prediction data one-to-one to the fields of the training data, so that the trained forest model can be applied to obtain the prediction result. The default is null, in which case all fields in the explanatory variable array must exist in the Prediction Dataset.
Distance explanatory variable mapping of prediction data: optional parameter. If a distance explanatory variable dataset was supplied in the model training stage, the distance explanatory variable dataset for prediction must be supplied here, and the corresponding fields must be specified.
Result Dataset: required parameter. The connection information for saving the prediction result, including the data type, connection parameters, dataset name, and other information.
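
The field-mapping rule described for the prediction data can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name resolve_prediction_fields, the dictionary form of the mapping, and the field names are hypothetical, and treating an unmapped training field as having the same name in the prediction data is an assumption rather than documented behavior.

```python
def resolve_prediction_fields(explanatory_fields, prediction_fields, mapping=None):
    """Map each training explanatory field to a field of the prediction data.

    All names are hypothetical. `mapping` maps a training field name to a
    prediction field name. When it is None or empty, every training field
    must exist under the same name in the prediction dataset.
    """
    mapping = mapping or {}
    available = set(prediction_fields)
    resolved = {}
    missing = []
    for train_field in explanatory_fields:
        # Assumption: an unmapped field keeps its training name.
        target = mapping.get(train_field, train_field)
        if target in available:
            resolved[train_field] = target
        else:
            missing.append(target)
    if missing:
        raise ValueError(f"fields not found in prediction dataset: {missing}")
    # The correspondence must be one-to-one: no prediction field used twice.
    if len(set(resolved.values())) != len(resolved):
        raise ValueError("mapping is not one-to-one")
    return resolved


# Illustrative field names only.
fields = resolve_prediction_fields(
    ["elevation", "rainfall"],
    ["elev", "rainfall", "SmID"],
    {"elevation": "elev"},
)
print(fields)
```

With an empty mapping, the same call succeeds only if both elevation and rainfall exist in the prediction data under those exact names, which mirrors the default (null) case.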