Naive Bayes Classification Training
Naive Bayes is a classification method based on Bayes' theorem and the assumption that features are conditionally independent given the class. Compared with more complex classification algorithms, it is efficient to train and often gives good classification results. It plays an important role in character recognition and image recognition, where it classifies unknown characters or images according to rules learned from already classified examples. It is also widely used in practice, for example in text classification and spam filtering.
This method trains a Naive Bayes classification model: it fits the model to the characteristics of the training data, and the resulting model can then be used for prediction.
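As a sketch of the rule described above, Bayes' theorem combined with the conditional independence assumption gives the decision rule below. The symbols are introduced here for illustration only: y stands for a class value of the dependent variable, and x_1, ..., x_n stand for the feature values of one record.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

Training amounts to estimating P(y) and each P(x_i | y) from the training data; prediction picks the class with the largest product.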
Training dataset: Required parameter. The connection information for the data set to be trained, including data type, connection parameters, data set name, etc. HBase data, dsf data, and local data can be connected.
Data query criteria: Optional parameter. Filters the data according to the query criteria so that only the specified data is analyzed; attribute conditions and spatial queries are supported. E.g. SmID<100 and BBOX(the_geom, 120,30,121,31).
Explanation field: Required parameter. The field names of the explanatory variables. Enter one or more field names from the training data set to serve as the independent variables of the model, which help predict the results.
Modeling field: Required parameter. The field used to train the model, that is, the dependent variable. It holds the known values that the model learns from in order to make predictions at unknown locations.
Smoothing parameter: Optional parameter, value range >0. Default is 1.0.
Naive Bayes classification model: Optional parameter. Either the multinomial model or the Bernoulli model; the default is multinomial.
Distance explanatory variable dataset: Optional parameter that supports point, line, and polygon data sets. It calculates the closest distance between the features of the given data set and the features of the training data set, and automatically creates an explanatory variable column from the result.
Model saving directory: Optional parameter. The directory to which the model with the better training results is saved. If it is left empty, the model is not saved.
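To illustrate how the smoothing parameter and the multinomial model interact, here is a minimal Python sketch. It is not this tool's implementation: the function names `train_multinomial_nb` and `predict` are hypothetical, the explanatory fields are assumed to hold non-negative counts, and the class prior is left unsmoothed for simplicity. The smoothing value is added to every per-class feature total, so a feature that never occurs in a class still gets a non-zero probability.

```python
import math
from collections import defaultdict


def train_multinomial_nb(rows, labels, smoothing=1.0):
    """Fit a multinomial Naive Bayes model (illustrative sketch).

    rows:      list of equal-length lists of non-negative feature counts
    labels:    class value for each row (the dependent variable)
    smoothing: additive smoothing value, assumed > 0
    """
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * n_features)

    # Accumulate per-class record counts and per-class feature totals.
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for i, value in enumerate(row):
            feature_sums[label][i] += value

    total = len(rows)
    model = {}
    for label, count in class_counts.items():
        sums = feature_sums[label]
        # Smoothed estimate: (total_i + smoothing) / (sum of totals + smoothing * n_features)
        denom = sum(sums) + smoothing * n_features
        log_prior = math.log(count / total)
        log_likelihood = [math.log((s + smoothing) / denom) for s in sums]
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, row):
    """Return the class with the highest log posterior score for one record."""
    def score(item):
        log_prior, log_likelihood = item[1]
        return log_prior + sum(v * ll for v, ll in zip(row, log_likelihood))
    return max(model.items(), key=score)[0]


if __name__ == "__main__":
    rows = [[3, 0], [2, 1], [0, 3], [1, 2]]
    labels = ["a", "a", "b", "b"]
    model = train_multinomial_nb(rows, labels, smoothing=1.0)
    print(predict(model, [4, 0]), predict(model, [0, 4]))
```

A Bernoulli variant would instead treat each feature as present or absent and estimate the probability of presence per class; the smoothing value plays the same role there.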
numClasses: The number of classes.
f1Score: Weighted F1-measure.
accuracy: Weighted accuracy.
weightedPrecision: The weighted precision.
weightedRecall: The weighted recall.
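The result fields above can be sketched in Python as follows. This assumes the common definition in which each class's precision, recall, and F1 are weighted by that class's share of the true labels; the tool's exact computation is not stated here, the function name `weighted_metrics` is hypothetical, and counting classes from the true labels is also an assumption.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Compute class-weighted evaluation metrics (illustrative sketch)."""
    n = len(y_true)
    support = Counter(y_true)      # number of true records per class
    predicted = Counter(y_pred)    # number of predicted records per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n         # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "numClasses": len(support),
        "f1Score": f1,
        "accuracy": sum(correct.values()) / n,
        "weightedPrecision": precision,
        "weightedRecall": recall,
    }


if __name__ == "__main__":
    print(weighted_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Under this definition the weighted recall always equals the overall accuracy, which is a quick sanity check on reported results.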