Preparing data
The iServer distributed analysis service supports the following input data sources. Once the data is ready, iServer selects all datasets that meet the analysis conditions when you create a specific analysis job.
Relational datasets stored in iServer DataStore
Datasets in big data file sharing:
    Shared directory
    HDFS (distributed file storage) directory
Datasets stored in a spatial database
iServer DataStore is an application that lets you quickly create data storage and associate it with iServer. To build a distributed iServer DataStore environment, see Build distributed iServer DataStore environment.
The relational datasets in iServer DataStore come from two sources:
Create dataset: on the %datacatalog_uri%/relationship/datasets resource page, you can create an empty dataset (one without any features). You can then connect to iServer DataStore with iDesktop and add features.
Import dataset: on the %datacatalog_uri%/relationship/dataimport resource page, you can import a dataset. Supported formats are CSV, UDB, workspace, and Excel. After a successful import, the imported datasets are listed on the %datacatalog_uri%/relationship/datasets resource.
The iServer administrator can register CSV files, UDB files, and HDFS directories as iServer big data file sharing. For the registration method, see Register big data file sharing. Datasets in a big data file share that has been registered successfully appear in the datasets resource of the Data Catalog Service and can be used as input data for the distributed analysis service.
CSV data files registered to iServer must be validated before the distributed analysis service can use them. The validation method is:
If you use the distributed analysis service with unregistered CSV data, you need to ensure that a corresponding .meta file exists in the CSV storage path. This file contains the meta information for the CSV data file. For example, the content of the .meta file for the sample data newyork_taxi_2013-01_14k.csv under the [iServer installation directory]/samples/data_en/ProcessingData directory is:
{
    "FieldInfos": [
        { "name": "col0", "type": "WTEXT" },
        { "name": "col1", "type": "WTEXT" },
        { "name": "col2", "type": "WTEXT" },
        { "name": "col3", "type": "INT32" },
        { "name": "col4", "type": "WTEXT" },
        { "name": "col5", "type": "WTEXT" },
        { "name": "col6", "type": "WTEXT" },
        { "name": "col7", "type": "INT32" },
        { "name": "col8", "type": "INT32" },
        { "name": "col9", "type": "DOUBLE" },
        { "name": "X", "type": "DOUBLE" },
        { "name": "Y", "type": "DOUBLE" },
        { "name": "col12", "type": "DOUBLE" },
        { "name": "col13", "type": "DOUBLE" }
    ],
    "GeometryType": "POINT",
    "HasHeader": false,
    "StorageType": "XYColumn"
}
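A .meta file with the layout shown above could also be generated by a small script rather than written by hand. The following Python sketch is illustrative only and is not part of iServer. The helper names (build_meta, column_count_matches, write_meta) are hypothetical, the set of field types is limited to those that appear in the sample, and the naming of the output file (same base name as the CSV with a .meta suffix) is an assumption that should be checked against your iServer version.

```python
import csv
import json
from pathlib import Path

# Field types that appear in the sample .meta file; iServer may accept others.
KNOWN_TYPES = {"WTEXT", "INT32", "DOUBLE"}


def build_meta(field_types, x_index, y_index, has_header=False):
    """Return a dict shaped like the sample .meta content.

    field_types: one type name per CSV column, in column order.
    x_index / y_index: zero-based positions of the coordinate columns.
    """
    fields = []
    for i, type_name in enumerate(field_types):
        if type_name not in KNOWN_TYPES:
            raise ValueError(f"unexpected field type: {type_name}")
        if i == x_index:
            name = "X"
        elif i == y_index:
            name = "Y"
        else:
            name = f"col{i}"
        fields.append({"name": name, "type": type_name})
    return {
        "FieldInfos": fields,
        "GeometryType": "POINT",
        "HasHeader": has_header,
        "StorageType": "XYColumn",
    }


def column_count_matches(csv_path, meta):
    """Check that the first CSV row has as many columns as FieldInfos."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        first_row = next(csv.reader(f), None)
    return first_row is not None and len(first_row) == len(meta["FieldInfos"])


def write_meta(csv_path, meta):
    """Write the meta beside the CSV (assumed naming: same base name, .meta suffix)."""
    meta_path = Path(csv_path).with_suffix(".meta")
    meta_path.write_text(json.dumps(meta, indent=4), encoding="utf-8")
    return meta_path
```

For the taxi sample, calling build_meta with the fourteen column types in order and x_index=10, y_index=11 yields the same field list as the sample file. Running column_count_matches before write_meta is a cheap way to catch a field list that does not line up with the CSV columns.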
The iServer administrator can register HBase, Oracle, PostgreSQL, PostGIS, and MongoDB databases as iServer spatial databases through the "Register data storage" function on the Cluster > Data registration page. For the registration method, see Register spatial database. Datasets in a spatial database that has been registered successfully appear in the datasets resource of the Data Catalog Service and can be used as input data for the distributed analysis service.