Handling mass data

To prevent memory shortages when handling large amounts of data, HULFT Square provides two mechanisms: Smart Compiler and mass data processing.

Smart Compiler is enabled by default. It automatically applies Parallel Streaming Processing (hereinafter referred to as "PSP"), which has no theoretical limit on data volume.

PSP enables fast processing of mass data through a multithreaded flow in which data is partitioned into blocks and each block is read, converted, and written concurrently.
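To picture how such a pipeline works, here is a minimal multithreaded sketch in Python. This is an illustration only, not HULFT Square's actual implementation: the block size, the queue depth, and the row-doubling conversion are all assumptions made for the example.

```python
import queue
import threading

BLOCK_SIZE = 10_000   # rows per block (hypothetical tuning value)
SENTINEL = None       # marks the end of the stream

def reader(source_rows, out_q):
    """Partition the source into blocks and pass them downstream."""
    block = []
    for row in source_rows:
        block.append(row)
        if len(block) == BLOCK_SIZE:
            out_q.put(block)
            block = []
    if block:
        out_q.put(block)
    out_q.put(SENTINEL)

def converter(in_q, out_q, convert):
    """Convert each block as it arrives, without waiting for the full input."""
    while (block := in_q.get()) is not SENTINEL:
        out_q.put([convert(row) for row in block])
    out_q.put(SENTINEL)

def writer(in_q, sink):
    """Write converted blocks as they arrive."""
    while (block := in_q.get()) is not SENTINEL:
        sink.extend(block)

source = range(100_000)   # stand-in for a large input
sink = []                 # stand-in for the output target
# Bounded queues cap how many blocks are in flight, keeping memory use flat.
q1, q2 = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
threads = [
    threading.Thread(target=reader, args=(source, q1)),
    threading.Thread(target=converter, args=(q1, q2, lambda r: r * 2)),
    threading.Thread(target=writer, args=(q2, sink)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each stage holds only a few blocks at a time, memory use doesn't grow with the input size, which is why this style of processing has no theoretical data volume limit.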

Because of how this mechanism works, PSP isn't compatible with some types of processing.


Overview of mass data processing

While normal processing keeps all data in memory, mass data processing keeps only the minimum amount of data required for processing in memory and stores the rest in a file.
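The contrast can be made concrete with a short sketch. The following Python comparison is an illustration, not HULFT Square's actual mechanism; the transform function is a placeholder for whatever processing a component performs.

```python
import tempfile

def normal_processing(rows, transform):
    """Normal processing: the full input and result are held in memory."""
    return [transform(r) for r in rows]

def mass_data_processing(rows, transform):
    """Mass-data-style processing: only the current row is held in memory;
    results accumulate in a temporary file instead of an in-memory list."""
    out = tempfile.TemporaryFile(mode="w+")   # removed automatically on close
    for r in rows:
        out.write(transform(r) + "\n")
    out.seek(0)   # rewind so the next component can read the results
    return out
```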

To handle mass data in component processing (mostly connector read operations), select Perform mass data processing in the settings for the script or connector.

Mass data processing is supported by all connectors of the table model type, by some connectors of the XML type (those with the Data processing method tab), and by Mapper.

When you execute mass data processing, the result data is kept in storage rather than in memory, regardless of whether the component is a table model type, an XML type, or Mapper.

Only the minimum amount of data needed for the actual processing is kept in memory.

For the XML type, data is held in memory in a fixed-size buffer or per node.
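As an illustration of per-node handling (a sketch using Python's standard library, not HULFT Square's XML parser), a streaming parser can process one node at a time and release it, so memory use stays bounded regardless of document size. The tag name and handler below are placeholders.

```python
import xml.etree.ElementTree as ET

def stream_nodes(xml_path, tag, handle):
    """Process matching elements one at a time instead of
    building the whole document tree in memory."""
    for event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag == tag:
            handle(elem)   # only this node is fully in memory
            elem.clear()   # release the node's children and text
```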

For the table model type, data is held in memory one row at a time. After a row is read, the data is written to a file.

However, the rows aren't written to the file one by one: they first accumulate in a write buffer and are then written to the file in batches.
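The write-buffer behavior can be sketched as follows. This is a simplified illustration; HULFT Square's actual buffer size and flush policy are internal details, and the 1,000-row threshold here is an assumption.

```python
class BufferedRowWriter:
    """Accumulates rows in an in-memory buffer and writes them to the
    file in batches, so the file isn't written row by row."""

    def __init__(self, file, buffer_rows=1000):   # threshold is hypothetical
        self.file = file
        self.buffer_rows = buffer_rows
        self.buffer = []

    def write_row(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.buffer_rows:
            self.flush()

    def flush(self):
        """Write all buffered rows to the file at once."""
        if self.buffer:
            self.file.writelines(line + "\n" for line in self.buffer)
            self.buffer.clear()
```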

For disk storage, you must secure enough free space for the file that holds the result data (for the XML type, also account for the volume of element names and other data).
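One way to estimate whether enough free space is available before running mass data processing is sketched below, using Python's standard library; the estimated result size and the safety margin are values you would have to supply yourself.

```python
import shutil

def has_enough_space(storage_path, estimated_result_bytes, margin=1.2):
    """Check that free space covers the estimated result-file size, with a
    safety margin (for XML, include element names and other metadata)."""
    free = shutil.disk_usage(storage_path).free
    return free >= estimated_result_bytes * margin
```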

The generated file is deleted when the script that uses the result data completes.