Parallel Stream Processing


Parallel Stream Processing (PSP) is the mechanism that enables DataSpider Servista to process large volumes of data efficiently with low memory usage.

PSP has the following features.

Figure: Architectural diagram of Parallel Stream Processing

PSP runs on the following architecture. PSP allocates a dedicated thread to each read, transform, and write operation that occurs during data transfer.



As shown in the diagram above, the read component and the transform component each have two internal buffers (blocks). Each of these components produces data that is consumed by the subsequent component in the process flow.
A component that generates result data writes to a block when it detects one in the write-available state (its data has already been consumed). If it detects no write-available block (the data has not yet been consumed), it waits.
A component that uses result data (in the diagram above, the transform process and the write process) begins processing when it detects a block in the load-available state (data creation is complete) in the result data of its input-source component. If it detects no load-available block (data creation is not complete), it waits.
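The waiting rules above can be sketched as a small bounded buffer. This is only an illustration of the described behavior, not DataSpider Servista's implementation: the class name BlockBuffer, the END marker, and the helper functions are all hypothetical, and the block count of 2 is taken from the diagram description.

```python
import threading
from collections import deque


class BlockBuffer:
    """Hypothetical bounded buffer with a fixed number of blocks."""

    def __init__(self, blocks=2):
        self._blocks = blocks
        self._filled = deque()  # blocks currently in the load-available state
        self._cond = threading.Condition()

    def write(self, data):
        with self._cond:
            # No write-available block: the producing component waits.
            while len(self._filled) >= self._blocks:
                self._cond.wait()
            self._filled.append(data)
            self._cond.notify_all()

    def load(self):
        with self._cond:
            # No load-available block: the consuming component waits.
            while not self._filled:
                self._cond.wait()
            data = self._filled.popleft()
            self._cond.notify_all()
            return data


END = object()  # hypothetical end-of-stream marker


def produce(buffer, items):
    for item in items:
        buffer.write(item)
    buffer.write(END)


def consume_all(buffer):
    results = []
    while True:
        item = buffer.load()
        if item is END:
            return results
        results.append(item)


def demo():
    buffer = BlockBuffer(blocks=2)
    producer = threading.Thread(
        target=produce, args=(buffer, ["data a", "data b", "data c"])
    )
    producer.start()
    results = consume_all(buffer)
    producer.join()
    return results
```

Because the buffer never holds more than two blocks, memory use stays bounded regardless of how much data the producer has to deliver.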

Process flow

The following steps explain the read, transform, and write operations in the diagram above in detail.
The data that travels from the read operation to the write operation is represented by rectangular strips labeled 'data a', 'data b', and 'data c', each identifiable by its color.

Step 1: Before the process

Step 2: Reading 'data a'

Step 3: Reading 'data b' and transforming 'data a'

Step 4: Reading 'data c', transforming 'data b', and writing 'data a'

Step 5: Transforming 'data c' and writing 'data b'

Step 6: Writing 'data c'

Because each operational stage is handled by its own dedicated thread, the stages work on different data at the same time, which allows large volumes of data to be processed at high speed.
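The overlap shown in Steps 2 through 6 can be reproduced with a short calculation. The sketch below is an idealized model that assumes every stage takes exactly one time slot per piece of data; the function name pipeline_schedule is hypothetical and the code does not reflect actual product timing.

```python
def pipeline_schedule(items, stages=("read", "transform", "write")):
    """Return, for each time slot, which item each stage is working on."""
    slots = len(items) + len(stages) - 1
    schedule = []
    for slot in range(slots):
        active = {}
        for offset, stage in enumerate(stages):
            # Each stage lags the previous one by exactly one slot.
            index = slot - offset
            if 0 <= index < len(items):
                active[stage] = items[index]
        schedule.append(active)
    return schedule


# Slot 0 corresponds to Step 2, because Step 1 is the idle state.
for number, active in enumerate(
    pipeline_schedule(["data a", "data b", "data c"]), start=2
):
    print(f"Step {number}: {active}")
```

In this model, three pieces of data pass through three stages in five slots, whereas running the stages one after another for each piece of data would take nine.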

Enabling Parallel Stream Processing

Smart Compiler examines the script contents and applies PSP automatically where it can be used.
Therefore, you can generally create scripts without being aware of PSP.
For details, refer to "Smart Compiler".

Parallel Stream Processing enabled components

For components and Mapper logics that support PSP, refer to Help.
For component operations that support PSP, refer to each operation page.
For Mapper logics that support PSP, refer to "Mapper logic list".

Parallel Stream Processing and threading

Thread coordination in the PSP mechanism works as follows:
In PSP, the number of producer threads generated equals the number of processing components that retrieve data from their associated resources. The thread that consumes the data in the bounded buffer of each processing component is the same thread that executes the script.
The total number of additional threads generated in a process flow composed of "Read"-"Conversion"-"Write" is 3.
The total number of additional threads generated in a process flow composed of "Read"-"Conversion"-"Conversion"-"Write" is 4.
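The thread counts in the two examples above can be illustrated with a minimal sketch. It assumes, as the examples suggest, one additional thread per component in the flow, and it uses Python's standard queue.Queue with maxsize=2 as a stand-in for the bounded buffers. The function run_flow and the END marker are hypothetical names and do not represent the product's internals.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def run_flow(source, conversions):
    """Run Read -> Conversion(s) -> Write with one thread per component.

    Returns the written records and the number of threads that were started.
    """
    written = []

    def read(out_q):
        for record in source:
            out_q.put(record)  # blocks while the bounded buffer is full
        out_q.put(END)

    def convert(func, in_q, out_q):
        while (record := in_q.get()) is not END:  # blocks while empty
            out_q.put(func(record))
        out_q.put(END)

    def write(in_q):
        while (record := in_q.get()) is not END:
            written.append(record)

    threads = []
    current = queue.Queue(maxsize=2)
    threads.append(threading.Thread(target=read, args=(current,)))
    for func in conversions:
        following = queue.Queue(maxsize=2)
        threads.append(
            threading.Thread(target=convert, args=(func, current, following))
        )
        current = following
    threads.append(threading.Thread(target=write, args=(current,)))

    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return written, len(threads)
```

Calling run_flow with one conversion starts three threads and with two conversions starts four, matching the counts given for the "Read"-"Conversion"-"Write" and "Read"-"Conversion"-"Conversion"-"Write" flows.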

Limitations