Aggregate

Operation name

Aggregate

Function overview

Aggregates the input data.

Data model

The data model of this component is the table model type.

= Remarks =

For more details on the input/output schema, refer to Table model type.

Properties

Basic settings

Item name

Required/Optional

Use of variables

Description

Remarks

Name

Required

Not available

Enter a name that is used on the script canvas.

 

Input data

Required

Not available

Select a component on the script canvas.

 

Required settings

Item name

Required/Optional

Use of variables

Description

Remarks

Group key configuration

Optional

-

Specify the columns to be used as keys for the Aggregate operation.

The following buttons are available for each row:

Add

Adds a key.

Delete

Deletes the key.

  • When multiple columns are specified as keys, input data is grouped based on the combination of the values of those columns and aggregated by each group.

  • When no key is specified, all the input data is aggregated as a single group.

Note

A byte[] type column can't be specified as a group key.

Group key configuration/Column name

Required

Not available

Select the column name to be used as a key.

  • The column's position (index) in the column order is displayed at the end of the column name.

  • When Input data is selected or changed, the column names are updated.
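
The grouping behavior described for Group key configuration can be sketched in plain Java. This is only an illustration of the semantics, not the product's implementation: the class GroupKeySketch, the method countByKeys, and the sample rows are all hypothetical. It shows that several key columns group rows by the combination of their values, and that no key column puts every row in one group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: GroupKeySketch and countByKeys are hypothetical
// names and are not part of the product.
final class GroupKeySketch {

    // Groups rows by the values found at the given column indices and counts
    // the rows in each group. With no key index, all rows share one empty key.
    static Map<List<Object>, Integer> countByKeys(List<Object[]> rows, int... keyIndices) {
        // LinkedHashMap is used only so that this sketch prints in a stable order.
        Map<List<Object>, Integer> counts = new LinkedHashMap<>();
        for (Object[] row : rows) {
            List<Object> key = new ArrayList<>();
            for (int index : keyIndices) {
                key.add(row[index]);
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] {"East", "Pen", 10},
            new Object[] {"East", "Pen", 5},
            new Object[] {"East", "Ink", 7},
            new Object[] {"West", "Pen", 3});

        // Two key columns: one group per distinct (region, product) pair.
        System.out.println(countByKeys(rows, 0, 1)); // {[East, Pen]=2, [East, Ink]=1, [West, Pen]=1}

        // No key column: every row falls into the same single group.
        System.out.println(countByKeys(rows)); // {[]=4}
    }
}
```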

Aggregate settings

Item name

Required/Optional

Use of variables

Description

Remarks

Aggregate target configuration

Optional

-

Specify a target column for aggregation and an aggregate function.

The following buttons are available for each row:

Add

Adds an aggregate target.

Delete

Deletes the aggregate target.

 

Aggregate target configuration/Column name

Required

Not available

Select a column name to be an aggregate target.

  • The column's position (index) in the column order is displayed at the end of the column name.

  • When Input data is selected or changed, the column names are updated.

  • Column names set in Group key configuration/Column name aren't displayed as choices.

Aggregate target configuration/Function

Required

Not available

Select an aggregate function to process the aggregate target.

  • Enabled when Aggregate target configuration/Column name is selected.

  • Only the aggregate functions applicable to the data type of the column specified in Aggregate target configuration/Column name are displayed.

  • For details, refer to Aggregate functions.

Output settings

Item name

Required/Optional

Use of variables

Description

Remarks

Configure output manually

Optional

Not available

Select whether to manually set the columns and aggregated results to be output.

Selected

Output targets are set manually.

Not selected

(Default)

Output targets are automatically set according to the content of the specified property items.

  • When this option isn't selected, the output targets are arranged in the following order:

    1. The order set in Group key configuration

    2. The order set in Aggregate target configuration

    3. Count

Output target configuration

Optional

-

Specify columns and aggregated results to be output.

The following buttons are available for each row:

Up

Moves the output target upward by one row in the output order.

Down

Moves the output target downward by one row in the output order.

Add

Adds an output target.

Delete

Deletes the output target.

  • Enabled when Configure output manually is selected.

Output target configuration/Data source

Required

Not available

Select a data source of the output target.

<Group key configuration/Column name>

The column that is the group key is output.

<Aggregate target configuration/Function> of <Aggregate target configuration/Column name>

The aggregated results for the aggregate target column are output.

Count

The number of data items that belong to each aggregate group is output.

  • The column's position (index) in the column order is displayed at the end of the column name.

  • When Input data is selected or changed, the column names are updated.

  • The internal data type of the output column depends on the selected data source.

    • <Group key configuration/Column name>: Same as the input data type

    • <Aggregate target configuration/Function> of <Aggregate target configuration/Column name>: Data type of the aggregated results

    • Count: int type

  • For more details on aggregated results, refer to Aggregate functions.
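
The automatic output order used when Configure output manually isn't selected (group keys, then aggregate targets, then Count) can be sketched as follows. The class OutputOrderSketch, the method defaultOutputTargets, and the sample column names are hypothetical. The strings follow the data source labels listed above and aren't necessarily the actual output column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: OutputOrderSketch and defaultOutputTargets are
// hypothetical names and are not part of the product.
final class OutputOrderSketch {

    // Builds the list of output targets in the automatic order:
    // 1. group keys, 2. aggregate targets, 3. Count.
    // Each aggregate target is a {column name, function name} pair.
    static List<String> defaultOutputTargets(List<String> groupKeys, List<String[]> aggregateTargets) {
        List<String> targets = new ArrayList<>(groupKeys);
        for (String[] target : aggregateTargets) {
            String column = target[0];
            String function = target[1];
            targets.add(function + " of " + column);
        }
        targets.add("Count");
        return targets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("region", "product");
        List<String[]> aggregates = Arrays.asList(
            new String[] {"quantity", "Summation"},
            new String[] {"order_date", "Maximum (date/time)"});

        // Prints: [region, product, Summation of quantity, Maximum (date/time) of order_date, Count]
        System.out.println(defaultOutputTargets(keys, aggregates));
    }
}
```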

Comment

Item name

Required/Optional

Use of variables

Description

Remarks

Comment

Optional

Not available

You can write a short description of this connector.

 

Schemas

Input schema

Same as the schema of the input source component.

Output schema

The number of columns varies depending on the Output settings.

= Remarks =

For the schema structure, refer to Table model type.

Loading schema in Mapper

The output schema is loaded automatically, whereas the input schema must be loaded manually.

Specify the schema of the data to be loaded.

= Remarks =

For details, refer to Edit Schema.

Mass data processing

Mass data processing isn't supported.

Parallel Stream Processing

PSP isn't supported.

Available component variables

Component variable name

Description

Remarks

message_category

When an error occurs, the category of the message code corresponding to the error is stored.

  • The default value is null.

message_code

When an error occurs, the code of the message code corresponding to the error is stored.

  • The default value is null.

message_level

When an error occurs, the severity of the message code corresponding to the error is stored.

  • The default value is null.

error_type

When an error occurs, the error type is stored.

  • The default value is null.

  • The format of the error type is as follows.

    Example: java.io.FileNotFoundException

error_message

When an error occurs, the error message is stored.

  • The default value is null.

error_trace

When an error occurs, the trace information for the error is stored.

  • The default value is null.

Modifying schema of input source component

  • When the schema of the component specified in Input data is modified, open the property setting dialog of the Aggregate operation and click the Finish button to apply the changes to the Aggregate operation.

  • When the schema structure of the component specified in Input data is modified (such as changes to the order of the schema elements, or deletion of schema elements), settings for Group key configuration/Column name, Aggregate target configuration/Column name, and Output target configuration/Data source must be modified accordingly.

    This is because the Aggregate operation identifies columns by their positions in the column order (indices) rather than by their names.
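
The effect of tracking columns by position can be sketched as follows. The class IndexBindingSketch, the method resolve, and the sample schemas are hypothetical. The sketch only shows why a stored index points at a different column after the input schema is reordered.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: IndexBindingSketch and resolve are hypothetical
// names and are not part of the product.
final class IndexBindingSketch {

    // Returns the column that a stored positional index refers to in a schema.
    static String resolve(List<String> schema, int storedIndex) {
        return schema.get(storedIndex);
    }

    public static void main(String[] args) {
        // The group key was configured while "region" was the first column.
        int groupKeyIndex = 0;

        List<String> original = Arrays.asList("region", "product", "quantity");
        List<String> reordered = Arrays.asList("product", "region", "quantity");

        System.out.println(resolve(original, groupKeyIndex));  // region
        // After reordering, the same index silently refers to another column.
        System.out.println(resolve(reordered, groupKeyIndex)); // product
    }
}
```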

Null and empty strings

  • When the column specified in Group key configuration contains a null value or an empty string, that value is treated as a group key value.

  • An error occurs when the column specified in Aggregate target configuration contains a null value or an empty string and any of the following aggregate functions is used:

    • Summation

    • Minimum (number)

    • Maximum (number)

    • Minimum (date/time)

    • Maximum (date/time)
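
The two rules above can be sketched as follows. The class NullHandlingSketch and the method sumByKey are hypothetical, and IllegalArgumentException is only a stand-in for whatever error the product raises. Whether the product treats null and an empty string as the same group isn't stated here, so the sketch simply keeps them as distinct key values.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: NullHandlingSketch and sumByKey are hypothetical
// names, and IllegalArgumentException stands in for the product's error.
final class NullHandlingSketch {

    // Sums column 1 per value of column 0.
    // A null or empty key is an ordinary group key value.
    // A null or empty value to be summed is rejected.
    static Map<String, BigDecimal> sumByKey(List<String[]> rows) {
        Map<String, BigDecimal> sums = new HashMap<>(); // HashMap permits a null key
        for (String[] row : rows) {
            String key = row[0];
            String value = row[1];
            if (value == null || value.isEmpty()) {
                throw new IllegalArgumentException("Cannot sum a null or empty value");
            }
            sums.merge(key, new BigDecimal(value), BigDecimal::add);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {null, "5"},
            new String[] {null, "7"},
            new String[] {"A", "1"});
        // The two rows with a null key form one group whose sum is 12.
        System.out.println(sumByKey(rows).get(null));

        try {
            sumByKey(Arrays.asList(new String[] {"A", ""}, new String[] {"A", "1"}));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```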

Aggregate functions

Aggregate functions inherently restrict the data types that they can accept as input. In addition, the data type of the aggregated results might differ from the input data type.

= Remarks =

For more details on data types, refer to Data types.

Aggregate function

Description

Availability for input data type

Data type of aggregated results

byte

short

int

long

float

double

BigDecimal

boolean

String

Date

byte[]

Unique count

Aggregates the number of unique values.

×

int

First value

Extracts the first value.

Same as input

Last value

Extracts the last value.

Same as input

First non-null value

Extracts the first non-null value.

Same as input

Last non-null value

Extracts the last non-null value.

Same as input

Summation

Calculates the sum as a number.

×

×

×

BigDecimal

Minimum (number)

Extracts the minimum value as a number.

×

×

×

Same as input

Maximum (number)

Extracts the maximum value as a number.

×

×

×

Same as input

Minimum (date/time)

Extracts the value of the earliest date and time.

×

×

×

×

×

×

×

×

×

Same as input

Maximum (date/time)

Extracts the value of the latest date and time.

×

×

×

×

×

×

×

×

×

Same as input

  • How to read the table

    Symbol

    Description

    Can be applied as input.

    ×

    Can't be applied as input.

= Remarks =

The internal data type char is treated as String.
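
The result types listed in the table (BigDecimal for Summation, int for Unique count) can be sketched as follows. The class ResultTypeSketch and its methods are hypothetical and don't reflect the product's internals. The example also shows one practical consequence of a BigDecimal result: a total that exceeds the int range is still represented exactly.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Illustrative sketch only: ResultTypeSketch and its methods are hypothetical
// names and are not part of the product.
final class ResultTypeSketch {

    // Sums numeric values of any input type into a BigDecimal result.
    static BigDecimal summation(List<? extends Number> values) {
        BigDecimal total = BigDecimal.ZERO;
        for (Number value : values) {
            total = total.add(new BigDecimal(value.toString()));
        }
        return total;
    }

    // Counts distinct values; the result is an int whatever the input type.
    static int uniqueCount(List<?> values) {
        return new HashSet<Object>(values).size();
    }

    public static void main(String[] args) {
        List<Integer> quantities = Arrays.asList(Integer.MAX_VALUE, 1);
        // The int inputs sum past the int range without overflow.
        System.out.println(summation(quantities)); // 2147483648

        System.out.println(uniqueCount(Arrays.asList("Pen", "Ink", "Pen"))); // 2
    }
}
```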

Specification limits

  • Multi-thread processing is supported.

    = Remarks =

    For the specification limits on multi-thread processing, refer to Specification limitations.

  • When specifying Group key configuration, Aggregate target configuration, and Output target configuration, the output schema of the input source component must be configured.

    For components that require manual schema settings, load schemas with Mapper.

  • The order of the result data of the Aggregate operation isn't guaranteed.

  • Even when mass data processing is performed by the input source component, data is temporarily held in memory during the execution of the Aggregate operation.

    Therefore, when the amount of data is very large, an OutOfMemoryError may occur even with mass data processing enabled.
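
Because the order of the result data isn't fixed, any processing that depends on row order needs to impose that order itself. The following is a minimal sketch of sorting rows by the first column in plain Java. The class ExplicitOrderSketch and the method sortedByFirstColumn are hypothetical, and how sorting is actually done in a script depends on the components available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: ExplicitOrderSketch and sortedByFirstColumn are
// hypothetical names and are not part of the product.
final class ExplicitOrderSketch {

    // Returns a copy of the rows sorted by the first column.
    // Null keys sort first, because null can be a valid group key value.
    static List<String[]> sortedByFirstColumn(List<String[]> rows) {
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing(
            (String[] row) -> row[0],
            Comparator.nullsFirst(Comparator.<String>naturalOrder())));
        return sorted;
    }

    public static void main(String[] args) {
        List<String[]> aggregated = Arrays.asList(
            new String[] {"West", "3"},
            new String[] {null, "12"},
            new String[] {"East", "22"});

        for (String[] row : sortedByFirstColumn(aggregated)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```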

Exception messages

Exception name

Cause

Solution

InputDataNotFoundException

Input data isn't specified.

Specify Input data or draw a data flow.

InvalidInputTypeException

Input data isn't table model type.

Check whether the component specified in Input data is table model type.

InvalidPropertyConfigurationException

Value is not specified in [<property name>].

A value isn't specified in <property name>.

Specify <property name>.

InvalidPropertyConfigurationException

Value is not specified in [<property name2>] of the row [<row number>] in <property name1>.

A value isn't specified in <property name2>.

Specify <property name2>.

InvalidPropertyConfigurationException

The value specified in the row [<row number>] of <property name> is duplicated.

The value specified in <property name> is duplicated.

Check whether the value specified in <property name> is appropriate.

ConversionFailedException

Column value cannot be processed as specified aggregate target.

The input data contains values that can't be processed as specified in Aggregate target configuration.

Check whether the input data is compatible with the settings in Aggregate target configuration, and whether it contains any null values or empty strings.