Data profiles and data tests

= Remarks =
  • When profiling CSV files, column names (the first row) cannot contain a "." (dot).

  • Importing and exporting data profiles and data tests is not supported. They must be configured individually on the details screen of each asset.

  • The record count shown in the table outline is obtained during crawling, whereas the record count shown on the profile screen is obtained during profiling. Because the two counts are obtained by different operations, they may not match.
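The restriction on dots in CSV column names can be checked before a file is profiled. The following is a minimal sketch in Python, not part of the product: `columns_with_dot` is a hypothetical helper, and it assumes the first row of the file holds the column names.

```python
import csv
import io


def columns_with_dot(csv_text, delimiter=","):
    """Return the header names (first row) that contain a dot."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in header if "." in name]


sample = "id,user.name,price\n1,alice,9.5\n"
print(columns_with_dot(sample))  # ['user.name']
```

Only the header row is inspected, so dots inside data values (such as `9.5`) are not reported.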

 

Data profile

Data profiling analyzes assets that hold data, such as tables and CSV files. For CSV files, schema information is analyzed at the same time.

 

Profile items that are output

| Profile Type | Description | Remarks |
|---|---|---|
| Null% | Ratio of NULL values | |
| Distinct% | Ratio of distinct values | |
| Min | Minimum value | Only numerical values |
| Max | Maximum value | Only numerical values |
| Mean | Average value (excluding NULL values) | Only numerical values |
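To make the profile items concrete, the sketch below computes them for a single column in Python. It illustrates the definitions only and is not the product's implementation: `profile_column` is a hypothetical name, `None` stands for NULL, and the denominator used for Distinct% (all records) is an assumption, since the exact formula is not stated here.

```python
from statistics import mean


def profile_column(values):
    """Compute the five profile items for one column; None stands for NULL."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    # Min, Max and Mean are reported only when every non-NULL value is numeric.
    is_numeric = bool(non_null) and len(numeric) == len(non_null)
    return {
        "null_pct": 100.0 * (total - len(non_null)) / total if total else None,
        # Assumption: distinct non-NULL values divided by all records.
        "distinct_pct": 100.0 * len(set(non_null)) / total if total else None,
        "min": min(numeric) if is_numeric else None,
        "max": max(numeric) if is_numeric else None,
        # NULL values are excluded from the average.
        "mean": mean(numeric) if is_numeric else None,
    }


result = profile_column([10, 20, None, 20])
print(result["null_pct"], result["distinct_pct"])  # 25.0 50.0
print(result["min"], result["max"])  # 10 20
```

For a string column such as `["a", None]`, the sketch returns `None` for Min, Max, and Mean, mirroring the "Only numerical values" remark.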

= Remarks =

To profile CSV files, you must specify the delimiter and the character encoding of the files. Click [For the data profiling settings for the file, click here to configure them.] under [Execute data profiling], and then specify the CSV file format.
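The reason both settings are needed can be seen in a small Python sketch. It is independent of the product: `read_csv_bytes` is a hypothetical helper, and the semicolon delimiter and Shift_JIS encoding are example values only.

```python
import csv
import io


def read_csv_bytes(raw, delimiter, encoding):
    """Decode raw CSV bytes and split them into rows using the given settings."""
    text = raw.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


raw = "id;name\n1;東京\n".encode("shift_jis")

# Correct settings: two columns per row.
print(read_csv_bytes(raw, ";", "shift_jis"))  # [['id', 'name'], ['1', '東京']]

# Wrong delimiter: each row collapses into a single column.
print(read_csv_bytes(raw, ",", "shift_jis"))  # [['id;name'], ['1;東京']]
```

With the wrong delimiter the file still parses, but the column structure is lost, which is why the format has to be stated explicitly rather than guessed.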

 

 

Data tests

Data tests are executed against data such as tables and CSV files to check data quality.

 

Setting fields

| Field Name | Description | Remarks |
|---|---|---|
| Column name | Select columns for which to execute data tests. | Selection of columns is mandatory. |
| Data test name | Select a data test type. | Selection of a data test type is mandatory. For the selectable data tests, refer to the table "Data Test Type". |
| Description | Enter a description of the data test. | |

 

 

Data Test Type

| Data Test Type | Description | Remarks |
|---|---|---|
| isComplete() | When NULL values do not exist, the result is success. Otherwise, it is failure. | |
| isUnique() | When duplicate values do not exist, the result is success. Otherwise, it is failure. | |
| hasMin(num) | When values lower than the specified value (minimum value) do not exist, the result is success. Otherwise, it is failure. | Specification of a minimum value is mandatory. |
| hasMax(num) | When values higher than the specified value (maximum value) do not exist, the result is success. Otherwise, it is failure. | Specification of a maximum value is mandatory. |
| hasPattern(string) | When all data satisfies the specified regular expressions, the result is success. Otherwise, it is failure. | This type is available only for string-type fields. Specification of regular expressions is mandatory. |
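The pass/fail rules of the five data test types can be sketched in Python as follows. This is an illustration of the descriptions above, not the product's implementation: the function names are hypothetical, `None` stands for NULL, and two behaviors are assumptions because they are not specified here: NULL values are skipped by the min, max, and pattern checks, and a value satisfies a pattern only when the whole string matches.

```python
import re


def is_complete(values):
    """Success when no NULL (None) values exist."""
    return all(v is not None for v in values)


def is_unique(values):
    """Success when no duplicate values exist."""
    return len(values) == len(set(values))


def has_min(values, minimum):
    """Success when no value is lower than `minimum` (NULLs skipped: assumption)."""
    return all(v >= minimum for v in values if v is not None)


def has_max(values, maximum):
    """Success when no value is higher than `maximum` (NULLs skipped: assumption)."""
    return all(v <= maximum for v in values if v is not None)


def has_pattern(values, pattern):
    """Success when every string matches `pattern` in full (full match: assumption)."""
    regex = re.compile(pattern)
    return all(regex.fullmatch(v) is not None for v in values if v is not None)


ages = [25, 31, 47]
print(is_complete(ages))   # True
print(has_min(ages, 18))   # True
print(has_max(ages, 40))   # False
print(is_unique([1, 2, 2]))  # False
print(has_pattern(["A-001", "B-123"], r"[A-Z]-\d{3}"))  # True
```

Each function returns `True` for success and `False` for failure, matching the two outcomes listed in the table.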

 

Common items

 

User types and permissions

| User Type | Execution of Profiling/Data Testing | View of Profile/Data Test |
|---|---|---|
| Admin | | |
| Steward | ✓ (Only for assets for which the user is the owner) | ✓ (Only for assets that the user can view) |
| Member | | ✓ (Only for assets that the user can view) |

 

Supported resources

DB type

  • PostgreSQL

  • Oracle

  • SQL Server

  • MySQL

  • Db2

  • JDBC

Cloud storage type (CSV file format)

  • Amazon S3

  • Azure Blob Storage

  • Google Cloud Storage

 

Automatic execution of data profiling and testing

For assets that are targets of automatic execution, data profiling and data testing are executed at the scheduled time set for each resource.

The schedule for automatic execution is set on the edit resource screen.

Only one schedule can be set for each resource.

To execute data profiling and data testing automatically, select [Add <asset name> to automatic data profiling and data test targets] on the [Data quality] tab of the target asset.