Data profiles and data tests
- When profiling CSV files, column names (the first row) must not contain a "." (dot).
- Importing and exporting data profiles and data tests is not supported. Configure them individually on the details screen.
- The record count shown in the table outline is obtained during crawling, whereas the record count shown on the profile screen is obtained during profiling. Therefore, the two counts may not match.
Data profile
Data profiling analyzes assets that hold data, such as tables or CSV files. For CSV files, schema information is analyzed at the same time.
Profile items that are output
Profile Types | Description | Remarks |
---|---|---|
Null% | Ratio of NULL values | |
Distinct% | Ratio of distinct values | |
Min | Minimum value | Only numerical values |
Max | Maximum value | Only numerical values |
Mean | Average value (excluding NULL values) | Only numerical values |
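The profile items above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name `profile_column` and the result keys are hypothetical, and the sketch assumes that Null% and Distinct% are both expressed relative to the total number of rows and that NULL is not counted as a distinct value. The product may define these ratios differently.

```python
from statistics import mean


def profile_column(values):
    """Compute illustrative profile items for one column.

    `values` is a list in which None stands for NULL.
    Assumption: both ratios use the total row count as the denominator.
    """
    total = len(values)
    non_null = [v for v in values if v is not None]
    # Min, Max, and Mean are reported only when every non-NULL value is numeric.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    all_numeric = bool(non_null) and len(numeric) == len(non_null)

    return {
        "null_pct": 100.0 * (total - len(non_null)) / total if total else None,
        "distinct_pct": 100.0 * len(set(non_null)) / total if total else None,
        "min": min(numeric) if all_numeric else None,
        "max": max(numeric) if all_numeric else None,
        # NULL values are excluded from the average, as in the table above.
        "mean": mean(numeric) if all_numeric else None,
    }


result = profile_column([10, 20, 20, None])
```

Under these assumptions, the sample column of four rows (one NULL, two distinct values) gives a Null% of 25.0 and a Distinct% of 50.0, and the mean is taken over the three non-NULL values only.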
To profile CSV files, you must specify the delimiter and the character encoding of the files. Under [Execute data profiling], click [For the data profiling settings for the file, click here to configure them.] and specify the format of the CSV files.
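Before profiling, it can help to check a CSV file locally against the constraints described on this page. The following Python sketch is one possible pre-check, not a feature of the product: the function name `columns_with_dot` is hypothetical, and the delimiter and encoding values are whatever you intend to specify in the profiling settings.

```python
import csv
import io


def columns_with_dot(raw_bytes, delimiter=",", encoding="utf-8"):
    """Return the header names (first row) that contain a dot."""
    # Decode with the same character encoding you plan to specify.
    text = raw_bytes.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader, [])
    return [name for name in header if "." in name]


# Hypothetical sample: a semicolon-delimited UTF-8 file.
sample = "id;user.name;amount\n1;alice;10\n".encode("utf-8")
bad = columns_with_dot(sample, delimiter=";", encoding="utf-8")
```

Here `bad` contains `user.name`, so that column would need to be renamed before the file is profiled.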
Data tests
Data tests are executed for data such as tables or CSV files to check data quality.
Setting fields
Field Name | Description | Remarks |
---|---|---|
Column name | Select columns for which to execute data tests. | Selection of columns is mandatory. |
Data test name | Select a data test type. | Selection of a data test type is mandatory. For the selectable data tests, refer to the table "Data Test Type". |
Description | Enter a description of the data test. | |
Data Test Type
Data Test Type | Description | Remarks |
---|---|---|
isComplete() | When NULL values do not exist, the result is success. Otherwise, it is failure. | |
isUnique() | When duplicate values do not exist, the result is success. Otherwise, it is failure. | |
hasMin(num) | When values lower than the specified value (minimum value) do not exist, the result is success. Otherwise, it is failure. | Specification of a minimum value is mandatory. |
hasMax(num) | When values higher than the specified value (maximum value) do not exist, the result is success. Otherwise, it is failure. | Specification of a maximum value is mandatory. |
hasPattern(string) | When all data satisfies the specified regular expressions, the result is success. Otherwise, it is failure. This type is available only for string-type fields. | Specification of regular expressions is mandatory. |
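The success conditions in the table can be expressed as simple predicates. The Python functions below are hypothetical analogs written for illustration and are not the product's API. They rest on assumptions the table does not state: NULL values are ignored by every test except the completeness check, and the regular expression must match the entire value.

```python
import re


def is_complete(values):
    """Success when no NULL (None) value exists."""
    return all(v is not None for v in values)


def is_unique(values):
    """Success when no duplicate exists (assumption: NULLs are ignored)."""
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))


def has_min(values, minimum):
    """Success when no value is lower than `minimum`."""
    return all(v >= minimum for v in values if v is not None)


def has_max(values, maximum):
    """Success when no value is higher than `maximum`."""
    return all(v <= maximum for v in values if v is not None)


def has_pattern(values, pattern):
    """Success when every string matches `pattern` (assumption: full match)."""
    regex = re.compile(pattern)
    return all(regex.fullmatch(v) is not None for v in values if v is not None)


ages = [18, 42, None, 42]
codes = ["A-001", "A-002"]
```

With these samples, `is_complete(ages)` and `is_unique(ages)` fail because of the NULL and the repeated 42, `has_min(ages, 18)` succeeds, `has_max(ages, 40)` fails, and `has_pattern(codes, r"A-\d{3}")` succeeds.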
Common items
User types and permissions
User Type | Execution of Profiling/Data Testing | View of Profile/Data Test |
---|---|---|
Admin | ✓ | ✓ |
Steward | ✓ (Only for assets for which the user is the owner) | ✓ (Only for assets that the user can view) |
Member | | ✓ (Only for assets that the user can view) |
Supported resources
DB type
- PostgreSQL
- Oracle
- SQL Server
- MySQL
- Db2
- JDBC
Cloud storage type (CSV file format)
- Amazon S3
- Azure Blob Storage
- Google Cloud Storage
Automatic execution of data profiling and testing
For assets that are targets of automatic execution, data profiling and data testing are executed at the scheduled time set for each resource.
The schedule for automatic execution is set on the edit resource screen.
Only one schedule can be set for each resource.
To execute data profiling and data testing automatically, select [Add <asset name> to automatic data profiling and data test targets] on the [Data quality] tab of the target asset.