Saturday January 20th 2018




Data Quality and Data Profiling

A Glossary of Terms
by Brian Marshall

* Data Quality
* Data Quality Domain
* Data Quality Rule
* Data Profiling
* Data Quality Profiling
* Database Profiling

Data Quality is a measure of the value of data in relation to
potential data problems:

* missing data records or data values
* incorrect or outdated data values
* inconsistent abbreviations and spelling
* inconsistent data formats and units of measure
* inconsistent data organization
* duplicate data
* violations of data-rules or business-rules

Data quality can be analyzed in relation to a particular
Data Quality Domain, making it possible to determine the
importance of different data problems.

A Data Quality Domain is an application or use of data that
imposes a set of Data Quality Rules, each of which is associated
with a degree-of-importance for the domain.
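
The definition above can be pictured as a small data structure. The following is only a sketch: the class names, the integer importance scale, and the "payroll" and "mailing_list" domains are assumptions made for illustration and do not come from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataQualityRule:
    """A named rule; the names and descriptions here are illustrative."""
    name: str
    description: str


@dataclass
class DataQualityDomain:
    """An application or use of data, with a degree-of-importance per rule."""
    name: str
    rules: dict[str, DataQualityRule] = field(default_factory=dict)
    # Assumed scale: a higher number means a more important rule.
    importance: dict[str, int] = field(default_factory=dict)

    def add_rule(self, rule: DataQualityRule, importance: int) -> None:
        self.rules[rule.name] = rule
        self.importance[rule.name] = importance

    def rules_by_importance(self) -> list[DataQualityRule]:
        """Return the domain's rules, most important first."""
        return sorted(self.rules.values(),
                      key=lambda r: self.importance[r.name],
                      reverse=True)


name_set = DataQualityRule("name_set", "EMPLOYEE_NAME must be set")
single_spaces = DataQualityRule("single_spaces", "no consecutive spaces")

# The same rules can carry different importance in different domains.
payroll = DataQualityDomain("payroll")
payroll.add_rule(name_set, 3)
payroll.add_rule(single_spaces, 1)

mailing = DataQualityDomain("mailing_list")
mailing.add_rule(name_set, 1)
mailing.add_rule(single_spaces, 2)
```

Keeping the importance in the domain rather than in the rule lets one rule be reused across domains with different weights.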

A Data Quality Rule is a specification of one or more data
quality problems which should not exist in a set of data.

For example, a data quality rule might specify that in the
EMPLOYEE table, EMPLOYEE_NAME must be set and
that it must contain only letters and spaces. Another rule
might specify that EMPLOYEE_NAME should not contain
multiple consecutive spaces. These two rules might be specified
separately so that, in a particular Data Quality Domain,
they can be assigned different degrees-of-importance.
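
The two EMPLOYEE_NAME rules above could be written as separate checks, roughly as follows. This is a minimal sketch: the function names are hypothetical, "letters" is assumed to mean anything Python's str.isalpha() accepts, and an unset name is assumed to be None or an empty string.

```python
def name_is_set_and_alphabetic(name):
    """Rule 1: EMPLOYEE_NAME must be set and contain only letters and spaces."""
    return bool(name) and all(ch.isalpha() or ch == " " for ch in name)


def name_has_no_repeated_spaces(name):
    """Rule 2: EMPLOYEE_NAME should not contain multiple consecutive spaces.

    An unset name is left to rule 1, so it is not reported again here.
    """
    return not name or "  " not in name


# Hypothetical EMPLOYEE_NAME values used only to exercise the checks.
samples = ["Ann Lee", "Ann  Lee", "R2 D2", None]
results = [(name_is_set_and_alphabetic(n), name_has_no_repeated_spaces(n))
           for n in samples]
```

Because the checks are independent functions, a Data Quality Domain can weight a missing name differently from a name with doubled spaces.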

Data Profiling can refer to either Data Quality Profiling or Database Profiling.

Data Quality Profiling is the process of analyzing a database
in relation to a Data Quality Domain, to identify and prioritize
data quality problems. The results can include:

* Summaries (with counts and percentages) describing…
    o completeness of datasets and data records
    o problems organized by importance
    o the distribution of problems in a dataset
* Details – lists of…
    o missing data records
    o data problems in existing records
Data quality profiling can be useful when planning and managing
data cleanup projects.
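
A summary and a detail list of the kind described above might be produced as in the sketch below. The rule names, importance values, and sample rows are invented for illustration; only the EMPLOYEE_NAME column comes from the earlier example.

```python
from collections import Counter

# Hypothetical rules for one domain: (rule name, importance, check function).
# A check returns True when the row satisfies the rule.
RULES = [
    ("name_set", 3,
     lambda row: bool(row.get("EMPLOYEE_NAME"))),
    ("name_single_spaces", 1,
     lambda row: "  " not in (row.get("EMPLOYEE_NAME") or "")),
]


def profile(rows, rules):
    """Return (summary, details) for rows checked against rules."""
    details = []        # one (row index, rule name) pair per problem found
    counts = Counter()  # number of problems per rule
    for index, row in enumerate(rows):
        for name, _importance, check in rules:
            if not check(row):
                details.append((index, name))
                counts[name] += 1
    total = len(rows)
    # Summary lines are ordered by importance, most important first.
    summary = [
        {"rule": name,
         "importance": importance,
         "count": counts[name],
         "percent": round(100.0 * counts[name] / total, 1) if total else 0.0}
        for name, importance, _check in
        sorted(rules, key=lambda r: r[1], reverse=True)
    ]
    return summary, details


rows = [
    {"EMPLOYEE_NAME": "Ann Lee"},
    {"EMPLOYEE_NAME": None},
    {"EMPLOYEE_NAME": "Bob  Ray"},
    {"EMPLOYEE_NAME": ""},
]
summary, details = profile(rows, RULES)
```

The summary gives the counts and percentages per rule, while the details identify the individual records a cleanup project would need to visit.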

Database Profiling is the process of analyzing a database to determine
its structure and internal relationships:

* the tables used, their keys and number of rows
* the columns used and the number of rows with a value
* relationships between tables
* columns copied or derived from other columns

Database Profiling can also include analysis of:

* tables and columns used by different applications
* how tables and columns are populated and changed
* the importance of different tables and columns

Database profiling can be useful when planning and managing
data conversion and data cleanup projects.

Database profiling can be an initial step in defining a
Data Quality Domain, which is used in Data Quality Profiling.
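
The structural part of database profiling (tables, keys, row counts, populated columns, and declared relationships) can be sketched against SQLite using Python's standard sqlite3 module. The DEPARTMENT table and the ID columns are hypothetical, and the sketch only finds relationships declared as foreign keys; detecting copied or derived columns would need further analysis.

```python
import sqlite3


def quote(identifier):
    """Quote an SQL identifier for safe use in a statement."""
    return '"' + identifier.replace('"', '""') + '"'


def profile_database(conn):
    """Return per-table row counts, keys, non-null counts and foreign keys."""
    result = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        t = quote(table)
        row_count = conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
        columns, keys = {}, []
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, col, _type, _notnull, _default, pk in conn.execute(
                f"PRAGMA table_info({t})").fetchall():
            # COUNT(column) counts only rows where the column is not NULL.
            columns[col] = conn.execute(
                f"SELECT COUNT({quote(col)}) FROM {t}").fetchone()[0]
            if pk:
                keys.append(col)
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        relationships = [(fk[3], fk[2], fk[4]) for fk in conn.execute(
            f"PRAGMA foreign_key_list({t})").fetchall()]
        result[table] = {"rows": row_count, "keys": keys,
                         "columns": columns, "relationships": relationships}
    return result


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DEPT_ID INTEGER PRIMARY KEY, DEPT_NAME TEXT);
    CREATE TABLE EMPLOYEE (
        EMPLOYEE_ID INTEGER PRIMARY KEY,
        EMPLOYEE_NAME TEXT,
        DEPT_ID INTEGER REFERENCES DEPARTMENT (DEPT_ID)
    );
    INSERT INTO DEPARTMENT VALUES (1, 'Sales');
    INSERT INTO EMPLOYEE VALUES (1, 'Ann Lee', 1);
    INSERT INTO EMPLOYEE VALUES (2, NULL, 1);
    INSERT INTO EMPLOYEE VALUES (3, 'Bob Ray', NULL);
""")
report = profile_database(conn)
```

Columns whose non-null count falls well below the table's row count are natural candidates for completeness rules in a Data Quality Domain.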
