Data profiles produced by analytical data profiling techniques include:
* Distinct lengths of string values in a column and the percentage of rows in the table that each length represents. Example: Profile of a column of US State codes, which should be two characters, shows values longer than 2 characters.
* Percentage of null values in a column. Example: Profile of a Zip Code/Postal code column shows a high percentage of missing codes.
* Percentage of rows that match each pattern (regular expression) occurring in a column. Example: A pattern profile of a phone number column shows numbers entered in three different formats: (919)674-9999, [919]6749988, and 9199018888.
* Minimum, maximum, average, and standard deviation for numeric columns; and minimum and maximum for date/time columns. Example: Profile for an Employee birthdate column shows the maximum value is in the future.
* Distinct values in a column and the percentage of rows in the table that each value represents. Example: A profile of a U.S. state column shows more than 50 distinct values.
* Candidate key column for a selected table. Example: Profile shows duplicate values in a potential key column.
* Dependency of values in one column on values in another column or columns. Example: Profile shows that two or more values in the State field share the same value in the Zip Code field.
* Value inclusion between two or more columns. Example: Some values in the ProductID column of a Sales table have no corresponding value in the ProductID column of the Products table.
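Several of the profiles listed above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names, the sample values, the choice to treat empty strings as nulls, the digit-to-9 and letter-to-A pattern notation, and the use of the population standard deviation are all assumptions made for the example.

```python
import re
import statistics
from collections import Counter

def percent_distribution(items):
    """Map each distinct item to the percentage of items it represents."""
    counts = Counter(items)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}

def null_percentage(values):
    """Percentage of entries that are None or empty strings (assumed null markers)."""
    missing = sum(1 for v in values if v is None or v == "")
    return 100.0 * missing / len(values)

def pattern_of(value):
    """Reduce a string to a format pattern: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def numeric_summary(values):
    """Minimum, maximum, mean, and population standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
    }

def is_candidate_key(values):
    """A column qualifies only if it has no nulls and no duplicate values."""
    return None not in values and len(set(values)) == len(values)

def missing_references(child_values, parent_values):
    """Values in the child column with no match in the parent column."""
    return set(child_values) - set(parent_values)

# Hypothetical sample data for illustration only.
states = ["NC", "VA", "N.C.", "NC", None]
phones = ["(919)674-9999", "[919]6749988", "9199018888"]

length_profile = percent_distribution(len(s) for s in states if s is not None)
value_profile = percent_distribution(s for s in states if s is not None)
phone_patterns = percent_distribution(pattern_of(p) for p in phones)
orphans = missing_references(["P1", "P9"], ["P1", "P2"])
```

Here `length_profile` reveals state codes longer than two characters, `phone_patterns` contains one entry per distinct phone format, and `orphans` holds the child values that have no parent row.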
Key steps undertaken during data profiling
The following are some of the key steps generally taken during the data profiling process:
1. Using analytical and statistical tools to outline the quality of data structure and data organization by determining the frequencies and ranges of key data elements within data sources.
2. Applying numerical analysis techniques to determine the scope of numeric data within data sources.
3. Identifying multiple coding schemes and different spellings used in the data content.
4. Identifying data patterns and data formats, and noting the variation in the data types and data formats used within data sources.
5. Identifying duplication in the data content, such as in names, addresses, or other pertinent information.
6. Deciphering and validating redundant data within the data sources.
7. Noting primary and foreign key relationships and studying their impact on data organization and data retrieval.
8. Running validation trials by applying specific business rules to data records across the data sources.
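Three of the steps above (detecting variant spellings, detecting duplicate records, and validating business rules) can be sketched as follows. This is a rough illustration under stated assumptions: the record layout, the field names, the two rules, and the normalization scheme (lowercasing and dropping everything except letters and digits) are hypothetical choices, and real data would typically need more careful matching.

```python
from collections import defaultdict

def normalize(text):
    """Lowercase and keep only letters and digits so spelling variants collide."""
    return "".join(ch for ch in text.lower() if ch.isalnum())

def spelling_variants(values):
    """Group raw values by normalized form; keep groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[normalize(value)].add(value)
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}

def duplicate_records(records, fields):
    """Return lists of record indexes whose normalized values match on the given fields."""
    seen = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(normalize(record[field]) for field in fields)
        seen[key].append(index)
    return [indexes for indexes in seen.values() if len(indexes) > 1]

def rule_violations(records, rules):
    """Apply each named rule to each record; return (record index, rule name) per failure."""
    return [
        (index, name)
        for index, record in enumerate(records)
        for name, check in rules.items()
        if not check(record)
    ]

# Hypothetical records and business rules for illustration only.
customers = [
    {"name": "Ann Lee", "address": "12 Oak St.", "state": "NC", "age": 34},
    {"name": "ann lee", "address": "12 Oak St", "state": "N.C.", "age": 34},
    {"name": "Bob Ray", "address": "9 Elm Rd", "state": "VA", "age": -5},
]

rules = {
    "age_is_non_negative": lambda r: r["age"] >= 0,
    "state_is_two_letters": lambda r: len(r["state"]) == 2 and r["state"].isalpha(),
}

variants = spelling_variants(r["state"] for r in customers)
duplicates = duplicate_records(customers, ["name", "address"])
violations = rule_violations(customers, rules)
```

Expressing the rules as named predicates keeps each business rule separate from the loop that applies it, so rules can be added or removed without touching the validation code.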







