Sunday January 21st 2018

Traditional Approaches Of Data Cleansing

This post is about "traditional approaches," a title we found on MSDN for data cleansing. We are including it on our website because we use various methods for data cleansing ourselves. For experienced professionals there is no hard and fast rule, because the right method depends on the requirements of the task at hand. The "traditional approach" title is used on datamart.org to identify which approach was used in data cleansing. Enjoy the post.

Any number of techniques and tools can be employed to handle these kinds of situations. Specialized Structured Query Language constructs such as the T-SQL LIKE and CONTAINS predicates can be used for basic wildcard searches. But LIKE queries are limited in their ability to handle misspellings, and CONTAINS queries work only in conjunction with SQL Server full-text indexing.
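
To make the LIKE limitation concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so it only approximates T-SQL behavior (SQLite's LIKE supports the % and _ wildcards but not T-SQL's bracket character classes). The customers table, its city column, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: one correct spelling and two misspelled variants.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (city TEXT)")
conn.executemany(
    "INSERT INTO customers (city) VALUES (?)",
    [("Seattle",), ("Seatle",), ("Saettle",), ("Portland",)],
)


def like_search(pattern):
    """Return the cities matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT city FROM customers WHERE city LIKE ? ORDER BY city",
        (pattern,),
    )
    return [city for (city,) in rows]


# A prefix wildcard finds "Seatle" only because it happens to share the prefix;
# the transposed spelling "Saettle" is missed.
prefix_hits = like_search("Seat%")

# A looser pattern finds all three variants, but it had to be guessed by hand
# and would also match unrelated values that start with S and end in "tle".
loose_hits = like_search("S%tle")
```

The pattern has to anticipate where the misspelling occurs, which is why wildcards alone do not scale to arbitrary typing errors.
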
Fuzzy search databases can be built up that compile common misspellings (or variants) of specific words, and the correct spellings can then be substituted during the cleansing process. This technique works best for applications that check one word at a time, like Microsoft® Word, which employs a similar technique for making spelling corrections on the fly.
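
The substitution idea can be sketched in a few lines. This is only an illustration under our own assumptions: the MISSPELLINGS table and the function names are hypothetical, and a real fuzzy search database would be far larger and would usually live in a database table rather than in code.

```python
# Hypothetical lookup table: known misspellings mapped to the canonical spelling.
MISSPELLINGS = {
    "seatle": "Seattle",
    "saettle": "Seattle",
    "protland": "Portland",
}


def cleanse_word(word):
    """Replace a known misspelling with its canonical form; leave other words unchanged."""
    return MISSPELLINGS.get(word.lower(), word)


def cleanse_text(text):
    """Cleanse whitespace-separated words one at a time."""
    return " ".join(cleanse_word(word) for word in text.split())


cleaned = cleanse_text("Saettle to Protland")
```

Because each word is looked up on its own, the approach only corrects variants that someone has already recorded. A misspelling that is not in the table passes through untouched.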

Phonetic matching algorithms, implemented in SQL Server as the SOUNDEX function, also detect similarities between single words: they encode each word's prominent phonetic characteristics so that the results can be scored numerically for comparison. The key drawbacks of SOUNDEX are that the input string must be contiguous, with no spaces, and that if the first character of the input string is incorrect, the probability of a match drops to zero, because the first letter of the input is carried over unchanged into the code.
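
For readers who want to see how such a code is built, here is a minimal sketch of the classic American Soundex encoding in Python. It is written for single words and is our own illustration, not SQL Server's implementation, whose output may differ in edge cases. The function name soundex and the sample names are our own choices.

```python
# Consonant groups of classic Soundex; vowels, Y, H and W carry no digit.
CODES = {}
for letters, digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code of a single word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")  # digit of the previous coded letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                  # H and W do not separate equal digits
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit             # skip repeats of the same digit
        prev = digit                  # a vowel resets prev to ""
    return (code + "000")[:4]         # pad with zeros, keep four characters


same_sound = soundex("Smith") == soundex("Smyth")               # both "S530"
first_letter_miss = soundex("Catherine") == soundex("Katherine")
```

In the last line the two names sound identical, yet their codes are C365 and K365, so the comparison is False. This is the first-character drawback described above.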