First things first. As you may (or may not) know, Gartner uses a fairly comprehensive definition of MDM for customer data:
A combination of technology, processes and services to deliver an accurate, timely and complete view of the customer across multiple channels, lines of business, departments and divisions drawing customer data from multiple sources and systems.
I like the definition, but I can only reach one conclusion: if you really want to deliver this unique customer view across a multitude of channels and sources, you have to be sure that your matching engine delivers the right data quality.
If we talk about matching, there are generally two approaches: deterministic and probabilistic matching.
Deterministic matching is usually knowledge- and rule-based: it uses, for example, phonetic rules and algorithms for the recognition of acronyms, whereas
probabilistic matching takes a more mathematical approach, in which calculations and algorithms are used to determine the degree of similarity between records.
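To make the distinction tangible, here is a minimal sketch in Python of both ideas: a deterministic rule that normalises known legal-form variants and compares the result exactly, and a probabilistic score based on plain sequence similarity. The helper names, word lists and thresholds are illustrative assumptions, not the API of any particular matching engine.

```python
import re
from difflib import SequenceMatcher

# Knowledge component for the deterministic rule: map known legal-form variants to one token.
LEGAL_FORMS = {"limited": "ltd"}

def normalise(name: str) -> str:
    # Lower-case, strip punctuation and collapse whitespace before comparing.
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(LEGAL_FORMS.get(t, t) for t in tokens)

def deterministic_match(a: str, b: str) -> bool:
    # Deterministic: match only if the rule-normalised strings are identical.
    return normalise(a) == normalise(b)

def probabilistic_score(a: str, b: str) -> float:
    # Probabilistic: a similarity score between 0 and 1 (here a simple sequence ratio).
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

print(deterministic_match("Jack London Ltd", "Jack London Limited"))            # True
print(round(probabilistic_score("Jack London Ltd", "Thompson London Ltd"), 2))  # 0.65: quite similar, although the companies differ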
Look at the example “Jack London Ltd” and “Thompson London Ltd”: the pattern looks the same, and a probabilistic method would probably treat it that way. London is a city, and there is a high probability that this interpretation is correct.
However, if we combine the two methods, we see that Jack is most likely a given name, which changes the meaning of the word London: it has become a surname, and we now see a different pattern.
To deliver the best result, matching engines must combine both approaches. The better a matching engine is able to determine what is what in a particular context, the better its probability calculation of a certain match or a certain non-match.
That is actually EXACTLY the way we humans would do this…
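As a sketch of what “combining both approaches” could look like, here is a small, purely illustrative Python function that uses a tiny knowledge base of given names to let context change how a token such as “London” is interpreted. The word lists and labels are invented for the example.

```python
# Tiny knowledge base (illustrative): the deterministic component of the engine.
GIVEN_NAMES = {"jack", "john", "anna"}
CITIES = {"london", "berlin", "paris"}

def classify_tokens(name: str) -> list:
    """Assign a rough semantic label to each token, letting context change the label."""
    labels, prev = [], None
    for token in name.lower().replace(".", "").split():
        if token in GIVEN_NAMES:
            label = "given_name"
        elif prev == "given_name":
            label = "surname"            # "London" after "Jack" is read as a surname
        elif token in CITIES:
            label = "city"
        else:
            label = "word"
        labels.append((token, label))
        prev = label
    return labels

print(classify_tokens("Jack London Ltd"))
# [('jack', 'given_name'), ('london', 'surname'), ('ltd', 'word')]
print(classify_tokens("Thompson London Ltd"))
# [('thompson', 'word'), ('london', 'city'), ('ltd', 'word')]
```

A probabilistic scorer can then restrict itself to tokens that carry the same label, so the two occurrences of “London” above no longer count as evidence for a match.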
As matching is used as a sort of umbrella term, we tend to think of it as a "one size fits all" process. But look at, for example, comparing records, performing duplicate detection and merging data into a golden record: matching intelligently is not a simple task. Here's why:
- There are many different data formats, which sometimes calls for pre-processing just to make the formats comparable (see the normalisation sketch after this list).
- Then there is the volatility of customer data, which changes quite quickly. In the UK, for example, around 7% of people move house every year; in Germany, the figure is even higher, at 10%.
- Wherever data is processed, human errors are made. We all know it, and it is a fact we should take into account when we process the data ourselves…
- The ever-growing internationalisation of business leads to more linguistic diversity and other cultural matching problems: a truck and a lorry are the same thing, but an American football and a European football are not…
- And of course, there are always specific requirements when it comes to matching data. Think, for example, of the degree of precision needed when matching against sanction lists in an anti-money-laundering context.
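To illustrate the pre-processing point from the first bullet above: a minimal sketch, with invented field names and a handful of common date and phone layouts, of bringing records from two sources onto one comparable shape before any matching takes place.

```python
from datetime import datetime

def normalise_phone(raw: str) -> str:
    # Keep digits only and drop a leading international "00" prefix (simplified rule).
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits[2:] if digits.startswith("00") else digits

def to_canonical(record: dict) -> dict:
    """Map a source record onto one comparable shape (field names are invented)."""
    name = " ".join(record.get("name", "").split()).title()
    dob = None
    for fmt in ("%d.%m.%Y", "%Y-%m-%d", "%d/%m/%Y"):          # a few common date layouts
        try:
            dob = datetime.strptime(record.get("dob", ""), fmt).date().isoformat()
            break
        except ValueError:
            continue
    return {"name": name, "phone": normalise_phone(record.get("phone", "")), "dob": dob}

crm  = {"name": " anna  müller ", "phone": "+49 (30) 123-456", "dob": "01.02.1985"}
shop = {"name": "Anna Müller",    "phone": "0049 30 123456",   "dob": "1985-02-01"}
print(to_canonical(crm) == to_canonical(shop))   # True: the records are now directly comparable
```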
So how do you get to high-precision matching? A few recommendations:
• Test with a lot of real data
• Profile data sources in order to assess the accuracy and completeness of the data (see the profiling sketch below)
• Compare and benchmark matching results
• Combine deterministic and probabilistic approaches
• Consider the interface for data stewardship (doubtful matches)
• Consider current AND future requirements
• Think across borders
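And to pick up the profiling recommendation: a minimal sketch, again with invented field names and toy data, of measuring per-field completeness so you know which attributes a matching engine can actually rely on.

```python
from collections import Counter

def completeness(records: list, fields: list) -> dict:
    """Share of records with a non-empty value per field (a simple profiling metric)."""
    filled = Counter()
    for rec in records:
        for field in fields:
            if str(rec.get(field, "") or "").strip():
                filled[field] += 1
    total = len(records) or 1
    return {field: round(filled[field] / total, 2) for field in fields}

sample = [
    {"name": "Anna Müller", "email": "anna@example.com", "phone": ""},
    {"name": "Jack London", "email": "", "phone": "4930123456"},
]
print(completeness(sample, ["name", "email", "phone"]))
# {'name': 1.0, 'email': 0.5, 'phone': 0.5}
```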
If you want to learn more, please download our white paper on High Precision Matching. Enjoy the read!