On the Origin of Data

Digital business & eID

15/6/2015


 
Erik van de Poel of BKR presenting during the workshop
Last week I hosted a workshop during the conference "Digital business & eID" in the Netherlands. The theme of the workshop was data quality and fraud prevention, and three speakers and I each gave a short presentation, ending with a statement or question to initiate discussion.
My discussion topic, for example, was: "Without a single customer view, digital business will fail to be successful......" As you can imagine, this led to some lively discussion about the fact that an electronic identity is only of use if there is a way to establish a reference to the original customer data. How do you actually take care of unique identification within the concept of eID?
Customer data is the new oil, and everybody believes in the added value of sound data management; but how do you actually gain a real advantage in digital business?

My co-presenters covered a range of topics within the general theme of the workshop. Martijn de Boer of CDDN talked about a new approach to increasing data quality. Erik van de Poel of BKR (the Dutch foundation for consumer credit checks) presented his organisation's policy for the prevention of identity fraud. Helen van der Sluys of the Dutch Ministry of Internal Affairs showed the participants the consequences of identity fraud and identity theft. She illustrated her story with this movie of an "amazing" mind reader in the heart of Brussels. Check it out ......

The Rosetta Stone revisited

19/5/2015


 
Computers deal with numbers. Basically, they store letters and other characters by assigning a number to each one. To represent all these different letters and characters, hundreds of different encoding systems for assigning these numbers have been created. Unfortunately, no single legacy encoding contains enough characters. Furthermore, some of these encoding systems conflict with each other: two encodings may use the same number for two different characters, or different numbers for the same character.
With the advent of the Unicode standard, this all seems to be in the past. Unicode provides a unique number (a code point) for every character and expands the number of characters that can be encoded by using more than one byte per character where needed. Its encoding forms (UTF-8, UTF-16 and UTF-32) enable us to store and exchange well over 95,000 characters, including the Latin, Greek and Cyrillic alphabets, the Hebrew and Arabic scripts, Japanese Hiragana, Chinese ideograms and the Kangxi radicals.
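To make the encoding problem tangible, here is a small sketch using only the Python standard library (an illustration, not part of any product): the same byte value means different characters in two legacy encodings, while Unicode gives every character exactly one code point.

# One byte, two legacy encodings, two different characters - while Unicode
# assigns each character a single code point, independent of the byte encoding.
raw = bytes([0xE4])
print(raw.decode("latin-1"))      # 'ä' in ISO 8859-1 (Western European)
print(raw.decode("iso-8859-7"))   # 'δ' in ISO 8859-7 (Greek)

# Every character has its own Unicode code point and a well-defined UTF-8 byte sequence.
for ch in "Aäδא漢":
    print(ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex())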

In complex international business environments, however, where customer data integration, master data management, compliance with laws and regulations and operational excellence play an important role, Unicode is nothing more than a commodity. The real challenge in processing multilingual data lies in the application of robust transliteration and transcription, normalization and intelligent comparison methods. Comparing characters from different writing systems does not only have historical value. The discovery of the Rosetta Stone shows the importance of transliteration avant la lettre. The stone, created in 196 B.C. and carrying the same passage in three different scripts, gave historians the key to two previously undecipherable Egyptian scripts (hieroglyphic and Demotic) by determining the level of similarity with the known classical Greek text.
Comparison of data recorded in different writing systems poses many challenges. Naturally, a Unicode-enabled environment is necessary to represent the data. But this is not where the real difficulties lie. Assessing the degree of similarity between records in Latin script and records in non-Latin script involves a much higher degree of complexity.
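As a very small illustration of what normalization and comparison across scripts can look like, here is a sketch built on the Python standard library only. The tiny Greek-to-Latin table is a made-up assumption and nowhere near a complete transliteration scheme - real transliteration and transcription are considerably more involved.

import unicodedata
from difflib import SequenceMatcher

# Minimal, illustrative transliteration table (hypothetical and far from complete).
GREEK_TO_LATIN = {"σ": "s", "ς": "s", "ω": "o", "κ": "k", "ρ": "r",
                  "α": "a", "τ": "t", "η": "e"}

def normalize(text: str) -> str:
    """Casefold, decompose (NFKD), drop combining marks, transliterate known letters."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(GREEK_TO_LATIN.get(c, c) for c in stripped)

def similarity(a: str, b: str) -> float:
    """Compare the normalized forms instead of the raw strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

print(similarity("Müller", "Mueller"))      # high, but not 1.0: 'u' versus 'ue'
print(similarity("Σωκράτης", "Sokrates"))   # 1.0 once both sides are reduced to one script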
Interested? Get in touch with me and I'll send you a factsheet on the processing of non-Latin characters in an international business environment....





 


Synergy - more than just working together.... Introducing the Data Academy

28/4/2015


 
Oftentimes, when companies are acquired by other companies, there is a lot of talk about synergy. If you look up the word "synergy" in, for example, Wikipedia, you will find the following definition: synergy is the creation of a whole that is greater than the simple sum of its parts. The term originally comes from the Greek "synergeia", which means "working together".
In my experience, most organisations consisting of two or more companies think that the original Greek meaning covers the phenomenon: "As long as we work together, we will eventually achieve synergy.....". And that is a huge mistake!
The creation of a whole that is greater than the simple sum of its parts, requires more than just working together. 
Neopost (our mother company) acquired Satori Software, Human Inference and DMTI Spatial in order to fulfil its strategic rationale: supporting customers in their digital communication needs. Satori, Human Inference and DMTI represent the data quality and data management division of Neopost. And of course ...., we are aiming for synergy. It is one of our strategic objectives.
When I started thinking about this, it dawned on me that we would need an integrated approach to gathering and sharing knowledge, and that we should apply this approach to create visibility and thought leadership. So I created a program called the Data Academy.
The Academy is the key to unlock our knowledge resources (hence the key symbol in this blog....) and to cross-pollinate them with our customers and partners.
In the weeks and months to come, I will give you regular updates on activities and deliverables of the Data Academy. 
Stay tuned!




The acronym is NOT the solution

8/4/2015


 
Having worked in the data and information quality industry for quite some years now, I've noticed that our industry feels an urgent need for new acronyms every couple of years. Here's a small selection: CRM, ERP, BI, SaaS, CDI, MDM, FTR..... Are you still with me? If so, you have probably been in this business for a substantial amount of time as well. As these acronyms mysteriously or automagically gain and lose popularity, I am now convinced that they all, more or less, serve the same purpose: they are intended to be the "theoretical foundation" for solution selling.
Organizations spend a lot of time on optimizing their production chain, their invoicing processes and the quality of their customer database(s). For this, all kinds of tools and systems are used (and the corresponding acronyms become popular... ;-) ). Some of these tools and systems are really intelligent, but all too often the actual purpose of deploying them is lost in the process.
Having a CRM, BI or MDM application in place does not mean that you have improved your customer interaction. The acronym is not the solution....

I think it eventually boils down to answering the question: How do we use our customer data to actually achieve and improve REAL customer interaction? 
In other words: How do we really personalise our customer interaction? Having all the necessary data available is only the beginning of the journey. 
We have made a really nice one-minute movie on personalised customer interaction. Check it out! It's worth it....




High precision matching at the Gartner MDM Summit

19/3/2015


 
Last week, during the Gartner MDM Summit in London, I gave a presentation on the importance of high quality matching within Master Data Management. The presentation, which was received very well, thank you ;-), triggered a lot of interesting discussions at the Neopost/Human Inference booth. Therefore, I thought it would be a good idea to write a post on my ideas with regard to combining probabilistic and deterministic matching.
Traditional matching engines are based on atomic string comparison functions, such as match codes, phonetic comparison, Levenshtein string distance and n-gram comparisons. The drawback of these functions is that it is not always clear for what purpose a particular function should be used, and that these low-level DQ functions cannot distinguish between apples and oranges – you end up comparing family names with street names.
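For readers who have not come across these atomic functions, here is a minimal textbook sketch of two of them in plain Python (not the Human Inference engine). Note that neither function knows whether it is comparing a family name or a street name - which is exactly the apples-and-oranges problem.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of the character n-gram sets of two strings."""
    def grams(s: str) -> set:
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / len(ga | gb) if (ga or gb) else 1.0

print(levenshtein("Jansen", "Janssen"))        # 1 edit apart
print(ngram_similarity("Jansen", "Janssen"))   # ~0.83 bigram overlap
print(levenshtein("Dorpsstraat", "Jansen"))    # happily compares a street with a surname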
In essence, this is the basis of the discussion on the matching approach within customer data management: intelligent, automated matching of records distributed over various heterogeneous data sources is an essential prerequisite for correct and adequate customer data integration, and there are many opinions on how to achieve it.
In the theory of data matching, two methods generally prevail where customer data management is concerned: deterministic and probabilistic matching.

• Deterministic matching uses, among other things, country- and subject-specific knowledge, linguistic rules such as phonetic conversion and comparison, and business rules and algorithms such as letter transposition or contextual acronym resolution to determine the degree of similarity between database records.
• Probabilistic matching uses statistical and mathematical algorithms, fuzzy logic and contextual frequency rules to assign a degree of similarity between database records. Here, patterns with regard to fault tolerance play an important role (the matching method is able to take into account that humans make specific errors). Probabilistic matching methods usually express the probability of a match as a percentage (a toy illustration of both approaches follows below).
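As a toy illustration of the two bullets above - made-up records, made-up weights and thresholds, and certainly not the matching engine discussed at the summit - the sketch below applies a deterministic rule and a probabilistic score to the same pair of records.

from difflib import SequenceMatcher

# Hypothetical records; field names, weights and thresholds are illustrative assumptions.
rec_a = {"first_name": "Jan", "surname": "Jansen",  "birth_date": "1970-01-02"}
rec_b = {"first_name": "J.",  "surname": "Janssen", "birth_date": "1970-01-02"}

def sim(a: str, b: str) -> float:
    """Generic string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def deterministic_match(a: dict, b: dict) -> bool:
    """Rule-based yes/no: exact birth date, compatible initial, near-identical surname."""
    return (a["birth_date"] == b["birth_date"]
            and a["first_name"][0].lower() == b["first_name"][0].lower()
            and sim(a["surname"], b["surname"]) > 0.85)

def probabilistic_score(a: dict, b: dict) -> float:
    """Weighted similarity over all fields, expressed as a percentage."""
    weights = {"first_name": 0.2, "surname": 0.4, "birth_date": 0.4}
    return round(100 * sum(w * sim(a[f], b[f]) for f, w in weights.items()), 1)

print(deterministic_match(rec_a, rec_b))   # True  -> the rule fires: "match"
print(probabilistic_score(rec_a, rec_b))   # ~85.0 -> "probable match", not a certainty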

Both methods have advantages and disadvantages, but I believe that the two methods should always be combined. The reason for this is actually quite simple: the better the matching engine is able to determine what is what in a particular context, the better the probability calculation of a certain match or non-match. This is, in essence, what humans do as well. We determine what we know and then use contextual probability and pattern recognition to assign meaning to the words we come across.

Combining deterministic and probabilistic matching will yield more precise matching, with fewer mismatches and fewer missed matches. Probabilistic matching often uses weighting schemes that consider the frequency of information to calculate a score and/or ranking. The more common a particular data element is, the lighter the weight that should be used in a comparison. That is a sound and robust approach. However, assigning weighting factors to data that have first been interpreted and enhanced with statistical information will lift the matching results to a high level of precision.
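As a small numerical illustration of the frequency idea (the frequencies below are invented for the example), a weight can be made to grow as a data element becomes rarer:

from math import log

# Assumed surname frequencies in a reference population - illustrative numbers only.
surname_frequency = {"jansen": 0.006, "wandt": 0.00002}

def frequency_weight(surname: str) -> float:
    """Rarer names carry more evidence: the weight grows as the frequency shrinks."""
    freq = surname_frequency.get(surname.lower(), 0.001)   # fallback for unknown names
    return log(1.0 / freq)

print(frequency_weight("Jansen"))   # common Dutch surname -> lower weight (~5.1)
print(frequency_weight("Wandt"))    # rare surname         -> higher weight (~10.8)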






Listen to .... the customers 

17/3/2015


 
The starting point of customer lifetime value lies in making sure that the input of data is correct, valid, complete and standardized. This guiding principle is used in traditional data quality management, but it most definitely applies to "new" ways of dealing with customer data. New channels, online contact forms, self-service portals, different customer behaviour - the changing environment adds new challenges to the art of intelligent data management.

In traditional customer contact processes, the interaction is between people, or between people and systems. In online customer contact processes, the initial interaction is between the customer and the system. This calls for a data quality strategy in which the data is captured and processed First Time Right! Whether it is to prevent pollution in your existing database, to raise the customer's confidence in your company or to guide your customer to the right product: the design of the online interaction process has a substantial impact on the quality of the customer data.
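As a small, generic illustration of capturing one field First Time Right - the pattern and the canonical form below are common conventions for Dutch postcodes, not a description of any specific product - invalid input is rejected at the point of entry and valid input is stored in one standard form:

import re

# Accept a Dutch postcode (four digits plus two letters) with flexible spacing.
POSTCODE = re.compile(r"^\s*(\d{4})\s*([A-Za-z]{2})\s*$")

def capture_postcode(raw: str) -> str:
    """Reject invalid input immediately and return one canonical form ('1234 AB')."""
    match = POSTCODE.match(raw)
    if not match:
        raise ValueError(f"Invalid postcode: {raw!r} - please check your input")
    return f"{match.group(1)} {match.group(2).upper()}"

print(capture_postcode("6811ab"))      # -> '6811 AB'
print(capture_postcode("6811  AB "))   # -> '6811 AB'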

How do you "guide" your (potential) customer to enter the right data in a self-service portal or in a web contact form? How do you make sure that you check online loan applications correctly? How do you identify the returning customer? All these and many more questions add an extra dimension and another degree of complexity to the management of online customer interaction.

I think that it eventually boils down to listening to the customers – even if the customer's first point of contact is with the system. See for yourself how the intelligent management of your online, multi-channel customer interaction will have an increasingly positive effect on the total data quality in your organisation.

 


    Author

    Holger Wandt is Director Thought Leadership & Education at Neopost/Human Inference. 
He joined Human Inference in 1991. As a linguist, he was one of the pioneers of the interpretation and matching technology in the Human Inference product suite. In his current position he is responsible for streamlining the efforts within the different knowledge areas of Neopost, for conveying the company's vision to current and future customers and partners, and for promoting ideas and vision to industry boards, thought communities, universities and analyst firms.
    His career is testimony to his achievements in the field of language, data quality management and master data management.

