Coding Names by Sound

by James J. Czuchra

When I began indexing parish records in the early 1980s, I kept running across surnames that were spelled slightly differently from how my family members spelled theirs. Yet when I said the names aloud, they sounded alike. Indeed, research showed that these alternatively spelled surnames belonged to my family. While doing census research at a National Archives branch, I learned of the Soundex code and how it groups together similar-sounding names even if they are spelled differently. Many of my alternatively spelled surnames yielded the same Soundex code, precisely as they should. Working with the Soundex code was a must for using the census indexes, which are based on Soundex. Indexing by sound was a good strategy because the census taker sometimes did not know how to spell a name correctly but could write down how it sounded to him. So even if the name was spelled "incorrectly," the Soundex index is a tool for finding the name based on how it sounds. Soundex is so common that database and programming software often have a built-in function to compute the Soundex code.

As I examined the Soundex rules, it became apparent that they were not designed with Polish names in mind. The rules are applied to the spelling of a name in the hope that the resulting code corresponds to the pronunciation. Because Polish letters often have different sounds from their English counterparts, the Soundex code for many Polish names does not represent the way they are pronounced.
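As a rough illustration of how the rules work on spelling alone, here is a minimal Python sketch of the commonly published American Soundex rules. The function name `soundex` and the lookup table `CODES` are my own for this example; this is not the built-in function of any particular software package, and some published variants treat H and W differently. The sketch also assumes plain A to Z input and simply drops any other character, including Polish letters with diacritics.

```python
# Letter-to-digit table of American Soundex.
# Vowels, Y, H and W carry no digit, so they are absent from the table.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(name: str) -> str:
    """Return a 4-character Soundex code (illustrative sketch only)."""
    # Assumption: keep only the letters A-Z; everything else is dropped.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]                      # the first letter is kept as-is
    previous = CODES.get(letters[0], "")     # its digit still counts for collapsing
    for ch in letters[1:]:
        if ch in "HW":
            continue                         # H and W do not separate equal digits
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            result += digit
        previous = digit                     # a vowel resets this to ""
    return (result + "000")[:4]              # pad with zeros, cut to 4 characters


print(soundex("Robert"), soundex("Rupert"))          # R163 R163
print(soundex("Wasilewski"), soundex("Vasilewski"))  # W242 V242
```

The first pair shows two different spellings landing on the same code. The second pair hints at the Polish problem: a Polish W is pronounced like an English V, so a census taker writing by ear might produce a spelling such as "Vasilewski" (a hypothetical spelling chosen for this example). Because Soundex keeps the first letter as written, the two spellings end up under different codes even though they sound the same.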

Back then I dreamed of a new and improved Soundex that would account for the Polish sounds of letters. I put off developing such an improvement until the mid-1990s, when the World Wide Web was just taking off with the public. While doing a web search for linguistic guidance, I discovered that someone had already developed a new coding system that could handle Polish sounds! And it had been developed in 1985! The system is known as the Daitch-Mokotoff system and is capable of handling more than just Polish sounds. Instead of the four-character Soundex code, the Daitch-Mokotoff system produces a six-character code. This means it can usually do a better job of distinguishing between names. Because some letters can be pronounced in more than one way, some names can have more than one code.

Enter a name into the field below.

The Soundex calculator above uses the built-in function of the software to generate the code. The Daitch-Mokotoff calculator was custom written. While it is believed to be faithful to the Daitch-Mokotoff coding scheme, I make no guarantees. Only one code is provided, based on the most common Polish pronunciation. Remember that some letters have more than one sound, which makes it possible for a name to have more than one code.

