
Falsehoods Programmers Believe About Names, with Examples





In 2010, Patrick McKenzie wrote the famous article “Falsehoods Programmers Believe About Names,” listing 40 assumptions about human names that are not always true.



Do you think programmers sat down, thought it over, and changed how computer systems handle names? Unfortunately, not really. We are still routinely asked to fill in online forms that require both a first name and a last name (and in that order). These systems still assume that our names can always be written in alphabetic characters, often only ASCII.



I suspect that Patrick's article did not have enough impact on the industry, partly because it did not give an example for each falsehood. But as a former member of the IBM Global Name Management project, I can assure you that everything it says is true.


Don't believe me? In this article I will list all 40 falsehoods and give an example (or two) for each from my experience in this field. Ready? Go!



1. Each person has one canonical full name.

Some people seem to believe that you are given a name and it never changes. But even in Western countries, a person may change their surname on marriage. In the Catholic tradition, a person may take an additional name at confirmation.



2. Each person has one full name that they use.

The well-known science fiction writer John Wyndham (author of The Day of the Triffids) was born John Wyndham Parkes Lucas Beynon Harris, and published books under the names John Beynon and Lucas Parkes as well as John Wyndham.



3. At any given point in time, each person has one canonical full name.

An actor may have a stage name completely different from the name on their birth certificate, and may even hold a passport in the stage name.



4. At any given point in time, each person has one full name that they use.

This is not true. Even in Western countries, a woman may keep her maiden name at work (where she is already known by that name) and use her husband's surname socially or in legal documents such as mortgages and loans.



5. Every person has exactly N names, regardless of the value of N.

An English name traditionally consists of two given names (often called a first name and a middle name) and a surname, but this is not always the case. A person may have no middle name, or several. The Portuguese, for example, have one or two given names and up to four surnames (up to six for a married woman), and these surnames can be phrases such as da Silva or dos Santos, or even Costa e Silva.



6. Names fit in a certain number of characters.

The famous artist usually known simply as Picasso had the full name Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Mártir Patricio Ruiz y Picasso. Try fitting that into a 30-character field...



7. Names do not change.

We have already mentioned women who change their name on marriage, so this is clearly wrong. In addition, Catholics may take an additional name at confirmation. People also often add a name or change it completely when converting to another religion: after converting to Islam, Cat Stevens became Yusuf Islam and Cassius Clay became Muhammad Ali.



8. Names change, but only in certain limited cases.

For some Thais, it is common to change one's name to ward off bad luck, and this can happen without any particular reason. Sometimes people change their name when someone else with the same name becomes famous or notorious: a notable example is the many people who gave up the surname Hitler.



9. Names are written in ASCII.

This is plainly false, if only because ASCII lacks the accented characters used in French and Portuguese names. Nor does it include the Greek alphabet used in Greek names or the Cyrillic characters used in Russian names. Then there are scripts such as Devanagari for Indian names, Chinese characters (hanzi), Japanese characters (kanji), and many more.
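The point can be checked in a few lines. The following is a minimal Python sketch, not taken from the original article: the helper name `fits_in_ascii` and the sample names (other than 周潤發, which appears later in this article) are assumptions chosen purely for illustration.

```python
def fits_in_ascii(name: str) -> bool:
    """Return True only if every character of the name is plain ASCII."""
    try:
        name.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Illustrative names in Latin, accented Latin, Greek, Cyrillic and Chinese script.
for name in ["John Smith", "José", "Νίκος", "Иван", "周潤發"]:
    print(name, fits_in_ascii(name))
```

Only the first name passes. A system that stores names in an ASCII-only column will either reject or mangle all the others.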



10. Names are written in a single character set.

Some names mix scripts: for example, kanji with Latin characters, hanzi with Latin characters, or Korean hangul with Latin characters. In many cases this happens because a person has adopted a “Western name” for the benefit of those who cannot pronounce their name in their native language.



11. All names map to Unicode code points.

The Unicode Consortium keeps adding code points for increasingly rare characters. The vast majority of names can already be represented, but there are still exceptions, such as the symbol adopted by "the artist formerly known as Prince". Even setting such oddities aside, several scripts have not yet made it into Unicode. Perhaps the most realistic example is Aymara, the writing of a language spoken by more than a million people in South America. Less realistic examples are Klingon or the scripts Tolkien invented for Middle-earth. In addition, Unicode covers only a portion of Chinese and Japanese characters, and some of the missing characters are used in names.



To complicate matters further, some languages have no written form at all, so there are no Unicode code points for them. Names in these languages can be spelled phonetically, but this is of limited use because most people are unfamiliar with phonetic alphabets.



12. Names are case sensitive.

Many scripts have no concept of case: Chinese and Japanese, for example. For them, the idea of uppercase and lowercase letters simply does not apply.



13. Names are case insensitive.

Some scripts do distinguish case: Latin, for example. More importantly, in some languages a character can exist in lowercase with no uppercase equivalent, so it is impossible to convert it from one case to the other.



Correct capitalization can matter a great deal to some people, such as those named MacKenzie and Mackenzie.



Correct capitalization also matters for names such as Van Gogh, du Barry, da Costa, O'Brien and D'Agostino, and for names such as Jean-Pierre.
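Both problems are easy to reproduce. The sketch below is an illustration in Python rather than anything from the original article. It assumes "Weiß" as a sample surname, and it relies on the fact that Python's built-in case mapping turns the German sharp s into "SS" when uppercasing.

```python
# Assumption: "Weiß" is just an illustrative surname containing the German sharp s.
original = "Weiß"
round_trip = original.upper().lower()    # "WEISS", then "weiss"
print(original.lower() == round_trip)    # False: the sharp s was lost on the way

# str.title() is sometimes used as a quick "name fixer"; it rewrites
# capitalization that the bearer of the name chose deliberately.
for name in ["MacKenzie", "du Barry", "da Costa"]:
    print(f"{name} -> {name.title()}")
```

The loop turns MacKenzie into Mackenzie and capitalizes the particles in du Barry and da Costa. Storing and displaying the name exactly as the person entered it avoids both problems.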



14. Sometimes there are prefixes or suffixes in names, but you can safely ignore them.

Nothing could be further from the truth. The Dutch name Peter van der Meer is not the same as Peter Meer, even though van der is a prefix.



You might think of “Jr.” as a suffix in the name Robert Downey Jr., but if you drop it you are referring to his father, not to him.



In Arabic names, the suffix al-Din means "faith" or "religion": names such as Taj al-Din ("crown of faith") or Saif al-Din ("sword of religion") do not stay the same if you drop the suffix. The Italian name Di Stefano is not the same as Stefano.



A Spanish woman with the surname “Viuda de de la Cruz” is the widow of a man with the surname de la Cruz. Omitting prefixes changes the meaning of the name.



15. Names do not contain numbers.

Even if you ignore generational suffixes (for example, Thurston Howell III), in some cases a number is part of someone's legal name. For example, Jennifer 8. Lee chose the middle name 8 because the number 8 is associated with good luck.
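To see how quickly points 14 and 15 bite, consider a typical "capitalized first name, space, capitalized last name" check. The pattern and the helper `naive_is_valid` below are hypothetical, written in Python only to illustrate the kind of validation these falsehoods lead to.

```python
import re

# A naive validator of the kind this article warns about (hypothetical).
NAIVE_NAME = re.compile(r"[A-Z][a-z]+ [A-Z][a-z]+")


def naive_is_valid(name: str) -> bool:
    """Accept only 'Capitalized Capitalized', with ASCII letters and nothing else."""
    return NAIVE_NAME.fullmatch(name) is not None


names = [
    "John Smith",
    "Jennifer 8. Lee",      # a digit in the name
    "Peter van der Meer",   # lowercase prefix
    "Robert Downey Jr.",    # suffix
    "Taj al-Din",           # hyphen and lowercase particle
]
for name in names:
    print(name, naive_is_valid(name))
```

Only John Smith passes. Every other name in the list is one this article cites as real, so the validator is the thing that is wrong.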



16. Names are not written in ALL CAPITAL letters.

In some countries (especially francophone ones) it is customary to write a person's family name in capital letters, so that it is clear which part is the family name. This convention is so entrenched that writing the family name in lowercase can be considered impolite.



17. Names are not written in all lowercase letters.

The poet e. e. cummings preferred to have his name written in lowercase, as does the singer k.d. lang. The polite thing is to follow the spelling that the bearer of the name prefers.



There is an Irish/British surname, ffrench, that is traditionally written in lowercase, although this tradition suffers from bad software that forces an initial capital.



18. Names have an order to them. Picking any one ordering scheme will automatically produce a consistent order across all systems, as long as they all use the same ordering scheme.

In the Netherlands, the name Vincent van Gogh is indexed and sorted under G, for Gogh; in Belgium, the same name is indexed under V, for Van Gogh. No single scheme can be adopted that yields a universally accepted order. In many libraries, the rule applied depends on the person's place of birth (not a rule I would want to implement in software).
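The two conventions can be sketched as two sort keys. This is a simplified Python illustration, not a real collation implementation: the function names, the particle list, and the sample surnames Bakker and Smit are all assumptions made for the example.

```python
# Hypothetical, deliberately incomplete list of surname particles.
PARTICLES = {"van", "de", "der", "den"}


def dutch_style_key(surname: str) -> str:
    """Skip leading particles, so 'van Gogh' is filed under G."""
    words = surname.split()
    while len(words) > 1 and words[0].lower() in PARTICLES:
        words.pop(0)
    return " ".join(words).casefold()


def belgian_style_key(surname: str) -> str:
    """Treat the particle as part of the surname, so 'van Gogh' is filed under V."""
    return surname.casefold()


surnames = ["Bakker", "van Gogh", "Smit"]
print(sorted(surnames, key=dutch_style_key))    # van Gogh lands between Bakker and Smit
print(sorted(surnames, key=belgian_style_key))  # van Gogh lands after Smit
```

The same three surnames come out in a different order under each key, so "sorted by surname" means nothing until you say which convention applies.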



19. First names and last names are necessarily different.

The Australian businessman and politician Benjamin Benjamin died in 1905. Jerome K. Jerome was an English writer famous for Three Men in a Boat (To Say Nothing of the Dog). Owen Owen was a Welshman who founded Owen Owen Ltd, a company that ran a chain of department stores. And we will not even go into athletes and actors who have adopted such names as pseudonyms.



20. People have a surname, or something similar, shared with their relatives.

On the island of Java, it was customary to give a person a single name with no surname. For example, the Indonesian presidents Suharto and Sukarno had no surname.
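Code that splits a full name into "first" and "last" runs straight into this. The helper `naive_split` below is a hypothetical Python sketch of that common pattern, shown only to demonstrate how it fails on names mentioned in this article.

```python
def naive_split(full_name: str) -> tuple[str, str]:
    """Hypothetical helper that assumes every name is exactly 'First Last'."""
    first, last = full_name.split()
    return first, last


for name in ["John Smith", "Suharto", "Peter van der Meer"]:
    try:
        print(naive_split(name))
    except ValueError as exc:
        # Raised when the name has fewer or more than two space-separated parts.
        print(f"{name!r} breaks the assumption: {exc}")
```

A single free-text field for the full name sidesteps the problem, at the cost of not knowing which part (if any) is a family name.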



21. A person's name is unique.

Tell that to anyone named John Smith! I have a slightly less common name, yet I found a person with the same first name and surname working in the same industry in the same country (Australia).



22. A person's name is almost unique.

Even allowing for unusual spellings, it is usually easy to find people with the same full name: try googling your own.



23. Okay, okay, but names are diverse enough that no million people share the same first and last name.

The Chinese name Zhang Wei is reportedly borne by more than a quarter of a million people.



If we restrict ourselves to surnames, about 20% of the population of South Korea has the surname Kim. About 10% of the population of northern China is named Wang, and more than 10% of the population of southern China is named Chen. Li comes second in both regions, which makes it the most common surname in the country. And about 40% of Vietnamese have the surname Nguyen.



Given names are far from unique too.



24. My system will never deal with names from China.

Migration has spread the names of every culture to (almost) every country. The days when immigrants were given new names on entering a country are almost gone (although Vietnam, for example, still requires applicants for citizenship to adopt a Vietnamese name). It is unrealistic to expect never to encounter names from other countries, although you may see them in transliterated form.



So a Chinese name like 周潤發 may appear in your system as Chow Yun-Fat, or Chow Yun Fat, or even Yun Fat Chow (Chow is the surname).



25. Or Japan.

See above.



26. Or Korea.

See above.



27. Or Ireland, Great Britain, the USA, Spain, Mexico, Brazil, Peru, Sweden, Botswana, South Africa, Trinidad, Haiti, France, the Klingon Empire — all of these use “weird” schemes for names.

See above.



28. The Klingon Empire was a joke, right?

It is hard to find examples of people officially using Klingon names, but why not? If we build a system that supports other cultures (for example, an embedded apostrophe for O'Brien), then we will be able to support Klingon names without additional work.



29. To hell with cultural relativism! People in my society, at least, share one generally accepted standard for names.

Will your software only ever deal with people who were named within your community?



30. There is an algorithm that converts names from one form to another and back without loss. (Yes, yes, you can do it if the algorithm's output is identical to its input; give yourself a medal.)

There is no algorithm (other than remembering the original form) that converts names in a guaranteed reversible way.
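A small Python sketch makes the loss concrete. The function `strip_accents` is a hypothetical "normalizer" of a kind often used before searching or matching; it is an illustration, not an algorithm from the original article.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Hypothetical normalizer: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Two different inputs collapse to the same output, so no inverse can exist.
print(strip_accents("José"), strip_accents("Jose"))
```

Because "José" and "Jose" both come out as "Jose", nothing can recover the original from the converted form. If a normalized form is needed for matching, keep it alongside the name as entered rather than instead of it.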



31. I can safely assume that this dictionary of obscene words contains no surnames.

This is a common mistake: many “bad words” are not bad in other languages, and some are used as names. In addition, not every society restricts which words can be used in a name, so it is quite possible that someone was given their name in such a jurisdiction.



32. Names are given to people at birth.

Births are registered in most countries, but the system does not work equally well everywhere.



The exact rules vary by jurisdiction, but some delay in registering a birth is always allowed. The permitted delay ranges from three weeks (Scotland) to two months (Australia), and some jurisdictions allow more.



The child's name may be recorded when the birth is registered, but this does not always happen (in some places children are still registered with names like Baby Boy or Baby Girl when the parents have trouble choosing a name or, for example, when the child is a foundling).



33. OK, maybe not at birth, but rather soon after it.



34. Okay, okay, for a year or so.



35. Five years?



36. You're kidding, right?

There are cultures in which a person is not given an adult name until puberty. Before that, the child may have a “milk name” or temporary name.



37. Two different systems that record the same person's name will use the same name for that person.

If that were the case, there would be no market for software that reconciles different databases.



In my own case, some systems held my official name, including the middle name, and others only my first name and surname, or a shortened first name and surname. And that is still a simple case. My wife is listed in some systems under her maiden name and in others under her married name, with her full first name or a short form, with or without a middle name, and with either of the two spellings of her short name.



38. Two different data entry operators, given a person's name, will necessarily enter the same sequence of characters, if the system is well designed.

Imagine what happens when someone types in a name they hear over the phone. For example, Thomson and Thompson; or Johnson, Johnston, Johnstone and Jonsson.
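One consequence is that exact string comparison is not enough to find an existing record. As a rough Python sketch, the standard library's `difflib.SequenceMatcher` can score how similar two spellings are. This is a generic string-similarity measure used here as a stand-in, not a dedicated name-matching technique, and the helper `similarity` is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two spellings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


heard = "Thomson"
for candidate in ["Thompson", "Johnson", "Johnston", "Johnstone", "Jonsson"]:
    print(candidate, round(similarity(heard, candidate), 2))
```

Thompson scores very close to Thomson even though the two strings are not equal. A score like this can only suggest candidates for a human or a more careful process to confirm, since similar spellings may also belong to different people.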



39. People whose names break my system are weird outliers. They should have solid, acceptable names, like 田中 太郎.

No, your system is poorly designed.



Incidentally, the name above often appears in anime (and manga) as the name of a foreigner. There have also been real people with that name.



40. People have names.

For this one, it is perhaps hardest to give convincing examples. There was an isolated culture in which nobody had a name: people referred to each other using relational terms, such as "my mother's older sister".



Summary



So, we did it: we found examples for (almost) all forty points in Patrick McKenzie's article “Falsehoods Programmers Believe About Names”. If you are feeling some information overload, let's summarize. Here are the most important things to keep in mind when developing a system that processes names:





Finally, I highly recommend the short guide to personal names in the W3C article on the subject.

Source: https://habr.com/ru/post/431866/


