MSLibrary. Capturing and verifying phone numbers using regular expressions, for iOS and not only... Part 1

This series of articles, of which this is the first, grew out of the large body of analytical and practical material that accumulated while we worked on the MSLibrary for iOS library. MSLibrary includes many classes, and even more functions and macros, designed to simplify developers' routine work and to significantly reduce development time and code size. But everything in its time: we will tell you about the library a little later.

So: capturing and verifying phone numbers with regular expressions. What is there to talk about, one might ask? Those who know how write the code themselves, and those who don't copy one of the many ready-made solutions scattered across the World Wide Web. The only questions are what exactly gets written or copied, and how well that code fits the task at hand and current international, industry and corporate standards. Any solution, even the simplest, is good only if the developer fully understands it and is absolutely sure of it.


In fact, the title already splits the topic of the article in two. Capture is the ability to recognize that a tracked sequence of digits and characters may be a telephone number. Verification is a check of whether that sequence meets certain predefined conditions. The following example shows that these are different tasks:

 + 1 (408) - 996 - 10 - 10 = 1234
 +14089961010;ext=1234

In the first case, someone wrote the number the way they are used to. In the second case, the number was written according to the international standard RFC 3966. The problem is that neither entry produces a correct call when dialed, for example, from an iOS application. In the first case, the system does not understand the string at all. In the second case, instead of the extension "1234", the system dials completely different digits. This is a convincing experiment that you can try yourself; the code is given below.
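Simple code for placing a phone call from an iOS application:
NSString *telString = @"tel:+14089961010;ext=1234";
NSURL *url = [NSURL URLWithString:telString];
// openURL: alone is deprecated since iOS 10; use the variant with options and a completion handler
[[UIApplication sharedApplication] openURL:url options:@{} completionHandler:nil];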


In essence, both tasks (capture and verification) are solved by the same methods. The only difference lies in the regular expressions applied.
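The following is a minimal sketch of this idea in plain C. C is a subset of Objective-C, so the code compiles unchanged in an iOS project. It uses the POSIX regex.h API instead of Foundation's NSRegularExpression. The core pattern PHONE_CORE and the helper names are hypothetical and serve only as an illustration. Verification anchors the core with ^ and $, while capture uses it unanchored. Both go through the same matching function.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical core pattern, for illustration only: "+" followed by digits
 * and RFC 3966 visual separators, ending in a digit. */
#define PHONE_CORE "\\+[-.()0-9]*[0-9]"

/* Verification: anchored, so the WHOLE string must be a phone number. */
static const char *const VERIFY_PATTERN = "^" PHONE_CORE "$";
/* Capture: the same core without anchors, so it is found inside free text. */
static const char *const CAPTURE_PATTERN = PHONE_CORE;

/* One matching routine serves both tasks. On success it stores the bounds
 * of the first match in *start and *end (when they are non-NULL). */
static bool find_first(const char *pattern, const char *text,
                       size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;
    bool found = regexec(&re, text, 1, &m, 0) == 0;
    if (found) {
        if (start != NULL) *start = (size_t)m.rm_so;
        if (end != NULL) *end = (size_t)m.rm_eo;
    }
    regfree(&re);
    return found;
}

/* Returns true only if the entire string s is phone-like. */
static bool verify_phone(const char *s)
{
    return find_first(VERIFY_PATTERN, s, NULL, NULL);
}

/* Copies the first phone-like substring of text into out (NUL-terminated).
 * Returns false if nothing is found or out is too small. */
static bool capture_phone(const char *text, char *out, size_t out_size)
{
    size_t so, eo;
    if (!find_first(CAPTURE_PATTERN, text, &so, &eo) || eo - so >= out_size)
        return false;
    memcpy(out, text + so, eo - so);
    out[eo - so] = '\0';
    return true;
}
```

For example, capture_phone("Call +1-408-996-1010 today.", buf, sizeof buf) copies "+1-408-996-1010" into buf. The core must end in a digit, so the trailing period is left out. On the same sentence, verify_phone returns false because of the anchors.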

The first step is to look at RFC 3966, which governs this matter. In highly simplified form, it states the following:

Simplified structure of telephone-uri in accordance with RFC 3966


  telephone-uri = "tel:" telephone-subscriber

  •   telephone-subscriber = global-number
  •   global-number = global-number-digits *par
  •   par = extension | isdn-subaddress | parameter
  •   isdn-subaddress = ";isub=" 1*uric
  •   extension = ";ext=" 1*phonedigit
  •   global-number-digits = "+" *phonedigit DIGIT *phonedigit
  •   parameter = ";" pname ["=" pvalue]
  •   pname = 1*(alphanum | "-")
  •   pvalue = 1*paramchar
  •   paramchar = param-unreserved | unreserved | pct-encoded
  •   unreserved = alphanum | mark
  •   mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
  •   param-unreserved = "[" | "]" | "/" | ":" | "&" | "+" | "$"
  •   phonedigit = DIGIT | [visual-separator]
  •   visual-separator = "-" | "." | "(" | ")"
  •   alphanum = ALPHA | DIGIT
  •   reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
  •   uric = reserved | unreserved | pct-encoded

where ALPHA and DIGIT are defined in another document, RFC 2396, as follows:
  •   DIGIT = "0" |  "1" |  "2" |  "3" |  "4" |  "5" |  "6" |  "7" |  "8" |  "9" 
  •   ALPHA = lowalpha |  upalpha 
  •   lowalpha = "a" |  "b" |  "c" |  "d" |  "e" |  "f" |  "g" |  "h" |  "i" |  "j" |  "k" |  "l" |  "m" |  "n" |  "o" |  "p" |  "q" |  "r" |  "s" |  "t" |  "u" |  "v" |  "w" |  "x" |  "y" |  "z" 
  •   upalpha = "A" |  "B" |  "C" |  "D" |  "E" |  "F" |  "G" |  "H" |  "I" |  "J" |  "K" |  "L" |  "M" |  "N" |  "O" |  "P" |  "Q" |  "R" |  "S" |  "T" |  "U" |  "V" |  "W" |  "X" |  "Y" |  "Z" 

Schematically, telephone-uri can be represented as follows:

 telephone-uri = global-number-digits [par]
Fig. 1

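From RFC 3966 it follows that global-number-digits is mandatory, while par is optional and may be repeated. Each par is either an extension, an ISDN subaddress or an arbitrary parameter.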

In real life, mobile applications (iOS applications in particular) and most websites only ever use one parameter: the extension number. The telephone-uri scheme therefore reduces to:

 telephone-uri = global-number-digits [extension] 
Fig. 2

or, substituting the corresponding definitions of global-number-digits and extension:

 telephone-uri = "+" *phonedigit DIGIT *phonedigit [";ext=" 1*phonedigit] 
Fig. 3

where phonedigit consists of the digits 0-9 and a limited set of visual separators:

  "-" |  "."  |  "(" | ")" 


At this point, it would seem, we could move on to regular expressions that implement this scheme. But... many authors rightly point out that not every set of digits is a telephone number. A generally accepted international practice defines the structure of telephone numbers, and we will briefly look at it now.

International structure of telephone numbers


In accordance with existing practice, a telephone number can be divided into the following groups:

global-number-digits
  •   country_code
  •   area_code
  •   exchange
  •   subscriber_number

extension
  •   extension (optional)

In this case, the structure of the phone number will look like this:

 telephoneNumber = "+" country_code [visual-separator] area_code [visual-separator] exchange [visual-separator] subscriber_number [visual-separator] [";ext=" extension]
Fig. 4

Suppose we implement a regular expression that respects both RFC 3966 and the international structure of telephone numbers, that is, one that matches the schemes in Fig. 3 and Fig. 4. Then the following entries are valid:

 +14089961010;ext=1234
 +1(408)996-1010;ext=1234
 +1-408-996-10-10;ext=1234
 +1.408.996.1010;ext=1234


The following entries, however, are valid only under RFC 3966, because their visual separators fall outside the group boundaries defined by the international structure of telephone numbers:

 +1(4089)96-1010;ext=1234
 +1-408996-10-10;ext=1234
 +1.408996.1010;ext=1234


In real life, the user may enter a phone number following the pattern suggested by the application or website, or however they think is correct (that is, as they please). The application, in turn, processes the resulting string according to the standards built into it, and the two do not always agree.

In real life, the user can dial the phone number either according to the pattern specified in the application or on the site, or as he thinks is correct (that is, as he pleases). And the application, in turn, will process the resulting string in accordance with the standards laid down in it. What is not always the same.



Corporate web standards for telephone dialing


Why web? Because in iOS, in practically all other systems and, of course, on websites, the easiest way to place a phone call is the well-known HTML tel: scheme:

 <a href="tel:1-408-996-1010">1-408-996-1010</a> 
Fig. 5

The Objective-C code implementing this scheme was already given at the beginning of the article, but for completeness it is worth repeating:

 NSString *telString = @"tel:+14089961010";
 NSURL *url = [NSURL URLWithString:telString];
 // openURL: alone is deprecated since iOS 10; use the variant with options and a completion handler
 [[UIApplication sharedApplication] openURL:url options:@{} completionHandler:nil];
Fig. 6

Let us turn to corporate standards:

What do Google experts say about this?
Always supply the phone number using the international dialing format: the plus sign (+), country code, area code and number. While not absolutely necessary, it's a good idea to separate each segment of the number with a hyphen (-) for easier reading and better auto-detection.

In other words, the "+" sign must precede the country code, and it is "a good idea" to separate the segments (groups) of the number with hyphens. On the whole, this agrees fully with RFC 3966 and the international structure of telephone numbers, that is, with the schemes shown in Fig. 3 and Fig. 4. There is, however, one significant BUT: the visual separators are limited to a single character, the hyphen "-". This does not necessarily mean that a browser will mishandle parentheses or dots as visual separators, but that needs careful checking, since the Google document guarantees only the hyphen. This section of the guide also says nothing about the format of the extension. Since this article is mainly about iOS applications, we will not dig into the specifics of Google's software; feel free to experiment and share your results.

Apple is even more laconic. In the Phone Links section, the only thing we find about the permitted phone number format is this sentence:
For more information about the tel URL scheme, see RFC 2806 and RFC 2396.

Formally, this is correct: why repeat what international standards already say? The catch is that Apple's corporate standards, like Google's, cover only parts of these international standards, which, incidentally, are often only advisory.


What can and cannot be used in a phone number in an iOS application


Experiments have shown that the system correctly handles all four visual separators defined in RFC 3966, namely:

  "-" | "." | "(" | ")"

Moreover, the separators can be placed anywhere, and parentheses need not be paired. For example, the code shown in Fig. 6 handles all of the following entries identically and correctly:

 +14089961010
 +1(408)996-1010
 +1-408-996-10-10
 +1.408.996.1010
 +1(4089)96-1010
 +1-408996-10-10
 +1.408996.1010
 +1-4(0899(6-10-10
 +1.40)8996.10(10
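RFC 3966 treats visual separators as cosmetic: when two tel URIs are compared, visual separators are removed first. Stripping them shows why all the entries above denote the same number, since each one reduces to "+14089961010". The plain C sketch below does exactly that. The helper name is hypothetical, and the code illustrates the RFC rule, not how iOS processes the string internally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst without the four RFC 3966
 * visual separators "-", ".", "(" and ")". Returns false if dst is too
 * small to hold the result and its terminating NUL. */
static bool strip_visual_separators(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;
    if (dst_size == 0)
        return false;
    for (const char *p = src; *p != '\0'; ++p) {
        if (strchr("-.()", *p) != NULL)
            continue;                  /* visual separator: skip it */
        if (n + 1 >= dst_size)
            return false;              /* no room for this char and the NUL */
        dst[n++] = *p;
    }
    dst[n] = '\0';
    return true;
}
```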


Extension dialing is a different story. As mentioned at the beginning of the article, the system does not correctly handle the extension prefix ";ext=" proposed in RFC 3966. The system's native separators are two characters: ";" and ",".

In the first case, when the system encounters the separator ";" in the phone number, dialing stops and the extension digits appear on the screen. Tapping them resumes dialing.

In the second case, when the system encounters the separator "," in the phone number, dialing stops at that point and resumes automatically after a short pause of about 2 seconds. The "," separator can be repeated, for example ",,,,". In that case, the pause grows in proportion to the number of commas.

With the RFC 3966 prefix ";ext=", the following happens: the system takes ";" as the stop point before the extension and interprets the characters "ext" as digits at the start of the extension number.
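Given this behaviour, an application that stores numbers in RFC 3966 form may want to rewrite ";ext=" into one of the iOS-native separators before building the tel: URL. The plain C sketch below works under that assumption. The helper name is hypothetical, and the code handles only the lowercase prefix and only its first occurrence.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replaces the first RFC 3966 ";ext=" prefix (lowercase
 * only) with a single iOS-native separator: ',' (pause, then continue) or
 * ';' (wait until the user taps the extension). Writes the result to dst;
 * returns false if dst is too small. */
static bool to_ios_extension(const char *src, char sep,
                             char *dst, size_t dst_size)
{
    static const char prefix[] = ";ext=";
    const size_t prefix_len = sizeof prefix - 1;
    const char *ext = strstr(src, prefix);

    if (ext == NULL) {                        /* no extension: copy as is */
        size_t len = strlen(src);
        if (len + 1 > dst_size)
            return false;
        memcpy(dst, src, len + 1);
        return true;
    }

    size_t head = (size_t)(ext - src);        /* number before ";ext=" */
    const char *tail = ext + prefix_len;      /* extension digits */
    size_t tail_len = strlen(tail);
    if (head + 1 + tail_len + 1 > dst_size)
        return false;
    memcpy(dst, src, head);
    dst[head] = sep;
    memcpy(dst + head + 1, tail, tail_len + 1);  /* copies the NUL too */
    return true;
}
```

For example, with ',' the string "tel:+14089961010;ext=1234" becomes "tel:+14089961010,1234". According to the experiments described above, the system dials this correctly when the string is passed to NSURL URLWithString: as in the earlier snippet.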



So, we have covered the theoretical and practical background and can move on to the second part of the article: the regular expressions for capturing and verifying telephone numbers.

We hope you found this material useful. The MSLibrary for iOS team

Other articles:
Capturing and verifying phone numbers using regular expressions, for iOS and not only ... Part 2
Implementing multiple selection of conditions using bitmasks, for iOS and not only ...
SIMPLE: remove unnecessary characters from the string, for iOS and not only ...
Creating and compiling cross-platform (universal) libraries in Xcode

Source: https://habr.com/ru/post/278345/

