
In my last article I mentioned that there are quite a few libraries for parsing HTML. This time I decided to show how to extract information from text with regular expressions in cases where there are no tags to hook onto and those libraries cannot help. It all started as a small application, but I kept coming up with new ideas, and in the end the result turned out, I think, quite interesting.
Below the cut, I will describe the development process, show the application at work, and outline options for further development.
To begin with, I decided to study what was already available, so I searched the Android Market for the keyword “whois” and downloaded several applications. I immediately realized that my application would stand out because it would work with IDN domains, i.e. domains with Cyrillic names.
Development
The first distinctive feature of my application is support for Cyrillic domains, and the second is that it displays “recognized” data: I know exactly which line contains which field, unlike the applications I downloaded, which simply cut out the block of whois output and show it in a WebView. To make it clear why I need regular expressions, here is a piece of the document with the information that interests me:

The screenshot shows that there are almost no tags, so no XPath will help to split this text. It is time to create the project and get to work:

This time I chose the lowest version for which I had downloaded the SDK. Later I read on Habr that 2.1 and 2.2 are the most popular versions right now; I am developing for 2.1 (testing on the emulator) and will use the application on 2.2.
I also paid more attention to the interface this time, which is why I can’t post the full interface XML files here: they turned out to be rather large.
The main elements are:
<EditText android:layout_height="wrap_content"
    android:layout_width="fill_parent"
    android:id="@+id/site"
    android:maxLines="1"
    android:text="">
</EditText>
<Button android:text="@string/get"
    android:id="@+id/getData"
    android:layout_width="fill_parent"
    android:layout_height="wrap_content"
    android:background="@layout/custom_button">
</Button>
<ListView android:id="@+id/whoisList"
    android:layout_width="wrap_content"
    android:layout_height="wrap_content">
</ListView>
The site is entered in the EditText; I set android:maxLines="1" there so that several lines cannot be entered. The Button starts the parsing. It later turned out that giving a button even simple styles takes a few more steps than in many desktop applications, so I use a separate file for it: android:background="@layout/custom_button". The ListView is also not as simple as it may seem at first glance: I designed a separate row layout for it, consisting of two columns.
Interface files:
main.xml - main file
custom_button.xml - button styles
color.xml - colors
row.xml - ListView row
As a result we get the screen shown on the left of the screenshot (although you will have to define the string constants yourself, or go to the end of the article and download the project):

I combined the screenshots in pairs so that the post does not get too long, so I’ll say right away that the second one shows the ProgressDialog that appears after tapping the “Get whois data” button. As I wrote above, only one line of input is allowed, so pressing the Enter key finishes the input instead of starting a new line, which immediately hides the keyboard:
EditText edittext = (EditText)findViewById(R.id.site);
edittext.setImeOptions(EditorInfo.IME_ACTION_DONE);
But since the user may tap the “Get whois data” button without hiding the keyboard first, we also hide it programmatically:
InputMethodManager imm = (InputMethodManager)getSystemService(Context.INPUT_METHOD_SERVICE);
imm.hideSoftInputFromWindow(edittext.getWindowToken(), 0);
I also wanted to try showing messages on the screen:

The one on the left:
Toast.makeText(MainActivity.this, " ...", Toast.LENGTH_LONG).show();
And the one on the right:
AlertDialog.Builder alertbox = new AlertDialog.Builder(MainActivity.this);
alertbox.setMessage(" ");
alertbox.setNeutralButton("Ok", new DialogInterface.OnClickListener() {
    public void onClick(DialogInterface arg0, int arg1) {
        // nothing to do here: the dialog is dismissed automatically
    }
});
alertbox.show();
The second message is more critical, and we need to make sure that the user reads it, so we use an AlertDialog for it.
To convert the domain name we will use a third-party library, which turns “сайт.рф” into the ASCII string “xn--80aswg.xn--p1ai”.
import gnu.inet.encoding.IDNA;
IDNA.toASCII(".");
Many will ask why not use java.net.IDN. The documentation says that it is available only from API level 9, while we target level 7. My test confirmed that the import does not work on API levels 7 and 8, but on level 9 everything really is fine and the same conversion works without any third-party libraries.
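For readers who want to try the built-in conversion on a desktop JVM or on API level 9 and above, here is a minimal sketch using java.net.IDN. The class name IdnSketch and its helper methods are hypothetical, and the domain is the “сайт.рф” example from above, written with Unicode escapes so that the file does not depend on the source encoding.

```java
import java.net.IDN;

final class IdnSketch {
    // The Cyrillic example domain from the article, written as Unicode escapes.
    static final String UNICODE_DOMAIN = "\u0441\u0430\u0439\u0442.\u0440\u0444";

    // Converts an internationalized domain name to its ASCII (Punycode) form.
    static String toAscii(String domain) {
        return IDN.toASCII(domain);
    }

    // Converts an ASCII (Punycode) domain name back to its Unicode form.
    static String toUnicode(String asciiDomain) {
        return IDN.toUnicode(asciiDomain);
    }

    public static void main(String[] args) {
        String ascii = toAscii(UNICODE_DOMAIN);
        System.out.println(ascii);
        // The round trip should give back the original name.
        System.out.println(toUnicode(ascii).equals(UNICODE_DOMAIN));
    }
}
```

The ASCII form is what should be sent to the whois source, while the Unicode form is the one to show to the user.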
Now we move on to the parsing itself. As already mentioned, I will use regular expressions:
HashMap<String, String> map;
// "created:", whitespace, then a dotted date that runs up to the end of the line
Pattern p = Pattern.compile("created:\\s+(\\d{4}\\.\\d{2}\\.\\d{2})\n", Pattern.CASE_INSENSITIVE);
Matcher matcher = p.matcher(str);
if (matcher.find()) {
    map = new HashMap<String, String>();
    map.put("param", ":");
    map.put("value", matcher.group(1)); // the captured date
    mylist.add(map); // the same list that is later passed to the adapter
}
"created:\\s+(\\d{4}\\.\\d{2}\\.\\d{2})\n" - a date in the format yyyy.dd.MM or yyyy.MM.dd
"person:\\s+(.+)\n" - any characters between the whitespace after the label and the line break
"phone:\\s+(\\+[\\s\\d]+)\n" - a + (plus) followed by digits and spaces, between the whitespace after the label and the line break
"mail:\\s+(.+)\n" - any characters between the whitespace after the label and the line break
"registrar:\\s+(.+)\n" - any characters between the whitespace after the label and the line break
"org:\\s+(.+)\n" - any characters between the whitespace after the label and the line break
In fact, there are only two options. The first is to use a library to get only the block of markup that contains the required data and then take its inner text, which gives quite tangible flexibility later on, since many registrars add their own tags: mailto links, a link to the site as in the example, and so on. The second is to run the regular expressions over the entire server response (I tried to choose a registrar whose page is not too overloaded).
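To make the idea concrete outside of Android, here is a minimal plain-Java sketch that first drops the tags (a crude stand-in for taking the inner text) and then applies the patterns listed above. Everything in it that is not in the article is an assumption: the class name WhoisParseSketch, the display labels, the tag-stripping regular expression, and the sample text, which is made up and not real whois data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WhoisParseSketch {
    // Hypothetical whois fragment with registrar-added markup; not real data.
    static final String SAMPLE =
            "domain:     EXAMPLE.RU\n"
          + "person:     Private Person\n"
          + "phone:      +7 495 1234567\n"
          + "e-mail:     <a href=\"mailto:admin@example.ru\">admin@example.ru</a>\n"
          + "registrar:  EXAMPLE-REG-RIPN\n"
          + "created:    2010.05.17\n";

    // Crude tag matcher: enough for simple links, not a real HTML parser.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Display label (assumed) -> pattern from the article, kept in display order.
    private static final Map<String, Pattern> FIELDS = new LinkedHashMap<String, Pattern>();
    static {
        field("Created", "created:\\s+(\\d{4}\\.\\d{2}\\.\\d{2})\n");
        field("Person", "person:\\s+(.+)\n");
        field("Phone", "phone:\\s+(\\+[\\s\\d]+)\n");
        field("Mail", "mail:\\s+(.+)\n");
        field("Registrar", "registrar:\\s+(.+)\n");
        field("Org", "org:\\s+(.+)\n");
    }

    private static void field(String label, String regex) {
        FIELDS.put(label, Pattern.compile(regex, Pattern.CASE_INSENSITIVE));
    }

    // Removes anything that looks like a tag, leaving only the text.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    // Returns one {"param", "value"} row per field found in the text.
    static List<Map<String, String>> parse(String text) {
        List<Map<String, String>> rows = new ArrayList<Map<String, String>>();
        for (Map.Entry<String, Pattern> entry : FIELDS.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            if (matcher.find()) {
                Map<String, String> row = new HashMap<String, String>();
                row.put("param", entry.getKey() + ":");
                row.put("value", matcher.group(1).trim());
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Map<String, String> row : parse(stripTags(SAMPLE))) {
            System.out.println(row.get("param") + " " + row.get("value"));
        }
    }
}
```

Fields that are missing from the response (here, org) are simply skipped. Note that these patterns expect lines to end with a bare "\n"; a response with "\r\n" line endings would need the patterns adjusted or the text normalized first.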
Once we have the result, we can load it into the ListView:
ListView list = (ListView) findViewById(R.id.whoisList);
SimpleAdapter myAdapter = new SimpleAdapter(MainActivity.this, mylist, R.layout.row,
        new String[] {"param", "value"},
        new int[] {R.id.whoisParam, R.id.whoisData});
list.setAdapter(myAdapter);
The results for two sites look like this:

And the last thing I wanted to do was add a menu and another Activity:
Menu myMenu = null;
@Override
public boolean onCreateOptionsMenu(Menu menu) {
    // call the parent to attach any system level menus
    super.onCreateOptionsMenu(menu);
    this.myMenu = menu;
    int base = Menu.FIRST; // value is 1
    MenuItem item = menu.add(base, base, base, "");
    item.setIcon(R.drawable.help);
    // it must return true to show the menu
    // if it is false menu won't show
    return true;
}
@Override
public boolean onOptionsItemSelected(MenuItem item) {
    if (item.getItemId() == 1) {
        Intent intent = new Intent(MainActivity.this, Help.class);
        startActivity(intent);
    }
    // should return true if the menu item
    // is handled
    return true;
}
Here I create the menu and handle the tap on the menu item, after which I see the following picture (on the left, the screen after opening the menu; on the right, after tapping the item). To close the second Activity, simply call finish() in it.

The only thing I almost forgot is the manifest file, where you must declare the Internet permission and the new Activity. The finished project looks like this:

Download link: project (mirror)
Summary
I think the application turned out rather interesting, and this time there are plenty of options for further development. In the future it could become a rather interesting SEO tool: you can add support for other domain zones, fetch PageRank and TIC, and save the results to a database. In general, it seems to me that the idea can be developed endlessly, although I agree with those who would rather write a web service and use the Android application only as a client that gets data from a single site, which of course saves traffic and will most likely work faster.