
Java vs Go

Go has been discussed a lot lately, and it is often compared with Java. Go Week brought us a very interesting introductory article by Dreadd, and I wondered how Java would cope with the task described there.

While writing the code, it became clear that Java also has plenty of interesting features that get little coverage. I tried to use the most interesting innovations from Java 7, and I hope that both beginners and experienced but lazy Java developers will find something useful here.



Task


The task is taken unchanged, and we will try to solve it as closely to the original as possible. As in the original, we will have several threads reading data, one thread saving it, and notifications both on a timer and when the program closes. The parameters come from the command line at startup.

The original formulation of the problem
... urgently, under the cover of darkness, download a full dump of all quotes on moderation [ http://vpustotu.ru/moderation/ ] for further secret research ...



Thus, you need a program that:

  1. Must repeatedly refresh and parse the page, recording the quote.
  2. Must be able to discard duplicates.
  3. Must stop not only on command but also after reaching a certain number of "repetitions", for example 500!
  4. Since this will most likely take a while, must be able to continue "from where it left off" after being closed.
  5. Well, since it takes so long anyway, let it do its dirty work in several threads. Say, as many as 4 threads (or even 5!).
  6. And report on its progress to the console every, say, 10 seconds.
  7. And let it take all these parameters from the command-line arguments!




Command line parameters


Let's start, as in the original article, from the beginning, that is, with parsing the parameters. Java has no standard library for this, but there are third-party ones for every taste. I like JCommander. The solution is, as they say, the "Java way".

    private static class CommandLine {
        @Parameter(names = "-h", help = true)
        boolean help;

        @Parameter(names = "-w", description = "number of worker threads")
        int workers = 2;

        @Parameter(names = "-r", description = "report period (sec)")
        int reportPeriod = 10;

        @Parameter(names = "-d", description = "number of duplicates to stop after")
        int dupToStop = 500;

        @Parameter(names = "-hf", description = "hash file")
        String hashFile = "hash.bin";

        @Parameter(names = "-qf", description = "quotes file")
        String quotesFile = "quotes.txt";
    }
    ...
    CommandLine commandLine = new CommandLine();
    // parse the arguments into the annotated fields
    JCommander commander = new JCommander(commandLine, args);
    // print the usage text if help was requested
    if (commandLine.help) commander.usage();
    ...


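The same in Go

    var (
        WORKERS       int    = 2            // number of worker threads
        REPORT_PERIOD int    = 10           // report period (sec)
        DUP_TO_STOP   int    = 500          // number of duplicates to stop after
        HASH_FILE     string = "hash.bin"   // hash file
        QUOTES_FILE   string = "quotes.txt" // quotes file
        used map[string]bool = make(map[string]bool) // map with string keys and boolean values
    )

    func init() {
        // command-line flags:
        flag.IntVar(&WORKERS, "w", WORKERS, "number of worker threads")
        flag.IntVar(&REPORT_PERIOD, "r", REPORT_PERIOD, "report period (sec)")
        flag.IntVar(&DUP_TO_STOP, "d", DUP_TO_STOP, "number of duplicates to stop after")
        flag.StringVar(&HASH_FILE, "hf", HASH_FILE, "hash file")
        flag.StringVar(&QUOTES_FILE, "qf", QUOTES_FILE, "quotes file")
        // and parse them
        flag.Parse()
    }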


Annotations make any code better.



Channels


In Go, channels were used to pass the quotes along; in Java, we take the closest analogue, BlockingQueue:

 BlockingQueue<String> queue = new ArrayBlockingQueue<>(10); 


We will not be able to read from several queues in a single thread. But we get other perks: for example, we can cap the queue length in case we cannot drain it fast enough.

We have no goroutines, but we do have Runnable. Of course, it is unpleasant to create an object for the sake of a single method, but it is a matter of principle.
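To illustrate the capped queue, here is a minimal sketch that uses only the standard library. The class name BoundedQueueDemo and the helper fill are hypothetical and exist only for this illustration. It shows that offer() refuses new items once the limit is reached, whereas put(), which the grabbers use, would block the producer until space frees up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class BoundedQueueDemo {
    // Hypothetical helper: tries to enqueue `count` items without blocking
    // and returns how many of them the queue accepted.
    static int fill(BlockingQueue<String> queue, int count) {
        int accepted = 0;
        for (int i = 0; i < count; i++) {
            // offer() returns false instead of blocking when the queue is full
            if (queue.offer("quote " + i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) throws InterruptedException {
        // Same capacity as in the article: at most 10 pending quotes
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);
        System.out.println("accepted: " + fill(queue, 15)); // accepted: 10

        // poll() with a timeout waits for an item but gives up eventually
        String first = queue.poll(100, TimeUnit.MILLISECONDS);
        System.out.println("first: " + first); // first: quote 0
    }
}
```

With put() instead of offer(), a grabber that runs ahead of the saving thread simply waits, which throttles the downloads for free.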

 new Thread(new Grabber()).start(); 


Yes, it is quite verbose, no argument there, but this is not the limit.

    Thread worker = new Thread(new Grabber());
    worker.setPriority(2);
    worker.setDaemon(true);
    worker.start();


Now that is really verbose. But it is the price for extra features, such as setting the thread's priority.


HTML parsing


As for the method being called, things look better.

    public class Grabber implements Runnable {
        ...
        public void run() {
            try {
                while (true) {
                    // download and parse the page
                    Document doc = Jsoup.connect("http://vpustotu.ru/moderation/").get();
                    Element element = doc.getElementsByClass("fi_text").first();
                    if (element != null) {
                        queue.put(element.text()); // send the quote to the queue
                    }
                }
            } catch (IOException | InterruptedException e) {
                e.printStackTrace();
            }
        }
    }


The same in Go

    func() {
        for {
            // download and parse the page
            x, err := goquery.ParseUrl("http://vpustotu.ru/moderation/")
            if err == nil {
                if s := strings.TrimSpace(x.Find(".fi_text").Text()); s != "" {
                    c <- s // send the quote to the channel
                }
            }
            time.Sleep(100 * time.Millisecond)
        }
    }


In principle, the method bodies are similar. HTML parsing also relies on an external library, the rather nice jsoup. It is certainly more convenient than the parser built into Swing.

Many people criticize Java for its cumbersome exception handling, but writing if err == nil everywhere in Go is just awful. Besides, in Java you can decline to handle an exception, which we will do in the next example.



Work with files


Working with files is also quite similar. Note the new file classes introduced in Java 7. They live in java.nio.file, and their use almost matches the Go equivalent:

    // open the hash file for reading
    InputStream hashStream = Files.newInputStream(Paths.get(commandLine.hashFile));
    // open the hash file for appending (CREATE, APPEND, WRITE are statically imported from StandardOpenOption)
    OutputStream hashFile = Files.newOutputStream(Paths.get(commandLine.hashFile), CREATE, APPEND, WRITE);


The same in Go

    // open the hash file for reading
    hash_file, err := os.OpenFile(HASH_FILE, os.O_RDONLY, 0666)
    // open the hash file for appending
    hash_file, err := os.OpenFile(HASH_FILE, os.O_APPEND|os.O_CREATE, 0666)


In Java, as promised, you can decline to handle errors explicitly.

 public static void main(String[] args) throws IOException 




try-with-resources


I really liked the defer statement in Go; anyone who has tried to close a stream in finally will appreciate it. Fortunately, we are saved: the try-with-resources construct was added in Java 7.

    try (
        OutputStream hashFile = Files.newOutputStream(Paths.get(commandLine.hashFile), CREATE, APPEND, WRITE);
        InputStream hashStream = Files.newInputStream(Paths.get(commandLine.hashFile));
        BufferedWriter quotesFile = Files.newBufferedWriter(Paths.get(commandLine.quotesFile), Charset.forName("UTF8"), CREATE, APPEND, WRITE);
    ) {
        ...
    }


The objects declared in the parentheses must implement the java.lang.AutoCloseable interface, and they are closed at the end of the try block. Yes, it looks a lot like a kludge, but it is no less convenient than defer.
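To see the closing behavior without touching real files, here is a small sketch. The Tracked class and the names in it are hypothetical stand-ins for the streams; all they do is record when they are opened and closed.

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesDemo {
    // Hypothetical resource that logs its own lifecycle
    static class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Opens two resources, runs the body, and returns the recorded events
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Tracked hashFile = new Tracked("hashFile", log);
             Tracked quotesFile = new Tracked("quotesFile", log)) {
            log.add("body");
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : run()) {
            System.out.println(event);
        }
    }
}
```

The resources are closed in the reverse order of their declaration, just as stacked defer calls run last-in, first-out in Go: the log ends with "close quotesFile" followed by "close hashFile".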



Hash comparison


The conversion of a byte array into a string deserves a separate mention.

 Hex.encodeHexString(hash); 


There is no such method in the standard library; to stay identical to the original, I used the Apache Commons Codec library. But you could write the one method yourself.

    static String encodeHexString(byte[] a) {
        StringBuilder sb = new StringBuilder();
        for (byte b : a) {
            // mask to 0..255 so that negative bytes print as two hex digits
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }


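In fact, it is not even needed here: it does not matter much which encoding is used to turn the byte array into a string, it could be UTF-16, for example. (Strictly speaking, decoding replaces byte sequences that are invalid in the chosen charset, such as unpaired UTF-16 surrogates, so in rare cases two different hashes could map to the same string; a single-byte charset such as ISO-8859-1 avoids that.)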

 new String(hash, "UTF16"); 


Or you may not need to encode anything at all; all you need is a Comparator that compares arrays. For example, this one.

    static Set<byte[]> hashes = new TreeSet<>(new Comparator<byte[]>() {
        public int compare(byte[] a1, byte[] a2) {
            // shorter arrays sort first; equal lengths are compared byte by byte
            int result = a1.length - a2.length;
            if (result == 0) {
                for (int i = 0; i < a1.length; i++) {
                    result = a1[i] - a2[i];
                    if (result != 0) break;
                }
            }
            return result;
        }
    });
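Why a comparator is needed at all may not be obvious, so here is a short sketch. The class name ByteArraySetDemo and the constant BY_CONTENT are hypothetical; the comparator follows the same idea as the one above. Arrays in Java inherit identity-based equals and hashCode from Object, so a HashSet does not recognize two arrays with the same content as equal, while a TreeSet with a content-based comparator does.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class ByteArraySetDemo {
    // Orders arrays by length first, then byte by byte
    static final Comparator<byte[]> BY_CONTENT = new Comparator<byte[]>() {
        @Override
        public int compare(byte[] a1, byte[] a2) {
            int result = a1.length - a2.length;
            for (int i = 0; result == 0 && i < a1.length; i++) {
                result = a1[i] - a2[i];
            }
            return result;
        }
    };

    public static void main(String[] args) {
        byte[] first = {1, 2, 3};
        byte[] second = {1, 2, 3}; // same content, different object

        Set<byte[]> hashSet = new HashSet<>();
        hashSet.add(first);
        System.out.println(hashSet.contains(second)); // false: compared by identity

        Set<byte[]> treeSet = new TreeSet<>(BY_CONTENT);
        treeSet.add(first);
        System.out.println(treeSet.contains(second)); // true: compared by content
        System.out.println(treeSet.add(second));      // false: rejected as a duplicate
    }
}
```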




Auxiliary threads


The main thread is already busy processing the quote queue, so the notifications have to run in threads of their own. Apart from this, there are almost no differences in the code.

Program shutdown is handled with a shutdown hook.
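The loop that the main thread runs over the queue is not shown in this article, so here is a rough sketch of how it might look. Everything in it is an assumption made for illustration: the class name DedupLoopDemo, the helpers md5Hex and consume, the choice of MD5, and in particular the rule that the duplicate counter resets whenever a new quote arrives.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DedupLoopDemo {
    // Hex-encoded MD5 of the quote text, used as the key for duplicate detection
    static String md5Hex(String text) throws NoSuchAlgorithmException {
        byte[] hash = MessageDigest.getInstance("MD5")
                .digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        for (byte b : hash) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Takes quotes until dupToStop duplicates in a row have been seen;
    // returns the number of unique quotes collected.
    static int consume(BlockingQueue<String> queue, int dupToStop)
            throws InterruptedException, NoSuchAlgorithmException {
        Set<String> hashes = new HashSet<>();
        int dupCount = 0;
        while (dupCount < dupToStop) {
            String quote = queue.take(); // blocks until a grabber delivers a quote
            if (hashes.add(md5Hex(quote))) {
                dupCount = 0;            // new quote: reset the counter (assumption)
            } else {
                dupCount++;              // already seen
            }
        }
        return hashes.size();
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);
        for (String q : new String[] {"a", "b", "a", "c", "c", "c"}) {
            queue.put(q);
        }
        System.out.println(consume(queue, 2)); // 3
    }
}
```

In the real program the new quotes would also be appended to the quotes file and their hashes to the hash file at the point where the counter is reset.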

    Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run() {
            System.out.println("Shutting down. Total quotes: " + hashes.size());
        }
    });


We take the timer from Swing.

    new Timer(commandLine.reportPeriod * 1000, new ActionListener() {
        @Override
        public void actionPerformed(ActionEvent arg0) {
            System.out.printf("Total %d / duplicates %d (%d quotes/sec) \n",
                    hashes.size(), dupCount, quotesCount / commandLine.reportPeriod);
            quotesCount = 0; // start counting the next period
        }
    }).start();


To give the timer access to dupCount and quotesCount, they had to be moved out of the method into class fields, but this did not affect how they are used.
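One caveat: the timer callback runs on a different thread than the main loop, so plain int fields carry no visibility guarantees between the two. A possible alternative, which is not what the code in this article does, is to keep the counter in an AtomicInteger and schedule the report with a ScheduledExecutorService instead of the Swing timer. The class name ReportDemo, the report helper, and the wording of the message are all assumptions made for this sketch.

```java
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReportDemo {
    // Quotes saved since the previous report; incremented by the main loop
    static final AtomicInteger quotesCount = new AtomicInteger();

    // Builds the report line and resets the per-period counter in one atomic step
    static String report(int total, int dups, int periodSeconds) {
        int sincePrevious = quotesCount.getAndSet(0);
        return String.format(Locale.ROOT, "Total %d / duplicates %d (%d quotes/sec)",
                total, dups, sincePrevious / periodSeconds);
    }

    public static void main(String[] args) throws InterruptedException {
        final int period = 1; // seconds; the article's default is 10
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                System.out.println(report(0, 0, period));
            }
        }, period, period, TimeUnit.SECONDS);

        // Stand-in for the main loop: pretend to save a quote every 100 ms
        for (int i = 0; i < 30; i++) {
            quotesCount.incrementAndGet();
            Thread.sleep(100);
        }
        timer.shutdownNow();
    }
}
```

getAndSet(0) reads and resets the counter in a single operation, so no increment is lost between printing the report and clearing the value.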



Find the full code here:

http://pastebin.com/pLLVxTXZ



Conclusion


Interestingly, the two programs turned out to be about the same length in lines. Readability, in my opinion, is also similar, but that can only be judged from the outside. Some things are more convenient in one language, others in the other, and I cannot single out either language as the clear winner. But this is a fairly small entry-level application, and it would be interesting to compare the languages and approaches in large-scale enterprise solutions.



Thank you for your attention.

Source: https://habr.com/ru/post/197926/


