
What you should not code yourself

Recently I reinvented the wheel and posted it on Habr: “The Simplest Connection Pool without a DataSource in Java”. The article wasn't the most successful, so please don't downvote it any further. To avoid repeating such mistakes myself, and perhaps to warn others against them, I decided to translate “Seven Things You Should Never Code Yourself” by Andy Lester, an industry figure well known in the open-source community. Anyone interested, read on below the cut.

We programmers love to solve problems. We love it when ideas spark in our heads, flow down to our fingertips, and turn into great solutions.

But sometimes we jump in too quickly and start cranking out code without considering all the consequences. We forget that someone may have already solved this problem, and that code written, tested, and debugged by someone else may already be available. Sometimes we just need to stop and think before we start typing.

For example, if you run into one of these seven programming tasks, you are almost always better off looking for an existing solution than implementing it yourself:

1. Parsing HTML or XML


The task whose difficulty is most often underestimated, judging at least by how often it comes up on Stack Overflow, is parsing HTML or XML. Extracting data from arbitrary HTML looks deceptively simple, but this problem really should be solved with a library. Say you want to extract the URL from a tag such as

<img src="foo.jpg"> 

It's easy enough to write a regular expression that matches the pattern:

 /<img src="(.+?)">/ 

The string “foo.jpg” ends up in the first capture group and can be assigned to a string variable. But will that code find the value in tags that have other attributes, like:

 <img id="bar" src="foo.jpg"> 

Once you change the code to handle that, will it work with a different kind of quotes:

 <img src='foo.jpg'> 

or with no quotes at all:

 <img src=foo.jpg> 

What if the tag spans several lines and is self-closing:

 <img id="bar"
      src="foo.jpg"
 /> 

And will your code know to ignore commented-out tags:

 <!-- <img src="foo.jpg"> --> 

By the time you've gone through yet another round of finding cases your code can't handle, fixing it, and testing it, you could have already picked up a suitable library and solved all your problems.

That's the moral of this example-filled story: you will spend far less time finding and learning an existing library than reinventing the wheel, only to keep extending your own code to handle cases you didn't think of when you started writing it.
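As a minimal sketch of the library approach, here is Python's standard html.parser module applied to all the variants above. The class name ImgSrcCollector is an illustrative choice, and the file names differ only so that the output shows which tag each value came from. The parser deals with extra attributes, quoting styles, line breaks, self-closing tags, and comments, so no regex has to.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes already stripped.
        # Self-closing tags (<img ... />) are routed here by the default handle_startendtag.
        if tag == "img":
            for name, value in attrs:
                if name == "src":
                    self.sources.append(value)


html = """
<img src="a.jpg">
<img id="bar" src="b.jpg">
<img src='c.jpg'>
<img src=d.jpg>
<img id="bar"
     src="e.jpg"
/>
<!-- <img src="ignored.jpg"> -->
"""

collector = ImgSrcCollector()
collector.feed(html)
collector.close()
print(collector.sources)  # ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg']
```

The commented-out tag never reaches handle_starttag, because the parser reports it through handle_comment instead.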

2. Parsing CSV and JSON


CSV files are deceptively simple, but fraught with danger. Files of comma-separated values are trivial to parse, aren't they?

# Id, name, city
1, Queen Elizabeth II, London

Sure, until you have to deal with commas inside double-quoted fields:

2, JR Ewing, "Dallas, Texas"

Once you've handled the double quotes, what happens when a field contains embedded quotes that have to be escaped:

3, "Larry \"Bud\" Melman", "New York, New York"

You can cope with that too, until you run into line breaks in the middle of a record.
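Here is a minimal sketch using Python's standard csv module on the sample rows above, plus a made-up fourth record that contains a line break. Two assumptions are worth stating. First, the sample escapes quotes with backslashes, which is not standard CSV (RFC 4180 doubles them instead, as in ""Bud""), so escapechar is set to match. Second, skipinitialspace is needed because of the space after each comma.

```python
import csv
import io

# Rows from the text, plus a hypothetical record with a newline inside a quoted field.
data = (
    '1, Queen Elizabeth II, London\n'
    '2, JR Ewing, "Dallas, Texas"\n'
    '3, "Larry \\"Bud\\" Melman", "New York, New York"\n'
    '4, Multi Line, "first line\nsecond line"\n'
)

reader = csv.reader(
    io.StringIO(data, newline=""),
    skipinitialspace=True,  # skip the space after each comma so the opening quote is recognized
    escapechar="\\",        # treat \" as a literal quote inside a quoted field
)
rows = list(reader)
for row in rows:
    print(row)
```

Every tricky case from the text comes back as a clean list of three fields, and the fourth record spans two physical lines but is still read as one row.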

JSON has all the same data-typing hazards as CSV, with the added complication that it can store nested, multi-level data structures.
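For comparison, here is a minimal sketch with Python's standard json module. The record is a made-up example that reuses names from the CSV sample. It shows the parser handling escaped quotes, a comma inside a string, nested objects and lists, and distinct value types: a number versus a numeric-looking string, true, and null.

```python
import json

# Hypothetical record: escaped quotes, a comma inside a value, nesting, and mixed types.
text = (
    '{"id": 3, "name": "Larry \\"Bud\\" Melman",'
    ' "address": {"city": "New York, New York", "zip": "10001"},'
    ' "tags": ["tv", "comedy"], "active": true, "manager": null}'
)

record = json.loads(text)

print(record["name"])             # Larry "Bud" Melman
print(record["address"]["city"])  # New York, New York
print(type(record["id"]).__name__, type(record["address"]["zip"]).__name__)  # int str
print(record["active"], record["manager"])  # True None

# Serializing and parsing again with the same library preserves the structure.
assert json.loads(json.dumps(record)) == record
```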

Save yourself the trouble and the inaccuracy. Any data that can't be handled by simply splitting a string on commas should be left to a library.

If reading structured data in an unstructured way is bad practice, modifying it in place is even worse. People often say something like “I want to change all the tags with such-and-such URLs so that they get a new attribute.” But even something as seemingly simple as “I want to change every Bob in the fifth column of this CSV to Steve” is dangerous because, as noted above, you won't be able to count the commas correctly. To do it right, you need to read the data into an internal structure with a proper library, modify that structure, and then write the changed data back out with the same library. Nothing puts data at risk of corruption like a structure that doesn't match your expectations.
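Here is a minimal sketch of that read-modify-write cycle with Python's csv module. The function name rename_in_column and the sample data are hypothetical, and column index 4 is the fifth column. A naive text replacement would also have turned Bobby into Steveby, and counting commas would have split "Dallas, Texas" in two.

```python
import csv
import io


def rename_in_column(csv_text, column, old, new):
    """Hypothetical helper: replace cells equal to `old` with `new` in one column.

    The csv module parses the input and re-serializes it, so quoting,
    embedded commas, and embedded newlines are preserved.
    """
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    for row in rows:
        if len(row) > column and row[column] == old:  # exact match only
            row[column] = new
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


data = (
    'id,city,note,team,name\n'
    '1,"Dallas, Texas","said ""hi""",red,Bob\n'
    '2,London,"two\nlines",blue,Bobby\n'
)
result = rename_in_column(data, 4, "Bob", "Steve")
print(result)
```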

3. Validating email addresses


There are two ways to validate an email address. You can do a simple check that says “there must be some characters before the @ sign and some characters after it,” which this regular expression implements:

 /.+@.+/ 

It is, of course, incomplete and lets invalid addresses through, but at least we know there's an @ sign in the middle.

Or you can check the address against the rules of RFC 822. Those rules cover every case that is rare but still valid, and no simple regular expression can handle them. You'll have to use a library written by someone else.

If you're not going to validate against RFC 822, then anything you do will rely on rules that may seem reasonable but may not be correct. That's a compromise, which can be fine, but don't fool yourself into thinking you've covered every case unless you've gone all the way to the RFC, or simply used a library written by someone else.

(For further discussion of validating email addresses, see Stack Overflow.)
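To make the compromise concrete, here is the naive pattern from above in Python, applied with re.fullmatch. The sample strings and the helper name looks_like_email are made up for illustration. The pattern rejects the most obvious mistakes but happily accepts strings that are not valid addresses.

```python
import re

NAIVE_EMAIL = re.compile(r".+@.+")  # the "something @ something" rule from the text


def looks_like_email(text):
    """Hypothetical helper: True if the whole string matches the naive pattern.

    This is NOT validation against the RFC; it only checks for an @ with text on both sides.
    """
    return NAIVE_EMAIL.fullmatch(text) is not None


samples = ["alice@example.com", "@example.com", "alice@", "a@b@c", "not an address@ at all"]
for s in samples:
    print(f"{s!r}: {looks_like_email(s)}")
```

The last two samples pass the check even though neither is a usable address, which is exactly the gap a real library closes.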

4. Working with URLs


URLs aren't as nasty as email addresses, but they are still full of annoying little rules you have to remember. Which characters need to be encoded? How do you handle spaces? What about + signs? Which characters may follow the # sign?

Whatever language you use, there is already code to split URLs into their components and to assemble URLs from properly encoded components.
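Here is a minimal sketch in Python: the standard urllib.parse module covers both directions. The example URL and query value are made up. Note that a space is encoded as + in a form-style query string but as %20 in a path, which is exactly the kind of rule that is easy to get wrong by hand.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit, urlunsplit

# Splitting a URL into components.
parts = urlsplit("https://example.com:8080/search?q=a+b%2Bc#results")
print(parts.scheme, parts.hostname, parts.port, parts.path, parts.fragment)
# https example.com 8080 /search results
print(parse_qs(parts.query))  # {'q': ['a b+c']}

# Building a URL: urlencode applies form-style encoding (space -> '+', '+' -> '%2B').
query = urlencode({"q": "a b+c"})
url = urlunsplit(("https", "example.com", "/search", query, ""))
print(url)  # https://example.com/search?q=a+b%2Bc

# In a path segment, quote() encodes a space as %20 instead.
print(quote("my file+v2.txt"))  # my%20file%2Bv2.txt
```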

5. Working with dates and times


Date/time manipulation is a big problem domain, and you almost certainly won't be able to cover every aspect of it on your own. Handling dates and times means accounting for time zones, daylight saving time, leap years, and even leap seconds. The contiguous United States has only four time zones, each an hour apart, but the rest of the world isn't so simple.

Whether it's date arithmetic as simple as working out the date three days after a given date, or validating that an input string matches a date format, use existing libraries.
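Here is a minimal sketch of both tasks with Python's standard datetime module. The helper is_valid_date is hypothetical and the dates are arbitrary. Time zones with daylight saving rules would also need time zone data, for example from the standard zoneinfo module, which relies on the system database or the tzdata package. They are left out to keep the sketch self-contained.

```python
from datetime import date, datetime, timedelta

# "Three days after a given date": leap years are handled for you.
print(date(2024, 2, 27) + timedelta(days=3))  # 2024-03-01 (2024 is a leap year)
print(date(2023, 2, 27) + timedelta(days=3))  # 2023-03-02


def is_valid_date(text, fmt="%Y-%m-%d"):
    """Hypothetical helper: True if text parses as a real calendar date in fmt."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


print(is_valid_date("2024-02-29"))  # True
print(is_valid_date("2023-02-29"))  # False: no February 29 in 2023
print(is_valid_date("2024-13-01"))  # False: there is no month 13
```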

6. Template systems


It's almost a rite of passage. A junior programmer has to generate lots of boilerplate text and comes up with a simple format like:

Dear #user#,
Thank you for your interest in #product#...

This format works for a while, but then come the requests for multiple output formats, number formatting, rendering structured data as tables, and so on, until you have a monster that needs endless care and feeding.

If you're doing anything more complicated than simple string-for-string substitution, take a step back and find a good template library. Things are even simpler if you write in PHP: the language itself is a template system (although that's often forgotten these days).
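Even the simple string-for-string case is already covered in Python by the standard string.Template class, shown here as a minimal sketch. The field names and values are made up. It handles escaping the placeholder character ($$) and fails loudly on missing values. For loops, number formatting, and tables, a full template engine such as Jinja2 is the usual next step.

```python
from string import Template

# The "Dear #user#" idea, using the standard library's $-placeholders instead.
# "$$" is an escaped literal dollar sign, so "$$$price" renders as "$" followed by the price.
letter = Template("Dear $user,\nThank you for your interest in $product. Price: $$$price")

print(letter.substitute(user="Alice", product="Widget Pro", price="19.99"))

# A missing value raises KeyError instead of silently leaving a placeholder behind.
try:
    letter.substitute(user="Alice", product="Widget Pro")
except KeyError as missing:
    print("missing value:", missing)
```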

7. Logging frameworks


Logging tools are another example of projects that start small and grow into monsters. A little function for writing to a log file soon needs to log to several files, or send an email when the process finishes, or support log levels, and so on. Whatever language you use, there are at least three logging packages that have been around for years and will spare you all of this.
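Here is a minimal sketch with Python's standard logging module: one logger, two destinations, and a different level threshold for each. The logger name "billing" and the messages are hypothetical. In a real program the two handlers might be logging.FileHandler and logging.handlers.SMTPHandler. In-memory streams are used here only to keep the sketch self-contained.

```python
import io
import logging

everything = io.StringIO()     # stands in for a detailed log file
problems_only = io.StringIO()  # stands in for an alerts file or an email handler

logger = logging.getLogger("billing")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # don't also pass records up to the root logger

# Each handler has its own threshold: DEBUG and above vs. WARNING and above.
for stream, level in ((everything, logging.DEBUG), (problems_only, logging.WARNING)):
    handler = logging.StreamHandler(stream)
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)

logger.debug("invoice %d loaded", 42)
logger.warning("invoice %d is overdue", 42)

print(everything.getvalue())
print(problems_only.getvalue())
```

Adding another destination or changing a threshold is a configuration change, not a rewrite of your own logging function.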

Isn't a library overkill?


Before you dismiss the idea of pulling in a third-party module, take a hard look at your objections. The first one is usually: “Why do I need a whole library just to do this (validate this date / parse this HTML / etc.)?” My answer: “What's the harm?” After all, you're not writing microcontroller code for a toaster, where you have to squeeze every byte out of the code space.

If you have performance constraints, keep in mind that avoiding a library may be premature optimization. Loading a whole date/time library might make validation ten times slower than your quick-and-dirty solution, but take a look at your own code first: is it really that good?

We programmers are proud of our skills, and we love the process of creating code. That's fine. Just remember that your job as a programmer is not just to write code but to solve problems, and often the best way to solve a problem is to write as little code as possible.

Translator's Note:
By the way, the last paragraph echoes the main idea of the article “How to improve your programming style?” very nicely.
UPD1. A list of tools for the major programming languages, grouped by category: awesome-awesomeness (link provided by hell0w0rd in the comments, many thanks to him).

Source: https://habr.com/ru/post/230737/

