
Parsing an address string (street [house]) using Golang and Postgis

Hi, %habrauser%.
The other day I ran into an interesting task: the user enters a string that may be a street with a house number, just a street, or neither, and we need to work out whether they meant a street with a house and respond accordingly.
- It would seem nothing could be simpler: split the string on a space and enjoy, thought Stirlitz.
- And what about Pavel Korchagin Street? whispered the bird Oblomingo.
- Hmm, well, the house number is probably a number, said Stirlitz.
- Sure, "building 1" is a fine number.
- Hmm, so we will have to reinvent the bicycle.


And so Stirlitz unholstered his trusty Golang gun and loaded it with Postgis...


So, here is what we have: a random input device called the user, some string, and an urgent need to perform a certain action depending on whether the user entered a street with a house.

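import (
	"strings"
	"unicode/utf8"
	// plus the project's own models package (its import path is not shown in the article)
)

const MARK_STEP = 20

// AnalyzeString scores how likely str is "street + house number" and
// returns the score together with the candidate street and house parts.
func AnalyzeString(str string) (result int, street, house string) {
	result = 100
	LastSpace := strings.LastIndex(str, " ")
	if LastSpace < 1 {
		// Nothing to split on: treat the whole input as a street.
		result = 0
		street = str
		return result, street, house
	}
	// Everything before the last space is the street, the rest is the house.
	street = str[:LastSpace]
	house = str[LastSpace+1:]
	// Compare lengths in characters: LastIndex returns a byte offset, which
	// must not be mixed with a rune count when the input is Cyrillic.
	if utf8.RuneCountInString(house) >= 6 {
		result -= MARK_STEP
	} else {
		result += MARK_STEP
	}
	// Does the street part look like a known street?
	if models.StreetCount(street) > 0 {
		result += MARK_STEP * 2
	} else {
		result -= MARK_STEP * 2
	}
	// Is the whole input itself a known street name?
	if models.StreetCount(str) > 0 {
		result -= MARK_STEP
	} else {
		result += MARK_STEP
	}
	// Is there a house with a similar number on a similar street?
	if models.HouseCount(street, house) > 0 {
		result += MARK_STEP
	} else {
		result -= MARK_STEP * 4
	}
	// Count digits versus all other characters in the house part.
	var int_count, char_count uint8
	for _, run := range house {
		if run >= '0' && run <= '9' {
			int_count++
		} else {
			char_count++
		}
	}
	switch {
	case char_count == 0:
		result += MARK_STEP * 3
	case int_count == 0:
		result -= MARK_STEP * 4
	case int_count == char_count:
		result += MARK_STEP
	case int_count > char_count:
		result += MARK_STEP * 2
	case char_count > int_count:
		result -= MARK_STEP
	}
	return result, street, house
}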


So, what is this function and what does it do?
The function takes the string entered by the user, analyzes it, and returns the "probability" (really a score) that it is a street with a house, together with the street part and the house part on their own.
If the probability is above 200, you can be sure the user meant a street with a house.
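
To see where the 200 cut-off sits relative to the scale, it helps to add up the best case by hand. The sketch below does that arithmetic and wraps the "more than 200" rule in a helper; the names markStep, streetWithHouseThreshold, bestCaseScore and isStreetWithHouse are hypothetical and not part of the original code.

```go
package main

// markStep mirrors MARK_STEP from the article.
const markStep = 20

// streetWithHouseThreshold is the cut-off quoted in the text (hypothetical name).
const streetWithHouseThreshold = 200

// bestCaseScore adds up every bonus AnalyzeString can award:
// start at 100, short house part (+20), street found (+40),
// whole input is not itself a street (+20), house found (+20),
// house part consists only of digits (+60).
func bestCaseScore() int {
	return 100 + markStep + 2*markStep + markStep + markStep + 3*markStep
}

// isStreetWithHouse applies the "more than 200" rule from the text.
func isStreetWithHouse(score int) bool {
	return score > streetWithHouseThreshold
}
```

With these numbers the best case is 260, so a single missed database lookup for the street (a swing of 80) already drops the score to 180, below the threshold.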
You probably noticed the calls to StreetCount(street) and HouseCount(street, house).
Behind them, basically, are two ordinary SQL queries

 rows, err := DB.Query("SELECT COUNT(*) FROM planet_osm_line WHERE highway <> '' AND name ILIKE $1 ", "%"+name+"%") 

and
 rows, err := DB.Query("SELECT COUNT(house.*) FROM planet_osm_polygon AS house WHERE \"addr:street\" ILIKE $1 AND \"addr:housenumber\" ILIKE $2", "%"+streetName+"%", "%"+houseNum+"%") 

respectively
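
The article shows only the query lines, not the functions around them. Below is a minimal sketch of how a StreetCount-like helper might look, assuming the DB handle is a *sql.DB from database/sql; streetCount and likePattern are hypothetical names. The sketch also escapes % and _ in the user's input, which the original queries do not do (backslash is PostgreSQL's default LIKE escape character).

```go
package main

import (
	"database/sql"
	"strings"
)

// likePattern wraps user input in % wildcards after escaping the
// characters that LIKE/ILIKE treat specially.
func likePattern(s string) string {
	r := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)
	return "%" + r.Replace(s) + "%"
}

// streetCount is a hypothetical stand-in for models.StreetCount.
// It assumes db is an open PostgreSQL connection with OSM data loaded.
func streetCount(db *sql.DB, name string) (int, error) {
	var n int
	err := db.QueryRow(
		"SELECT COUNT(*) FROM planet_osm_line WHERE highway <> '' AND name ILIKE $1",
		likePattern(name),
	).Scan(&n)
	return n, err
}
```

QueryRow with Scan fits a COUNT(*) query, since exactly one row comes back and there is no result set to close afterwards.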

So, now step by step.
The starting probability is 100. We split the string at the last space; if that fails (there is no space), then this is definitely not a street with a house.
If it succeeds, we look at how many characters remain after the space. If there are fewer than 6 (this number, like many others here, was picked by the method of Professor Tyka Sky Finger, that is, by pointing a finger at the sky), we raise the probability; if there are 6 or more, we lower it.

Having changed the probability one way or another, or even having left the function to go and cry to the garbage collector, we continue our epic (or we don't: Uncle Garbage does not like to let go).
Everything up to the last space we treat as the street, and everything after it as the house.
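
The split and the length check can be sketched separately from the scoring. One detail worth keeping in mind: strings.LastIndex returns a byte offset, and Cyrillic letters take two bytes each in UTF-8, so the "fewer than 6 characters" test below counts runes rather than bytes. The names splitAtLastSpace and houseLooksShort are hypothetical.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

// splitAtLastSpace cuts s at its last space. ok is false when there is
// no space to split on (or the only space is the first character).
func splitAtLastSpace(s string) (street, house string, ok bool) {
	i := strings.LastIndex(s, " ")
	if i < 1 {
		return s, "", false
	}
	return s[:i], s[i+1:], true
}

// houseLooksShort reports whether the house part has fewer than
// 6 characters, counted as runes rather than bytes.
func houseLooksShort(house string) bool {
	return utf8.RuneCountInString(house) < 6
}
```

For example, "корп1" is 5 characters but 9 bytes, so a byte-based check would wrongly treat it as too long for a house number.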
Then we ask Postgis how many streets look like what the user entered. If there are any, we raise the probability; if not, we lower it (no, we don't exit, because the street may simply not be in the database yet).
Next we search the database for streets matching the whole original string. If there are any, we should hold our horses and lower the probability; if not, we can gallop on.
We repeat the same operation for houses: we ask Postgis whether there are houses with similar numbers on similar streets.
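
The three database checks each move the score by a fixed amount. The sketch below isolates that arithmetic so it can be tried without a database: it takes the three counts as plain integers. lookupDelta is a hypothetical helper, not a function from the article.

```go
package main

// markStep mirrors MARK_STEP from the article.
const markStep = 20

// lookupDelta returns the total score change from the three lookups:
// streetHits - streets matching the part before the last space,
// wholeHits  - streets matching the entire input,
// houseHits  - houses matching the street and house parts together.
func lookupDelta(streetHits, wholeHits, houseHits int) int {
	delta := 0
	if streetHits > 0 {
		delta += 2 * markStep
	} else {
		delta -= 2 * markStep
	}
	if wholeHits > 0 {
		// The whole input is itself a street name, so be more cautious.
		delta -= markStep
	} else {
		delta += markStep
	}
	if houseHits > 0 {
		delta += markStep
	} else {
		delta -= 4 * markStep
	}
	return delta
}
```

The weights are asymmetric: a found house adds only 20, while a missing one costs 80, so the house lookup acts mostly as a penalty.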

That's it, we no longer need the database.
Now we break the house part into characters and count how many of them are digits and how many are letters or other characters. I don't think the corresponding switch-case construct needs explaining.
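
For those who would like it spelled out anyway, here is the same digit-versus-other-characters rule as a standalone sketch with a few sample inputs. houseShapeDelta is a hypothetical name; the weights are the ones from AnalyzeString.

```go
package main

// markStep mirrors MARK_STEP from the article.
const markStep = 20

// houseShapeDelta scores the house part by its mix of ASCII digits
// and all other characters.
func houseShapeDelta(house string) int {
	var digits, others int
	for _, r := range house {
		if r >= '0' && r <= '9' {
			digits++
		} else {
			others++
		}
	}
	switch {
	case others == 0:
		return 3 * markStep // only digits, e.g. "12"
	case digits == 0:
		return -4 * markStep // no digits at all, e.g. "Корчагина"
	case digits == others:
		return markStep // e.g. "1а"
	case digits > others:
		return 2 * markStep // e.g. "12к"
	default:
		return -markStep // more other characters than digits, e.g. "1аб"
	}
}
```

Note that an empty house part also falls into the first case, because it contains no non-digit characters.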

And in this simple manner Stirlitz completed headquarters' latest assignment, defeated the bird Oblomingo, and saved the dragon from the princess, standing in for Mario.
Thank you all, everyone is free to go, and I, Stirlitz, am off to read more of Habr.

Source: https://habr.com/ru/post/225481/

