
From satellite imagery to graphs (the SpaceNet Road Detector competition) - a top-10 finish and code (translation)

Hi, Habr! I present to you a translation of the original article.



This is Vegas with the provided annotations and the test dataset; the white squares are presumably the held-out validation set (the private leaderboard). Looks cool. Admittedly, this panorama is the best of all four cities, and that comes down to the data, but more on that below.


0. TLDR


Link to the competition and a detailed description.


A quick screenshot of the site, to save you the visit.


We provisionally finished in 9th place, but the position may change after the organizers' additional testing of submissions.


I also spent some time writing clean, readable PyTorch code and data generators. Feel free to use it for your own purposes (just leave an upvote). The code is as simple and modular as possible; read on for best practices in semantic segmentation.


In addition, we may also write a post about understanding and analyzing the Skeleton Network, which all the top finalists used to convert image masks into graphs.


The essence of the competition in one picture


1. Competition


As you may know from my previous posts, we sometimes take part in competitions ( 1 and 2 ) and have a specific set of criteria for choosing them: first, a decent level of competition (a heavily promoted contest with prizes usually attracts hordes of stackers and is not very interesting); second, our general interest in the topic; and third, the challenge.


This competition met all the criteria:



My short take on the current popular data science platforms:
  • Kaggle - a crowd of stackers and mostly business-oriented tasks, at least in 2017;
  • TopCoder - sometimes very interesting and challenging niche tasks, but overall the platform is scary and the demands on participants are high;
  • DrivenData - the smallest, but the best: interesting domains and excellent organizers who do their job without demanding too much from you.

2. A quick look at the initial data analysis and interesting edge cases




You can see that soil and asphalt have no sharp peaks on the reflectance chart, so I spent some time testing combinations of different channels.


You can see the input data in the image above, but there are a couple of interesting points and noteworthy edge cases that explain what is actually going on.



Histogram of the original 16-bit images



8-bit histogram, some information was lost



Asphalt roads versus unpaved



An example of paved roads together with messy intersections



Lane combinations - note a few 12-lane roads, at border checkpoints I suppose



The typical annotation in Paris is not very good.


3. Useful reading


Here is a list of the best materials on the topic that I came across during the competition.



Summarizing all the architectures and experiments gives something like this:




Unet + transfer learning



LinkNet



Some scientists still use MSE ...


4. Early versions of the architectures and problematic cases


Leaderboard scores ranged from 0 to almost 900k over roughly 900 test images, with each image contributing a score between 0 and 1. The final metric was computed as the average across cities, which was unfortunate, since Paris dragged the score down very significantly.



Distribution of per-tile scores for some of the best models. Paris has a lot of vegetation and suburban tiles with poor annotation, which hurts the score. Also, Shanghai and Khartoum show a larger gap between the two directions of the metric: it is much easier for the model to ensure that every predicted graph edge is real than to find ALL the edges that exist on the ground.


I tried a lot of ideas and architectures, and to my surprise the simplest and most naive model worked best, or at least as well as the others (all the other finalists used it too, with variations).


Loss function


  1. BCE + DICE / BCE + (1 - DICE) - performed more or less the same;
  2. Clipping the loss function performed terribly here;
  3. N*BCE + DICE and BCE + N*DICE did not work in my case. Then colleagues from the chat suggested monitoring not DICE but hard DICE (essentially the fraction of correctly predicted pixels), and 4*BCE + DICE works well, provided you pick the best epoch by hard DICE;
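As a sketch of the 4*BCE + DICE combination and the hard-DICE monitoring metric described above (the names `BCEDiceLoss` and `hard_dice`, and the exact smoothing constant, are my own assumptions), this might look as follows in PyTorch:

```python
import torch
import torch.nn as nn

class BCEDiceLoss(nn.Module):
    """Weighted BCE plus soft (1 - DICE), i.e. the 4*BCE + DICE recipe."""
    def __init__(self, bce_weight=4.0, eps=1e-7):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.bce_weight = bce_weight
        self.eps = eps

    def forward(self, logits, targets):
        probs = torch.sigmoid(logits)
        intersection = (probs * targets).sum()
        dice = (2 * intersection + self.eps) / (probs.sum() + targets.sum() + self.eps)
        return self.bce_weight * self.bce(logits, targets) + (1 - dice)

def hard_dice(logits, targets, threshold=0.5, eps=1e-7):
    """'Hard' DICE: binarize predictions first, then compute DICE -
    this is the metric to pick the best epoch by."""
    preds = (torch.sigmoid(logits) > threshold).float()
    intersection = (preds * targets).sum()
    return (2 * intersection + eps) / (preds.sum() + targets.sum() + eps)
```

The key point is that training optimizes the soft loss, while model selection uses the thresholded hard DICE.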

CNN


  1. LinkNet34 (ResNet34 encoder + decoder) was the best in both speed and accuracy;
  2. LinkNet50, LinkNeXt101 (ResNeXt + LinkNet), and VGG11-ResNet all performed about the same but required 2-4x more resources;
  3. All encoders were pretrained on ImageNet, naturally;
  4. For the 8-channel network I simply replaced the first layer; it behaved more or less like the 3-channel one (but consumed more CPU resources);
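Replacing the first layer for 8-channel input can be sketched like this (the helper name and the mean-initialization heuristic for the extra channels are my own, not necessarily what was done in the competition code):

```python
import torch
import torch.nn as nn

def adapt_first_conv(conv, in_channels=8):
    """Build a new first convolution for `in_channels`-channel imagery from a
    pretrained 3-channel conv: copy the RGB filters into the first three
    channels and initialize the rest with their mean."""
    new = nn.Conv2d(in_channels, conv.out_channels,
                    kernel_size=conv.kernel_size, stride=conv.stride,
                    padding=conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        new.weight[:, :3] = conv.weight
        new.weight[:, 3:] = conv.weight.mean(dim=1, keepdim=True)
    return new

# Usage with a torchvision encoder (assumed):
#   net = torchvision.models.resnet34(pretrained=True)
#   net.conv1 = adapt_first_conv(net.conv1, in_channels=8)
```

This keeps the ImageNet-pretrained weights useful instead of reinitializing the whole first layer.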

Preprocessing


  1. Mask binarization;
  2. I did an ablation analysis of all reasonable channel combinations - the best were vegetation, RGB, all 8 channels, and urban (see /src/presets.py for what they correspond to) - but the difference was minimal (3-5%);
  3. Naturally, full-resolution images performed better;
  4. 8-bit and 16-bit images also performed about the same;
  5. Image normalization - subtracting ImageNet's mean and dividing by its standard deviation gave a 2-3% reduction in the loss, but made little difference on the leaderboard;
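The ImageNet normalization mentioned in point 5 is the standard per-channel transform (a minimal sketch; the function name is mine):

```python
import numpy as np

# Standard ImageNet per-channel statistics, for images scaled to [0, 1]
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(img):
    """img: HxWx3 float32 array in [0, 1]; returns the normalized image."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD
```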

Masks


  1. Masks from the APLS repo. 10% of the masks were not created, because the organizers' code had no error handling and some tiles had no roads. At first I didn't bother including them in training, but in hindsight that was probably a mistake, although I couldn't verify it properly;
  2. Simply rasterizing the linestrings with skimage and adding the ~10% empty images reduced the loss, but the effect on the final result was unclear;
  3. The same goes for masks where the road width matched its real width - the loss seemed to converge better, but the graphs were no better. Hard DICE would probably have solved this;
  4. Models for unpaved roads worked much worse because of poor data;
  5. Multilayer masks did not work either (one layer for intersections, one for dirt roads, one for paved roads);
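Rasterizing road centerlines into a binary mask can be sketched as below (I use PIL here rather than the skimage route mentioned above, and the function name and fixed-width default are my own; a per-road real width would give the "wide" masks discussed later):

```python
import numpy as np
from PIL import Image, ImageDraw

def rasterize_roads(lines, size=1300, width=3):
    """Burn road centerlines into a binary mask.
    lines: list of polylines, each a list of (x, y) points."""
    mask = Image.new('L', (size, size), 0)
    draw = ImageDraw.Draw(mask)
    for line in lines:
        draw.line(line, fill=255, width=width)
    return (np.array(mask) > 0).astype(np.uint8)
```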

Metamodel


Siamese networks also failed to refine the predictions of models trained on "wide" and "narrow" masks. Interestingly, the predictions from "wide"-mask models looked much better, but the resulting graphs were worse.


Pipelines


  1. 3-channel and 8-channel images performed about the same;
  2. 90-degree rotations, horizontal and vertical flips, and small affine transformations reduced the loss by a further 15-20%;
  3. My best model ran at full resolution, but random 800x800 crops worked well too;
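The rotation-and-flip (D4) augmentations from point 2 can be sketched as follows, applied identically to the image and its mask (a minimal version; the function name is mine, and the affine transforms are omitted):

```python
import random
import numpy as np

def d4_augment(img, mask):
    """Random D4 augmentation: a 90-degree rotation plus optional
    horizontal/vertical flips, applied to image and mask together."""
    k = random.randint(0, 3)
    img, mask = np.rot90(img, k), np.rot90(mask, k)
    if random.random() < 0.5:
        img, mask = np.fliplr(img), np.fliplr(mask)
    if random.random() < 0.5:
        img, mask = np.flipud(img), np.flipud(mask)
    # copy() because rot90/flip return views with negative strides
    return img.copy(), mask.copy()
```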

Ensemble


3-fold cross-validation + 4x TTA gave no gain in my case.
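For reference, the 4x TTA I tried (which, again, did not help here) can be sketched as flip-averaging; the function name is my own:

```python
import torch

def predict_tta4(model, batch):
    """4x test-time augmentation: average sigmoid predictions over identity,
    horizontal flip, vertical flip, and both flips, un-flipping each
    prediction before averaging. batch: NCHW tensor."""
    preds = []
    for hflip in (False, True):
        for vflip in (False, True):
            x = batch
            if hflip:
                x = torch.flip(x, dims=[3])
            if vflip:
                x = torch.flip(x, dims=[2])
            y = torch.sigmoid(model(x))
            # undo the flips so predictions align with the original image
            if hflip:
                y = torch.flip(y, dims=[3])
            if vflip:
                y = torch.flip(y, dims=[2])
            preds.append(y)
    return torch.stack(preds).mean(dim=0)
```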


Other ideas that apparently gave my competitors a boost on the LB:


  1. LinkNet34 (LinkNet + ResNet) => Inception encoder: +20-30k (I wanted to try Inception-ResNet, but after a series of failures I gave up);
  2. Training at full resolution; at inference: downscale => predict => upscale => ensemble: +10-20k;
  3. Post-processing the graphs and extending them to the tile edges: +20-30k;
  4. RGB images (as opposed to MUL): +10-15k, because the original data were inconsistent;
  5. 4*BCE + 1*DICE, monitoring hard DICE as the metric: +10-30k.


The main flaw. Nobody solved this problem, as far as I know. Wide masks would probably have helped, but the graphs built from them were not very good.



The main reason wide masks fail: they produce polyps and octopuses =)



Same problem



Paris has poor annotation. Forested areas are another problematic case.



Multi-level roads - predictions of the wide versus the narrow model



The model invented a road across the parking lot =)



Sometimes PyTorch glitched and produced artifacts like these.


5. Graph construction and the main problematic cases


The key ingredient is the Skeleton Network plus adding graph edges for curved roads (thanks to Dmytro for the tip!). This one thing alone would likely guarantee a top-10 finish.



Without these steps, all the masks looked like this.


Alternative pipelines I tried


  1. Skeletonization + corner detection from skimage;
  2. A variation of the dilation approach from here;
  3. All of this produced some kind of edges, but the Skeleton Network was still much better.
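A minimal sketch of the first alternative (skeletonization plus junction detection), assuming skimage and scipy are available; the helper names and the neighbor-count heuristic are my own, and the full solution instead feeds the skeleton to the Skeleton Network to extract graph nodes and edges:

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.morphology import skeletonize

def mask_to_skeleton(mask, threshold=0.5):
    """Thin a (possibly soft) road mask to a 1-pixel-wide skeleton."""
    return skeletonize(mask > threshold)

def skeleton_junctions(skel):
    """Junction pixels: skeleton pixels with 3+ skeleton neighbors."""
    s = skel.astype(np.uint8)
    # 3x3 sum includes the center pixel, so subtract it out
    neighbors = convolve(s, np.ones((3, 3), np.uint8), mode='constant') - s
    return skel & (neighbors >= 3)
```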

Additional things that sometimes helped other participants




Visualization tools from the organizers


6. The final solution



Essentially a replacement for TLDR


7. A beautiful visualization - tiles from all cities stitched together


Following the tradition of the last competition, someone from the chat assembled full-size city images from the tiles using the geodata in the TIFF files.


Pictures

View them in high resolution by clicking on them.



Vegas



Paris



Shanghai



Khartoum




Source: https://habr.com/ru/post/349068/

