
Finding the best place in the world for a windmill

The story of how NASA, ESA, the Technical University of Denmark, neural networks, decision trees and other good people helped me find the best free hectare in the Far East, as well as in Africa, South America and other “so-so” places.



Backstory


About two years ago, or maybe three already, a program was announced to give away free hectares of land in the Russian Far East. A quick look at the map made it clear that picking the right hectare is not so easy, and that the best and most obvious plots near the cities would probably go, or had already gone, to the locals. It was probably at that moment that I got the idea of somehow automating the search for the best place.

Thinking further along these romantic lines, I decided that there was no need to limit myself to the Far East. There is plenty of land around the world that nobody needs right now, but that may change in about 50 years, when fossil fuels start to run out and people go looking for new sources of energy. So I started looking at renewable energy sources, and very quickly realized that the map of resources and territories where this new energy can be harvested will change a lot. If you find such places now, you can buy them in advance and get rich later. After some rough estimates, it seemed to me that this could easily be done in a couple of weekends... Looking back now, I understand that it took me about a year. I should note right away that at that time I knew little about energy, renewable sources, or machine learning. Below is a brief retelling of my year-long project.

Choosing a type of renewable energy source


Having settled on the idea, I quickly went to find out what kinds of renewable energy sources exist at all, and which of them delivers the most energy. Here is an incomplete list of the most common ones:

  1. solar radiation (solar power);
  2. wind energy (wind power);
  3. energy of rivers and watercourses (hydropower);
  4. tidal energy;
  5. wave energy;
  6. geothermal energy;
  7. dissipated thermal energy: heat of air, water, oceans, seas and reservoirs;
  8. biomass energy.

But how do you determine which one is the best and will beat all the others in the future? After reading a few more interesting articles from the magazines “Science and Life” and “Young Technician”, I arrived at the LCOE (levelized cost of electricity) method, which has a simple principle: smart people try to estimate the total cost of one kilowatt-hour of energy, taking into account construction, materials, maintenance, etc. Below is a picture with data from 2016 and some projections for 2022. I took a fresher picture from here; below it is a boring table from this document.


In general, I have a ton of these pictures for different countries, made by different organizations, and they all look about the same:

  1. Geothermal energy comes first.
  2. Hydroelectricity is next, but it depends strongly on the country.
  3. Wind is in third place.

I did not like geothermal and hydro because, in my opinion, the places where this energy can be extracted can be counted on one's fingers. Wind and sun are another matter, since you can put them on almost every roof and balcony. Solar turned out to be more expensive, and three years ago the difference was more than 30%, so I chose wind.
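To make the LCOE principle mentioned above a bit more concrete, here is a minimal sketch of the usual textbook form: discounted lifetime costs divided by discounted lifetime energy. The function name, its parameters and all the numbers are hypothetical and serve only as an illustration; they are not taken from the charts the article refers to.

```python
def lcoe(capex, annual_opex, annual_energy_kwh, lifetime_years, discount_rate):
    """Levelized cost of electricity, in currency units per kWh."""
    discounted_costs = capex  # construction cost is assumed to be paid up front
    discounted_energy = 0.0
    for year in range(1, lifetime_years + 1):
        factor = (1 + discount_rate) ** year
        discounted_costs += annual_opex / factor
        discounted_energy += annual_energy_kwh / factor
    return discounted_costs / discounted_energy


# Hypothetical numbers, chosen only to show the call
example = lcoe(capex=3_000_000, annual_opex=60_000,
               annual_energy_kwh=6_000_000, lifetime_years=20,
               discount_rate=0.07)
print(f"LCOE: {example:.4f} per kWh")
```

Real LCOE studies add more terms (fuel, financing, decommissioning), which is one reason the figures differ between organizations.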

By the way, already in the middle of the project I started running into documents with similar reasoning from US government bodies, namely NREL, the US Department of Energy and others, which made forecasts and placed bets on different energy sources in order to understand now how to modernize the country's energy system. For example, in one of these documents it all came down to a few scenarios: the share of wind energy will be either large or very large.

How I wanted to pull it off


The idea of how to pull it off was pretty simple and looked like this:

  1. Find places where windmills are located around the world.
  2. Collect information at these points:

    a. Wind speed.
    b. Direction.
    c. Temperature.
    d. Relief.
    e. What the local fishermen like for dinner.
    f. Etc.

  3. Feed this information to a machine learning model that would learn and find patterns: which parameters most influence a person's choice of a construction site.

  4. Give the trained model all the remaining points in the world, with the same information for each.

  5. Get a list of the points that are great for placing a windmill.

In graphic form, this plan, as it turned out later, looked like this well-known picture:


How it was in reality


The first stage was pretty easy. I simply exported all the wind turbine points from OpenStreetMap.

By the way, OSM is a real treasure trove of information about objects all over the world, complete with coordinates; it has almost everything. So, a note to data lovers: OSM is the coolest Big Data source.
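For illustration, picking wind turbines out of an OSM extract might look like the sketch below. It assumes the common OSM tagging convention `generator:source=wind`; the article does not say which tags were actually queried, and the XML fragment here is hand-written sample data, not a real extract.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written OSM XML fragment standing in for a real extract
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="55.70" lon="12.60">
    <tag k="power" v="generator"/>
    <tag k="generator:source" v="wind"/>
  </node>
  <node id="2" lat="53.20" lon="50.15">
    <tag k="power" v="generator"/>
    <tag k="generator:source" v="solar"/>
  </node>
  <node id="3" lat="30.05" lon="32.55">
    <tag k="generator:source" v="wind"/>
  </node>
</osm>"""


def wind_turbine_points(osm_xml):
    """Return (lat, lon) for every node tagged generator:source=wind."""
    root = ET.fromstring(osm_xml)
    points = []
    for node in root.iter("node"):
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if tags.get("generator:source") == "wind":
            points.append((float(node.get("lat")), float(node.get("lon"))))
    return points


turbines = wind_turbine_points(SAMPLE_OSM)
print(turbines)
```

A full planet dump is far too large to parse in memory like this, so a streaming tool is the realistic choice at that scale.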

It was not very difficult. At first I tried online tools, I think overpass-turbo.eu (a very cool thing, by the way), but that did not work out because of limits on the number of points and slow performance on large volumes of data. So I had to figure out the tools that extract data from a local OSM dump. The current dump can always be downloaded here; compressed, it takes about 40 GB. Data can be extracted from it with queries using the Osmosis utility. As a result, I had a dataset of 140 thousand points around the world with coordinates, plus a heatmap. It looked something like this:



All the problems began at the second stage, since I did not really understand what information I needed to collect. So I spent a couple of days reading about how wind turbines work and about recommendations for their placement, restrictions, etc. My notes even ended up with amusing diagrams about placement, wind gradients, wind roses and other useful terms.


As a result, I ended up with this list of parameters that, in my opinion, matter when choosing a site:

  1. Average annual wind speed (ideally 10-11 m/s).
  2. Wind direction (prevailing direction, wind rose).
  3. Minimum wind speed.
  4. Maximum wind speed.
  5. Power density.
  6. Average temperature.
  7. Average humidity.
  8. Mean pressure.
  9. Height above sea level.
  10. Distance to water.
  11. Height difference.
  12. Smoothness of the height differences.
  13. Maximum height difference within 5-10 km.
  14. Percentage of trees or plantations in the area (roughness).
  15. Distance to the nearest settlement.
  16. Distance to the nearest industrial facility.
  17. Average number of residents in the area.
  18. Distance to transport routes (road, sea, air).
  19. Distance to the electricity grid.
  20. Visual and noise nuisance.
  21. Protected areas: nature reserves and so on.
  22. Icing.
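Several of the parameters above are linked by simple physics. Wind power density is half the air density times the cube of the wind speed, and air density itself follows from pressure and temperature. The sketch below shows that relationship under the dry-air ideal gas assumption; the function names are made up for this example, and it ignores humidity.

```python
AIR_DENSITY_SEA_LEVEL = 1.225  # kg/m^3, standard atmosphere at 15 °C
DRY_AIR_GAS_CONSTANT = 287.05  # J/(kg*K), specific gas constant of dry air


def air_density(pressure_pa, temperature_c):
    """Dry-air density in kg/m^3 from the ideal gas law."""
    return pressure_pa / (DRY_AIR_GAS_CONSTANT * (temperature_c + 273.15))


def wind_power_density(speed_ms, density=AIR_DENSITY_SEA_LEVEL):
    """Kinetic power of the wind per square meter of swept area, in W/m^2."""
    return 0.5 * density * speed_ms ** 3


for speed in (5.0, 10.0):
    print(speed, "m/s ->", wind_power_density(speed), "W/m^2")
```

Because of the cube, doubling the wind speed gives eight times the power density, which is why average wind speed sits at the top of the list.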

Big data


WIND. Just as 90% of all big data projects fall apart at the stage of “all right, now let's look at this data you have been talking about so much”, mine cracked too. When I rushed off to look for wind speed data for Russia, I came across this:


And a dozen similar, equally useless pictures. Then I began to suspect that perhaps Russia really has no wind energy because we simply do not have enough strong wind, and somewhere at that moment Sechin's laughter rang out. But I clearly remember that the Samara region is nothing but steppe, and that as a child, going out for bread, I was very often blown back to the porch.

I kept looking for data on Russia and realized that what was available did not resemble data you could do anything useful with. So I moved on to foreign sources and immediately found excellent wind maps from Tier3 (Vaisala). At first glance the resolution was sufficient and the worldwide coverage was simply excellent. Then I found out that such data costs serious money, about $1000 per 10 square km (figures from three years ago). Failure, I thought.

After a week of mourning, I decided to write to Vaisala, Tier3 and other foreign consulting agencies that deal with wind and wind turbines, and ask for data. I thought that once I told them what a cool idea I had, they would let me download everything right away. Only one replied, from the Sander-Partner company. Sander himself gave me some advice and also links to what I needed: the data of the MERRA program, which is maintained by NASA. It is worth noting that it took me about a week of evenings to figure out what Reanalysis and WRF are, and to roughly understand what goes on there: collecting, aggregating, simulating and forecasting weather, winds and so on.

In short, humanity has collected a pile of weather data and drawn a pile of maps of average temperatures and wind speeds, but it was impossible to collect all this data at every point on the globe, so the blank spots were filled in with the results of weather simulations for past years, and this was called Reanalysis. For example, here is a site that visualizes such wind simulations, and this is how it looks:


This data was essentially a .csv file with a coarse coordinate grid of average wind speeds. From it I made this kind of map using QGIS, the coolest free package, and a grid interpolation method.


Then, with the help of QGIS, I pulled the wind speed from this map for each pair of coordinates. In effect, I got a map and a data layer for every pixel on it.
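The idea of turning a coarse grid into a value at any coordinate can be sketched with plain bilinear interpolation. This is only a stand-in: the article does not say which interpolation method was chosen in QGIS, and the grid layout, function name and sample values below are assumptions made for the example.

```python
def bilinear_sample(grid, lat0, lon0, step, lat, lon):
    """Interpolate a value at (lat, lon) from a regular grid.

    grid[i][j] holds the value at latitude lat0 + i * step and
    longitude lon0 + j * step.
    """
    y = (lat - lat0) / step
    x = (lon - lon0) / step
    # Clamp the cell index so points on the far edge still use the last cell
    i = min(max(int(y), 0), len(grid) - 2)
    j = min(max(int(x), 0), len(grid[0]) - 2)
    ty, tx = y - i, x - j
    lower = grid[i][j] * (1 - tx) + grid[i][j + 1] * tx
    upper = grid[i + 1][j] * (1 - tx) + grid[i + 1][j + 1] * tx
    return lower * (1 - ty) + upper * ty


# Made-up average wind speeds (m/s) on a 2 x 2 grid with a 1-degree step
coarse = [[4.0, 6.0],
          [8.0, 10.0]]
print(bilinear_sample(coarse, 50.0, 10.0, 1.0, 50.5, 10.5))
```

Points outside the grid are extrapolated from the nearest cell, so a real pipeline would want to reject them explicitly.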

Having figured out how to work with QGIS in a couple of weeks, I started building the same kind of maps for the other data sources and pulling out values by coordinates: temperature, humidity, pressure and so on. It should be noted that the data files themselves mostly came from NASA, NOAA, ESA, WorldClim, etc. All of them are freely available. I also used QGIS to calculate distances to the nearest cities, airports and other infrastructure. Each map for a single parameter took about 6-8 hours to compute. And if something went wrong, it had to be done again and again. My home computer whirred through the nights for a couple of weeks, but after that even the neighbors got tired of listening to its rattling cooler, and I moved to the cloud, where I spun up a small virtual machine for the calculations.
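The "distance to the nearest object" features can be illustrated with a great-circle distance and a brute-force search. This is a sketch of the concept rather than what QGIS does internally; the function names and coordinates are invented for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest(point, features):
    """Distance from point to the closest (lat, lon) in features."""
    return min(haversine_km(point[0], point[1], lat, lon)
               for lat, lon in features)


# Made-up coordinates: one candidate cell and two "cities"
cities = [(0.0, 1.0), (0.0, 5.0)]
print(round(distance_to_nearest((0.0, 0.0), cities), 1), "km")
```

Checking every point against every feature scales poorly, which fits the hours-per-map timings described above; a spatial index would be the usual remedy.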

A few months later I came across this site, made by the Department of Wind Energy in Denmark (DTU Wind Energy). It quickly became clear that their resolution was many times better than that of my map, so I wrote to them, and they were happy to export the data for the whole world for me, since through the site you can only get small extracts of territory. By the way, they also made this map by simulating the movement of wind layers with the WRF and WAsP models, and achieved a data resolution of 50-100 meters, whereas mine was about 1-10 km.

RELIEF. Remember I wrote that relief is very important? I decided to use this parameter as well, but it turned out that this was not easy either. First, I wrote a utility that downloaded data from the Google Elevation API. It did an excellent job and downloaded data for all my points around the world in 10 km increments in only about 12 hours. But I also had parameters for the smoothness of the relief, that is, the average height difference over the territory around a potential windmill site. So I needed data for the whole world with a step of 100-200 meters, from which I could then calculate the mean height difference.
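One possible reading of "mean height difference" is the average absolute difference between neighbouring elevation samples inside the window around a site. The sketch below implements that interpretation; the exact metric the author used is not given, so both functions and the sample heights are assumptions.

```python
def mean_height_difference(elevations):
    """Mean absolute height difference between neighbouring grid samples.

    elevations is a 2D list of heights in meters on a regular grid
    with at least two samples.
    """
    diffs = []
    rows, cols = len(elevations), len(elevations[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # neighbour in the next column
                diffs.append(abs(elevations[r][c + 1] - elevations[r][c]))
            if r + 1 < rows:  # neighbour in the next row
                diffs.append(abs(elevations[r + 1][c] - elevations[r][c]))
    return sum(diffs) / len(diffs)


def max_height_difference(elevations):
    """Difference between the highest and lowest sample in the window."""
    flat = [h for row in elevations for h in row]
    return max(flat) - min(flat)


# Made-up heights in meters
hilly = [[100, 120],
         [140, 200]]
print(mean_height_difference(hilly), max_height_difference(hilly))
```

With samples every 100-200 meters, a 10 km cell holds thousands of values, which is why the coarse 10 km download was not enough for this metric.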


Downloading the data needed to calculate the height differences from Google Elevation would have taken a couple of months. So I went looking for other options.

The first thing I found was the Wolfram cloud, which already had the necessary data. I just wrote a formula and it started computing on data from the Wolfram cloud. But failure awaited me there too: I ran into some limits that were not documented anywhere, and after an amusing exchange with the service's support, I went looking for yet another option.

Here again I was rescued by NASA data sources, namely the data from the space program SRTM (NASA Shuttle Radar Topography Mission Global). I honestly tried to download it from the site, but the data there was only available for small territories. Having plucked up my courage, I wrote a letter to NASA, and after about a week of correspondence they exported the data I needed, for which I thank them very much. The data, however, turned out to be in some tricky satellite binary format, which I spent probably a week digging through.
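The article does not name the binary format. If it was the .hgt layout commonly used for SRTM tiles (an assumption), each tile is a square grid of big-endian signed 16-bit heights in meters, with -32768 marking missing samples. A minimal decoder under that assumption, tested here on synthetic bytes rather than a real tile:

```python
import math
import struct

SRTM_VOID = -32768  # marker for missing samples in .hgt tiles


def parse_hgt(raw):
    """Decode raw .hgt bytes into rows of heights (None for voids)."""
    samples = len(raw) // 2
    side = math.isqrt(samples)
    if side * side * 2 != len(raw):
        raise ValueError("not a square grid of 16-bit samples")
    values = struct.unpack(f">{samples}h", raw)  # big-endian int16
    return [
        [None if v == SRTM_VOID else v
         for v in values[r * side:(r + 1) * side]]
        for r in range(side)
    ]


# Synthetic 3 x 3 "tile"; real tiles are 1201 or 3601 samples per side
demo_bytes = struct.pack(">9h", 10, 20, 30, 40, -32768, 60, 70, 80, 90)
demo_tile = parse_hgt(demo_bytes)
print(demo_tile)
```

Reading a real file would just mean passing the bytes of the opened file to `parse_hgt`.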

It all ended well, and I calculated the height-difference metrics I needed for the whole world in 10 km increments. By the way, I also made my own API service that returns the altitude above sea level for given coordinates and published it here: algorithmia.com/algorithms/Gaploid/Elevation. It runs on Azure Tables, where I cleverly packed the data and store it there for literally cents. By the way, somebody has even bought access to the API a couple of times, since it is cheaper than Google's.

TOTAL. After about 4 months of searching, cleaning and calculating in QGIS, I had a dataset that I could use in machine learning models. It contained about 20 different parameters in the following categories: Climate, Relief, Infrastructure, and Demand (consumers).

Machine learning and predictions


By that time I already had some knowledge and understanding of how machine learning algorithms work, but I really did not want to set up all those Pythons and Anacondas. So I used Microsoft's online service for dummies, Azure ML Studio, no SMS required. I was won over by the fact that it is free and everything can be done with the mouse in the browser. In theory, this is where the description of how I spent another month building the model, clustering data and so on should go. All this clustering was especially painful because QGIS took a very long time to do it on my old home PC. In the end, the experiment looks like this.



The total number of points that needed to be scored was about 1.5 million. Each such point is a 10 by 10 km square, and together they cover the whole world. I removed the cells that already have wind turbines within a radius of 100 km, as well as some other areas, and got a dataset of ~1,500,000 records. The model assessed the suitability of each such square on planet Earth. I mainly used neural networks and boosted decision trees. The metrics on the points where wind turbines already stand, compared with what my model predicted, came out as follows: Accuracy ~0.9; Precision ~0.9. That seems quite accurate to me, or else overfitting crept in somewhere. From this exercise I got:


In total, I found about 30,000 of the most suitable places (these are new places with no windmills within 100 km).
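For readers unfamiliar with the two metrics quoted above, here is a small sketch of how accuracy and precision are computed from true and predicted labels. The labels are invented for the example and have nothing to do with the real dataset.

```python
def accuracy_and_precision(actual, predicted):
    """Return (accuracy, precision) for binary labels (1 = suitable)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, precision


# Made-up labels: 1 means "a turbine stands here" / "model says suitable"
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(accuracy_and_precision(actual, predicted))
```

Both numbers only say something about overfitting when they are measured on points the model did not see during training.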

Result and Validation


Having obtained 30,000 points with new locations, I visualized them, and the result looks like a heatmap.



I made a small website using cartodb to visualize the map and published the entire world map at windcat.ch. I also calculated, for each point, the approximate energy production of one industrial-size wind turbine (50 m). The points here are colored by the amount of energy, not by the model's probability estimate. You can click on each point to see the model's “confidence” at that point; I called it Goodness.


I also tried to check the plausibility of all this by expert judgment.

Visual inspection: the model predicts points that stretch along the coast, which seems right, since there is a good, steady wind coming off the water surface.
Visual inspection: the clusters of points mostly coincide with places of good and excellent wind speed and density when compared with wind maps. For example, here are Egypt and China:



What's next


People sometimes write to me asking for more detailed maps of places or for explanations of some things on the map, but nothing more has come of it so far. Theoretically, the data could be recalculated not in 10 km increments but in 100-meter ones; the picture might then change dramatically, and the model might be able to predict not only the area but also a specific site. But that requires more computing power than I currently have. If you have any ideas for how to use this, I will be glad to hear them.

Source: https://habr.com/ru/post/334804/

