
How we helped digitize the age-old history of weather observations in Brazil

image



Since 1909, scientists at Brazil's National Institute of Meteorology have been recording weather conditions and climate change across the country. Researchers analyze these data and build forecasts from them. Over a hundred years, experts have collected more than 3 million pages of weather records: in hot Rio de Janeiro, at the roaring Iguazu Falls, in the gloomy forests of the Amazon and in misty São Paulo. But all of this information was stored on paper. The volume grew every year, and the old records fell into disrepair. It became increasingly difficult for researchers to work with the documents.



Today we will talk about how the National Institute of Meteorology of Brazil used our ABBYY FlexiCapture Engine technology to digitize the archive of weather observations that scientists have collected for more than 100 years.



"But in sunny Brazil, my Brazil ..."



Brazil is the largest country in South America. It spans three climate types: equatorial, subtropical and tropical. Almost every sector of the Brazilian economy, and agriculture above all, depends on the weather. It is therefore important for specialists to analyze and accurately predict possible changes in weather conditions. Forecasts are also needed to keep aircraft, pilots and passengers safe, to protect ships and sailors, to organize fisheries properly and to develop tourism.


Weather history helps anticipate possible climate change and informs decisions on adjusting the country's agricultural and industrial policy. The National Institute of Meteorology of Brazil (INMET, Instituto Nacional de Meteorologia), founded in 1909, has been doing this work for over a hundred years. It reports to the Ministry of Agriculture and Livestock. Since the beginning of the 20th century, the institute has painstakingly collected data on precipitation, wind, relative humidity, pressure and more. For decades, experts have written this information down day after day in observation diaries, and this material is of great scientific value. The photo shows a weather observation diary from one of the cities of Amazonas in July 1961.



image



image



Until recently, these precious documents were stored on paper. The archives were scattered across different cities of Brazil: Rio de Janeiro, São Paulo, Manaus, Belém, Salvador, Porto Alegre, Cuiabá, Goiânia, Recife, Belo Horizonte and Brasília. As a result, it was almost impossible to analyze the documents or work with them.



image



In addition, the books and notebooks lay in warehouses that lacked suitable conditions for preserving historical documents. Three of these cities have an especially hot and humid climate. Manaus and Belém, for example, lie on the Amazon in the middle of tropical forests, where it is hot and humid all year round. Beyond Cuiabá stretches the Pantanal, the largest tropical wetland on the planet. Humid air and an abundance of insect pests damaged the paper, and the institute risked losing some of its valuable records. Some of these observations were made as far back as the 19th century, when Brazil was an empire:



image



In the early 2010s, INMET decided to digitize the entire archive of weather observations: notebooks, books and even microfilm. That amounts to 3 million pages, or 4 billion characters. First, however, all the records stored in different cities had to be brought together and organized.



image



In 2011, institute employees transported the documents to Brasília and placed them in a new archive in the INMET building. The storage area covers 1,500 square meters. After that, the institute's experts began to process and restore the records, which had not always been kept in good conditions:



image



The final step in creating the large paper archive was cataloging all the records, which made it easy to find the right weather observation diary in the repository. Now the documents could be digitized.



image



On to digitization



In 2012, the institute began working with the Brazilian company Flexdoc, which develops software for processing and storing documents. To convert the weather observations into electronic form, Flexdoc at first did not use optical character recognition (OCR) technology but relied on grueling “manual OCR”. The company developed templates and specified which data from the scanned documents should be entered into the system. Flexdoc then sent the scans to a group of hardworking operators in India, who received the images and manually keyed in the data according to the template.



The archive contains more than 20 types of brochures with weather data. Each has at least 6 types of pages, and some of the pages contain more than 150 fields. To make the verifiers' work much easier, Flexdoc began using ABBYY FlexiCapture Engine in 2014 to digitize the archive.



12 scanners and one program



First, Flexdoc employees scanned the pages of the weather observation diaries, using 12 ATIZ BookDrive PR and Plustek OpticPro A360 scanners.



image



They digitized documents in A4 and A3 formats, as well as in non-standard formats:



image



image



Employees of the IT company, and then INMET experts, checked the quality of the scanned images. The scans were then imported into a system based on ABBYY FlexiCapture Engine. Flexdoc employees supplied document processing templates created in ABBYY FlexiCapture, and ABBYY OCR technologies helped match and overlay the templates on the documents, find the required fields and extract the data. In worn documents and handwritten records, OCR could not always recognize every field; in those cases, Flexdoc employees digitized the fields manually.
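The fallback from automatic recognition to manual entry can be pictured with a minimal sketch. It does not use the ABBYY FlexiCapture Engine API: the `FieldResult` class, the `confidence` score, the field names and the 0.8 threshold are all hypothetical and serve only to illustrate how unrecognized fields might be separated from accepted ones.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FieldResult:
    """One template field after recognition (hypothetical structure)."""
    name: str
    value: Optional[str]  # None when nothing could be recognized
    confidence: float     # assumed score between 0.0 and 1.0


def split_for_manual_entry(
    fields: List[FieldResult], threshold: float = 0.8
) -> Tuple[List[FieldResult], List[FieldResult]]:
    """Separate fields that can be accepted from those needing manual keying."""
    accepted, manual = [], []
    for field in fields:
        # Empty or low-confidence fields go to a human operator.
        if field.value is None or field.confidence < threshold:
            manual.append(field)
        else:
            accepted.append(field)
    return accepted, manual


# Illustrative page with three fields; the values are made up.
page = [
    FieldResult("pressure_hpa", "1012.4", 0.97),
    FieldResult("rainfall_mm", None, 0.0),
    FieldResult("humidity_pct", "8?", 0.41),
]
accepted, manual = split_for_manual_entry(page)
print([f.name for f in accepted], [f.name for f in manual])
```

In a sketch like this, the threshold is the main tuning knob: raising it sends more fields to operators but lets fewer recognition errors through.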



image



Next, the information was checked by 85 verifiers, specialists from the Brazilian company. Two institute employees helped them: the meteorologists had to make sure that the climate readings were within the normal range for the region. Only then did the data enter the INMET information system.
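A range check of this kind can be sketched in a few lines. This is only an illustration: the `NORMAL_RANGES` table, the variable names and the limits in it are invented for the example and are not INMET's actual thresholds or procedure.

```python
from typing import Dict, List

# Hypothetical plausibility limits per (region, variable).
# These numbers are illustrative only, not real INMET thresholds.
NORMAL_RANGES = {
    ("Manaus", "temperature_c"): (15.0, 42.0),
    ("Manaus", "relative_humidity_pct"): (30.0, 100.0),
    ("Porto Alegre", "temperature_c"): (-5.0, 42.0),
}


def out_of_range(region: str, readings: Dict[str, float]) -> List[str]:
    """Return the names of readings outside the region's assumed range."""
    flagged = []
    for name, value in readings.items():
        limits = NORMAL_RANGES.get((region, name))
        if limits is None:
            continue  # no range known for this variable, nothing to check
        low, high = limits
        if not (low <= value <= high):
            flagged.append(name)
    return flagged


# A digitized entry with a suspicious temperature (made-up values).
entry = {"temperature_c": 55.0, "relative_humidity_pct": 80.0}
print(out_of_range("Manaus", entry))
```

A flagged value is not necessarily wrong; in a workflow like the one described, it would simply be passed to a meteorologist for a closer look at the original page.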



The record processing scheme looks like this:



image



To digitize the microfilm, Flexdoc used a Kodak ABR 2400 \ 3000 DSV scanner. It splits the film into individual images, extracts them and saves them in TIFF format on the hard disk.



Some statistics



Full digitization of the archive of weather observations collected over 100 years took three years. All the historical data is now stored not only on paper in the archive but also on a large, high-performance, fault-tolerant SGI Altix 4700 server with a performance of 870 gigaflops.



Digital versions of the weather diaries are available to everyone on the INMET website. To view the data, you only need to register. For example, the result of a climate data request for the municipality of Arcoverde for January-December 1990 looks like this:



image



The information is used primarily by INMET researchers, students and companies that need to analyze climatic conditions in different regions of Brazil. INMET's historical data has already become the basis for analytical models of climate evolution and weather prediction, which the institute's scientists are building.



Elizaveta Titarenko

ABBYY Corporate Blog Editor

Source: https://habr.com/ru/post/349130/


