Hi, Habr. I want to share some good news: we have finished parsing the proprietary RAW formats of Canon and Nikon cameras for our Pics.io service. For those who don't know, the main idea of Pics.io is to let people work with RAW photos right in the browser, without installing any programs, plug-ins or extensions: a true zero footprint.
When we started, we understood that in the coming years digital photography would move to the “cloud”. We knew that the mobility trend would grow and that cloud storage prices would fall. At that point the web lacked only one piece of the puzzle: adequate image processing. There were plenty of online editors, most of them written in Flash, but they could not satisfy photographers because of a number of limitations: they worked only with 8-bit JPEG and PNG and had file size limits. We decided to build an editor with RAW support.
At that time, we had several prototypes working with DNG, which proved that all of this can be done in JavaScript and WebGL. Unfortunately, we could not make everyone in the world convert their photos to DNG. Even Adobe didn't succeed at that. We understood that support for “native” formats was needed, and a few months ago we took on the most common proprietary formats, those from Canon and Nikon.
Parsing DNG and proprietary RAW formats
Compared to CR2 or NEF, DNG has quite a few advantages, from the openness of the format and the ability to embed XMP to a more efficient way of storing data and metadata inside the DNG container. We have already written about the differences and features of these formats in our blog, there was a post on Habr, and there is plenty of other information if you search for it. Here we will focus on the technical details hidden from the ordinary user.
Most RAW formats (including CR2, NEF and DNG) are based on TIFF, which is a tag-based format. Since TIFF allows itself to be extended with private tags, Canon and Nikon make heavy use of this, writing a lot of necessary information into their own tags in their own formats. Why camera manufacturers do this remains a mystery to me; if anyone has ideas, please share them in the comments.
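To illustrate what "tag-based" means in practice, here is a minimal sketch of reading the tags of the first IFD (image file directory) of a TIFF-structured file. It handles only a few field types and ignores sub-IFDs and the MakerNote, which is exactly where Canon and Nikon keep much of their private data. The function name `parse_first_ifd` is hypothetical.

```python
import struct

# TIFF field type -> (struct format code, size in bytes); only common types are covered.
TYPE_FORMATS = {
    1: ("B", 1),  # BYTE
    2: ("s", 1),  # ASCII
    3: ("H", 2),  # SHORT
    4: ("I", 4),  # LONG
    7: ("B", 1),  # UNDEFINED
}


def parse_first_ifd(data: bytes) -> dict:
    """Return {tag: value} for the first IFD of a TIFF-structured buffer."""
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel")
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF byte-order mark")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("bad TIFF magic number")

    (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    tags = {}
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        tag, typ, n = struct.unpack_from(endian + "HHI", data, entry)
        if typ not in TYPE_FORMATS:
            continue  # skip field types this sketch does not handle
        code, size = TYPE_FORMATS[typ]
        # Values that fit into 4 bytes are stored inline; larger ones via an offset.
        if size * n <= 4:
            value_pos = entry + 8
        else:
            (value_pos,) = struct.unpack_from(endian + "I", data, entry + 8)
        if typ == 2:
            value = data[value_pos:value_pos + n].rstrip(b"\x00").decode("ascii")
        else:
            values = struct.unpack_from(f"{endian}{n}{code}", data, value_pos)
            value = values[0] if n == 1 else values
        tags[tag] = value
    return tags
```

Standard tags such as ImageWidth (256) or Make (271) come out of this directly; a manufacturer's private tags appear as raw numbers whose meaning has to be reverse-engineered.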
Essentially, parsing any RAW file consists of two main steps: decompressing the packed data (stored as lossless JPEG in CR2 and DNG), which yields the “raw” image captured by the camera sensor, and demosaicing (also called debayering), which is needed to restore color information, because each photosite of the sensor records only brightness behind a single color filter, not full color.
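What "brightness instead of color" means can be shown by simulating a sensor: from a full-color image, keep only the one channel each photosite would actually measure. This sketch assumes the common RGGB Bayer layout; the real layout varies between cameras and is recorded in the file's metadata. The helper names are hypothetical.

```python
def bayer_color(row: int, col: int, pattern: str = "RGGB") -> str:
    """Color filter over photosite (row, col) for a repeating 2x2 pattern."""
    return pattern[(row % 2) * 2 + (col % 2)]


def mosaic(rgb):
    """Keep only the channel each photosite measures.

    rgb is a list of rows, each row a list of (r, g, b) tuples.
    Returns a single-channel image (list of rows of numbers).
    """
    channel = {"R": 0, "G": 1, "B": 2}
    return [
        [px[channel[bayer_color(y, x)]] for x, px in enumerate(row)]
        for y, row in enumerate(rgb)
    ]
```

Half of the samples are green, a quarter red and a quarter blue; the other two channels of every pixel have to be reconstructed later.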
JPG decompression
[Figure: this is how the camera sensor “sees” the image.]
The first thing to do when parsing a RAW file is to read the metadata needed by the decompression algorithm: image dimensions, offsets, the data storage method, and so on. With DNG everything is simple: the specification clearly states what is stored where (and it is stored neatly rather than scattered around the file), so decompressing the data is a pleasure. CR2 is a bit more complicated, since the variables are scattered across different groups of tags and the decompression algorithm varies slightly from camera to camera. Nikon always uses the same algorithm in NEF; only the Huffman trees used for decompression change. Unlike Canon's, these trees do not need to be rebuilt every time and can be read from the metadata. That metadata, however, is stored deep inside the MakerNote section, which has its own format. But the main drawback is that CR2 and NEF store the packed data as one piece (strictly speaking, Canon stores several pieces, which then have to be stitched together >_<), whereas DNG stores many small pieces (tiles), so the task is easy to parallelize. In raw.pics.io, decompressing a DNG is 3-4 times faster than decompressing the original RAW file.
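The core of lossless JPEG decoding (ITU-T T.81) fits in a few dozen lines: canonical Huffman codes are rebuilt from a table of code-length counts and symbols, each decoded symbol says how many extra bits hold the difference, and each sample equals its prediction plus that difference. This simplified sketch decodes a single row with predictor 1 (left neighbor) and deliberately ignores markers, 0xFF00 byte stuffing, the special 16-bit difference, restart intervals and the multi-slice layout of CR2. Nikon's NEF compression also relies on Huffman trees, as described above, but its layout differs and is not shown. All names are hypothetical.

```python
class BitReader:
    """Reads a byte string MSB-first, one bit at a time."""

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0  # pos counts bits, not bytes

    def bit(self) -> int:
        byte = self.data[self.pos // 8]
        self.pos += 1
        return (byte >> (7 - (self.pos - 1) % 8)) & 1

    def bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.bit()
        return value


def build_table(counts, symbols):
    """Canonical Huffman codes from JPEG DHT-style data.

    counts[i] is the number of codes of length i + 1; symbols are listed in code order.
    Returns {(length, code): symbol}.
    """
    table, code, k = {}, 0, 0
    for length, n in enumerate(counts, start=1):
        for _ in range(n):
            table[(length, code)] = symbols[k]
            code += 1
            k += 1
        code <<= 1
    return table


def decode_symbol(reader, table):
    code, length = 0, 0
    while length < 16:
        code = (code << 1) | reader.bit()
        length += 1
        if (length, code) in table:
            return table[(length, code)]
    raise ValueError("invalid Huffman code")


def extend(v: int, t: int) -> int:
    """JPEG sign extension: turn a t-bit magnitude into a signed difference."""
    return v - (1 << t) + 1 if t and v < (1 << (t - 1)) else v


def decode_row(data, counts, symbols, width, precision=12):
    """Decode one row with predictor 1 (left neighbor)."""
    table = build_table(counts, symbols)
    reader = BitReader(data)
    pred = 1 << (precision - 1)  # initial prediction for the first sample
    row = []
    for _ in range(width):
        t = decode_symbol(reader, table)  # number of extra bits for the difference
        sample = pred + extend(reader.bits(t), t)
        row.append(sample)
        pred = sample
    return row
```

For example, with one 1-bit code and two 2-bit codes mapping to difference sizes 0, 1 and 2, the single byte 0x5D decodes to the 12-bit samples 2048, 2049 and 2047. Because each sample depends on the previous one, a single large compressed stream has to be decoded sequentially, while independent DNG tiles can be handed to separate workers.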
Some cameras that support DNG can write uncompressed data: the files are larger, but the decompression step can be skipped.
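For uncompressed data, "skipping decompression" means the samples can be read straight from the file. A minimal sketch, assuming a single contiguous strip with 16 bits per sample and no row padding (packed 12- or 14-bit layouts would need bit unpacking instead). The function name is hypothetical.

```python
import struct


def read_uncompressed(data: bytes, width: int, height: int, little_endian: bool = True):
    """Read uncompressed 16-bit sensor samples row by row."""
    fmt = ("<" if little_endian else ">") + f"{width}H"
    row_bytes = 2 * width
    return [list(struct.unpack_from(fmt, data, y * row_bytes)) for y in range(height)]
```

The byte order comes from the TIFF header ("II" or "MM"), so the same flag used for the tags applies here.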
Demosaicing of raw data
The second big step is demosaicing. Manufacturers store the metadata needed for this step in their own structures inside TIFF tags, and these structures change as new models are released. When adding new features to their cameras, manufacturers add new tags to their proprietary formats, which makes it hard for third-party software to support those features. And when it comes to restoring the correct white balance or applying gamma correction, we have to take into account both the manufacturer and the specific camera model.
Of course, we save a little by caching metadata, since we already know the features of the cameras and their hardware, but to get the parameters that depend on the shooting conditions we still have to support the whole zoo of formats.
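The camera-specific metadata feeds simple per-pixel corrections. A minimal sketch, assuming the black level, white (saturation) level and white-balance multipliers have already been extracted: DNG stores such values in tags like BlackLevel, WhiteLevel and AsShotNeutral (from which multipliers can be derived), while CR2 and NEF typically keep them in their MakerNote structures. The function names are hypothetical, and a real pipeline also applies a camera-to-sRGB color matrix between white balance and gamma, which is omitted here.

```python
def normalize(raw: int, black: int, white: int) -> float:
    """Map a raw sensor value to 0..1 using the black and white levels."""
    return min(max((raw - black) / (white - black), 0.0), 1.0)


def white_balance(rgb, multipliers):
    """Scale each channel by its white-balance multiplier and clip to 1.0."""
    return tuple(min(c * m, 1.0) for c, m in zip(rgb, multipliers))


def srgb_gamma(c: float) -> float:
    """sRGB transfer function: linear value -> display-encoded value."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
```

The arithmetic is trivial; the hard part, as described above, is finding the right levels and multipliers for each manufacturer and model.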
In general, demosaicing is quite resource-intensive. Several operations have to be performed in sequence on each pixel (or on its neighbors), and this is not very fast on 20-megapixel images =( We use WebWorkers as hard as we can and parallelize wherever possible. Still, we need and want it to be even faster, so we are now looking at SIMD, WebCL and other technologies that could speed up the process.
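To make the cost concrete, here is a minimal sketch of the simplest method, bilinear demosaicing, assuming an RGGB layout; it is not necessarily the algorithm Pics.io uses. Each missing channel is the mean of same-color neighbors in the 3x3 window, so every output pixel reads nine input pixels: roughly 180 million reads for a 20-megapixel frame before any color correction. The optional `rows` parameter (a hypothetical name, like the function itself) shows how the image can be split into horizontal strips for independent workers, in the spirit of WebWorkers.

```python
def demosaic_bilinear(raw, pattern="RGGB", rows=None):
    """Bilinear demosaicing of a Bayer mosaic.

    raw is a list of rows of sensor values; rows optionally restricts the output
    to a range of rows. Returns rows of (r, g, b) float tuples.
    """
    h, w = len(raw), len(raw[0])
    channel = {"R": 0, "G": 1, "B": 2}

    def color(y, x):
        return channel[pattern[(y % 2) * 2 + (x % 2)]]

    out = []
    for y in rows if rows is not None else range(h):
        out_row = []
        for x in range(w):
            sums, counts = [0.0, 0.0, 0.0], [0, 0, 0]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        c = color(yy, xx)
                        sums[c] += raw[yy][xx]
                        counts[c] += 1
            px = [sums[c] / counts[c] if counts[c] else 0.0 for c in range(3)]
            px[color(y, x)] = float(raw[y][x])  # keep the measured channel as-is
            out_row.append(tuple(px))
        out.append(out_row)
    return out
```

For any image of at least 2x2 pixels, every 3x3 window contains all three colors, so no channel is left empty; smarter algorithms look at edges and gradients and cost even more per pixel.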
Afterword
While developing the converter, we learned quite a lot about how RAW files are structured; if anyone is interested in this topic, ask in the comments.
Try converting your CR2 and NEF files right now on the page of our RAW converter. It isn't fast yet: on average you have to wait 15-20 seconds, but the last stone on photographers' path to the “cloud” has now been moved. Add to that the April price cut for Google Drive, almost fivefold... You can imagine. So wait for “Lightroom in the browser”; we are already working on it.
UPD: We have posted a new version that fixes parsing of files from the Canon 5D Mark III, Nikon D5000 and Nikon D700.
Thanks to Snowly, scumware and fetis26.
UPD2: If anyone had problems with the message "Format not supported yet ...", it should now work correctly.
UPD3: We have added a gradient filter to the editor, currently one of our best tools. You can see it here.