In the field of open data satellite images there have been a number of changes recently that amount to moving the cheese. These changes relate to Landsat and Sentinel-2 data. Here is what changes in more detail, along with a few comments on the context.
At the end of September the USGS started changing their primary distribution form for Landsat data to what they call Collections. This essentially means:
- introducing different processing versions in explicit form. So far reprocessing a scene simply replaced the existing package. You could identify the processing version in the scene metadata but not in the package name. Having the processing date in the package identifier makes this more transparent but it also forces data users to handle this.
- introducing quality assessment levels. This essentially means some scenes are verified to conform to higher quality standards, apparently in particular concerning geometric accuracy. The level is indicated in the package identifiers, and available scenes will be prominently sorted into these classes in the download interfaces.
- introducing some additional metadata.
This change is being made gradually. Right now they are moving Landsat 5 and 7 data to the new form and are planning to start with Landsat 8 in November. The whole reprocessing will apparently take several months. The old distribution form will stay available during this time, including for new scenes, so there is plenty of time to adjust. And the new distribution form is apparently mostly backwards compatible except for the different file names, so it should be fairly simple to deal with.
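To illustrate what handling the new file names could look like, here is a minimal sketch that parses both identifier styles. The patterns reflect my understanding of the naming conventions USGS has published for pre-collection scene IDs and Collection product identifiers; they are not taken from this post, so treat them as assumptions and check them against the current USGS documentation. The function name `parse_landsat_id` and the sample identifiers are purely illustrative.

```python
import re
from datetime import datetime, timedelta

# Assumed Collection-style identifier, e.g. LC08_L1TP_139045_20170304_20170316_01_T1:
# sensor/satellite, processing level, path+row, acquisition date,
# processing date, collection number, quality category.
COLLECTION_ID = re.compile(
    r"^L(?P<sensor>[COTEM])(?P<satellite>\d{2})_(?P<level>[A-Z0-9]{4})_"
    r"(?P<path>\d{3})(?P<row>\d{3})_(?P<acquired>\d{8})_(?P<processed>\d{8})_"
    r"(?P<collection>\d{2})_(?P<category>T1|T2|RT)$"
)

# Assumed pre-collection scene ID, e.g. LC81390452014295LGN00:
# sensor, satellite, path, row, year, day of year, ground station, version.
LEGACY_ID = re.compile(
    r"^L(?P<sensor>[COTEM])(?P<satellite>\d)(?P<path>\d{3})(?P<row>\d{3})"
    r"(?P<year>\d{4})(?P<doy>\d{3})(?P<station>[A-Z]{3})(?P<version>\d{2})$"
)


def parse_landsat_id(identifier):
    """Return a dict with the fields common to both identifier styles."""
    m = COLLECTION_ID.match(identifier)
    if m:
        return {
            "style": "collection",
            "satellite": int(m["satellite"]),
            "path": int(m["path"]),
            "row": int(m["row"]),
            "acquired": datetime.strptime(m["acquired"], "%Y%m%d").date(),
            "processed": datetime.strptime(m["processed"], "%Y%m%d").date(),
            "category": m["category"],
        }
    m = LEGACY_ID.match(identifier)
    if m:
        # Old IDs carry year + day of year and no processing date.
        acquired = datetime(int(m["year"]), 1, 1) + timedelta(days=int(m["doy"]) - 1)
        return {
            "style": "legacy",
            "satellite": int(m["satellite"]),
            "path": int(m["path"]),
            "row": int(m["row"]),
            "acquired": acquired.date(),
            "processed": None,
            "category": None,
        }
    raise ValueError(f"unrecognized Landsat identifier: {identifier!r}")
```

With something like this, existing scripts can key scenes on path, row and acquisition date regardless of which distribution form a package came from, and use the processing date only to decide between duplicates.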
Now the Sentinel-2 changes are a whole other story.
First, on September 19 ESA turned off the registration-free distribution system for Sentinel-2 data. This means that to access Sentinel-2 data you now have to register with ESA. Not a big deal, this is an automated registration. But it is not really that convenient if you casually want to try out the data.
Then in late September they moved data distribution from the previous scene-based packaging to single-granule packages. This change was pre-announced in early August. As I explained in my initial Sentinel-2 data review, the original distribution form for Level 1C Sentinel-2 data (which is the only processing level distributed) was packages each containing a 300km section of the satellite’s recording swath, which is about 290km wide. When images were recorded for the full 300km length, these packages were usually about 6 to 15 GB in size, depending on latitude, since the 300km segment cuts were made in latitude direction.
Apart from the larger size (due to both the higher spatial resolution and the larger footprint) and the different internal organization of the packages, these were quite comparable to Landsat scenes. But apparently quite a lot of users found the large packages somewhat inconvenient, so ESA is moving to distributing single-granule packages now. Granule is what ESA calls the 100x100km tiles the data is structured into internally, which correspond to a modified version of the MGRS system. Each package now contains exactly one of these tiles, and the 300km length scenes, which commonly contained about 10-15 of these granules, are history. This might not seem like such a big deal, just splitting the same files into several smaller packages, and ESA also announces it as such. But this has quite a few implications:
- There is quite a bit of added redundancy between the packages since you now download a lot of the metadata and supplementary files 10-15 times which you previously got only once.
- Since ESA continues to generate their preview images with individually adjusted tone mapping you now have to deal with even larger and more fine grained arbitrary differences in the preview images. These were already quite difficult to use for assessing image quality and despite the larger size (the previews for a single granule are now the same size as they were for a whole scene) this got worse. On the bright side you can now – with some trickery – approximately geocode the preview images.
- Although ESA distributes individual granules now they apparently do not consider it necessary to add to the package metadata information on which granule a package contains. You have a footprint polygon but no information on the MGRS tile or UTM zone.
- For larger scale data users, who do not just deal with individual granules or casually want to see some images for a specific location, the ESA download interface (and similarly the various alternative browsing tools around which are based on the ESA structures) is now essentially unusable. Instead of browsing 200-300 packages per day you now have to deal with many thousands. This means the only feasible way to efficiently access Sentinel-2 data on a larger scale is now through automated tools.
- ESA has already in the past reprocessed images in a fairly random fashion, resulting in duplicate packages in the archive. I don’t know to what extent this leads to different image data. But with the same happening now on a much more fine grained level this issue is much more acute. For example, you download a granule in the latest processing version available and then also need a neighboring granule, only to find that it is available only in an earlier processing version. Will this lead to a difference in data at the edge between the granules?
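Regarding the missing tile and UTM zone information mentioned in the list above: a rough guess of the UTM zone can be derived from the footprint polygon alone. The following is only a sketch under several assumptions. The footprint is taken to be a list of (longitude, latitude) pairs in degrees, the mean of the vertices stands in for a proper centroid, the special zone layouts around Norway and Svalbard and footprints crossing the antimeridian are ignored, and for granules close to a zone boundary the guess can simply be wrong. The function name `guess_utm_zone` is made up for this example.

```python
def guess_utm_zone(footprint):
    """Guess (zone number, hemisphere) from a footprint polygon.

    footprint: sequence of (lon, lat) tuples in degrees (assumed layout).
    """
    points = list(footprint)
    # Drop the repeated closing vertex of a closed ring so it is not counted twice.
    if len(points) > 1 and points[0] == points[-1]:
        points = points[:-1]
    if not points:
        raise ValueError("empty footprint")

    # Vertex mean as a cheap stand-in for the polygon centroid.
    mean_lon = sum(lon for lon, _ in points) / len(points)
    mean_lat = sum(lat for _, lat in points) / len(points)

    # Standard 6-degree UTM zones, numbered 1..60 starting at 180 degrees west.
    zone = min(int((mean_lon + 180.0) // 6) + 1, 60)
    hemisphere = "N" if mean_lat >= 0 else "S"
    return zone, hemisphere


# Illustrative footprint somewhere in the Alps (coordinates are made up).
print(guess_utm_zone([(10.0, 47.0), (11.4, 47.0), (11.4, 48.0), (10.0, 48.0)]))
```

This does not recover the actual tile identifier, it only narrows down which zone a package most likely belongs to.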
Regarding the preview images, here is an example of a preview of one of the old style packages:
And here is the same area with newly processed data, assembled from the previews of the new single granule packages:
Well – it is better than nothing, which is not really a compliment though…
But the story does not end here. Since this change was implemented, access to the ESA download systems has been fairly erratic. Who would have guessed that offering about 10 times the number of files for download, and also serving metadata and query services for all of these, puts additional strain on the infrastructure? Today they announced that unreliable and delayed data access will likely continue throughout October.
No matter what the reasons and motives for all of this are, the prospect that Sentinel-2 could turn into a reliable and dependable open data alternative to Landsat just became significantly less likely. Considering the amount of tax money that went into it, this is more than just a bit sad.
I am generally inclined, in line with Hanlon’s razor, to put most of this down to incompetence. The various obviously not well thought through aspects of the ESA data distribution and tools, like for example the preview image tone mapping, underline this. But the possibility that making routine use of Sentinel data increasingly difficult through the channels available to the general public is actually intentional on some level is not all that far-fetched when looking at the overall picture.