I know this has been a long time coming, but today ESA rolled out another change in their Sentinel-2 data distribution form. The previous change was the move from multi-tile scenes to single-tile packages, which, as i discussed previously, resulted in significant performance problems, as predicted. This new change keeps the content of the packages but changes the naming, both of the whole package and of the internal file structure.
If you have read my Sentinel-2 data review you might remember that one of the first things i complained about there was the excessively bloated file names, full of redundant data and of information irrelevant for identifying the files. I mentioned this because it is a nuisance when you work with the files, but ultimately it is not a big deal – you just rename things into whatever form suits you when you ingest the data into your systems and then don’t have to worry about this any more. Changing this now, after more than a year of public data distribution, is odd at best. Even more peculiar is the primary reason given for the change:
The product naming (including the naming of folders and files inside the product structure) is compacted to overcome the 256 characters limitation on pathnames imposed by Windows platforms
Let me translate that: After more than a year of public data distribution, we are now changing our data distribution form in a non-backwards-compatible way to cater to users of a historic computer platform that is no longer sold or even maintained by its creator and that is so outdated that we did not even consider it and its limitations when we initially planned this 3-4 years ago.
Of course you could also simply say: 256 characters ought to be enough for anybody…
This is what the change looks like. The old structure had package names like this:
and within that were data files like this:
Now you get something like:
This shows just the main data files. The metadata and QA files have changed as well. Many file names are now generic, meaning they are identical for all packages, a bit like with the Sentinel-3 data, except that Sentinel-3 uses lower case file names while Sentinel-2 uses upper case ones.
There are also some quite sensible aspects to the change. For example, the MGRS tile ID is now in the package name. The timestamps in the package name are also in a different order: previously the processing time stamp came first, while now the recording time stamp does. This means that when you sort the file names you get them in recording order rather than processing order, which makes more sense.
The data distribution system continues to be very unreliable, by the way, so if you want to take this opportunity to download and look at some Sentinel-2 data, you will likely need quite a bit of patience.
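To illustrate the sorting point, here is a minimal Python sketch. It assumes the compact package name has the layout mission, product level, recording time stamp, processing baseline, relative orbit, MGRS tile ID and a second time stamp. The three package names are made up for the illustration and are not taken from actual downloads, and the helper `parse_compact_name` and the field names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout of the new compact package names:
#   <mission>_<level>_<recording time>_N<baseline>_R<orbit>_T<tile>_<second time stamp>.SAFE
COMPACT_NAME = re.compile(
    r"^(?P<mission>S2[A-Z])_(?P<level>MSIL\d[A-Z])_(?P<sensing>\d{8}T\d{6})"
    r"_N(?P<baseline>\d{4})_R(?P<orbit>\d{3})_T(?P<tile>[0-9A-Z]{5})"
    r"_(?P<second_stamp>\d{8}T\d{6})\.SAFE$"
)


def parse_compact_name(name):
    """Split a compact package name into its fields (hypothetical helper)."""
    match = COMPACT_NAME.match(name)
    if match is None:
        raise ValueError("not a compact package name: " + name)
    fields = match.groupdict()
    # Turn the leading (recording) time stamp into a datetime for comparison.
    fields["sensing"] = datetime.strptime(fields["sensing"], "%Y%m%dT%H%M%S")
    return fields


# Made-up example names, deliberately not in chronological order.
names = [
    "S2A_MSIL1C_20170105T013442_N0204_R031_T53NMJ_20170105T013443.SAFE",
    "S2A_MSIL1C_20161226T013442_N0204_R031_T53NMJ_20161226T013444.SAFE",
    "S2A_MSIL1C_20170115T013441_N0204_R031_T53NMJ_20170115T013442.SAFE",
]

# Plain lexical sort of the names ...
by_name = sorted(names)
# ... versus an explicit sort by the parsed recording time.
by_sensing = sorted(names, key=lambda n: parse_compact_name(n)["sensing"])

print(by_name == by_sensing)
print(parse_compact_name(names[0])["tile"])
```

Both orderings agree here because the recording time stamp directly follows the mission and product level prefix. With packages from different satellites or product levels mixed together, the prefix would of course dominate a plain sort.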
Addition: The depth of obfuscation in the file format specifications is really impressive, by the way. Looking for the actual meaning of the second time stamp in the package file name leads you to three different specifications. In the variant that is currently distributed the second time stamp is apparently the datastrip sensing time, but there are two other format variants where this is either
- the package creation date or
- the newest datastrip sensing time incremented by one second.
You can quite easily picture what has happened here. Originally the creation date was meant to be used – at first this is what is mentioned everywhere in the specs. And then someone noticed that when processing the data in parallel the creation date is not necessarily unique…