Description
A common need is getting URLs from item assets within a catalog, which involves iterating over hundreds of items. Here is a quick example:
```python
import satsearch
import intake

bbox = [35.48, -3.24, 35.58, -3.14] # (min lon, min lat, max lon, max lat)
dates = '2010-07-01/2020-08-15'
URL='https://earth-search.aws.element84.com/v0'
results = satsearch.Search.search(url=URL,
                                  collections=['sentinel-s2-l2a-cogs'], # note collection='sentinel-s2-l2a-cogs' doesn't work
                                  datetime=dates,
                                  bbox=bbox,
                                  sortby=['+properties.datetime'])
print('%s items' % results.found())
itemCollection = results.items()
# 489 items
```
Initializing the catalog is fast!
```python
%%time
catalog = intake.open_stac_item_collection(itemCollection)
# CPU times: user 3.69 ms, sys: 0 ns, total: 3.69 ms
# Wall time: 3.7 ms
```
Iterating through items is slow, and I'm also a bit confused by the syntax. I find myself wanting to use an integer index to get the first item in a catalog (`first_item = catalog[0]`), or to simplify the code block below to `hrefs = [item.band.metadata.href for item in catalog]`. Currently, though, iterating through a catalog returns item IDs as strings:
```python
%%time
band = 'visual'
hrefs = [catalog[item][band].metadata['href'] for item in catalog]
# CPU times: user 4.6 s, sys: 1.23 ms, total: 4.6 s
# Wall time: 4.61 s
```
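For illustration only, the syntax wished for above (integer indexing, and iteration that yields items rather than IDs) can be sketched in plain Python. The class `IndexableCatalog`, the item IDs and the URLs are hypothetical and not part of intake or intake-stac; plain dicts stand in for STAC items so the sketch runs on its own.

```python
class IndexableCatalog:
    """Hypothetical wrapper around an ordered mapping of item ID -> item.

    Supports cat["some-id"], cat[0] (positional lookup), and iteration
    that yields the items themselves rather than their IDs.
    """

    def __init__(self, items):
        self._items = dict(items)       # dicts preserve insertion order
        self._ids = list(self._items)   # cached ID order for positional lookup

    def __getitem__(self, key):
        if isinstance(key, int):
            # Integer -> position; negative indices behave as for a list
            return self._items[self._ids[key]]
        return self._items[key]         # string -> lookup by item ID

    def __iter__(self):
        # Yield items, not IDs, so `for item in cat` gives usable objects
        return iter(self._items.values())

    def __len__(self):
        return len(self._items)

    def keys(self):
        return list(self._ids)


# Hypothetical stand-in items: each maps a band name to an href
cat = IndexableCatalog({
    "S2A_tile_20200101": {"visual": "https://example.com/a.tif"},
    "S2B_tile_20200106": {"visual": "https://example.com/b.tif"},
})
first_item = cat[0]
hrefs = [item["visual"] for item in cat]
```

Note the design choice: a plain class is used instead of subclassing `collections.abc.Mapping`, because `Mapping` expects `__iter__` to yield keys and its `keys()`/`items()` mixins would break if iteration yielded values.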
As for speed, iterating through the underlying JSON via sat-stac takes only a few hundred microseconds:
```python
%%time
band = 'visual'
hrefs = [i.assets[band]['href'] for i in catalog._stac_obj]
# CPU times: user 684 µs, sys: 0 ns, total: 684 µs
# Wall time: 689 µs
```
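The same fast path works on plain STAC JSON with no library objects at all: a STAC ItemCollection is a GeoJSON FeatureCollection, and each feature carries an `assets` mapping whose entries have an `href`. The helper `asset_hrefs` and the sample collection below are hypothetical, written only to show the comprehension; features that lack the requested asset are skipped instead of raising `KeyError`.

```python
import json


def asset_hrefs(item_collection, band):
    """Return the href of asset `band` for every feature in a STAC
    ItemCollection given as a plain dict; features without that asset
    are skipped."""
    return [
        feature["assets"][band]["href"]
        for feature in item_collection.get("features", [])
        if band in feature.get("assets", {})
    ]


# Tiny hand-written ItemCollection standing in for real search results
collection = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"id": "item-1", "assets": {"visual": {"href": "https://example.com/item-1/TCI.tif"}}},
    {"id": "item-2", "assets": {"B04": {"href": "https://example.com/item-2/B04.tif"}}}
  ]
}
""")

hrefs = asset_hrefs(collection, "visual")
```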
@martindurant, any suggestions here? I'm a bit perplexed about where the code lives that handles `list(catalog)` or `for item in catalog: ...`.
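For reference on the Python side: both `list(catalog)` and `for item in catalog:` go through the iteration protocol, i.e. they call `iter(catalog)`, which invokes `__iter__` on the object's class (falling back to `__getitem__` with 0, 1, 2, ... only if no `__iter__` exists). So the behavior lives in whichever class in `type(catalog).__mro__` defines `__iter__`, and `inspect.getsource(type(catalog).__iter__)` should show that code. The class below is a hypothetical stand-in that reproduces the observed "iteration yields IDs" behavior; it is not intake's implementation.

```python
class KeyIteratingCatalog:
    """Hypothetical dict-like catalog whose __iter__ yields entry names,
    mirroring the behavior where iterating a catalog returns item IDs."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def __getitem__(self, name):
        return self._entries[name]

    def __iter__(self):
        # Both `for x in cat` and `list(cat)` end up calling this method;
        # iterating a dict yields its keys, hence names rather than entries
        return iter(self._entries)

    def values(self):
        return self._entries.values()


cat = KeyIteratingCatalog({"item-1": "entry 1", "item-2": "entry 2"})
names = list(cat)                # list() calls iter(cat) -> cat.__iter__()
looped = [name for name in cat]  # a for loop calls iter(cat) as well
entries = list(cat.values())     # explicit access to the entries themselves
```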