Iterating through catalog items is awkward and slow · Issue #66 · intake/intake-stac · GitHub
Iterating through catalog items is awkward and slow #66
Open
@scottyhq

Description

A common need is getting URLs from item assets within a catalog, which involves iterating over hundreds of items. Here is a quick example:

import satsearch
import intake

bbox = [35.48, -3.24, 35.58, -3.14] # (min lon, min lat, max lon, max lat)
dates = '2010-07-01/2020-08-15'

URL='https://earth-search.aws.element84.com/v0'
results = satsearch.Search.search(url=URL,
                                  collections=['sentinel-s2-l2a-cogs'], # note collection='sentinel-s2-l2a-cogs' doesn't work
                                  datetime=dates,
                                  bbox=bbox,    
                                  sortby=['+properties.datetime'])
print('%s items' % results.found())
itemCollection = results.items()
#489 items

Initializing the catalog is fast!

%%time 
catalog = intake.open_stac_item_collection(itemCollection)
#CPU times: user 3.69 ms, sys: 0 ns, total: 3.69 ms
#Wall time: 3.7 ms

Iterating through items is slow, and I'm a bit confused by the syntax too. I find myself wanting to use an integer index to get the first item in a catalog (first_item = catalog[0]), or to simplify the code block below to hrefs = [item.band.metadata.href for item in catalog]. Currently, iterating through a catalog returns item IDs as strings.

%%time 
band = 'visual'
hrefs = [catalog[item][band].metadata['href'] for item in catalog]
#CPU times: user 4.6 s, sys: 1.23 ms, total: 4.6 s
#Wall time: 4.61 s
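
To make the request above concrete, here is a minimal sketch of the behavior I have in mind. It does not use intake-stac at all: `LazyCatalog` and `ItemView` are hypothetical names, the catalog is a plain stand-in built on `collections.abc.Mapping`, and the item dicts and URLs are invented. It only illustrates that iterating a mapping yields its keys (hence the ID strings), and that a thin view could add integer indexing and item-wise iteration on top.

```python
from collections.abc import Mapping


class LazyCatalog(Mapping):
    """Hypothetical stand-in for a dict-like catalog (not the intake-stac class)."""

    def __init__(self, raw_items):
        # raw_items: list of STAC-Item-like dicts, each with "id" and "assets"
        self._raw = {item["id"]: item for item in raw_items}

    def __getitem__(self, key):
        return self._raw[key]

    def __iter__(self):
        # Iterating a Mapping yields its keys, so `for item in catalog`
        # produces ID strings rather than item objects.
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)


class ItemView:
    """Hypothetical wrapper adding integer indexing and item-wise iteration."""

    def __init__(self, catalog):
        self._catalog = catalog
        self._ids = list(catalog)  # snapshot of the key order

    def __len__(self):
        return len(self._ids)

    def __getitem__(self, index):
        # Integer positions only (negative indices work); slices are not handled.
        return self._catalog[self._ids[index]]

    def __iter__(self):
        return (self._catalog[item_id] for item_id in self._ids)


# Invented sample data, for illustration only
raw = [
    {"id": "S2A_1", "assets": {"visual": {"href": "https://example.com/a.tif"}}},
    {"id": "S2A_2", "assets": {"visual": {"href": "https://example.com/b.tif"}}},
]
catalog = LazyCatalog(raw)
items = ItemView(catalog)

first_item = items[0]
hrefs = [item["assets"]["visual"]["href"] for item in items]
```

Whether something like this belongs in the catalog class itself or in a separate view is an open design question; the sketch keeps the mapping behavior untouched so existing `catalog[item_id]` lookups would still work.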

As for speed, it takes well under a millisecond to iterate through the underlying JSON via sat-stac:

%%time 
band = 'visual'
hrefs = [i.assets[band]['href'] for i in catalog._stac_obj]
#CPU times: user 684 µs, sys: 0 ns, total: 684 µs
#Wall time: 689 µs
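
The fast path above only reads plain JSON-like data. In STAC, an Item's `assets` field is a mapping from asset name to an object with an `href`, so the same lookup can be sketched over plain dicts. The helper name `asset_hrefs` and the sample items below are hypothetical, and this says nothing about how intake-stac is implemented; it just shows the access pattern, with a guard for items that lack the requested asset.

```python
def asset_hrefs(items, band):
    """Collect one asset's href from each STAC-Item-like dict.

    Items without the requested asset (or without an href) are skipped.
    """
    hrefs = []
    for item in items:
        asset = item.get("assets", {}).get(band)
        if asset is not None and "href" in asset:
            hrefs.append(asset["href"])
    return hrefs


# Invented sample data, for illustration only
sample_items = [
    {"id": "a", "assets": {"visual": {"href": "https://example.com/a_visual.tif"}}},
    {"id": "b", "assets": {"B04": {"href": "https://example.com/b_B04.tif"}}},
    {"id": "c", "assets": {"visual": {"href": "https://example.com/c_visual.tif"}}},
]

visual_hrefs = asset_hrefs(sample_items, "visual")
```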

@martindurant any suggestions here? I'm a bit perplexed about where the code lives to handle list(catalog) or for item in catalog: ...
