Description
Heya,
Thanks for your continued work in building better DEQs.
The main selling point of DEQs is that the solver can take as many steps as required to converge without increasing memory usage. This doesn't seem to hold for your implementation of `broyden`, which starts off with:
```python
Us = torch.zeros(bsz, total_hsize, seq_len, max_iters).to(dev)
VTs = torch.zeros(bsz, max_iters, total_hsize, seq_len).to(dev)
```
and therefore has a memory cost that grows linearly with `max_iters`, even though the ops aren't tracked by autograd. Anderson acceleration likewise keeps the previous `m` states in memory, where `m` is usually larger than the number of solver iterations actually needed anyway. Don't these solvers contradict the claim of constant memory cost?
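For concreteness, here's a rough back-of-the-envelope sketch of what those two buffers alone cost. The sizes below are made-up placeholders I picked for illustration (not values from your code); only the allocation pattern matches the snippet above:

```python
import torch

# Hypothetical sizes, chosen only to illustrate the scaling.
bsz, total_hsize, seq_len, max_iters = 16, 512, 256, 30

# Same allocation pattern as in the broyden solver quoted above (CPU here for simplicity).
Us = torch.zeros(bsz, total_hsize, seq_len, max_iters)
VTs = torch.zeros(bsz, max_iters, total_hsize, seq_len)

bytes_per_elem = Us.element_size()  # 4 bytes for float32
total_bytes = (Us.numel() + VTs.numel()) * bytes_per_elem
print(f"Us + VTs: {total_bytes / 2**30:.2f} GiB")  # scales linearly with max_iters
```

With these placeholder sizes that's already on the order of half a GiB, and doubling `max_iters` doubles it.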
On a related note, I've found it quite hard to modify these solvers even after going over the theory. Are there any notes or resources you could point to that would help people understand your implementation? Thanks!