AcceleratedDHTClient needs more resources than the ResourceManager has by default · Issue #8945 · ipfs/kubo · GitHub
AcceleratedDHTClient needs more resources than the ResourceManager has by default #8945
Closed
@aschmahmann

Installation method

ipfs-update or dist.ipfs.io

Version

go-ipfs version: 0.13.0-rc1
Repo version: 12
System version: amd64/windows
Golang version: go1.18.1

Config

Abridged:
{
  "Experimental": {
    "AcceleratedDHTClient": true
  },
  "Internal": {
    "Bitswap": {
      "EngineBlockstoreWorkerCount": null,
      "EngineTaskWorkerCount": null,
      "MaxOutstandingBytesPerPeer": null,
      "TaskWorkerCount": null
    }
  },
  "Routing": {
    "Type": "dhtclient"
  },
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {
      "GracePeriod": "20s",
      "HighWater": 100,
      "LowWater": 50,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {},
    "RelayService": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}
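For reference, the experiment in the abridged config above can be toggled from the CLI, and the resource-manager debug logs mentioned below can be enabled via go-log's environment variable (the `rcmgr` subsystem name is my assumption and may vary by go-libp2p version):

```shell
# Enable the accelerated DHT client (matches the config above)
ipfs config --json Experimental.AcceleratedDHTClient true

# Surface resource-manager debug logs on the next daemon run
# (subsystem name is an assumption; check `ipfs log ls` for the exact name)
GOLOG_LOG_LEVEL="rcmgr=debug" ipfs daemon
```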

Description

Running with the AcceleratedDHTClient enabled results in bursty connections (e.g. on startup). Under the ResourceManager's default limits the routing table ends up very small (compared to running without the resource manager enabled) and, as a result, not particularly functional.

Enabling debug logging on the resource manager shows that we're out of available outbound connections.

Some analysis here is still need, however AFAICT the reasons for this are:

  • We now have enforceable connection limits
  • The accelerated DHT client makes lots of connections in a short period of time. While it doesn't need to keep them open for very long, it can't safely close them in case they're in use (e.g. for Bitswap requests), so it has to rely on the host's connection garbage collection, which also means waiting out the grace period.

Some options here include:

  1. Increase the resource limits, particularly for outbound connections and some overall limits like connections and file descriptors. Given that outbound connections are largely controlled by user behavior (e.g. turning on the AcceleratedDHTClient, making lots of API or gateway requests, etc.) this might not be so bad
  2. Make the AcceleratedDHTClient consume fewer resources - some examples include
    • Reduce (the default) amount of concurrent work the AcceleratedDHTClient does when doing routing table initialization or maintenance
    • More aggressively return resources to the system. This could mean having the AcceleratedDHTClient call Trim(), or manually closing unprotected connections it has recently used.
    • Have go-libp2p give the application layer more control over how dialing is done (e.g. for expensive but not latency-critical operations have the client only dial WAN peers and try different transports serially rather than in parallel)
  3. Emit louder errors to users of the AcceleratedDHTClient when insufficient resources are available to maintain the routing tables. At the extreme, we could hard-code errors in go-ipfs telling users to bump their limits, or emit errors from the AcceleratedDHTClient based on checking the error types it receives

Looking for feedback here on other ideas or preferences for how to handle this. My initial reaction is to go with increasing the limits and having the AcceleratedDHTClient log errors when its routing table won't update properly due to resource starvation. Without the logs, getting people to report issues properly will likely be difficult, and increasing the limits vs. reducing resource consumption is something we can then more easily tune over time (i.e. lower the limits as we reduce resource consumption).

Metadata

Labels

kind/bug — A bug in existing code (including security flaws)
need/triage — Needs initial labeling and prioritization
