Description
PRETTY_NAME="Flatcar Container Linux by Kinvolk 2905.2.5 (Oklo)"
We are running Trident (https://github.com/NetApp/trident) on our bare metal cluster running on prem.
Our StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: netapp-ontap-san-ext4
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.trident.netapp.io
allowVolumeExpansion: true
parameters:
  backendType: "ontap-san"
  storagePools: "ontapsan_10.20.50.4:.*"
  fsType: "ext4"
Running iscsid and multipathd config:
iscsi-config.txt
mpath-config.txt
We currently run fstrim weekly.
Previously, when one of our volumes filled up (because we aren't doing online discard on our mounts), that volume would be flipped to ro mode.
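For reference, a minimal sketch of how one might check whether a given mount uses online discard, assuming the usual /proc/mounts field layout (device, mount point, filesystem type, options). The function name and the sample mount lines are hypothetical and are not taken from our nodes.

```python
def has_online_discard(mounts_text: str, mount_point: str) -> bool:
    """Return True if mount_point appears in mounts_text with the 'discard' option."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        if fields[1] == mount_point:
            # Options are a comma-separated list in the fourth field.
            return "discard" in fields[3].split(",")
    return False


# Hypothetical sample in /proc/mounts format, for illustration only.
SAMPLE = (
    "/dev/mapper/example-a /mnt/a ext4 rw,relatime 0 0\n"
    "/dev/mapper/example-b /mnt/b ext4 rw,relatime,discard 0 0\n"
)

print(has_online_discard(SAMPLE, "/mnt/a"))
print(has_online_discard(SAMPLE, "/mnt/b"))
```

On a real node the text would come from reading /proc/mounts instead of the sample string.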
Recently (I don't know exactly which version, or even whether this is associated with a Flatcar version change), instead of the volume being flipped to ro, we see multipathd become unresponsive and consume a lot of CPU in iowait:
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
107036 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % multipathd -d -s
177230 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % multipathd -d -s
109544 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % multipathd -d -s
226892 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % multipathd -d -s
From kernel logs:
Nov 01 10:59:26 worker-24.dev.merit.uw.systems kernel: sd 6:0:0:1035: [sdd] tag#59 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Nov 01 10:59:26 worker-24.dev.merit.uw.systems kernel: sd 6:0:0:1035: [sdd] tag#59 Sense Key : Data Protect [current]
Nov 01 10:59:26 worker-24.dev.merit.uw.systems kernel: sd 6:0:0:1035: [sdd] tag#59 Add. Sense: Space allocation failed write protect
Nov 01 10:59:27 worker-24.dev.merit.uw.systems kernel: sd 6:0:0:1035: [sdd] tag#59 CDB: Write(10) 2a 00 00 5d 71 b8 00 00 80 00
Nov 01 10:59:27 worker-24.dev.merit.uw.systems kernel: blk_update_request: critical space allocation error, dev sdd, sector 6123960 op 0x1:(WRITE) flags 0x4200 phys_seg 16 prio class 0
Nov 01 10:59:29 worker-24.dev.merit.uw.systems kernel: blk_update_request: critical space allocation error, dev dm-2, sector 6123960 op 0x1:(WRITE) flags 0x4000 phys_seg 16 prio class 0
full log
blkid-debug.log
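For anyone triaging similar logs, here is a small sketch that pulls the affected devices and sectors out of the "critical space allocation error" lines. It assumes the message format shown in the excerpt above; the function name is hypothetical, and the sample lines are copied from that excerpt.

```python
import re

# Matches the blk_update_request message format seen in the kernel log excerpt.
PATTERN = re.compile(
    r"critical space allocation error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)


def space_allocation_errors(log_text):
    """Return a list of (device, sector) tuples, one per matching log line."""
    return [
        (m.group("dev"), int(m.group("sector")))
        for m in PATTERN.finditer(log_text)
    ]


SAMPLE_LOG = (
    "Nov 01 10:59:27 worker-24.dev.merit.uw.systems kernel: blk_update_request: "
    "critical space allocation error, dev sdd, sector 6123960 op 0x1:(WRITE) "
    "flags 0x4200 phys_seg 16 prio class 0\n"
    "Nov 01 10:59:29 worker-24.dev.merit.uw.systems kernel: blk_update_request: "
    "critical space allocation error, dev dm-2, sector 6123960 op 0x1:(WRITE) "
    "flags 0x4000 phys_seg 16 prio class 0\n"
)

print(space_allocation_errors(SAMPLE_LOG))
# prints [('sdd', 6123960), ('dm-2', 6123960)]
```

In the excerpt, the same sector is reported first on the path device (sdd) and then on the multipath device (dm-2).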
To help debug this problem, I would like to understand what kernel or multipath behavioral change makes the system react so badly when it is unable to write a block.
Thank you