chunkserver: abnormal/disproportional distribution of chunk files · Issue #326 · moosefs/moosefs · GitHub

chunkserver: abnormal/disproportional distribution of chunk files #326

Closed

onlyjob opened this issue Jan 16, 2020 · 10 comments
Labels
confirmed bug Confirmed bug

Comments

@onlyjob
Contributor
onlyjob commented Jan 16, 2020

The chunkserver is not spreading chunk files among directories fairly, evenly, or even randomly.

Consider the following assessment of one chunkserver HDD:

[seconds to walk :: directory :: number of files]
$ for D in $(find -maxdepth 1 -type d -printf '%P\n' | sort); do \
    printf ":: $D :: "; tree "$D" | wc -l; done | ts -i %s
# ts -i %s (from moreutils) prefixes each line with the seconds elapsed since
# the previous line, i.e. roughly the time it took to walk that directory

8 :: 00 :: 45112
3 :: 01 :: 39850
2 :: 02 :: 45017
2 :: 03 :: 44686
3 :: 04 :: 43994
2 :: 05 :: 41658
2 :: 06 :: 40154
4 :: 07 :: 40048
29 :: 08 :: 244076
3 :: 09 :: 42750
2 :: 0A :: 45197
2 :: 0B :: 45364
2 :: 0C :: 42651
3 :: 0D :: 42456
2 :: 0E :: 45458
2 :: 0F :: 42104
2 :: 10 :: 42145
2 :: 11 :: 43952
1 :: 12 :: 42432
2 :: 13 :: 45485
3 :: 14 :: 45673
2 :: 15 :: 44992
2 :: 16 :: 45443
2 :: 17 :: 40049
2 :: 18 :: 45005
2 :: 19 :: 45306
2 :: 1A :: 42005
3 :: 1B :: 45535
2 :: 1C :: 45485
2 :: 1D :: 44972
2 :: 1E :: 45384
2 :: 1F :: 45656
2 :: 20 :: 45605
2 :: 21 :: 42786
2 :: 22 :: 45094
2 :: 23 :: 43966
2 :: 24 :: 45457
2 :: 25 :: 42309
2 :: 26 :: 45562
2 :: 27 :: 40002
2 :: 28 :: 45588
1 :: 29 :: 40124
2 :: 2A :: 45760
2 :: 2B :: 45556
2 :: 2C :: 40880
1 :: 2D :: 45507
2 :: 2E :: 44101
2 :: 2F :: 40143
2 :: 30 :: 45456
1 :: 31 :: 45587
2 :: 32 :: 45521
2 :: 33 :: 44904
2 :: 34 :: 44025
2 :: 35 :: 40066
2 :: 36 :: 45567
2 :: 37 :: 43891
3 :: 38 :: 38642
2 :: 39 :: 40785
1 :: 3A :: 40365
2 :: 3B :: 50906
3 :: 3C :: 46865
1 :: 3D :: 45387
2 :: 3E :: 45597
2 :: 3F :: 45433
2 :: 40 :: 42164
2 :: 41 :: 45673
2 :: 42 :: 38552
2 :: 43 :: 45663
2 :: 44 :: 45406
1 :: 45 :: 45360
2 :: 46 :: 45485
2 :: 47 :: 45442
2 :: 48 :: 45612
2 :: 49 :: 45417
2 :: 4A :: 44002
2 :: 4B :: 45709
2 :: 4C :: 45551
2 :: 4D :: 44162
2 :: 4E :: 42338
2 :: 4F :: 43851
2 :: 50 :: 45363
3 :: 51 :: 42380
3 :: 52 :: 45255
2 :: 53 :: 45648
2 :: 54 :: 45430
3 :: 55 :: 43788
2 :: 56 :: 45631
2 :: 57 :: 38531
2 :: 58 :: 40189
2 :: 59 :: 42682
2 :: 5A :: 38416
2 :: 5B :: 38464
1 :: 5C :: 42466
2 :: 5D :: 42048
2 :: 5E :: 45315
2 :: 5F :: 41650
2 :: 60 :: 38572
1 :: 61 :: 42315
2 :: 62 :: 38717
2 :: 63 :: 38561
2 :: 64 :: 42442
2 :: 65 :: 43784
1 :: 66 :: 42390
2 :: 67 :: 38843
2 :: 68 :: 42262
2 :: 69 :: 42175
2 :: 6A :: 42375
2 :: 6B :: 39840
2 :: 6C :: 38845
1 :: 6D :: 38553
3 :: 6E :: 42256
2 :: 6F :: 42517
2 :: 70 :: 38588
2 :: 71 :: 38615
1 :: 72 :: 42343
3 :: 73 :: 42434
1 :: 74 :: 42556
2 :: 75 :: 39992
2 :: 76 :: 38606
2 :: 77 :: 45634
2 :: 78 :: 45402
1 :: 79 :: 38542
2 :: 7A :: 43878
2 :: 7B :: 38481
2 :: 7C :: 42255
1 :: 7D :: 53235
3 :: 7E :: 42451
2 :: 7F :: 42529
1 :: 80 :: 40449
2 :: 81 :: 38468
2 :: 82 :: 42201
2 :: 83 :: 43858
1 :: 84 :: 45791
161 :: 85 :: 1961750
2 :: 86 :: 43868
2 :: 87 :: 42475
2 :: 88 :: 43470
2 :: 89 :: 40532
1 :: 8A :: 42686
2 :: 8B :: 42257
2 :: 8C :: 42918
53 :: 8D :: 234492
1 :: 8E :: 44002
2 :: 8F :: 42235
2 :: 90 :: 40394
2 :: 91 :: 44864
2 :: 92 :: 45255
2 :: 93 :: 41999
2 :: 94 :: 41799
1 :: 95 :: 38596
3 :: 96 :: 42744
1 :: 97 :: 42465
2 :: 98 :: 38387
2 :: 99 :: 38565
2 :: 9A :: 42571
2 :: 9B :: 42355
153 :: 9C :: 1322210
3 :: 9D :: 42405
2 :: 9E :: 40034
3 :: 9F :: 44811
2 :: A0 :: 38631
2 :: A1 :: 45187
2 :: A2 :: 42325
2 :: A3 :: 38592
2 :: A4 :: 38516
2 :: A5 :: 40494
115 :: A6 :: 1192862
2 :: A7 :: 38511
2 :: A8 :: 45001
2 :: A9 :: 42399
2 :: AA :: 39865
1 :: AB :: 38652
3 :: AC :: 43823
2 :: AD :: 45640
2 :: AE :: 40367
2 :: AF :: 43941
2 :: B0 :: 45488
2 :: B1 :: 46192
2 :: B2 :: 42293
63 :: B3 :: 804730
3 :: B4 :: 42213
2 :: B5 :: 45559
2 :: B6 :: 40320
2 :: B7 :: 45614
2 :: B8 :: 42670
2 :: B9 :: 38750
2 :: BA :: 42219
369 :: BB :: 3622774
3 :: BC :: 43950
3 :: BD :: 38513
3 :: BE :: 42749
3 :: BF :: 42186
2 :: C0 :: 44939
2 :: C1 :: 44043
2 :: C2 :: 42281
2 :: C3 :: 38511
3 :: C4 :: 40576
2 :: C5 :: 42385
2 :: C6 :: 42104
2 :: C7 :: 38608
2 :: C8 :: 42751
2 :: C9 :: 40154
2 :: CA :: 38682
1 :: CB :: 43832
2 :: CC :: 45574
2 :: CD :: 42634
2 :: CE :: 42255
2 :: CF :: 45347
2 :: D0 :: 42330
2 :: D1 :: 42361
2 :: D2 :: 45601
3 :: D3 :: 44261
1 :: D4 :: 45207
2 :: D5 :: 45777
3 :: D6 :: 38637
2 :: D7 :: 44850
1 :: D8 :: 38559
3 :: D9 :: 42346
2 :: DA :: 38667
2 :: DB :: 44973
124 :: DC :: 1062189
2 :: DD :: 44040
2 :: DE :: 42282
2 :: DF :: 44054
2 :: E0 :: 45583
3 :: E1 :: 45304
2 :: E2 :: 44050
2 :: E3 :: 44019
2 :: E4 :: 42356
2 :: E5 :: 38628
2 :: E6 :: 38696
2 :: E7 :: 45415
3 :: E8 :: 41575
2 :: E9 :: 44003
2 :: EA :: 45469
3 :: EB :: 40114
2 :: EC :: 45564
2 :: ED :: 41276
2 :: EE :: 45604
2 :: EF :: 40099
2 :: F0 :: 45714
2 :: F1 :: 44809
2 :: F2 :: 45635
2 :: F3 :: 45215
2 :: F4 :: 38525
2 :: F5 :: 40062
2 :: F6 :: 45575
2 :: F7 :: 44073
216 :: F8 :: 339613
1 :: F9 :: 42408
3 :: FA :: 45703
2 :: FB :: 42413
2 :: FC :: 43974
2 :: FD :: 44224
2 :: FE :: 45548
1 :: FF :: 46093

Note the abnormally large directory BB: about 80 times heavier than most, with a walk time 123 times slower.
Several directories have 40...50 times more files than most, and walk times on those large directories are slow. Probabilistically, heavy directories get more hits, so overall performance suffers.
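For a quick summary of the listing above, it can be piped through awk (assuming the output was saved to a file, here hypothetically named chunkdirs.txt; the whitespace-separated fields per line are: seconds, ::, directory, ::, count):

  awk '{ n++; t += $5; if ($5 > max) { max = $5; heavy = $3 } }
       END { printf "dirs=%d total=%d mean=%.0f max=%d in %s (%.1fx mean)\n",
                    n, t, t/n, max, heavy, max/(t/n) }' chunkdirs.txt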

As far as I'm concerned, the disproportional distribution of chunk files among directories is a bug.

In #319 (comment), @chogata wrote:

MooseFS does not try to spread the chunks evenly between directories, but rather uses one until it reaches its internal quota of chunks, then moves to the next, to the next and so on. General observations shows that the most frequently used files are usually the ones least recently recorded, so this solution works best in most situations.

Apparently the distribution algorithm is flawed, but IMHO the presence of such logic is flawed in itself, as it cannot be predicted which chunks will be accessed more frequently.

Please consider making it optional (configurable) to use a fair (or completely random) distribution algorithm.

I recommend removing the assumption-based distribution logic in favour of completely random, bias-free distribution.

Random distribution of chunk files among chunkserver directories is always better, because fair distribution reduces access concentration: more data in one place means more hits, more congestion, and less benefit from distribution. Imagine one chunkserver holding 40...50 times more data than any other -- guess which chunkserver would be the busiest? The same applies to the number of files per directory.
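To make the contrast concrete, here is a toy shell model (my own sketch, not MooseFS code; the 10000 quota follows the figure quoted above, and all counts are illustrative). It places 100000 new chunks into 256 directories, once uniformly at random and once with a fill-to-quota policy (takes a few seconds in bash):

  #!/bin/bash
  # toy model: compare random vs. fill-to-quota chunk placement
  N=100000; DIRS=256; QUOTA=10000
  declare -a rnd qta
  cur=0
  for ((i=0; i<N; i++)); do
    (( rnd[RANDOM % DIRS]++ ))      # random: every directory equally likely (32768 % 256 == 0)
    (( qta[cur]++ ))                # quota: keep using the current directory...
    (( qta[cur] >= QUOTA )) && (( cur = (cur + 1) % DIRS ))   # ...until it is full
  done
  min=$N; max=0; used=0
  for ((d=0; d<DIRS; d++)); do
    c=${rnd[d]:-0}
    (( c < min )) && min=$c
    (( c > max )) && max=$c
    (( ${qta[d]:-0} > 0 )) && (( used++ ))
  done
  echo "random: between $min and $max chunks in every directory"
  echo "quota:  all $N chunks packed into $used of $DIRS directories"

Under the random policy every directory ends up near the mean, while the quota policy concentrates all recent writes in a handful of directories; which behaviour serves real workloads better is exactly what is being debated here.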

@onlyjob onlyjob changed the title abnormal distribution of chunk files abnormal disproportional distribution of chunk files Jan 16, 2020
@chogata
Member
chogata commented Jan 17, 2020

I will reiterate: we ran a lot of tests, and not only artificial tests; we introduced this algorithm on several working instances of MooseFS (big and busy ones), and in every case the cumulated distribution (10000 per directory) behaved much better than fair distribution (1 per directory, AKA even distribution). We actually ran those tests at users' request, because the even distribution was not optimal.

Nevertheless, it's an easy change to make the 10000 an option in the chunkserver configuration, so we will do that. Setting it to 1 (which we absolutely do not recommend) will give you even distribution of newly created chunks.

This will be introduced together with the bugfix for the distribution value going too far over the 10000 maximum.
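For reference, a hypothetical sketch of what such a setting could look like in mfschunkserver.cfg (the option name and placement here are assumptions based on this comment; check the man page of the release that actually ships the change):

  # mfschunkserver.cfg (hypothetical excerpt - option name is an assumption)
  # number of chunks written to one directory before moving to the next;
  # 1 would give even distribution of newly created chunks (not recommended)
  HDD_RR_CHUNK_COUNT = 10000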

@onlyjob
Contributor Author
onlyjob commented Jan 17, 2020

Thanks.

@onlyjob onlyjob changed the title abnormal disproportional distribution of chunk files chunkserver: abnormal/disproportional distribution of chunk files Jan 17, 2020
@eleaner
eleaner commented Jan 18, 2020

I took the liberty of running the check on some of my chunkservers, using GPL version 3.0.109.
It looks like the logic described by @chogata works perfectly when the chunkserver is first filled (or when files are added?):

  1. I have a new chunkserver, just recently balanced with the cluster:
     there is a fair number of empty folders (3 elements in the tree output); for folders with chunks, the largest element count is 10004 and the smallest is 791.

  2. An older chunkserver looks slightly different:

  • there are no empty folders
  • the minimum count is 7
  • the maximum count is 29963
  • there are four folders whose element count, minus the 7-element minimum, is higher than 10000

In the cluster, only 107 chunks are waiting for deletion.
The load on the cluster is low; chunkserver load sits between 1 and 2.
As far as I can see, nothing major is happening.
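For anyone who wants to repeat this check, a minimal sketch (assuming the standard chunk_*.mfs file naming; run from the root of a chunkserver data directory):

  # count chunk files in each two-character (00..FF) subdirectory,
  # then print the emptiest and the heaviest one
  for D in ??; do
    printf '%s %s\n' "$D" "$(find "$D" -maxdepth 1 -name 'chunk_*.mfs' | wc -l)"
  done | sort -k2 -n | sed -n '1p;$p'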

@borkd
Collaborator
borkd commented Mar 3, 2020

Maybe there should be a "sane" upper bound? 6-9M chunks in one directory, and <10k in all others seems a bit excessive.

@acid-maker
Member

@borkd The situation described here (millions of chunks in one directory and thousands in others) was caused by a software bug that has been fixed by commit 2cbaa82.

@onlyjob
Contributor Author
onlyjob commented Apr 12, 2020

Unfortunately, this bug is not fixed. On 3.0.112, on a freshly added disk, directory 30 grew to hold over 80_000 files -- 80 times more files than in most other directories (many of which hold only around 1000 files).

On some other (randomly checked) disks, directory 30 tends to be heavier than most in terms of file count.

Please reopen.

@chogata
Member
chogata commented Apr 14, 2020

If you added the new disk to a chunkserver that already had an HDD affected by this bug, the bad distribution may have been "copied", because this condition was not checked when copying chunks between disks during internal rebalance - a chunk goes to the "same" directory on the new disk when it is moved from the old one. The algorithm only applies to chunks created via the MooseFS file system, not to internal copying.

I remember you were hit particularly hard by this bug, so unless you re-distributed all badly distributed chunks manually, this will still persist.
Can you confirm the state of your other disks, in regard to chunk distribution, before the new one was added?

Also, remember that duplicates don't count.

After we found this bug, we created tests that were able to replicate it, and those tests show the new version works as intended. So you need to be able to confirm 100% that you are not propagating the uneven distribution from the old bug and that you don't have duplicates in your system.

@onlyjob
Contributor Author
onlyjob commented Apr 14, 2020

Thanks. Most if not all of my chunkservers' HDDs are affected by the disproportional distribution of chunk files. Since this bug is considered "fixed", I was surprised to find the same condition on a new HDD... No duplicates involved.

What would be the best way to equalise the distribution of chunk files among folders on a chunkserver with many HDDs, all of which are affected by the problem?

@chogata
Member
chogata commented Apr 15, 2020

We introduced a change to the internal rebalance code, so it's gonna start working in the next release and will also distribute chunks evenly when copying disk-to-disk. However, this would require you to insert new disks into all your chunkservers, mark the old ones to be emptied, wait for them to copy the data, pull the old disks and clear (format) them for re-use, etc., until the whole system eventually balances itself. It's gonna be a long process and will require spare disks, although, if you have lots of those lying around, you can do it on several chunkservers at the same time.

Much quicker would be to just do it manually (I thought you had already done that, but it seems you didn't?): shut down the chunkservers one by one, re-distribute the chunks manually, and delete .chunkdb before starting each one up again.

For anybody with the same problem, hints on how to do it properly (repeat for each disk; a rough shell sketch follows the list):

  • calculate the total number of chunks and divide it by the number of directories (256) to determine how many chunks, on average, should go into each directory
  • (important!) isolate all directories that are far over the average number of chunks and rename them to some other names (for example tmp_[OLD NAME]); for each renamed directory create a new directory with the old name, so all 256 original directory names are back in place
  • re-distribute the files so that all 256 regular MooseFS directories contain more or less the same number of chunks
  • delete the renamed (tmp_*), now empty, directories

The trick is to get rid of the physical directories that had too many chunks in them; if you don't do that, your MooseFS will still suffer from long read times.
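A rough shell sketch of that recipe (my own, illustrative only: the chunkserver must be stopped first, the 2x-average threshold and the chunk_*.mfs naming are my assumptions, and .chunkdb must be removed before restarting, as noted above):

  #!/bin/bash
  # manual re-distribution sketch - run from the root of ONE disk,
  # with the chunkserver STOPPED
  set -e
  total=$(find ?? -maxdepth 1 -name 'chunk_*.mfs' | wc -l)
  avg=$(( total / 256 ))
  echo "total=$total, average=$avg chunks per directory"
  # set aside directories far over the average and recreate them, so the
  # bloated physical directories themselves are retired
  for D in ??; do
    n=$(find "$D" -maxdepth 1 -name 'chunk_*.mfs' | wc -l)
    if (( n > 2 * avg )); then
      mv "$D" "tmp_$D"
      mkdir "$D"
    fi
  done
  # deal the set-aside chunks back out round-robin; a fancier pass would
  # top each directory up to the average instead
  dirs=( ?? )   # tmp_* names are longer, so this glob still matches only the 256
  i=0
  find tmp_* -maxdepth 1 -name 'chunk_*.mfs' | while read -r f; do
    mv "$f" "${dirs[i]}/"
    i=$(( (i + 1) % ${#dirs[@]} ))
  done
  rmdir tmp_*   # should all be empty now; remember to delete .chunkdb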

@onlyjob
Contributor Author
onlyjob commented Apr 15, 2020

Manual re-distribution takes a long time and requires stopping the chunkserver. I do have a proof-of-concept script, but if all HDDs have to be processed at once (because processing one disk at a time does not work, due to propagation of the problem from other disks), the task becomes very tedious... So far I've only done manual re-distribution on the two chunkservers with the fewest HDDs (and not even on all of their HDDs yet), where the problem was most prominent.

Incorporating the correction into the internal distribution algorithm is much appreciated. Thanks.
