chunkserver: abnormal/disproportional distribution of chunk files · Issue #326 · moosefs/moosefs · GitHub

chunkserver: abnormal/disproportional distribution of chunk files #326

Closed

onlyjob opened this issue Jan 16, 2020 · 10 comments
Labels
confirmed bug Confirmed bug

Comments

@onlyjob
Contributor
onlyjob commented Jan 16, 2020

The chunkserver is not spreading chunk files among directories fairly, evenly, or even randomly.

Consider the following assessment of one chunkserver HDD:

[seconds to walk :: directory :: number of files]
$ for D in $(find -maxdepth 1 -type d -printf '%P\n' | sort); do \
    printf ":: $D :: "; tree "$D" | wc -l; done | ts -i %s
# ts -i %s (from moreutils) prefixes each line with the seconds elapsed since
# the previous line, i.e. roughly the time it took to walk that directory

8 :: 00 :: 45112
3 :: 01 :: 39850
2 :: 02 :: 45017
2 :: 03 :: 44686
3 :: 04 :: 43994
2 :: 05 :: 41658
2 :: 06 :: 40154
4 :: 07 :: 40048
29 :: 08 :: 244076
3 :: 09 :: 42750
2 :: 0A :: 45197
2 :: 0B :: 45364
2 :: 0C :: 42651
3 :: 0D :: 42456
2 :: 0E :: 45458
2 :: 0F :: 42104
2 :: 10 :: 42145
2 :: 11 :: 43952
1 :: 12 :: 42432
2 :: 13 :: 45485
3 :: 14 :: 45673
2 :: 15 :: 44992
2 :: 16 :: 45443
2 :: 17 :: 40049
2 :: 18 :: 45005
2 :: 19 :: 45306
2 :: 1A :: 42005
3 :: 1B :: 45535
2 :: 1C :: 45485
2 :: 1D :: 44972
2 :: 1E :: 45384
2 :: 1F :: 45656
2 :: 20 :: 45605
2 :: 21 :: 42786
2 :: 22 :: 45094
2 :: 23 :: 43966
2 :: 24 :: 45457
2 :: 25 :: 42309
2 :: 26 :: 45562
2 :: 27 :: 40002
2 :: 28 :: 45588
1 :: 29 :: 40124
2 :: 2A :: 45760
2 :: 2B :: 45556
2 :: 2C :: 40880
1 :: 2D :: 45507
2 :: 2E :: 44101
2 :: 2F :: 40143
2 :: 30 :: 45456
1 :: 31 :: 45587
2 :: 32 :: 45521
2 :: 33 :: 44904
2 :: 34 :: 44025
2 :: 35 :: 40066
2 :: 36 :: 45567
2 :: 37 :: 43891
3 :: 38 :: 38642
2 :: 39 :: 40785
1 :: 3A :: 40365
2 :: 3B :: 50906
3 :: 3C :: 46865
1 :: 3D :: 45387
2 :: 3E :: 45597
2 :: 3F :: 45433
2 :: 40 :: 42164
2 :: 41 :: 45673
2 :: 42 :: 38552
2 :: 43 :: 45663
2 :: 44 :: 45406
1 :: 45 :: 45360
2 :: 46 :: 45485
2 :: 47 :: 45442
2 :: 48 :: 45612
2 :: 49 :: 45417
2 :: 4A :: 44002
2 :: 4B :: 45709
2 :: 4C :: 45551
2 :: 4D :: 44162
2 :: 4E :: 42338
2 :: 4F :: 43851
2 :: 50 :: 45363
3 :: 51 :: 42380
3 :: 52 :: 45255
2 :: 53 :: 45648
2 :: 54 :: 45430
3 :: 55 :: 43788
2 :: 56 :: 45631
2 :: 57 :: 38531
2 :: 58 :: 40189
2 :: 59 :: 42682
2 :: 5A :: 38416
2 :: 5B :: 38464
1 :: 5C :: 42466
2 :: 5D :: 42048
2 :: 5E :: 45315
2 :: 5F :: 41650
2 :: 60 :: 38572
1 :: 61 :: 42315
2 :: 62 :: 38717
2 :: 63 :: 38561
2 :: 64 :: 42442
2 :: 65 :: 43784
1 :: 66 :: 42390
2 :: 67 :: 38843
2 :: 68 :: 42262
2 :: 69 :: 42175
2 :: 6A :: 42375
2 :: 6B :: 39840
2 :: 6C :: 38845
1 :: 6D :: 38553
3 :: 6E :: 42256
2 :: 6F :: 42517
2 :: 70 :: 38588
2 :: 71 :: 38615
1 :: 72 :: 42343
3 :: 73 :: 42434
1 :: 74 :: 42556
2 :: 75 :: 39992
2 :: 76 :: 38606
2 :: 77 :: 45634
2 :: 78 :: 45402
1 :: 79 :: 38542
2 :: 7A :: 43878
2 :: 7B :: 38481
2 :: 7C :: 42255
1 :: 7D :: 53235
3 :: 7E :: 42451
2 :: 7F :: 42529
1 :: 80 :: 40449
2 :: 81 :: 38468
2 :: 82 :: 42201
2 :: 83 :: 43858
1 :: 84 :: 45791
161 :: 85 :: 1961750
2 :: 86 :: 43868
2 :: 87 :: 42475
2 :: 88 :: 43470
2 :: 89 :: 40532
1 :: 8A :: 42686
2 :: 8B :: 42257
2 :: 8C :: 42918
53 :: 8D :: 234492
1 :: 8E :: 44002
2 :: 8F :: 42235
2 :: 90 :: 40394
2 :: 91 :: 44864
2 :: 92 :: 45255
2 :: 93 :: 41999
2 :: 94 :: 41799
1 :: 95 :: 38596
3 :: 96 :: 42744
1 :: 97 :: 42465
2 :: 98 :: 38387
2 :: 99 :: 38565
2 :: 9A :: 42571
2 :: 9B :: 42355
153 :: 9C :: 1322210
3 :: 9D :: 42405
2 :: 9E :: 40034
3 :: 9F :: 44811
2 :: A0 :: 38631
2 :: A1 :: 45187
2 :: A2 :: 42325
2 :: A3 :: 38592
2 :: A4 :: 38516
2 :: A5 :: 40494
115 :: A6 :: 1192862
2 :: A7 :: 38511
2 :: A8 :: 45001
2 :: A9 :: 42399
2 :: AA :: 39865
1 :: AB :: 38652
3 :: AC :: 43823
2 :: AD :: 45640
2 :: AE :: 40367
2 :: AF :: 43941
2 :: B0 :: 45488
2 :: B1 :: 46192
2 :: B2 :: 42293
63 :: B3 :: 804730
3 :: B4 :: 42213
2 :: B5 :: 45559
2 :: B6 :: 40320
2 :: B7 :: 45614
2 :: B8 :: 42670
2 :: B9 :: 38750
2 :: BA :: 42219
369 :: BB :: 3622774
3 :: BC :: 43950
3 :: BD :: 38513
3 :: BE :: 42749
3 :: BF :: 42186
2 :: C0 :: 44939
2 :: C1 :: 44043
2 :: C2 :: 42281
2 :: C3 :: 38511
3 :: C4 :: 40576
2 :: C5 :: 42385
2 :: C6 :: 42104
2 :: C7 :: 38608
2 :: C8 :: 42751
2 :: C9 :: 40154
2 :: CA :: 38682
1 :: CB :: 43832
2 :: CC :: 45574
2 :: CD :: 42634
2 :: CE :: 42255
2 :: CF :: 45347
2 :: D0 :: 42330
2 :: D1 :: 42361
2 :: D2 :: 45601
3 :: D3 :: 44261
1 :: D4 :: 45207
2 :: D5 :: 45777
3 :: D6 :: 38637
2 :: D7 :: 44850
1 :: D8 :: 38559
3 :: D9 :: 42346
2 :: DA :: 38667
2 :: DB :: 44973
124 :: DC :: 1062189
2 :: DD :: 44040
2 :: DE :: 42282
2 :: DF :: 44054
2 :: E0 :: 45583
3 :: E1 :: 45304
2 :: E2 :: 44050
2 :: E3 :: 44019
2 :: E4 :: 42356
2 :: E5 :: 38628
2 :: E6 :: 38696
2 :: E7 :: 45415
3 :: E8 :: 41575
2 :: E9 :: 44003
2 :: EA :: 45469
3 :: EB :: 40114
2 :: EC :: 45564
2 :: ED :: 41276
2 :: EE :: 45604
2 :: EF :: 40099
2 :: F0 :: 45714
2 :: F1 :: 44809
2 :: F2 :: 45635
2 :: F3 :: 45215
2 :: F4 :: 38525
2 :: F5 :: 40062
2 :: F6 :: 45575
2 :: F7 :: 44073
216 :: F8 :: 339613
1 :: F9 :: 42408
3 :: FA :: 45703
2 :: FB :: 42413
2 :: FC :: 43974
2 :: FD :: 44224
2 :: FE :: 45548
1 :: FF :: 46093

Note the abnormally large directory BB: about 80 times heavier than most, with a walk time 123 times slower.
Several directories have 40...50 times more files than most, and walk times on those large directories are slow. Probabilistically, heavy directories get more hits, so overall performance suffers.
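For a quick summary of the listing above, it can be piped through awk (assuming the output was saved to a file, here hypothetically named chunkdirs.txt; the whitespace-separated fields per line are: seconds, ::, directory, ::, count):

  awk '{ n++; t += $5; if ($5 > max) { max = $5; heavy = $3 } }
       END { printf "dirs=%d total=%d mean=%.0f max=%d in %s (%.1fx mean)\n",
                    n, t, t/n, max, heavy, max/(t/n) }' chunkdirs.txt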

As far as I'm concerned, the disproportional distribution of chunk files among directories is a bug.

In #319 (comment), @chogata wrote:

MooseFS does not try to spread the chunks evenly between directories, but rather uses one until it reaches its internal quota of chunks, then moves to the next, to the next and so on. General observations shows that the most frequently used files are usually the ones least recently recorded, so this solution works best in most situations.

Apparently the distribution algorithm is flawed, but IMHO the presence of such logic is flawed in itself, as it cannot be predicted which chunks will be accessed more frequently.

Please consider making it optional (configurable) to use a fair (or completely random) distribution algorithm.

I recommend removing the assumption-based distribution logic in favour of completely random, bias-free distribution.

Random distribution of chunk files among chunkserver directories is always better, because fair distribution reduces access concentration: more data in one place means more hits, more congestion, and less benefit from distribution. Imagine one chunkserver holding 40...50 times more data than any other -- guess which chunkserver would be the busiest? The same applies to the number of files per directory.
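To make the contrast concrete, here is a toy shell model (my own sketch, not MooseFS code; the 10000 quota follows the figure quoted above, and all counts are illustrative). It places 100000 new chunks into 256 directories, once uniformly at random and once with a fill-to-quota policy (takes a few seconds in bash):

  #!/bin/bash
  # toy model: compare random vs. fill-to-quota chunk placement
  N=100000; DIRS=256; QUOTA=10000
  declare -a rnd qta
  cur=0
  for ((i=0; i<N; i++)); do
    (( rnd[RANDOM % DIRS]++ ))      # random: every directory equally likely (32768 % 256 == 0)
    (( qta[cur]++ ))                # quota: keep using the current directory...
    (( qta[cur] >= QUOTA )) && (( cur = (cur + 1) % DIRS ))   # ...until it is full
  done
  min=$N; max=0; used=0
  for ((d=0; d<DIRS; d++)); do
    c=${rnd[d]:-0}
    (( c < min )) && min=$c
    (( c > max )) && max=$c
    (( ${qta[d]:-0} > 0 )) && (( used++ ))
  done
  echo "random: between $min and $max chunks in every directory"
  echo "quota:  all $N chunks packed into $used of $DIRS directories"

Under the random policy every directory ends up near the mean, while the quota policy concentrates all recent writes in a handful of directories; which behaviour serves real workloads better is exactly what is being debated here.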

@onlyjob onlyjob changed the title abnormal distribution of chunk files abnormal disproportional distribution of chunk files Jan 16, 2020
@chogata
Member
chogata commented Jan 17, 2020

I will reiterate: we ran a lot of tests, and not only artificial tests; we introduced this algorithm on several working instances of MooseFS (big and busy ones), and in every case the cumulated distribution (10000 per directory) behaved much better than fair distribution (1 per directory, AKA even distribution). We actually ran those tests at users' request, because the even distribution was not optimal.

Nevertheless, it's an easy change to make the 10000 an option in the chunkserver configuration, so we will do that. Setting it to 1 (which we absolutely do not recommend) will give you even distribution of newly created chunks.

This will be introduced together with the bugfix for the distribution value going too far over the 10000 maximum.
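For reference, a hypothetical sketch of what such a setting could look like in mfschunkserver.cfg (the option name and placement here are assumptions based on this comment; check the man page of the release that actually ships the change):

  # mfschunkserver.cfg (hypothetical excerpt - option name is an assumption)
  # number of chunks written to one directory before moving to the next;
  # 1 would give even distribution of newly created chunks (not recommended)
  HDD_RR_CHUNK_COUNT = 10000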

@onlyjob
Contributor Author
onlyjob commented Jan 17, 2020

Thanks.

@onlyjob onlyjob changed the title abnormal disproportional distribution of chunk files chunkserver: abnormal/disproportional distribution of chunk files Jan 17, 2020
@eleaner
eleaner commented Jan 18, 2020

I took the liberty of running the check on some of my chunkservers, using GPL version 3.0.109.
It looks like the logic described by @chogata works perfectly when the chunkserver is first filled (or when files are added?):

  1. I have a new chunkserver, just recently balanced with the cluster:
     there is a fair number of empty folders (3 elements in the tree output); for folders with chunks, the largest element count is 10004 and the smallest is 791.

  2. An older chunkserver looks slightly different:

  • there are no empty folders
  • the minimum count is 7
  • the maximum count is 29963
  • there are four folders whose element count, minus the 7-element minimum, is higher than 10000

In the cluster, only 107 chunks are waiting for deletion.
The load on the cluster is low; chunkserver load sits between 1 and 2.
As far as I can see, nothing major is happening.
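For anyone who wants to repeat this check, a minimal sketch (assuming the standard chunk_*.mfs file naming; run from the root of a chunkserver data directory):

  # count chunk files in each two-character (00..FF) subdirectory,
  # then print the emptiest and the heaviest one
  for D in ??; do
    printf '%s %s\n' "$D" "$(find "$D" -maxdepth 1 -name 'chunk_*.mfs' | wc -l)"
  done | sort -k2 -n | sed -n '1p;$p'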

@borkd
Collaborator
borkd commented Mar 3, 2020

Maybe there should be a "sane" upper bound? 6-9M chunks in one directory, and <10k in all others seems a bit excessive.

@acid-maker
Member

@borkd The situation described here (millions of chunks in one directory and thousands in others) was caused by a software bug that has been fixed by commit 2cbaa82.

@onlyjob
Contributor Author
onlyjob commented Apr 12, 2020

Unfortunately, this bug is not fixed. On 3.0.112, on a freshly added disk, directory 30 grew to hold over 80_000 files -- 80 times more files than in most other directories (many of which hold only around 1000 files).

On some other (randomly checked) disks, directory 30 tends to be heavier than most in terms of file count.

Please reopen.

@chogata
Member
chogata commented Apr 14, 2020

If you added the new disk to a chunkserver that already had an HDD affected by this bug, the bad distribution may have been "copied", because this condition was not checked when copying chunks between disks during internal rebalance - a chunk goes to the "same" directory on the new disk when it is moved from the old one. The algorithm only applies to chunks created via the MooseFS file system, not to internal copying.

I remember you were hit particularly hard by this bug, so unless you re-distributed all badly distributed chunks manually, this will still persist.
Can you confirm the state of your other disks, in regard to chunk distribution, before the new one was added?

Also, remember that duplicates don't count.

After we found this bug, we created tests that were able to replicate it, and those tests show the new version works as intended. So you need to be able to confirm 100% that you are not propagating the uneven distribution from the old bug and that you don't have duplicates in your system.

@onlyjob
Contributor Author
onlyjob commented Apr 14, 2020

Thanks. Most if not all of my chunkservers' HDDs are affected by the disproportional distribution of chunk files. Since this bug is considered "fixed", I was surprised to find the same condition on a new HDD... No duplicates involved.

What would be the best way to equalise the distribution of chunk files among folders on a chunkserver with many HDDs, all of which are affected by the problem?

@chogata
Member
chogata commented Apr 15, 2020

We introduced a change to the internal rebalance code, so it's gonna start working in the next release and will also distribute chunks evenly when copying disk-to-disk. However, this would require you to insert new disks into all your chunkservers, mark the old ones to be emptied, wait for them to copy the data, pull the old disks and clear (format) them for re-use, etc., until the whole system eventually balances itself. It's gonna be a long process and will require spare disks, although, if you have lots of those lying around, you can do it on several chunkservers at the same time.

Much quicker would be to just do it manually (I thought you had already done that, but it seems you didn't?): shut down the chunkservers one by one, re-distribute the chunks manually, and delete .chunkdb before starting each one up again.

For anybody with the same problem, hints on how to do it properly (repeat for each disk; a rough shell sketch follows the list):

  • calculate the total number of chunks and divide it by the number of directories (256) to determine how many chunks, on average, should go into each directory
  • (important!) isolate all directories that are far over the average number of chunks and rename them to some other names (for example tmp_[OLD NAME]); for each renamed directory create a new directory with the old name, so all 256 original directory names are back in place
  • re-distribute the files so that all 256 regular MooseFS directories contain more or less the same number of chunks
  • delete the renamed (tmp_*), now empty, directories

The trick is to get rid of the physical directories that had too many chunks in them; if you don't do that, your MooseFS will still suffer from long read times.
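A rough shell sketch of that recipe (my own, illustrative only: the chunkserver must be stopped first, the 2x-average threshold and the chunk_*.mfs naming are my assumptions, and .chunkdb must be removed before restarting, as noted above):

  #!/bin/bash
  # manual re-distribution sketch - run from the root of ONE disk,
  # with the chunkserver STOPPED
  set -e
  total=$(find ?? -maxdepth 1 -name 'chunk_*.mfs' | wc -l)
  avg=$(( total / 256 ))
  echo "total=$total, average=$avg chunks per directory"
  # set aside directories far over the average and recreate them, so the
  # bloated physical directories themselves are retired
  for D in ??; do
    n=$(find "$D" -maxdepth 1 -name 'chunk_*.mfs' | wc -l)
    if (( n > 2 * avg )); then
      mv "$D" "tmp_$D"
      mkdir "$D"
    fi
  done
  # deal the set-aside chunks back out round-robin; a fancier pass would
  # top each directory up to the average instead
  dirs=( ?? )   # tmp_* names are longer, so this glob still matches only the 256
  i=0
  find tmp_* -maxdepth 1 -name 'chunk_*.mfs' | while read -r f; do
    mv "$f" "${dirs[i]}/"
    i=$(( (i + 1) % ${#dirs[@]} ))
  done
  rmdir tmp_*   # should all be empty now; remember to delete .chunkdb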

@onlyjob
Contributor Author
onlyjob commented Apr 15, 2020

Manual re-distribution takes a long time and requires stopping the chunkserver. I do have a proof-of-concept script, but if all HDDs have to be processed at once (because processing one disk at a time does not work, due to propagation of the problem from other disks), the task becomes very tedious... So far I've only done manual re-distribution on the two chunkservers with the fewest HDDs (and not even on all of their HDDs yet), where the problem was most prominent.

Incorporating the correction into the internal distribution algorithm is much appreciated. Thanks.
