8000 Fixed a bug in preprocessing in crop_to_nonzero. #2796 by kreininmv · Pull Request #2824 · MIC-DKFZ/nnUNet · GitHub

Open · wants to merge 1 commit into master
Conversation

kreininmv

Hi!
#2796

I’ve identified the root cause of the issue. In preprocessing.preprocessors.default_preprocessor, the function crop_to_nonzero is called to generate a nonzero_mask of the image and crop to the region of interest—but that region is defined as every non-zero pixel in the image. Clearly, that’s not correct: any voxel inside the patient can be non-zero, and manufacturer-specific differences in CT scanner background intensity mean the background is almost never exactly zero.
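For context, here is a minimal sketch of the kind of logic described above (not the actual nnU-Net implementation): build a mask of non-zero voxels and crop to its bounding box. With a CT background that is non-zero everywhere, `image != 0` would cover nearly the whole volume and nothing would be cropped.

```python
import numpy as np

def crop_to_nonzero_sketch(image: np.ndarray) -> np.ndarray:
    # Build a mask of every non-zero voxel -- the behavior discussed above:
    # any voxel inside the patient counts, so a non-zero scanner background
    # would make the mask cover (almost) the whole volume.
    nonzero_mask = image != 0
    coords = np.argwhere(nonzero_mask)
    if coords.size == 0:
        return image  # all-zero image: nothing to crop
    mins = coords.min(axis=0)
    maxs = coords.max(axis=0) + 1
    slicer = tuple(slice(lo, hi) for lo, hi in zip(mins, maxs))
    return image[slicer]

# A volume with zero padding around a non-zero "body" is cropped to that body:
vol = np.zeros((6, 6, 6))
vol[2:4, 1:5, 2:5] = 100.0
print(crop_to_nonzero_sketch(vol).shape)  # (2, 4, 3)
```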

I see several possible solutions:

  1. Swap the preprocessing steps: run normalize first, then crop_to_nonzero. However, this has drawbacks: if users have already cropped tightly around an organ or body region, the mask may over-crop and break training. Moreover, the CT normalization used doesn’t zero out voxels outside the body, which could lead to label errors (for example, mask labels becoming –1).

  2. Remove this heuristic entirely and let users define the ROI before preprocessing. This would increase training time, but I believe avoiding hard-to-detect errors is worth it. I suggest this option.

  3. Add a toggle parameter to enable or disable cropping. But this approach will always perform worse than other options and require extra tuning, since background intensities vary so widely. A better idea is to leverage your body-localization model [Foreground-and-Anonymization-Area-Segmentation](https://github.com/MIC-DKFZ/Foreground-and-Anonymization-Area-Segmentation). The downsides are needing to download additional weights and perform an extra inference step, plus a small chance of failure on unusual anatomies. In my opinion, though, it’s far more robust than any simple algorithmic ROI heuristic.

I spent considerable time developing heuristics to separate patient body from background, but they invariably fail when applied to diverse data from different hospitals and scanners (on the order of 1,000–2,000 varied samples). In my experience, there’s no reliable algorithmic solution for this - it simply didn’t work for me.

An example of crop_to_nonzero at work:

I understand that at the moment this doesn’t cause major training errors—since the training pipeline applies an augmentation that replaces label values of –1 with 0—but it could potentially lead to errors if mask values, rather than background, get replaced.
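A sketch of the remapping described above (the function name here is illustrative, not nnU-Net's actual transform): label voxels marked -1 are mapped back to background before training sees them.

```python
import numpy as np

def remove_label(seg: np.ndarray, label: int = -1, replace_with: int = 0) -> np.ndarray:
    # Voxels marked `label` (e.g. -1 from cropping/normalization) are mapped
    # back to background before the loss sees them.
    seg = seg.copy()
    seg[seg == label] = replace_with
    return seg

seg = np.array([[-1, 0, 1], [2, -1, 0]])
print(remove_label(seg))  # [[0 0 1] [2 0 0]]
```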

@FabianIsensee FabianIsensee self-assigned this May 26, 2025
Member
FabianIsensee commented Jun 2, 2025

Hi there, thanks for the comprehensive writeup. I believe, however, that there are some things you did not interpret quite right, leading to the perception that there is a bug or unintended behavior. I have already written a short response outlining the need for the nonzero mask in the issue you link to; here is a rundown of why none of the suggestions quite fit:

  1. There is one small detail you have not considered: binary_fill_holes in create_nonzero_mask. This fills in all zero-valued pixels in the interior of the image, so the final mask should never contain -1s inside the body of a CT scan. The cropping only ever removes parts of the array that are entirely zero; not a single pixel containing nonzero values is removed. So no information is lost and we never over-crop. The only situation in which the cropping can cause harm is if you have labels that extend beyond the visible object into a zero-valued region that also happens to lie at the border of the image.
  2. I hate this heuristic. It has caused a lot of problems and headaches in the past, but it is needed: without it we would achieve worse results on BraTS-like datasets, and we would waste compute training on empty, all-zero background patches.
  3. A toggle parameter could be added, but, as you said, it would add complexity. Maybe we can consider an optional toggle in the future that gives users more control. Using a body-recognition model as a substitute would be a very bad idea, not just because these models may fail, but especially because nnU-Net is not just a method for medical data: it can be used for any segmentation task, and we really want to keep this flexibility.
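Point 1 can be illustrated with a small scipy example (a sketch, not the actual create_nonzero_mask code): a zero-valued voxel enclosed by the body is put back into the mask by binary_fill_holes, while the zero background at the image border stays excluded, so cropping never removes interior pixels.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

# A 2D slice: a non-zero "body" square with a zero-valued hole at its
# center, surrounded by zero background at the border.
slice_ = np.zeros((7, 7))
slice_[1:6, 1:6] = 50.0
slice_[3, 3] = 0.0          # zero-valued voxel inside the body

raw_mask = slice_ != 0
filled = binary_fill_holes(raw_mask)

print(raw_mask[3, 3])   # False: the naive mask excludes the interior zero
print(filled[3, 3])     # True:  binary_fill_holes puts it back
print(filled[0, 0])     # False: border background is still excluded
```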

Since you seem to have gone quite deep into the cropping and normalization rabbit hole, may I ask whether you have encountered any issues arising from it? I would be very interested in that!
Best,
Fabian

PS: also remember that the -1 mask is basically never used; only when cropping to the nonzero region results in a substantial reduction in image size is it used to apply normalization in a masked way.

@kreininmv
Author

Thank you very much for your answer!
I may well not understand something completely correctly, so there may be some misunderstandings on my side.

  1. I took one of the random studies I worked with and built a mask using your create_nonzero_mask function together with my own bounding-box function, which is constrained by the mask itself. From that I obtained the following results.
[image]

I then dug into the details of acvl_utils, where the logic for constraining the mask by the bounding box works differently. I rebuilt the mask using that approach and got these results: a yellow rectangle spanning the entire volume, whose contours are only visible a bit at the top (i.e., the overall image size remains the same, so this did not affect the final result).
[image]

If you look at the mask itself, you can see that binary_fill_holes can still leave unfilled points inside the body, although this only affects the first and last axial slices.
[image]

  2. In my testing experience, the body-segmentation model can indeed capture extra regions, but I have not observed a case where it fails to segment the body entirely.

If we talk about isolating the body with heuristics, it is probably not worth using mask = image != 0: the background (excluding the CT scanner table the patient lies on) is almost always the lowest intensity in the image, but it is almost never exactly zero. I suspect you are already aware of all this.

I tried to segment the body mask using heuristics. I had hypotheses that, by using certain intensity statistics of the image, I would be able to delineate the body mask cleanly, but those approaches did not work. The best option turned out to be choosing a specific HU threshold and segmenting based on that (I settled on -600 HU, although I don't really know why; it was empirical). Overall, I ended up with three versions that work reliably:

  1. Take all pixels above -600 HU, apply morphological operations (opening, closing) with a large kernel (7), and then take the largest connected component. This gives a generally stable result; obviously the lungs and airways will not be segmented, but you can already build a reasonably good bounding box.
[image]
  2. Next, I tested the same procedure but segmented the background instead, performed morphology on it, took its largest connected component, and then inverted that mask. This works significantly better: the patient's airways are correctly included, but part of the CT scanner apparatus is captured as well.
[image]
  3. The best approach, in my experience, was to repeat the previous step, then apply a morphological opening, take the largest connected component, and finish with a morphological closing using the same kernel. This removes CT scanner artifacts and segments the body mask (including the airways and lungs) quite accurately.
[image]

I tried other approaches based on image-intensity statistics or more generic table-based methods, but the results were unclear, so I'm not including them here. I understand that this example does not cover every possible case, but the conclusions were drawn from reviewing a stream of studies from different hospitals and scanners. There can be significant problems when a patient's head is positioned such that the mouth and nose are included, causing the largest connected component of the background to encompass the airways and lungs, which makes the last two methods less stable and prevents the lungs from being included as part of the body. Also note that this may not work well if the organs (for example, the lungs) have already been pre-cut before training nnUNetv2; I have not tested these methods on such cases. Additionally, CT equipment settings can vary greatly, and the patient may not always be separable from the table they lie on. However, I believe that if we apply your method for obtaining the bounding box, everything will work much more reliably.
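The third variant above could be sketched roughly as follows (all function names are illustrative; the -600 HU threshold and kernel size 7 are the empirical values mentioned in the text):

```python
import numpy as np
from scipy import ndimage

def largest_component(mask: np.ndarray) -> np.ndarray:
    # Keep only the largest connected component of a boolean mask.
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)

def body_mask_sketch(ct_hu: np.ndarray, threshold: float = -600.0,
                     kernel: int = 7) -> np.ndarray:
    # 1) Segment the background below the HU threshold, keep its largest
    #    connected component, and invert -> body candidate (this keeps
    #    airways and lungs inside the body mask).
    background = largest_component(ct_hu < threshold)
    body = ~background
    # 2) Morphological opening with a large kernel to detach artifacts
    #    such as the scanner table.
    structure = np.ones((kernel,) * ct_hu.ndim, dtype=bool)
    body = ndimage.binary_opening(body, structure=structure)
    # 3) Keep the largest connected component (the patient).
    body = largest_component(body)
    # 4) Closing with the same kernel to smooth the result.
    return ndimage.binary_closing(body, structure=structure)

# Synthetic 2D example: -1000 HU background, a 0 HU "body" square,
# and a small bright artifact in a corner that the opening removes.
ct = np.full((40, 40), -1000.0)
ct[5:35, 5:35] = 0.0
ct[0:3, 37:40] = 0.0
mask = body_mask_sketch(ct)
print(mask[20, 20], mask[1, 38])  # True False
```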

Maybe this will help you somehow.

Best,
Matvei
