Zsmooth - cross-platform, cross-architecture video smoothing functions for Vapoursynth, written in Zig
Goals
- Clean, easy to read code, with a standard scalar (non-SIMD) implementation for every algorithm.
- Support for 8-16 integer, and 16-32 float bit depths. (See FP16 note below)
- Tests for all filters, covering the scalar and vector implementations.
- Support for RGB, YUV, and GRAY colorspaces (assuming an algorithm isn't designed for a specific color space).
- Support Linux, Windows, and Mac.
- Support x86_64 and aarch64 CPU architectures, with all architectures supported by the Zig compiler being possible in theory.
- (Eventually) Vapoursynth and Avisynth support. (Whenever I get the spare time and motivation.)
Note on FP16: FP16 support is a work in progress. All functions support it but some are much slower than they need to be. Future Zig versions should make this easier, see this Zig issue for more details.
Note on AVX2: AVX2 is the assumed baseline for all pre-built x86_64 binaries. AVX2 has been out available since 2013, so there's very little hardware left that doesn't support it. If there's demand for pre-AVX2 builds, please open an issue and explain (in detail) your needs and reasoning.
Please see this pinned issue for the current list, and up vote accordingly.
TemporalMedian is a temporal denoising filter. It replaces every pixel with the median of its temporal neighbourhood.
This filter will introduce ghosting, so use with caution.
core.zsmooth.TemporalMedian(clip clip[, int radius = 1, int[] planes = [0, 1, 2]])
Parameter | Type | Options (Default) | Description |
---|---|---|---|
clip | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | Clip to process | |
radius | int | 1 - 10 (1) | Size of the temporal window from which to calculate the median. First and last radius frames of a clip are not filtered. |
planes | int[] | ([0, 1, 2]) | Which planes to process. Any unfiltered planes are copied from the input clip. |
TemporalSoften averages radius * 2 + 1 frames. A pixel is included in the average only if the absolute difference between it and the middle frame's corresponding pixel is less than the threshold.
If the scenechange
parameter is -1
, or greater than 0, TemporalSoften will not average
frames from different scenes.
Setting scenechange
to -1
skips the internal invocation of SCDetect from Misc
filters and uses the standard "_SceneChangePrev" and
"_SceneChangeNext" properties, which should be set by other scene detection filters prior to invoking TemporalSoften.
core.zsmooth.TemporalSoften(clip clip[, int radius = 4, float[] threshold = [], int scenechange = 0, bool scalep=False])
Parameter | Type | Options (Default) | Description |
---|---|---|---|
clip | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | Clip to process | |
radius | int | 1 - 7 (4) | Size of the temporal window. This is an upper bound. At the beginning and end of the clip, only legally accessible frames are incorporated into the radius. So if radius if 4, then on the first frame, only frames 0, 1, 2, and 3 are incorporated into the result. |
threshold | float[] | 0 - 255 8-bit, 0 - 65535 16-bit, 0.0 - 1.0 float ([4,4,4] RGB, [4, 8, 8] YUV, [4] GRAY) | If the difference between the pixel in the current frame and any of its temporal neighbors is less than this threshold, it will be included in the mean. If the difference is greater, it will not be included in the mean. If set to -1, the plane is copied from the source. |
scenechange | int | -1 - 255 (-1) | Zero (0) disables scene change detection, negative one (-1) respects any existing scene change properties ("_SceneChangePrev", "_SceneChangeNext") and does not call SCDetect from Misc filters. If greater than zero, it is calculated as a percentage internally (scenechange/255) to qualify if a frame is a scenechange or not. Currently requires the SCDetect filter from the Miscellaneous filters plugin. |
scalep | bool | (False) | Parameter scaling. If set to true, all threshold values will be automatically scaled from 8-bit range (0-255) to the corresponding range of the input clip's bit depth. |
RemoveGrain is a spatial denoising filter.
Modes 0-24 are implemented. Different modes can be specified for each plane. If there are fewer modes than planes, the last mode specified will be used for the remaining planes.
Note on differences:
- Edge pixels are properly processed using a "mirror"-based algorithm. Meaning that any pixel values that are absent at an edge are filled in by mirroring the data from the opposite side. Other implementations simply skip (copy) edge pixels verbatim.
- This plugin operates slightly differently than RGSF, the 'single precision' floating point Vapoursynth implementation of RemoveGrain. Specifically, RGSF isn't actually 'single precision' - it's double precision. Even for operations that don't benefit from increased floating point precision. This means that RGSF is actually significantly slower than it needs to be for some/most operations.
The implementation in this plugin properly uses single precision floating point for all modes. This is exactly the same approach that the Avisynth version of RgTools takes. It does mean that for some operations, the output will very sligtly differ between RGSF and this plugin, as RGSF is technically doing higher precision (but much slower) calculations.
core.zsmooth.RemoveGrain(clip clip, int[] mode)
Parameters:
Parameter | Type | Options (Default) | Description |
---|---|---|---|
clip | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | Clip to process | |
mode | int | 1-24 | For a description of each mode, see the docs from the original Vapoursynth documentation here: https://github.com/vapoursynth/vs-removegrain/blob/master/docs/rgvs.rst |
Repairs unwanted artifacts from (but not limited to) RemoveGrain.
Modes 0-24 are implemented. Different modes can be specified for each plane. If there are fewer modes than planes, the last mode specified will be used for the remaining planes.
Notes on differences: This implementation of Repair is different than others in 2 key ways:
- Edge pixels are properly processed using a "mirror"-based algorithm. Meaning that any pixel values that are absent at an edge are filled in by mirroring the data from the opposite side. Other implementations simply skip (copy) edge pixels verbatim.
- Unlike RGSF, all calculations are done in single precision floating point. See the note on
RemoveGrain
for more information.
core.zsmooth.Repair(clip clip, clip repairclip, int[] mode)
Parameters:
Parameter | Type | Options (Default) | Description |
---|---|---|---|
clip | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | Clip to process | |
repairclip | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | Reference clip, often is (but not required to be) the original unprocesed clip | |
mode | int | 1-24 | For a description of each mode, see the docs from the original Vapoursynth documentation here: https://github.com/vapoursynth/vs-removegrain/blob/master/docs/rgvs.rst |
VerticalCleaner is a fast vertical median filter.
Different modes can be specified for each plane. If there are fewer modes than planes, the last mode specified will be used for the remaining planes.
Mode 0 The input plane is simply passed through.
Mode 1 Vertical median.
Mode 2 Relaxed vertical median (preserves more detail).
Let b1, b2, c, t1, t2 be a vertical sequence of pixels. The center pixel c is to be modified in terms of the 4 neighbours. For simplicity let us assume that b2 <= t1. Then in mode 1, c is clipped with respect to b2 and t1, i.e. c is replaced by max(b2, min(c, t1)). In mode 2 the clipping intervall is widened, i.e. mode 2 is more conservative than mode 1. If b2 > b1 and t1 > t2, then c is replaced by max(b2, min(c, max(t1,d1))), where d1 = min(b2 + (b2 - b1), t1 + (t1 - t2)). In other words, only if the gradient towards the center is positive on both clipping ends, then the upper clipping bound may be larger. If b2 < b1 and t1 < t2, then c is replaced by max(min(b2, d2), min(c, t1)), where d2 = max(b2 - (b1 - b2), t1 - (t2 - t1)). In other words, only if the gradient towards the center is negative on both clipping ends, then the lower clipping bound may be smaller.
In mode 1 the top and the bottom line are always left unchanged. In mode 2 the two first and the two last lines are always left unchanged.
core.zsmooth.VerticalCleaner(clip clip, int[] mode)
Parameters:
Parameter | Type | Options (Default) | Description |
---|---|---|---|
clip | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | Clip to process | |
mode | int | 0-2 | Mode 0 is passthrough, Mode 1 is a vertical median, Mode 2 is a relaxed vertical median that preserves more detail |
Clense is a temporal median of three frames. (previous, current and next) ForwardClense is a modified version of Clense that works on current and next 2 frames. BackwardClense is a modified version of Clense that works on current and previous 2 frames.
core.zsmooth.Clense(clip clip, [clip previous, clip next, int[] planes])
core.zsmooth.ForwardClense(clip clip,[ int[] planes])
core.zsmooth.BackwardClense(clip clip,[ int[] planes])
Parameters:
Parameter | Type | Options (Default) | Description |
---|---|---|---|
clip | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | Clip to process | |
previous | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | (main clip) | Optional alternate clip from which to retrieve previous frames |
next | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | (main clip) | Optional alternate clip from which to retrieve next frames |
planes | int[] | ([0, 1, 2]) | Which planes to process. Any unfiltered planes are copied from the input clip. |
core.zsmooth.FluxSmoothT(clip clip[, float[] temporal_threshold = 7, float[] planes = [0,1,2], bool scalep=False])
core.zsmooth.FluxSmoothST(clip clip[, float[] temporal_threshold = 7, float[] spatial_threshold = 7, float[] planes = [0,1,2], bool scalep = False])
FluxSmoothT (T\ emporal) examines each pixel and compares it to the corresponding pixel in the previous and next frames. Smoothing occurs if both the previous frame's value and the next frame's value are greater, or if both are less than the value in the current frame.
Smoothing is done by averaging the pixel from the current frame with the pixels from the previous and/or next frames, if they are within temporal_threshold.
FluxSmoothST (S\ patio\ T\ emporal) does the same as FluxSmoothT, except the pixel's eight neighbours from the current frame are also included in the average, if they are within spatial_threshold.
The first and last rows and the first and last columns are not processed by FluxSmoothST.
Parameter | Type | Options (Default) | Description |
---|---|---|---|
clip | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | Clip to process | |
temporal_threshold | float[] | -1 - bit depth max ([7,7,7]) | Temporal neighbour pixels within this threshold from the current pixel are included in the average. Can be specified as an array, with values corresonding to each plane of the input clip. A negative value (such as -1) indicates that the plane should not be processed and will be copied from the input clip. |
spatial_threshold | float[] | -1 - bit depth max ([7,7,7]) | Spatial neighbour pixels within this threshold from the current pixel are included in the average. A negative value (such as -1) indicates that the plane should not be processed and will be copied from the input clip. |
planes | int[] | ([0, 1, 2]) | Which planes to process. Any unfiltered planes are copied from the input clip. |
scalep | bool | (False) | Parameter scaling. If set to true, all threshold values will be automatically scaled from 8-bit range (0-255) to the corresponding range of the input clip's bit depth. |
core.zsmooth.DegrainMedian(clip clip[, float[] limit, int[] mode, bool scalep])
Modes:
Mode | Description |
---|---|
0 | Spatial-Temporal version of RemoveGrain mode 9. Essentially a line (or edge) sensitive, limited, clipping function. Clipping parameters are calculated from the minimum difference of the current pixels spatial-temporal neighbors, in a 3x3 grid. |
1 | Spatial-Temporal and stronger version of RemoveGrain mode 8 |
2 | Spatial-Temporal version of RemoveGrain Mode 8 |
3 | Spatial-Temporal version of RemoveGrain Mode 7 |
4 | Spatial-Temporal version of RemoveGrain Mode 6 |
5 | Spatial-Temporal version of RemoveGrain Mode 5 |
Parameter | Type | Options (Default) | Description |
---|---|---|---|
clip | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | Clip to process | |
limit | float[] | 0 - bit depth max ([7, 7, 7]) | The maximum amount that a pixel can change. A higher limit results in more smoothing. Can be specified as an array, with values corresonding to each plane of the input clip. |
mode | int[] | 0 - 5, inclusive ([1,1,1]) | The processing mode. 0 is the strongest, 5 is the weakest. Can be specified as an array, with values corresponding to each plane. |
scalep | bool | (False) | Parameter scaling. If set to true, all threshold values will be automatically scaled from 8-bit range (0-255) to the corresponding range of the input clip's bit depth. |
core.zsmooth.InterQuartileMean(clip clip[, int[] radius])
Smartish spatial blurring filter, works well as a prefilter.
Works well with limit_filter
from vs-jetpack
(or similar) for limiting/thresholding.
Performs an interquartile mean of a 3x3 grid. An interquartile mean is a mean (average) where the darkest 1/4 and brightest 1/4 of pixels in the grid are thrown out, and the remaining middle values are averaged. This prevents the extremes from skewing the average.
Future versions will support 5x5 (and maybe 7x7).
Parameter | Type | Options (Default) | Description |
---|---|---|---|
clip | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | Clip to process | |
radius | int[] | 1 | The spatial radius of the filter. Currently only 1 (3x3) is supported, but future versions will include higher radii |
core.zsmooth.TTempSmooth(vnode clip[, int maxr=3, int[] thresh=[4, 5, 5], int[] mdiff=[2, 3, 3], int strength=2, float scthresh=12.0, bint fp=True, vnode pfclip=None, int[] planes=[0, 1, 2]])
TTempSmooth is a motion adaptive (it only works on stationary parts of the picture), temporal smoothing filter.
It's essentially a fancy lookup table internally, but it works by computing a set of weights based on the input parameters, and then applying those weights based on the temporal differences of the input clip (or pfclip, if provided).
Higher weights contribute more to the final pixel value, and lower weights contribute less.
The parameters are related to each other, with maxr
and strength
governing the temporal distance and temporal
weight, respectively. Frames closer to the center have a higher weight, and frames further from the center have a lower
weight.
thresh
and mdiff
govern the weights concerning the difference in pixel values between frames. Smaller differences
are weighted higher and larger differences are weighted lower.
Note that there are essentially two modes - a simple temporal weighted mode, and a temporal + difference weighted mode.
The former is activated when mdiff >= threshold - 1
. This disables all difference weighting, and simply weights pixels
that have a temporal difference below threshold
based on how far they are from the center. This is the fastest mode.
The latter is activated when mdiff < threshold - 1
. In this mode, temporal weights and difference weights are
applied. So in addition to the weights applied in the previous mode, the amount that a pixel differs from the center
effects how much weight is given to it. Again, smaller differences have higher weights.
Parameter | Type | Options (Default) | Description |
---|---|---|---|
clip | 8-16 bit integer, 16-32 bit float, RGB, YUV, GRAY | Clip to process | |
radius | int[] | 1 | The spatial radius of the filter. Currently only 1 (3x3) is supported, but future versions will include higher radii |
maxr | int | 1-7 (3) | This sets the maximum temporal radius. By the way it works TTempSmooth automatically varies the radius used... this sets the maximum boundary. At 1 TTempSmooth will be (at max) including pixels from 1 frame away in the average (3 frames total will be considered counting the current frame). At 7 it would be including pixels from up to 7 frames away (15 frames total will be considered). With the way it checks motion there isn't much danger in setting this high, it's basically a quality vs. speed option. Lower settings are faster while larger values tend to create a more stable image. |
thresh | int[] | ([4, 5, 5]) | (8-bit scale) Your standard thresholds for differences of pixels between frames. TTempSmooth checks 2 frame distance as well as single frame, so these can usually be set slightly higher than with most other temporal smoothers and still avoid artifacts. Valid settings are from 1 to 256. Also important is the fact that as long as mdiff is less than the threshold value then pixels with larger differences from the original will have less weight in the average. Thus, even with rather large thresholds pixels just under the threshold won't have much weight, helping to reduce artifacts. If a single value is specified, it will be used for all planes. If two values are given then the second value will be used for the third plane as well. |
mdiff | int[] | ([2, 3, 3]) | (8-bit scale) Any pixels with differences less than or equal to mdiff will be blurred at maximum. Usually, the larger the difference to the center pixel the smaller the weight in the average. mdiff makes TTempSmooth treat pixels that have a difference of less than or equal to mdiff as though they have a difference of 0. In other words, it shifts the zero difference point outwards. Set mdiff to a value equal to or greater than thresh-1 to completely disable inverse pixel difference weighting. Valid settings are from 0 to 255. If a single value is specified, it will be used for all planes. If two values are given then the second value will be used for the third plane as well. |
strength | int | 1-8 (2) | TTempSmooth uses inverse distance weighting when deciding how much weight to give to each pixel value. The strength option lets you shift the drop off point away from the center to give a stronger smoothing effect and add weight to the outer pixels. It does for the spatial weights what mdiff does for the difference weights. |
scthresh | float | -1.0 - 0 - 100.0 (12.0) | The standard scenechange threshold as a percentage of maximum possible change of the luma plane. A good range of values is between 8 and 15. Set scthresh to 0.0 to disable scenechange detection. Set scthresh to -1 to disable calls to misc.SCDetect internally and just use existing _SceneChangePrev/Next properties (useful for when said properties have already been set prior to calling this function). |
fp | bool | True | Setting fp=True will add any weight not given to the outer pixels back onto the center pixel when computing the final value. Setting fp=False will just do a normal weighted average. fp=True is much better for reducing artifacts in motion areas and usually produces overall better results. |
pfclip | same format clip as clip |
(none) | This allows you to specify a separate clip for TTempSmooth to use when calculating pixel differences. This applies to checking the motion thresholds, calculating inverse difference weights, and detecting scenechanges. Basically, the pfclip will be used to determine the weights in the average but the weights will be applied to the original input clip's pixel values. |
planes | int[] | ([0, 1, 2]) | Which planes to process. Any unfiltered planes are copied from the input clip. |
Example of the impact of strength
on the temporal weights, with the center frame being in the middle of each line:
- 1 = 0.13 0.14 0.16 0.20 0.25 0.33 0.50 1.00 0.50 0.33 0.25 0.20 0.16 0.14 0.13
- 2 = 0.14 0.16 0.20 0.25 0.33 0.50 1.00 1.00 1.00 0.50 0.33 0.25 0.20 0.16 0.14
- 3 = 0.16 0.20 0.25 0.33 0.50 1.00 1.00 1.00 1.00 1.00 0.50 0.33 0.25 0.20 0.16
- 4 = 0.20 0.25 0.33 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.50 0.33 0.25 0.20
- 5 = 0.25 0.33 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.50 0.33 0.25
- 6 = 0.33 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.50 0.33
- 7 = 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.50
- 8 = 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
The values shown are for maxr=7
, when using smaller radius values the weights outside of the range are simply dropped. Thus, setting strength
to a value of maxr+1
or higher will give you equal spatial weighting of all pixels in the kernel.
All build artifacts are placed under zig-out/lib
.
To build for the operating system and architecture of the current machine:
zig build -Doptimize=ReleaseFast
Zig has excellent cross-compilation support, letting us create Windows, Mac, or Linux compatible libraries from any of those same operating systems and architectures.
To generate Windows compatible DLLs, with AVX2 support:
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-windows -Dcpu=x86_64_v3
To generate Windows compatible DLLs with AVX512 support:
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-windows -Dcpu=x86_64_v4
# or the following for specific targeting of AMD Zen4 CPUs
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-windows -Dcpu=znver4
See https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512 for a better breakdown on which CPUs support AVX512 features.
To generate Mac (x86_64) compatible libraries:
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-macos
To generate Mac (aarch64) ARM compatible libraries:
zig build -Doptimize=ReleaseFast -Dtarget=aarch64-macos
To generate Mac (aarch64) ARM compatible libraries for a specific CPU (like M1, M2, etc):
zig build -Doptimize=ReleaseFast -Dtarget=aarch64-macos -Dcpu=apple_m1
Use zig targets
to see an exhaustive list of all architectures, CPUs, and operating systems that Zig supports.
The following open source software provided great inspiration and guidance, and this plugin wouldn't exist without the hard work of their authors.
- Avisynth RemoveGrain: https://github.com/pinterf/RgTools
- Vapoursynth RemoveGrain: https://github.com/vapoursynth/vs-removegrain
- Vapoursynth TemporalSoften: https://github.com/dubhater/vapoursynth-temporalsoften2
- Vapoursynth TemporalMedian: https://github.com/dubhater/vapoursynth-temporalmedian
- Neo Temporal Median: https://github.com/HomeOfAviSynthPlusEvolution/neo_TMedian
- Vapoursynth FluxSmooth: https://github.com/dubhater/vapoursynth-fluxsmooth/
- Dogway's
ex_median
functions: https://github.com/Dogway/Avisynth-Scripts/blob/c6a837107afbf2aeffecea182d021862e9c2fc36/ExTools.avsi#L2456