Description
Currently the implementation for filtering points on the scatterplot to the view window uses an O(n) solution (iterates over all points then filters them). However, assuming the points are sorted in the array, a binary search strategy reduces this to O(log n). As there may be many points (600k+ across all the plots) and because this operation is done every time the user moves the view window, it is worthwhile to optimize this.
The basic idea is to convert the start/end position of the genomic window to an index of the points array. Since the array sorted, you can then just slice the array with the returned indices to get all the points within that window. A binary search is necessary because the distance between the x-positions are variable.
Some generic methods for doing the search are given below:
function findLowerBound(arr, x) {
let low = 0, high = arr.length;
while (low < high) {
const mid = Math.floor((low + high) / 2);
if (arr[mid].x < x) {
low = mid + 1;
} else {
high = mid;
}
}
return low;
}
function findUpperBound(arr, x) {
let low = 0, high = arr.length;
while (low < high) {
const mid = Math.floor((low + high) / 2);
if (arr[mid].x <= x) {
low = mid + 1;
} else {
high = mid;
}
}
return low;
}
function getRange(arr, x1, x2) {
const lower = findLowerBound(arr, x1);
const upper = findUpperBound(arr, x2);
return arr.slice(lower, upper);
}
What is omitted here are custom comparison operators (e.g <
, <=
, >
, >=
) to determine the relationship between two genomic coordinates.