A weight-balanced tree is a binary search tree that stores the sizes of subtrees in the nodes. That is, a node has fields
By definition, the size of a leaf (typically represented by a nil pointer) is zero. The size of an internal node is the sum of sizes of its two children, plus one: (size[n] = size[n.left] + size[n.right] + 1). Based on the size, one defines the weight to be weight[n] = size[n] + 1.8 Weight has the advantage that the weight of a node is simply the sum of the weights of its left and right children.
Operations that modify the tree must make sure that the weight of the left and right subtrees of every node remain within some factor α of each other, using the same rebalancing operations used in AVL trees: rotations and double rotations. Formally, node balance is defined as follows:
Here, α is a numerical parameter to be determined when implementing weight balanced trees. Larger values of α produce "more balanced" trees, but not all values of α are appropriate; Nievergelt and Reingold proved that
is a necessary condition for the balancing algorithm to work. Later work showed a lower bound of 2⁄11 for α, although it can be made arbitrarily small if a custom (and more complicated) rebalancing algorithm is used.10
Applying balancing correctly guarantees a tree of n elements will have height11
If α is given its maximum allowed value, the worst-case height of a weight-balanced tree is the same as that of a red–black tree at 2 log 2 n {\displaystyle 2\log _{2}n} .
The number of balancing operations required in a sequence of n insertions and deletions is linear in n, i.e., balancing takes a constant amount of overhead in an amortized sense.12
While maintaining a tree with the minimum search cost requires four kinds of double rotations (LL, LR, RL, RR as in an AVL tree) in insert/delete operations, if we desire only logarithmic performance, LR and RL are the only rotations required in a single top-down pass.13
Several set operations have been defined on weight-balanced trees: union, intersection and set difference. Then fast bulk operations on insertions or deletions can be implemented based on these set functions. These set operations rely on two helper operations, Split and Join. With the new operations, the implementation of weight-balanced trees can be more efficient and highly-parallelizable.1415
The join algorithm is as follows:
Here balance ( x , y ) {\displaystyle (x,y)} means two weights T and T are balanced. expose(v)=(l, k, r) means to extract a tree node T's left child T, the key of the node T and the right child T. Node(l, k, r) means to create a node of left child T, key T and right child T.
The split algorithm is as follows:
The union of two weight-balanced trees t1 and t2 representing sets A and B, is a weight-balanced tree t that represents A ∪ B. The following recursive function computes this union:
Here, Split is presumed to return two trees: one holding the keys less than its input key, the other holding the greater keys. (The algorithm is non-destructive, but an in-place destructive version exists as well.)
The algorithm for intersection or difference is similar, but requires the Join2 helper routine that is the same as Join but without the middle key. Based on the new functions for union, intersection or difference, either one key or multiple keys can be inserted to or deleted from the weight-balanced tree. Since Split and Union call Join but do not deal with the balancing criteria of weight-balanced trees directly, such an implementation is usually called the join-based algorithms.
The complexity of each of union, intersection and difference is O ( m log ( n m + 1 ) ) {\displaystyle O\left(m\log \left({n \over m}+1\right)\right)} for two weight-balanced trees of sizes T and n ( ≥ m ) {\displaystyle n(\geq m)} . This complexity is optimal in terms of the number of comparisons. More importantly, since the recursive calls to union, intersection or difference are independent of each other, they can be executed in parallel with a parallel depth O ( log m log n ) {\displaystyle O(\log m\log n)} .16 When m = 1 {\displaystyle m=1} , the join-based implementation has the same computational directed acyclic graph (DAG) as single-element insertion and deletion if the root of the larger tree is used to split the smaller tree.
Tsakalidis, A. K. (1984). "Maintaining order in a generalized linked list". Acta Informatica. 21: 101–112. doi:10.1007/BF00289142. S2CID 26127563. /wiki/Doi_(identifier) ↩
Nievergelt, J.; Reingold, E. M. (1973). "Binary Search Trees of Bounded Balance". SIAM Journal on Computing. 2: 33–43. doi:10.1137/0202005. /wiki/Doi_(identifier) ↩
This article incorporates public domain material from Paul E. Black. "BB(α) tree". Dictionary of Algorithms and Data Structures. NIST. /wiki/Copyright_status_of_works_by_the_federal_government_of_the_United_States ↩
Hirai, Y.; Yamamoto, K. (2011). "Balancing weight-balanced trees" (PDF). Journal of Functional Programming. 21 (3): 287. doi:10.1017/S0956796811000104. https://yoichihirai.com/bst.pdf ↩
Roura, Salvador (2001). A new method for balancing binary search trees. ICALP. Lecture Notes in Computer Science. Vol. 2076. pp. 469–480. doi:10.1007/3-540-48224-5_39. ISBN 978-3-540-42287-7. 978-3-540-42287-7 ↩
Adams, Stephen (1993). "Functional Pearls: Efficient sets—a balancing act". Journal of Functional Programming. 3 (4): 553–561. doi:10.1017/S0956796800000885. https://doi.org/10.1017%2FS0956796800000885 ↩
This is the definition used by Nievergelt and Reingold. Adams uses the size as the weight directly,[6] which complicates analysis of his variant and has led to bugs in major implementations.[4] ↩
Brass, Peter (2008). Advanced Data Structures. Cambridge University Press. pp. 61–71. https://archive.org/details/advanceddatastru00bras ↩
Blum, Norbert; Mehlhorn, Kurt (1980). "On the average number of rebalancing operations in weight-balanced trees" (PDF). Theoretical Computer Science. 11 (3): 303–320. doi:10.1016/0304-3975(80)90018-3. http://scidok.sulb.uni-saarland.de/volltexte/2011/4019/pdf/fb14_1978_06.pdf ↩
Cho, Seonghun; Sahni, Sartaj (2000). "A New Weight Balanced Binary Search Tree". International Journal of Foundations of Computer Science. 11 (3): 485–513. CiteSeerX 10.1.1.36.3888. doi:10.1142/S0129054100000296. https://www.researchgate.net/publication/2241064 ↩
Blelloch, Guy E.; Ferizovic, Daniel; Sun, Yihan (2016), "Just Join for Parallel Ordered Sets", Symposium on Parallel Algorithms and Architectures, Proc. of 28th ACM Symp. Parallel Algorithms and Architectures (SPAA 2016), ACM, pp. 253–264, arXiv:1602.02120, doi:10.1145/2935764.2935768, ISBN 978-1-4503-4210-0, S2CID 2897793. 978-1-4503-4210-0 ↩
Adams, Stephen (1992), Implementing sets efficiently in a functional language, CiteSeerX 10.1.1.501.8427. https://groups.csail.mit.edu/mac/users/adams/BB/92-10.ps.gz ↩