Take a look at the org chart of a startup company.
There are 7 employees in the company. Each employee can have up to 2 other employees reporting to them.
Tree is a data structure which is used to represent this information in computer’s memory. Using the org chart let us understand the vocabulary used in trees.
Node – Each circle is called as the node. It is the fundamental part of the tree. Node contains the data. There are 7 nodes and each node contains the employee name.
Edge – An edge connects two nodes. It tells that these nodes are related. A node can have only one incoming edge and many outgoing edges. In our example a node can have utmost 2 outgoing edges. Hence this tree is called as a binary tree. Since there are 7 nodes we have 6 edges.
Root – A node without an incoming edge is called as the root. Peter is the root node.
Leaf – A node without an outgoing edge is called as leaf node. There are 4 leaf nodes – Jane, Chen, Ravi, and Harry.
Sibling – Nodes sharing the same parent are called as siblings. [John,Alan], [Jane, Chen], and [Ravi, Harry] are siblings.
Level – The level of any node is the number of edges you need to cross from the root to reach that node. John and Alan are at level 1. Jane, Chen, Ravi, and Harry are at level 2. Since Peter is the root node it is at level 0.
Height – The height of the tree is the maximum level of any node in the tree. In this case the height is 2.
There are 4 things that each node should have. It needs to hold the data and references to the left, right child and its parent. I am using Python as the language for the implementation.
class BinaryTree: def __init__(self, data, parent=None): self.data = data self.left = None self.right = None self.parent = parent
We need to add methods to get the data, parent, left, and right child. Also methods to tell if a node is root or a leaf.
class BinaryTree: ... def getData(self): return self.data def getLeftChild(self): return self.left def getRightChild(self): return self.right def getParent(self): return self.parent def isRoot(self): if self.parent == None: return True else: return False def isLeaf(self): if self.left == None and self.right == None: return True else: return False
Let us add methods to insert the left and right nodes.
class BinaryTree: ... def insertLeftChild(self, node): l = BinaryTree(node, self) if self.left == None: self.left = l else: l.left = self.left l.left.parent = l self.left = l return l def insertRightChild(self, node): r = BinaryTree(node, self) if self.right == None: self.right = r else: r.right = self.right r.right.parent = r self.right = r return r #Code for creating the binary tree according to the org chart peter = BinaryTree("Peter") john = peter.insertLeftChild("John") alan = peter.insertRightChild("Alan") john.insertLeftChild("Jane") john.insertRightChild("Chen") alan.insertLeftChild("Ravi") alan.insertRightChild("Harry")
We have constructed the org chart using the binary tree data structure. We need a way to access the names of all the employees. Accessing the contents of a tree is called as traversing the tree. There are 3 ways to traverse a tree.
1. inOrder
In the inOrder traversal we visit the left subtree and then the root and finally the right subtree. The nodes will be visited in the following order [Jane, John, Chen, Peter, Ravi, Alan, Harry].
def inOrder(root): if root != None: inOrder(root.getLeftChild()) print root.getData() inOrder(root.getRightChild()) #call the method with the root node peter inOrder(peter)
2. preOrder
In the preOrder traversal we visit the root and then the left subtree and finally the right subtree. The nodes will be visited in the following order [Peter, John, Jane, Chen, Alan, Ravi, Harry].
def preOrder(root): if root != None: print root.getData() preOrder(root.getLeftChild()) preOrder(root.getRightChild()) #call the method with the root node peter preOrder(peter)
3. postOrder
In the postOrder traversal we visit the left subtree and then the right subtree and finally the root. The nodes will be visited in the following order [Jane, Chen, John, Ravi, Harry, Alan, Peter].
def postOrder(root): if root != None: postOrder(root.getLeftChild()) postOrder(root.getRightChild()) print root.getData() #call the method with the root node peter postOrder(peter)
Tree is a powerful data structure. Modification to the structure can be done in constant time. But to find a node for modification it takes linear O(n) amount of time as we need to visit each node until we find the desired one. Can we do better? It turns out that if we store the data in a certain way then we can achieve log2(n) time to find any node. These special trees are called as search trees. Since we are dealing with nodes having 2 references it is called as binary search tree.
Binary Search Trees
To the org chart, I have added an identifier for each employee.
The identifier for Peter is 100. All the identifiers on the left subtree of Peter is less than 100. All the identifiers on the right subtree of Peter is greater than 100. This property is true for each node in the tree. Make sure you understand this.
A binary search tree (BST) relies on the property that keys that are less than the parent are found in the left subtree, and keys that are greater than the parent are found in the right subtree.
Why is this very important?
Imagine that you are searching for Chen, whose identifier is 60. Peter is the root node and has an identifier of 100. Since it is a BST you know that Chen will be on the left hand side of Peter as 60 is less than 100. In the 1st pass we have discarded 1/2 the nodes on the right hand side. You move to John who has an identifier of 50. Chen should on the right hand side of John since 60 is greater than 50. In the 2nd pass we have discarded 1/2 the nodes on the left hand side. On the 3rd pass we found Chen. Since on each pass 1/2 the nodes are discarded the search will take log2(n) time. If you want a refresher on logarithms refer here. On a perfectly balanced BST with 10 trillion items we can find a node with in 44 passes. In a regular tree you need 10 trillion passes.
Let us create a TreeNode class to hold the employee information along with the references to parent, left and right child.
class TreeNode: def __init__(self, key, value, parent=None): self.key = key self.value = value self.parent = parent self.left = None self.right = None
Inserting a new node should take care of the BST property.
class BinarySearchTree: def __init__(self): self.root = None def insert(self, key, value): if self.root: self.__insert(key, value, self.root) else: self.root = TreeNode(key, value) def __insert(self, key, value, node): if key < node.key: if node.left: self.__insert(key, value, node.left) else: node.left = TreeNode(key, value, node) else: if node.right: self.__insert(key, value, node.right) else: node.right = TreeNode(key, value, node) # Code for creating the org chart bst = BinarySearchTree() bst.insert(100, 'Peter') bst.insert(50, 'John') bst.insert(25, 'Jane') bst.insert(60, 'Chen') bst.insert(150, 'Alan') bst.insert(140, 'Ravi') bst.insert(170, 'Harry')
Given below is the code to find the node given the employee identifier. It makes use of the BST property to skip 1/2 the nodes in each pass.
class BinarySearchTree: ... def get(self, key): if self.root: return self.__get(key, self.root) else: return None def __get(self, key, node): if node: if key == node.key: return node elif key < node.key: return self.__get(key, node.left) elif key > node.key: return self.__get(key, node.right) else: return None
If you traverse a BST inorder (left, root and right) then you will visit the nodes in ascending order of the keys. The code given below will output the keys in the following order [25, 50, 60, 100, 140, 150, 170].
def inOrder(root): if root != None: inOrder(root.left) print root.key inOrder(root.right)
Deleting a node in a BST is little bit tricky. After deletion the property of the BST should be maintained. A node can have 0 or 1 or 2 children. Each case needs to be handled differently.
Case 1: Node without any child
Harry(170) is a leaf node. Harry is the right child of Alan. Hence when you delete Harry the right child of Alan should be set to none. On the other hand if you remove Ravi then you should set the left child of Alan to none.
I added few additional methods in the TreeNode class which are used by the delete procedure.
class TreeNode: ... def hasNoChildren(self): return self.left == None and self.right == None def hasOneChildren(self): return self.left or self.right def hasBothChildren(self): return self.left and self.right def findMin(self): current = self while current.left: current = current.left return current
Code to handle the case with no children.
class BinarySearchTree: ... def delete(self, key): #If the root has a value if self.root: # Only root node is present if self.root.key == key and self.root.hasNoChildren(): self.root = None else: node = self.get(key) # found the node with the key if node: #CASE: NO CHILDREN if node.hasNoChildren(): if node.parent.left == node: node.parent.left = None else: node.parent.right = None
Case 2: Node with one child
Chen(60) is a node with Vibhu as its left child. When Chen is deleted Vibhu should be linked to the right child of John. There are few variations that needs to be taken care.
- Node that is being deleted can be a left or right child of the parent.
- Node that is being deleted can have left or right child.
- Node that is being deleted can also be a root node.
Code for deleting node with 1 children.
class BinarySearchTree: ... def delete(self, key): #If the root has a value if self.root: # Only root node is present if self.root.key == key and self.root.hasNoChildren(): self.root = None else: node = self.get(key) # found the node with the key if node: ... #CASE: HAS ONE CHILD elif node.hasOneChildren(): if node.left: if node == self.root: self.root = node.left self.root.parent = None elif node.parent.left == node: node.parent.left = node.left node.left.parent = node.parent elif node.parent.right == node: node.parent.right = node.left node.left.parent = node.parent else: if node == self.root: self.root = node.right self.root.parent = None elif node.parent.left == node: node.parent.left = node.right node.right.parent = node.parent elif node.parent.right == node: node.parent.right = node.right node.right.parent = node.parent
Case 3: Node with two child
Peter is the node with 2 children. When this node is deleted we need to replace it with the node that maintains the BST property. Let us call the node that is going to replace Peter as successor. We can select the successor from the left or right subtree of Peter.
- If we want to pick up a node from the right subtree then we need to pick up the node with the minimum key. Why? The rule of BST tells that all the nodes in the right subtree should have value greater than the root. Hence picking the minimum will satisfy this condition.
- If we want to pick up a node from the left subtree then we need to pick up the node with the maximum key. Why? The rule of BST tells that all the nodes in the left subtree should have value lesser than the root. Hence picking the maximum will satisfy this condition.
In the code given below the successor is the node with the minimum key on the right subtree. Deleting Peter the successor turns out to be Ravi. The successor node can fall into two conditions
- It can be a leaf node. In that case you need to set the parent left reference to none if the successor is a left child.
- It can have 1 child. If that is the case it can only be a right child. Why? Remember since this is the node with minimum value it cannot have a left child. Set the child node to the successor parent’s left node.
def delete(self, key): #If the root has a value if self.root: # Only root node is present if self.root.key == key and self.root.hasNoChildren(): self.root = None else: node = self.get(key) # found the node with the key if node: ... #CASE: NODE WITH 2 CHILDREN elif node.hasBothChildren(): successor = node.right.findMin() node.key = successor.key node.value = successor.value # Successor has no children if successor.hasNoChildren(): if successor.parent.left == successor: successor.parent.left = None elif successor.right: if successor.parent.left == successor: successor.parent.left = successor.right successor.right.parent = successor.parent
Balanced Trees
To the BST let us insert the nodes in the descending order of employee identifiers (Harry, Alan, Ravi, Peter, Chen, John and Jane). Take a look at the tree that we got.
The tree is no longer balanced. Search will take linear time instead of log2(n) time. We need to a way to create a tree that is balanced. There are several techniques to balance a tree. AVL tree is one such technique. It is a self-balancing binary search tree named after its inventors G.M. Adelson-Velsky and E.M. Landis.
In the AVL tree each node will have a variable called heightDifference and it will store the height difference between left and the right subtree.
heightDifference = height(leftSubTree) – height(rightSubTree)
Given below is the code for AvlTreeNode
class AvlTreeNode: def __init__(self, key, value, parent=None): self.key = key self.value = value self.parent = parent self.left = None self.right = None self.heightDifference = 0
Take a look at the image given below. The height difference for Harry is 2.
heightDifference(Harry) = Height(Left Subtree) - Height(Right Subtree) heightDifference(Harry) = 2 - 0 heightDifference(Harry) = 2
If the height difference for the node is greater than 1, the tree starts to get unbalanced and it will have a deep left subtree. If the height difference for the node is less than -1, the tree starts to get unbalanced and it will have a deep right subtree. If the height difference is 0 then the subtrees from that node is balanced.
In the AVL tree each node should have height difference of -1 or 0 or 1. Why do we have this restriction? If we keep the height difference between -1 and 1, the tree becomes balanced and we can get log2(n) time for searches. If the height difference is not between -1 and 1 then we need to fix it by doing rotations. There are 2 types of rotations – left rotation and right rotation.
Of the 3 nodes given below, the height difference for Harry is 2 and hence it violates the AVL tree condition. The tree is left heavy. Hence we need to fix it by doing a right rotation.
After right rotation the tree gets balanced.
Here are the conditions on which you will do either a left or right rotation.
1. Right Rotation – If the node is left tree heavy i.e. the height difference is greater than 1 then you need to do right rotation.
2. Left Rotation – If the node is right tree heavy i.e. the height difference is less than -1 then you need to do left rotation.
Left rotation is asymmetrical to right rotation and hence it should be easy to understand. Given below is the code for doing left and right rotations.
class AvlBinarySearchTree: ... def rotateLeft(self, node): newRoot = node.right node.right = newRoot.left if node.right: node.right.parent = node newRoot.parent = node.parent if self.root == node: self.root = newRoot else: if node.parent.left == node: node.parent.left = newRoot else: node.parent.right = newRoot node.parent = newRoot newRoot.left = node node.heightDifference = node.heightDifference + 1 + max(0, -newRoot.heightDifference) newRoot.heightDifference = newRoot.heightDifference + 1 + max(0, node.heightDifference) def rotateRight(self, node): newRoot = node.left node.left = newRoot.right if node.left: node.left.parent = node newRoot.parent = node.parent if self.root == node: self.root = newRoot else: if node.parent.left == node: node.parent.left = newRoot else: node.parent.right = newRoot node.parent = newRoot newRoot.right = node node.heightDifference = node.heightDifference - 1 + min(0, -newRoot.heightDifference) newRoot.heightDifference = newRoot.heightDifference - 1 + min(0, node.heightDifference)
Majority of the code should be easy to understand except the code for calculating the new height difference for the nodes that gets rotated.
class AvlBinarySearchTree: ... def rotateLeft(self, node): ... node.heightDifference = node.heightDifference + 1 + max(0, -newRoot.heightDifference) newRoot.heightDifference = newRoot.heightDifference + 1 + max(0, node.heightDifference) def rotateRight(self, node): ... node.heightDifference = node.heightDifference - 1 + min(0, -newRoot.heightDifference) newRoot.heightDifference = newRoot.heightDifference - 1 + min(0, node.heightDifference)
For the rotateRight method I am going to show how the value for newRoot.heightDifference is derived.
Goal: To derive newRoot.heightDifference for rotateRight oldA is the node A before the right rotation newA is the node A after the right rotation oldB is the node B before the right rotation newB is the node B after the right rotation NOTE: Only nodes A and B gets affected during the rotation. Height Difference (oldB) = Height(C) - Height(D) Height Difference (newB) = Height(C) - Height(newA) Height(newA) = 1 + Max(Height(D), Height(E)) # This is height of a node is calculated Height Difference (newB) = Height(C) - [1 + Max(Height(D), Height(E))] # Substituting Height(newA) Height Difference (newB) - Height Difference (oldB) = Height(C) - [1 + Max(Height(D), Height(E))] - [Height(C) - Height(D)] = Height(C) - Height(C) + Height(D) -[1 + Max(Height(D), Height(E))] = Height(D) - 1 - Max(Height(D), Height(E)) = -1 +Height(D) - Max(Height(D), Height(E)) = -1 +Height(D) + Min(-Height(D), -Height(E)) # -Max(4,5) can be written as +Min(-4,-5) = -1 + Min(Height(D) - Height(D), Height(D) - Height(E)) = -1 + Min(0, Height(D) - Height(E)) = -1 + Min(0, Height Difference(newA)) Height Difference (newB) = Height Difference (oldB) - 1 + Min(0, Height Difference(newA)) This is same as the code newRoot.heightDifference = newRoot.heightDifference - 1 + Min(0, node.heightDifference)
Using the similar method you can derive the values for the others.
Given below is the code for AVL tree. insert() method is modified slightly as it needs to call updateHeightDifference(). If the node height difference is not between -1 to 1 then rebalance() is called.
class AvlBinarySearchTree: ... def insert(self, key, value): if self.root: self.__insert(key, value, self.root) else: self.root = AvlTreeNode(key, value) def __insert(self, key, value, node): if key < node.key: if node.left: self.__insert(key, value, node.left) else: node.left = AvlTreeNode(key, value, node) self.updateHeightDifference(node.left) else: if node.right: self.__insert(key, value, node.right) else: node.right = AvlTreeNode(key, value, node) self.updateHeightDifference(node.right) def updateHeightDifference(self, node): if node.heightDifference > 1 or node.heightDifference < -1: self.rebalance(node) return #If the node has a parent then adjust its heightDifference if node.parent: if node.parent.left == node: node.parent.heightDifference += 1 if node.parent.right == node: node.parent.heightDifference -= 1 # Move one level as the height difference needs to be propogated if node.parent.heightDifference != 0: self.updateHeightDifference(node.parent) def rebalance(self, node): #Right side heavy if node.heightDifference < 0: #If the right child is left heavy if node.right.heightDifference > 0: self.rotateRight(node.right) self.rotateLeft(node) #Left side heavy elif node.heightDifference > 0: #If the left child is right heavy if node.left.heightDifference < 0: self.rotateLeft(node.left) self.rotateRight(node) ...
In the rebalance method why do we do call rotateLeft on the left child before doing a rotateRight?
class AvlBinarySearchTree: def rebalance(self, node): ... #Left side heavy elif node.heightDifference > 0: #If the left child is right heavy if node.left.heightDifference < 0: self.rotateLeft(node.left) self.rotateRight(node)
The picture given below should answer why we did two rotations instead of one.
You should do double rotation
- If the node is left side heavy and the left child of that node is right side heavy then you need to do a left rotation on the left child and then do a right child rotation on the node itself.
- If the node is right side heavy and the right child of that node is left side heavy then you need to do a right rotation on the right child and then do a left child rotation on the node itself.
I inserted the nodes in the AVL tree in descending order of employee identifiers – (Harry, Alan, Ravi, Peter, Chen, John and Jane). I got a balanced tree as shown below.