# Trees

Take a look at the org chart of a startup company.

There are 7 employees in the company. Each employee can have up to 2 other employees reporting to them.

Tree is a data structure which is used to represent this information in computer’s memory. Using the org chart let us understand the vocabulary used in trees.

Node – Each circle is called as the node. It is the fundamental part of the tree. Node contains the data. There are 7 nodes and each node contains the employee name.

Edge – An edge connects two nodes. It tells that these nodes are related. A node can have only one incoming edge and many outgoing edges. In our example a node can have utmost 2 outgoing edges. Hence this tree is called as a binary tree. Since there are 7 nodes we have 6 edges.

Root – A node without an incoming edge is called as the root. Peter is the root node.

Leaf – A node without an outgoing edge is called as leaf node. There are 4 leaf nodes – Jane, Chen, Ravi, and Harry.

Sibling – Nodes sharing the same parent are called as siblings. [John,Alan], [Jane, Chen], and [Ravi, Harry] are siblings.

Level – The level of any node is the number of edges you need to cross from the root to reach that node. John and Alan are at level 1. Jane, Chen, Ravi, and Harry are at level 2. Since Peter is the root node it is at level 0.

Height – The height of the tree is the maximum level of any node in the tree. In this case the height is 2.

There are 4 things that each node should have. It needs to hold the data and references to the left, right child and its parent. I am using Python as the language for the implementation.

```class BinaryTree:

def __init__(self, data, parent=None):
self.data = data
self.left = None
self.right = None
self.parent = parent
```

We need to add methods to get the data, parent, left, and right child. Also methods to tell if a node is root or a leaf.

```class BinaryTree:

...

def getData(self):
return self.data

def getLeftChild(self):
return self.left

def getRightChild(self):
return self.right

def getParent(self):
return self.parent

def isRoot(self):
if self.parent == None:
return True
else:
return False

def isLeaf(self):
if self.left == None and self.right == None:
return True
else:
return False
```

Let us add methods to insert the left and right nodes.

```class BinaryTree:

...
def insertLeftChild(self, node):
l = BinaryTree(node, self)
if self.left == None:
self.left = l
else:
l.left = self.left
l.left.parent = l
self.left = l
return l

def insertRightChild(self, node):
r = BinaryTree(node, self)
if self.right == None:
self.right = r
else:
r.right = self.right
r.right.parent = r
self.right = r
return r

#Code for creating the binary tree according to the org chart
peter = BinaryTree("Peter")
john  = peter.insertLeftChild("John")
alan  = peter.insertRightChild("Alan")

john.insertLeftChild("Jane")
john.insertRightChild("Chen")

alan.insertLeftChild("Ravi")
alan.insertRightChild("Harry")

```

We have constructed the org chart using the binary tree data structure. We need a way to access the names of all the employees. Accessing the contents of a tree is called as traversing the tree. There are 3 ways to traverse a tree.

1. inOrder

In the inOrder traversal we visit the left subtree and then the root and finally the right subtree. The nodes will be visited in the following order [Jane, John, Chen, Peter, Ravi, Alan, Harry].

```def inOrder(root):
if root != None:
inOrder(root.getLeftChild())
print root.getData()
inOrder(root.getRightChild())

#call the method with the root node peter
inOrder(peter)
```

2. preOrder

In the preOrder traversal we visit the root and then the left subtree and finally the right subtree. The nodes will be visited in the following order [Peter, John, Jane, Chen, Alan, Ravi, Harry].

```def preOrder(root):
if root != None:
print root.getData()
preOrder(root.getLeftChild())
preOrder(root.getRightChild())

#call the method with the root node peter
preOrder(peter)
```

3. postOrder

In the postOrder traversal we visit the left subtree and then the right subtree and finally the root. The nodes will be visited in the following order [Jane, Chen, John, Ravi, Harry, Alan, Peter].

```def postOrder(root):
if root != None:
postOrder(root.getLeftChild())
postOrder(root.getRightChild())
print root.getData()

#call the method with the root node peter
postOrder(peter)
```

Tree is a powerful data structure. Modification to the structure can be done in constant time. But to find a node for modification it takes linear O(n) amount of time as we need to visit each node until we find the desired one. Can we do better? It turns out that if we store the data in a certain way then we can achieve log2(n) time to find any node. These special trees are called as search trees. Since we are dealing with nodes having 2 references it is called as binary search tree.

## Binary Search Trees

To the org chart, I have added an identifier for each employee.

The identifier for Peter is 100. All the identifiers on the left subtree of Peter is less than 100. All the identifiers on the right subtree of Peter is greater than 100. This property is true for each node in the tree. Make sure you understand this.

A binary search tree (BST) relies on the property that keys that are less than the parent are found in the left subtree, and keys that are greater than the parent are found in the right subtree.

Why is this very important?

Imagine that you are searching for Chen, whose identifier is 60. Peter is the root node and has an identifier of 100. Since it is a BST you know that Chen will be on the left hand side of Peter as 60 is less than 100. In the 1st pass we have discarded 1/2 the nodes on the right hand side. You move to John who has an identifier of 50. Chen should on the right hand side of John since 60 is greater than 50. In the 2nd pass we have discarded 1/2 the nodes on the left hand side. On the 3rd pass we found Chen. Since on each pass 1/2 the nodes are discarded the search will take log2(n) time. If you want a refresher on logarithms refer here. On a perfectly balanced BST with 10 trillion items we can find a node with in 44 passes. In a regular tree you need 10 trillion passes.

Let us create a TreeNode class to hold the employee information along with the references to parent, left and right child.

```class TreeNode:

def __init__(self, key, value, parent=None):
self.key = key
self.value = value
self.parent = parent
self.left = None
self.right = None
```

Inserting a new node should take care of the BST property.

```class BinarySearchTree:

def __init__(self):
self.root = None

def insert(self, key, value):
if self.root:
self.__insert(key, value, self.root)
else:
self.root = TreeNode(key, value)

def __insert(self, key, value, node):
if key < node.key:
if node.left:
self.__insert(key, value, node.left)
else:
node.left = TreeNode(key, value, node)
else:
if node.right:
self.__insert(key, value, node.right)
else:
node.right = TreeNode(key, value, node)

# Code for creating the org chart
bst = BinarySearchTree()
bst.insert(100, 'Peter')
bst.insert(50, 'John')
bst.insert(25, 'Jane')
bst.insert(60, 'Chen')
bst.insert(150, 'Alan')
bst.insert(140, 'Ravi')
bst.insert(170, 'Harry')
```

Given below is the code to find the node given the employee identifier. It makes use of the BST property to skip 1/2 the nodes in each pass.

```
class BinarySearchTree:

...

def get(self, key):
if self.root:
return self.__get(key, self.root)
else:
return None

def __get(self, key, node):
if node:
if key == node.key:
return node
elif key < node.key:
return self.__get(key, node.left)
elif key > node.key:
return self.__get(key, node.right)
else:
return None

```

If you traverse a BST inorder (left, root and right) then you will visit the nodes in ascending order of the keys. The code given below will output the keys in the following order [25, 50, 60, 100, 140, 150, 170].

```def inOrder(root):
if root != None:
inOrder(root.left)
print root.key
inOrder(root.right)

```

Deleting a node in a BST is little bit tricky. After deletion the property of the BST should be maintained. A node can have 0 or 1 or 2 children. Each case needs to be handled differently.

Case 1: Node without any child

Harry(170) is a leaf node. Harry is the right child of Alan. Hence when you delete Harry the right child of Alan should be set to none. On the other hand if you remove Ravi then you should set the left child of Alan to none.

I added few additional methods in the TreeNode class which are used by the delete procedure.

```
class TreeNode:

...

def hasNoChildren(self):
return self.left == None and self.right == None

def hasOneChildren(self):
return self.left or self.right

def hasBothChildren(self):
return self.left and self.right

def findMin(self):
current = self
while current.left:
current = current.left
return current
```

Code to handle the case with no children.

```class BinarySearchTree:

...

def delete(self, key):
#If the root has a value
if self.root:
# Only root node is present
if self.root.key == key and self.root.hasNoChildren():
self.root = None
else:
node = self.get(key)
# found the node with the key
if node:
#CASE: NO CHILDREN
if node.hasNoChildren():
if node.parent.left == node:
node.parent.left = None
else:
node.parent.right = None

```

Case 2: Node with one child

Chen(60) is a node with Vibhu as its left child. When Chen is deleted Vibhu should be linked to the right child of John. There are few variations that needs to be taken care.

1. Node that is being deleted can be a left or right child of the parent.
2. Node that is being deleted can have left or right child.
3. Node that is being deleted can also be a root node.

Code for deleting node with 1 children.

```class BinarySearchTree:

...

def delete(self, key):
#If the root has a value
if self.root:
# Only root node is present
if self.root.key == key and self.root.hasNoChildren():
self.root = None
else:
node = self.get(key)
# found the node with the key
if node:
...
#CASE: HAS ONE CHILD
elif node.hasOneChildren():
if node.left:
if node == self.root:
self.root = node.left
self.root.parent = None
elif node.parent.left == node:
node.parent.left = node.left
node.left.parent = node.parent
elif node.parent.right == node:
node.parent.right = node.left
node.left.parent = node.parent
else:
if node == self.root:
self.root = node.right
self.root.parent = None
elif node.parent.left == node:
node.parent.left = node.right
node.right.parent = node.parent
elif node.parent.right == node:
node.parent.right = node.right
node.right.parent = node.parent

```

Case 3: Node with two child

Peter is the node with 2 children. When this node is deleted we need to replace it with the node that maintains the BST property. Let us call the node that is going to replace Peter as successor. We can select the successor from the left or right subtree of Peter.

1. If we want to pick up a node from the right subtree then we need to pick up the node with the minimum key. Why? The rule of BST tells that all the nodes in the right subtree should have value greater than the root. Hence picking the minimum will satisfy this condition.
2. If we want to pick up a node from the left subtree then we need to pick up the node with the maximum key. Why? The rule of BST tells that all the nodes in the left subtree should have value lesser than the root. Hence picking the maximum will satisfy this condition.

In the code given below the successor is the node with the minimum key on the right subtree. Deleting Peter the successor turns out to be Ravi. The successor node can fall into two conditions

1. It can be a leaf node. In that case you need to set the parent left reference to none if the successor is a left child.
2. It can have 1 child. If that is the case it can only be a right child. Why? Remember since this is the node with minimum value it cannot have a left child. Set the child node to the successor parent’s left node.
```def delete(self, key):
#If the root has a value
if self.root:
# Only root node is present
if self.root.key == key and self.root.hasNoChildren():
self.root = None
else:
node = self.get(key)
# found the node with the key
if node:
...

#CASE: NODE WITH 2 CHILDREN
elif node.hasBothChildren():
successor = node.right.findMin()
node.key = successor.key
node.value = successor.value

# Successor has no children
if successor.hasNoChildren():
if successor.parent.left == successor:
successor.parent.left = None
elif successor.right:
if successor.parent.left == successor:
successor.parent.left = successor.right
successor.right.parent = successor.parent

```

## Balanced Trees

To the BST let us insert the nodes in the descending order of employee identifiers (Harry, Alan, Ravi, Peter, Chen, John and Jane). Take a look at the tree that we got.

The tree is no longer balanced. Search will take linear time instead of log2(n) time. We need to a way to create a tree that is balanced. There are several techniques to balance a tree. AVL tree is one such technique. It is a self-balancing binary search tree named after its inventors G.M. Adelson-Velsky and E.M. Landis.

In the AVL tree each node will have a variable called heightDifference and it will store the height difference between left and the right subtree.

heightDifference = height(leftSubTree) – height(rightSubTree)

Given below is the code for AvlTreeNode

```class AvlTreeNode:

def __init__(self, key, value, parent=None):
self.key = key
self.value = value
self.parent = parent
self.left = None
self.right = None
self.heightDifference = 0
```

Take a look at the image given below. The height difference for Harry is 2.

```heightDifference(Harry) = Height(Left Subtree) - Height(Right Subtree)
heightDifference(Harry) = 2 - 0
heightDifference(Harry) = 2```

If the height difference for the node is greater than 1, the tree starts to get unbalanced and it will have a deep left subtree. If the height difference for the node is less than -1, the tree starts to get unbalanced and it will have a deep right subtree. If the height difference is 0 then the subtrees from that node is balanced.

In the AVL tree each node should have height difference of -1 or 0 or 1. Why do we have this restriction? If we keep the height difference between -1 and 1, the tree becomes balanced and we can get log2(n) time for searches. If the height difference is not between -1 and 1 then we need to fix it by doing rotations. There are 2 types of rotations – left rotation and right rotation.

Of the 3 nodes given below, the height difference for Harry is 2 and hence it violates the AVL tree condition. The tree is left heavy. Hence we need to fix it by doing a right rotation.

After right rotation the tree gets balanced.

Here are the conditions on which you will do either a left or right rotation.

1. Right Rotation – If the node is left tree heavy i.e. the height difference is greater than 1 then you need to do right rotation.
2. Left Rotation – If the node is right tree heavy i.e. the height difference is less than -1 then you need to do left rotation.

Left rotation is asymmetrical to right rotation and hence it should be easy to understand. Given below is the code for doing left and right rotations.

```class AvlBinarySearchTree:
...

def rotateLeft(self, node):
newRoot = node.right
node.right = newRoot.left
if node.right:
node.right.parent = node
newRoot.parent = node.parent
if self.root == node:
self.root = newRoot
else:
if node.parent.left == node:
node.parent.left = newRoot
else:
node.parent.right = newRoot
node.parent = newRoot
newRoot.left = node

node.heightDifference = node.heightDifference + 1 + max(0, -newRoot.heightDifference)
newRoot.heightDifference = newRoot.heightDifference + 1 + max(0, node.heightDifference)

def rotateRight(self, node):
newRoot = node.left
node.left = newRoot.right
if node.left:
node.left.parent = node
newRoot.parent = node.parent
if self.root == node:
self.root = newRoot
else:
if node.parent.left == node:
node.parent.left = newRoot
else:
node.parent.right = newRoot
node.parent = newRoot
newRoot.right = node

node.heightDifference = node.heightDifference - 1 + min(0, -newRoot.heightDifference)
newRoot.heightDifference = newRoot.heightDifference - 1 + min(0, node.heightDifference)
```

Majority of the code should be easy to understand except the code for calculating the new height difference for the nodes that gets rotated.

```class AvlBinarySearchTree:
...

def rotateLeft(self, node):
...

node.heightDifference = node.heightDifference + 1 + max(0, -newRoot.heightDifference)
newRoot.heightDifference = newRoot.heightDifference + 1 + max(0, node.heightDifference)

def rotateRight(self, node):
...

node.heightDifference = node.heightDifference - 1 + min(0, -newRoot.heightDifference)
newRoot.heightDifference = newRoot.heightDifference - 1 + min(0, node.heightDifference)
```

For the rotateRight method I am going to show how the value for newRoot.heightDifference is derived.

```Goal: To derive newRoot.heightDifference for rotateRight

oldA is the node A before the right rotation
newA is the node A after the right rotation
oldB is the node B before the right rotation
newB is the node B after the right rotation

NOTE: Only nodes A and B gets affected during the rotation.

Height Difference (oldB) = Height(C) - Height(D)
Height Difference (newB) = Height(C) - Height(newA)

Height(newA) = 1 + Max(Height(D), Height(E)) # This is height of a node is calculated

Height Difference (newB) = Height(C) - [1 + Max(Height(D), Height(E))] # Substituting Height(newA)

Height Difference (newB) - Height Difference (oldB)
= Height(C) - [1 + Max(Height(D), Height(E))] - [Height(C) - Height(D)]
= Height(C) - Height(C) + Height(D) -[1 + Max(Height(D), Height(E))]
= Height(D) - 1 - Max(Height(D), Height(E))
= -1 +Height(D) - Max(Height(D), Height(E))
= -1 +Height(D) + Min(-Height(D), -Height(E)) # -Max(4,5) can be written as +Min(-4,-5)
= -1 + Min(Height(D) - Height(D), Height(D) - Height(E))
= -1 + Min(0, Height(D) - Height(E))
= -1 + Min(0, Height Difference(newA))

Height Difference (newB) = Height Difference (oldB) - 1 + Min(0, Height Difference(newA))

This is same as the code

newRoot.heightDifference = newRoot.heightDifference - 1 + Min(0, node.heightDifference)```

Using the similar method you can derive the values for the others.

Given below is the code for AVL tree. insert() method is modified slightly as it needs to call updateHeightDifference(). If the node height difference is not between -1 to 1 then rebalance() is called.

```class AvlBinarySearchTree:

...

def insert(self, key, value):
if self.root:
self.__insert(key, value, self.root)
else:
self.root = AvlTreeNode(key, value)

def __insert(self, key, value, node):
if key < node.key:
if node.left:
self.__insert(key, value, node.left)
else:
node.left = AvlTreeNode(key, value, node)
self.updateHeightDifference(node.left)
else:
if node.right:
self.__insert(key, value, node.right)
else:
node.right = AvlTreeNode(key, value, node)
self.updateHeightDifference(node.right)

def updateHeightDifference(self, node):

if node.heightDifference > 1 or node.heightDifference < -1:
self.rebalance(node)
return

#If the node has a parent then adjust its heightDifference
if node.parent:
if node.parent.left == node:
node.parent.heightDifference += 1
if node.parent.right == node:
node.parent.heightDifference -= 1

# Move one level as the height difference needs to be propogated
if node.parent.heightDifference != 0:
self.updateHeightDifference(node.parent)

def rebalance(self, node):
#Right side heavy
if node.heightDifference < 0:
#If the right child is left heavy
if node.right.heightDifference > 0:
self.rotateRight(node.right)
self.rotateLeft(node)
#Left side heavy
elif node.heightDifference > 0:
#If the left child is right heavy
if node.left.heightDifference < 0:
self.rotateLeft(node.left)
self.rotateRight(node)

...

```

In the rebalance method why do we do call rotateLeft on the left child before doing a rotateRight?

```class AvlBinarySearchTree:
def rebalance(self, node):
...
#Left side heavy
elif node.heightDifference > 0:
#If the left child is right heavy
if node.left.heightDifference < 0:
self.rotateLeft(node.left)
self.rotateRight(node)

```

The picture given below should answer why we did two rotations instead of one.

You should do double rotation

1. If the node is left side heavy and the left child of that node is right side heavy then you need to do a left rotation on the left child and then do a right child rotation on the node itself.
2. If the node is right side heavy and the right child of that node is left side heavy then you need to do a right rotation on the right child and then do a left child rotation on the node itself.

I inserted the nodes in the AVL tree in descending order of employee identifiers – (Harry, Alan, Ravi, Peter, Chen, John and Jane). I got a balanced tree as shown below.