Trees

Take a look at the org chart of a startup company.

orgchart

There are 7 employees in the company. Each employee can have up to 2 other employees reporting to them.

Tree is a data structure which is used to represent this information in computer’s memory. Using the org chart let us understand the vocabulary used in trees.

Node – Each circle is called as the node. It is the fundamental part of the tree. Node contains the data. There are 7 nodes and each node contains the employee name.

Edge – An edge connects two nodes. It tells that these nodes are related. A node can have only one incoming edge and many outgoing edges. In our example a node can have utmost 2 outgoing edges. Hence this tree is called as a binary tree. Since there are 7 nodes we have 6 edges.

Root – A node without an incoming edge is called as the root. Peter is the root node.

Leaf – A node without an outgoing edge is called as leaf node. There are 4 leaf nodes – Jane, Chen, Ravi, and Harry.

Sibling – Nodes sharing the same parent are called as siblings. [John,Alan], [Jane, Chen], and [Ravi, Harry] are siblings.

Level – The level of any node is the number of edges you need to cross from the root to reach that node. John and Alan are at level 1. Jane, Chen, Ravi, and Harry are at level 2. Since Peter is the root node it is at level 0.

Height – The height of the tree is the maximum level of any node in the tree. In this case the height is 2.

There are 4 things that each node should have. It needs to hold the data and references to the left, right child and its parent. I am using Python as the language for the implementation.

class BinaryTree:

	def __init__(self, data, parent=None):
		self.data = data
		self.left = None
		self.right = None
		self.parent = parent

We need to add methods to get the data, parent, left, and right child. Also methods to tell if a node is root or a leaf.

class BinaryTree:

	...

	def getData(self):
		return self.data

	def getLeftChild(self):
		return self.left

	def getRightChild(self):
		return self.right

	def getParent(self):
		return self.parent

 	def isRoot(self):
 		if self.parent == None:
 			return True
 		else:
 			return False

 	def isLeaf(self):
 		if self.left == None and self.right == None:
 			return True
 		else:
 			return False

Let us add methods to insert the left and right nodes.

class BinaryTree:

	...
	def insertLeftChild(self, node):
		l = BinaryTree(node, self)
		if self.left == None:
			self.left = l
		else:
			l.left = self.left
			l.left.parent = l
			self.left = l
		return l

	def insertRightChild(self, node):
		r = BinaryTree(node, self)
		if self.right == None:
			self.right = r
		else:
			r.right = self.right
			r.right.parent = r
			self.right = r
		return r

#Code for creating the binary tree according to the org chart
peter = BinaryTree("Peter")
john  = peter.insertLeftChild("John")
alan  = peter.insertRightChild("Alan")

john.insertLeftChild("Jane")
john.insertRightChild("Chen")

alan.insertLeftChild("Ravi")
alan.insertRightChild("Harry")

We have constructed the org chart using the binary tree data structure. We need a way to access the names of all the employees. Accessing the contents of a tree is called as traversing the tree. There are 3 ways to traverse a tree.

1. inOrder

In the inOrder traversal we visit the left subtree and then the root and finally the right subtree. The nodes will be visited in the following order [Jane, John, Chen, Peter, Ravi, Alan, Harry].

def inOrder(root):
	if root != None:
		inOrder(root.getLeftChild())
		print root.getData()
		inOrder(root.getRightChild())

#call the method with the root node peter
inOrder(peter)

2. preOrder

In the preOrder traversal we visit the root and then the left subtree and finally the right subtree. The nodes will be visited in the following order [Peter, John, Jane, Chen, Alan, Ravi, Harry].

def preOrder(root):
	if root != None:
		print root.getData()
		preOrder(root.getLeftChild())
		preOrder(root.getRightChild())

#call the method with the root node peter
preOrder(peter)

3. postOrder

In the postOrder traversal we visit the left subtree and then the right subtree and finally the root. The nodes will be visited in the following order [Jane, Chen, John, Ravi, Harry, Alan, Peter].

def postOrder(root):
	if root != None:
		postOrder(root.getLeftChild())
		postOrder(root.getRightChild())
		print root.getData()

#call the method with the root node peter
postOrder(peter)

Tree is a powerful data structure. Modification to the structure can be done in constant time. But to find a node for modification it takes linear O(n) amount of time as we need to visit each node until we find the desired one. Can we do better? It turns out that if we store the data in a certain way then we can achieve log2(n) time to find any node. These special trees are called as search trees. Since we are dealing with nodes having 2 references it is called as binary search tree.

Binary Search Trees

To the org chart, I have added an identifier for each employee.

orgchart

The identifier for Peter is 100. All the identifiers on the left subtree of Peter is less than 100. All the identifiers on the right subtree of Peter is greater than 100. This property is true for each node in the tree. Make sure you understand this.

A binary search tree (BST) relies on the property that keys that are less than the parent are found in the left subtree, and keys that are greater than the parent are found in the right subtree.

Why is this very important?

Imagine that you are searching for Chen, whose identifier is 60. Peter is the root node and has an identifier of 100. Since it is a BST you know that Chen will be on the left hand side of Peter as 60 is less than 100. In the 1st pass we have discarded 1/2 the nodes on the right hand side. You move to John who has an identifier of 50. Chen should on the right hand side of John since 60 is greater than 50. In the 2nd pass we have discarded 1/2 the nodes on the left hand side. On the 3rd pass we found Chen. Since on each pass 1/2 the nodes are discarded the search will take log2(n) time. If you want a refresher on logarithms refer here. On a perfectly balanced BST with 10 trillion items we can find a node with in 44 passes. In a regular tree you need 10 trillion passes.

Let us create a TreeNode class to hold the employee information along with the references to parent, left and right child.

class TreeNode:

	def __init__(self, key, value, parent=None):
		self.key = key
		self.value = value
		self.parent = parent
		self.left = None
		self.right = None

Inserting a new node should take care of the BST property.

class BinarySearchTree:

	def __init__(self):
		self.root = None

	def insert(self, key, value):
		if self.root:
			self.__insert(key, value, self.root)
		else:
			self.root = TreeNode(key, value)

	def __insert(self, key, value, node):
		if key < node.key:
			if node.left:
				self.__insert(key, value, node.left)
			else:
				node.left = TreeNode(key, value, node)
		else:
			if node.right:
				self.__insert(key, value, node.right)
			else:
				node.right = TreeNode(key, value, node)

# Code for creating the org chart
bst = BinarySearchTree()
bst.insert(100, 'Peter')
bst.insert(50, 'John')
bst.insert(25, 'Jane')
bst.insert(60, 'Chen')
bst.insert(150, 'Alan')
bst.insert(140, 'Ravi')
bst.insert(170, 'Harry')

Given below is the code to find the node given the employee identifier. It makes use of the BST property to skip 1/2 the nodes in each pass.


class BinarySearchTree:

    ...

    def get(self, key):
        if self.root:
            return self.__get(key, self.root)
        else:
            return None

    def __get(self, key, node):
        if node:
            if key == node.key:
                return node
            elif key < node.key:
                return self.__get(key, node.left)
            elif key > node.key:
                return self.__get(key, node.right)
        else:
            return None

If you traverse a BST inorder (left, root and right) then you will visit the nodes in ascending order of the keys. The code given below will output the keys in the following order [25, 50, 60, 100, 140, 150, 170].

def inOrder(root):
    if root != None:
        inOrder(root.left)
        print root.key
        inOrder(root.right)

Deleting a node in a BST is little bit tricky. After deletion the property of the BST should be maintained. A node can have 0 or 1 or 2 children. Each case needs to be handled differently.

Case 1: Node without any child

Harry(170) is a leaf node. Harry is the right child of Alan. Hence when you delete Harry the right child of Alan should be set to none. On the other hand if you remove Ravi then you should set the left child of Alan to none.

orgchartleaf

I added few additional methods in the TreeNode class which are used by the delete procedure.


class TreeNode:

    ...

    def hasNoChildren(self):
        return self.left == None and self.right == None

    def hasOneChildren(self):
        return self.left or self.right

    def hasBothChildren(self):
        return self.left and self.right

    def findMin(self):
        current = self
        while current.left:
            current = current.left
        return current

Code to handle the case with no children.

class BinarySearchTree:

    ...

    def delete(self, key):
      #If the root has a value
      if self.root:
          # Only root node is present
          if self.root.key == key and self.root.hasNoChildren():
              self.root = None
          else:
              node = self.get(key)
              # found the node with the key
              if node:
                  #CASE: NO CHILDREN
                  if node.hasNoChildren():
                      if node.parent.left == node:
                          node.parent.left = None
                      else:
                          node.parent.right = None

Case 2: Node with one child

Chen(60) is a node with Vibhu as its left child. When Chen is deleted Vibhu should be linked to the right child of John. There are few variations that needs to be taken care.

  1. Node that is being deleted can be a left or right child of the parent.
  2. Node that is being deleted can have left or right child.
  3. Node that is being deleted can also be a root node.
orgchart1child

Code for deleting node with 1 children.

class BinarySearchTree:

    ...

	def delete(self, key):
		#If the root has a value
		if self.root:
			# Only root node is present
			if self.root.key == key and self.root.hasNoChildren():
				self.root = None
			else:
				node = self.get(key)
				# found the node with the key
				if node:
					...
					#CASE: HAS ONE CHILD
					elif node.hasOneChildren():
						if node.left:
							if node == self.root:
								self.root = node.left
								self.root.parent = None
							elif node.parent.left == node:
								node.parent.left = node.left
								node.left.parent = node.parent
							elif node.parent.right == node:
								node.parent.right = node.left
								node.left.parent = node.parent
						else:
							if node == self.root:
								self.root = node.right
								self.root.parent = None
							elif node.parent.left == node:
								node.parent.left = node.right
								node.right.parent = node.parent
							elif node.parent.right == node:
								node.parent.right = node.right
								node.right.parent = node.parent

Case 3: Node with two child

Peter is the node with 2 children. When this node is deleted we need to replace it with the node that maintains the BST property. Let us call the node that is going to replace Peter as successor. We can select the successor from the left or right subtree of Peter.

  1. If we want to pick up a node from the right subtree then we need to pick up the node with the minimum key. Why? The rule of BST tells that all the nodes in the right subtree should have value greater than the root. Hence picking the minimum will satisfy this condition.
  2. If we want to pick up a node from the left subtree then we need to pick up the node with the maximum key. Why? The rule of BST tells that all the nodes in the left subtree should have value lesser than the root. Hence picking the maximum will satisfy this condition.

orgchart

In the code given below the successor is the node with the minimum key on the right subtree. Deleting Peter the successor turns out to be Ravi. The successor node can fall into two conditions

  1. It can be a leaf node. In that case you need to set the parent left reference to none if the successor is a left child.
  2. It can have 1 child. If that is the case it can only be a right child. Why? Remember since this is the node with minimum value it cannot have a left child. Set the child node to the successor parent’s left node.
def delete(self, key):
	#If the root has a value
	if self.root:
		# Only root node is present
		if self.root.key == key and self.root.hasNoChildren():
			self.root = None
		else:
			node = self.get(key)
			# found the node with the key
			if node:
				...

				#CASE: NODE WITH 2 CHILDREN
				elif node.hasBothChildren():
					successor = node.right.findMin()
					node.key = successor.key
					node.value = successor.value

					# Successor has no children
					if successor.hasNoChildren():
						if successor.parent.left == successor:
							successor.parent.left = None
					elif successor.right:
						if successor.parent.left == successor:
							successor.parent.left = successor.right
							successor.right.parent = successor.parent

Balanced Trees

To the BST let us insert the nodes in the descending order of employee identifiers (Harry, Alan, Ravi, Peter, Chen, John and Jane). Take a look at the tree that we got.

orgchart

The tree is no longer balanced. Search will take linear time instead of log2(n) time. We need to a way to create a tree that is balanced. There are several techniques to balance a tree. AVL tree is one such technique. It is a self-balancing binary search tree named after its inventors G.M. Adelson-Velsky and E.M. Landis.

In the AVL tree each node will have a variable called heightDifference and it will store the height difference between left and the right subtree.

heightDifference = height(leftSubTree) – height(rightSubTree)

Given below is the code for AvlTreeNode

class AvlTreeNode:

	def __init__(self, key, value, parent=None):
		self.key = key
		self.value = value
		self.parent = parent
		self.left = None
		self.right = None
		self.heightDifference = 0

Take a look at the image given below. The height difference for Harry is 2.

heightDifference(Harry) = Height(Left Subtree) - Height(Right Subtree)
heightDifference(Harry) = 2 - 0
heightDifference(Harry) = 2

heightDifference

If the height difference for the node is greater than 1, the tree starts to get unbalanced and it will have a deep left subtree. If the height difference for the node is less than -1, the tree starts to get unbalanced and it will have a deep right subtree. If the height difference is 0 then the subtrees from that node is balanced.

In the AVL tree each node should have height difference of -1 or 0 or 1. Why do we have this restriction? If we keep the height difference between -1 and 1, the tree becomes balanced and we can get log2(n) time for searches. If the height difference is not between -1 and 1 then we need to fix it by doing rotations. There are 2 types of rotations – left rotation and right rotation.

Of the 3 nodes given below, the height difference for Harry is 2 and hence it violates the AVL tree condition. The tree is left heavy. Hence we need to fix it by doing a right rotation.

heightDifference

After right rotation the tree gets balanced.

rightrotationbalanced

Here are the conditions on which you will do either a left or right rotation.

1. Right Rotation – If the node is left tree heavy i.e. the height difference is greater than 1 then you need to do right rotation.
2. Left Rotation – If the node is right tree heavy i.e. the height difference is less than -1 then you need to do left rotation.

Left rotation is asymmetrical to right rotation and hence it should be easy to understand. Given below is the code for doing left and right rotations.

class AvlBinarySearchTree:
	...

	def rotateLeft(self, node):
		newRoot = node.right
		node.right = newRoot.left
		if node.right:
			node.right.parent = node
		newRoot.parent = node.parent
		if self.root == node:
			self.root = newRoot
		else:
			if node.parent.left == node:
				node.parent.left = newRoot
			else:
				node.parent.right = newRoot
		node.parent = newRoot
		newRoot.left = node

		node.heightDifference = node.heightDifference + 1 + max(0, -newRoot.heightDifference)
		newRoot.heightDifference = newRoot.heightDifference + 1 + max(0, node.heightDifference)

	def rotateRight(self, node):
		newRoot = node.left
		node.left = newRoot.right
		if node.left:
			node.left.parent = node
		newRoot.parent = node.parent
		if self.root == node:
			self.root = newRoot
		else:
			if node.parent.left == node:
				node.parent.left = newRoot
			else:
				node.parent.right = newRoot
		node.parent = newRoot
		newRoot.right = node

		node.heightDifference = node.heightDifference - 1 + min(0, -newRoot.heightDifference)
		newRoot.heightDifference = newRoot.heightDifference - 1 + min(0, node.heightDifference)

Majority of the code should be easy to understand except the code for calculating the new height difference for the nodes that gets rotated.

class AvlBinarySearchTree:
	...

	def rotateLeft(self, node):
		...

		node.heightDifference = node.heightDifference + 1 + max(0, -newRoot.heightDifference)
		newRoot.heightDifference = newRoot.heightDifference + 1 + max(0, node.heightDifference)

	def rotateRight(self, node):
		...

		node.heightDifference = node.heightDifference - 1 + min(0, -newRoot.heightDifference)
		newRoot.heightDifference = newRoot.heightDifference - 1 + min(0, node.heightDifference)

For the rotateRight method I am going to show how the value for newRoot.heightDifference is derived.

rightrotationmath

Goal: To derive newRoot.heightDifference for rotateRight

oldA is the node A before the right rotation
newA is the node A after the right rotation
oldB is the node B before the right rotation
newB is the node B after the right rotation

NOTE: Only nodes A and B gets affected during the rotation. 

Height Difference (oldB) = Height(C) - Height(D)
Height Difference (newB) = Height(C) - Height(newA)

Height(newA) = 1 + Max(Height(D), Height(E)) # This is height of a node is calculated

Height Difference (newB) = Height(C) - [1 + Max(Height(D), Height(E))] # Substituting Height(newA)

Height Difference (newB) - Height Difference (oldB)
    = Height(C) - [1 + Max(Height(D), Height(E))] - [Height(C) - Height(D)]
    = Height(C) - Height(C) + Height(D) -[1 + Max(Height(D), Height(E))]
    = Height(D) - 1 - Max(Height(D), Height(E))
    = -1 +Height(D) - Max(Height(D), Height(E))
    = -1 +Height(D) + Min(-Height(D), -Height(E)) # -Max(4,5) can be written as +Min(-4,-5)
    = -1 + Min(Height(D) - Height(D), Height(D) - Height(E))
    = -1 + Min(0, Height(D) - Height(E))
    = -1 + Min(0, Height Difference(newA))

Height Difference (newB) = Height Difference (oldB) - 1 + Min(0, Height Difference(newA))

This is same as the code

newRoot.heightDifference = newRoot.heightDifference - 1 + Min(0, node.heightDifference)

Using the similar method you can derive the values for the others.

Given below is the code for AVL tree. insert() method is modified slightly as it needs to call updateHeightDifference(). If the node height difference is not between -1 to 1 then rebalance() is called.

class AvlBinarySearchTree:

	...

	def insert(self, key, value):
		if self.root:
			self.__insert(key, value, self.root)
		else:
			self.root = AvlTreeNode(key, value)

	def __insert(self, key, value, node):
		if key < node.key:
			if node.left:
				self.__insert(key, value, node.left)
			else:
				node.left = AvlTreeNode(key, value, node)
				self.updateHeightDifference(node.left)
		else:
			if node.right:
				self.__insert(key, value, node.right)
			else:
				node.right = AvlTreeNode(key, value, node)
				self.updateHeightDifference(node.right)

	def updateHeightDifference(self, node):

		if node.heightDifference > 1 or node.heightDifference < -1:
			self.rebalance(node)
			return

		#If the node has a parent then adjust its heightDifference
		if node.parent:
			if node.parent.left == node:
				node.parent.heightDifference += 1
			if node.parent.right == node:
				node.parent.heightDifference -= 1

			# Move one level as the height difference needs to be propogated
			if node.parent.heightDifference != 0:
				self.updateHeightDifference(node.parent)

	def rebalance(self, node):
		#Right side heavy
		if node.heightDifference < 0:
			#If the right child is left heavy
			if node.right.heightDifference > 0:
				self.rotateRight(node.right)
			self.rotateLeft(node)
		#Left side heavy
		elif node.heightDifference > 0:
			#If the left child is right heavy
			if node.left.heightDifference < 0:
				self.rotateLeft(node.left)
			self.rotateRight(node)

    ...

In the rebalance method why do we do call rotateLeft on the left child before doing a rotateRight?

class AvlBinarySearchTree:
	def rebalance(self, node):
		...
		#Left side heavy
		elif node.heightDifference > 0:
			#If the left child is right heavy
			if node.left.heightDifference < 0:
				self.rotateLeft(node.left)
			self.rotateRight(node)

The picture given below should answer why we did two rotations instead of one.

doublerotation

You should do double rotation

  1. If the node is left side heavy and the left child of that node is right side heavy then you need to do a left rotation on the left child and then do a right child rotation on the node itself.
  2. If the node is right side heavy and the right child of that node is left side heavy then you need to do a right rotation on the right child and then do a left child rotation on the node itself.

I inserted the nodes in the AVL tree in descending order of employee identifiers – (Harry, Alan, Ravi, Peter, Chen, John and Jane). I got a balanced tree as shown below.

avlbalancedtree