The document discusses the concepts of trees in data structures, focusing on binary trees, their properties, and terminology such as root, leaves, internal nodes, and levels. It explains the structure of binary trees, including complete binary trees, and how they can represent various hierarchical data, such as organizational structures and file systems. Additionally, it covers the relationship between the number of nodes, leaves, and the height of binary trees, emphasizing the importance of understanding these concepts for effective data representation.
L 6
Last class we discussed hashing. We saw a few collision resolution techniques,
chaining, double hashing and linear probing, and you also did a little analysis of these collision resolution techniques. Today we are going to talk about trees. We are also going to look at binary trees and some data structures for trees.
What is a tree? Many of you might have come across a tree before, except that this tree is going to be different from the ones you have seen: the root will be at the top. In most of the trees around you, you do not see the root; here the root is at the top of the tree. In the tree given in the slide, A is the root. There is a notion of parent and children: node B is the parent of nodes D and E. By the same argument A is the parent of B and C, and C is the parent of F, G and H. A is the parent of B, which in turn is the parent of D and E, so A is an ancestor of D and E. A is also an ancestor of F, G and H, and of I. Sometimes we use the term grandparent: A is a grandparent of D, E, F, G and H. I hope you understand the difference between ancestor and grandparent. D and E are descendants of A; in fact B, C, D, E, F, G, H and I are all descendants of A. C and B are siblings because they have the same parent: B is a sibling of C and C is a sibling of B. G and E are not siblings, but F, G and H are siblings. D and E are the children of node B. To summarize, A is the parent of B, B is the parent of D and E, D and E are children of B, B and C are children of A, and all of these are descendants of A. Node I has three ancestors: H, C and A. H is the parent, C is the grandparent and A is the great-grandparent, but we do not use that term; we just call it an ancestor.
The terms we have defined so far are in the nature of a family tree; now we come to terms borrowed from real trees. D, E, F, G and I are called the leaves of the tree. A is the root; if you turn the tree upside down, the extremities are the leaves. What is a leaf? The generic term for A, B, ..., I is nodes of the tree. A leaf is a node which has no children.
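The family-tree terms above can be tried out on the example tree with a small sketch. The parent table below is transcribed from the relationships stated in the lecture; the function names (`children`, `ancestors`, `siblings`, `leaves`) are illustrative choices, not something the lecture defines.

```python
# Parent links for the lecture's example tree (A is the root).
PARENT = {
    "B": "A", "C": "A",
    "D": "B", "E": "B",
    "F": "C", "G": "C", "H": "C",
    "I": "H",
}
NODES = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]


def children(node):
    """Nodes whose parent is `node`."""
    return [n for n in NODES if PARENT.get(n) == node]


def ancestors(node):
    """Parent, grandparent, ... up to the root, nearest first."""
    result = []
    while node in PARENT:
        node = PARENT[node]
        result.append(node)
    return result


def siblings(node):
    """Other nodes that share `node`'s parent."""
    if node not in PARENT:
        return []  # the root has no siblings
    return [n for n in children(PARENT[node]) if n != node]


def leaves():
    """Nodes with no children."""
    return [n for n in NODES if not children(n)]


print(ancestors("I"))   # ['H', 'C', 'A']
print(siblings("G"))    # ['F', 'H']
print(leaves())         # ['D', 'E', 'F', 'G', 'I']
```

Storing only parent links keeps the sketch short; looking up children this way scans every node, which is fine for a nine-node example but not how a real tree structure would do it.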
If a node has no children then it is a leaf. H is not a leaf since H has a child, but I, F, G, E and D are nodes which do not have any children and so they are called leaves. A, B, C and H are called internal nodes; a node which is not a leaf is called an internal node.
We associate a notion of level with each node. The root is at level 0. The children of the root are at level 1. The children of the nodes at level 1 are at level 2. It is not just H that is at level 2; D, E, F, G and H are all at level 2. I is at level 3. Sometimes we also use the term depth; depth and level are the same thing, so level 0, level 1 and level 2 are also called depth 0, depth 1 and depth 2. The height of the tree is the maximum level of any node in the tree. What is the maximum level of any node in this tree? It is 3, so the height of this tree is 3. The degree of a node is the number of children it has. B has degree 2, C has degree 3 and H has degree 1. The leaves have degree 0 because they do not have any children. The basic terminology is quite intuitive.
What are trees used for? They can represent the hierarchy in an organization. For instance, take a company, let us call it Electronics R Us, which has some divisions. R&D is one division, purchasing is another and manufacturing is another. Domestic and international are the sub-divisions of sales. You could represent the organizational structure through a tree. You could also use a tree to represent the table of contents of a book. Let us say a book called Student Guide has chapters on overview, grading, environment, programming and support code. The chapter on grading has sections called exams, homework and programs. They could also have sub-sections, and that would build up the tree. Your file system, whether you use the Unix environment or the Windows environment, is also organized as a tree.
The node at level 0 is the root directory and at the first level are its two sub-directories. Then within a sub-directory I have some other sub-directories, and within those I have homework, assignments and so on. Your file system is also organized like a tree. Today we are going to see definitions, and we will start using those definitions in later classes.
An ordered tree is one in which the children of each node are ordered. That means there is a definite left-to-right order in which we want to place the children. Suppose you want to draw a family tree; you may want to draw the eldest child on the left and the younger children as you move from left to right. There is a notion of order there, and sometimes you want to reflect that order in your tree. But there is no notion of order in the file system example: the node at level 0 is a directory and at the first level there are two sub-directories. Whether I place the left node on the right or the right node on the left does not really make any difference as far as the picture is concerned, and it does not convey any additional information. Sometimes, however, you do have a notion of order between the children. Such a tree is called an ordered tree.
A binary tree is an ordered tree in which there is a notion of a left child and a right child. More precisely, it is an ordered tree in which every node has at most two children. The diagram given in the slide is an example of a binary tree. The root node has two children; the child on the left has only one child, and the next node on the right also has only one child. The nodes which do not have children are the leaves; here we have five leaves. The nodes at the first level are ordered, and there is a notion of left and right nodes. If I were to change the tree, that is, if I were to draw the left nodes on the right and the right nodes on the left, then I would get a different binary tree.
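A minimal sketch of why the left/right order matters in a binary tree: swapping left and right everywhere produces a tree that is no longer equal to the original. The `Node` class, the `mirror` helper and the small three-level tree are assumptions made for illustration; they are not the tree on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def mirror(t):
    """Return a copy of t with left and right swapped at every node."""
    if t is None:
        return None
    return Node(t.label, left=mirror(t.right), right=mirror(t.left))


# A root whose left child has one (left) child and whose right child is a leaf.
tree = Node("root", left=Node("a", left=Node("c")), right=Node("b"))
flipped = mirror(tree)

print(tree == flipped)            # False: same nodes, different binary tree
print(mirror(flipped) == tree)    # True: mirroring twice restores the tree
```

The dataclass-generated equality compares `left` with `left` and `right` with `right`, which is exactly the sense in which the two drawings are different binary trees.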
That would still be a binary tree but it would be different from this binary tree. All of this depends upon the application you have. This is just a way of representing information. Sometimes the order has meaning, sometimes it has none. When it has some meaning you would use an ordered binary tree, and when you change the order you are representing something different. We will see more examples of this and things will become clear.
I can also define a binary tree recursively, as follows. A binary tree is either a single node (a leaf), or an internal node (the root) to which I have attached two binary trees. In the slide, the nodes marked on the left side form the left subtree and the nodes marked on the right form the right subtree. I can construct any binary tree in this manner: I take a node and attach a left subtree and a right subtree. The left and right subtrees are themselves obtained recursively, each by taking a node and attaching left and right subtrees to it. I have said and/or, which means that the left subtree might be null (I might not have attached anything on the left), or I might not have attached anything on the right, or I might have attached both subtrees. Remember we have introduced two more terms, left subtree and right subtree. The tree hanging off the left side of the root node is called the left subtree and the tree hanging off the right side is called the right subtree. What is the left subtree of the node at the first level? It is the subtree rooted at the extreme left node of the second level.
One example of a binary tree is an arithmetic expression. I have an arithmetic expression which looks like the one given in the slide. I can represent this as a binary tree. Let us look at a parenthesization of this expression. Suppose I have parenthesized it in the manner given in the last line of the slide.
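The recursive definition above (a node with an optional left subtree and an optional right subtree) can be sketched directly as a small class. As an illustration the sketch builds and evaluates an expression tree; the slide's own expression is not reproduced in this transcript, so the expression 1 + (4 + 6), the class name `BinaryTree` and the supported operators are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class BinaryTree:
    value: Union[int, str]                  # a number at a leaf, an operator otherwise
    left: Optional["BinaryTree"] = None     # None means "no left subtree"
    right: Optional["BinaryTree"] = None    # None means "no right subtree"


def is_leaf(t):
    return t.left is None and t.right is None


def evaluate(t):
    """Evaluate an expression tree whose internal nodes are '+' or '*'."""
    if is_leaf(t):
        return t.value
    a, b = evaluate(t.left), evaluate(t.right)
    if t.value == "+":
        return a + b
    if t.value == "*":
        return a * b
    raise ValueError(f"unsupported operator: {t.value!r}")


inner = BinaryTree("+", BinaryTree(4), BinaryTree(6))   # (4 + 6)
expr = BinaryTree("+", BinaryTree(1), inner)            # 1 + (4 + 6)
print(evaluate(expr))   # 11
```

The evaluation mirrors the construction: a leaf is its own value, and an internal node combines the values of its two subtrees.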
We have (4+6); the numbers here will be the leaves of my binary tree and the internal nodes will correspond to the operations. In fact this is also one way to evaluate the expression. You would take 4+6 and sum it: you draw a tree which has one internal node whose two children are 4 and 6, and the internal node has the plus operator in it. Whatever the resulting value is, we add it to 1. So I draw a tree whose root is a plus operator, one child of which is 1 and the other child is the subtree obtained from the previous operation, and in this way I can build the whole tree. This is just another way of representing an arithmetic expression.
A decision tree is another example of a binary tree. The example given in the slide is taken from the book. Starbucks, Café Paragon and most of the names in it may not make much sense to you; we may never come across them. What is a decision tree? Each node in the decision tree corresponds to some decision that you want to make. You come to the root node and ask whether you want a fast meal. If the answer is yes, you go to the left node and ask whether you want coffee or not. If the answer is yes, you go to Starbucks. If the answer is no, you go to some other place, and so on. Thus decision trees are another example of binary trees, because typically the answer is yes or no. You follow the decision tree to reach a particular node.
This was mostly terminology and examples. Let us see more concrete stuff. Let us define a complete binary tree. We are still at binary trees; as you can see, every node in this tree has at most two children. But I will call such a tree a complete binary tree. We call a tree a complete binary tree if at level $i$ there are $2^i$ nodes. In some sense it is full. Merely requiring every internal node to have two children does not give you a complete binary tree; I will show you why.
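The difference between "every internal node has two children" and "complete" can be checked mechanically. The sketch below assumes the definition used in this lecture (level $i$ holds exactly $2^i$ nodes) and represents a tree as a nested `(left, right)` tuple with `None` for a missing subtree; that representation and the two sample trees are illustrative assumptions.

```python
def level_counts(tree):
    """Number of nodes at each level; a tree is (left, right), None means empty."""
    counts = []
    current = [tree] if tree is not None else []
    while current:
        counts.append(len(current))
        current = [child for node in current for child in node if child is not None]
    return counts


def is_complete(tree):
    """Complete in the lecture's sense: level i holds exactly 2**i nodes."""
    return all(c == 2 ** i for i, c in enumerate(level_counts(tree)))


def every_internal_node_has_two_children(tree):
    if tree is None:
        return True
    left, right = tree
    if (left is None) != (right is None):
        return False  # exactly one child
    return (every_internal_node_has_two_children(left)
            and every_internal_node_has_two_children(right))


LEAF = (None, None)
full_height_2 = ((LEAF, LEAF), (LEAF, LEAF))
lopsided = ((LEAF, LEAF), LEAF)   # every internal node has two children

print(level_counts(full_height_2), is_complete(full_height_2))   # [1, 2, 4] True
print(level_counts(lopsided), is_complete(lopsided))             # [1, 2, 2] False
print(every_internal_node_has_two_children(lopsided))            # True
```

The `lopsided` tree satisfies the two-children condition at every internal node, yet its level 2 holds only two of the four possible nodes, so it is not complete.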
Let us look at the slide and check whether every internal node has two children in this tree. Not every node can have two children: the tree must also have leaves, which have no children. So the requirement is that every node other than a leaf, that is, every internal node, has two children, and that requirement alone does not imply a complete binary tree. This tree is a counterexample: every internal node has two children, yet it is not a complete binary tree.
The following is an example of a complete binary tree. We want to say that at level $i$ there are $2^i$ nodes. The root is at level 0, which is 1 node; at level 1 there are 2 nodes, at level 2 there are 4 and at level 3 there are 8. What is the height of this tree? We define height as the maximum level number, so we should not count it as 4. The height of the tree is 3.
If $h$ is the height of the tree, then all the leaves are at level $h$, and by the definition of a complete binary tree level $i$ has $2^i$ nodes, which means there are $2^h$ leaves. The number of leaves in a complete binary tree of height $h$ is just $2^h$. What is the number of internal nodes? At level 0 we have 1 node, at level 1 we have 2 nodes and so on. Thus the sum is $1 + 2 + \dots + 2^{h-1}$, because at level $h$ all the nodes are leaves. This sum is $2^h - 1$; this is the number of internal nodes, and the number of leaves is $2^h$. So the number of internal nodes is the number of leaves minus 1. This is for a complete binary tree. What is the total number of nodes in this tree? It is $2^h$, the number of leaves, plus $2^h - 1$, the number of internal nodes, hence it becomes $2^{h+1} - 1$. Let us call this number $n$. If I have a complete binary tree on $n$ nodes, what is the height of this tree? Let us go one step at a time. What is the number of leaves in this tree?
If the number of nodes is $n$, the number of leaves was $2^h$, which equals $(n+1)/2$, just from the expression $2^{h+1} - 1 = n$. The number of leaves in a complete binary tree on $n$ nodes is $(n+1)/2$. If I have a complete binary tree on $n$ nodes, roughly half of the nodes are leaves and the remaining half are internal nodes. Similarly I can say that if I have a complete binary tree on $n$ nodes, then the height of the tree is $\log_2$ (number of leaves). I can evaluate $h$ from $2^{h+1} - 1 = n$: $h$ will be $\log_2((n+1)/2)$, and so it is $\log_2$ (number of leaves). Or we can go directly from the fact that the number of leaves is $2^h$, so $h$ is $\log_2$ (number of leaves). You are just doing some simple counting here. If I give you a complete binary tree of height $h$ then you should be able to tell the number of leaves and the number of internal nodes it has. When I give you a complete binary tree on $n$ nodes, you should be able to tell the height and so on. If you have a complete binary tree on $n$ nodes then the height of the tree is $\log_2((n+1)/2)$. The other thing you have to keep in mind is that in such a tree the number of leaves is very large. It is roughly half the total number of nodes. It is a very leafy kind of tree.
So far we have seen a complete binary tree, but a binary tree is any tree in which every node has at most two children. To get any binary tree, you can start with a suitably large complete binary tree and just cut parts of it off. For instance, if I were to cut off some pieces then I would get a binary tree as shown in the slide. I can always do it, no matter which tree I need. Take the binary tree on the right side, which has height 3; I would start with the complete binary tree of height 3 which is on the left, and cut off some pieces of it to get the tree on the right. The picture given in the slide is the proof. Let us use this fact, that you can obtain any binary tree by pruning a complete binary tree. Take a complete binary tree, cut off some branches, and you will get a binary tree.
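The counting for complete binary trees can be checked numerically. This is a small sketch under the lecture's definition (level $i$ has $2^i$ nodes); the function names are illustrative.

```python
import math


def complete_tree_counts(h):
    """Leaves, internal nodes and total nodes of a complete binary tree of height h."""
    leaves = 2 ** h
    internal = sum(2 ** i for i in range(h))   # levels 0 .. h-1
    return leaves, internal, leaves + internal


def height_from_nodes(n):
    """Height of a complete binary tree with n = 2**(h+1) - 1 nodes."""
    h = int(math.log2((n + 1) // 2))
    if 2 ** (h + 1) - 1 != n:
        raise ValueError("n is not of the form 2**(h+1) - 1")
    return h


for h in range(5):
    leaves, internal, n = complete_tree_counts(h)
    assert internal == leaves - 1          # internal nodes = leaves - 1
    assert n == 2 ** (h + 1) - 1           # total nodes
    assert leaves == (n + 1) // 2          # about half the nodes are leaves
    assert height_from_nodes(n) == h       # h = log2((n + 1) / 2)

print(complete_tree_counts(3))   # (8, 7, 15)
print(height_from_nodes(15))     # 3
```

For height 3 this reproduces the numbers from the lecture's example: 8 leaves, 7 internal nodes, 15 nodes in total.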
Recall that in a complete binary tree there were exactly $2^i$ nodes at level $i$. In a binary tree of height $h$, at level $i$ there will be at most $2^i$ nodes; there cannot be more than $2^i$ nodes because the binary tree is obtained from a complete binary tree by pruning. This is an important fact: at most $2^i$ nodes at level $i$ implies that the total number of nodes in a binary tree of height $h$ is at most $1 + 2 + \dots + 2^h = 2^{h+1} - 1$. The last level is $h$; at level 0 there is 1 node, at level 1 there are at most 2 nodes, at level 2 there are at most 4 nodes and so on. This is the maximum number of nodes that a binary tree of height $h$ can have.
Let us rewrite this. Suppose I told you that a tree has $n$ nodes. Then $n$ is less than or equal to this quantity, $n \le 2^{h+1} - 1$, which, just rearranged, means that the height of the tree satisfies $h \ge \log_2((n+1)/2)$. If I give you a binary tree with $n$ nodes in it, its height is at least $\log_2((n+1)/2)$, and there is a particular binary tree which achieves this with equality, namely the complete binary tree. Think of a complete binary tree as the tree which achieves the smallest height. If I create a binary tree with a certain number of nodes, the one which has the shortest height will be a complete binary tree, because there we are packing all the nodes as close to the root as possible by filling up all the levels to the maximum. That is the minimum height of a binary tree: given a binary tree on $n$ nodes, its minimum possible height is $\log_2((n+1)/2)$.
What is the maximum height that a binary tree on $n$ nodes can have? A binary tree on $n$ nodes has height at most $n-1$. This is obtained when every internal node has exactly one child; the picture is given in the slide. It can zigzag in any manner, and the height is 8 since there are 9 nodes in it. In a binary tree on $n$ nodes the minimum height is $\log_2((n+1)/2)$, which we loosely refer to as $\log n$, and the maximum height is $n-1$. This is a mistake many people make: they always assume that binary tree means the height is $\log n$.
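The two height bounds can be illustrated with a short sketch. It again uses nested `(left, right)` tuples as an assumed tree representation, and builds a nine-node zigzag chain like the one described for the slide; the function names are illustrative.

```python
def min_height(n):
    """Smallest h with n <= 2**(h+1) - 1, i.e. log2((n + 1) / 2) rounded up."""
    h = 0
    while 2 ** (h + 1) - 1 < n:
        h += 1
    return h


def max_height(n):
    """Height of a chain in which every internal node has exactly one child."""
    return n - 1


def height(tree):
    """Height of a non-empty tree given as (left, right); None is a missing subtree."""
    sub = [height(c) for c in tree if c is not None]
    return 1 + max(sub) if sub else 0


# Zigzag chain on 9 nodes: alternate left-only and right-only children.
chain = (None, None)
for i in range(8):
    chain = (chain, None) if i % 2 == 0 else (None, chain)

print(height(chain), max_height(9))     # 8 8
print(min_height(9))                    # 3
print(min_height(15), min_height(16))   # 3 4
```

Nine nodes fit in height 3 at best (a height-3 tree holds up to 15 nodes) and need height 8 at worst, so the height of a binary tree on $n$ nodes can be far from logarithmic.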
But that is not the case; the height could be anywhere between $\log n$ and $n$.
How many leaves does a binary tree have? What is the minimum and the maximum number of leaves it can have? Let us figure it out. We will prove that the number of leaves in a binary tree is at most $1 +$ the number of internal nodes. This is a useful inequality: in any binary tree the number of leaves can be at most 1 more than the number of internal nodes. How will you prove this? We will prove it by induction on the number of internal nodes.
For the base case consider a tree with 1 node. If a tree has only 1 node, how many internal nodes does it have? It has 0, because that 1 node does not have any child, so it is a leaf. The base case is when the number of internal nodes is 0, in which case the right-hand side is 1 and the number of leaves is 1, so the inequality is satisfied.
We will assume that the statement is true for all trees with at most $k-1$ internal nodes; not just exactly $k-1$, but any smaller number as well. We will prove it for a tree with $k$ internal nodes. Suppose I have a tree with $k$ internal nodes, and let us say the left subtree has $k_1$ internal nodes. Then how many internal nodes do I have in the right subtree? It is exactly $k - k_1 - 1$, not at most, because every internal node is either in the left subtree, or in the right subtree, or is the root. The minus one is because of the root node. Let us apply the induction hypothesis. $k_1$ is less than or equal to $k-1$, and the quantity $k - k_1 - 1$ is also less than or equal to $k-1$, so we can use the induction hypothesis on both subtrees. In the left subtree, which has $k_1$ internal nodes, the number of leaves is less than or equal to $k_1 + 1$. In the right subtree the number of leaves is less than or equal to $(k - k_1 - 1) + 1$, which is $k - k_1$.
The total number of leaves is just the sum of these two, because all the leaves are either in the left subtree or in the right subtree. So the total number of leaves is at most $(k_1 + 1) + (k - k_1) = k + 1$, which is what we wanted to prove: we started with a tree with $k$ internal nodes and had to show that the number of leaves is at most $1 + k$. This is a simple proof which shows that the number of leaves is at most $1 +$ the number of internal nodes.
There was a tree in which we saw that the number of leaves is equal to the number of internal nodes plus 1: the complete binary tree. What was the number of leaves in a complete binary tree? The number of leaves was $2^h$, if $h$ was the height of the tree, and the number of internal nodes was $1 + 2 + \dots + 2^{h-1} = 2^h - 1$. There was a difference of exactly 1 between the number of leaves and the number of internal nodes. The complete binary tree once again achieves equality. For any other tree the number of leaves will only be less than or equal to this bound.
How small can it be? Let us look at that. For a binary tree on $n$ nodes, the number of leaves plus the number of internal nodes is $n$, because every node is either a leaf or an internal node. We also just saw that the number of leaves is less than or equal to the number of internal nodes plus 1. Rearranging, the number of internal nodes is greater than or equal to the number of leaves minus 1. Substituting that, $n \ge 2 \cdot (\text{number of leaves}) - 1$, so the number of leaves is at most $(n+1)/2$ for any binary tree. Another thing to keep in mind, then, is that for any binary tree the number of leaves will never be more than about half the number of nodes in the tree. Again, equality (number of leaves $= (n+1)/2$) was achieved for our complete binary tree, which is the most leafy tree. All other trees are drier, and the minimum number of leaves that a tree might have is just 1. The example for that is the same one I showed you before: the chain on 9 nodes has only 1 leaf in it.
Let us look at an abstract data type for trees.
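Before the abstract data type, the leaf-count bounds derived above can be spot-checked on randomly shaped binary trees. This is only an empirical sketch, not a substitute for the induction proof; the generator `random_binary_tree`, the nested-tuple representation and the fixed seed are assumptions made for illustration.

```python
import random


def random_binary_tree(n, rng):
    """Random binary tree with n >= 1 nodes, as nested (left, right) tuples."""
    left_size = rng.randint(0, n - 1)
    right_size = n - 1 - left_size
    left = random_binary_tree(left_size, rng) if left_size else None
    right = random_binary_tree(right_size, rng) if right_size else None
    return (left, right)


def count(tree):
    """Return (number of leaves, number of internal nodes) of a non-empty tree."""
    kids = [c for c in tree if c is not None]
    if not kids:
        return 1, 0                      # a node without children is a leaf
    leaves, internal = 0, 1              # this node itself is internal
    for child in kids:
        child_leaves, child_internal = count(child)
        leaves += child_leaves
        internal += child_internal
    return leaves, internal


rng = random.Random(0)   # fixed seed, only so that runs are repeatable
for _ in range(200):
    n = rng.randint(1, 40)
    leaves, internal = count(random_binary_tree(n, rng))
    assert leaves + internal == n            # every node is a leaf or internal
    assert leaves <= internal + 1            # the inequality proved by induction
    assert 1 <= leaves <= (n + 1) / 2        # at least 1, at most (n + 1) / 2

print("all checks passed")
```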
You would have the generic methods which you have seen for all the abstract data types till now. The following are the generic container methods: size() tells us how many nodes there are in the tree, isEmpty() tells us whether the tree is empty or not, and elements() lists all the elements of the tree.
You can also have position-based container methods, of the kind we saw for the list or sequence data types. Think of positions as references into the tree, except that using the position data type I am not able to access anything but the element sitting at that position. The method positions() has no parameters; when you invoke it on a certain tree it gives you a sequence of all the positions in the tree, that is, references to all the nodes in the tree. Once you have a particular position, using the element method on that position you can access the element in that node. swapElements(p, q): given two positions p and q, you swap the elements at these two positions. replaceElement(p, e): given a position p, you replace the element at that position with e.
Then there are the query methods. isRoot(p): given a particular position, is this the root of the tree? isInternal(p): given a particular position, is this an internal node? isExternal(p): given a particular position, is this an external node, that is, a leaf? Sometimes we say external and sometimes we say leaf.
Among the accessor methods, root() returns the position of the root, an object of type position. isRoot(p) asks whether a given position is the root, whereas root() returns the position of the root. I hope you understand the difference between the two.
The position of the root means it is not a reference to that particular node as such; it is a reference of type position, so that you cannot access anything except the element. This is the same as the type casting which we did earlier. The method parent(p), given a particular position, returns the parent node. The method children(p), given a particular position, returns the children of this node. There could be many children for a certain node, so how will it return them? It returns an object of type sequence, which contains the positions of all the children. A position has an element method which lets you access the data. The update methods are typically application specific. These are the generic methods for a tree.
A binary tree should really be treated as a sub-class, a class derived from tree. We continue to have the same methods as we described for the tree, but we have some additional methods. There is a notion of a left child: given a position, give me the left child, give me the right child, or give me the sibling. We will come to the update methods when we see examples of them.
What is the node structure in a binary tree? What kinds of data would you keep in an object corresponding to a node of the binary tree? You would have a reference to the left child, a reference to the right child and, typically, a reference to the parent. You would also have a reference to the key or data associated with this node; whatever element is sitting in this node, you would have a reference to it. These all sit together in the node. A reference to the node itself, the one at the center of the picture, is not stored in the node; that would not make any sense.
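A minimal sketch of the ADT and node structure described above, written in Python. The method names follow the lecture's names in snake_case (`swap_elements` for swapElements, and so on). The class names `Node` and `LinkedBinaryTree`, the choice to let the node object itself act as the position, and the update methods `add_left` and `add_right` are assumptions made for illustration; the lecture leaves update methods to the application.

```python
class Node:
    """One node; it doubles as the 'position' handed out to callers."""

    def __init__(self, element, parent=None):
        self._element = element
        self._parent = parent
        self._left = None
        self._right = None

    def element(self):
        return self._element


class LinkedBinaryTree:
    def __init__(self, root_element):
        self._root = Node(root_element)
        self._size = 1

    # generic container methods
    def size(self):
        return self._size

    def is_empty(self):
        return self._size == 0

    def positions(self):
        """All positions, each parent before its children."""
        out, stack = [], [self._root]
        while stack:
            p = stack.pop()
            out.append(p)
            stack.extend(c for c in (p._right, p._left) if c is not None)
        return out

    def elements(self):
        return [p.element() for p in self.positions()]

    # accessor methods
    def root(self):
        return self._root

    def parent(self, p):
        return p._parent

    def left_child(self, p):
        return p._left

    def right_child(self, p):
        return p._right

    def children(self, p):
        return [c for c in (p._left, p._right) if c is not None]

    def sibling(self, p):
        if p._parent is None:
            return None
        return p._parent._right if p._parent._left is p else p._parent._left

    # query methods
    def is_root(self, p):
        return p is self._root

    def is_external(self, p):
        return not self.children(p)

    def is_internal(self, p):
        return not self.is_external(p)

    # update methods (add_left / add_right are assumptions, not from the lecture)
    def add_left(self, p, element):
        p._left = Node(element, parent=p)
        self._size += 1
        return p._left

    def add_right(self, p, element):
        p._right = Node(element, parent=p)
        self._size += 1
        return p._right

    def replace_element(self, p, element):
        p._element = element

    def swap_elements(self, p, q):
        p._element, q._element = q._element, p._element


t = LinkedBinaryTree("A")
b = t.add_left(t.root(), "B")
c = t.add_right(t.root(), "C")
d = t.add_left(b, "D")
t.swap_elements(b, c)          # positions stay put, only the elements move
print(t.elements())            # ['A', 'C', 'D', 'B']
print(t.sibling(b) is c, t.is_external(d), t.parent(d) is b)   # True True True
```

By convention, callers only use `element()` on a position and go through the tree's methods for everything else, which is the restriction the lecture describes for the position type.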
For instance, suppose this was the root node and I use the root method to get a position for this node; using that position I can now access the left child by invoking the left child method, and in this manner I can get the position of any node. Once I have the position of a node I can invoke the element method to get the data associated with it. A node in this case would definitely implement the position interface. The slide shows what the binary tree looks like if you look at the links. The parent link is null for the root node because it has no parent. Its left child member refers to the node on the left, its right child member refers to the node on the right, and so on. In the diagram the extreme right node does not have any right child, so its right child member refers to null. That was for a binary tree.
How do we take care of arbitrary trees, let us say trees of unbounded degree? The root node has three children and its child on the left also has three children. Are we going to have three different data members to refer to the three children? It is not clear how to do that, because if a node has four children, how would you create space for another member? The way to do it is that you have a reference to only one of the children, and all the children are kept in a linked list. Each child has a reference to the parent, so all of these children point to the parent node. But the parent node points to only one of them, which is the head of the linked list at the second level. From the node at the first level, if I want to refer to its children, I just follow this reference and return all the elements of this linked list. How do I know that I have reached the last element of the linked list? When its next field is empty. The first field of this node is empty because it does not have children.
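The layout described above, one reference to a child plus a linked list of siblings, can be sketched as follows. The class name `TreeNode`, the field names `first_child` and `next_sibling`, and the helper `add_child` are illustrative assumptions; the sample tree mirrors the description of a root with three children whose leftmost child also has three children.

```python
class TreeNode:
    """General tree node with three references: parent, first child, next sibling."""

    def __init__(self, element):
        self.element = element
        self.parent = None
        self.first_child = None     # head of the linked list of children
        self.next_sibling = None    # next node in the parent's child list

    def add_child(self, element):
        """Append a new child at the end of the child list and return it."""
        child = TreeNode(element)
        child.parent = self
        if self.first_child is None:
            self.first_child = child
        else:
            last = self.first_child
            while last.next_sibling is not None:
                last = last.next_sibling
            last.next_sibling = child
        return child

    def children(self):
        """Walk the sibling list until the next reference is empty."""
        out, node = [], self.first_child
        while node is not None:
            out.append(node)
            node = node.next_sibling
        return out


root = TreeNode("A")
b, c, d = (root.add_child(x) for x in "BCD")
for x in "EFG":
    b.add_child(x)

print([n.element for n in root.children()])   # ['B', 'C', 'D']
print([n.element for n in b.children()])      # ['E', 'F', 'G']
print(c.first_child is None, d.next_sibling is None)   # True True
```

A node never needs more than these three references, however many children it has, because additional children only lengthen the sibling list.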
Every node still has only three data members, that is, three references: one for the parent, one for the leftmost child and one for the right sibling. The left node at level 1 refers to its leftmost child and not to all its children, because we do not know how many there are. It has one more reference for the right sibling, and likewise the left node at level 2 refers to its right sibling. You can make do with only three references. The node at level 2 which has only one child just points to that one child; there is no sense of left and right here. This is not a binary tree, and left and right make sense only in a binary tree. Actually I should not have written left child, I should have written first child. It is any one child: the node just points to that one child, and that lets you access its siblings through a linked list. From the first child you go to the next child, then to the next child, and so on. You can step through all the various children through the linked list. With that we will end our discussion on trees today. In the next class we are going to look at traversals of trees.