Hash Pointers and Data Structures

In section 1.2, we're going to talk about hash pointers and their applications. A hash pointer is a kind of data structure that turns out to be used a lot in the systems we're talking about. A hash pointer is basically a simple thing: we take a pointer to where some information is stored, and together with that pointer we store a cryptographic hash of the information. So whereas a regular pointer just gives us a way to retrieve the information, a hash pointer lets us get the information back and also lets us verify that the information hasn't changed. A hash pointer tells us where something is and what its value was. In diagrams, we'll depict a hash pointer as a hash in a box with an arrow coming out of it, pointing at some object. Anything drawn in this manner is a hash pointer to that object: it serves both as a pointer to where the data is stored and as a hash of the value the data had when we last saw it.

We can take hash pointers and use them to build all kinds of data structures. A key idea here: take any data structure, a linked list or a binary search tree or something like that, and implement it with hash pointers instead of pointers as we normally would. An example is a linked list built with hash pointers, a data structure we're going to call a block chain. Just like a regular linked list, we have a series of blocks, and each block holds data as well as a pointer to the previous block in the list. Here, the previous-block pointer is replaced with a hash pointer, so it says both where the previous block is and what the value of that entire previous block was. And we're going to store, we're going to remember, the head of the list, just as a regular hash pointer.
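The structure just described can be sketched in a few lines of Python. This is a minimal illustration, not anything from the lecture itself: the names (`H`, `append_block`, `verify`) and the JSON-based block serialization are my own choices, and each block's hash pointer is represented simply as the hash of the entire previous block.

```python
import hashlib
import json

def H(obj) -> str:
    """SHA-256 hash of a JSON-serializable object (here, a block)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def append_block(chain, data):
    """Append a block whose hash pointer commits to the entire previous block."""
    prev_hash = H(chain[-1]) if chain else None
    chain.append({"data": data, "prev_hash": prev_hash})
    return H(chain[-1])  # the new head hash pointer: the one value we remember

def verify(chain, head_hash) -> bool:
    """Walk back from the head, checking each hash pointer against the block it names."""
    expected = head_hash
    for block in reversed(chain):
        if H(block) != expected:
            return False
        expected = block["prev_hash"]
    return expected is None  # we should land exactly on the first (genesis) block

chain = []
append_block(chain, "genesis")
append_block(chain, "pay Alice 10")
head = append_block(chain, "pay Bob 5")
```

Note that `verify` holds on to only the single head hash: every earlier hash it checks against comes out of the blocks themselves, which is exactly why editing any block breaks the walk.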
One use case for a block chain like this is a tamper-evident log. That is, we want to build a log data structure that stores a bunch of data, so that we can add data onto the end of the log, but if somebody later goes back and messes with data that is earlier in the log, we're going to detect it. That's what tamper evidence means. To understand why a block chain gives us this tamper-evident property, let's ask what happens if an adversary wants to go back and tamper with data in the middle of the chain. So let's assume the adversary wants to change the data in some block in the middle, and he wants to do it in such a way that we, the holders of the hash pointer at the head, won't be able to detect it. The adversary changes the contents of that block, and therefore the hash stored in the following block, which is a hash of the entire tampered block, is not going to match up. Because the hash function is collision-free, it must be the case that the hash of the changed block is now different. So we can detect the inconsistency between this data and the hash pointer that we remembered before, unless the adversary also tampers with that hash pointer. If he tampers with the hash pointer, he makes those two match up. But now he's changed the contents of the following block, and that means that when we come back later and hash the contents of that block, it's not going to match the hash we remembered before, because the contents of the block have changed. So we're going to detect the inconsistency between the contents of that block and its hash, unless the adversary also tampers with the next block over on the right. But when he does that, the hash of that block is not going to match the hash that we remembered at the head, the hash that we're holding on to. And that hash the adversary can't tamper with, because it's the value we remembered as the head of the list. So the upshot of this is that if the adversary wants to tamper with data anywhere in this entire chain, then in order to keep the story consistent, he's going to have to tamper with hash pointers all the way back to the beginning. And he's ultimately going to run into a roadblock, because he won't be able to tamper with the head of the list. And so what this means is that just by remembering this one hash pointer, we've essentially remembered a kind of tamper-evident hash of the entire list, all the way back to the beginning. We can build a block chain like this containing as many blocks as we want, going back to some special block at the beginning of the list, which we might call the genesis block. And that's a tamper-evident log built out of a block chain.

Another handy data structure that we can build with hash pointers is a binary tree. A binary tree built with hash pointers is known as a Merkle tree, after its inventor Ralph Merkle. The idea is this: suppose we have a number of data blocks, which we'll draw across the bottom. We take these data blocks in consecutive pairs, and for each pair we build a data structure with two hash pointers, one to each block, and so on across the row. Then we go one more level up: each block at this level holds hash pointers to its two children down below. And so on, all the way up to the root of the tree.
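The pairing-and-hashing construction just described can be sketched as follows. This is a minimal illustration with my own function name; duplicating the last hash when a level has an odd number of entries is one common convention, not something specified in the lecture.

```python
import hashlib

def H(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Hash the data blocks, then repeatedly hash consecutive pairs
    until a single root hash remains."""
    level = [H(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:      # odd level: duplicate the last hash
            level.append(level[-1])
        # each parent commits to a consecutive pair of children
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"block0", b"block1", b"block2", b"block3"])
```

Just as with the block chain, the single `root` value is all we need to remember: changing any leaf changes its hash, which changes its parent's hash, and so on up to the root.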
And then, just like before, we're going to remember only the hash pointer at the head of the tree. We can then, if we want, traverse down through the hash pointers to any point in the list, and we can make sure that the data hasn't been tampered with. Because, just like I showed you with the block chain, if an adversary tampers with some data block down at the bottom, that will cause the hash pointer one level up not to match. So he'll have to tamper with that, and therefore he'll have to tamper with the hash pointer one level up from there. And eventually he'll get up to the top, where he won't be able to tamper with the hash pointer that we've remembered. So again, any attempt to tamper with any piece of data across the bottom will be detected just by remembering the hash pointer at the top.

Now, another nice feature of Merkle trees, unlike the block chain that we built before, is that if someone wants to prove to us that a particular data block is a member of the Merkle tree, all they need to show us is a small amount of data. If we remember just the root, and someone wants to convince us that a particular block is in the Merkle tree, they show us that block, and we verify that its hash matches the hash pointer in its parent. Then they show us that parent's sibling, and we verify that the hash one level up matches the corresponding hash pointer, and so on along the path up to the root. Just by verifying the hashes up to the root, we can verify that this data block was in the Merkle tree. So that takes about log(n) items that we need to be shown, and it takes about log(n) time for us to verify. And so even with a very large number of data blocks in the Merkle tree, we can still verify a proof of membership in a relatively short time.

So Merkle trees have various advantages.
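The log(n) membership proof can be sketched like this, reusing the same bottom-up construction as above. The proof format here (a list of sibling hashes, each tagged with whether the sibling sits on the left) is one reasonable encoding of the path the lecture describes, with my own function names.

```python
import hashlib

def H(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    level = [H(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect the sibling hash at each level on the path from leaf to root."""
    level = [H(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sib = index ^ 1                      # index of our sibling at this level
        proof.append((level[sib], sib < index))  # (sibling hash, sibling is left?)
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_proof(leaf, proof, root):
    """Recompute hashes up the path; about log(n) work for n leaves."""
    h = H(leaf)
    for sib_hash, sib_is_left in proof:
        h = H(sib_hash + h) if sib_is_left else H(h + sib_hash)
    return h == root

leaves = [b"a", b"b", b"c", b"d", b"e"]
root = merkle_root(leaves)
ok = verify_proof(b"c", merkle_proof(leaves, 2), root)
```

The verifier never sees the other leaves; it is shown only one sibling hash per level, which is why the proof size and verification time are both logarithmic in the number of blocks.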
One advantage, of course, is that the tree can hold many items, but we just need to remember the one root hash, which is only 256 bits. We can verify membership in a Merkle tree in logarithmic time and logarithmic space. That's nice.

And there's a variant, the sorted Merkle tree. That's just a Merkle tree where we take the blocks at the bottom and sort them into some order: alphabetical, lexicographic, numeric, or some order that we agree on. Having done that, once we've sorted the Merkle tree, it's possible to verify non-membership in the tree. That is, we can prove that a particular block is not in the Merkle tree. And the way we do that is simply by showing a path to the item that's just before where that item would be, and a path to the item that's just after where it would be. Then we can say: look, both of these items are in the Merkle tree, they're consecutive, so there is nothing in between them, and therefore the thing we're trying to prove non-membership of can't be there. So a sorted Merkle tree is a binary search tree built with hash pointers; we can do logarithmic-time membership proofs, and non-membership proofs if we sort the tree, and it's all very efficient.

It turns out that, more generally, we can employ hash pointers in any pointer-based data structure, as long as it doesn't contain cycles. We won't be able to make all the hashes match up if the data structure contains cycles. If you think about an acyclic data structure, you can start near the leaves, that is, near the objects that don't have any pointers coming out of them, compute their hashes, and then work your way back towards the beginning. But in a structure with cycles, there's no end that we can start with and compute back from. So, for example, we can build a directed acyclic graph out of hash pointers, and we'll be able to verify membership in that structure very efficiently, and it will be easy to compute. So this is a general trick you'll see over and over throughout the distributed data structures and throughout the algorithms that we talk about later in this lecture and in subsequent lectures.
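The "start from the leaves and work back" idea for acyclic structures can be sketched as a recursive hash over a DAG. This is a toy illustration with made-up node names; each node's "data" is just its name, and a node's hash commits to its data plus the hashes of everything reachable from it.

```python
import hashlib

def H(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

def dag_hash(node, edges, memo=None):
    """Hash of a node = hash of its own data plus its children's hashes.
    Recursion bottoms out at nodes with no outgoing pointers (the 'leaves');
    memoization handles children shared by multiple parents."""
    if memo is None:
        memo = {}
    if node not in memo:
        children = edges.get(node, [])
        memo[node] = H(node + "".join(dag_hash(c, edges, memo) for c in children))
    return memo[node]

# a small DAG: two paths from the root converge on the shared node
edges = {"root": ["a", "b"], "a": ["shared"], "b": ["shared"]}
root_hash = dag_hash("root", edges)
```

If the graph had a cycle, this recursion would never bottom out, which is the computational face of the lecture's point: with cycles there is no place to start hashing from.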