Lecture Note
University
Princeton UniversityCourse
Bitcoin and Cryptocurrency TechnologiesPages
4
Academic year
9
anon
Views
24
Cryptographic Hash Functions We shall examine the nature and traits of cryptographic hash functions in this topic. In essence, a cryptographic hash function is a mathematical function with three key characteristics. First ofall, it accepts any string as input, regardless of length. Second, it generates a 256-bit output thatis fixed in size, similar to the bitcoin technology. Finally, it must be efficiently computable, whichmeans that one should be able to quickly calculate the output given a string. Yet, we demandcryptographically safe hash algorithms. The cryptographic properties of hash functions are a complex topic. In this discussion, we will focus on three specific properties: collision-free, hiding property, and puzzle-friendly. Each ofthese properties is valuable and serves a unique purpose. The first essential property is that thefunction must be collision-free. This means that no one can find values x and y, such that x and yare different, yet the hash of x is equal to the hash of y. To understand this property, we can look at the operation of the function as depicted by the red arrows. If we have values x and H(x), and y and H(y), no one can find a situation where we havex and y that are distinct, yet when hashed, they produce the same output. Although collisionsexist, it is crucial to determine whether any collisions can be found by ordinary individuals usingregular computers. The method is based on the following: a. The method is based on the following: b. The method is based on the following: c. There is a 99.8% likelihood that at least two of the inputs will clash ifwe choose them. Although this method works for all hash functions, it is computationallyunfeasible because it takes a very long time. The likelihood of discovering a collision would stillbe infinitesimally minuscule even if every computer ever created by humanity were to computeback to the Big Bang. Сryptographic hash functions are a vital component of modern technology. Understanding their properties and how they work is crucial to building secure systems. Okay, so we know how to find a collision. But this method takes too long to matter. The question is, is there some other method that could be used on a particular hash function, in orderto find a collision? And that's the question that is harder to answer. Is there a faster way to findcollisions? Well, for some possible values of hash functions, of course there are. For example, ifour hash function were to simply take the input, modulo 2 to the 256, that is, it just selected thelast 256 bits of the input. Then we would know an easy way to find a collision. One collisionwould be the values 3, and 3 plus 2 to the 256. So, for some possible values of the hashfunction, it's very easy to find a collision. For others, we don't know. Now, one thing I need to note is that there's no hash function in existence which has been proven to be collision free. There are just some that people have tried really, really hard to findcollisions and haven't succeeded. And so we choose to believe that those are collision free.Okay, now, what good does collision freedom do us? If we can assume that we have a hashfunction that is collision free, then we can use that hash function as message digest. And what Imean by that is the following. That if we know that x and y have the same hash, then it's safe to
assume that x and y are the same. Because if someone knew an x and y that were different, thathad the same hash, of course, that would be a collision. Since there's not a collision that weknow of, then knowing the hashes are the same, we can assume that the values are the same.And this let's us use the hash as a kind of message digest. Suppose, for example, that we had afile, a really big file. And we wanted to be able to recognize later whether another file was thesame as the file we saw the first time, right? So one way to do that would be to save the wholebig file. And then when we saw another file later, just compare them. But because we havehashes that we believe are collision free, it's more efficient to just remember the hash of theoriginal file. Then if someone shows us a new file, and claims that it's the same, we can computethe hash of that new file and compare the hashes. If the hashes are the same, then we concludethat the files must have been the same. And that gives us a very efficient way to rememberthings we've seen before and recognize them again. And, of course, this is useful because thehash is small, it's only 256 bits, while the original file might be really big. So hash is useful as amessage digest. And we'll see, later on in this lecture, and in subsequent lectures, why it's usefulto use hash as a message digest. Thus, we need our hash function to have the second feature of hiding. And the property that we desire resembles this. There is no practical method to determine what the input x was if we areonly given the hash function's output. The issue is that this characteristic isn't really true. Let'slook at this example to see why that's the case. So here, what we're going to do is an experiment where we flip a coin. And if the result of the coin flip was heads, we're going to return the hash of the string "heads". And if the result wastails, we're going to return the hash of the string "tails". Now we're going to ask someone who didn't witness the coin toss but only saw this hash output to identify the hashed string. Of course, that will be simple. Finding x and the input string in thissituation are both simple tasks. Simply calculate the hash of the string "heads" and the string"tails," then check your results to see which one you got. And so, in just a couple of steps, you can figure out what x was. So the reason this example failed, that is the reason why an adversary was able to guess what the string was, was that therewere only a couple of possible values of x. Hence, there must be no value of x that is especially likely if we are to have a hidden attribute like this. In other words, x must be selected from a collection that is, in some ways, quitedispersed. So, the adversary's approach of just attempting all possible values of x or just a fewparticularly likely values of x won't be successful. So the hiding property that we are going toneed to set up is a little bit more complicated. And the way we're gonna fix this problem with thecommon value x, like heads and tails, is we're gonna take the x. And we're gonna put next to it,we're gonna concatenate with it, a value, r, which is chosen from a distribution that's reallyspread out. And so this H(r | x), that means take all the bits of r, and put after them all the bits of x. And so what we're going to say is given the hash of r together with x, that it's infeasible to find x. And thatthis will be true in the formally stated property that, if r is a random value chosen from adistribution that has high min-entropy, then, given H(r | x), it's infeasible to find x. And what doeshigh min-entropy mean? Well, it captures this intuitive idea that r is chosen from a distributionthat's really spread out. And what that means specifically is that there is no particular value that rcould have had, that would occur with more than a negligible probability. So, for example, if r is
chosen uniformly from among all of the strings that are 256 bits long, then any particular stringwas chosen with probability 1 in 2 to the 256, which is truly a negligible value. So, as long as rwas chosen that way, then the hash of r concatenated with x is going to hide x. And that's thehiding property that the hash function will be deemed to have. Okay, now let's look at anapplication of that hiding property. And, in particular, what we wanna do is something called acommitment. And this is kind of the digital analogy of taking a value, a number, and sealing it inan envelope, and putting that envelope out on the table, where everyone can see it. Now, whenyou do that, you've committed to what's in the envelope. But you haven't opened it, it's secret from everyone else. Later, you can open the envelope and get out the value, but it's sealed. So commit to a value and reveal it later. We wanna do that in adigital sense. So, to be more specific about what is the API that we're going to provide here, thecommitment API looks like this, that there are two things you can do. First, you can commit to amessage. And that's going to return two values, a commitment and a key. Think of thecommitment as the envelope that you're gonna put on the table, and the key as a secret key forunlocking the envelope. Then later, you allow someone else to verify, given the commitment andgiven a key, which you've told them in the meantime, and the message. So they can verify thatthat commitment, key, and message really do go together. And this will return a true or false.Okay, now to seal an msg in an envelope, what we do is we commit to the message. And thatreturns a commitment and a key, and then we publish the commitment. That's putting theenvelope on the table. Now, later, to open the envelope, what we're going to do is publish the keyand the message that we committed to. And then anybody can use this verify call, with thecommitment that we published previously, the key and message that we just announced, tocheck the validity of our opening the envelope. Okay, and the property, of course, we want fromthis, is that it behaves like sealing an envelope. And, in particular, the two security properties arethese. First, given com, the commitment, the envelope on the table, that someone just looking atthe envelope can't figure out what the message is. In cryptography, commitment is a process that ensures the integrity and confidentiality of messages by binding them to a specific value, which cannot be altered later. For instance, if onecommits to a message, they cannot later claim to have committed to a different one. We create a random 256-bit value called the key to carry out this process, and we concatenate it with the message. We then utilize the key value as H and return the hash of this concatenationas the commitment (key). Someone must compute the same hash of the key they received andthe message in order to verify the commitment, then they must compare it to the initialcommitment. There are two main security properties of this commitment scheme. The first property is hiding, which ensures that given H(key|msg), it is infeasible to find msg. This is possible because key ischosen randomly from a distribution, making it impossible to determine the message without thekey. The second property is collision-free, which means that it is impossible to find two different messages that produce the same hash. If this occurs, it would imply that two distinct inputsproduce the same hash value, violating the collision resistance property of hash functions. The third security property is puzzle-friendly, which means that it is challenging to find an input value that satisfies a particular output value y. This property enables the creation of search
puzzles, which are mathematical problems that require searching a large space to find thesolution, and where there are no shortcuts other than searching the space. To sum up, these characteristics are crucial for guaranteeing the security of commitment systems. They make it possible to bind messages to certain values, protecting their integrity andconfidentiality. An industry standard hash function that satisfies all of these requirements isSHA-256.
Exploring the Characteristics of Cryptographic Hash Functions
Please or to post comments