15

Is it possible to generate a file with a given sha256sum checksum?

That is, reverse the process of a sha256sum checksum. That is, if we have a checksum, can we generate txt file data (need not be original) whose its checksum is equal to the required checksum?

Just highlighting "I don't need the original file. Any other random file with given sha256sum is also ok"

Also, can two or more files have the same sha256sum?

7
  • 1
    Not only is it borderline impossible to generate a collision for an arbitrary checksum. To date, there have been 0 repeating checksums generated between any sha256sums. Unless preimage is broken, at our current rate of generating them, statistically, there will never be any collision for the lifetime of the universe: crypto.stackexchange.com/questions/47809/…CommentedApr 24, 2023 at 21:35
  • We don't know anything about your scenario... @mentallurg's answer is fine, just of course, if you do have knowledge of the file size, it might make it a lot more "tempting". Say it's 5 bytes long (like "Hello"), then bruteforcing it will be very quick (assuming you try all possibilities one by one starting with the shortest). Each byte you add will make it 256 times longer, so you can see why it's generally deemed "unfeasible".CommentedApr 25, 2023 at 15:13
  • 12
    You've asked about preimage resistance, second preimage resistance, and collision resistance in the same question. These are exactly the 3 core properties of a cryptographic hash function.
    – Nayuki
    CommentedApr 25, 2023 at 16:43
  • 3
    Yes, it is Is it possible to generate a file with a given sha256sum checksum. As an example, consider the following 'given' sha256 checksum: b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9. The file whose sha256 checksum is that above is: hello world. Here is the proof: echo -n 'hello world' | sha256sum produces b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9.
    – mti2935
    CommentedMay 1, 2023 at 17:56
  • 2
    @AmruthA Of course, I have no idea. The point I was trying to make is that for well-known inputs (e.g. every password in rockyou.txt), look-up tables (aka rainbow tables) exist, which make it simple for anyone to lookup a 'given sha256 checksum', and find the input whose sha256 checksum is that which was given. I realize your question pertains to inputs that are not well-known, but this highlights the reason that salted password hashing is so important.
    – mti2935
    CommentedMay 2, 2023 at 14:58

6 Answers 6

34

No. It is computationally infeasible.

It is not obvious. One might think that since the algorithm is simple and since the number of collisions is infinite, there should be a way to find a file for the given hash. (Because of pigeonhole principle there exists at least one hash that can be produced by an infinite number of files; we don't know if there are more such hashes, and how many; neither we know if for any hash there exists a file that produces it).

But currently there is no algorithm known to find a file that produces the given hash within a practically acceptable time. This property is called pre-image resistance. By "no algorithm known" we mean no algorithm except of brute-forcing, which in case of SHA-256 would require the computing power of the whole world for the time longer than the Universe exists.

SHA-256 was designed to be used for cryptographic purposes and pre-image resistance is a desired property.

See more details here: What gives SHA-256 its preimage resistance?

1
13
+50

Theoretically yes, but practically no (we think).

SHA-256 is designed to be a cryptographic hash function. Among other things, this means that it's designed to be resistant to doing what you're asking (which is either a "pre-image", "second pre-image, or possibly "collision" attack).

Basically, those are all slightly different ways of asking "can I generate a file whose SHA-256 hash is 'h'".

To the best of our knowledge, there's no way to "run the algorithm backwards". That is: there's no way to start with a hash and run the math backwards to see what the original message (file) was. That means that, to the best of our knowledge, the only way to see what message might hash to a certain value is to start trying messages to see where they go.

SHA-256 hashes are 256 bits long, which means that there are 2^256 ~= 1.16e+77 possible hash values. New Scientist reports that there are "about 1.04 × 10^44 molecules in the Earth’s atmosphere" (making some simplifying assumptions). NASA reports that there are "at least 100 billion stars in the Milky Way" (1 billion = 1,000,000,000, so 100 billion = 100,000,000,000 = 1e11). If every star in the Milky Way has a copy of Earth, that's ~1e55 molecules of atmosphere; if every star had 1,000 Earths, that's still only ~1e58 molecules of atmosphere. If every one of those molecules had a unique 256-bit number (which a SHA-256 hash can be treated as), we still need to find a few more billions to get to even a full 1% of the possible values.

The universe is 13.4 billion years old, or about 4.2e+17 seconds. If each molecule of atmosphere got a new, unique 256-bit number every second since the big bang, we're up to around 4e75, or about 4% of the possible values.

To be guaranteed to find a file that generates a given SHA-256 hash, you need to search every molecule of atmosphere across every one of those Earths once a second for ~25 times the age of the universe. On average, you'd find one in about 12.5 times the age of the universe.

... we think. It's possible that there's a short-cut, but nobody has found one yet.

2
  • 1
    @user253751 Indeed, SHA-256 (and others like MD5, SHA-1, etc.) are based on an internal cipher where the plaintext is the public initialization vector and the key is the message to be hashed. SHACAL-2 is an unpopular way to use SHA-256's internal cipher directly.
    – Nayuki
    CommentedApr 26, 2023 at 17:51
  • 1
    "We think" is a very important part of this answer. A few years ago, many people would have replied the same thing about SHA-1 (and previously, MD5). While the brute force attack on any of those is definitely beyond the realm of possible, weakness have been found in those, and it's quite possible one will be found in SHA-256 as well. Let's hope not!
    – jcaron
    CommentedApr 27, 2023 at 15:47
4

It is highly unlikely that you can generate a file with a specific sha256sum checksum. Sha256 is a cryptographic hash function, meaning it is a one-way only function. In other words, it is, as of the current time at least, not possible to reverse the process and generate a file that matches a given sha256 checksum. The only way is to brute force the contents of a file until it randomly matches the checksum.

It is also theoretically indeed possible to create a file with the same sha256 checksum as another file. This is known as a "collision attack". However, finding such a collision also requires much computing power and is practically unlikely to happen or even up to impossible, too.

As for your follow-up question, it is theoretically possible for two different files to have the same or find the original contents of the sha256 checksum. However, the probability of this happening is extremely low. The sha256 algorithm produces a 256-bit hash value, which means that there are 2^256 possible hash values. The chance of two files producing the same hash value is therefore very unlikely.

    4

    I will answer the second question first, it will make more sense.

    Can 2 or more files have the same sha256sum?

    Yes, it's possible. It's called collision and it's a property of any hash function. As the output size is limited but the amount of possible inputs is not, there will be unlimited distinct inputs that map to the same output.

    For instance, create a very simple hash function (I will call it Mod2) that converts the input to an integer, divides by 2 and returns the remainder. There are only 2 possible outcomes (0 or 1) for any number of inputs, so when you enter 381 and 111111113 and have the same output, you have a collision.

    SHA256 is more useful than our Mod2, because it has 256 bits of output (hence 256 on its name), so the possible outputs is 2²⁵⁶. That's a very large number, but it's way smaller than the amount of possible inputs. As Peter reminded me on the comment, the SHA hashes are cryptographically secure hashes, and that means that a minimal change on the input will generate a very large difference on the output. CRC32 is an example of a non cryptographic hash function.

    Is it possible to generate a file with the given sha256sum checksum?

    If you get our Mod2 hash function before, can you create an input only having the output? Sure you can! Just get the output (1, for instance), throw an input on the function (say 9000), and check if the result is 1. It's not, so you take another input (say 9001), check the result, and it's 1. You generated a file using the hash, but you have no way to tell if you generated the file. It could be 31337 and you believed it was 9001.

    You are able to generate something that collides with the output, but it's not possible to guarantee that you created the original file that produced that output. Not only that, but to create a collision on a 256-bit output space you would have to use so much computing power that makes no sense to do so. It will cost several orders of magnitude more than just storing the entire file on the most expensive media you have around.

    8
    • 1
      Mod2 is a very silly example for a hash function. It just returns the low bit of the input! No dependence on higher bits, and directly leaks one bit of the plaintext. A less silly example would be parity (horizontal xor of all the bits); you could call it a non-cryptographic hash. Of course a preimage attack on either is trivial: an all-zero message except for the low bit being equal to the hash value you want. What makes SHA256 strong isn't just the number of bits; mod2^256 (return the low 256 bits of the message) would be equally useless.CommentedApr 26, 2023 at 5:54
    • 4
      Mod2 is as silly and simple as possible, by design.
      – ThoriumBR
      CommentedApr 26, 2023 at 15:12
    • 1
      It's so silly that it makes a bad example, I think, because it's so obvious and trivial to break analytically, rather than brute-force. And you don't mention any ways in which SHA256 is different except for having more bits. According to comments on another answer, there are no known breaks against SHA256 that let you do any better than brute force for preimage, but that's an important thing to mention.CommentedApr 26, 2023 at 15:21
    • 1
      @PeterCordes it was an example of the very basic idea of a hash function, nothing more. And it perfectly suffices at that. I've seen it used in lecture slides repeadetlyCommentedApr 27, 2023 at 8:36
    • 1
      @PeterCordes you completely glance over the fact that a parity bit would need extra explanation and might confuse a reader with a less deep background (such as likely OP). The example was chosen well for the purpose of this answer and your refusal to accept it because your ADVANCED UNDERSTANDING doesn't like is is off-topic here. Exclusively the "hey a disclaimer should be there"-part of your comments is valid feedback. For this quite good answerCommentedApr 27, 2023 at 14:15
    1

    Is it possible to generate a file with a given sha256sum checksum?

    Yes. It is possible to generate an input with the given checksum. (See answer to your last question).

    That is, reverse the process of a sha256sum checksum. That is, if we have a checksum, can we generate txt file data (need not be original) whose its checksum is equal to the required checksum?

    No, sha256 is not reversible.

    Also, can two or more files have the same sha256sum?

    Yes, more than one input can generate the same output.

    Since sha256 is a hashing algorithm, it is possible for more than one input to generate the same output. Sha256 generates a finite number of outputs (2^256) given an infinite number of possible inputs. If you somehow managed to find an input that generates a given checksum, there is no guarantee that it is the original input that generated the checksum in the first place. This is called a collision and is a feature of hashing algorithms in general.

    Since 2^256 is a large number, the possibility for collisions is small. That is why sha256 is very useful as a check for data integrity. For example, when you download a file, many sites will give you a sha256 checksum to validate the integrity of your download. If bits get corrupted during transfer, the algorithm will almost certainly produce a very different checksum from the original. It is not a guarantee that the data is not corrupt, however, but your confidence can be extremely high it is not.

      0

      Of course it's possible but where do you find a computer that's powerful enough to do so? You can always just write a program that starts creating every single possible combination of bytes and calculates the checksum for it until it matches, but that would take an insane amount of time.

      It's also possible for 2 files to have the same checksum, the checksums aren't infinitely long so there also aren't an infinite amount of combinations. Finding 2 files with identical checksums will be nearly impossible though. Maybe when we have quantum computers.

      3
      • 4
        Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
        – CommunityBot
        CommentedApr 25, 2023 at 17:01
      • IMO you should make more clearly and not only mention at the end that its pratically impossibleCommentedApr 25, 2023 at 21:19
      • Please do not repeat other answers. Just upvote unless you have a different perspective on the answer.
        – schroeder
        CommentedMay 1, 2023 at 12:30

      You must log in to answer this question.

      Start asking to get answers

      Find the answer to your question by asking.

      Ask question

      Explore related questions

      See similar questions with these tags.