
I want to secure some highly sensitive data in a database. The data needs to be encrypted and remain secure for 100 years if it falls into an adversary's hands. I also want to limit how much plaintext is held in RAM at any one time, so there is less chance of plaintext being paged to disk. The database may also be quite large, so access needs to be more efficient than decrypting the whole database just to read it. I am therefore thinking about encrypting the sensitive data at the database row level: a unique index which references each record stays unencrypted so records can still be found and retrieved, while the sensitive data itself is encrypted.

My solution would be to have the data per database row:

index | IV | sensitive encrypted data | MAC 
  • A 256 bit database key will be used to encrypt the sensitive data which will be generated using /dev/random.
  • The IV for each row will be 256 bits from /dev/urandom (faster than /dev/random).
  • The encryption algorithm will be Twofish.
  • The MAC of each record will be HMAC-SHA3 of the index, IV & sensitive data using the key.
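The row layout above amounts to encrypt-then-MAC. A minimal sketch of the MAC side in Python follows (the Twofish encryption step is abstracted, since the standard library has no Twofish; the function names are illustrative, not from any existing library):

```python
import hashlib
import hmac

def mac_row(mac_key: bytes, index: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    # HMAC-SHA3-256 over index | IV | ciphertext, as described above.
    return hmac.new(mac_key, index + iv + ciphertext, hashlib.sha3_256).digest()

def verify_row(mac_key: bytes, index: bytes, iv: bytes,
               ciphertext: bytes, tag: bytes) -> bool:
    # Constant-time comparison; verify the tag before attempting decryption.
    return hmac.compare_digest(mac_row(mac_key, index, iv, ciphertext), tag)
```

Because the index is covered by the MAC, an attacker cannot swap the ciphertext of one row into another row without detection.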

The system is single user. The user will create a strong alphanumeric passphrase (minimum 19 characters).

A password based key derivation function will be run on the passphrase to create a derived encryption key which will then be used to separately encrypt the database key with Twofish. This is so the user can change their password without having to re-encrypt the entire database - they can just create a new password and re-encrypt the database key instead. I understand that this is the weakest part of the scheme, but would like to make it very difficult for an attacker to brute force attempt password guesses. I think the security needs to rest in the strength of the passphrase, as any sort of secondary token could be compromised at the same time as the device which holds the encrypted data (I am thinking the device & token could be confiscated for arbitrary reasons when going through airport security so it would be no use).

  • To derive the key from the passphrase, PBKDF2 will be used with 10,000 iterations using HMAC-SHA3 with a 256 bit output and a salt of 256 bits obtained from /dev/urandom.
  • What I am trying to balance is the password length required to keep the data secure versus keeping the KDF reasonably fast on mobile devices, which have slow processors and limited memory. I don't expect the user to wait more than 5 seconds for the KDF to complete.
  • A MAC is created as HMAC-SHA3-256(derived encryption key, salt | encrypted database key) and stored next to the salt and encrypted database key on disk. It is verified at login to confirm the user entered the correct passphrase.
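One way to pick the iteration count against a wall-clock budget like the 5 seconds above is to time a small PBKDF2 probe on the target device and scale. A sketch (`calibrate_iterations` is illustrative; SHA-256 stands in for the SHA3 PRF, since `hashlib.pbkdf2_hmac` with SHA-3 depends on the underlying OpenSSL build):

```python
import hashlib
import os
import time

def calibrate_iterations(target_seconds: float, sample_iters: int = 10_000) -> int:
    # Time a probe run of PBKDF2 and scale the iteration count to fill the
    # wall-clock budget on this device; never go below the sample count.
    salt = os.urandom(32)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac('sha256', b'timing-probe', salt, sample_iters, dklen=32)
    elapsed = time.perf_counter() - start
    return max(sample_iters, int(sample_iters * target_seconds / elapsed))
```

The chosen count would be stored alongside the salt, so it can be raised later as hardware improves without breaking existing key files.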

When the program loads, the user enters the passphrase. The KDF runs, which generates the key to decrypt the database encryption key. The real encryption key is then the only thing kept in RAM while the program is running and used to verify and decrypt individual database records when required.
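That load sequence can be sketched as follows (a sketch under the assumptions above: SHA-256 stands in for the HMAC-SHA3 PRF for portability, the final Twofish unwrap of the database key is left abstract since the standard library has no Twofish, and the function names are illustrative):

```python
import hashlib
import hmac

def derive_key(passphrase: bytes, salt: bytes, iterations: int = 10_000) -> bytes:
    # PBKDF2; 'sha256' is a portable stand-in for the HMAC-SHA3 PRF.
    return hashlib.pbkdf2_hmac('sha256', passphrase, salt, iterations, dklen=32)

def unlock(passphrase: bytes, salt: bytes, wrapped_db_key: bytes,
           stored_mac: bytes):
    # Derive the key-encryption key, then check the key-file MAC
    # (HMAC over salt | encrypted database key, as described above).
    kek = derive_key(passphrase, salt)
    expected = hmac.new(kek, salt + wrapped_db_key, hashlib.sha3_256).digest()
    if not hmac.compare_digest(expected, stored_mac):
        return None  # wrong passphrase, or a tampered key file
    # A real implementation would now Twofish-decrypt wrapped_db_key with
    # kek and keep only the resulting database key in RAM.
    return kek
```

Checking the MAC before any decryption gives a clean "wrong passphrase" signal without revealing anything about the database key.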

  1. What's the optimal length for the row level IV? Is 256 bits fine?
  2. Is the minimum password strength of 19 characters and 10,000 iterations of PBKDF2 strong enough to protect the 256 bit database key? If not, what parameters would work?
  3. Is PBKDF2 still a good algorithm to use here? If not, what Scrypt parameters?
  4. Any further changes or recommendations to make the system secure?
  • This is a cross-post from CryptoSE because they did not think it was on topic there.
    – aobocod
    Commented Oct 15, 2014 at 10:09
  • Regarding the size of the IV, it's not a variable. Your block size is 128 bits with Twofish, so your IV must be 128 bits. Regarding your other questions, I'd say your threat model is very confused. You're afraid of leaving plaintext in RAM, yet you leave your main database key in memory? I also don't understand why you use the exact same key for every row. And are you also afraid of possible attacks against your software implementation, or just your cold data?
    – Dillinur
    Commented Oct 15, 2014 at 12:22
  • The MAC key should be computationally independent of the Twofish key.
    – user49075
    Commented Oct 15, 2014 at 13:50
  • @Dillinur Correct, I should match the IV to the cipher's block size. Regarding the threat model, how are you supposed to access the data without a key being in memory somewhere? I consider it better to have just the key in memory and load small amounts of sensitive data one row at a time, which is controllable, rather than loading the entire database into memory, which could be 100 MB+ and which the OS might decide to page to disk. Or is that an unlikely threat? What's wrong with encrypting with the same key per row? One key can safely encrypt 2^128 bits of data.
    – aobocod
    Commented Oct 16, 2014 at 8:56
  • @Dillinur It is a different IV per row. Mainly I am worried about attacks on the cold data: the goal is to protect data at rest with the device turned off. If an attacker gets hold of the device while the program is open and the key is in memory, there isn't much you can do to protect against that. So the user is responsible for logging out (which wipes the key from memory) before going through an airport security screening or anywhere else the device could be seized.
    – aobocod
    Commented Oct 16, 2014 at 9:21

1 Answer


What's the optimal length for the row level IV? Is 256 bits fine?

The answer depends on whether some of the plaintext is known or can be easily guessed by the attacker. The question describes the data as "highly sensitive" with a protection period of "100 years", so use the strongest cipher and largest secret conveniently available for encryption.

Is the minimum password strength of 19 characters and 10,000 iterations of PBKDF2 strong enough to protect the 256 bit database key? If not, what parameters would work?

The number of characters required cannot be determined without knowing the rules imposed on passwords during validation.

Is PBKDF2 still a good algorithm to use here? If not, what Scrypt parameters?

An answer exists that provides some good background on bcrypt or PBKDF2.

Any further changes or recommendations to make the system secure?

Yes. Many. Here are a few.

  • Remove all redundancy that you can reconstruct in query results after decryption.
  • Ensure that the salt used to exhaust the block before encryption is truly unpredictable. (Use the best entropy source you can acquire on the server.)
  • Separate critical payload from less critical payload if they can be statistically independent to reduce the memory footprint of the plain text versions of the critical data.
