Providing integrity for Encrypted data with HMACs in .NET

Once again I’m about to dip into and scratch the surface of Cryptography. So here’s the disclamer: This is not my job. I don’t do this for a living. Don’t ever make up your own encryption algorithms. Try not to write your own cryptographic code. Don’t take anything that I say as the advice of a security expert or legal advice or assume that any code in this post or any linked posts are correct in any way shape or form. If you have a project that requires cryptographic security, I suggest you find someone who has been doing this far longer than I have to write the code for you. Then have several 3rd party security firms review or write your security code. In short, I’m not responsible for your mistakes, or the correctness of the code or writing presented here.

As even more of a taste of just how hard it is, I admit that I thought the last article I wrote covered most of the basics of modern encryption in .NET pretty well. The result? Nope. Missed something big. See, cryptography is hard not because it’s hard to say “take a password plus this data and make it so someone can’t see the data without the password”, but because of all the ways someone could get access to your password, or access your data when its not encrypted, or access your computer when it’s unlocked, or crack your password if you pick a bad password to begin with. Take a look at this reddit thread article that nicely rips apart my previous article. In all seriousness, I really appreciate the feedback, especially when it comes to security because it’s hard, and the devil is in the details.

I forgot to cover data integrity in my last article. Encryption is not enough. Redditors referred to it as authentication, which is one of the uses of HMAC’s, but authenticity is not whats important in the example case I presented for my original article. We are interested in “How to detect that someone or something accidentally or maliciously changed the encrypted data”, essentially, we’re interested in cryptographic data integrity. In addition to that there were a few other things wrong with the article that the redditors pointed out, such as my salt values being overly large, just picking CBC without covering other chaining methods, not discussing how to remove keys from memory, and referencing Jeff Atwood. The long and the short of it, they’re completely right. Getting crypto correct is hard, and good crypto systems are worth millions of dollars. You’re probably not getting paid that much to write a little encryption. So use a library thats already been written and provides high level abstractions and don’t write crypto code yourself.

Alright. HMAC, what is it? The MAC stands for a Message Authentication Code, the H stands for Hash. Put it all together and you have a Hash-based Message Authentication Code. It’s a hashing function thats deliberately designed to resist malicious tampering. The key is preventing malicious tampering. A normal hash function (e.g. MD5 or SHA1) would detect accidental byte tweaks, but somebody maliciously tampering with your data could tinker with it, create a new hash code for the data, replace the old hash code, and you would never be the wiser. For this reason, Message Authentication Codes generally fall into the category of keyed hash algorithms since they use a key or derived key and mix it with a hashing or encryption function to produce a value that an attacker can’t reproduce, providing both data integrity, and (if you happen to be in a sender and receiver role where both parties share a key) authentication. There are however, a couple of things to take into account. First, we shouldn’t use the same key twice to encrypt and hash the data. The more times the same key is used, especially against a known piece of data, the more likely an attack can be developed and used to figure out our key. Arguably, the final key used for encryption and the final key used for the message authentication should be different, as different as possible. The best way to do this would be to append or change the key such that both the encryption key and authentication key run through the KDF (Key Derivation Function) from different starting points. As an example, consider the following:

// Derive the passkey from a hash of the passwordBytes plus salt with the number of hashing rounds.
var deriveKey = new Rfc2898DeriveBytes(password, passwordSalt, 10000);
var deriveHMAC = new Rfc2898DeriveBytes(password, hmacSalt, 10000);
// This gives us a derived byte key from our passwordBytes.
var aes256Key = deriveKey.GetBytes(32);
var hmacKey = deriveHMAC.GetBytes(32);

Because the hash function mixes the password with the salt, and because we have different salts, after only one round the derived keys will already be different on account of the salt value. So we have a derived key. One for the actual encryption, one to prevent tampering and provide data integrity.

On a side note, it’s arguably better to authenticate the encrypted output (encrypt then authenticate) rather than authenticate the plaintext, then encrypt, or authenticate and encrypt. Again, I know it’s beating a dead horse, but cryptography is hard, and ultimately, the security of a system is going to depend on the security of the entire system, not just the individual parts. So. We have an HMAC key, we have an encryption key, and we know that we want to encrypt, then authenticate the encrypted output. In addition, we want to make sure anything else that could easily be tampered with is also authenticated, such as our Initialization Vector, since any change to it can easily affect our decrypted output. Another small advantage thats almost not worth mentioning is that by authenticating the encrypted output instead of the plain text is that we can detect if anything has changed even before we start decrypting the text. So here’s an example. In this case, I chose to use HMACSHA1, there’s others on the MSDN but I chose this particular one since it uses the same hash algorithm used internally by the KDF I used in my previous post, Rfc2898DeriveBytes, aka (PKDF2).

var hmac = new HMACSHA1(hmacKey);
var ivPlusEncryptedText = iv.Concat(cipherTextBytes).ToArray();
var hmacHash = hmac.ComputeHash(ivPlusEncryptedText);

In this case, we’re using our derived hmacKey, and we’re computing the hash of both the initialization vector concated with our encrypted ciphertext. That gives us everything we need to have a self validating “package” of data that is secured and can’t be tampered without us knowing unless the attacker knows our key or can break AES256 encryption, but at that point this whole discussion is pointless.

With decryption, remember how I said we compute the Encryption and HMAC key separately? If we did that, and if we computed the hmac over the encrypted data, we can perform the validation step on the data before we compute our decryption key. The only reason we would do this is so that if the data is invalid or has been tampered with we don’t take the time to also compute the decryption key. Small things, but I wanted to explain why the key computation for the encryption and hmac is kept separate:

var deriveHmac = new Rfc2898DeriveBytes(password, hmacSalt, 10000);
var hmacKey= deriveHmac.GetBytes(32);
var hmacsha1 = new HMACSHA1(hmacKey);
var ivPlusEncryptedText = ivBytes.Concat(encryptedBytes).ToArray();
var hash = hmacsha1.ComputeHash(ivPlusEncryptedText);

if (!BytesAreEqual(hash, hmac))
   throw new CryptographicException( "Your encrypted data was tampered with!" );

var deriveKey = new Rfc2898DeriveBytes(password, passwordSalt, 10000);
var aes256Key = deriveKey.GetBytes(32);

using (var transform = new AesManaged())
{
   using (var ms = new MemoryStream(encryptedBytes))
   {
      using (var cryptoStream = new CryptoStream(ms, transform.CreateDecryptor(aes256Key, ivBytes), CryptoStreamMode.Read))
      {
         var decryptedBytes = new byte[encryptedBytes.Length];
         var length = cryptoStream.Read(decryptedBytes, 0, decryptedBytes.Length);

         var decryptedData = decryptedBytes.Take(length).ToArray();
      }
   }
}

So, there you go. Basic explanation about why HMAC’s are important, what I missed, some code, and the disclaimer to write security code at your own risk. Full demo code, demo code output, and a bunch of random links after the break.

See the code, output, and related links

GitHub 101 on Windows

This post is a continuation off of this post on Up and Running with Git and Visual Studio, if you have no idea what Git is, or how to setup git and git-extensions on windows I recommend you start there.

So you’ve installed git and git extensions, written a sweet hello world application, made some commits, split some branches and merged features in and out here and there, and now your looking to share that code with the world or you’ve decided that just having a local copy of your code isn’t good enough for a backup and don’t want or can’t go out and purchase another machine or setup remote hard drive. For whatever reason, you’ve decided to use GitHub to push a copy of your source code up to, either to share with the world, keep a local backup, or to share with a team.

First, you’ll need to create a GitHub account, which is free for public/shared repositories (example: Open Source Projects), and from $7-22 a month for private repositories, and $25+ for organizations and companies. Assuming you decide to go with a free account to try things out, you’ll land at a page that will look like this:

Pick a username, an e-mail address, and a password, click the signup button, and bam! You’re now a member of the GitHub community. You can now go explore, setup your profile, write a bio (Your picture is based on your e-mail address from the Gravatar service, the homepage explains it well enough)

Now, on GitHub your going to create a new repository, it’ll ask you for a project name, optional description and homepage. Enter all the details and click Create Repository (Note that Project Name doesn’t have to be unique in all of GitHub, just unique to the repositories and forks that you have on your user account)

Now obviously, if you have a paid account you’d have the option to create a private repositories that only you (and people you invite to see) can see, fork and access. Once you’ve created your repository, you’ll be greeted with instructions on how to set it up. If you’ve followed my previous post you should have already setup your e-mail address (assuming that it’s the e-mail address you used to sign up to GitHub) and name, and so any commits will automatically be associated to this account based on the e-mail address. Now, once you’ve created your account your going to have to add your public key. What the heck?

Ok, without getting into too much detail (You can read way more about transport layer security and public – private key encryption if you find it as interesting as I do…) GitHub is providing two way authentication by using an SSL key. Normally when you connect to a secure server over SSL the server identifies itself with the server name, certificate authority, and a public key, however, your computer remains anonymous and it wouldn’t matter what computer your using. By providing an SSL certificate, your telling the GitHub server “Not only do I trust that you are who you say you are, I’m also providing a strong encryption grade authentication method that proves who I am when I’m connecting to you.”

Details aside, the long and short of it is that you have to generate an SSL certificate and give the public key token to GitHub (obviously keeping the private key secret). Browse to your repository in GitExtensions and open “Generate or import key”.

Which will allow you to generate a key (You will need to save both the public and private key somewhere), by default I believe it generates an SSH-2 RSA 1024bit key which is what you need for GitHub as of this writing.

If you’ve already done this you can just use an existing key. Once you’ve saved it (optionally supplying a pass-phrase for the private key file), it’s time to tell GitHub what your public key is.

Log into GitHub, go to account settings, and go down into the SSH Public Keys section, in there, give your public key a title, and paste in the public key from putty. The text should look something like this:

ssh-rsa AAAAB3NzaC1yc2EAAAABJQAAAIEAkn1Hp72xRMjYtfmaHRbtwTrNBEt2oPVKyGh8uU1b8BPw
CjrtCzOGHF66YzBAitpHqaGOkLPyLXHHk1BrAKaWE06W2aVooNdqAEUYBwe0gJGR/bmf73Qhey2xW9
PBE5ocsEpVnW5HtEoMTpiKaPDqSTY6ogH/dt4FSclubb/5DKE=

And after saving it, you should have a new key:

These SSH keys uniquely identify you as a user.

Now that you’ve added your public key into Git, you need to add the remote repository to your local repository and setup PuTTY with your SSH key. Open up Git Extensions, browse to your repository, open up the Remotes folder, and you should see the following:

From here, enter a name for the remote repository, and pull the URL from the repository startup page, it should look like this: git@github.com:yourusername/Test-Project.git

Brows to wherever you’ve saved your private SSH key file for PuTTY, and click the Load SSH Key. If you saved it with a password, you’ll be prompted to enter the passphrase you protected the public key file with:

Enter your passphrase and click ok.

Once done, click the Test connection. If this is the first time you’ve connected to GitHub, you’ll be notified that this server’s RSA fingerprint isn’t in the registry:

Check to make sure it’s the same as the one shown on the GitHub website, and hit yest to cache the GitHub server host key.

And you should then see that you can authenticate:

Save your configuration, close out of the dialog and select the up arrow icon (or go into the menu and select push) and push all or some of your branches to GitHub:

And there you go! Remote repository pushing and pulling. Just like that.

And now that you’ve pushed your repository up to GitHub, you can also pull other changes:

And that’s all for now, if you’ve got questions or comments, leave them in the comments for this post and I’ll respond to them as soon as I can.