@sscherfke
Last active March 26, 2017 19:16

Common data exchange format for the Master Password algorithm

There are a lot of different implementations of the Master Password algorithm for various platforms, both official and unofficial. You can use MPW on your mobile device(s), at home, and at work.

In theory, you only need to remember your master password in order to use the app on different platforms. In reality, you also need to remember the exact spelling of the site name, the password type, the counter value, the algorithm version, and maybe other attributes. To help with that, you may want to synchronize your site data between multiple devices.

Since there is currently no online sync service, you have to use the import and export features of your app(s). Ideally, this calls for a single format that is reliable, that can be read and written in various languages, and that everyone (the authors of all the apps) agrees on.

One of the most common formats for this use case is JSON. It is supported by virtually every language, it uses UTF-8 by default, and (in contrast to simpler formats like CSV) it supports nested data structures.

There is also a platform-independent encryption standard built on top of JSON: JWE (which is part of the larger JOSE standard). There are JOSE implementations for C, Java, Python, and other languages.

An alternative to JWE would have been Fernet. It is a bit easier to use, but I could not find Fernet implementations in Java or C.

Proposed data file format

Here is a short example of the proposed exchange format:

{
    # Version of this file format.  Makes it easier to support older formats
    # (if we change something in the future)
    "version": 2,

    # Name of the default user at program start
    "default": "Alice",

    # The users’ sites data
    "users": {
        "Alice": {
            # The base64-urlsafe-encoded Scrypt hash of the user’s master key.
            # It is used by some apps to verify the user’s master password.
            # 
            # Hashing the key with sha256 might (for now) be okay as well, but 
            # for marketing reasons, we should not use sha256 for hashing the 
            # user’s most important secret.  Also, we don’t know what NSA and 
            # friends are capable of now or in two years.
            # 
            # Since using Scrypt does not really hurt, we should use it instead
            "keyhash": "<base64urlsafe(scrypt(master_key))>",

            # The version of the MPW algorithm used to derive the master key:
            "keyhash_version": "v3",

            # This is a JWE-encrypted object with the user’s sites data.  The
            # data is encrypted using AES256-GCM (so it is encrypted *and*
            # authenticated).
            #
            # The encryption key is derived from the master key using HKDF 
            # expand with sha256.
            #
            # The "keyhash" and "keyhash_version" could also be part of the 
            # protected JWE header to make sure they are not tampered with.
            "sites": {
                "ciphertext": "<JWE AES256-GCM encrypted sites>",
                "iv": "<JWE initialization vector>",
                "protected": "<JWE protected header>",
                "tag": "<JWE tag>"
            }
        },
        "Bob": {
            # ...
        }
    }
}
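The two derivations mentioned in the comments above ("keyhash" via Scrypt, and the encryption key via HKDF expand with sha256) could be sketched as follows. This is only a minimal illustration, not the format's reference implementation: the proposal does not specify a salt, Scrypt cost parameters, or an HKDF `info` string, so `salt`, `n`, `r`, `p`, and the `info` argument are assumptions, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def hkdf_expand_sha256(key: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF-Expand step from RFC 5869, using HMAC-SHA256."""
    if length > 255 * 32:
        raise ValueError("requested output is too long for HKDF-SHA256")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        # T(i) = HMAC(key, T(i-1) || info || i)
        block = hmac.new(key, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def make_keyhash(master_key: bytes, salt: bytes) -> str:
    """URL-safe base64 of scrypt(master_key).

    The salt and the cost parameters (n, r, p) are assumptions; the
    proposal only says base64urlsafe(scrypt(master_key)).
    """
    digest = hashlib.scrypt(master_key, salt=salt, n=2**14, r=8, p=1, dklen=32)
    return base64.urlsafe_b64encode(digest).decode("ascii")
```

A 32-byte result from `hkdf_expand_sha256` matches the key size AES256-GCM expects. Using a distinct `info` value per purpose keeps the encryption key separate from any other key derived from the same master key.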

The unencrypted sites data would look as follows:

{
    # Site name:
    "example.com": {
        # Password type as string
        "pwd_type": "long",
        # Counter value
        "counter": 1,
        # Login name for the site
        "login": "my_username",
        # Bool indicating whether to generate a login name based on the MPW
        # and using the *login* variant
        "generate_login": false,
        # Algorithm version
        "version": "3",
        # Last access time for the site (can be used for sorting sites)
        "access_time": 123456789,
        # User’s password if "pwd_type" is personal
        "personal_pwd": null,
        # Contexts for security questions/answers.  Answers are based on the
        # MPW and generated using the *answer* variant.
        "questions": [
            "name dog",
            "mother maiden name"
        ],
        # Tags/categories for site (for grouping and searching)
        "tags": ["awesome", "spam"]
    },
    "spam.org": {
        # ...
    }
}

I have been using this format in my own MPW implementation for several years now (using Fernet for encryption instead of JWE), and it has worked quite nicely so far.
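To illustrate how an app might consume the decrypted sites data, here is a small sketch that sorts sites by "access_time" and collects the set of all tags in use. The helper names and the values for "spam.org" are made up for the example; only the key names come from the format above.

```python
import json

# Decrypted sites data (comments removed so that it is valid JSON).
# The "spam.org" values are hypothetical.
SITES_JSON = """
{
    "example.com": {"pwd_type": "long", "counter": 1,
                    "access_time": 123456789, "tags": ["awesome", "spam"]},
    "spam.org": {"pwd_type": "pin", "counter": 2,
                 "access_time": 123456999, "tags": ["spam"]}
}
"""


def sites_by_recent_access(sites: dict) -> list:
    """Site names, most recently accessed first."""
    return sorted(sites, key=lambda name: sites[name].get("access_time", 0),
                  reverse=True)


def all_tags(sites: dict) -> set:
    """Set of every tag used by any site."""
    return {tag for site in sites.values() for tag in site.get("tags", [])}


sites = json.loads(SITES_JSON)
print(sites_by_recent_access(sites))
print(sorted(all_tags(sites)))
```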

@dkunzler

Hi,

Looks OK to me.
I have a few suggestions however:

  • In my app I have categories to group sites together. It would be nice to have these groups in the exchange format too. (They are just strings; the list of all groups can be derived automatically by computing the set of all used categories.)
  • In the original algorithm there is not only the password type but also the password variant. The variants are password, login, and answer. I don't know exactly how they are used in the original app, but in the Android app I use login for generated names and answer for generated phrases.

Regards
David

@sscherfke
Author

Adding categories is a good idea. Does a site have only one group, or can groups be used like tags, so that each site can have a list of groups? I tend to implement the latter since it provides more flexibility.

The password variant is only implicitly encoded. For passwords, the password variant is used. The login variant is used if the key generate_login is true. Security questions are not yet included. They will use the answer variant and will also include a context which holds the question's most significant words.
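The implicit encoding described above could be made explicit roughly like this. It is only a sketch: `variants_for_site` is a hypothetical helper, it assumes the "questions" list discussed in this thread, and it does not handle the case where "pwd_type" is personal.

```python
def variants_for_site(site: dict) -> list:
    """Return (variant, context) pairs an app would derive for one site entry."""
    # The password variant is always used for the site password.
    derived = [("password", None)]
    # The login variant is only used if "generate_login" is true.
    if site.get("generate_login", False):
        derived.append(("login", None))
    # Each security question uses the answer variant with its own context.
    for question in site.get("questions", []):
        derived.append(("answer", question))
    return derived


example = {
    "generate_login": False,
    "questions": ["name dog", "mother maiden name"],
}
for variant, context in variants_for_site(example):
    print(variant, context)
```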

@sscherfke
Author

I added questions (although the corresponding password variant is called answer) and tags to the site object.

@ttyridal

Interesting.
I would like to suggest a couple of ideas:

  • uuid and lastModified fields. accessTime would be defined so that it does not update lastModified. This should help with conflict resolution.
  • Leave a field for the encryption algorithm (future-proofing). Do you support a null cipher? (I hope you handle the GCM IV correctly, by the way.)

Regarding pwd_type: what about using the template instead of pin/short/long/...? (Just starting the discussion here.)

A settings object at (global?) user and site level. With the web-addon I would typically store the following fields.

matchURL: (regexp-string,   the browser-addon auto-pick site based on this)
defaultTemplate/Type: (long,short,  only meaningful in global, user)
masterPasswordTimeout: (int-seconds)
autoInsertUser (true/false/none)
autoInsertPassword (true/false/none)
autoCopyToClipboard (true/false/none)
externalMasterPassword: {keyName: string}    (integration to kwallet etc, keyName is the lookup string in external store)

As I wrote to Maarten, not every implementation need understand all keys/values, but the keys used (and their meaning) should be publicized. Unknown items should be ignored. (sane default behaviour)

Finally, I think it would be great if we could move the discussion over to @lhunath's pages. While not paramount, it would definitely not hurt if he's in on the outcome of this.

Cheers
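The "unknown items should be ignored" rule from the comment above might look like this in an importer. This is a sketch under assumptions: the key names are taken from the list in the comment, while the default of `None` for every key and the `read_settings` helper are hypothetical.

```python
# Settings this (hypothetical) app understands, with assumed defaults.
KNOWN_SETTINGS = {
    "matchURL": None,
    "masterPasswordTimeout": None,
    "autoInsertUser": None,
    "autoInsertPassword": None,
    "autoCopyToClipboard": None,
}


def read_settings(raw: dict) -> dict:
    """Pick out the known keys, fall back to defaults, ignore everything else."""
    return {key: raw.get(key, default) for key, default in KNOWN_SETTINGS.items()}


imported = read_settings({"masterPasswordTimeout": 300, "someFutureKey": True})
print(imported["masterPasswordTimeout"])
```

An app that also exports the file would probably want to keep the unknown keys around and write them back unchanged, so that settings from other apps are not lost on a round trip.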
