Octavia: Architecture

By Chris Palmer (noncombatant.org), 28 May 2010
Slightly amended after valuable help from Zooko O’Whielacronx (see below), 28 May 2010

TODO: Explain why we don’t (or do?) need an all-or-nothing transform. Consider borrowing Tahoe-LAFS’ combined ciphers technique.

This document (very) informally describes the Octavia protocol. Octavia is a decentralized network data storage system that aims to provide the security guarantees of confidentiality, integrity, and availability against internet threats.

Clients should experience high availability regardless of the availability or policy of any particular server. Servers, and both active and passive network attackers, cannot adversely affect confidentiality or integrity without breaking the cryptographic functions. Depending on their position relative to the client and the client’s servers, network attackers will have more or less opportunity to adversely affect availability and performance.

Octavia makes no attempt to defeat traffic analysis attacks; however, Octavia could be transported over an overlay or proxy network such as Tor to provide this feature.

Octavia provides its security guarantees regardless of transport. I designed Octavia with UDP in mind, but TCP (with message pipelining) also seems plausible (and necessary for use with Tor).

This document focuses primarily on the local and network data structures that enable the system to work; I do not describe the processes that operate on the structures because I hope they are mostly obvious. However, they may not be. :)

File and Directory Descriptors

Segment

Files are streams of bytes, broken up into arbitrarily-sized segments. (Segments will usually be of some consistent size, perhaps related to the link’s maximum transfer unit.) Segments are encrypted with a per-directory encryption-key before transmission.

Clients store their file segments on as many servers as they like. They can later retrieve segments of a given file from any combination of servers that have a copy of the segment. (In some deployments, it might even make sense to simply broadcast get-requests.)

ciphertext(plaintext) := {
      iv
      aes-128-cbc(encryption-key, iv, plaintext)
}

segment-id(ciphertext) := {
      size(ciphertext)
      sha-512-d(ciphertext)
}
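
As a rough illustration, the segment-id could be computed as sketched below. The sketch assumes that sha-512-d means SHA-512 applied twice (the SHA-512 of the SHA-512 digest) and that size is serialized as an 8-byte big-endian integer; neither detail is specified in this document, and the function names are hypothetical.

```python
import hashlib
import struct


def sha512d(data: bytes) -> bytes:
    """Assumed meaning of sha-512-d: SHA-512 of the SHA-512 digest."""
    return hashlib.sha512(hashlib.sha512(data).digest()).digest()


def segment_id(ciphertext: bytes) -> bytes:
    """Return size(ciphertext) || sha-512-d(ciphertext).

    The 8-byte big-endian size encoding is an assumption for illustration.
    """
    return struct.pack(">Q", len(ciphertext)) + sha512d(ciphertext)
```

Because the identifier depends only on the ciphertext, anyone holding a segment can recompute its segment-id and compare it with the one they asked for.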

File Descriptor

type := file
type := directory

file-descriptor := {
      type
      name
      mtime
      segment-id ...
}

Directory Descriptor

Directory descriptors are stored as regular files in the Octavia filesystem. Only the root descriptor is delivered out-of-band; once a client has its root descriptor, it can dynamically discover, retrieve, verify, and decrypt the descriptors of all file and directory children of the root.

Because each directory has a different encryption key, the minimal unit of read-sharing is the subtree. Any directory descriptor could be mounted as a root. For example, imagine that Zooko wants to share his archive of cryptography papers with the world. He would create a new subdirectory underneath his own Octavia root (or create a completely new Octavia root), populate it with documents and (optionally) subdirectories, and then publish it on his web site. Anyone can retrieve the descriptor, mount it in a directory on their system, and now browse the entire crypto document subtree. Zooko’s friends would download the root descriptor and mount it as follows (assuming an unfortunately still-hypothetical Octavia implementation called 8va):

$ wget -O crypto-papers.8va https://zooko.com/crypto-papers.8va
$ 8va crypto-papers.8va ~/8va-mounts/crypto-papers

Note that the guarantee of confidentiality is lost in this case: the documents are publicly readable because the descriptor is public. If Zooko wanted to maintain confidentiality, he would need to ensure that only his closest friends were able to retrieve the descriptor (perhaps by password-protecting the web resource, or only sending it in PGP-encrypted email, or some similar mechanism).

It is similarly a good idea for users to back up copies of their own root descriptors and the signing keys they have shared with servers, using secure means such as PGP, SSH, or USB keys in a safe. A directory descriptor is necessary and sufficient to discover, verify, retrieve, and decrypt all children of the directory. The signing keys are necessary and sufficient to write and delete segments on servers. With nothing but your root descriptor and your signing keys, you can completely recover your Octavia data.

Once the tree is mounted, a user can create new, private versions of it. The new segments can only be stored on servers with which the user’s client has registrations, which may or may not overlap with the servers with which Zooko’s client has registrations.

$ mkdir ~/8va-mounts/crypto-papers/my-additions
$ cp some-cool-paper.pdf ~/8va-mounts/crypto-papers/my-additions

An Octavia directory tree is persistent (or functional, or immutable); that is, an update to a node creates a new version of that node without destroying the old version and without affecting other readers of the object. The update also creates new versions of all parent nodes up to the root. (This is necessary so that the parent can describe the new version of the child.) Thus, all previous versions of all files and directories remain available, enabling a “snapshot” feature. (However, a particular server’s garbage collection, quota, or liveness policies may result in the unavailability of some segments. Server policy may be influenced by delete-requests.)
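
The persistent update can be sketched with path copying. This is a simplified in-memory model, not the on-the-wire format: the Node class, its fields, and the update function are hypothetical, and encryption and segment storage are left out entirely.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    version: int = 0
    data: Optional[bytes] = None        # set for files
    children: Tuple["Node", ...] = ()   # set for directories


def update(node: Node, path: Tuple[str, ...], data: bytes) -> Node:
    """Return a new version of `node` in which the file at `path` holds `data`.

    Every node along the path is copied with a bumped version; untouched
    siblings are shared with the old tree, which remains fully readable.
    """
    if not path:
        return replace(node, data=data, version=node.version + 1)
    head, rest = path[0], path[1:]
    if not any(child.name == head for child in node.children):
        raise KeyError(head)
    new_children = tuple(
        update(child, rest, data) if child.name == head else child
        for child in node.children
    )
    return replace(node, children=new_children, version=node.version + 1)
```

A reader holding the old root still sees the old contents, which is what makes the snapshot behavior fall out for free.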

Each directory descriptor contains a list of servers that are known to have stored segments of its immediate children. Segment retrieval is unauthenticated.

To store new segments on a server, clients must have a pre-arranged relationship (a registration) with the server. Clients and servers authenticate each other by verifying symmetric signatures (keyed MACs) made with the shared signing key. Segment storage and deletion are authenticated.

server := hostname:port
server := ip-address:port

directory-descriptor := {
      encryption-key
      mtime
      version
      server ...
      file-descriptor ...
}
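
For readers who prefer concrete types, the two descriptors might be modeled in memory roughly as follows. The field types (integer timestamps, tuples of byte strings), the class names, and the lookup helper are assumptions for illustration; the document does not specify a serialization.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class EntryType(Enum):
    FILE = "file"
    DIRECTORY = "directory"


@dataclass(frozen=True)
class FileDescriptor:
    type: EntryType
    name: str
    mtime: int                            # assumed: seconds since the epoch
    segment_ids: Tuple[bytes, ...] = ()   # ordered segment-ids of the content


@dataclass(frozen=True)
class DirectoryDescriptor:
    encryption_key: bytes                 # per-directory key for the children
    mtime: int
    version: int
    servers: Tuple[str, ...] = ()         # "hostname:port" or "ip-address:port"
    entries: Tuple[FileDescriptor, ...] = ()


def lookup(directory: DirectoryDescriptor, name: str) -> FileDescriptor:
    """Find the child descriptor with the given name."""
    for entry in directory.entries:
        if entry.name == name:
            return entry
    raise KeyError(name)
```

An entry of type directory points at segments that, once retrieved and decrypted, yield the child's own directory descriptor, since directory descriptors are stored as regular files.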

Protocol Messages

All currently-defined protocol messages concern the transfer and status of segments. Support protocols, such as for notifying other clients about updates to shared directories, registering clients with servers, and delivering root directory descriptors, are not yet specified.

The protocol is largely stateless; the client needs only to maintain a table of pending put and delete requests (identified by their nonce or transaction-id).

Although it may seem that clients need to maintain a table of pending get-requests, in fact a get-response identifies itself. Because the client must verify the response data anyway (by recalculating the segment-id given the data in hand), it thereby learns what request the response is for. Clients simply ignore redundant or damaged responses. (In fact both clients and servers ignore any message they don’t like, whatever the reason.) Clients should retry requests (possibly to different servers) in the event of unsatisfactory responses, or if no response arrives after a timeout.
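
The self-identifying property can be sketched as follows. The sketch assumes sha-512-d is double SHA-512 and that size is an 8-byte big-endian integer; the function names and the set of wanted segment-ids are hypothetical.

```python
import hashlib
import struct


def segment_id(ciphertext: bytes) -> bytes:
    """size || sha-512-d(ciphertext), under the assumed encodings."""
    digest = hashlib.sha512(hashlib.sha512(ciphertext).digest()).digest()
    return struct.pack(">Q", len(ciphertext)) + digest


def accept_get_response(wanted: set, size: int, ciphertext: bytes):
    """Return the segment-id this response satisfies, or None to ignore it.

    `wanted` holds the segment-ids the client still needs; no per-request
    state (nonce, sender, timestamp) is consulted.
    """
    if size != len(ciphertext):
        return None                 # malformed: declared size is wrong
    sid = segment_id(ciphertext)
    if sid not in wanted:
        return None                 # damaged, redundant, or unsolicited
    wanted.remove(sid)
    return sid
```

A damaged response hashes to an identifier nobody asked for, so it is dropped by the same check that drops duplicates.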

Zooko is correct that have-{request,response} seem unnecessary and bloaty. And there is no guarantee that servers will honor delete-requests, either. The first draft of Octavia did not include these four message types, but I added them later as a premature hinting mechanism for hypothetical optimizations. Implementors MAY laugh them off.

type := get-request
type := get-response
type := put-request
type := put-response
type := have-request
type := have-response
type := delete-request
type := delete-response

key-id := sha-512-d(signing-key)

signature(data) := hmac(sha-512, signing-key, data)
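
In Python, these two definitions map almost directly onto the standard library. As before, reading sha-512-d as double SHA-512 is an assumption, and the function names are hypothetical.

```python
import hashlib
import hmac


def key_id(signing_key: bytes) -> bytes:
    """key-id := sha-512-d(signing-key), assuming -d means double hashing."""
    return hashlib.sha512(hashlib.sha512(signing_key).digest()).digest()


def signature(signing_key: bytes, data: bytes) -> bytes:
    """signature(data) := hmac(sha-512, signing-key, data)."""
    return hmac.new(signing_key, data, hashlib.sha512).digest()


def verify(signing_key: bytes, data: bytes, sig: bytes) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(signature(signing_key, data), sig)
```

The key-id lets a server find the right signing key for a request without the key itself ever appearing on the wire.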

get-request := {
      type
      protocol-version
      segment-id
}

get-response := {
      type
      protocol-version
      size
      ciphertext
}

put-request := {
      type
      protocol-version
      nonce
      key-id
      signature(nonce || key-id || size || ciphertext)
      size
      ciphertext
}

put-response := {
      type
      protocol-version
      nonce
      signature(nonce || size || ciphertext)
}
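
A put exchange might look roughly like the sketch below. Many details are assumptions: messages are plain dictionaries, `||` is treated as byte concatenation, the nonce is 16 random bytes, size is an 8-byte big-endian integer, sha-512-d is double SHA-512, and protocol-version is omitted for brevity. All function names are hypothetical.

```python
import hashlib
import hmac
import os
import struct


def sha512d(data: bytes) -> bytes:
    return hashlib.sha512(hashlib.sha512(data).digest()).digest()


def sign(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha512).digest()


def make_put_request(signing_key: bytes, ciphertext: bytes) -> dict:
    nonce = os.urandom(16)                      # assumed nonce length
    kid = sha512d(signing_key)
    size = struct.pack(">Q", len(ciphertext))   # assumed size encoding
    return {
        "type": "put-request", "nonce": nonce, "key-id": kid,
        "signature": sign(signing_key, nonce + kid + size + ciphertext),
        "size": size, "ciphertext": ciphertext,
    }


def handle_put_request(registrations: dict, store: dict, req: dict):
    """Server side: verify, store, and answer; return None to ignore."""
    key = registrations.get(req["key-id"])
    if key is None:
        return None                             # no registration for this key
    signed = req["nonce"] + req["key-id"] + req["size"] + req["ciphertext"]
    if not hmac.compare_digest(sign(key, signed), req["signature"]):
        return None
    store[req["size"] + sha512d(req["ciphertext"])] = req["ciphertext"]
    return {
        "type": "put-response", "nonce": req["nonce"],
        "signature": sign(key, req["nonce"] + req["size"] + req["ciphertext"]),
    }


def check_put_response(signing_key: bytes, pending: dict, resp: dict) -> bool:
    """Client side: match by nonce, then verify the server's signature."""
    req = pending.get(resp["nonce"])
    if req is None:
        return False
    expected = sign(signing_key, req["nonce"] + req["size"] + req["ciphertext"])
    if not hmac.compare_digest(expected, resp["signature"]):
        return False
    del pending[resp["nonce"]]                  # request is now complete
    return True
```

The `pending` dictionary plays the role of the client's table of outstanding put requests, keyed by nonce.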

have-request := {
      type
      protocol-version
      segment-id
}

have-response := {
      type
      protocol-version
}

delete-request := {
      type
      protocol-version
      segment-id
      nonce
      key-id
      signature(nonce || key-id || segment-id)
}

delete-response := {
      type
      protocol-version
      nonce
      signature(nonce || segment-id)
}

Zooko’s Comments

These are in roughly descending order of importance.

CC Attribution-ShareAlike