Move reusable part of versioning system to dedicated library

Joeri Exelmans 3 years ago
commit 57399aeee2
7 changed files with 599 additions and 0 deletions
  1. .gitignore (+1 -0)
  2. README.md (+42 -0)
  3. doh.js (+256 -0)
  4. homer.jpg (BIN)
  5. package-lock.json (+27 -0)
  6. package.json (+5 -0)
  7. test_doh.js (+268 -0)

+ 1 - 0
.gitignore

@@ -0,0 +1 @@
+node_modules/

+ 42 - 0
README.md

@@ -0,0 +1,42 @@
+
+# Dependency-aware Operation History
+... or **Doh** for short (pronounced "dough" or "D'OH!" - see illustration below) is a **distributed, operation-based, dependency-aware version control system (VCS)**.
+![D'oh!](homer.jpg)
+
+## Goals
+The intention behind Doh was to create an operation-based versioning system, not for "text", but for graphs.
+
+### Why graphs?
+Because graphs are a much more suitable foundation for reasoning about (visual) models (e.g. Statecharts), at both the concrete and abstract syntax levels, which is our main goal. Furthermore, "text" can be represented as a graph as well, more precisely as a linked list of lines, words, characters or "hunks", which is how some CRDTs do it.
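+For illustration (this encoding is hypothetical and not part of Doh itself), a two-line text could be stored in Doh's key → value model as a linked list of lines:
+```js
+// Hypothetical key → value encoding of the two-line text "hello\nworld":
+const text = {
+  "line-a": { text: "hello", next: "line-b" },
+  "line-b": { text: "world", next: null },
+};
+```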
+
+### Why operation-based?
+There are a number of reasons. In no particular order:
+* A history of operations contains more information than a history of snapshots. This information can be used by algorithms for better diffing, merging, conflict detection and resolution.
+* Lightweight and simple: Edit operations can be described with relatively little information. In contrast to state-based VCS, diffs between versions (needed for version comparison, merging and compression purposes) do not have to be computed; they are already known.
+* Many editors already log users' edit operations, disguised under the functionality of "undo & redo". Sadly, most of these editors can only persist a single snapshot of their state when "saving": the undo history is lost after a restart. A side-effect of operation-based versioning is a persistent undo history.
+* The only efficient and reliable way to enable "synchronous collaboration" (i.e. everyone immediately sees everyone's changes, like in Google Docs) is to somehow serialize, broadcast and persist users' edit operations. Moreover, synchronous collaboration only needs to be implemented once, in the versioning system instead of in the editor. Any editor integrating with such a versioning system gets synchronous collaboration "for free".
+
+The main drawback of the operation-based approach is the repeated, ad-hoc and sometimes complex effort of editor integration. Pragmatism is our enemy here; my hope is that at some point, some operation-based VCS with enough traction will define an open (e.g. socket-based) API to be implemented by editor developers, who are best positioned to do the VCS integration.
+
+## Non-exhaustive list of things that inspired me
+"Nothing is original." - Jim Jarmusch
+* Git (Linus Torvalds)
+* CRDTs (??)
+* The ModelVerse (Yentl Van Tendeloo and Hans Vangheluwe): graphs and a primitive set of graph operations are the most generic way of storing and transforming "models"
+* The paper "Enhancing Collaborative Modeling" (Jakob Pietron), which sparked my interest in model versioning
+
+## Comparison with Git
+The easiest way (for me) to explain Doh is to compare it with Git. Back when I conceived Doh, Git was the versioning system I was most familiar with, and I also found it beautiful from a theoretical perspective.
+| |Git|Doh|
+|--|--|--|
+|Distributed|Yes|Yes|
+|History forms a...|Directed acyclic graph|Directed acyclic graph|
+|Supported collaboration modes|Asynchronous|Synchronous & asynchronous|
+|What is being versioned?|A filesystem hierarchy, i.e. directories and files, mostly containing "text"|A key → value mapping|
+|How are versions created?|Manually with the "commit" command. (Works with any editor that can save its state to a file.)|Automatically, for every edit operation. (Therefore requires non-trivial editor integration.)|
+|What is recorded with every version?|A snapshot of a filesystem hierarchy|A set of key → value assignments|
+|How are versions linked?|"parent" relation: the immediate previous version(s) in logical time|"dependencies": the ancestor version(s) that are at least partially overwritten|
+|HEAD points to a|Single version|Set of independent, non-conflicting operations (= set of versions)|
+|What's the result of a merge?|A new version whose parents are the merged versions|An update to HEAD, being the union of the merged HEADs|
+|Conflict resolution|Manually specify what the merged version must look like|Destructive: manually or randomly pick an operation to be excluded from HEAD. (The excluded operation becomes an abandoned branch.)|
+|Version IDs|Content-addressed (SHA-1)|GUIDs (I want to change this to content-addressed at some point)|
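+
+## Usage example
+A minimal usage sketch, modeled on `test_doh.js`: two in-process "replicas" exchanging serialized operations directly. In a real deployment the serialized operations would travel over a network; the replica names and the tie-breaking rule below are just placeholders.
+```js
+const { Context, History } = require("./doh.js");
+
+function makeReplica() {
+  const state = {};                                         // the materialized key → value mapping
+  const setState = (key, value) => { state[key] = value; }; // applies winning values
+  const resolve = (a, b) => a.id > b.id;                    // arbitrary but deterministic conflict winner
+  const context = new Context(id => { throw new Error("fetching not needed in this sketch"); });
+  return { state, history: new History(context, setState, resolve) };
+}
+
+const alice = makeReplica();
+const bob = makeReplica();
+
+const op = alice.history.new({ geometry: 1, style: 2 }); // local edit; advances alice's HEADs
+bob.history.receiveAndMerge(op.serialize())              // share the serialized operation with bob
+  .then(() => console.log(bob.state));                   // -> { geometry: 1, style: 2 }
+```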

+ 256 - 0
doh.js

@@ -0,0 +1,256 @@
+"use strict";
+
+const { v4: uuidv4 } = require("uuid");
+
+class Operation {
+  constructor(id, detail) {
+    this.id = id;
+    this.detail = detail;
+  }
+  // Replaces the JS object references (parent Operations) by their IDs, so the result can be
+  // JSON'd in time and space proportional to the operation itself, independent of history depth.
+  // Useful for sharing an edit over the network.
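+  // E.g. an operation touching only the key "geometry" serializes to something like
+  // (the concrete values are illustrative):
+  //   { id: "…", detail: { geometry: { value: 1, parentId: "0", depth: 1 } } }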
+  serialize() {
+    const self = this; // generator functions cannot be arrows, so capture 'this' explicitly
+    return {
+      id: this.id,
+      detail: Object.fromEntries(
+        (function*() {
+          for (const [key, {value, parent, depth}] of self.detail.entries()) {
+            yield [key, {
+              value,
+              parentId: parent.id,
+              depth,
+            }];
+          }
+        })()),
+    }
+  }
+}
+
+class Context {
+  constructor(fetchCallback) {
+    // Must be a function taking a single 'id' parameter, returning a Promise resolving to the serialized operation with the given id.
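+    // e.g. (hypothetical transport): id => fetch("/ops/" + id).then(res => res.json())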
+    this.fetchCallback = fetchCallback;
+
+    // "Global" stuff. Operations have GUIDs but can also be shared between Histories. For instance, the 'initial' operation is the common root of all model histories. We could have put these things in a global variable, but that would make it more difficult to mock 'remoteness' (separate contexts) in tests.
+    this.initialOp = new Operation("0", new Map()); // The parent of all parentless Operations. Root of all histories.
+    this.ops = new Map(); // contains all pending or resolved operation-requests; mapping from operation-id to Promise resolving to Operation.
+    this.ops.set(this.initialOp.id, Promise.resolve(this.initialOp));
+  }
+
+  // Get a promise resolving to the Operation with given ID. Fetches the operation (and recursively its dependencies) if necessary. Resolves when the operation and all its dependencies are present. Idempotent.
+  requestOperation(id) {
+    let promise = this.ops.get(id);
+    if (promise === undefined) {
+      promise = this.fetchCallback(id).then(serialized => this._awaitParents(serialized));
+      this.ops.set(id, promise);
+    }
+    return promise;
+  }
+
+  // Similar to requestOperation, but instead the argument is an already fetched/received operation. Missing dependencies are (recursively) fetched, if necessary. Resolves when the operation and all its dependencies are present. Idempotent.
+  receiveOperation(serialized) {
+    let promise = this.ops.get(serialized.id);
+    if (promise === undefined) {
+      promise = this._awaitParents(serialized);
+      this.ops.set(serialized.id, promise);
+    }
+    return promise;
+  }
+
+  // Internal function. Do not use directly.
+  async _awaitParents({id, detail}) {
+    const dependencies = Object.entries(detail).map(async ([key, {value, parentId, depth}]) => {
+      return [key, {
+        value,
+        parent: await this.requestOperation(parentId),
+        depth,
+      }];
+    });
+    return new Operation(id, new Map(await Promise.all(dependencies)));
+  }
+}
+
+class History {
+  constructor(context, setState, resolve) {
+    this.context = context;
+
+    // callbacks
+    this.setState = setState;
+    this.resolve = resolve;
+
+    this.heads = new Map(); // HEAD ptrs; mapping from key to Operation
+
+    this.ops = new Map(); // Operations (winning and losing) that happened within this History.
+    this.ops.set(context.initialOp.id, context.initialOp);
+
+    this.childrenMapping = new Map(); // mapping from operation to object mapping key to current winning child.
+  }
+
+  _getHead(key) {
+    const op = this.heads.get(key);
+    if (op !== undefined) {
+      return {
+        op,
+        depth: op.detail.get(key).depth,
+      };
+    }
+    return {
+      op: this.context.initialOp,
+      depth: 0,
+    };
+  }
+
+  _update_head(op) {
+    for (const key of op.detail.keys()) {
+      this.heads.set(key, op);
+    }
+  }
+
+  _update_state(op) {
+    for (const [key, {value}] of op.detail.entries()) {
+      this.setState(key, value);
+    }
+  }
+
+  _setChild(parent, key, child) {
+    let childMap = this.childrenMapping.get(parent);
+    if (childMap === undefined) {
+      childMap = {};
+      this.childrenMapping.set(parent, childMap);
+    }
+    childMap[key] = child;
+  }
+
+  _getChild(parent, key) {
+    let childMap = this.childrenMapping.get(parent);
+    if (childMap === undefined) return;
+    return childMap[key];
+  }
+
+  // To be called when a new user operation has happened locally.
+  // The new operation advances HEADs.
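+  // e.g. history.new({geometry: 1, style: 2}) records a single operation assigning both keys,
+  // each depending on that key's current HEAD.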
+  new(v, updateState=true) {
+    const newId = uuidv4();
+    const detail = new Map(Object.entries(v).map(([key,value]) => {
+      const {op: parent, depth} = this._getHead(key);
+      return [key, {
+        value,
+        parent,
+        depth: depth + 1,
+      }];
+    }));
+    const newOp = new Operation(newId, detail);
+    for (const [key, {parent}] of detail.entries()) {
+      this._setChild(parent, key, newOp);
+    }
+    this._update_head(newOp);
+    if (updateState) {
+      this._update_state(newOp);
+    }
+
+    this.context.ops.set(newId, Promise.resolve(newOp));
+    this.ops.set(newId, newOp);
+
+    return newOp;
+  }
+
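+  // Merges an operation (typically created in another History) into this History:
+  // updates the children mapping, the HEADs and (via setState) the application state,
+  // using the 'resolve' callback to pick a winner whenever two operations overwrite the same key.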
+  // Idempotent.
+  autoMerge(op) {
+    if (this.ops.has(op.id)) {
+      // Already merged -> skip
+      // console.log('skip (already merged)', op.id)
+      return;
+    }
+
+    let exec = true;
+    for (const [key, {parent}] of op.detail.entries()) {
+      if (!this.ops.has(parent.id)) {
+        // Update this History with operation's dependencies first
+        this.autoMerge(parent);
+      }
+
+      // Check if there's a concurrent sibling with whom there is a conflict
+      const sibling = this._getChild(parent, key);
+      if (sibling) {
+        // Conflict
+        if (this.resolve(op, sibling)) {
+          // console.log("conflict: op wins")
+          const visited = new Set();
+          const rollback = op => {
+            visited.add(op); // Children form a DAG, with possible 'diamond' shapes -> prevent same operation from being visited more than once.
+            for (const [key, {parent}] of op.detail.entries()) {
+              // recurse, child-first
+              const child = this._getChild(op, key);
+              if (child && !visited.has(child)) {
+                // (DFS) recursion
+                rollback(child);
+              }
+              // rollback
+              if (parent === this.context.initialOp) {
+                // Invariant: HEADs never contains initialOp
+                this.heads.delete(key);
+                this.setState(key, undefined);
+              } else {
+                this.heads.set(key, parent);
+                this.setState(key, parent.detail.get(key).value);
+              }
+            }
+          };
+          // Received operation wins conflict - state must be rolled back before executing it
+          rollback(sibling);
+        } else {
+          // Received operation loses conflict - nothing to be done
+          // console.log("conflict: op loses")
+          exec = false;
+          continue;
+        }
+      } else {
+        // console.log('no conflict')
+      }
+      // won (or no conflict):
+      this._setChild(parent, key, op);
+      if (parent !== this._getHead(key).op) {
+        // only execute received operation if it advances HEAD
+        exec = false;
+      }
+    }
+
+    if (exec) {
+      this._update_head(op);
+      this._update_state(op);
+    }
+
+    this.ops.set(op.id, op);
+  }
+
+  // Shorthand
+  async receiveAndMerge(serializedOp) {
+    const op = await this.context.receiveOperation(serializedOp);
+    this.autoMerge(op);
+    return op;
+  }
+
+  // Get operations in history in a sequence, such that any operation's dependencies precede it in the list. To reproduce the state of this History, operations can be executed in the returned order (front to back), and are guaranteed to not give conflicts.
+  getOpsSequence() {
+    const added = new Set([this.context.initialOp]);
+    const seq = [];
+    const visit = op => {
+      if (!added.has(op)) {
+        for (const {parent} of op.detail.values()) {
+          visit(parent);
+        }
+        seq.push(op);
+        added.add(op);
+      }
+    };
+    for (const op of this.heads.values()) {
+      visit(op);
+    }
+    return seq;
+  }
+}
+
+module.exports = { Context, History, uuidv4 };

BIN
homer.jpg


+ 27 - 0
package-lock.json

@@ -0,0 +1,27 @@
+{
+  "name": "doh",
+  "lockfileVersion": 2,
+  "requires": true,
+  "packages": {
+    "": {
+      "dependencies": {
+        "uuid": "^8.3.2"
+      }
+    },
+    "node_modules/uuid": {
+      "version": "8.3.2",
+      "resolved": "https://registry.npmjs.org/uuid/-/uuid-8.3.2.tgz",
+      "integrity": "sha512-+NYs2QeMWy+GWFOEm9xnn6HCDp0l7QBD7ml8zLUmJ+93Q5NF0NocErnwkTkXVFNiX3/fpC6afS8Dhb/gz7R7eg==",
+      "bin": {
+        "uuid": "dist/bin/uuid"
+      }
+    }
+  },
+  "dependencies": {
+    "uuid": {
+      "version": "8.3.2",
+      "resolved": "https://registry.npmjs.org/uuid/-/uuid-8.3.2.tgz",
+      "integrity": "sha512-+NYs2QeMWy+GWFOEm9xnn6HCDp0l7QBD7ml8zLUmJ+93Q5NF0NocErnwkTkXVFNiX3/fpC6afS8Dhb/gz7R7eg=="
+    }
+  }
+}

+ 5 - 0
package.json

@@ -0,0 +1,5 @@
+{
+  "dependencies": {
+    "uuid": "^8.3.2"
+  }
+}

+ 268 - 0
test_doh.js

@@ -0,0 +1,268 @@
+"use strict";
+
+// Should work in browser but only tested with NodeJS v14.16.1
+
+const { Context, History } = require("./doh.js");
+
+// From: https://stackoverflow.com/a/43260158
+// returns all the permutations of a given array
+function perm(xs) {
+  let ret = [];
+
+  for (let i = 0; i < xs.length; i = i + 1) {
+    let rest = perm(xs.slice(0, i).concat(xs.slice(i + 1)));
+
+    if(!rest.length) {
+      ret.push([xs[i]])
+    } else {
+      for(let j = 0; j < rest.length; j = j + 1) {
+        ret.push([xs[i]].concat(rest[j]))
+      }
+    }
+  }
+  return ret;
+}
+
+// Reinventing the wheel:
+
+class AssertionError extends Error {
+  constructor(msg) {
+    super(msg);
+  }
+}
+function assert(expr, msg) {
+  if (!expr) {
+    // console.log(...arguments);
+    throw new AssertionError(msg);
+  }
+}
+
+function deepEqual(val1, val2) {
+  if (typeof(val1) !== typeof(val2)) return false;
+
+  if ((val1 === null) !== (val2 === null)) return false;
+
+  // Note: typeof never returns 'array', so arrays need an explicit check.
+  if (Array.isArray(val1) || Array.isArray(val2)) {
+    if (!Array.isArray(val1) || !Array.isArray(val2)) return false;
+    if (val1.length !== val2.length) return false;
+    for (let i=0; i<val1.length; ++i)
+      if (!deepEqual(val1[i], val2[i])) return false;
+    return true;
+  }
+
+  if (typeof(val1) === 'object') {
+    for (const p in val2) {
+      if (val1[p] === undefined) return false;
+    }
+    for (const p in val1) {
+      if (!deepEqual(val1[p], val2[p])) return false;
+    }
+    return true;
+  }
+
+  return val1 === val2;
+}
+
+
+// Test:
+
+
+async function runTest(verbose) {
+
+  function info() {
+    if (verbose) console.log(...arguments);
+  }
+
+  function resolve(op1, op2) {
+    // info("resolve...", props1, props2)
+    if (op1.detail.get('geometry').value !== op2.detail.get('geometry').value) {
+      return op1.detail.get('geometry').value > op2.detail.get('geometry').value;
+    }
+    return op1.detail.get('style').value > op2.detail.get('style').value;
+  }
+
+  function createAppState(label) {
+    const state = {};
+
+    function setState(prop, val) {
+      state[prop] = val;
+      info("  ", label, "state =", state);
+    }
+    
+    return {setState, state};
+  }
+
+  function createHistory(label, context) {
+    const {setState, state} = createAppState(label);
+    // const context = new Context(requestCallback); // simulate 'remoteness' by creating a new context for every History.
+
+    const history = new History(context, setState, resolve);
+    return {history, state};
+  }
+
+  {
+    info("\nTest case: Add local operations (no concurrency) in random order.\n")
+
+    const local = new Context();
+
+    info("insertions...")
+    const {history: expectedHistory, state: expectedState} = createHistory("expected", local);
+    const insertions = [
+      /* 0: */ expectedHistory.new({geometry: 1, style: 1}),
+      /* 1: */ expectedHistory.new({geometry: 2}), // depends on 0
+      /* 2: */ expectedHistory.new({style: 2}), // depends on 0
+    ];
+
+    const permutations = perm(insertions);
+    for (const insertionOrder of permutations) {
+      info("permutation...")
+      const {history: actualHistory, state: actualState} = createHistory("actual", local);
+      // Sequential
+      for (const op of insertionOrder) {
+        actualHistory.autoMerge(op);
+      }
+      info("expected:", expectedState, "actual:", actualState);
+      assert(deepEqual(expectedState, actualState));
+    }
+  }
+
+  function noFetch() {
+    throw new AssertionError("Did not expect fetch");
+  }
+
+  {
+    info("\nTest case: Multi-user without conflict\n")
+
+    // Local and remote are just names for our histories.
+    const localContext = new Context(noFetch);
+    const remoteContext = new Context(noFetch);
+
+    const {history: localHistory,  state: localState } = createHistory("local", localContext);
+    const {history: remoteHistory, state: remoteState} = createHistory("remote", remoteContext);
+
+    const localOp1 = localHistory.new({geometry: 1});
+    await remoteHistory.receiveAndMerge(localOp1.serialize());
+
+    const remoteOp2 = remoteHistory.new({geometry: 2}); // happens after (hence, overwrites) op1
+    await localHistory.receiveAndMerge(remoteOp2.serialize());
+
+    assert(deepEqual(localState, remoteState));
+  }
+
+  {
+    info("\nTest case: Concurrency with conflict\n")
+
+    const localContext = new Context(noFetch);
+    const remoteContext = new Context(noFetch);
+
+    const {history: localHistory, state: localState} = createHistory("local", localContext);
+    const {history: remoteHistory, state: remoteState} = createHistory("remote", remoteContext);
+
+    const localOp1 = localHistory.new({geometry: 1});
+    const remoteOp2 = remoteHistory.new({geometry: 2});
+
+    await localHistory.receiveAndMerge(remoteOp2.serialize());
+    await remoteHistory.receiveAndMerge(localOp1.serialize());
+
+    assert(deepEqual(localState, remoteState));
+  }
+
+  {
+    info("\nTest case: Concurrency with conflict (2)\n")
+
+    const localContext = new Context(noFetch);
+    const remoteContext = new Context(noFetch);
+
+    const {history: localHistory, state: localState} = createHistory("local", localContext);
+    const {history: remoteHistory, state: remoteState} = createHistory("remote", remoteContext);
+
+    info("localHistory insert...")
+    const localOp1 = localHistory.new({geometry: 1});
+    const localOp2 = localHistory.new({geometry: 4});
+
+    info("remoteHistory insert...")
+    const remoteOp3 = remoteHistory.new({geometry: 2});
+    const remoteOp4 = remoteHistory.new({geometry: 3});
+
+    info("localHistory receive...")
+    await localHistory.receiveAndMerge(remoteOp3.serialize()); // op3 wins over op1 -> op2 and op1 are undone
+    await localHistory.receiveAndMerge(remoteOp4.serialize()); // no conflict: op4 builds on op3 and advances HEAD
+
+    info("remoteHistory receive...")
+    await remoteHistory.receiveAndMerge(localOp1.serialize()); // op1 loses to op3
+    await remoteHistory.receiveAndMerge(localOp2.serialize()); // no conflict, but does not advance HEAD
+
+    assert(deepEqual(localState, remoteState));
+  }
+
+  {
+    info("\nTest case: Fetch\n")
+
+    const fetched = [];
+
+    async function fetchFromLocal(id) {
+      // console.log("fetching", id)
+      fetched.push(id);
+      return localContext.ops.get(id).then(op => op.serialize());
+    }
+
+    const localContext = new Context(noFetch);
+    const remoteContext = new Context(fetchFromLocal);
+
+    const {history: localHistory, state: localState} = createHistory("local", localContext);
+
+    const localOps = [
+      localHistory.new({geometry:1}),                       // [0] (no deps)
+      localHistory.new({geometry:2, style: 3}),             // [1], depends on [0]
+      localHistory.new({style: 4}),                         // [2], depends on [1]
+      localHistory.new({geometry: 5, style: 6, parent: 7}), // [3], depends on [1], [2]
+      localHistory.new({parent: 8}),                        // [4], depends on [3]
+      localHistory.new({terminal: 9}),                      // [5] (no deps)
+    ];
+
+    // when given [2], should fetch [1], then [0]
+    await remoteContext.receiveOperation(localOps[2].serialize());
+    assert(deepEqual(fetched, [localOps[1].id, localOps[0].id]));
+
+    // when given [5], should not fetch anything
+    await remoteContext.receiveOperation(localOps[5].serialize());
+    assert(deepEqual(fetched, [localOps[1].id, localOps[0].id]));
+
+    // when given [4], should fetch [3]. (already have [0-2] from previous step)
+    await remoteContext.receiveOperation(localOps[4].serialize());
+    assert(deepEqual(fetched, [localOps[1].id, localOps[0].id, localOps[3].id]));
+  }
+
+  {
+    info("\nTest case: Get as sequence\n")
+
+    const {history} = createHistory("local", new Context(noFetch));
+
+    const ops = [
+      history.new({x:1, y:1}), // 0
+      history.new({x:2}),      // 1 depends on 0
+      history.new({y:2}),      // 2 depends on 0
+      history.new({x:3, z:3}), // 3 depends on 1
+      history.new({a:4}),      // 4
+      history.new({a:5}),      // 5 depends on 4
+      history.new({a:6, z:6}), // 6 depends on 5, 3
+    ];
+
+    const seq = history.getOpsSequence();
+    info(seq.map(op => op.serialize()));
+
+    assert(seq.indexOf(ops[1]) > seq.indexOf(ops[0]));
+    assert(seq.indexOf(ops[2]) > seq.indexOf(ops[0]));
+    assert(seq.indexOf(ops[3]) > seq.indexOf(ops[1]));
+    assert(seq.indexOf(ops[5]) > seq.indexOf(ops[4]));
+    assert(seq.indexOf(ops[6]) > seq.indexOf(ops[5]));
+    assert(seq.indexOf(ops[6]) > seq.indexOf(ops[3]));
+  }
+}
+
+runTest(/* verbose: */ true).then(() => {
+  console.log("OK");
+}, err => {
+  console.log(err);
+  process.exit(1);
+});