This repository was archived by the owner on Jan 9, 2023. It is now read-only.

[pr] Feature/high reliability #36

Merged 15 commits on Jun 7, 2018

Conversation

stephen-palmer
Contributor

Added a new feature, High Reliability mode. For details, see the documentation added to the README.md file in this PR.

A few supporting refactors are included here:

  • Moved the loki.js database handling to the base Cache class (from CacheRAM), so any cache module can now use the in-memory DB for storage. This was necessary to give CacheFS a facility for tracking reliability factors for versions.
  • Added a small utility class to handle cluster messaging, to facilitate worker nodes reading and writing to the in-memory database on the master node. This messaging class doesn't have test coverage yet - I haven't cracked the nut on properly testing forked clients with Mocha (and getting code coverage with Istanbul).
  • Consolidated all of the string literals for file types into constants

…e uploads (configured via reliabilityThreshold) of the same file to have a matching content hash before the file will be cached and served.
- Added requireUniqueClient feature to reliability_manager, where transactions for the same version are forced to alternate between different clients to increment the reliability index.
…ences to a new FILE_TYPE constant. This also has the nice side effect of making it simpler to add new types, if desired in the future.
@coveralls

coveralls commented May 15, 2018

Coverage Status

Coverage decreased (-1.8%) to 90.353% when pulling a461810 on feature/high-reliability into 5c662b8 on master.

@pullrequest pullrequest bot left a comment

My first pass at this looked pretty solid. Have a few recommendations regarding optimization of multiple async calls.


}
}

cm.listenFor("_updateReliabilityFactorForVersion", async (data) => {

Using async function with no await.

@@ -264,6 +289,39 @@ class PutTransactionFS extends PutTransaction {

return new Promise(resolve => stream.on('open', () => resolve(stream)));
}

async writeFilesToPath(filePath) {
const paths = [];

This could utilize Promise.all for a more parallel operation with less upfront allocation and moving parts:

await fs.ensureDir(filePath);

return Promise.all(this._files.map(async f => {
  const dest = `${path.join(filePath, f.byteHash.toString('hex'))}.${f.type}`;
  await fs.copyFile(f.file, dest);
  return dest;
}));


await transaction.finalize();
return Promise.all(transaction.files.map(moveFile));
for(const file of transaction.files) {

Just wanted to verify that you intend for this cache to be written to serially. A for/of loop with an await inside runs these writes one at a time, while Promise.all starts them all eagerly and lets them run concurrently, provided your operation can support it:

await Promise.all(
  transaction.files.map(async file => {
    try {
      const filePath = await self._writeFileToCache(file.type, transaction.guid, transaction.hash, file.file);

      helpers.log(consts.LOG_TEST, `Added file to cache: ${file.size} ${filePath}`);
    } catch (err) {
      helpers.log(consts.LOG_DBG, err);
    }
  })
);
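
The difference between the two loop styles can be seen in a minimal sketch. `fakeWrite` is a hypothetical stand-in for the cache write (it is not part of this PR); it only records when each operation starts and ends.

```javascript
// Hypothetical stand-in for an async cache write: resolves after `ms` milliseconds
// and records its start and end in `log`.
function fakeWrite(name, ms, log) {
  log.push(`start ${name}`);
  return new Promise(resolve => setTimeout(() => {
    log.push(`end ${name}`);
    resolve(name);
  }, ms));
}

// Serial: each write starts only after the previous one has finished.
async function writeSerial(names, log) {
  const done = [];
  for (const name of names) {
    done.push(await fakeWrite(name, 10, log));
  }
  return done;
}

// Concurrent: every write starts immediately; Promise.all waits for all of them
// and keeps the results in input order.
function writeConcurrent(names, log) {
  return Promise.all(names.map(name => fakeWrite(name, 10, log)));
}

async function demo() {
  const serialLog = [];
  const concurrentLog = [];
  const serial = await writeSerial(['a', 'b'], serialLog);
  const concurrent = await writeConcurrent(['a', 'b'], concurrentLog);
  return { serial, concurrent, serialLog, concurrentLog };
}
```

In the serial log each `end` precedes the next `start`; in the concurrent log both `start` entries come first. Note that Promise.all rejects as soon as any one promise rejects, which is why the suggestion above catches errors inside the mapped function.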

@@ -153,10 +159,20 @@ class CacheFS extends CacheBase {
self.emit('cleanup_search_finish', progressData());

for(const d of deleteItems) {
self.emit('cleanup_delete_item', d.path);
const guidHash = CacheFS._extractGuidAndHashFromFilepath(d.path);

I know this is an existing loop, but had the same question about serial vs. parallel async operations here.

Contributor Author

good call ✅

@@ -249,12 +274,12 @@ class PutTransactionFS extends PutTransaction {
throw new Error("Invalid size for write stream");
}

- if(type !== 'a' && type !== 'i' && type !== 'r') {
+ if(Object.values(consts.FILE_TYPE).indexOf(type) < 0) {

You can use includes for a more succinct description of intent:

if(!Object.values(consts.FILE_TYPE).includes(type)) {
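
As a rough illustration of the check, here is a self-contained sketch. The values 'a', 'i' and 'r' come from the diff above; the key names in `FILE_TYPE` and the helper `isValidFileType` are assumptions made for this example.

```javascript
// Hypothetical constant: the values are taken from the diff, the key names are assumed.
const FILE_TYPE = Object.freeze({ BIN: 'a', INFO: 'i', RESOURCE: 'r' });

// True when `type` is one of the known file type values.
function isValidFileType(type) {
  return Object.values(FILE_TYPE).includes(type);
}

console.log(isValidFileType('i')); // true
console.log(isValidFileType('x')); // false
```

For string values like these, `includes(type)` and `indexOf(type) >= 0` give the same answer; the two only differ in how they treat NaN.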

* @param {string} guidStr
* @param {string} hashStr
* @param {boolean} create
* @returns {Object}

Can this also return null since you do a !entry check?

Contributor Author

yes, it can - I rely on that for tests. Accordingly I'll add null as a return type.
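
A sketch of what the updated annotation might look like. The `Map`-based store and the entry shape are hypothetical and stand in for the real database lookup, which is not shown in this thread.

```javascript
// Hypothetical in-memory store keyed by "guid-hash"; not the project's database code.
const entries = new Map();

/**
 * @param {string} guidStr
 * @param {string} hashStr
 * @param {boolean} create When true, a missing entry is created and returned.
 * @returns {Object|null} The entry, or null when it is missing and create is false.
 */
function getEntry(guidStr, hashStr, create = false) {
  const key = `${guidStr}-${hashStr}`;
  let entry = entries.get(key) || null;
  if (!entry && create) {
    entry = { guid: guidStr, hash: hashStr, factor: 0 }; // assumed entry shape
    entries.set(key, entry);
  }
  return entry;
}
```

Documenting the null case tells callers (and tests) that a `!entry` check is expected whenever `create` is false.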

});

it("should remove versions from the reliability manager, when in high reliability mode", async () => {
const opts = Object.assign({}, cacheOpts);

A random tip: your Node.js engines support object rest/spread natively, so you could take advantage of that in your codebase if you feel inclined:

const opts = { ...cacheOpts };

Contributor Author

oh, cool .. I'll leave as is but this is a cool trick!
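
For comparison, a small sketch of both copy styles. The contents of `cacheOpts` here are made up for the example. Both forms produce a shallow copy, so nested objects remain shared with the original.

```javascript
// Hypothetical options object standing in for the test's cacheOpts.
const cacheOpts = {
  cachePath: '/tmp/cache',
  highReliability: false,
  highReliabilityOptions: { reliabilityThreshold: 2 }
};

const viaAssign = Object.assign({}, cacheOpts);
const viaSpread = { ...cacheOpts };

// A top-level change on the copy does not affect the original.
viaSpread.highReliability = true;

// The nested object is shared by all three, so this change is visible everywhere.
viaSpread.highReliabilityOptions.reliabilityThreshold = 5;
```

If a test needs to change nested options without touching the shared defaults, the nested object has to be copied as well.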

@@ -237,6 +254,14 @@ class PutTransactionFS extends PutTransaction {
return this._files;
}

async invalidate() {
await super.invalidate();
for(const f of this._files)

I think this should support a parallel operation with Promise.all:

await super.invalidate();
await Promise.all(this._files.map(f => fs.unlink(f.file)));

this._files = [];

/**
*
*/
async _saveDb() {

Note: this function does not do any awaiting, so you can technically remove the async keyword.
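
A short sketch of what removing the keyword changes. The two functions are hypothetical stand-ins for `_saveDb` and do no real I/O.

```javascript
// Hypothetical stand-ins for _saveDb; neither performs any real work.
async function saveDbAsync() { return 'saved'; }
function saveDbPlain() { return 'saved'; }

const fromAsync = saveDbAsync(); // a Promise, even though nothing is awaited inside
const fromPlain = saveDbPlain(); // the plain string 'saved'

console.log(fromAsync instanceof Promise); // true
console.log(fromPlain instanceof Promise); // false

// `await` accepts both, so callers that await keep working after `async` is removed.
async function caller() {
  return [await saveDbAsync(), await saveDbPlain()];
}
```

Two things do change: callers that chain `.then()` on the return value would break, and an exception thrown inside the function would surface synchronously rather than as a rejected promise.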

@@ -396,38 +372,81 @@ class PutTransactionRAM extends PutTransaction {
throw new Error("Invalid size for write stream");
}

- if(type !== 'a' && type !== 'i' && type !== 'r') {
+ if(Object.values(consts.FILE_TYPE).indexOf(type) < 0) {

I would recommend using includes so the intent is clearer:

if(!Object.values(consts.FILE_TYPE).includes(type)) {

…e some parallelization of async operations that were previously done in serial.
Contributor

@BrettKercher BrettKercher left a comment

Couple questions about the multiClient option, but looks good everywhere else!

const entry = this.getEntry(params.guidStr, params.hashStr, true);
if(!entry.versionHash) {
entry.versionHash = params.versionHashStr;
entry.clientId = params.clientId;
Contributor

If the entry's clientId is set when it's first created, it looks like the first transaction will always be treated as a duplicate when multiClient is enabled.

Contributor Author

I think you're right, will look closer at this.

return entry;
}

entry.clientId = params.clientId;
Contributor

With two clients, "A" and "B", transactions arriving in the order A->B->A->B would end up with a reliabilityFactor of 4, but the order A->A->B->B would give a reliabilityFactor of 2. It seems a little odd that the same set of transactions can end up with different reliabilityFactors if the timing changes a bit. Is that intended?

Contributor Author

Yes, this is intended - I don't want to actually track unique clients. In practice, I believe 99% of the time the default threshold of 2 will be used, so we would hardly gain from the added complexity of keeping track of all unique clients.
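
The counting behaviour discussed here can be sketched as follows. This is not the reliability manager's actual code: `reliabilityFactor` is a hypothetical helper that only tracks the most recent client, assuming all transactions carry the same version hash.

```javascript
// Hypothetical sketch: count transactions toward the reliability factor, skipping a
// transaction when it comes from the same client as the last counted one.
// Assumes every transaction in `clientIds` has a matching version hash.
function reliabilityFactor(clientIds, requireUniqueClient = true) {
  let factor = 0;
  let lastClient = null; // stays unset until the first transaction has been counted
  for (const clientId of clientIds) {
    const repeat = factor > 0 && clientId === lastClient;
    if (requireUniqueClient && repeat) continue;
    factor += 1;
    lastClient = clientId;
  }
  return factor;
}

console.log(reliabilityFactor(['A', 'B', 'A', 'B'])); // 4
console.log(reliabilityFactor(['A', 'A', 'B', 'B'])); // 2
```

Only the previous client is remembered, which keeps the state per version small at the cost of the order dependence noted above. With a threshold of 2, both orderings reach the threshold.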

});
});

describe("multiClient", () => {
Contributor

It looks like this is testing transactions from 3 different client addresses - "A", "B", and "".

Contributor Author

good catch - this test as written was actually masking the bug you found where the clientId was being set on a new entry.

@stephen-palmer stephen-palmer merged commit bf012ab into master Jun 7, 2018
@stephen-palmer stephen-palmer deleted the feature/high-reliability branch December 10, 2018 19:06

@pullrequest pullrequest bot left a comment

1 Message
⚠️ Due to its size, this pull request will likely have a little longer turnaround time and will probably require multiple passes from our reviewers.

3 participants