…e uploads (configured via reliabilityThreshold) of the same file to have a matching content hash before the file will be cached and served.
- Added requireUniqueClient feature to reliability_manager, where transactions for the same version are forced to alternate between different clients to increment the reliability index.
…ences to a new FILE_TYPE constant. This also has the nice side effect of making it simpler to add new types, if desired in the future.
My first pass at this looked pretty solid. I have a few recommendations for optimizing multiple async calls.
lib/cache/reliability_manager.js
} | ||
} | ||
|
||
cm.listenFor("_updateReliabilityFactorForVersion", async (data) => { |
This uses an async function with no await.
✅
lib/cache/cache_fs.js
@@ -264,6 +289,39 @@ class PutTransactionFS extends PutTransaction {

        return new Promise(resolve => stream.on('open', () => resolve(stream)));
    }

    async writeFilesToPath(filePath) {
        const paths = [];
This could use Promise.all for a more parallel operation with less upfront allocation and fewer moving parts:
await fs.ensureDir(filePath);
return Promise.all(this._files.map(async f => {
    const dest = `${path.join(filePath, f.byteHash.toString('hex'))}.${f.type}`;
    await fs.copyFile(f.file, dest);
    return dest;
}));
✅
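A self-contained sketch of the same Promise.all pattern using only Node's built-in fs/promises, so fs.mkdir(dir, { recursive: true }) stands in for fs-extra's ensureDir. The { file, byteHash, type } entry shape mirrors the snippet above but is assumed here, and copyFilesToDir and demoCopy are hypothetical names, not code from this PR.

```javascript
const fs = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

// Hypothetical stand-in for PutTransactionFS#writeFilesToPath.
// Each entry is assumed to look like { file, byteHash: Buffer, type: string }.
async function copyFilesToDir(files, dirPath) {
  // Same effect as fs-extra's ensureDir: create the directory (and parents) if missing.
  await fs.mkdir(dirPath, { recursive: true });

  // Start every copy at once. Promise.all resolves with the destination paths
  // in the same order as `files`, or rejects as soon as any copy fails.
  return Promise.all(files.map(async (f) => {
    const dest = `${path.join(dirPath, f.byteHash.toString("hex"))}.${f.type}`;
    await fs.copyFile(f.file, dest);
    return dest;
  }));
}

// Usage: write two small source files into a temp dir, then copy them into "out".
async function demoCopy() {
  const tmp = await fs.mkdtemp(path.join(os.tmpdir(), "copy-demo-"));
  const specs = [
    { name: "a.bin", byteHash: Buffer.from([0xab, 0x01]), type: "a" },
    { name: "i.bin", byteHash: Buffer.from([0xcd, 0x02]), type: "i" },
  ];
  const files = await Promise.all(specs.map(async (s) => {
    const file = path.join(tmp, s.name);
    await fs.writeFile(file, s.name);
    return { file, byteHash: s.byteHash, type: s.type };
  }));
  const dests = await copyFilesToDir(files, path.join(tmp, "out"));
  return { tmp, dests };
}
```

Because Promise.all preserves input order, the returned paths line up with this._files even though the copies finish in whatever order the filesystem completes them.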
lib/cache/cache_fs.js
await transaction.finalize();
return Promise.all(transaction.files.map(moveFile));
for(const file of transaction.files) {
Just wanted to verify that you intend for this cache to be written to serially. Using for/of with a blocking await will perform these writes one at a time, while using Promise.all would run them eagerly in parallel, provided your operation can support it:
await Promise.all(
    transaction.files.map(async file => {
        try {
            const filePath = await self._writeFileToCache(file.type, transaction.guid, transaction.hash, file.file);
            helpers.log(consts.LOG_TEST, `Added file to cache: ${file.size} ${filePath}`);
        } catch (err) {
            helpers.log(consts.LOG_DBG, err);
        }
    })
);
✅
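To make the serial vs. parallel difference concrete, here is a small timing sketch. fakeWrite is a hypothetical stand-in for _writeFileToCache that only waits on a timer, so it says nothing about real disk throughput; it only shows when each operation starts.

```javascript
// Hypothetical stand-in for an async cache write: resolves with `ms` after `ms` milliseconds.
const fakeWrite = (ms) => new Promise((resolve) => setTimeout(() => resolve(ms), ms));

// for/of with await: each write starts only after the previous one has finished.
async function writeSerially(delays) {
  const results = [];
  for (const ms of delays) {
    results.push(await fakeWrite(ms));
  }
  return results;
}

// Promise.all over map: every write is started immediately, so they overlap.
async function writeInParallel(delays) {
  return Promise.all(delays.map((ms) => fakeWrite(ms)));
}

// Measure wall-clock time for one strategy.
async function timeIt(fn, delays) {
  const start = Date.now();
  const results = await fn(delays);
  return { results, elapsedMs: Date.now() - start };
}
```

With three 100 ms writes, writeSerially takes at least about 300 ms and writeInParallel about 100 ms, and both return results in input order. The parallel form is only safe when the writes don't depend on each other, for example when no two files target the same cache path.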
lib/cache/cache_fs.js
@@ -153,10 +159,20 @@ class CacheFS extends CacheBase {
        self.emit('cleanup_search_finish', progressData());

        for(const d of deleteItems) {
            self.emit('cleanup_delete_item', d.path);
            const guidHash = CacheFS._extractGuidAndHashFromFilepath(d.path);
I know this is an existing loop, but had the same question about serial vs. parallel async operations here.
good call ✅
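For a cleanup loop, one failed delete probably shouldn't abort the rest. A sketch of a parallel variant using Promise.allSettled (available since Node 12.9), which waits for every operation and reports each outcome. The { path } item shape mirrors deleteItems above; removeAll is a hypothetical name, and the per-item event emit and guid/hash extraction from the real loop would move into the map callback.

```javascript
const fs = require("node:fs/promises");

// Hypothetical parallel version of the cleanup delete loop.
// Each item is assumed to look like { path: string }.
async function removeAll(deleteItems) {
  // Start all unlinks at once; allSettled never rejects, it reports each result.
  const outcomes = await Promise.allSettled(
    deleteItems.map((d) => fs.unlink(d.path))
  );

  // Pair each outcome with its item so failures can be logged individually.
  const failed = outcomes
    .map((outcome, i) => ({ item: deleteItems[i], outcome }))
    .filter(({ outcome }) => outcome.status === "rejected");

  return { deleted: deleteItems.length - failed.length, failed };
}
```

Each failed entry keeps the original error in outcome.reason (for example an ENOENT for a file that was already gone), so the existing debug logging can stay per item.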
lib/cache/cache_fs.js
@@ -249,12 +274,12 @@ class PutTransactionFS extends PutTransaction {
            throw new Error("Invalid size for write stream");
        }

        if(type !== 'a' && type !== 'i' && type !== 'r') {
        if(Object.values(consts.FILE_TYPE).indexOf(type) < 0) {
You can use includes for a more succinct description of intent:
if(!Object.values(consts.FILE_TYPE).includes(type)) {
✅
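A quick side-by-side of the two checks. The values "a", "i" and "r" come from the old type !== 'a' && ... condition; the key names in this FILE_TYPE object are hypothetical. For string values the two checks agree; the one behavioral difference is that includes uses SameValueZero and therefore finds NaN, which indexOf never does.

```javascript
// Hypothetical key names; the values match the old 'a' / 'i' / 'r' check.
const FILE_TYPE = Object.freeze({ INFO: "i", BIN: "a", RESOURCE: "r" });

// Old style: membership expressed as "index is not negative".
const isValidTypeIndexOf = (type) => Object.values(FILE_TYPE).indexOf(type) >= 0;

// Suggested style: membership expressed directly.
const isValidType = (type) => Object.values(FILE_TYPE).includes(type);
```

Adding a new type then only means adding an entry to the constant; neither check has to change.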
lib/cache/reliability_manager.js
 * @param {string} guidStr
 * @param {string} hashStr
 * @param {boolean} create
 * @returns {Object}
Can this also return null, since you do a !entry check?
yes, it can - I rely on that for tests. Accordingly, I'll add null as a return type.
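A sketch of what the nullable contract can look like in JSDoc, with @returns {Object|null} and callers guarding on the result. Only the parameter names come from the JSDoc above; the Map-backed store, the key format and the entry fields are assumptions for illustration, and ReliabilityStoreSketch is a hypothetical class, not the PR's ReliabilityManager.

```javascript
class ReliabilityStoreSketch {
  constructor() {
    // Hypothetical backing store keyed by "<guid>_<hash>".
    this._entries = new Map();
  }

  /**
   * @param {string} guidStr
   * @param {string} hashStr
   * @param {boolean} create
   * @returns {Object|null} the entry, or null if it does not exist and create is false
   */
  getEntry(guidStr, hashStr, create = false) {
    const key = `${guidStr}_${hashStr}`;
    if (!this._entries.has(key)) {
      if (!create) return null; // the case the !entry check relies on
      this._entries.set(key, { factor: 0, versionHash: null });
    }
    return this._entries.get(key);
  }
}
```

Callers that pass create = false then need their own null check, while create = true always yields an entry.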
}); | ||
|
||
it("should remove versions from the reliability manager, when in high reliability mode", async () => { | ||
const opts = Object.assign({}, cacheOpts); |
A random tip: your Node.js engines support object rest/spread natively, so you could take advantage of that in your codebase if you feel inclined:
const opts = { ...cacheOpts };
oh, cool. I'll leave it as is, but this is a cool trick!
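For reference, both forms produce the same shallow copy for a plain source object like this; neither clones nested objects. The cacheOpts shape below is made up for the example and is not the test suite's real options object.

```javascript
// Made-up options object; `nested` is only there to show that copies are shallow.
const cacheOpts = { cachePath: "/tmp/cache", nested: { threshold: 2 } };

const viaAssign = Object.assign({}, cacheOpts);
const viaSpread = { ...cacheOpts };

// Overriding a top-level key on the copy does not touch the original...
viaSpread.cachePath = "/tmp/other";

// ...but both copies still share the same `nested` object with cacheOpts.
```

Spread also makes per-test overrides compact, e.g. { ...cacheOpts, cachePath: "/tmp/other" } in a single expression.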
lib/cache/cache_fs.js
@@ -237,6 +254,14 @@ class PutTransactionFS extends PutTransaction {
        return this._files;
    }

    async invalidate() {
        await super.invalidate();
        for(const f of this._files)
I think this should support a parallel operation with Promise.all:
await super.invalidate();
await Promise.all(this._files.map(f => fs.unlink(f.file)));
this._files = [];
✅
lib/cache/cache_base.js
/** | ||
* | ||
*/ | ||
async _saveDb() { |
Note: this function doesn't await anything, so you can technically remove the async keyword.
✅
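One thing to check before dropping async: an async function always returns a Promise, even with no await inside. Callers that await the result keep working after the change, because await accepts plain values, but a caller that chains .then() on it would break, and a thrown error would surface synchronously instead of as a rejection. The function names below are hypothetical illustrations, not the real _saveDb.

```javascript
// With async, the return value is wrapped in a Promise even without any await.
async function saveDbAsync() {
  return "saved";
}

// The same body without async returns the value directly.
function saveDbPlain() {
  return "saved";
}

// With async, a throw becomes a rejected Promise...
async function failAsync() {
  throw new Error("boom");
}

// ...without async, the same throw escapes synchronously to the caller.
function failPlain() {
  throw new Error("boom");
}
```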
lib/cache/cache_ram.js
@@ -396,38 +372,81 @@ class PutTransactionRAM extends PutTransaction {
            throw new Error("Invalid size for write stream");
        }

        if(type !== 'a' && type !== 'i' && type !== 'r') {
        if(Object.values(consts.FILE_TYPE).indexOf(type) < 0) {
I would recommend using includes so the intent is clearer:
if(!Object.values(consts.FILE_TYPE).includes(type)) {
✅
…e some parallelization of async operations that were previously done in serial.
A couple of questions about the multiClient option, but it looks good everywhere else!
lib/cache/reliability_manager.js
const entry = this.getEntry(params.guidStr, params.hashStr, true);
if(!entry.versionHash) {
    entry.versionHash = params.versionHashStr;
    entry.clientId = params.clientId;
If the entry's clientId is set when the entry is first created, it looks like the first transaction will always be treated as a duplicate when multiClient is enabled.
I think you're right, will look closer at this.
return entry; | ||
} | ||
|
||
entry.clientId = params.clientId; |
With two clients, "A" and "B", if transactions come in the order A->B->A->B, the entry ends up with a reliabilityFactor of 4, but transactions in the order A->A->B->B would end up with a reliabilityFactor of 2. It seems a little odd that the same set of transactions can produce different reliabilityFactors if the timing changes a bit. Is that intended?
Yes, this is intended - I don't want to actually track unique clients. In practice, I believe 99% of the time the default threshold of 2 will be used, so we would hardly gain from the added complexity of keeping track of all unique clients.
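A minimal model of the counting rule as described in this thread, useful for checking both the orderings above and the clientId bug. It assumes: only transactions for one version hash are modeled; with requireUniqueClient on, a transaction from the same client as the last counted one is skipped; only the most recent client is remembered; and a version counts as reliable once its factor reaches the threshold (default 2, per the reply above). createEntry, updateEntry, factorFor and isReliable are hypothetical, not the PR's implementation.

```javascript
// Hypothetical model of the per-version reliability counter discussed above.
// Only transactions that carry the same version hash are modeled.
function createEntry(initialClientId = null) {
  return { factor: 0, clientId: initialClientId };
}

// Count one transaction from `clientId` toward the entry's reliability factor.
function updateEntry(entry, clientId, { requireUniqueClient }) {
  // With requireUniqueClient, a repeat from the last counted client is ignored.
  if (requireUniqueClient && entry.clientId === clientId) {
    return entry;
  }
  entry.factor += 1;
  // Only the most recently counted client is remembered, not a set of all clients.
  entry.clientId = clientId;
  return entry;
}

// Replay a sequence of client IDs against a fresh entry. setClientIdOnCreate
// roughly reproduces the reported bug: the new entry already carries the first client's ID.
function factorFor(order, opts, { setClientIdOnCreate = false } = {}) {
  const entry = createEntry(setClientIdOnCreate ? order[0] : null);
  for (const clientId of order) {
    updateEntry(entry, clientId, opts);
  }
  return entry.factor;
}

// Assumed rule: trusted once the factor reaches the threshold.
const isReliable = (factor, reliabilityThreshold = 2) => factor >= reliabilityThreshold;
```

With requireUniqueClient on, A->B->A->B gives 4 and A->A->B->B gives 2, the numbers from the review, and both clear a threshold of 2, which is the author's point about the default. With setClientIdOnCreate, the sequence A->B only reaches 1 because the first transaction is treated as a duplicate.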
}); | ||
}); | ||
|
||
describe("multiClient", () => { |
It looks like this is testing transactions from 3 different client addresses - "A", "B", and "".
good catch - this test as written was actually masking the bug you found where the clientId was being set on a new entry.
…n incorrectly that would have caught the bug to begin with.
Added a new feature, High Reliability mode. For details, see the documentation added to the README.md file in this PR.
A few supporting refactors are included here: