I'm trying to come up with a peace of code that would fetch centralised cache shared across multiple threads/app instances. Callers might come in swarms. The data is a large set, reads during reloading are not desirable as they would likely yield incorrect/incomplete data. Let's say about once a day (not much more frequent than that) the cache needs to be reloaded, with two triggers: empty cache due to TTL or a business trigger that can not be scheduled ahead of time. Any caller carries enough information to tell if the cache is stale (O(1) check) but not while the cache is being written to - false negatives/positives are possible. Given that, here's what I managed to come up with:
public ImportantData fetchCache() throws InterruptedException { RCountDownLatch latch = getLatch(); RAtomicLong controlValue = getAtomicLong(); long controlValueBefore = controlValue.get(); //1 ImportantData result = readCache(); var someMetadata = readCacheMetadata(); if (!validate(result, someMetadata) || controlValueBefore != controlValue.get()) { //2 if ((controlValueBefore == controlValue.get()) && latch.trySetCount(1)) { //3 try { reloadCache(); controlValue.set(RandomUtils.nextLong()); //4 return readCache(); } finally { latch.countDown(); } } else { latch.await(TIMEOUT, TimeUnit.SECONDS); //5 return readCache(); } } return result; }
My reasoning is:
- If cache appears to be valid and had not been fiddled with during the read //2 is evaluated to false, return. Concurrent, no locks, only 2 extra reads from RAtomicLong
- Once the cache is stale the first thread to notice it passes //3
- Subsequent threads land at //5 because
latch.trySetCount(1)
can't be true untillatch.countDown()
- Once //4 is executed threads with the newer controlValue guaranteed to read fresh cache, threads with the older controlValue can't pass //3 anymore because
(controlValueBefore == controlValue.get()
is false - There is some time between //4 and
latch.countDown()
so in an unlucky event that some thread carrying the old value just passed the first part of //3 right before //4 happened - probably enough forlatch.trySetCount(1)
to still return false
Other than the last point I don't see any other major problems with this, however there's probably something I'm missing besides I'm not sure how the synchronisation tools provided by Redisson work and what kind of guarantees they can give. Take aside that there are probably more practical approaches architecturally (that I'll probably go with), thoughts on the algorithm? I feel like it might be one tweak away from being solid, or one observation away from being discarded completely.