In NativeAPI.actor.cpp, the ::watch ACTOR calls cx->clearWatchMetadata()
when the connectionFileChanged event is triggered. After that, it
creates new watch metadata for itself.
If there are multiple watches, each one that receives
connectionFileChanged clears *all* watch data and creates *one* watch
for itself. This does not make sense. A watch should clear only its own
metadata and then create new metadata for itself.
cx->clearWatchMetadata() was only used there, so it has been removed.
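The difference between the two behaviours can be illustrated with a minimal
sketch. WatchRegistry, WatchMetadata, clearAll and resetOwn are hypothetical
names made up for this illustration; they are not the actual types or
methods in NativeAPI.actor.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the per-key watch metadata kept by the client.
struct WatchMetadata {
    long version;
    std::string value;
};

// Hypothetical stand-in for the database context (not the real DatabaseContext).
struct WatchRegistry {
    std::map<std::string, WatchMetadata> byKey;

    // Old behaviour: any watch that sees the event wipes every entry.
    void clearAll() { byKey.clear(); }

    // Intended behaviour: a watch resets only the entry for its own key.
    void resetOwn(const std::string& key, WatchMetadata fresh) {
        byKey.erase(key);
        byKey.emplace(key, std::move(fresh));
    }
};

// Returns how many entries remain after the watch on "A" handles the event.
inline std::size_t entriesAfterReset(bool clearEverything) {
    WatchRegistry reg;
    reg.byKey.emplace("A", WatchMetadata{100, "10"});
    reg.byKey.emplace("B", WatchMetadata{100, "7"});
    if (clearEverything) {
        reg.clearAll();                                    // also drops "B"
        reg.byKey.emplace("A", WatchMetadata{100, "10"});
    } else {
        reg.resetOwn("A", WatchMetadata{100, "10"});       // "B" is untouched
    }
    return reg.byKey.size();
}
```

With clearEverything set, the metadata for the unrelated key "B" is lost;
with the per-watch reset it survives.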
In the following scenario, a watch might be deleted by mistake:

Two watches over key A

     Watch 1                   Watch 2                             Reference Count
  |                                                                []
  |  Version 100, Value 10
T |  Add reference count                                           [100]
I |  Metadata added
M |  ...
E |                            Version 200, Value 20
  |                            Add reference count                 [100, 200]
  |                            Delete the old metadata
  |                            *together with the reference count* <----- [1]
  |                                                                []
  |                            Trigger watch 1
  |                            Update metadata
  |  Triggered
  |  Delete the reference count                                    []
  |  ...
L |  Version 200, Value 20
I |  Add reference count                                           [200]
N |  Same metadata used
E |                            ...
  |                            Watch 2 cancelled
  |                            Reduce reference count              []
  |                            Delete the metadata
  |                            (watchPromise removed, send broken_promise to listeners)
  |  broken_promise!
  V
By *not* clearing the reference count in [1], and removing only version 100,
this problem is fixed: the watchPromise still has one reference and is
not removed until watch 1 is triggered or cancelled again.
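The fix can be modelled with a small sketch that replays the scenario.
KeyWatchState and its members are hypothetical names chosen for this
illustration, and the bool promiseAlive merely stands in for watchPromise;
the real data structure in NativeAPI.actor.cpp differs in detail.

```cpp
#include <cassert>
#include <set>

// Hypothetical model of the watch state for a single key.
struct KeyWatchState {
    long metadataVersion = -1;
    std::multiset<long> refs;   // one entry per live reference, tagged by version
    bool promiseAlive = false;  // stands in for watchPromise

    void addRef(long version) {
        refs.insert(version);
        promiseAlive = true;
    }

    // Replace the metadata with a newer version. Only references taken at
    // the old version are dropped; newer ones survive (the fix at [1]).
    void replaceMetadata(long oldVersion, long newVersion) {
        refs.erase(oldVersion);  // erases every entry equal to oldVersion
        metadataVersion = newVersion;
    }

    // Release one reference; the promise goes away when none remain.
    void release(long version) {
        auto it = refs.find(version);
        if (it != refs.end()) refs.erase(it);
        if (refs.empty()) promiseAlive = false;
    }
};

// Replays the timeline with the fix applied.
inline bool watch1SurvivesWatch2Cancel() {
    KeyWatchState s;
    s.addRef(100);                // watch 1 registers at version 100
    s.metadataVersion = 100;
    s.addRef(200);                // watch 2 registers at version 200
    s.replaceMetadata(100, 200);  // [1]: only 100 is removed, refs == {200}
    s.release(100);               // watch 1 triggered; 100 is already gone
    s.addRef(200);                // watch 1 re-registers, refs == {200, 200}
    s.release(200);               // watch 2 cancelled, refs == {200}
    return s.promiseAlive && s.refs.size() == 1;
}
```

Because watch 1 still holds a reference at version 200 when watch 2 is
cancelled, the promise is kept and no broken_promise is sent.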
Tested by running 100k correctness tests on branch 7.1. One unrelated
failure occurred, which will be tracked separately.
In the case
1. A watch on key A is set; the watchValueMap ACTOR, denoted X, starts waiting.
2. All watches are cleared due to a connection string change.
3. The watch on key A is restarted with watchValueMap ACTOR Y.
4. X receives the cancel exception and tries to release its reference on the counter. This causes Y to be cancelled.
the reference count will cause the watch to terminate prematurely. Recording
the version of each watch helps prevent this issue.
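The four steps above can be replayed against a plain counter and a
version-tagged one. PlainCounter and VersionedCounter are hypothetical types
for illustration only, and the sketch assumes X and Y register at different
versions (100 and 200 here are made-up values).

```cpp
#include <cassert>
#include <set>

// Hypothetical plain counter: releases are anonymous, so a stale actor can
// drop a reference that belongs to a newer one.
struct PlainCounter {
    int count = 0;
    void add() { ++count; }
    void clear() { count = 0; }
    void release() { if (count > 0) --count; }
    bool alive() const { return count > 0; }
};

// Hypothetical versioned counter: each release names the version it acquired.
struct VersionedCounter {
    std::multiset<long> versions;
    void add(long v) { versions.insert(v); }
    void clear() { versions.clear(); }
    void release(long v) {
        auto it = versions.find(v);
        if (it != versions.end()) versions.erase(it);  // unknown version: no-op
    }
    bool alive() const { return !versions.empty(); }
};

// Returns whether Y's reference is still held after X handles its cancellation.
inline bool plainCounterKeepsY() {
    PlainCounter c;
    c.add();      // 1. X starts waiting
    c.clear();    // 2. all watches cleared
    c.add();      // 3. Y restarts the watch
    c.release();  // 4. X releases, taking Y's reference with it
    return c.alive();
}

inline bool versionedCounterKeepsY() {
    VersionedCounter c;
    c.add(100);      // 1. X starts waiting at version 100
    c.clear();       // 2. all watches cleared
    c.add(200);      // 3. Y restarts the watch at version 200
    c.release(100);  // 4. X releases version 100, which no longer exists
    return c.alive();
}
```

In this model the plain counter drops to zero in step 4, while the versioned
counter ignores the stale release and Y keeps its reference.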