161 Commits

Author SHA1 Message Date
A.J. Beamon
438bc636d5 Rename max_machine_failures_without_losing_X to max_zone_failures_without_losing_X in status. 2019-07-30 14:02:31 -07:00
Evan Tschannen
b303ab4e6c fix: DR agents need to be clients because their failure monitoring information needs to come from two different cluster controllers 2019-07-23 19:24:07 -07:00
Vishesh Yadav
2f29b2c3d1 simulator: Just do a wait() in setupAndRun to avoid destruction
It gets us out of the ACTOR, never clearing the systemActors, and lets
the simulator call exit().
2019-07-09 14:55:20 -07:00
Vishesh Yadav
78a1b2defc simulator: Destroy each process individually in its context
When simulation ends, all the actors are cancelled, and the
destructors which rely on `globals` may not have access to the right
globals (they see the default simulator process globals instead). This
patch calls destroy on each process individually after we context
switch to that process, so that the globals accessed in a destructor are
its own.

This issue arose when trying to get `Peer::peerReferences` in
NetNotifiedQueue, resulting in decrementing the reference count of
peers in FlowTransport object of '0.0.0.0'.
2019-07-09 14:24:16 -07:00
Vishesh Yadav
eabc610daa
Merge pull request #1813 from alexmiller-apple/log-version-4
Add a TLogVersion::V4
2019-07-09 08:42:20 -07:00
Alex Miller
d2ef84a8f9 Add a TLogVersion::V4
And refactor some code to make adding more TLogVersions easier.
2019-07-08 22:22:45 -07:00
Alex Miller
7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and make it easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
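The motivation for the TaskFooPriority -> TaskPriority::Foo translation can be illustrated with a minimal sketch. The enumerator names, their values, and the `scheduleOld`/`scheduleNew` functions below are hypothetical and are not taken from the commit; the sketch only shows why a scoped enum rejects a stray int.

```cpp
#include <string>
#include <type_traits>

// Hypothetical priorities; the real enumerators and values are not shown in this log.
enum class TaskPriority { Low = 1000, Default = 5000, High = 9000 };

// Int-based API: any integer compiles, including one passed by mistake.
inline std::string scheduleOld(int priority) {
    return "old:" + std::to_string(priority);
}

// Enum-based API: only a TaskPriority is accepted, so scheduleNew(4096)
// is rejected by the compiler instead of silently running.
inline std::string scheduleNew(TaskPriority priority) {
    return "new:" + std::to_string(static_cast<int>(priority));
}

// A scoped enum has no implicit conversion from int.
static_assert(!std::is_convertible<int, TaskPriority>::value,
              "ints must not convert to TaskPriority implicitly");
```

With the int-based signature, a call such as `scheduleOld(4096)` compiles even if 4096 was meant to be a buffer size; with the scoped enum, the equivalent call is a compile error.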
mpilman
6ea75713cb Overall framework and first buggify entries 2019-06-16 09:09:09 -07:00
Vishesh Yadav
6b4d30c3ae failmon: Identify client vs server when starting failure monitoring client 2019-06-09 00:43:12 -07:00
sramamoorthy
61e93a9304 Address review comments and minor fixes 2019-05-28 22:07:46 -07:00
sramamoorthy
17ecba8313 trace cleanup and other indentation changes 2019-05-28 22:07:46 -07:00
sramamoorthy
898bed66c1 Allow only whitelisted binary path for exec op 2019-05-28 22:07:46 -07:00
sramamoorthy
a60145b9a1 Restore the cluster in single region configuration 2019-05-28 22:07:46 -07:00
A.J. Beamon
603721e125 Merge branch 'master' into thread-safe-random-number-generation
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/genericactors.actor.cpp
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/workloads/BulkSetup.actor.h
#	flow/ActorCollection.actor.cpp
#	flow/Net2.actor.cpp
#	flow/Trace.cpp
#	flow/flow.cpp
2019-05-23 08:35:47 -07:00
mpilman
20c3f7f264 remove mixed-mode support 2019-05-13 14:15:23 -07:00
mpilman
42385c2f81 Fixed issues introduced during rebase 2019-05-13 14:15:23 -07:00
mpilman
9eeb48c43d Allow to turn on object serializer
This commit includes functionality to turn on
the object serializer for network communication.
This is done the following way:

- On incoming connections, a process will detect
  whether the client supports the object serializer
  and will only serialize responses with it if it does.
- On outgoing connections, the command line flag is used
  to determine whether the object serializer should be used
  to send data.

This way, a cluster can run in mixed mode. To upgrade, one
can upgrade one process at a time and set the flag one process
at a time.

This is how this is tested on the simulator:
- The command line flag can take three options: on, off,
  and random.
- For off, the object serializer will never be used.
- For on, the object serializer will always be used.
- For random, the simulator will flip a coin for each
  process it starts up.
2019-05-13 14:15:22 -07:00
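The selection rules described in the "Allow to turn on object serializer" message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `SerializerFlag`, `useObjectSerializerOutgoing`, and `useObjectSerializerForReply` are hypothetical names, and the coin flip is passed in as a plain bool so the sketch stays deterministic.

```cpp
// Hypothetical model of the three command line options: on, off, and random.
enum class SerializerFlag { Off, On, Random };

// Outgoing connections: the command line flag decides. For Random, the
// simulator is assumed to have flipped a coin once for this process and
// to pass the result in as coinFlip.
inline bool useObjectSerializerOutgoing(SerializerFlag flag, bool coinFlip) {
    switch (flag) {
        case SerializerFlag::On:     return true;
        case SerializerFlag::Off:    return false;
        case SerializerFlag::Random: return coinFlip;
    }
    return false; // unreachable for valid flag values
}

// Incoming connections: reply with the object serializer only if the
// peer was detected to support it, regardless of the local flag.
inline bool useObjectSerializerForReply(bool peerSupportsObjectSerializer) {
    return peerSupportsObjectSerializer;
}
```

Because replies follow the peer's capability and only outgoing traffic follows the local flag, processes with different flag settings can still talk to each other, which is what makes a one-process-at-a-time upgrade possible.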
A.J. Beamon
5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
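The idea of replacing a global generator with a function that returns a thread_local one, and requiring a seed on each thread before use, might look like the sketch below. The class `DeterministicRng` and the function `deterministicRandomSketch()` are hypothetical stand-ins; the real `deterministicRandom()` interface is not shown in this log.

```cpp
#include <cstdint>
#include <random>
#include <stdexcept>

// Hypothetical generator type; only an explicitly seeded instance may be used.
class DeterministicRng {
public:
    void seed(uint32_t s) {
        engine.seed(s);
        seeded = true;
    }

    // Returns a value in [min, maxPlusOne). Throws if this thread never seeded.
    int randomInt(int min, int maxPlusOne) {
        if (!seeded) throw std::logic_error("generator not seeded on this thread");
        uint32_t range = static_cast<uint32_t>(maxPlusOne - min);
        return min + static_cast<int>(engine() % range);
    }

private:
    std::mt19937 engine;
    bool seeded = false;
};

// Each thread gets its own generator, so no locking is needed and one
// thread's draws cannot perturb another thread's sequence.
inline DeterministicRng& deterministicRandomSketch() {
    thread_local DeterministicRng rng;
    return rng;
}
```

Seeding the same thread twice with the same value replays the same sequence, which is the property a deterministic simulation depends on.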
Jingyu Zhou
6870e132b2
Merge branch 'master' into pprof 2019-04-19 14:06:44 -07:00
Andrew Noyes
6207d724f8 Fix all -Wunused-variable warnings 2019-04-15 18:13:00 -07:00
Jingyu Zhou
4b08042a88 Change memory profiling threshold to a flag 2019-04-05 16:33:51 -07:00
A.J. Beamon
614a599a04 Update fdbserver/SimulatedCluster.actor.cpp
Co-Authored-By: mpilman <markus@pilman.ch>
2019-04-05 13:12:19 -07:00
mpilman
1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
mpilman
c008e16c81 Defer formatting in traces to make them cheaper
This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats strings. These are the main
changes:

- TraceEvent::detail now takes a c-string instead of std::string for
  literals. This prevents unnecessary allocations if the trace is not
  going to be printed in the first place (for example for SevDebug).
  Before that, `detail` expected a `std::string` as key, which meant that
  any string literal would be copied on each call.
- Templates Traceable and SpecialTraceMetricType. These templates can be
  specialized for any type that needs to be printed. The actual
  formatting will be deferred to after the `enabled` check. This
  provides two benefits: (1) if a TraceEvent is disabled, we don't pay
  for the formatting and (2) TraceEvent can trace types that it doesn't
  know about.
- TraceEvent::enabled will be set in the constructor if the Severity is
  passed. This will make sure that `TraceEvent::init` is not called.
- `TraceEvent::detail` will be inlined. So for disabled TraceEvent
  calls, a call to detail will only introduce an if-branch, which is much
  cheaper than a function call.
2019-04-05 13:12:19 -07:00
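The deferral described in the "Defer formatting in traces" message can be sketched in a few lines. This is a simplified model under stated assumptions, not the actual `TraceEvent`: the severity values, the `minSeverity` parameter, the `Point` type, and the `pointFormatCalls` counter are all hypothetical and exist only to make the effect observable.

```cpp
#include <string>
#include <utility>
#include <vector>

enum Severity { SevDebug = 5, SevInfo = 10 }; // hypothetical values

// Formatting trait: specialize it for any type that should be traceable.
template <class T>
struct Traceable {
    static std::string toString(const T& value) { return std::to_string(value); }
};

struct Point { int x, y; };   // a type the trace code knows nothing about
static int pointFormatCalls = 0; // counts how often Point is actually formatted

template <>
struct Traceable<Point> {
    static std::string toString(const Point& p) {
        ++pointFormatCalls;
        return "(" + std::to_string(p.x) + "," + std::to_string(p.y) + ")";
    }
};

class TraceEvent {
public:
    // enabled is decided once, in the constructor, from the severity.
    TraceEvent(Severity sev, const char* type, Severity minSeverity = SevInfo)
        : enabled(sev >= minSeverity), type(type) {}

    // Key is a C string, so a literal key costs no allocation. The value is
    // formatted only after the enabled check; a disabled event pays one branch.
    template <class T>
    TraceEvent& detail(const char* key, const T& value) {
        if (enabled) fields.emplace_back(key, Traceable<T>::toString(value));
        return *this;
    }

    const std::vector<std::pair<std::string, std::string>>& getFields() const { return fields; }
    const char* getType() const { return type; }

private:
    bool enabled;
    const char* type;
    std::vector<std::pair<std::string, std::string>> fields;
};
```

A call such as `TraceEvent(SevDebug, "Quiet").detail("P", Point{1, 2})` never invokes `Traceable<Point>::toString` in this sketch, while the same call at `SevInfo` formats the value exactly once.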
Balachandar Namasivayam
f9560e1abd Addressed Review Comments 2019-03-19 15:23:14 -07:00
Balachandar Namasivayam
5471725db5 Support config where the primary and remote DC's can be used as satellites. 2019-03-18 12:17:59 -07:00
Evan Tschannen
82d957e0bb
Merge pull request #1178 from vishesh/task/issue-963-IPv6
IPv6 Support
2019-03-05 17:14:16 -08:00
Vishesh Yadav
e93cd0ff21 Add some checks and comments to IPv6 changes #963 2019-03-04 14:12:45 -08:00
Vishesh Yadav
cc9ad0e202 net: Use IPv6 in simulation testing #963
25% of the time we will use IPv6 addresses
2019-03-04 14:12:45 -08:00
Alex Miller
71a794ccc3 Re-enable spill-by-reference testing. 2019-03-04 01:42:38 -08:00
Alex Miller
59df4ab39a Disable testing spill-by-reference.
There was a bad interaction between Spill-by-reference work (96f5c811)
and some concurrent piece of work that touched txsTag, which is causing
failures.

So let's disable this to get master back to a clean state while we debug why.
2019-02-28 00:16:37 -08:00
Evan Tschannen
8afb7fbb9d
Merge pull request #1160 from alexmiller-apple/tstlog-fork
Spill-By-Reference TLog Part 2: New and Old TLogServers co-exist harmoniously
2019-02-26 18:00:04 -08:00
Alex Miller
2af5025185 Don't bias log_spill in SimulatedCluster. 2019-02-26 17:58:59 -08:00
Alex Miller
d4fe9f905c Let log_spill/log_version default in config, and clean up serialization.
We don't need to abide by object serializer rules yet, and the minor
change to Simulation config lets us test config being the default at the
start of the test.
2019-02-26 17:14:41 -08:00
Alex Miller
2dc57568cb Change many things about log_version.
* log_version in the database (`/conf/log_version`) is now a hint that gets
  rounded to the nearest supported version.
* fdbcli and FDB enforce that only a valid log_version can be configured.
* TLogVersion is persisted in CoreTLogSet (and LogSet and TLogSet)
* Some comments here and there
* Add an assert on filename length to make sure KV-pairs in filename
  don't exceed a maximum length.
2019-02-26 16:47:04 -08:00
Evan Tschannen
b8910ba7cd Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.h
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Evan Tschannen
d008de576e
Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR
Add background actor to remove redundant teams
2019-02-22 14:22:07 -08:00
Alex Miller
6d23eb2d1a Implement log_version.
This mega-commit introduces a new configuration setting, `log_version`,
that controls the TLog implementations and features that are available
within FDB, so that users can opt in to new features if they're willing
to sacrifice backwards compatibility.
2019-02-22 12:15:23 -08:00
Alex Miller
bf8bfb8137 Set log_spill in SimulationConfig.
Which also revealed that it needed to be added to the schema.
2019-02-19 22:30:15 -08:00
mpilman
999ea09bfd Use correct fwd decls in TesterInterface
Also TesterInterface.h -> TesterInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3f0fd2a20c Use fwd decls in WorkerInterface
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3a0f9839b9 Fix minor IDE build errors 2019-02-19 15:16:59 -08:00
mpilman
0bb60e5a3b Use proper fwd decl in NativeAPI
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
mpilman
78dd80ea8a Proper fwd decl in BackupAgent
Also BackupAgent.h -> BackupAgent.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3cb2391b58 use proper fwd declarations in ManagementAPI
Also ManagementAPI.h -> ManagementAPI.actor.h
2019-02-19 15:16:59 -08:00
Vishesh Yadav
124a277a65 Remove coordinator printf in SimulatedCluster 2019-02-19 13:53:17 -08:00
Evan Tschannen
065a45e05f Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen
562b315744 fix: The anti quorum cannot be more than half of the replication factor, or the log system will continue to accept commits when a recovery is impossible 2019-02-18 15:22:32 -08:00
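The constraint stated in this fix can be written as a small validity check. This is only a sketch of the rule as worded in the commit message: the function name `antiQuorumIsSafe` is hypothetical, and whether the actual code compares against an integer-rounded half is an assumption not confirmed by this log.

```cpp
// Returns true when the anti quorum is at most half of the replication
// factor. Comparing 2 * antiQuorum avoids integer-division rounding.
inline bool antiQuorumIsSafe(int antiQuorum, int replicationFactor) {
    return antiQuorum >= 0 && replicationFactor > 0 &&
           2 * antiQuorum <= replicationFactor;
}
```

Under this reading, a replication factor of 3 allows an anti quorum of 0 or 1, and a replication factor of 4 allows up to 2.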
Meng Xu
6d09ac483c Merge with master 2019-02-15 17:03:40 -08:00
Meng Xu
4790a609d5 TeamRemover: Fix bug in generating cluster config
The machine number must be no smaller than the replication factor
of tLogs and storage servers
2019-02-15 15:11:03 -08:00