Taking the pain out of repairs


About The Last Pickle



We work with clients
to deliver and improve
Apache Cassandra based solutions

Based in the USA, New Zealand, Australia, and France

Repair

Pain

Strategies

Incremental Repair

Repair is the way to ensure
consistent data on disk

  • Hinted Hand-off
  • Read Repairs
  • Digest Mismatches
  • Tombstones
Repair Internals


anti-entropy --> validation compaction --> merkle tree --> streaming
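
The flow above can be illustrated with a toy Merkle tree comparison. This is a minimal sketch, not Cassandra's implementation: `token_of`, `leaf_hashes`, `build_tree` and `differing_leaves` are hypothetical names, MD5 stands in for the real partitioner, and replicas are modelled as plain dictionaries. It only shows why comparing trees narrows streaming down to the token buckets that actually differ.

```python
import hashlib


def token_of(key):
    # Hypothetical stand-in for the partitioner: a stable 64-bit hash of the key.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def leaf_hashes(rows, num_leaves):
    """Hash every row into one of num_leaves token buckets (the tree leaves)."""
    buckets = [hashlib.sha256() for _ in range(num_leaves)]
    for key in sorted(rows):
        buckets[token_of(key) % num_leaves].update(f"{key}={rows[key]};".encode())
    return [bucket.digest() for bucket in buckets]


def build_tree(leaves):
    """Return the tree as a list of levels: levels[0] = leaves, levels[-1] = [root]."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("number of leaves must be a power of two")
    levels = [leaves]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([hashlib.sha256(below[i] + below[i + 1]).digest()
                       for i in range(0, len(below), 2)])
    return levels


def differing_leaves(tree_a, tree_b):
    """Walk down from the root, descending only into subtrees whose hashes differ."""
    pending = [(len(tree_a) - 1, 0)]
    mismatched = []
    while pending:
        level, index = pending.pop()
        if tree_a[level][index] == tree_b[level][index]:
            continue  # identical subtree: nothing to stream
        if level == 0:
            mismatched.append(index)
        else:
            pending += [(level - 1, 2 * index), (level - 1, 2 * index + 1)]
    return sorted(mismatched)


# Two toy replicas that disagree on a single key.
replica_a = {f"key{i}": f"value{i}" for i in range(100)}
replica_b = dict(replica_a, key42="stale")
tree_a = build_tree(leaf_hashes(replica_a, 8))
tree_b = build_tree(leaf_hashes(replica_b, 8))
print(differing_leaves(tree_a, tree_b))  # only the bucket holding key42
```

Building the leaves reads every row, which is the validation compaction cost; only the mismatched buckets would then be streamed.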

Repair Internals

[Diagram: StorageService, RepairSession, RepairJob (Merkle Tree), Differencer, StreamingRepairTask]

the pain

  • merkle tree
    • CPU & IO intensive
    • Memory intensive
    • Busts fs cache

  • parallel replica load
  • over streaming
take the pain away
  • Don't Repair
  • Break It Down
  • Incremental Repairs
take the pain away

Don't Repair

take the pain away

Break It Down


$ bin/nodetool repair -pr
$ bin/nodetool repair --in-local-dc
$ bin/nodetool repair -st 3074457345618258602 -et -9223372036854775808


For automatic range repairs
see https://github.com/BrianGallew/cassandra_range_repair
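
As a rough illustration of breaking a repair into subranges, the sketch below splits the full Murmur3 token ring into equal steps and prints one `-st`/`-et` command per step. It assumes the Murmur3 partitioner's token bounds; `split_ring` and `repair_commands` are hypothetical helper names, and a real tool such as the one linked above works from each node's own ranges rather than the whole ring.

```python
MIN_TOKEN = -2**63       # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1    # Murmur3Partitioner maximum token


def split_ring(steps):
    """Split the whole ring into `steps` contiguous (start, end] ranges."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (span * i) // steps for i in range(steps + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(steps, keyspace):
    """Build one subrange repair command per step (strings only, nothing is executed)."""
    return [f"bin/nodetool repair -st {start} -et {end} {keyspace}"
            for start, end in split_ring(steps)]


for command in repair_commands(3, "keyspace1"):
    print(command)
```

Smaller steps mean smaller Merkle trees and less over streaming per run, at the cost of more repair sessions.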

take the pain away

Incremental Repairs

Incremental Repairs

  • Concept
  • Complexities
  • Migration
  • Problems
Incremental Repairs

SSTables are immutable,
so repair was processing data
that had not changed
Incremental Repairs

Repairing less data, more frequently,
reduces the impact on the cluster
Incremental Repairs

Track repaired timestamp per SSTable
Ignore SSTables that have been repaired
Incremental Repairs

$ tools/bin/sstablemetadata keyspace1-standard1-ka-5-Data.db
SSTable:
...
Repaired at: 0

$ bin/nodetool repair --incremental
$ tools/bin/sstablemetadata keyspace1-standard1-ka-5-Data.db
SSTable:
...
Repaired at: 1454500502306
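
The `Repaired at` value is an epoch timestamp in milliseconds, with 0 meaning the SSTable has never been repaired. A small sketch for reading it out of the tool's text output follows; `repaired_at` is a hypothetical helper, and it assumes the output contains a line shaped exactly like the one shown above.

```python
import re
from datetime import datetime, timedelta, timezone


def repaired_at(metadata_text):
    """Return None for an unrepaired SSTable, else the repair time in UTC."""
    match = re.search(r"^Repaired at:\s*(\d+)\s*$", metadata_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Repaired at' line found")
    millis = int(match.group(1))
    if millis == 0:
        return None
    # Integer arithmetic avoids float rounding of the millisecond part.
    seconds, remainder = divmod(millis, 1000)
    return (datetime.fromtimestamp(seconds, tz=timezone.utc)
            + timedelta(milliseconds=remainder))


print(repaired_at("SSTable:\n...\nRepaired at: 0\n"))
print(repaired_at("SSTable:\n...\nRepaired at: 1454500502306\n"))
```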
Incremental Repairs

Anticompaction splits partially repaired SSTables
into repaired and unrepaired SSTables
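
A toy model of that split, assuming an SSTable can be represented as a dictionary of token to row and that the repaired range is `(start, end]` on the ring, wrapping when `start >= end`. `in_range` and `anticompact` are hypothetical names; this sketches the idea only and is not how Cassandra rewrites files.

```python
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def in_range(token, start, end):
    """True if token lies in (start, end]; start >= end means the range wraps."""
    if start < end:
        return start < token <= end
    return token > start or token <= end


def anticompact(partitions, start, end):
    """Split one SSTable's partitions into (repaired, unrepaired) by token."""
    repaired, unrepaired = {}, {}
    for token, row in partitions.items():
        target = repaired if in_range(token, start, end) else unrepaired
        target[token] = row
    return repaired, unrepaired


sstable = {MIN_TOKEN + 1: "a", 0: "b", 3074457345618258603: "c", MAX_TOKEN: "d"}
repaired, unrepaired = anticompact(sstable, 3074457345618258602, MIN_TOKEN)
print(sorted(repaired.values()), sorted(unrepaired.values()))
```

Each input SSTable that straddles the repaired range yields two outputs, which is why a subrange incremental repair increases the SSTable count.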
Incremental Repairs

$ ls -lah keyspace1/standard1
keyspace1-standard1-ka-9-Data.db
keyspace1-standard1-ka-10-Data.db

$ bin/nodetool repair --incremental \
           -st 3074457345618258602 -et -9223372036854775808

$ ls -lah keyspace1/standard1
keyspace1-standard1-ka-11-Data.db
keyspace1-standard1-ka-12-Data.db
keyspace1-standard1-ka-13-Data.db
keyspace1-standard1-ka-14-Data.db
Incremental Repairs

$ tools/bin/sstablemetadata keyspace1-standard1-ka-11-Data.db
...
Repaired at: 1454504607623

$ tools/bin/sstablemetadata keyspace1-standard1-ka-12-Data.db
...
Repaired at: 0

$ tools/bin/sstablemetadata keyspace1-standard1-ka-13-Data.db
...
Repaired at: 1454504607623

$ tools/bin/sstablemetadata keyspace1-standard1-ka-14-Data.db
...
Repaired at: 0
Incremental Repairs

Two sets of SSTables are maintained:
Repaired and Un-repaired
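
The two sets can be pictured with the four SSTables and `Repaired at` values from the listing above. A minimal sketch; the `SSTable` record and `partition_sets` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    name: str
    repaired_at: int  # epoch milliseconds; 0 means never repaired


def partition_sets(sstables):
    """Return (repaired, unrepaired); only the second set needs validating next time."""
    repaired = [s for s in sstables if s.repaired_at > 0]
    unrepaired = [s for s in sstables if s.repaired_at == 0]
    return repaired, unrepaired


sstables = [
    SSTable("keyspace1-standard1-ka-11-Data.db", 1454504607623),
    SSTable("keyspace1-standard1-ka-12-Data.db", 0),
    SSTable("keyspace1-standard1-ka-13-Data.db", 1454504607623),
    SSTable("keyspace1-standard1-ka-14-Data.db", 0),
]
repaired_set, unrepaired_set = partition_sets(sstables)
print([s.name for s in unrepaired_set])
```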
Proper Migration

  1. Disable compaction
  2. Run a classic full repair
  3. Stop the node
  4. Run sstablerepairedset on all SSTables
  5. Start the node
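
Step 4 can be scripted while the node is stopped. The sketch below only builds the command line and does not run it. It assumes the tool lives under `tools/bin/` like `sstablemetadata` above and accepts `--really-set --is-repaired` followed by Data.db paths; check the tool's usage output for your version. `data_files` and `mark_repaired_command` are hypothetical helper names.

```python
from pathlib import Path


def data_files(table_dir):
    """List the Data.db component of every SSTable in one table directory."""
    return sorted(str(path) for path in Path(table_dir).glob("*-Data.db"))


def mark_repaired_command(sstables, tool="tools/bin/sstablerepairedset"):
    """Build (but do not run) the command that marks the given SSTables repaired."""
    if not sstables:
        raise ValueError("no SSTables given")
    # Assumed flags: --really-set confirms the change, --is-repaired sets the state.
    return [tool, "--really-set", "--is-repaired", *sstables]


# Example with explicit paths taken from the earlier listing.
print(" ".join(mark_repaired_command([
    "keyspace1/standard1/keyspace1-standard1-ka-9-Data.db",
    "keyspace1/standard1/keyspace1-standard1-ka-10-Data.db",
])))
```

The resulting list could be passed to `subprocess.run` once the node is confirmed to be down.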
Simple Migration

the other way
Problems

  • 2x compactions
  • Slow? Use: >2.1.12, >2.2.4, >3.0.1, or 3.1
  • Try first with STCS tables
Nerdy Stuff

  • Consistency Level: REPAIRED_QUORUM
  • Aggregate Functions
  • Repair Aware gc_grace_seconds (3.0)
  • Automatic Repair Scheduling? (CASSANDRA-10070)