Taking the pain out of repairs


About The Last Pickle

We work with clients
to deliver and improve
Apache Cassandra-based solutions

Based in
USA, New Zealand, Australia, France

Repair

Pain

Strategies

Incremental Repair

Repair is the way to ensure
consistent data on disk

Hinted Hand-off
Read Repairs
Digest Mismatches
Tombstones

Repair Internals

anti-entropy
  --> validation compaction
  --> merkle tree
  --> streaming
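The validation, merkle tree and streaming steps can be pictured with a toy bash model. This is not Cassandra's code: each replica's data for a token range is a `|`-separated list of per-subrange payloads (a hypothetical encoding), `cksum` stands in for a real hash function, and the tree has only two levels (a root over the leaves). Replicas compare roots first and walk the leaves only when the roots differ; the mismatched leaves are what streaming would then repair.

```bash
#!/usr/bin/env bash
# Toy two-level Merkle tree comparison between two replicas.
# Assumption: replica data is encoded as "payload0|payload1|...",
# one payload per token subrange (one leaf each).

# Hash a string; cksum (POSIX CRC) stands in for a real digest.
leaf_hash() {
  printf '%s' "$1" | cksum | cut -d' ' -f1
}

# Root hash = hash of the space-separated leaf hashes.
merkle_root() {
  local leaves="" leaf
  for leaf in "$@"; do
    leaves+="$(leaf_hash "$leaf") "
  done
  leaf_hash "$leaves"
}

# Print the index of every leaf whose hash differs between replicas.
mismatched_leaves() {
  local -a a b
  local i
  IFS='|' read -r -a a <<< "$1"
  IFS='|' read -r -a b <<< "$2"
  # Equal roots: the replicas agree, nothing needs streaming.
  if [[ "$(merkle_root "${a[@]}")" == "$(merkle_root "${b[@]}")" ]]; then
    return 0
  fi
  # Roots differ: compare leaf by leaf (assumes equal leaf counts).
  for i in "${!a[@]}"; do
    if [[ "$(leaf_hash "${a[i]}")" != "$(leaf_hash "${b[i]}")" ]]; then
      echo "$i"
    fi
  done
}
```

For example, `mismatched_leaves 'k1=a|k2=b|k3=c' 'k1=a|k2=X|k3=c'` prints `1`: only the second subrange would be streamed.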

Repair Internals

StorageService
  --> RepairSession
  --> RepairJob (Merkle Tree)
  --> Differencer
  --> StreamingRepairTask

the pain

merkle tree
- CPU & IO intensive
- Memory intensive
- Busts fs cache

parallel load on all replicas
over-streaming
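Over-streaming comes from the tree's resolution: when one leaf hash differs, every partition covered by that leaf is streamed, not only the partition that changed. The bash arithmetic below estimates this, assuming partitions are spread evenly across leaves. The 32768-leaf (2^15) figure used in the example is an assumption about the maximum tree size in Cassandra 2.x; check it for your version.

```bash
#!/usr/bin/env bash
# partitions_streamed PARTITIONS LEAVES MISMATCHED_LEAVES
# Estimates how many partitions are streamed when some leaves mismatch.
# Assumption: partitions are spread evenly over the leaves.
partitions_streamed() {
  local partitions=$1 leaves=$2 mismatched=$3
  # Partitions covered by one leaf, rounded up (integer ceiling).
  local per_leaf=$(( (partitions + leaves - 1) / leaves ))
  echo $(( per_leaf * mismatched ))
}
```

With 1,000,000,000 partitions, 32768 leaves and 10 mismatched leaves, `partitions_streamed 1000000000 32768 10` gives 305180, even if only 10 partitions actually differ. Repairing a range holding 10,000,000 partitions with its own tree gives 3060 for the same 10 leaves, which is the intuition behind breaking repairs down into smaller ranges.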

take the pain away

Don't Repair
Break It Down
Incremental Repairs

take the pain away

Don't Repair

take the pain away

Break It Down

$ bin/nodetool repair -pr
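`-pr` restricts the repair to the node's primary range, so running it once on every node covers the ring without repairing each range once per replica. For a ring where each node owns a single token (no vnodes, an assumption of this sketch), the node with token T is primary for (previous token, T], and the lowest token wraps around to the highest. `primary_range` is a hypothetical helper, not a Cassandra tool.

```bash
#!/usr/bin/env bash
# primary_range NODE_TOKEN RING_TOKEN...
# Prints "START END" for the node's primary range (START, END].
# Assumes one token per node (no vnodes).
primary_range() {
  local node=$1 prev="" last="" t
  shift
  # Walk the ring in token order; sort -n orders negative tokens too.
  while read -r t; do
    if (( t < node )); then
      prev=$t
    fi
    last=$t
  done < <(printf '%s\n' "$@" | sort -n)
  # The lowest token's range starts at the highest token (wrap-around).
  echo "${prev:-$last} $node"
}
```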

$ bin/nodetool repair --in-local-dc

$ bin/nodetool repair -st 3074457345618258602 -et -9223372036854775808

For automatic range repairs
see https://github.com/BrianGallew/cassandra_range_repair
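The core idea behind automatic range repair is cutting a token range into smaller pieces and repairing each with `-st`/`-et`. A minimal dry-run sketch of that idea (not cassandra_range_repair's actual code; `split_range` and `repair_commands` are hypothetical helpers) prints the commands instead of running them. It only handles non-wrapping ranges with START < END that are much wider than N, which is realistic for token ranges.

```bash
#!/usr/bin/env bash
# split_range START END N
# Splits the token range (START, END] into N contiguous subranges.
# Assumes START < END, no wrap-around, and END - START much larger than N.
split_range() {
  local start=$1 end=$2 n=$3 i t next
  # Divide each bound first so the width never overflows 64-bit math.
  local step=$(( end / n - start / n ))
  t=$start
  for (( i = 1; i < n; i++ )); do
    next=$(( t + step ))
    echo "$t $next"
    t=$next
  done
  # The last subrange absorbs any rounding remainder.
  echo "$t $end"
}

# repair_commands START END N
# Prints (does not run) one nodetool command per subrange.
repair_commands() {
  local st et
  while read -r st et; do
    echo "bin/nodetool repair -st $st -et $et"
  done < <(split_range "$@")
}
```

For example, `repair_commands -3074457345618258603 3074457345618258602 4` prints four commands whose subranges together cover the original range.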

take the pain away

Incremental Repairs

Concept
Complexities
Migration
Problems

Incremental Repairs

SSTables are immutable,
so each full repair re-processed data
that had not changed since the last repair

Incremental Repairs

Repairing less data, more frequently,
reduces the impact on the cluster

Incremental Repairs

Track repaired timestamp per SSTable
Ignore SSTables that have been repaired
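The bookkeeping amounts to a filter over SSTables. In this toy bash sketch, each input line is `NAME REPAIRED_AT` (a hypothetical format), where 0 means never repaired, as `sstablemetadata` reports it. Only the unrepaired SSTables are selected for the next incremental repair.

```bash
#!/usr/bin/env bash
# unrepaired_sstables: reads "NAME REPAIRED_AT" lines on stdin
# (hypothetical format) and prints the SSTables still to be repaired.
unrepaired_sstables() {
  local name repaired_at
  while read -r name repaired_at; do
    # A repaired-at of 0 means the SSTable was never repaired.
    if [[ $repaired_at == 0 ]]; then
      echo "$name"
    fi
  done
}
```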

Incremental Repairs

$ tools/bin/sstablemetadata keyspace1-standard1-ka-5-Data.db
SSTable:
...
Repaired at: 0

$ bin/nodetool repair --incremental
$ tools/bin/sstablemetadata keyspace1-standard1-ka-5-Data.db
SSTable:
...
Repaired at: 1454500502306
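`Repaired at` is a timestamp in milliseconds since the Unix epoch, with 0 meaning never repaired. A small hypothetical helper makes it readable; it assumes GNU `date` (BSD and macOS `date` use `-r` rather than `-d @`).

```bash
#!/usr/bin/env bash
# repaired_at_utc MILLIS
# Converts a "Repaired at" value to an ISO-8601 UTC timestamp.
# Assumes GNU date.
repaired_at_utc() {
  local millis=$1
  if (( millis == 0 )); then
    echo "unrepaired"
  else
    # The value is in milliseconds; date -d @N expects seconds.
    date -u -d "@$(( millis / 1000 ))" '+%Y-%m-%dT%H:%M:%SZ'
  fi
}
```

`repaired_at_utc 1454500502306` prints `2016-02-03T11:55:02Z`.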

Incremental Repairs

Anti-compaction splits partially repaired SSTables
into
Repaired and Un-repaired SSTables
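The rule behind that split can be modelled in a few lines. In this toy bash sketch (a hypothetical helper; real anti-compaction rewrites SSTable files), an SSTable is just a list of partition tokens, and the repaired range (START, END] is assumed not to wrap around. Tokens inside the range go to the repaired output, the rest to the un-repaired output.

```bash
#!/usr/bin/env bash
# anticompact START END TOKEN...
# Prints which partition tokens land in the repaired and un-repaired outputs.
anticompact() {
  local start=$1 end=$2 t
  local -a repaired=() unrepaired=()
  shift 2
  for t in "$@"; do
    # Token ranges are start-exclusive and end-inclusive: (START, END].
    if (( t > start && t <= end )); then
      repaired+=("$t")
    else
      unrepaired+=("$t")
    fi
  done
  echo "repaired: ${repaired[*]}"
  echo "unrepaired: ${unrepaired[*]}"
}
```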

Incremental Repairs

$ ls -lah keyspace1/standard1
keyspace1-standard1-ka-9-Data.db
keyspace1-standard1-ka-10-Data.db

$ bin/nodetool repair --incremental \
           -st 3074457345618258602 -et -9223372036854775808

$ ls -lah keyspace1/standard1
keyspace1-standard1-ka-11-Data.db
keyspace1-standard1-ka-12-Data.db
keyspace1-standard1-ka-13-Data.db
keyspace1-standard1-ka-14-Data.db

Incremental Repairs

$ tools/bin/sstablemetadata keyspace1-standard1-ka-11-Data.db
...
Repaired at: 1454504607623

$ tools/bin/sstablemetadata keyspace1-standard1-ka-12-Data.db
...
Repaired at: 0

$ tools/bin/sstablemetadata keyspace1-standard1-ka-13-Data.db
...
Repaired at: 1454504607623

$ tools/bin/sstablemetadata keyspace1-standard1-ka-14-Data.db
...
Repaired at: 0
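Checking SSTables one at a time gets tedious. This hypothetical helper reads `sstablemetadata` output on stdin and reports which set the SSTable belongs to; it relies only on the `Repaired at:` line and reports `unknown` if that line is missing.

```bash
#!/usr/bin/env bash
# sstable_state: classifies one SSTable from sstablemetadata output on stdin.
sstable_state() {
  local repaired_at
  # Take the value after "Repaired at:" from the first matching line.
  repaired_at=$(awk -F': *' '/^Repaired at:/ { print $2; exit }')
  if [[ -z $repaired_at ]]; then
    echo "unknown"
  elif [[ $repaired_at == 0 ]]; then
    echo "unrepaired"
  else
    echo "repaired"
  fi
}
```

For example, piping `tools/bin/sstablemetadata keyspace1-standard1-ka-12-Data.db` into `sstable_state` would print `unrepaired` for the output above.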

Incremental Repairs

Two sets of SSTables are maintained:
Repaired and Un-repaired

Proper Migration

Disable compaction
Run a classic full repair
Stop the node
Mark all SSTables as repaired with sstablerepairedset
Start the node
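The steps above can be sketched as a per-node dry run that prints commands instead of executing them. The install paths, the `service` commands and the data-directory glob are assumptions (from 2.1, table directories usually carry an ID suffix); `migration_plan` is a hypothetical helper. Run it one node at a time and review the output before using it.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the proper migration for ONE node: prints, never runs.
# Hypothetical defaults; override for your install.
CASSANDRA_HOME=${CASSANDRA_HOME:-/opt/cassandra}
DATA_DIR=${DATA_DIR:-/var/lib/cassandra/data}

# migration_plan KEYSPACE TABLE
migration_plan() {
  local keyspace=$1 table=$2
  # 1. Disable compaction so the SSTable set stays stable.
  echo "$CASSANDRA_HOME/bin/nodetool disableautocompaction $keyspace $table"
  # 2. Classic full repair (the default in 2.1; add -full on 2.2+).
  echo "$CASSANDRA_HOME/bin/nodetool repair $keyspace $table"
  # 3. Flush and stop the node (service command is an assumption).
  echo "$CASSANDRA_HOME/bin/nodetool drain"
  echo "sudo service cassandra stop"
  # 4. Mark the table's SSTables as repaired (glob is illustrative).
  echo "$CASSANDRA_HOME/tools/bin/sstablerepairedset --really-set --is-repaired $DATA_DIR/$keyspace/$table*/*-Data.db"
  # 5. Start the node again.
  echo "sudo service cassandra start"
}
```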

Simple Migration

the other way

Problems

2x compactions
Slow? Use: >2.1.12, >2.2.4, >3.0.1, or 3.1
Try first with STCS tables

Nerdy Stuff

Consistency Level: REPAIRED_QUORUM
Aggregate Functions
Repair Aware gc_grace_seconds (3.0)
Automatic Repair Scheduling? (CASSANDRA-10070)
