TWCS in Cassandra is not working as expected - Stack Overflow


We have a table using TWCS with a 5-day TTL and a 1-day gc_grace_seconds:

compaction = {'class':'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy','compaction_window_size':'12','compaction_window_unit':'HOURS','max_threshold':'32','min_threshold':'4'}

This should mean the table ends up with roughly 10 SSTables, since the compaction window size is 12 hours (2 SSTables per day, kept for the 5-day TTL).

But when we run sstablemetadata we see 19 SSTables on disk, and SSTables are not deleted even when maxDeletionTime shows yesterday's date. What could be the reason for this?

We were expecting 10 SSTables, since the compaction window size is 12 hours (2 SSTables per day, with a 5-day TTL).
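For context, this is roughly how we verify the applied settings and count the data files on disk (ks.events stands in for our real keyspace and table, and the data path is the default one):

    # confirm the compaction options and gc_grace_seconds actually in effect
    # (default_time_to_live will be 0 if the TTL is set per write)
    cqlsh -e "SELECT compaction, gc_grace_seconds, default_time_to_live
              FROM system_schema.tables
              WHERE keyspace_name = 'ks' AND table_name = 'events';"

    # count the SSTable data files for the table
    ls /var/lib/cassandra/data/ks/events-*/ | grep -c 'Data.db'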


  • Sample sstablemetadata output showing maxDeletionTime of March 4th (which was yesterday): { "10.0.0.1": { "metadata": [ { "sstable": "nb-109126-big-Data.db", "minTimestamp": "2025-02-27T00:00:00.000+00:00", "maxTimestamp": "2025-02-27T00:00:00.000+00:00", "fileTimestamp": "2025-03-05T00:00:00.000+00:00", "duration": "9m 60s", "minDeletionTime": "2025-02-27T00:00:00.000+00:00", "maxDeletionTime": "2025-03-04T00:00:00.000+00:00", "droppable": "2740.0" } – user29906820 Commented Mar 5 at 23:06

2 Answers


There are a couple of obvious potential reasons this can happen:

  • The gc_grace_seconds of the table is still the default of 10 days, which would mean that although the data has TTL'd, it will not be removed until 10 days after the TTL expired (see the sketch after this list for how to check and change it).
  • The partitioning of the table is creating shadows, e.g. the same partition is spread across multiple time windows. This is a lot harder to deal with. There is a flag to bypass the shadow check, appropriately named unsafe_aggressive_sstable_expiration, but it should not be used without properly understanding whether it applies and whether it is safe in your specific instance.
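For the first point, a quick sanity check along these lines (ks.events is a placeholder for your keyspace and table) shows the current gc_grace_seconds and lowers it if it really was left at the default:

    # check the current gc_grace_seconds (the default is 864000 seconds = 10 days)
    cqlsh -e "SELECT gc_grace_seconds FROM system_schema.tables
              WHERE keyspace_name = 'ks' AND table_name = 'events';"

    # set it to 1 day, as described in the question
    # (only do this if repairs reliably complete within that window)
    cqlsh -e "ALTER TABLE ks.events WITH gc_grace_seconds = 86400;"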

It's possible that you've got data mixed between SSTables. Do you have read repairs enabled, or do you potentially mix your writes by issuing updates? If so, you may have timestamp overlaps in your SSTables, which will cause Cassandra to block dropping expired SSTables. You can check whether this is the case with sstableexpiredblockers.

https://cassandra.apache.org/doc/stable/cassandra/tools/sstable/sstableexpiredblockers.html
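Roughly like this, run on the node that owns the data (keyspace and table names below are placeholders for yours):

    # reports which SSTables are blocking fully expired SSTables from being dropped
    sstableexpiredblockers ks events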

You can also directly check for overlap between your SSTables by comparing the min and max timestamps. Check this post from The Last Pickle that uses sstablemetadata.

https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
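A rough version of that check is a loop like the one below (the exact labels printed by sstablemetadata differ a little between Cassandra versions, and the data path assumes the default layout):

    # print min/max timestamps and deletion times for every SSTable of the table
    cd /var/lib/cassandra/data/ks/events-*/
    for f in *-Data.db; do
        echo "== $f"
        sstablemetadata "$f" | grep -iE 'minimum timestamp|maximum timestamp|deletion time'
    done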

As mentioned by Andrew, you can have Cassandra ignore the overlap by setting unsafe_aggressive_sstable_expiration, which will delete old SSTables once they expire. I've found this very helpful for managing TWCS tables, but it can cause deleted data to reappear, so make sure you fully understand it before enabling it.
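If you do go that route, it is just another option in the TWCS compaction map. A sketch using the settings from the question (ks.events is a placeholder, and depending on your version the node may also need the JVM property -Dcassandra.allow_unsafe_aggressive_sstable_expiration=true before the option is honoured):

    # rewrite the full compaction map, since ALTER TABLE replaces it wholesale
    cqlsh -e "ALTER TABLE ks.events WITH compaction = {
        'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
        'compaction_window_size': '12',
        'compaction_window_unit': 'HOURS',
        'unsafe_aggressive_sstable_expiration': 'true'};"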
