Cassandra cfstats: differences between Live and Total used space values
For about a month I have been seeing the following used-space values on 3 nodes of my Cassandra cluster (I have replication factor = 3) in the nodetool cfstats output:
Pending Tasks: 0
Column Family: binarydata
SSTable count: 8145
Space used (live): 787858513883
Space used (total): 1060488819870
For the other nodes I see values like:
Space used (live): 780599901299
Space used (total): 780599901299
As you can see, there is a 25% difference (~254 GiB) between live and total space. It seems I have a lot of garbage on these 3 nodes that cannot be compacted for some reason. The column family I'm talking about has the leveled compaction strategy configured with an sstable size of 100 MB:
create column family binarydata with key_validation_class=UTF8Type and compaction_strategy=LeveledCompactionStrategy and compaction_strategy_options={sstable_size_in_mb: 100};
Note that the total value has stayed like this for a month on each of the 3 nodes. I relied on Cassandra to normalize the data automatically.
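The size of the gap can be checked directly from the two cfstats counters quoted above. The following is only a minimal Python sketch of that arithmetic; the helper name `space_gap` is made up for illustration and is not part of any Cassandra tooling.

```python
from typing import Tuple

# Counters copied from the cfstats output quoted in the question.
LIVE_BYTES = 787_858_513_883
TOTAL_BYTES = 1_060_488_819_870


def space_gap(live_bytes: int, total_bytes: int) -> Tuple[int, float]:
    """Return (bytes counted in total but not in live, that gap as a fraction of total)."""
    gap = total_bytes - live_bytes
    return gap, gap / total_bytes


gap_bytes, gap_fraction = space_gap(LIVE_BYTES, TOTAL_BYTES)
print(f"gap: {gap_bytes} bytes")
print(f"gap: {gap_bytes / 2**30:.1f} GiB")   # binary units (2**30 bytes)
print(f"gap: {gap_bytes / 10**9:.1f} GB")    # decimal units (10**9 bytes)
print(f"gap: {gap_fraction:.1%} of total")
```

With these numbers the gap is 272630305987 bytes, which is roughly 254 GiB (or about 273 GB in decimal units) and a little over 25% of the total.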
Here is what I have tried in order to decrease the space (without result):
- nodetool cleanup
- nodetool repair -pr
- nodetool compact [keyspace] binarydata (nothing happens: major compaction is ignored for the leveled compaction strategy)
Are there any other things I should try in order to clean up the garbage and free the space?
Leveled compaction creates sstables of a fixed, relatively small size, in your case 100 MB, that are grouped into "levels". Within each level, sstables are guaranteed to be non-overlapping. Each level is ten times as large as the previous.
So, from this statement provided in the Cassandra doc, we can conclude that in your case the next, ten times larger level may not have been formed in the background yet, resulting in no compaction.
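To make the "ten times as large" rule concrete, here is a rough Python sketch that spreads the 8145 sstables from the question over levels. It is an idealized model, not Cassandra's actual manifest logic: it assumes level n holds at most 10**n sstables, that every sstable is a full 100 MB, that levels fill from the lowest one up, and it ignores L0. The function name `fill_levels` is hypothetical.

```python
from typing import List

SSTABLE_SIZE_MB = 100   # sstable_size_in_mb from the column family definition
SSTABLE_COUNT = 8145    # "SSTable count" from the cfstats output


def fill_levels(sstable_count: int, fanout: int = 10) -> List[int]:
    """Distribute sstables over L1, L2, ... assuming level n holds up to fanout**n sstables."""
    levels = []
    remaining = sstable_count
    level = 1
    while remaining > 0:
        capacity = fanout ** level          # assumed capacity of this level, in sstables
        used = min(capacity, remaining)
        levels.append(used)
        remaining -= used
        level += 1
    return levels


for n, count in enumerate(fill_levels(SSTABLE_COUNT), start=1):
    capacity = 10 ** n
    print(f"L{n}: {count} of {capacity} sstables, about {count * SSTABLE_SIZE_MB} MB")
```

Under these assumptions the sstables would occupy four levels, with the fourth level only partly filled (7035 of 10000 sstables).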
Coming to your second question: since you have kept the replication factor at 3, the data has 3 duplicate copies, which is why you see this anomaly.
And the 25% difference between live and total space is, as you know, due to the deletion operation.