There are two problems that we would like to address to make the datastore faster and more efficient in terms of storage space.

Excessive Invalid Index Entries

The first problem is the buildup of invalid index entries in the database. They slow down every query and make certain queries (e.g., lastmod_ts > DATETIME) unacceptably slow.

We can mitigate this buildup by making put and delete operations atomic. This would allow us to delete stale index entries immediately whenever the datastore operation is not part of a transaction.

Since the pycassa interface does not support the type of batch statements that we would need to implement this feature, we would first need to migrate to cassandra-driver.
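As a rough illustration of what an atomic put might group together, the sketch below collects the statements for one non-transactional put: deletes for stale index entries, inserts for new index entries, and the entity write. The table names, key layout, and the helpers index_key and build_put_batch are hypothetical and do not reflect the actual AppScale schema.

```python
# Hypothetical table names; the real schema may differ.
INDEX_TABLE = "property_index"
ENTITY_TABLE = "entities"


def index_key(kind, prop, value, entity_key):
    """Build an illustrative index row key: kind/property/value/entity."""
    return "/".join([kind, prop, str(value), entity_key])


def build_put_batch(kind, entity_key, old_props, new_props):
    """Return (cql, params) pairs for one non-transactional put.

    Stale index entries are deleted alongside the writes for the new
    entity and its new index entries, so all of them can be submitted
    as a single batch.
    """
    statements = []
    # Delete index entries whose value changed or whose property was removed.
    for prop, old_value in sorted(old_props.items()):
        if prop not in new_props or new_props[prop] != old_value:
            statements.append((
                "DELETE FROM {} WHERE key = %s".format(INDEX_TABLE),
                (index_key(kind, prop, old_value, entity_key),),
            ))
    # Insert index entries for new or changed values.
    for prop, new_value in sorted(new_props.items()):
        if prop not in old_props or old_props[prop] != new_value:
            statements.append((
                "INSERT INTO {} (key, reference) VALUES (%s, %s)".format(
                    INDEX_TABLE),
                (index_key(kind, prop, new_value, entity_key), entity_key),
            ))
    # Write the entity itself (serialization here is a placeholder).
    statements.append((
        "INSERT INTO {} (key, props) VALUES (%s, %s)".format(ENTITY_TABLE),
        (entity_key, repr(sorted(new_props.items()))),
    ))
    return statements


batch = build_put_batch(
    "Greeting", "guestbook/1",
    {"author": "alice", "rank": 1},
    {"author": "bob", "rank": 1},
)
```

With cassandra-driver, each pair could presumably be added to a cassandra.query.BatchStatement via batch.add(statement, parameters) and sent with session.execute(batch), so that the index deletes and the entity write succeed or fail together.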

Excessive Work Trying to Validate Index Entries

The second problem is the necessity of verifying every index entry before using it during a query.

We can reduce the amount of validation work by appending a marker to any index entry that is written as part of a transaction. We would also need to add this marker to the other index entries that the transactional operation affects, such as index entries for previous values of that property.

This would allow us to skip the validity check for index entries that don't have this marker. It would give the additional benefit of speeding up projection queries again.
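The query-side effect of the marker can be sketched as follows. This is only an illustration: the IndexEntry structure, the txn_marker field, and the usable_entries helper are hypothetical, and the marker's real encoding is not decided here.

```python
from collections import namedtuple

# Hypothetical in-memory view of an index entry; txn_marker is True when
# the entry was written or affected by a transaction.
IndexEntry = namedtuple("IndexEntry", ["key", "entity_key", "txn_marker"])


def usable_entries(entries, is_valid):
    """Yield entity keys, validating only entries that carry the marker."""
    for entry in entries:
        # Unmarked entries are trusted; marked entries still need the check.
        if entry.txn_marker and not is_valid(entry):
            continue
        yield entry.entity_key


entries = [
    IndexEntry("Greeting/author/alice", "guestbook/1", False),
    IndexEntry("Greeting/author/bob", "guestbook/2", True),
    IndexEntry("Greeting/author/carol", "guestbook/3", True),
]

checked = []


def is_valid(entry):
    """Stand-in for the real validity check; records which entries it saw."""
    checked.append(entry.entity_key)
    return entry.entity_key != "guestbook/3"


result = list(usable_entries(entries, is_valid))
```

In this sketch only the two marked entries reach the validity check, which is where the savings for ordinary and projection queries would come from.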

Of course, this solution would require putting the datastore in read-only mode and then running the groomer before upgrading AppScale.