NoSQL: Cassandra write operation?

Hello, I have a short text about write operations in Cassandra in my documentation. Unfortunately, I don't understand the following things:

  • What is a MemTable?
  • What is an in-memory system?
  • What is meant by "holds the data until it is full"?
  • ….

So it would be nice if you could explain the text in a little more detail.


2 Answers
regex9
11 months ago

I would first describe it with a visual example.

Imagine you're a waiter in a beer tent. You could carry every mug to the tables individually, which would cost you a lot of time. It is more efficient to carry several mugs on a tray that is loaded right up to its limit.

In this picture, the commit log is the list of customer orders. It ensures that every registered customer receives their order. The tray is the MemTable. It is a local buffer that takes on as much as it can. As soon as the tray is full, the mugs (data) are carried to the actual target system (SSTable), and the order is completed.
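If it helps, the analogy can be written down as a small Python sketch. This is only a toy model, not how Cassandra is actually implemented; the class name `ToyStore` and the limit counted in entries are my own assumptions for illustration.

```python
class ToyStore:
    """Toy model of the write path: commit log -> MemTable -> SSTable."""

    def __init__(self, tray_capacity=3):
        self.tray_capacity = tray_capacity  # hypothetical limit, counted in entries
        self.commit_log = []  # the order list: every write is noted here first
        self.memtable = {}    # the tray: the in-memory buffer
        self.sstables = []    # delivered batches: sorted and never changed again

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. note the order
        self.memtable[key] = value            # 2. put it on the tray
        if len(self.memtable) >= self.tray_capacity:
            self.flush()                      # 3. tray is full: deliver it

    def flush(self):
        # Persist the buffered entries as one sorted, immutable batch.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # delivered orders can be crossed off the list


store = ToyStore(tray_capacity=3)
for i in range(4):
    store.write(f"k{i}", i)
```

After the four writes, the first three entries sit in one "SSTable" and the fourth is still waiting on the tray (and in the order list).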

What is an in-memory system?

Basically, this means an area within main memory (RAM), that is, storage space that is available to the program (Cassandra) while it is running.

The advantage of such a system is that the data can be processed more quickly, because the computer can access its main memory much faster than a hard disk.

However, this storage space is naturally limited and not persistent (when the program terminates, the area is released and the data are gone). Therefore, the data must eventually be transferred to a persistent system (storage on the hard disk, i.e. an SSTable).

If Cassandra had to do without this buffer and instead saved the data to the hard disk immediately, the write operations would take much longer. The buffer also has the advantage that data that have recently been written (but not yet stored persistently) can be read back faster.
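The read side of that idea could be sketched like this. It is heavily simplified: plain dictionaries stand in for the MemTable and the SSTables, and the function name `read` is just my choice. It only shows that the in-memory buffer is consulted before the persisted data.

```python
def read(key, memtable, sstables):
    """Look in the in-memory buffer first, then in persisted batches, newest first."""
    if key in memtable:
        return memtable[key]           # fast path: still in memory
    for sstable in reversed(sstables):  # newest batch is last in the list
        if key in sstable:
            return sstable[key]
    return None                         # key was never written


memtable = {"user:3": "carol"}
sstables = [{"user:1": "alice"}, {"user:1": "alicia", "user:2": "bob"}]
```

Here `read("user:3", memtable, sstables)` is answered from memory alone, while `read("user:1", memtable, sstables)` has to go to the persisted batches and returns the newer value.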

What is a MemTable?

MemTable is simply the name for such a storage area. Since the data in memory are already structured accordingly (so that they can later be transferred more easily and quickly into the persistent system), the term includes the word Table.

By the way, Cassandra can create several MemTables. For each database table, there is at most one active MemTable (the one currently being written to), and there can be several inactive MemTables that are waiting to be emptied.

What is meant by "holds the data until it is full"?

Simply put, a MemTable waits until it reaches a certain threshold. If, for example, the database is configured so that a MemTable can hold a maximum of 5 MB, it writes its data out to an SSTable once this value is exceeded.

However, there are other criteria and configuration options that determine when a MemTable is emptied. You could specify a time period (e.g., persist the data of a MemTable every five seconds), and there is also a size limit for the commit log.
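Put together, the three criteria could look roughly like this. The class `FlushPolicy`, its field names, and the 32 MB commit log limit are made up for illustration and are not Cassandra settings or defaults; only the 5 MB and five seconds come from the examples above.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class FlushPolicy:
    # Illustrative limits only, not Cassandra defaults.
    max_memtable_bytes: int = 5 * MB
    max_age_seconds: float = 5.0
    max_commit_log_bytes: int = 32 * MB  # hypothetical value

    def reason_to_flush(self, memtable_bytes, age_seconds, commit_log_bytes):
        """Return why a flush is due, or None if the MemTable may keep filling."""
        if memtable_bytes > self.max_memtable_bytes:
            return "memtable size"
        if age_seconds >= self.max_age_seconds:
            return "time period"
        if commit_log_bytes > self.max_commit_log_bytes:
            return "commit log size"
        return None


policy = FlushPolicy()
```

For example, `policy.reason_to_flush(6 * MB, 1.0, 0)` reports the size limit, while a small, young MemTable with a small commit log gives `None` and simply keeps collecting writes.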

KarlRanseierIII
11 months ago

You could, of course, take a look at the (technical) description.

The commit log is ultimately a WAL (write-ahead log) and implements the WAL principle. A MemTable is simply a block of (main) memory holding data structures and the data; ultimately it is a form of cache. Once it is full, the data is transferred to the persistent SSTables (hard disk).
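What the WAL principle buys you can be shown in a few lines. This is only a sketch under the assumption that the log is a simple list of key/value pairs that survives a crash; the function name `replay` is my own.

```python
def replay(commit_log):
    """Rebuild the in-memory table from the log after a restart."""
    memtable = {}
    for key, value in commit_log:
        memtable[key] = value  # later entries overwrite earlier ones
    return memtable


# The MemTable was lost in a crash, but the log on disk still has every write.
commit_log = [("a", 1), ("b", 2), ("a", 3)]
memtable = replay(commit_log)
```

Because every write goes into the log before it is acknowledged, replaying the log restores exactly the data that had not yet reached an SSTable.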