Equip your server with cheap 128+GB RAM that preserves application state upon restart
The author of Gmail has a great post called “The problem with conventional databases”. He discusses how long-term trends in hardware influence the architecture options for heavily loaded servers. At Blue Edge we have had good success with non-database web server architectures, where the application keeps most of its data on the filesystem, loads it at startup, and rarely touches the disc while running. This allows great performance to be achieved with little hardware.
Paul touches on an interesting topic, Flash memory prices and uses:
Finally, one more interesting stat: 8 GB of flash memory cost about $80
Flash has some weird performance characteristics, but those can be overcome with smarter controllers. I expect that flash will replace disk for all applications other than large object storage (such as video streams) and backup.
I think there is a great use for Flash memory right now: you can get your computer 128GB of RAM for about $1000, have your application work directly and only in RAM and still have all your data persistently stored when you reboot or in case of power failure! How can you do that?
First, let’s review the “weird performance characteristics” - here is a summary of what Wikipedia says:
- Flash transfer speed is relatively low: the “normal” speed seems to be about 10MB/sec, top speed I’ve seen in the specs is 32 MB/sec
- Reading is a little faster than writing. To overwrite a block of memory, you first erase it and then write the new data
- The number of erase-write cycles is finite; most Flash is guaranteed to work for at least 1 million cycles
- Flash memory, of course, doesn’t need power to preserve its state, and it is a random-access device
With the hardware available today you can use Flash for disc storage. It has a slower transfer rate than normal discs, but no seek time. To overcome the speed limit, you can gang several flash devices together and access them in parallel, in RAID-0 fashion. Then you could install a database and have it operate faster than on a hard disc under specific access patterns (a lot of small transactions).
However, databases are not optimized for this kind of operation. They are optimized to minimize seeks, and they solve a lot of hard problems under assumptions that are not necessarily true anymore.
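To make the striping idea concrete, here is a rough Python sketch of how a logical offset could map onto several flash devices, together with the back-of-envelope arithmetic. The 64 KB stripe size and the choice of sixteen 8 GB devices are my own assumptions for illustration; the 10 MB/sec and $80 figures are the ones quoted above. It also assumes throughput scales linearly with the number of devices, which real buses and controllers will not quite deliver.

```python
STRIPE_SIZE = 64 * 1024  # assumed stripe size, chosen only for illustration


def locate(offset, n_devices, stripe_size=STRIPE_SIZE):
    """Map a logical byte offset to (device index, offset on that device)
    under simple RAID-0 style striping."""
    stripe = offset // stripe_size            # which stripe the byte falls in
    device = stripe % n_devices               # stripes rotate across devices
    device_offset = (stripe // n_devices) * stripe_size + offset % stripe_size
    return device, device_offset


# Back-of-envelope numbers (device count is an assumption).
N_DEVICES = 16
GB_PER_DEVICE = 8
PRICE_PER_DEVICE = 80          # dollars, from the quote above
MB_PER_SEC_PER_DEVICE = 10     # the "normal" speed listed above

capacity_gb = N_DEVICES * GB_PER_DEVICE
total_cost = N_DEVICES * PRICE_PER_DEVICE
aggregate_mb_per_sec = N_DEVICES * MB_PER_SEC_PER_DEVICE  # idealized
full_read_seconds = capacity_gb * 1024 / aggregate_mb_per_sec

print(capacity_gb, total_cost, aggregate_mb_per_sec, full_read_seconds)
# prints: 128 1280 160 819.2

print(locate(0, N_DEVICES))                # (0, 0)
print(locate(STRIPE_SIZE, N_DEVICES))      # (1, 0)
print(locate(16 * STRIPE_SIZE + 5, N_DEVICES))  # (0, 65541)
```

Under these assumptions sixteen sticks give 128 GB for $1280, and reading the whole array back takes a little under 14 minutes even in the ideal case.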
Alternatively, you can use memory-mapped files persisted on your Flash RAID array. A memory-mapped file makes its content available at some memory location: you get a pointer and can read or write data at any offset in the file. The virtual memory mechanism takes care of reading a portion of the file in if it is not in memory yet, and of writing it back if you changed it. This almost means you can treat it like any other memory and store arbitrary data structures. Not quite, because the base address of the mapping is usually different each time, so pointers stored inside that memory area would be wrong after remapping the file. This problem is solved using specialized pointers (for example, offsets relative to the start of the mapping), so let’s forget about it for now.
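As a small illustration of the "specialized pointers" idea, here is a Python sketch that stores a linked list inside a memory-mapped file, using offsets from the start of the file in place of raw pointers. The file name, the 16-byte node layout and the helper names are all made up for this example; it uses only the standard `mmap` and `struct` modules.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical node layout: (value, offset of next node), -1 marks the end.
NODE = struct.Struct("<qq")
FILE_SIZE = 4096


def write_node(mem, offset, value, next_offset):
    """Store a node at a file-relative offset instead of a raw address."""
    NODE.pack_into(mem, offset, value, next_offset)


def read_list(mem, head):
    """Follow the offset "pointers" starting at head and collect the values."""
    values = []
    offset = head
    while offset != -1:
        value, offset = NODE.unpack_from(mem, offset)
        values.append(value)
    return values


path = os.path.join(tempfile.mkdtemp(), "state.bin")
with open(path, "wb") as f:
    f.truncate(FILE_SIZE)          # reserve space for the mapping

# First "run": build the list 10 -> 20 -> 30, nodes deliberately out of order.
with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), FILE_SIZE)
    write_node(mem, 0, 10, 32)
    write_node(mem, 32, 20, 16)
    write_node(mem, 16, 30, -1)
    mem.flush()                    # push the changed pages to the file
    mem.close()

# Second "run": map the file again, possibly at a different base address.
with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), FILE_SIZE)
    restored = read_list(mem, 0)
    mem.close()

print(restored)  # [10, 20, 30]
```

Because every link is an offset rather than an address, the structure survives remapping, which is exactly what a restart or reboot amounts to.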
In this way you can basically work with flash storage space as if it were RAM. Your real DRAM becomes a cache for the persistent flash storage. You still need to tell the system when to flush pages to disc, and take care to keep related data close together so that the “cache” is effective. A 64-bit architecture gives you a huge virtual address space, and big 64 KB memory pages make transfers to and from flash more efficient. You need some software infrastructure to support this scheme, but it is nowhere near the complexity of a database.
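The "software infrastructure" could start out as small as the following Python sketch: a wrapper around a mapped file that remembers which pages were written and flushes them at an explicit checkpoint. The class and method names are hypothetical, and it uses the platform page size from `mmap.PAGESIZE` instead of the 64 KB pages mentioned above. The operating system tracks dirty pages on its own as well; doing it here only makes the checkpoint step visible.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # platform page size, typically smaller than 64 KB


class CheckpointedRegion:
    """Hypothetical helper: a mapped file with explicit checkpoints."""

    def __init__(self, path, n_pages):
        size = n_pages * PAGE
        # Create the file if needed and size it without discarding old data.
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        os.ftruncate(self._fd, size)
        self.mem = mmap.mmap(self._fd, size)
        self.dirty = set()  # indices of pages written since last checkpoint

    def write(self, offset, data):
        if not data:
            return
        self.mem[offset:offset + len(data)] = data
        first = offset // PAGE
        last = (offset + len(data) - 1) // PAGE
        self.dirty.update(range(first, last + 1))

    def checkpoint(self):
        """Flush every page touched since the last checkpoint."""
        flushed = sorted(self.dirty)
        for page in flushed:
            self.mem.flush(page * PAGE, PAGE)
        self.dirty.clear()
        return flushed

    def close(self):
        self.mem.close()
        os.close(self._fd)


path = os.path.join(tempfile.mkdtemp(), "region.bin")
region = CheckpointedRegion(path, n_pages=8)
region.write(10, b"hello")          # stays inside page 0
region.write(PAGE - 2, b"span")     # straddles pages 0 and 1
region.write(5 * PAGE, b"x")        # touches page 5 only
flushed = region.checkpoint()
region.close()

print(flushed)  # [0, 1, 5]
```

The list of flushed pages also shows why locality matters: three small writes dirtied three pages, whereas the same data packed together would have cost a single page transfer.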
You can play on the strengths of flash memory and get really decent results. Working with packs of USB sticks or Compact Flash cards seems messy and unreliable, but you can bet hardware manufacturers will produce “hard drives” made of inexpensive flash memory chips soon. The trends in hardware development do open new possibilities for software architectures.
Update: It seems flash memory performance has another quirk: random writes are very slow, much slower than random reads, sequential writes, or even random writes on a hard disc. This seems to be caused not by some inherent characteristic of the medium, but rather by controllers having to implement their algorithms using very little memory.
So, does the above still hold true? To some extent, yes. You can implement a datastore based on multiversion concurrency control (the modern way to implement concurrency control anyway) that uses only sequential writes and random reads. Or you can wait for the next generation of smarter controllers.
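To show what "only sequential writes and random reads" can look like, here is a minimal Python sketch of an append-only, multiversion key-value log. It is not a full multiversion concurrency control implementation (there are no transactions or snapshots), and the class name, record layout and keys are invented for this example. It also does not handle a record left half-written by a crash.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<II")  # assumed record header: key length, value length


class AppendOnlyStore:
    """Hypothetical store: every update is appended, old versions stay readable."""

    def __init__(self, path):
        self._log = open(path, "a+b")   # append mode: writes always go to the end
        self._index = {}                # key -> [(value offset, value length), ...]
        self._rebuild_index()

    def _rebuild_index(self):
        """Scan the log once at startup to rebuild the in-memory index."""
        self._log.seek(0)
        while True:
            start = self._log.tell()
            header = self._log.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            klen, vlen = HEADER.unpack(header)
            key = self._log.read(klen)
            value_offset = start + HEADER.size + klen
            self._index.setdefault(key, []).append((value_offset, vlen))
            self._log.seek(vlen, os.SEEK_CUR)   # skip over the value bytes

    def put(self, key, value):
        """Append a new version; never overwrites existing bytes."""
        self._log.seek(0, os.SEEK_END)
        start = self._log.tell()
        self._log.write(HEADER.pack(len(key), len(value)) + key + value)
        self._log.flush()
        versions = self._index.setdefault(key, [])
        versions.append((start + HEADER.size + len(key), len(value)))
        return len(versions) - 1        # version number of this write

    def get(self, key, version=-1):
        """Random read of one version (latest by default)."""
        offset, length = self._index[key][version]
        self._log.seek(offset)
        return self._log.read(length)

    def close(self):
        self._log.close()


path = os.path.join(tempfile.mkdtemp(), "store.log")
store = AppendOnlyStore(path)
store.put(b"user:1", b"alice")
store.put(b"user:1", b"alicia")
store.close()

reopened = AppendOnlyStore(path)        # simulates a restart
latest = reopened.get(b"user:1")
first = reopened.get(b"user:1", version=0)
reopened.close()

print(latest, first)  # b'alicia' b'alice'
```

All writes land at the end of the file, so the flash device only ever sees sequential writes, while readers jump straight to the offset of the version they want.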






