MySQL Conf 2008 – Memcached (Day 1)

The session I’m attending this afternoon is Memcached and MySQL: Everything You Need To Know. I’m really looking forward to this talk. For whatever geeky reason, I think Memcached is the coolest. 😉

Brian Aker has posted the PDF of the slides. They will be updated as time goes on, so his copy should always be the most up-to-date. In case he removes them at some point, you can find them attached.

2:15pm: Talking about Grazr and how they use a daemon to do write-through caching. Now talking about processing data so that the normally/frequently used data is always in cache.

2:25pm: Memcached is supposed to be “simple” so that it can be fast. It has its own slab memory allocator; it originally used malloc, but that was way too slow. The allocator assigns memory in fixed-size blocks, and each stored item goes into one of those blocks. If a block has no room left, the oldest record in that block is dropped. Memcached is written around libevent for scalable network connections. That way only connections sending data are “active” on the server.

2:27pm: The clients handle the majority of the load. A client takes the cache key and hashes it so that it knows which server to send the data to. The server doesn’t do the serialization either; that’s also the client’s job.
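As a rough sketch of what such a client does (the server list and the CRC-based modulo hash here are illustrative, not any particular client's implementation):

```python
import zlib

# Hypothetical server pool; a real client holds open connections, not strings.
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def server_for_key(key, pool):
    """Pick a server by hashing the cache key (simple modulo distribution)."""
    h = zlib.crc32(key.encode("utf-8")) & 0xffffffff
    return pool[h % len(pool)]
```

The same key always hashes to the same server, which is what lets every client in the farm agree on where a piece of data lives without the servers coordinating.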

2:30pm: The server is not redundant and does not handle failover. But some clients, like PHP’s PECL memcache extension, can implement that type of functionality.

2:33pm: Available commands are pretty much only set/get/replace/add, append/prepend, increment/decrement, cas and stats. While they say append/prepend are easy to abuse, they are still an easy & quick way to track the keys used for something. As long as the value you build up stays well under the 1MB limit, it’s okay. It’s a great way to know which keys you’ll need to clear when product information has been updated.
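To illustrate that key-tracking idea, here is a sketch using an in-memory stand-in for a client (FakeMemcache and the key names are purely illustrative). Like memcached, append only succeeds if the key already exists:

```python
class FakeMemcache:
    """Minimal in-memory stand-in for a memcached client (illustration only)."""
    def __init__(self):
        self.store = {}

    def set(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

    def append(self, key, value):
        # Like memcached's append, this fails if the key does not exist yet.
        if key not in self.store:
            return False
        self.store[key] += value
        return True

    def delete(self, key):
        self.store.pop(key, None)

mc = FakeMemcache()
registry = "product:123:keys"  # one registry key per product

# Record each cache key we create for this product.
for cache_key in ("product:123:detail", "product:123:reviews"):
    if not mc.append(registry, "," + cache_key):
        mc.set(registry, cache_key)  # first key: append fails, so set instead

# When the product changes, read the registry and clear every tracked key.
for stale in mc.get(registry).split(","):
    mc.delete(stale)
mc.delete(registry)
```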

2:38pm: MySQL UDF usage for Memcached is growing. A Google Summer of Code project is working on making the MySQL query cache use Memcached. lighttpd has mod_memcache for caching files from disk. Apache also has mod_memcached, but it is still alpha.

2:42pm: Memcached has a few limits that you should pay attention to:
The max cache key is 250 bytes
Max data size is 1MB
Maxbytes limits the item cache, not Memcached’s total memory use
Be careful on 32-bit machines: set maxbytes too high and, in combination with Memcached’s other memory use (e.g., key storage), it will segfault.
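A client-side guard for the key and item limits above might look like this (the helper name and the assumption that values arrive as bytes are mine):

```python
MAX_KEY_BYTES = 250            # memcached's cache-key length limit
MAX_VALUE_BYTES = 1024 * 1024  # 1MB item size limit

def safe_to_cache(key, value):
    """Return True only if the key and value fit memcached's limits."""
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        return False  # key too long; the server would reject it
    if len(value) > MAX_VALUE_BYTES:
        return False  # item too big to store
    return True
```

Checking before the set call is cheaper than discovering the failure later as a mysteriously always-missing cache entry.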

2:46pm: LRU – least recently accessed items are up for eviction, and evictions can be seen in the stats. One LRU exists per “slab class,” and LRU evictions shouldn’t need to be common. That’s pretty nice because you won’t lose larger data sets just because a small data element doesn’t have room in its slab (and vice versa).
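The per-slab-class behavior can be sketched as one independent LRU per class (the capacities and class split here are made up):

```python
from collections import OrderedDict

class SlabLRU:
    """Toy per-slab-class LRU: each slab class evicts independently."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used

small = SlabLRU(capacity=2)  # one LRU for small items...
large = SlabLRU(capacity=2)  # ...evictions here never touch `small`
```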

2:50pm: Threads – great for large instances (16GB+) and/or large multiget requests. It scales okay now, and they are working on improving it. Threads also mean that you may not need to run multiple instances on the same box. Only one thread can talk to the allocator and the hash table at a time, so that you don’t get race conditions.

2:55pm: Don’t run Memcached with swap enabled, or at least set swap really small. Swapping can seriously slow down performance and is contrary to the whole purpose of memcache. A small swap means the OS can still use it if it really has to, but Memcached’s memory won’t get written out to swap. The memory for Memcached is permanently allocated from the OS; that shouldn’t be an issue on most modern servers. Slab classes are created by chunk size, and Memcached tends to create 36-39 slab classes. It does not reassign slab classes once the daemon loads, but they are working on a way to change the assignments on the fly. E.g., if you find that you are evicting a lot of data from one slab class, you could give that class more pages while taking them from another.
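The chunk-size progression behind those slab classes can be sketched like this (the base size and 1.25 growth factor are assumptions on my part; the actual defaults vary by version and build):

```python
def slab_class_sizes(base=80, factor=1.25, max_item=1024 * 1024):
    """Grow chunk sizes geometrically, the way a slab allocator
    builds its classes, capping at the maximum item size."""
    sizes = []
    size = base
    while size < max_item:
        sizes.append(size)
        size = int(size * factor)
    sizes.append(max_item)  # final class holds items up to the 1MB limit
    return sizes

classes = slab_class_sizes()
```

Every item is stored in the smallest class whose chunk fits it, which is why a stream of oddly-sized items can waste memory: a 101-byte item in a 125-byte chunk strands 24 bytes.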

3:10pm: There is normal hashing (usually a CRC or some modulus operation) and consistent hashing. Each client could implement its own version, but most use common methods so that multiple clients can share the same pool of memcache servers.
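A minimal sketch of consistent hashing, assuming MD5 for the ring positions and 100 points per server (both arbitrary choices): each server gets many points on a ring, and a key maps to the first point at or after its own hash. Adding or removing a server then only remaps the keys nearest its points, instead of reshuffling everything the way a modulo hash does.

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Sketch of consistent hashing with virtual points per server."""
    def __init__(self, servers, points_per_server=100):
        self.ring = []
        for server in servers:
            for i in range(points_per_server):
                h = self._hash("%s#%d" % (server, i))
                self.ring.append((h, server))
        self.ring.sort()
        self.hashes = [h for h, _ in self.ring]

    def _hash(self, value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def server_for(self, key):
        # First ring point clockwise from the key's hash (wrapping around).
        idx = bisect(self.hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]
```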

3:20pm: The PECL client (and most others) has an option to not “remove” a server when it is unavailable. It can use a “back-off” method where it won’t try to hit the server for 1 second, then 5 seconds, then 15 seconds, etc. It can also fail over to a different server until the original is back online.

3:25pm: You should always try to use multi-get. Memcached is optimized for handling multiple requests at once. You’ll find a big improvement if you do this. The trick is to write your code in a way that can utilize this to the fullest.
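The first step of a multi-get is grouping keys by the server they live on, so one batched request goes to each server instead of one round trip per key. A sketch (again using an illustrative modulo hash):

```python
import zlib

def group_keys_by_server(keys, servers):
    """Group cache keys by the server they hash to, so a single
    multi-get can be sent to each server that holds any of them."""
    batches = {}
    for key in keys:
        h = zlib.crc32(key.encode("utf-8")) & 0xffffffff
        server = servers[h % len(servers)]
        batches.setdefault(server, []).append(key)
    return batches
```

Writing application code so that it collects all the keys a page needs up front, rather than fetching them one by one as it renders, is what lets this batching pay off.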

4:04pm: Back from break. We’re now going over various coding examples of how to use Memcached. I’m hoping this part will be helpful. Most coding examples of how to use Memcached are fairly simple, but with 90 minutes left, there ought to be some gems.

4:15pm: Going over locks. Much like my previous post regarding memcached cache locks.

4:25pm: Just re-hashing the same examples in different languages. 🙁

4:35pm: libmemcached is a C/C++ client. It has replication ability. You can also store flags associated with the data in the item’s flag bytes; most clients don’t give you direct access to this.

4:48pm: When doing a multi-get, the client hashes each key to determine which server holds the data. Once it knows that, the requests are made in parallel, as a single combined request to each server needed.

4:50pm: MySQL & Memcached. The Memcached UDFs use the MySQL UDF API and libmemcached. You have to install them via CREATE FUNCTION. The most common method is to use memc_delete to remove the memcached data when data is written to the db. Examples are “select id, url, memc_set(concat(‘feeds’, md5(url)), url) from feeds;” or “select memc_get(concat(‘feeds’, md5(url)));” This could be helpful when you store the entire row, and only the row, in memcache. But for the most part, I don’t see the benefit of using this. Of course, I’ve been known to be wrong before.

5:00pm: There is memcached-tool, which may assist with some basic stats/display commands. It will eventually be what does the slab re-allocation. libmemcached has memslap, which may be useful for performance testing. MRTG gives you decent graphs about what is happening on the server.

5:05pm: The 1.2.5 release adds multi-interface support, UDP all the time, noreply, & IPv6. Noreply is kind of neat: it lets you set keys without having a response sent back to the client, making it a fire & forget save to memcache. In my experience, most people don’t verify that the data really got set to memcache anyway.

Posted under Events, memcached, mysql

This post was written by Michael Tougeron on April 14, 2008


memcached PHP semaphore & cache expiration handling

There are a lot of different ways that people use memcached and PHP. The most common is probably a basic set and get to cache data from your database.

  function get_my_data1() {
      $cache_id = "mykey";
      $data = $memcache_obj->get($cache_id);
      if ( !$data ) {
          $data = get_data_from_db_function();
          $memcache_obj->set($cache_id, $data, $sec_to_cache_for);
      }
      return $data;
  }

But what if the query that’s going to hit the database is pretty intensive and you don’t want more than one user to hit the db at a time? That’s easily handled via a semaphore lock.

  function get_my_data2() {
      $cache_id = "mykey";
      $data = $memcache_obj->get($cache_id);
      if ( !$data ) {
          // check to see if someone has already set the lock
          $data_lock = $memcache_obj->get($cache_id . '_qry_lock');
          if ( $data_lock ) {
              $lock_counter = 0;
              // loop until you find that the lock has been released.  that implies that the query has finished
              while ( $data_lock ) {
                  // you may only want to wait for a specified period of time.
                  // one second is usually sufficient since your goal is to always have sub-second response time
                  // if your query takes more than 1 second, you should consider "warming" your cached data via a cron job
                  if ( $lock_counter > $max_time_to_wait ) {
                      $lock_failed = true;
                      break;
                  }
                  // you really want this to be a fraction of a second so the user waits as little as possible
                  // for the simplicity of the example, I'm using the sleep function.
                  sleep(1);
                  $lock_counter++;
                  $data_lock = $memcache_obj->get($cache_id . '_qry_lock');
              }
              // if the loop is completed, that either means the user waited for too long
              // or that the lock has been removed.  try to get the cached data again; it should exist now
              $data = $memcache_obj->get($cache_id);
              if ( $data ) {
                  return $data;
              }
          }
          // set a lock for 2 seconds
          $memcache_obj->set($cache_id . '_qry_lock', true, 2);
          $data = get_data_from_db_function();
          $memcache_obj->set($cache_id, $data, $sec_to_cache_for);
          // don't forget to remove the lock
          $memcache_obj->delete($cache_id . '_qry_lock');
      }
      return $data;
  }


Posted under mysql, PHP, Tips & Tricks, Web Development

This post was written by Michael Tougeron on January 11, 2008
