memcached PHP semaphore & cache expiration handling

Author Michael Tougeron on January 11, 2008

Posted under mysql, PHP, Tips & Tricks, Web Development

There are a lot of different ways that people use memcached with PHP. The most common is probably the basic get and set to cache data from your database: check the cache first, and only on a miss query the database and store the result.

    function get_my_data1() {
        global $memcache_obj, $sec_to_cache_for;
        $cache_id = "mykey";
        $data = $memcache_obj->get($cache_id);
        if ( !$data ) {
            $data = get_data_from_db_function();
            // pecl/memcache's Memcache::set() takes ($key, $var, $flag, $expire)
            $memcache_obj->set($cache_id, $data, 0, $sec_to_cache_for);
        }
        return $data;
    }
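One detail worth checking in this basic pattern is the !$data test: an empty array, 0 or "" that you legitimately cached also reads as a miss, so that query would run on every request. The sketch below compares strictly against false instead (Memcache::get() returns false on a miss). FakeMemcache is a hypothetical in-memory stand-in for pecl/memcache's Memcache so the example runs without a server, and get_data_from_db_function() is stubbed to count how often it runs.

```php
<?php
// Hypothetical in-memory stand-in for pecl/memcache's Memcache class, for
// illustration only: get() returns false on a miss, set() takes
// ($key, $var, $flag, $expire) like Memcache::set().
class FakeMemcache
{
    private $items = array();

    public function get($key)
    {
        if (!isset($this->items[$key])) {
            return false;
        }
        list($value, $expires_at) = $this->items[$key];
        if ($expires_at !== 0 && $expires_at <= time()) {
            unset($this->items[$key]);
            return false;
        }
        return $value;
    }

    public function set($key, $var, $flag = 0, $expire = 0)
    {
        $this->items[$key] = array($var, $expire > 0 ? time() + $expire : 0);
        return true;
    }
}

// Hypothetical stand-in for the expensive query; counts how often it runs.
$db_calls = 0;
function get_data_from_db_function()
{
    global $db_calls;
    $db_calls++;
    return array(); // an empty result set is still a valid, cacheable result
}

function get_my_data1($memcache_obj, $sec_to_cache_for)
{
    $cache_id = "mykey";
    $data = $memcache_obj->get($cache_id);
    // Strict comparison: a cached empty array, 0 or "" counts as a hit.
    // (A cached boolean false is still indistinguishable from a miss.)
    if ($data === false) {
        $data = get_data_from_db_function();
        $memcache_obj->set($cache_id, $data, 0, $sec_to_cache_for);
    }
    return $data;
}

$cache = new FakeMemcache();
get_my_data1($cache, 60);
get_my_data1($cache, 60);
echo $db_calls, "\n"; // 1
```

With the original !$data check, the same two calls would query the database twice.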

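But what if the query that’s going to hit the database is pretty intensive and you don’t want more than one user to hit the db at a time? That’s easily handled via a semaphore lock: the first user to miss the cache sets a lock key in memcache before running the query, and other users wait for that lock to be removed and then read the freshly cached data.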

    function get_my_data2() {
        global $memcache_obj, $sec_to_cache_for, $max_time_to_wait;
        $cache_id = "mykey";
        $data = $memcache_obj->get($cache_id);
        if ( !$data ) {
            // check to see if someone has already set the lock
            $data_lock = $memcache_obj->get($cache_id . '_qry_lock');
            if ( $data_lock ) {
                $lock_counter = 0;
                // loop until you find that the lock has been released. that implies that the query has finished
                while ( $data_lock ) {
                    // you may only want to wait for a specified period of time.
                    // one second is usually sufficient since your goal is to always have sub-second response time.
                    // if your query takes more than 1 second, you should consider "warming" your cached data via a cron job
                    if ( $lock_counter >= $max_time_to_wait ) {
                        $lock_failed = true;
                        break;
                    }
                    // you really want this to be a fraction of a second so the user waits as little as possible.
                    // for the simplicity of the example, I'm using the sleep function.
                    sleep(1);
                    $lock_counter++;
                    $data_lock = $memcache_obj->get($cache_id . '_qry_lock');
                }
                // once the loop ends, either the user waited too long or the lock has been removed.
                // try to get the cached data again; it should exist now
                $data = $memcache_obj->get($cache_id);
                if ( $data ) {
                    return $data;
                }
            }
            // set a lock for 2 seconds
            $memcache_obj->set($cache_id . '_qry_lock', true, 0, 2);
            $data = get_data_from_db_function();
            $memcache_obj->set($cache_id, $data, 0, $sec_to_cache_for);
            // don't forget to remove the lock
            $memcache_obj->delete($cache_id . '_qry_lock');
        }
        return $data;
    }
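The comments in example 2 note that you really want the wait to be a fraction of a second. Here is a minimal sketch of that wait loop that polls with usleep() against a deadline taken from microtime(true), instead of counting whole seconds with sleep(1). It assumes a hypothetical FakeMemcache class standing in for pecl/memcache's Memcache (get(), set() and delete() only, no real expiry) so it runs without a server; wait_for_lock_release() and the 50 ms poll interval are illustrative choices, not part of the original code.

```php
<?php
// Hypothetical in-memory stand-in for pecl/memcache's Memcache class, for
// illustration only (no real expiry).
class FakeMemcache
{
    private $items = array();

    public function get($key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $var, $flag = 0, $expire = 0)
    {
        $this->items[$key] = $var;
        return true;
    }

    public function delete($key)
    {
        unset($this->items[$key]);
        return true;
    }
}

// Hypothetical helper: poll the lock key every $poll_ms milliseconds until it
// is gone or $max_wait_sec has elapsed. Returns true if the lock was released.
function wait_for_lock_release($memcache_obj, $lock_key, $max_wait_sec = 1.0, $poll_ms = 50)
{
    $deadline = microtime(true) + $max_wait_sec;
    while ($memcache_obj->get($lock_key) !== false) {
        if (microtime(true) >= $deadline) {
            return false; // waited too long; the caller decides what to do next
        }
        usleep($poll_ms * 1000); // sleep a fraction of a second, not a whole one
    }
    return true;
}

$cache = new FakeMemcache();
var_dump(wait_for_lock_release($cache, 'mykey_qry_lock'));      // bool(true): no lock held
$cache->set('mykey_qry_lock', true, 0, 2);
var_dump(wait_for_lock_release($cache, 'mykey_qry_lock', 0.2)); // bool(false) after about 0.2 s
```

Returning a boolean rather than looping forever leaves the fallback (query anyway, or serve an error) to the caller, the same choice example 2 makes when it falls through after the loop.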


Another good use of semaphore locking is to store the expire time as part of the cached data and give the item a longer expire time in memcache. This gives you more control over what happens when the cached data becomes stale: one user repopulates the cache while other users continue to get the existing (stale) copy until the first user has finished.

    function get_my_data3() {
        global $memcache_obj, $sec_to_cache_for;
        $cache_id = "mykey";
        $data = $memcache_obj->get($cache_id);
        // if there is cached data and the expire timestamp has already passed or is within the next 2 minutes,
        // then we want the user to freshen up the cached data
        if ( $data && ($data['cache_expires_timestamp'] - time()) < 120 ) {
            // if the semaphore lock has already been set, just return the data like you normally would.
            if ( $memcache_obj->get($cache_id . '_expire_lock') ) {
                return $data;
            }
            // now we want to set the lock and have the user freshen the data.
            $memcache_obj->set($cache_id . '_expire_lock', true, 0, 2);
            // clearing $data causes the data-gathering logic below to execute.
            $data = false;
        }
        if ( !$data ) {
            // be sure to include all of the semaphore logic from example 2
            // set the _qry_lock for 2 seconds
            $memcache_obj->set($cache_id . '_qry_lock', true, 0, 2);
            $raw_data = get_data_from_db_function();
            $data = array(
                'cache_expires_timestamp' => time() + $sec_to_cache_for,
                'cached_data' => $raw_data,
            );
            // keep the item in memcache longer than the embedded expire time so other users
            // can still be served the stale copy during a refresh (doubling it is an arbitrary choice)
            $memcache_obj->set($cache_id, $data, 0, $sec_to_cache_for * 2);
            // remove the _qry_lock
            $memcache_obj->delete($cache_id . '_qry_lock');
            // remove the _expire_lock (same key name as the set above)
            $memcache_obj->delete($cache_id . '_expire_lock');
        }
        return $data;
    }
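A self-contained sketch of the stale-while-refreshing idea from example 3 follows. One deliberate change: the expire lock is taken with add() rather than get() followed by set(), because add() only succeeds for one caller (see the race-condition discussion in the comments). FakeMemcache is a hypothetical in-memory stand-in for pecl/memcache's Memcache that ignores TTLs; get_with_soft_expiry(), $loader and the explicit $now parameter are hypothetical names added so the behaviour can be tested without waiting on the clock. The 120-second refresh window matches example 3.

```php
<?php
// Hypothetical in-memory stand-in for pecl/memcache's Memcache class, for
// illustration only. TTLs are accepted but ignored; only the soft expiry
// stored inside the cached value matters here.
class FakeMemcache
{
    private $items = array();

    public function get($key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $var, $flag = 0, $expire = 0)
    {
        $this->items[$key] = $var;
        return true;
    }

    // Like Memcache::add(): fails if the key already exists.
    public function add($key, $var, $flag = 0, $expire = 0)
    {
        if (array_key_exists($key, $this->items)) {
            return false;
        }
        $this->items[$key] = $var;
        return true;
    }

    public function delete($key)
    {
        unset($this->items[$key]);
        return true;
    }
}

// Hypothetical helper: $loader rebuilds the data, $now is injected for testing.
function get_with_soft_expiry($memcache_obj, $cache_id, $loader, $sec_to_cache_for, $now)
{
    $entry = $memcache_obj->get($cache_id);
    // Fresh means the embedded expire time is at least 2 minutes away.
    if ($entry !== false && $entry['cache_expires_timestamp'] - $now >= 120) {
        return $entry['cached_data'];
    }
    // Stale or missing: add() lets exactly one caller take the refresh lock.
    if (!$memcache_obj->add($cache_id . '_expire_lock', 1, 0, 30)) {
        if ($entry !== false) {
            return $entry['cached_data']; // someone else is refreshing; serve the stale copy
        }
        return $loader(); // nothing cached at all: load without caching
    }
    $raw_data = $loader();
    $memcache_obj->set($cache_id, array(
        'cache_expires_timestamp' => $now + $sec_to_cache_for,
        'cached_data' => $raw_data,
    ), 0, $sec_to_cache_for * 2); // memcache TTL outlives the embedded expiry
    $memcache_obj->delete($cache_id . '_expire_lock');
    return $raw_data;
}

$calls = 0;
$loader = function () use (&$calls) {
    $calls++;
    return "version $calls";
};
$cache = new FakeMemcache();
$t = 1000000;
echo get_with_soft_expiry($cache, 'mykey', $loader, 600, $t), "\n";       // version 1 (miss)
echo get_with_soft_expiry($cache, 'mykey', $loader, 600, $t + 100), "\n"; // version 1 (fresh)
$cache->add('mykey_expire_lock', 1); // simulate another request mid-refresh
echo get_with_soft_expiry($cache, 'mykey', $loader, 600, $t + 590), "\n"; // version 1 (stale copy)
$cache->delete('mykey_expire_lock');
echo get_with_soft_expiry($cache, 'mykey', $loader, 600, $t + 590), "\n"; // version 2 (refreshed)
```

The third call shows the point of the pattern: while one request holds the lock, everyone else gets the stale value immediately instead of waiting on the database.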

When you mash these functions together, you end up with a system where, in most cases, only one user at a time freshens the cache and/or hits the database with a specific query. There are a lot of other things that become available once you start thinking of memcached as more than a way to save your db from hits: session handling, Smarty template caching, flags for per-server processing (e.g., clearing a local file cache), and even using it as a temporary database.


5 Comments so far

  1. tyleradam March 18, 2008 7:09 pm

    Good stuff!!

  2. mightye December 5, 2008 11:19 am

    Looks like these solutions are prone to a race condition, Michael.

    For example, in the second one, users A and B hit this function very nearly simultaneously. A executes ->get($cache_id), then B does. A tests timestamp and determines it is stale, then B does the same thing. A tests the lock and it doesn’t exist. B tests the lock and it doesn’t exist for him either. A sets the lock. B sets the lock.

    Now both users will do this work at the same time, and if it’s really critical that the work not be done more than once (i.e., perhaps it can lead to a deadlock or some other system utilization or data integrity issue), you can have some problems here.

    I might suggest using something like (for your first example):

    while ($cache->increment($lock_name) > 1) {
        $cache->decrement($lock_name);
        usleep(100);
    }

    Or adapting the same idea to your second example:

    if ($cacheIsStale) {
        if ($cache->increment($lockName) == 1) {
            // refresh cache
        }
        // only undo an increment we actually made
        $cache->decrement($lockName);
    }

    Increment and decrement should be atomic operations, so there should be no concern about both users calling ->increment() nearly simultaneously only to discover that they both get back the same value. Though this may not be guaranteed in a multi-memcached-server setup; at least in such a scenario the window size for the race condition is dramatically smaller – and it’s as good as it gets given that memcached doesn’t natively offer true semaphores.

  3. Michael Tougeron December 5, 2008 12:31 pm

    That could work as well. With the environment I work in (GameSpot.com), race conditions are still possible with your example. However, for me anyway, it doesn’t matter. I’d rather have 2 (or even 1,000) people hit the database than all 100,000 of them. :)

  4. Michael Greiling July 1, 2009 9:45 pm

    I know this is an old post, but I just ran across it while looking for my own solution within memcache and I thought I’d share my findings. I found the following to work slightly better than mightye’s solution in a previous comment:

    function get_lock( $key_name )
    {
        global $cache;
        // add() only succeeds if the key does not exist yet; the lock expires after 30 seconds
        while ( !$cache->add( $key_name, 1, false, 30 ) ) { usleep(100); }
    }

    function release_lock( $key_name )
    {
        global $cache;
        $cache->delete( $key_name );
    }

    get_lock('lock_A');
    // do stuff
    release_lock('lock_A');

    The difference here is that, unlike increment(), the key does not need to exist in memcache prior to obtaining a lock, and add() is atomic as well (it will never return true to more than one process, as opposed to set(), for example).

    One pitfall, though, which I didn’t realize right away: unlike a traditional blocking semaphore, the processes waiting to run will not necessarily complete in the order in which they started. They don’t line up in a queue on a first-come, first-served basis; instead, the next process to obtain the lock is whichever one happens to attempt an add() first after the lock is released, which is effectively random.

  5. Mike Tougeron August 31, 2009 9:12 am

    @Michael Greiling – Yes, that’s the route I ended up going in production as well. This entry has *long* been due for an update. :)
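A minimal sketch of the add()-based lock the thread settles on, with one change: get_lock() gives up after a bounded wait instead of spinning forever, and the cache object is passed in rather than read from a global. FakeMemcache is a hypothetical in-memory stand-in for pecl/memcache's Memcache (add() and delete() only, TTL ignored); the 10 ms retry interval and the $max_wait_sec parameter are illustrative assumptions. Like the version in the comment, it does not hand out the lock in arrival order.

```php
<?php
// Hypothetical in-memory stand-in for pecl/memcache's Memcache class, for
// illustration only: add() stores a key only if it does not exist yet.
class FakeMemcache
{
    private $items = array();

    public function add($key, $var, $flag = 0, $expire = 0)
    {
        if (array_key_exists($key, $this->items)) {
            return false;
        }
        $this->items[$key] = $var;
        return true;
    }

    public function delete($key)
    {
        unset($this->items[$key]);
        return true;
    }
}

// Bounded variant of get_lock(): retry add() every 10 ms until it succeeds or
// $max_wait_sec elapses. The 30-second TTL keeps a crashed holder from
// blocking everyone forever on a real server (the fake ignores TTLs).
function get_lock($cache, $key_name, $max_wait_sec = 1.0)
{
    $deadline = microtime(true) + $max_wait_sec;
    while (!$cache->add($key_name, 1, 0, 30)) {
        if (microtime(true) >= $deadline) {
            return false;
        }
        usleep(10000);
    }
    return true;
}

function release_lock($cache, $key_name)
{
    $cache->delete($key_name);
}

$cache = new FakeMemcache();
var_dump(get_lock($cache, 'lock_A'));      // bool(true)
var_dump(get_lock($cache, 'lock_A', 0.1)); // bool(false): still held
release_lock($cache, 'lock_A');
var_dump(get_lock($cache, 'lock_A'));      // bool(true)
```

Returning false after the deadline lets a caller fall back to serving stale data or querying anyway, which fits the point made above that a few extra database hits are acceptable.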
