MySQL Conf 2008 – PHP/DBUnit (Day 2)

I started off the day today listening to the keynotes at the MySQL Conference. Unfortunately I had to work on something so I wasn’t able to really pay attention. :( I hope I didn’t miss anything important.

After the keynotes, I decided to start off with the Testing PHP/MySQL Applications with PHPUnit/DbUnit session by Sebastian Bermann. I know I should already be using PHPUnit, but honestly I’ve never gotten around to it. I’m hoping that this will give me the kick in the ass that I need to start. :P

Read More…

Posted under Events, mysql, PHP

This post was written by Michael Tougeron on April 15, 2008

Tags: , , , ,

MySQL Conf 2008 – Memcached (Day 1)

The session I’m attending this afternoon is Memcached and MySQL: Everything You Need To Know. I’m really looking forward to this talk. For whatever geeky reason I think Memcached is the coolest. ;)

Brian Aker has put the PDF of the slides at http://download.tangent.org/talks/Memcached%20Study.pdf. The slides will be updated as time goes on so this link should always have the most up-to-date stuff. In case at some point he removes them, you can find them attached.

2:15pm: Talking about Grazr and how they have a daemon to do write-through cache. Now talking about processing data so that the data normally/frequently used is always in cache.

2:25pm: Memcached is supposed to be “simple” so that it can be faster. Has its own memory slab allocator, it originally tried using malloc but that was way too slow. The way it assigns blocks for memory assignment means that each data store goes into one of those “blocks.” If the # of available room in the block is full, then it drops the oldest record in that block. Written around libevent for scalable network connections. That was only connections sending data are “active” on the server.

2:27pm: The clients handle the majority of the load. It takes the cache key and hashes it so that it knows which server to send to the data to. Server doesn’t do the serialization either.

2:30pm: Server is not redundant and does not handle failover. But some clients like with PHP’s PECL, the client can implement this type of functionality.

2:33pm: Available commands are pretty much only set/get/replace/add, append/prepend, increment/decrement, cas and stats. While they say append/prepend are easy to be abused it is still an easy & quick way to store the keys used for something. Unless the size of the keys that you are storing reaches close to 1MB, it’s okay. It’s a great way to know what keys you’ll need to clear when product information has been updated.

2:38pm: MySQL UDF usage for Memcached is growing. Google Summer of Code is working on making MySQL Query cache use Memcached. lighthttpd has mod_memcache for caching files from disk. Apache also has mod_memcached but is still alpha.

2:42pm: Memcached has a few limits that you should pay attention to:
The max cache key is 250 bytes
Max data size is 1MB
Maxbytes limits the item cache, not everything
Be careful because with 32bit machines, you can set too high of a maxbytes that in combination with other memcache memory elements (e.g., key storage) and it will segfault.

2:46pm: LRU – Least recently accessed items are up for eviction and can be seen. One LRU exists per “slab class.” LRU evictions don’t need to be common. That’s pretty nice because you won’t lose larger data sets because a small data element doesn’t have room in its slab (and vice versa).

2:50pm: Threads – Great for large instances (16G+) and/or large multiget requests. It scales okay now, but they are working on improving it. Also means that you may not need to run multiple instances on the same box. Only 1 thread can talk to the allocator and the hash table at one time. This is so that you don’t have race conditions.

2:55pm: Don’t run Memcached with swap enabled or at least set it really small. Can seriously slow down performance and technically is contrary to purpose of memcache. The smaller swap means that OS can still use if it really has to, but won’t let the writing to swap happen for Memcached. The memory for Memcached is permanently allocated from the OS. Shouldn’t be an issue with most modern servers. The slab class are created by chunk size. Tends to create 36-39 slab classes. It does not reassign slab classes once the daemon loads. They are working on a way to allow you to change the assignments on the fly. e.g., if you find that you are evicting a lot of data from one slab, you can give that slab more pages while taking it from another slab.

3:10pm: There is normal hashing (usually crc or some modulus operation) and consistent hashing. Each client could implement its own version, but usually use common methods so that multiple clients can use the same pool of memcache servers.

3:20pm: PECL client (and most others) has option to not “remove” the server if it is not available. It can have a “back-off” method where it won’t try to hit the server for 1 second, 5 seconds, then 15 seconds, etc. It can also failover to a different server until the original is back online.

3:25pm: You should always try to use multi-get. Memcached is optimized for handling multiple requests at once. You’ll find a big improvement if you do this. The trick is to write your code in a way that can utilize this to the fullest.

4:04pm: Back from break. We’re now going over various coding examples of how to use Memcached. I’m hoping this part will be helpful. Most coding examples of how to use Memcached are generally fairly simple. But with 90 minutes left, there aught to be some gems.

4:15pm: Going over locks. Much like my previous post regarding memcached cache locks.

4:25pm: Just re-hashing the same examples in different languages. :(

4:35pm: libmemcached is a C/C++ client. Has replicated ability. You can store flags associated with the data in the byte category. Most clients don’t allow you direct access to this.

4:48pm: When doing a multi-get it hashes each key to determine the servers where the data would be stored. Once it knows that, the requests are made in parallel and as single-combined request to each server needed.

4:50pm: MySQL & Memcached. The MySQL uses the UDF API and libmemcached. You have to install it by CREATE FUNCTION. Most common method is to use memc_delete to remove the memcached data when data is written to the db. An example is “select id, url, memc_set(concat(‘feeds’, md5(url), url) from feeds;” or “select memc_get(concat(‘feeds’, md5(url));” This could be helpful for when you store the entire row, and only the row, in memcache. But for the most part, I don’t see the benefit of using this. Of course, I’ve been known to be wrong before.

5:00pm: There is memcached-tool which may assist with some basic stats/display commands. It will eventually be what does the slab re-allocation. libmmcached has memslap which may be useful for performance testing. MRTG gives you decent graphs about what is happening on the server.

5:05pm: 1.2.5 release supports multi-interface support, UDP all the time, noreply, & IPV6. Noreply is kind of neat as it allows you to set keys and not have have a response sent back to the client. It makes it a fire & forget save to memcache. My experience is that most people don’t verify that the data is really set to memcache anyway.

Posted under Events, memcached, mysql

This post was written by Michael Tougeron on April 14, 2008

Tags: , , ,

MySQL Conf 2008 – MySQL Replication (Day 1)

I started off the MySQL Conference 2008 with the replication tutorial. I’m hoping that there will be some good tidbits in the second half of the session. I’m almost positive the first part will be refresher and standard replication stuff.

I really wanted to do the MySQL Proxy session, but that is an all-day thing and I’d miss the Memcached session. In hindsight, I’m probably going to regret that decision, but we’ll see. I can always do Proxy next year. :)

10:00am: So far pretty much all refresher. Some people have asked some good questions about the relay logs and replication lag. Unfortunately, none of it is particularly helpful. I don’t understand why MySQL doesn’t enable a way to do master-master replication reliably outside of NDB. In my opinion, NDB, is not an appropriate solution for most websites. Having to take down the entire cluster to alter a table would not be acceptable. :(

10:45am: Not sure if it is my laptop or the wireless network, but my connection/authorization never seems to carry over from when I log in/out of the laptop. I have to disconnect from the wireless auto-reconnect and connect again manually in order to get on the Internet. I wonder if anyone else is having this problem?

10:45am: One of the recommend ways for doing HA master/master is to use a shared disk array. The section would be to put a virtual IP in front of a heartbeat monitor in front of two MySQL servers configured the same. If they shared the same disk array, then the bin-log will be the same for both servers. The heartbeat monitor then sends the traffic to mas1 until it finds that it is no longer online. Then the traffic goes to mas2 automatically. When mas1 comes back online, the heartbeat monitor sends the traffic back automatically. Since mas2 is no longer reading/writing to the db tables, there shouldn’t be any corruption. Behind the mas1/mas2 is another virtual IP tied to the same heartbeat monitor. That way the slaves will also stay online. This sounds like it will handle many of the scenarios where you need HA master/master. The only thing I still see missing is the ability to alter tables without bringing down the tier like you can with m/cluster. [ m/cluster is a product by Continuent for HA synchronous replication. Of course, that product is EOL and their new product doesn't allow this either. :( ]

11:00am: Federated databases are good for joining two sets of data to another db server. I guess I always misunderstood how that works. This seems like it’d be a good way to hook up user information with community (forums) data. I wonder what the performance costs are?

11:15am: Row-based replication in 5.1 paves the way for future replication enhancements such as conflict detection & conflict resolution. As well as multi-channel replication and horizontal partioning. Multi-channel replication would be really cool. It is such a pain when replication lags and this could go a long way to help fight that. You cannot do master filtering for individual tables with row-based. I wonder why that is?

12:00pm: I don’t know why MySQL keeps pushing NDB as the end-all solution to replication problems. Yes it can deal with a lot of the replication issues, but it has several other issues that make it a bad (or at least not optimal) solution.

12:05pm: eek, running out of battery power. I have 18 minutes left… Should be able to make it until lunch. Maybe…

12:20: That’s about it. Q&A now, but I’m headed out to find a lunch table near power. :P

Posted under Events, mysql

This post was written by Michael Tougeron on April 14, 2008

Tags: , , ,