Validating HTML with tidy

If you ever have to do HTML validation or parsing in PHP, the tidy extension is the way to do it! This extension lets you use the abilities of tidy in some pretty powerful ways. The extension, written by John Coggeshall, has been around for several years now. I can see how someone who took only a quick glance at it could think it was nice, but not really something they need. How wrong they would be! If you take a few minutes and look under the hood, tidy is an extremely powerful tool. Not only can it format HTML to standards (what most people use it for), it can also be a powerful parser and validation tool.

When I’m dealing with user-submitted data where I want to allow HTML, I have two concerns. First, I don’t want to allow XSS (some XML parsers think <p kkk="></p>" closes the <p> tag). Second, users frequently enter invalid HTML (e.g., they don’t close the <a> tag). Fortunately tidy can easily deal with both. The second issue is the easiest to solve: run tidy->cleanRepair() on the HTML. The first is taken care of by looping through the tidy nodes and rebuilding the HTML using a whitelist. More about how to do this after the break.
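
As a rough illustration of the two steps just described, here is a minimal sketch. It assumes the tidy extension is loaded; the whitelist contents and the function names (escape_html, rebuild_node, sanitize_html) are made up for this example and are not taken from the full post. Unknown tags are dropped but their text is kept, while script and style are dropped along with their content.

```php
<?php
// Hypothetical whitelist for illustration: tag name => allowed attributes.
const ALLOWED_TAGS = [
    'p' => [], 'b' => [], 'i' => [], 'em' => [], 'strong' => [],
    'a' => ['href'],
];
// Elements that are dropped together with everything inside them.
const DROP_WITH_CONTENT = ['script', 'style'];

function escape_html(string $text): string
{
    // double_encode = false so existing entities are not encoded twice
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8', false);
}

function rebuild_node(tidyNode $node): string
{
    if ($node->isText()) {
        return escape_html((string) $node->value);
    }
    $is_element = $node->type === TIDY_NODETYPE_START
        || $node->type === TIDY_NODETYPE_STARTEND;
    if (!$is_element || in_array($node->name, DROP_WITH_CONTENT, true)) {
        return ''; // comments, processing instructions, script, style
    }
    $inner = '';
    foreach ($node->child ?? [] as $child) {
        $inner .= rebuild_node($child);
    }
    if (!array_key_exists($node->name, ALLOWED_TAGS)) {
        return $inner; // unknown tag: keep only its filtered content
    }
    $attrs = '';
    foreach ($node->attribute ?? [] as $name => $value) {
        $value = (string) $value;
        if (!in_array($name, ALLOWED_TAGS[$node->name], true)) {
            continue; // attribute not on the whitelist (onclick, style, ...)
        }
        if ($name === 'href' && !preg_match('#^(https?://|mailto:)#i', $value)) {
            continue; // rejects javascript: and any other scheme
        }
        $attrs .= ' ' . $name . '="' . escape_html($value) . '"';
    }
    return "<{$node->name}{$attrs}>{$inner}</{$node->name}>";
}

function sanitize_html(string $html): string
{
    $tidy = new tidy();
    $tidy->parseString($html, [], 'utf8');
    $tidy->cleanRepair(); // closes unclosed tags and fixes nesting
    $out = '';
    foreach ($tidy->body()->child ?? [] as $child) {
        $out .= rebuild_node($child);
    }
    return $out;
}
```

Calling sanitize_html() on a comment body would then return only whitelisted tags and attributes. A real whitelist would likely need more tags, plus handling for void elements such as br, which this sketch leaves out.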

Posted under PHP, Security, Tips & Tricks, Web Development

This post was written by Michael Tougeron on January 15, 2009


Scaling MySQL powered Web Sites by Sharding and Replication – SF MySQL Meetup Nov 2008

Peter Zaitsev will be giving his excellent presentation “Scaling MySQL powered Web Sites by Sharding and Replication” at the upcoming SF MySQL Meetup at 6:00pm on November 3rd at the CBS Interactive (formerly CNET Networks) building in SOMA, San Francisco (235 2nd St).

RSVP here: http://mysql.meetup.com/30/calendar/8912109/

Description from the 2008 Velocity conference:
When your application grows beyond the capacity of a single MySQL server, there are a few ways to scale, the most typical being Replication and Sharding.

In this presentation you will learn how, depending on your application's performance, scalability, and high availability goals, you can choose the scaling strategy that is right for you.

We will also speak about the scalability limitations of Replication and Sharding, implementation complexities, and aspects of operation.

Peter Zaitsev’s bio:
MySQL Performance Blog

Peter has helped many companies, ranging from one-man startups to Fortune 500 companies, with their performance and scaling problems at various stages, from architecture design down to query and schema optimization.

Over the course of the years Peter has spoken at many international conferences focused on MySQL, Open Source, Databases or High Performance Web applications.

Before co-founding Percona, Peter worked for MySQL AB as head of the High Performance Group, where he was involved in Support, Consulting, and Development, and worked with vendors to help optimize their software or hardware to perform well with MySQL.

Peter has also been involved in a number of Web startups in roles ranging from CTO to Consultant, and has a lot of experience in MySQL and Web Applications Operations, Deployment, Quality Assurance and Development.
Please join us!

Map to CNET Networks, Inc.
When you arrive, please look for the PHP/MySQL Meetup sign pointing you to the conference room.

Look forward to seeing you there!

Posted under Events, Internet, mysql, Technology, Tips & Tricks, Web Development

This post was written by Michael Tougeron on October 23, 2008


memcached PHP semaphore & cache expiration handling

There are a lot of different ways that people use memcached and PHP. The most common is probably a basic get and set to cache data from your database.

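    // the cache object and TTL are passed in, since globals are not visible inside a function.
    // the three-argument set($key, $value, $expiration) assumes the Memcached class
    function get_my_data1($memcache_obj, $sec_to_cache_for) {
        $cache_id = "mykey";
        $data = $memcache_obj->get($cache_id);
        // get() returns false on a miss; a strict check keeps cached 0 or "" from counting as a miss
        if ( $data === false ) {
            $data = get_data_from_db_function();
            $memcache_obj->set($cache_id, $data, $sec_to_cache_for);
        }
        return $data;
    }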

But what if the query that’s going to hit the database is pretty intensive and you don’t want more than one user to hit the db at a time? That’s easily handled via a semaphore lock.

    // $max_time_to_wait is in seconds; the cache object and TTL are passed in as parameters
    function get_my_data2($memcache_obj, $sec_to_cache_for, $max_time_to_wait = 1) {
        $cache_id = "mykey";
        $lock_id = $cache_id . '_qry_lock';
        $data = $memcache_obj->get($cache_id);
        if ( $data === false ) {
            // check to see if someone has already set the lock
            $data_lock = $memcache_obj->get($lock_id);
            if ( $data_lock ) {
                $lock_counter = 0;
                // loop until you find that the lock has been released.  that implies that the query has finished
                while ( $data_lock ) {
                    // you may only want to wait for a specified period of time.
                    // one second is usually sufficient since your goal is to always have sub-second response time
                    // if your query takes more than 1 second, you should consider "warming" your cached data via a cron job
                    if ( $lock_counter >= $max_time_to_wait ) {
                        break;
                    }
                    // you really want this to be a fraction of a second so the user waits as little as possible
                    // for the simplicity of example, I'm using the sleep function.
                    sleep(1);
                    $lock_counter++;
                    $data_lock = $memcache_obj->get($lock_id);
                }
                // if the loop is completed, that either means the user waited for too long
                // or that the lock has been removed.  try to get the cached data again; it should exist now
                $data = $memcache_obj->get($cache_id);
                if ( $data !== false ) {
                    return $data;
                }
            }
            // set a lock for 2 seconds
            $memcache_obj->set($lock_id, true, 2);
            $data = get_data_from_db_function();
            $memcache_obj->set($cache_id, $data, $sec_to_cache_for);
            // don't forget to remove the lock
            $memcache_obj->delete($lock_id);
        }
        return $data;
    }
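
One thing to keep in mind with get_my_data2(): checking the lock with get() and then taking it with set() are two separate steps, so two requests that both see no lock can both go on to run the query. A possible variation, sketched below, takes the lock with add(), which only succeeds when the key does not exist yet. The function name get_with_lock, its parameters, and the FakeCache class are hypothetical. FakeCache is an in-memory stand-in so the sketch runs without a memcached server, and the method signatures are assumed to follow the Memcached class (add($key, $value, $expiration)).

```php
<?php
// Hypothetical in-memory stand-in exposing the four calls used below.
// Expiration times are accepted but ignored here.
class FakeCache
{
    private $items = [];

    public function get($key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $value, $ttl = 0)
    {
        $this->items[$key] = $value;
        return true;
    }

    public function add($key, $value, $ttl = 0)
    {
        if (array_key_exists($key, $this->items)) {
            return false; // key already exists, so the add is refused
        }
        $this->items[$key] = $value;
        return true;
    }

    public function delete($key)
    {
        unset($this->items[$key]);
        return true;
    }
}

function get_with_lock($cache, $key, callable $load, $ttl, $max_wait_ms = 1000)
{
    $lock_key = $key . '_qry_lock';
    $waited_ms = 0;
    while (true) {
        $data = $cache->get($key);
        if ($data !== false) {
            return $data;
        }
        // add() succeeds for only one caller, so it checks and takes the lock in one step
        if ($cache->add($lock_key, true, 2)) {
            try {
                $data = $load();
                $cache->set($key, $data, $ttl);
            } finally {
                $cache->delete($lock_key); // release the lock even if the query throws
            }
            return $data;
        }
        if ($waited_ms >= $max_wait_ms) {
            return $load(); // waited long enough; fall back to the database
        }
        usleep(100000); // wait 100 ms before checking again
        $waited_ms += 100;
    }
}
```

Usage would look like get_with_lock($memcache_obj, "mykey", 'get_data_from_db_function', $sec_to_cache_for). Sleeping in 100 ms steps rather than whole seconds keeps the wait closer to the sub-second goal mentioned in the comments above.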

More below the break.

Posted under mysql, PHP, Tips & Tricks, Web Development

This post was written by Michael Tougeron on January 11, 2008
