Tag: dpc09

June 23, 2009

Scalable Web Applications

Purpose of the entry

On Saturday June 13th 2009 I attended a talk by Eli White on scalable web applications. Eli previously worked at digg.com and is now PHP Community Manager & DevZone Editor-in-Chief at Zend Technologies. When you hear him talk you immediately notice that he knows a great deal about programming and good practices, and that he shares that knowledge eagerly and with passion. That explains his position at Zend. As far as I am concerned, Eli gave the best talk of the conference. Of course I didn’t attend every talk, but I can hardly imagine one more interesting than this. He gave valuable tips on how to start new projects and showed small steps you can take right away that will pay off greatly if your application turns out to be used by the masses. The following entry is my interpretation of the talk.

Overview

What is scalable application design
Tip 1: load balancing the webserver
Tip 2: scaling from a single DB server to a Master-Slave setup
Tip 3: Partitioning, Vertical DB Scaling
Tip 4: Partitioning, horizontal DB Scaling
Tip 5: Application Level Partitioning
Tip 6: Caching to get around your database
Resources
Closing notes

What is scalable application design

Scalable application design means, first of all, that there is an application to design. So make a start on the application itself. It is no use trying to fit in as many good practices as possible while designing your killer application if, in the end, you feel too discouraged to even begin programming. The first step in creating an application is actually beginning. Do not overload the project with great ideas that will take forever to implement or require you to study till you drop. Does this mean you do not have to study or apply good design principles? Of course not; you should do so permanently. It should become a habit, a second nature. It just means you should not overdo it to the point where you never begin the project at all. To make it easy for you, Eli gave some practical tips which you can apply immediately and which don’t require any real study or deep understanding of design principles.

Tip 1: load balancing the webserver

Load balancing is the act of dispatching a user’s request to one of the servers in a server farm. This way the load (generated by user requests) is distributed amongst several servers and your application can respond more quickly. There are some caveats to keep in mind when coding for a system that will eventually be load balanced.
Do not write code that depends on a cache written to the filesystem; preferably do not rely on the local filesystem at all. The reason is straightforward: because requests are distributed amongst the servers, the user’s second request might end up on a different server where the cache is not available. So if you are using local caching (APC / Zend Server), avoid assuming there is one exclusive cache. Most of us start with only one server, so it is a good idea to encapsulate everything that relies on sessions and the cache. When you switch to load balancing, it is then only a matter of changing the code inside the encapsulation.
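
A minimal sketch of what such an encapsulation could look like. The CacheStore interface, the LocalArrayCache class and the getGreeting function are hypothetical names of my own, not something shown in the talk, and a plain array stands in for a local cache such as APC:

```php
<?php
// Hypothetical cache abstraction: application code only talks to CacheStore.
interface CacheStore
{
    public function get($key);            // returns null on a miss
    public function set($key, $value);
}

// Single-server implementation: an in-process array stands in for a local cache.
class LocalArrayCache implements CacheStore
{
    private $data = array();

    public function get($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : null;
    }

    public function set($key, $value)
    {
        $this->data[$key] = $value;
    }
}

// Application code depends on the interface, not on where the cache lives.
function getGreeting(CacheStore $cache, $userName)
{
    $key = 'greeting:' . $userName;
    $greeting = $cache->get($key);
    if ($greeting === null) {
        $greeting = 'Hello, ' . $userName . '!';   // stands in for expensive work
        $cache->set($key, $greeting);
    }
    return $greeting;
}

$cache = new LocalArrayCache();
echo getGreeting($cache, 'Nick'), "\n";   // computed and stored
echo getGreeting($cache, 'Nick'), "\n";   // served from the cache
```

Once the application is load balanced, a second CacheStore implementation backed by a cache shared between the servers could replace LocalArrayCache without touching getGreeting.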

Tip 2: scaling from a single DB server to a Master-Slave setup

class DB {
    private $master;
    private $slave;

    // Connection pools: one DSN for writes, several replicas for reads.
    private static $cfg = array(
        'write' => array('mysql:dbname=MyDB;host=213.136.52.29'),
        'read'  => array('mysql:dbname=MyDB;host=213.136.52.30',
                         'mysql:dbname=MyDB;host=213.136.52.31',
                         'mysql:dbname=MyDB;host=213.136.52.32'),
    );

    public function __construct()
    {
        $this->master = $this->_getConnection('write');
        $this->slave = $this->_getConnection('read');
    }

    // $pool is 'master' for writes and 'slave' for reads.
    public function query($query, $pool)
    {
        if ('master' == $pool) {
            return $this->master->query($query);
        } else if ('slave' == $pool) {
            return $this->slave->query($query);
        }
        throw new InvalidArgumentException("Unknown pool: $pool");
    }

    // Pick a random server from the requested pool.
    // USER and PASS are constants with the DB credentials, assumed to be defined elsewhere.
    private function _getConnection($pool) {
        $max = count(self::$cfg[$pool]) - 1;
        $dsn = self::$cfg[$pool][mt_rand(0, $max)];
        return new PDO($dsn, USER, PASS);
    }
}

A real Master-Slave setup is usually out of reach for the start-up developer. It is costly, and most of us have a shared hosting account or a single dedicated server. Does that mean you should not prepare for it? Of course not. You should already make your code ready for this setup, and it can be done with little extra coding. Create an extra layer which can instantiate connections to different DB servers, as many as you want for masters and as many as you want for slaves. Give that layer a query method which accepts an extra parameter. Every time you query the database for a write action (insert, delete or update) you pass ‘master’ as the extra parameter, and for all reads (select) you pass ‘slave’. The layer then delegates the query to the right server: master or slave. I only have one dedicated server, so in my code I make sure both ‘master’ and ‘slave’ point to the same database connection. But as soon as my website gets heavy traffic I can add another DB server, and all I have to do is change one method. All my write queries go to the master, which is replicated to the slave, and all my select queries run on the slave. You just made your application more scalable.
Is it really that simple? In fact it is.

$db->query('update articles set comments = comments + 1', 'master');
$db->query('select comments from articles', 'slave');

Of course you have to take several other issues into consideration. One of those is slave lag.
Slave lag is the time between an insert, delete or update query on the master and its replication to the slave. We are usually talking milliseconds here, but that can be enough for a select on the slave to miss the row you just wrote, for instance when you look up the latest ID. Write your code so that it does not depend on lag of any kind, because if you really start to scale big time, you will spread over several data centers around the planet, and then replicating from one farm to another can take seconds, minutes…
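
One common way to cope with lag, which is my own assumption and not necessarily what Eli showed, is to send reads to the master for a short while after a write. The ReadRouter class below is a hypothetical sketch of that idea, and the 5 second window is arbitrary. Timestamps are passed in explicitly to keep the example easy to follow; real code would use time().

```php
<?php
// Hypothetical helper: decides which pool a read should use.
class ReadRouter
{
    private $lastWriteAt = null;
    private $stickySeconds;

    public function __construct($stickySeconds)
    {
        $this->stickySeconds = $stickySeconds;
    }

    // Call this after every insert, update or delete.
    public function noteWrite($now)
    {
        $this->lastWriteAt = $now;
    }

    // Reads shortly after a write go to the master; all others go to a slave.
    public function poolForRead($now)
    {
        if ($this->lastWriteAt !== null
            && ($now - $this->lastWriteAt) < $this->stickySeconds) {
            return 'master';
        }
        return 'slave';
    }
}

$router = new ReadRouter(5);             // 5 seconds is an arbitrary assumption
echo $router->poolForRead(100), "\n";    // slave: nothing has been written yet
$router->noteWrite(100);
echo $router->poolForRead(102), "\n";    // master: 2 seconds after the write
echo $router->poolForRead(110), "\n";    // slave: 10 seconds after the write
```

The returned pool name can be passed straight to a query method that takes ‘master’ or ‘slave’ as its second parameter.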

Tip 3: Partitioning, Vertical DB Scaling

Sometimes tables become too big in terms of the number of columns they have. Every query against such a table then carries overhead. When a table of 15 columns is searched on one field and only 2 columns are returned almost every time, it can make sense to split the data into multiple tables. This decreases the size of the tables and thus also the time and memory needed to query them. Columns are good candidates for vertical partitioning when one of the following applies:

Columns that are rarely used (they can move to their own table)
Columns that are often empty
Columns that are not used in the where clause
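
To make the split concrete, here is a small sketch that divides one wide row into a frequently used part and a rarely used part. The splitRow function and the column names are hypothetical; in a real migration the two parts would become two tables that share the primary key.

```php
<?php
// Hypothetical example: split one wide row into a "core" part and an "extra" part.
// Both parts keep the primary key so they can be joined again.
function splitRow(array $row, array $coreColumns, $primaryKey = 'id')
{
    $core  = array($primaryKey => $row[$primaryKey]);
    $extra = array($primaryKey => $row[$primaryKey]);
    foreach ($row as $column => $value) {
        if ($column === $primaryKey) {
            continue;
        }
        if (in_array($column, $coreColumns, true)) {
            $core[$column] = $value;      // searched or returned on most requests
        } else {
            $extra[$column] = $value;     // rarely used or often empty
        }
    }
    return array('core' => $core, 'extra' => $extra);
}

$user = array(
    'id' => 7, 'username' => 'nick', 'email' => 'nick@example.com',
    'biography' => null, 'signature' => null, 'homepage' => 'http://example.com',
);
$parts = splitRow($user, array('username', 'email'));
print_r($parts['core']);    // id, username, email
print_r($parts['extra']);   // id, biography, signature, homepage
```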

Tip 4: Partitioning, horizontal DB Scaling

A table can have too many columns, but it can also reach a point where it has too many rows. In this case it might be wise to split the table into row sets.
The splitting can be done in several ways. Here are the more common ones.

Range Based
Date Based
Interlaced
User Based

Depending on your application you can use any of the methodologies listed above. Date based: on a news site you might want to give users access to the archive with last year’s articles, while most requests are for this year’s articles, so splitting by date is a good way to go. Range based: you have, for instance, 8 million users in your database and split them into row sets of one million based on the userId or any other primary key. Interlaced: the rows alternate between tables. The first row goes to the first table, the second to the second table, the third to the third, the fourth back to the first table, and so on. And last but not least, user based: you can split on the first letter of the username or any other characteristic. But remember that your application code has to know where to find each row after a horizontal split. So think hard about how you want to split. It is of no use to split a table if you still have to query each and every partition to find the data you are looking for. Put some thought into it; you need to be able to map the data from your code later on.
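
The mapping from a row to its table can be sketched as a handful of small functions, one per strategy. All function and table names below are hypothetical, and the partition sizes are arbitrary assumptions.

```php
<?php
// Hypothetical table names; each function maps a row to the partition that holds it.

// Range based: one table per million user ids (ids 1..1000000 go to users_0, and so on).
function rangeTable($userId, $rowsPerTable = 1000000)
{
    return 'users_' . intdiv($userId - 1, $rowsPerTable);
}

// Interlaced: rows alternate between a fixed number of tables.
function interlacedTable($userId, $tableCount = 3)
{
    return 'users_' . ($userId % $tableCount);
}

// Date based: one table per publication year.
function dateTable($publishedOn)
{
    return 'articles_' . substr($publishedOn, 0, 4);   // expects 'YYYY-MM-DD'
}

// User based: split on the first letter of the username.
function userTable($username)
{
    $first = strtolower(substr($username, 0, 1));
    return 'users_' . (ctype_alpha($first) ? $first : 'other');
}

echo rangeTable(2500000), "\n";        // users_2
echo interlacedTable(4), "\n";         // users_1
echo dateTable('2009-06-13'), "\n";    // articles_2009
echo userTable('Nick'), "\n";          // users_n
```

Because each function needs only the key to find the table, the application never has to query every partition to locate a row.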

Tip 5: Application Level Partitioning

Application level partitioning involves moving various tables of your DB onto different servers. These can be single tables, or groups of related tables so that you are still able to join them in your queries.
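
A rough sketch of how the application could keep track of which server holds which table. The TableRouter class and the host names are placeholders of my own, not part of the talk.

```php
<?php
// Hypothetical mapping of tables to servers; host names are placeholders.
class TableRouter
{
    private $map;
    private $default;

    public function __construct(array $map, $default)
    {
        $this->map = $map;
        $this->default = $default;
    }

    // Returns the DSN of the server that holds the given table.
    public function dsnFor($table)
    {
        return isset($this->map[$table]) ? $this->map[$table] : $this->default;
    }

    // Tables can only be joined in one query when they live on the same server.
    public function canJoin($tableA, $tableB)
    {
        return $this->dsnFor($tableA) === $this->dsnFor($tableB);
    }
}

$router = new TableRouter(
    array(
        'articles' => 'mysql:dbname=MyDB;host=db-content.example',
        'comments' => 'mysql:dbname=MyDB;host=db-content.example',
        'sessions' => 'mysql:dbname=MyDB;host=db-sessions.example',
    ),
    'mysql:dbname=MyDB;host=db-main.example'
);
var_dump($router->canJoin('articles', 'comments'));   // bool(true)
var_dump($router->canJoin('articles', 'sessions'));   // bool(false)
```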

Tip 6: Caching to get around your database

Your performance can get a real boost from the right DB result caching. Always try to use a write-through cache and store data as close to its final processed form as possible, but choose small, discrete, reusable data units. Of course you should not store data in the cache that you can’t recreate. Take special note of the discrete, reusable data units. These are units that can be reused by other parts of your application, or by the same part on different requests. This way cached data is shared and no (extra) calls need to be made to the database.
For example, Facebook has millions of users, and each of them can keep track of hundreds of friends. If every status update had to be queried from a database, the application would be painfully slow. Facebook solved this issue with reusable data units. When you have 100 friends in your list, Facebook creates 100 small cached units, one for every friend individually. If one of your friends also has 100 friends and shares 50 of them with you, Facebook can reuse 50 units and get the rest from the database. Another user might have 75 friends, 40 in common with you and 25 others in common with your friend. 65 data units can be reused and only 10 have to be fetched from the database. As you can see, this way you can avoid a lot of calls. What happens when a user makes a status update? The status update is written to the master DB and to the cached data unit of that specific user. No query is needed to retrieve the data. All parts of the application that use that unit see the update automatically. Only when the cache turns out to be invalid is a select query done. I hope you see the real power of this kind of caching.
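
The idea can be sketched in a few lines. This is only an illustration under my own assumptions: the StatusStore class is hypothetical, and two arrays stand in for the master database and the cache. It says nothing about how Facebook actually implements this.

```php
<?php
// Hypothetical sketch: arrays stand in for the master database and the cache.
class StatusStore
{
    private $db = array();       // userId => status (stands in for the master DB)
    private $cache = array();    // userId => status (one small cached unit per user)
    public $dbReads = 0;         // counts select queries, for illustration

    // Write-through: update the database and the user's cached unit together.
    public function updateStatus($userId, $status)
    {
        $this->db[$userId] = $status;
        $this->cache[$userId] = $status;
    }

    // Build a friend feed from per-user units; only misses hit the database.
    public function feed(array $friendIds)
    {
        $feed = array();
        foreach ($friendIds as $id) {
            if (!array_key_exists($id, $this->cache)) {
                $this->dbReads++;
                $this->cache[$id] = isset($this->db[$id]) ? $this->db[$id] : null;
            }
            $feed[$id] = $this->cache[$id];
        }
        return $feed;
    }

    // Simulates a cached unit being evicted or invalidated.
    public function evict($userId)
    {
        unset($this->cache[$userId]);
    }
}

$store = new StatusStore();
$store->updateStatus(1, 'At DPC09');
$store->updateStatus(2, 'Writing PHP');
$store->feed(array(1, 2));       // both units come from the cache
$store->evict(2);
$store->feed(array(1, 2));       // unit 2 is reloaded with one select
echo $store->dbReads, "\n";      // 1
```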

Resources

Closing notes

Most bottlenecks in the next hyped application are at the database level. Distributing your data across multiple database servers is often the best and only solution. The problem is that rewriting all your code to accommodate multiple servers can take a lot of budget, and that can easily be prevented.
Of course you do not have these scaling problems yet. Hopefully you eventually will, and that is why you should already encapsulate the code that handles your queries. By encapsulating you have a single spot where you make the necessary changes (if any, because you could start developing with all these tips and tricks in mind and implement them from the start). For small applications the overhead you create with, for instance, vertical partitioning is small. But the benefit is huge when your application becomes the next big thing. Caching is something you can already implement to the full extent.

Now it is up to you!
Have fun.
Nick Belhomme

June 17, 2009

Opening keynote by Andrei Zmievski at DPC09 – The evolution of PHP

Purpose of the entry

On Friday June 12th 2009 the Dutch PHP Conference started with a short introduction by Cal Evans, director of PCE at Ibuildings and former Editor-in-Chief of DevZone at Zend, Inc.
Cal welcomed us all to the third edition of what will become one of the major PHP conferences in Europe. The conference itself was packed with great talks on various programming subjects, from novice topics like the excellent talk by Ben Ramsey on “Grokking the REST Architecture” to more advanced ones like “Trees in the Database: Advanced Data Structures” by Lorenzo Alberton. The latter I unfortunately could not attend; a man has got to make choices. I attended Ben Ramsey’s second talk on REST instead, because the first one was really, really interesting. I did not have to choose which opening keynote to attend. There is only one per day, and the first day was opened by Andrei Zmievski. He is a real entertainer. He covered the evolution of PHP and the new features in PHP 5.3 and PHP 6.0. He also said what PHP 7.0 will not be.
In this entry I will give an overview of the talk given by the main guy responsible for the official PHP releases.

Overview

What is PHP
PHP is mature and version 5.3 will arrive shortly
PHP 6.0 is coming and with it Unicode and traits
PHP7.0
Resources
Closing notes

What is PHP

PHP, or PHP: Hypertext Preprocessor, is a dirty language with a very gentle learning curve. That has been the key to the success of the dominant web language. It is dirty because it was built release upon release, never able to clean up some mistakes made in the beginning when the language was starting to evolve. This resulted in inconsistent function parameters and in design decisions that could have been made differently. Because PHP took features from other languages it is a mix of various things. PHP is a ball of nails; it is not a purebred but more like a mutt. But you have got to love the mutt. It works brilliantly and does what it needs to do. And with every release it becomes more mature and more beautiful. Mutts are often stronger than purebreds, too…

PHP is mature and version 5.3 will arrive shortly

Within a couple of weeks the long-awaited PHP 5.3 will be released as stable.
Just as PHP 5.1 made a huge leap with the PDO integration, so will 5.3 with namespaces. Andrei made some jokes about the decision to use the backslash as the namespace separator. He did this using mails sent to the mailing list, and I must say it was hilarious. The message he wants to give the community is: yes, we have used the \, stop whining about it and start adopting it. You have no choice. Decisions have to be made and we, the PHP core developers, have made ours. Personally I am sure namespaces will change the way programmers and framework architects develop their applications. Maybe not immediately, but over the years they will.
Lambda functions and closures are also supported, and I am positive people will start to adopt this new functionality right away. I know I will.
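
For readers who have not seen them yet, here is a short sketch of the lambda and closure syntax introduced in PHP 5.3. The variable names are mine, not taken from the keynote.

```php
<?php
// A lambda (anonymous function) assigned to a variable.
$double = function ($n) {
    return $n * 2;
};

// A closure: "use" imports $factor from the surrounding scope.
$factor = 3;
$multiply = function ($n) use ($factor) {
    return $n * $factor;
};

echo $double(21), "\n";                            // 42
print_r(array_map($multiply, array(1, 2, 3)));     // 3, 6, 9

// Anonymous functions work anywhere a callback is expected.
$names = array('Matthew', 'Eli', 'Andrei');
usort($names, function ($a, $b) {
    return strlen($a) - strlen($b);                // shortest name first
});
print_r($names);                                   // Eli, Andrei, Matthew
```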
As I said before, PHP is a mutt and it has adopted a lot of features from other languages. Java is one of them. The tradition continues with the introduction of phar files, and I am positive they will change a lot about deployment. Phar allows you to pack your entire project into a single file, pretty much like a tar or zip file. You can browse and include the files inside the phar from within your code as if it were a normal directory on your filesystem. You can also execute your application entirely from within the file. As you can imagine this gives big benefits for deployment, where files otherwise have to be uploaded one after another and the application can be broken for a period of time while the upload is in progress. It can also change the way libraries and common applications like phpMyAdmin are distributed.
If you want to know more about lambdas, closures, phars and namespaces, there are some great resources available and I will give you a list at the end of this article.

PHP 6.0 is coming and with it Unicode and traits

Traits are PHP’s solution to the multiple inheritance problem. There is no diamond problem here, and still you can use the functionality of multiple classes from within one class. You are still limited to inheriting from a single parent, and this is a good thing, but you can import (think of it as a dynamic copy/paste) methods from one or more other classes. Unicode will solve the internationalization problems and the problems with some string functions, and will even allow Chinese programmers to program in Chinese. Do not expect Andrei, or me for that matter, to debug such applications. 🙂
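
A small sketch of the idea. It uses the trait syntax that PHP later shipped in version 5.4, which may differ from what was shown at the time of the talk, and all class and trait names are hypothetical.

```php
<?php
// Trait syntax as it later shipped in PHP 5.4; all names here are hypothetical.
trait Timestamps
{
    private $createdAt;

    public function touch($time)
    {
        $this->createdAt = $time;
    }

    public function createdAt()
    {
        return $this->createdAt;
    }
}

trait Sluggable
{
    // Relies on the class that uses the trait having a $title property.
    public function slug()
    {
        return strtolower(str_replace(' ', '-', $this->title));
    }
}

class Entry
{
    public $title;

    public function __construct($title)
    {
        $this->title = $title;
    }
}

// Still a single parent, but methods are imported from two traits.
class BlogPost extends Entry
{
    use Timestamps, Sluggable;
}

$post = new BlogPost('Scalable Web Applications');
$post->touch('2009-06-23');
echo $post->slug(), "\n";        // scalable-web-applications
echo $post->createdAt(), "\n";   // 2009-06-23
```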

PHP7.0

PHP 7.0 is coming and it will offer lots more functionality and great stuff. What it will be nobody knows (maybe some do?), but what is known is that it will not be PHP redesigned from scratch: a version polished to perfection, the ultimate clean language. Such a language does not exist, and if someone is eager to build it, it will not be called PHP. PHP is a powerful tool. In experienced hands it lets you develop applications that will astound everybody. Go and use PHP. And if you are using PHP, love it, cherish it and welcome the changes.

Resources

Closing notes

Andrei Zmievski is a wonderful speaker with lots of charisma. With his opening keynote he awoke my interest in using all the new functionality that future releases of PHP will offer. As a core PHP developer and leader of the PHP-GTK project he was not afraid to make jokes about himself, the community and PHP, which made the talk very lighthearted and easy to digest. One of the cool moments was when he pointed to an ini setting called y2k_compliance in PHP3 which did nothing but make you feel safe. Like a placebo for PHP. That was so much fun!
His talk made me keen on using all the new PHP 5.3 features and made me look forward to using PHP 6.0. And along with me, a lot of other developers were feeling good at heart and ready to attend all the talks the Dutch PHP Conference had to offer. The stage was set for one of the best conferences, content-wise, I have ever attended.

Happy coding from sunny Belgium,
Nick Belhomme

June 16, 2009

Zend Framework 1.8 Workshop at the Dutch PHP Conference 09

Purpose of the entry

On Thursday June 11th 2009 I attended a workshop on the very (dare I say the most) popular PHP framework: Zend Framework (ZF).
I will share my experience in this entry and try to give you a quick overview of the tips I found most interesting.

Overview

General experience
Components
Resources
Closing notes

General experience

If you follow Matthew Weier O’Phinney’s blog and career you already know he is a real authority in his field, which today is the Zend Framework. As the project leader at Zend Technologies, he does development and coaches his team towards an ever expanding and more stable framework. The workshop itself was not really a workshop at all but an extended presentation spanning several hours. This means that no real hands-on coding was done by the attendees.

Matthew gave a high-level presentation on the most common (and some new) components. Whenever an attendee had a question he was more than willing to explain it in greater detail. This makes him very approachable and someone who is eager to help.

Components

Matthew discussed the new components Zend_Application and Zend_Tool. Both are new in Zend Framework 1.8 and both seem to offer, once fully understood, an entirely new experience in how you set up your ZF projects. Zend_Application offers a more uniform way to do bootstrapping, and Zend_Tool offers tools that do the scaffolding for you. If you want more info on either of them, please visit the official documentation or take a look at the presentation which I included at the end of this entry. Of course Zend_Auth and Zend_Acl were also discussed, and in a manner that deviated from what was best practice in ZF 1.6 as described in the marvelous Zend Framework in Action book by Rob Allen. If you do not yet own this book, hop over to Rob Allen’s webpage, Amazon or wherever you can get your hands on it. At the moment it is still one of the best books about ZF and offers some very valuable tips on the subject. The deviation illustrates that a framework, like every other piece of code, is organic and has a tendency to lead its own life. Which is actually a good thing: “grow or die”, remember?

One of the main tips I got from the presentation was the use of the DAO (Data Access Object) principle.
Instead of putting your data access directly into your model, you should add a DAO as a layer between your model and the data source (database, XML, CSV, …). This way, if you want to switch from a database to a web service (think of scalability), you do not have to rewrite every single model but only the DAOs. If you want to know more about scalability, I will be doing an entry on it based on the incredible presentation by Eli White, PHP Community Manager & DevZone Editor-in-Chief at Zend Technologies.
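
A minimal sketch of the DAO idea in plain PHP, without any Zend Framework classes. The ArticleDao interface, the InMemoryArticleDao class and the ArticleModel class are hypothetical names of my own, and an array stands in for the real data source.

```php
<?php
// Hypothetical DAO contract: the model only knows this interface.
interface ArticleDao
{
    public function findById($id);   // returns an array or null
}

// One possible implementation; an array stands in for a database table.
class InMemoryArticleDao implements ArticleDao
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function findById($id)
    {
        return isset($this->rows[$id]) ? $this->rows[$id] : null;
    }
}

// The model holds the business logic and delegates data access to the DAO.
class ArticleModel
{
    private $dao;

    public function __construct(ArticleDao $dao)
    {
        $this->dao = $dao;
    }

    public function getTitle($id)
    {
        $row = $this->dao->findById($id);
        return $row === null ? 'Not found' : ucfirst($row['title']);
    }
}

$model = new ArticleModel(new InMemoryArticleDao(array(
    1 => array('title' => 'scalable web applications'),
)));
echo $model->getTitle(1), "\n";   // Scalable web applications
echo $model->getTitle(2), "\n";   // Not found
```

Switching to a database or a web service then means writing another class that implements ArticleDao; ArticleModel stays untouched.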

Zend_Form was also covered, and Matthew explained that it is in the project’s best interest to set the decorators in the view instead of in the controllers/models. Decorators are part of the display and thus should be handled in the view. He showed a practical example of how to style and display form elements separately in the view instead of doing <?php echo $this->form; ?>. This is done by simply accessing the form element through the properties of the form: <?php echo $this->form->username; ?>

Of course other useful tips were shared in this presentation, and I strongly encourage you to attend one of these workshops whenever and wherever you have the chance. Given the purpose of this entry I am not going any deeper into the subject. You can find Matthew’s presentation online.

Resources

Closing notes

I hope you enjoyed reading this entry as much as I enjoyed writing it. I would surely recommend adding a framework to your skills. ZF is a smart choice and the community is very, very strong. In today’s world, if you do not know one or two of the major players out there like ZF or Symfony, you are sure to run into trouble sooner or later regarding employment. For more direct access to the community, attend workshops like the one I just described or visit the #zftalk IRC channel on irc.freenode.net

Sunny greetings from Belgium,
Nick Belhomme