Service Behavior Immutability

One of the things I always mention when I rant about misusing events is behavior mutation. Since listeners can be attached, and sometimes even detached, dynamically at application runtime, this frequently results in event hell and severely impedes debugging. After some thought I believe the same reasoning can be applied to services too.

In a nutshell, a service is technically pretty much any class that is instantiated by some DI mechanism. From an architectural point of view, it provides a facade to some part of the system’s functionality. Since the instance of a service is the same for all classes using it, misusing it may make parts of the system transitively dependent on each other.
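To illustrate why this sharing matters, here is a minimal DI container sketch (the Container class and its API are illustrative, not any particular library): the service is instantiated once, and every consumer receives that same instance.

```php
<?php

// Minimal DI container sketch (illustrative, not a real library).
// It instantiates each service once and hands the same instance
// to every class that asks for it.
class Container
{
    private $factories = [];
    private $instances = [];

    public function register($name, callable $factory)
    {
        $this->factories[$name] = $factory;
    }

    public function get($name)
    {
        // Lazily instantiate on first request, then reuse the instance
        if (!isset($this->instances[$name])) {
            $this->instances[$name] = $this->factories[$name]($this);
        }
        return $this->instances[$name];
    }
}
```

Because get() always returns the same object, any mutation performed by one consumer is silently visible to all the others.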

Let’s use a router as an example:

class Router
{
    public function route($request)
    {
        //returns some sort of Callable
    }

    public function addRoute(Route $route)
    {
        //adds a new route
    }
}

Parts of the system that rely on the route() method are transitively dependent on those that add routes to it, meaning you can never be 100% sure that calling route() with the same request twice will return the same result. So when debugging you have to take into account not just the service itself, but also all the places where it could have been modified.

To avoid this, the best approach is to write only immutable services, meaning a service should not have methods that modify its behavior. Our routing example could be rewritten as follows:

class Router
{
    private $routes;

    public function __construct(array $routes)
    {
        $this->routes = $routes;
    }
    
    public function route($request)
    {
        //returns some sort of Callable
    }
}

Such an approach still allows for a pluggable architecture; the only difference is that it forces you to do plugin initialization before the service is actually built. A good way to do this is to apply the Facade pattern: build the routing subsystem separately and then expose a limited Router service on top of it.

A good example of this is Doctrine’s EntityManager, the most heavily used Doctrine service. It provides limited functionality and prevents the user from doing crazy stuff like defining new entities on the fly.
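The facade approach could be sketched like this (RouterFactory is a name I made up for illustration): plugins register their routes with the factory during initialization, and only then is the immutable Router built.

```php
<?php

// Sketch of building an immutable Router via a factory/facade.
// RouterFactory is a hypothetical name; plugins add routes to it
// during initialization, before the Router service exists.
class Route
{
    public $pattern;
    public $handler;

    public function __construct($pattern, callable $handler)
    {
        $this->pattern = $pattern;
        $this->handler = $handler;
    }
}

class Router
{
    private $routes;

    public function __construct(array $routes)
    {
        $this->routes = $routes;
    }

    public function route($request)
    {
        // Naive exact-match dispatch, enough for the sketch
        foreach ($this->routes as $route) {
            if ($route->pattern === $request) {
                return $route->handler;
            }
        }
        return null;
    }
}

class RouterFactory
{
    private $routes = [];

    public function addRoute(Route $route)
    {
        $this->routes[] = $route;
    }

    public function build()
    {
        // Once built, the Router can never gain or lose routes
        return new Router($this->routes);
    }
}
```

After build() is called the route set is frozen, so route() becomes predictable no matter who holds a reference to the service.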

Replacing controllers with middleware

Middleware is now a very popular topic in the PHP community, so here are some of my thoughts on the subject. First, let’s take a quick look at what middleware is (if you already know about middleware you can skip this part):

Short intro

The idea behind it is “wrapping” your application logic with additional request-processing logic, and then chaining as many of those wrappers as you like. So when your server receives a request, it is first processed by your middleware, and after you generate a response, that response is processed by the same set:

[Diagram: middleware layers wrapping the application kernel, processing the request on the way in and the response on the way out]

It may sound complicated, but in fact it’s very simple if you look at some examples of what could be a middleware:

  • Firewall – check whether requests from a particular IP are allowed
  • JSON Formatter – parse JSON POST data into parameters for your controller, then turn your response into JSON before sending it back
  • Authentication – redirect users who are not logged in to a login page

The coolest part of this is chaining. Since middlewares don’t know about each other, it’s simple to find the ones you need and chain them together. And the best part is that once we get PSR-7, we can have sets of middleware that are decoupled from frameworks and easily interoperable.
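Chaining itself fits in a few lines. Here is a minimal sketch (the chain() helper and the array-based requests are my simplifications, not any standard): each middleware is a callable that receives the request and a $next callable, and may act before and/or after delegating to the rest of the chain.

```php
<?php

// Minimal middleware chaining sketch. Requests and responses are
// plain arrays purely for brevity; chain() is an illustrative
// helper, not part of any framework.
function chain(array $middlewares, callable $kernel)
{
    // Wrap the kernel from the inside out, so the first middleware
    // in the array becomes the outermost layer.
    $next = $kernel;
    foreach (array_reverse($middlewares) as $middleware) {
        $next = function ($request) use ($middleware, $next) {
            return $middleware($request, $next);
        };
    }
    return $next;
}

// Firewall-style middleware: short-circuits requests from a banned IP
$firewall = function ($request, callable $next) {
    if ($request['ip'] === '10.0.0.1') {
        return ['status' => 403];
    }
    return $next($request);
};

// Formatter-style middleware: post-processes the response into JSON
$formatter = function ($request, callable $next) {
    $response = $next($request);
    $response['body'] = json_encode($response['body']);
    return $response;
};

// The application kernel sits in the middle of the chain
$app = chain([$firewall, $formatter], function ($request) {
    return ['status' => 200, 'body' => ['hello' => $request['ip']]];
});
```

Neither middleware knows the other exists, which is exactly what makes them easy to mix and match.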

This is it for the quick intro, now here are my thoughts:

Replacing controllers
In the picture above, notice the application kernel in the middle? My initial thought was: why not consider our application as middleware too? Indeed, controllers in our frameworks already read requests and return responses, so they are pretty much middleware as well, just without the chaining. The other thing that distinguishes controllers from middleware is tight coupling to the framework; apart from that they are the same. And here it dawned on me:

The application kernel in the above chart shouldn’t be our controller, since if you follow proper design rules it’s your models that contain application logic, not the controller. Which means next-gen frameworks should dump the controller concept entirely and split everything into middleware layers.

Problems
There are some problems with middleware though, the biggest one coming from framework independence. The amount of things you can do without utilizing the framework is actually very small. Interoperable middleware would have no way to access your database, templating, etc. The only way to expose those things in an interoperable fashion would be for the middleware to provide a set of required interfaces that you would have to satisfy. That’s cool, but it might be far too hard for junior devs, and might eventually not catch on.

Were we using middleware all this time?
All frameworks allow you to specify code in your controllers that is executed before and after the action, like this:

class Controller
{
    function before()
    {
        //preprocess, check authorization, do redirects
    }

    function actionIndex()
    {
        //actual action
    }

    function after()
    {
        //postprocess, handle formatting, etc
    }
}

Well, in that case your before()/after() has always been your middleware code. And if you wrote your controllers following the “thin controller, fat model” rule, your actions are pretty much middleware too, since all they do is format data received from your model layer.

Let’s try inverting
Another issue middleware has that old-style controllers don’t is heavy reliance on configuration. There must be a config file that tells which middleware to chain for a particular route. And what I’ve learned over the years is that it’s much better to write code than config: you can debug code, but it’s much harder to debug a misconfigured system. So I thought: if controllers and middleware are so similar, perhaps it’s possible to invert the idea and write controllers in a middleware fashion. Consider this:

class Controller
{
    function actionIndex()
    {
        //assume that each middleware modifies
        //the request/response given
        if (!$this->auth->isLoggedIn($this->request)) {
            return $this->redirect($this->request);
        }

        $this->json->processRequest($this->request);
        $response = /* call model layer and build a response */;
        $this->json->processResponse($response);
        return $response;
    }
}

I think the above is more readable, debuggable and understandable than chaining middleware in a configuration file. So maybe we don’t really need middleware, just better controller code? Maybe the whole point of middleware is to prevent programmers from writing spaghetti code in their controllers?

Is an HTTP Request enough?
One of PSR-7’s goals is to enable interoperable middleware, but it bases its standard on an HTTP request. The question is whether data in such a representation is enough to write middleware. What if you want to pass some additional request parameters around? In the JSON encode/decode example I mentioned earlier, it doesn’t sound like a very good idea to create a new request by converting JSON data into POST form-encoded data for the next middleware. That decoding/encoding is overhead I wish we could avoid. Wouldn’t it be better if a middleware could just decode the data and pass it along as-is?

What I’m thinking is that perhaps a better idea would be to have a Request class that is more like a parameter bag and has nothing to do with HTTP. This way it could be used even for CLI apps. The problem is how it would represent things like URLs and headers. I don’t know, but there must be a way.
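Such a parameter-bag Request could look something like this (the class and its get()/with() methods are my assumptions, not PSR-7): HTTP details like the URL become just another parameter, and middleware can attach decoded data without any re-encoding.

```php
<?php

// Sketch of a protocol-agnostic Request as a simple parameter bag.
// Not PSR-7: the class and method names are assumptions for
// illustration only.
class Request
{
    private $parameters;

    public function __construct(array $parameters = [])
    {
        $this->parameters = $parameters;
    }

    public function get($name, $default = null)
    {
        return isset($this->parameters[$name])
            ? $this->parameters[$name]
            : $default;
    }

    // Returns a copy with extra parameters, keeping the original intact,
    // so a JSON middleware could pass decoded data straight along
    public function with(array $parameters)
    {
        return new Request(array_merge($this->parameters, $parameters));
    }
}
```

A JSON middleware would then just call with() to hand the decoded array to the next layer, instead of re-encoding it into form data.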

A checklist for framework developers

So, some new frameworks have been posted on reddit lately, with the expected result of receiving a lot of hate and criticism. I can kind of understand both sides here, since just about a year ago I saw similar responses to my PHPixie. Since then the flames have calmed down and it now has a considerable userbase, with 386 installs in the last 30 days. The important part is being able to take in constructive criticism and carry on making your project better. I learned a few things the hard way as I went, and I’d like to share them with other people willing to release their brainchild to the world.

  • Tests – you must have those to be taken seriously. Even if it’s the first version, alpha or anything. The tests are the first indicator that you are putting real effort in
  • Find a niche – There has to be something about your framework that makes it stand out, and you have to focus on it. I really pushed a lot of effort behind the “great performance” argument. I also submitted it to Techempower Framework Benchmark Suite to prove it. Don’t ever state something while not having anything to back it up with
  • Read a book – there should be no singletons, god objects, shared state or anything like that in a modern app
  • Write docs – That includes writing docblocks, generating an API documentation from those docblocks, writing some tutorials and a nice about page.
  • Be prepared to compare it to existing frameworks – there has to be something better about your approach. If there isn’t, don’t bother releasing it.
  • Use PSR-2 – I made the mistake of writing PHPixie in the style I liked and then had to rewrite it. The style must suit the majority, not just you. You’re not making it for yourself
  • Don’t expect any help or pull requests from the very start
  • Support your users as much as possible, at least install some forum software to be able to talk to them
  • Buy a domain and spend some time on site design. Make it look legit
  • Commit frequently – Your Github history has to indicate your project is not dead
  • Participate in the community – You can’t expect people to help you with your project if you ignore theirs
  • Don’t give up

If this sounds like too much work, just pick a smaller project. It’s better to make a smaller masterpiece than a big ball of mud.

PHP memory leaks and garbage collection

Historically, memory leaks have not been a huge concern in PHP development. Since PHP processes exit after processing a request, all used memory is deallocated. But that doesn’t mean we are free to ignore memory management altogether. If your application is processing several requests at the same time, the total amount of wasted memory may add up to a hefty number.

With projects like ReactPHP becoming more popular and widely used, PHP is entering the domain of permanently running processes, and we all look forward to using those to speed up our applications. But for that to become a reality, our applications and frameworks have to be carefully rewritten to prevent memory leaks, since even the smallest one can grow to eat all available memory in the span of a month.

PHP uses reference counting to decide when an object can be removed. Each time you assign an object to a new variable, its refcount is increased by one:


$a = new stdClass;
$c = $b = $a;
xdebug_debug_zval('a'); //refcount = 3
unset($b, $c);
xdebug_debug_zval('a'); //refcount = 1

When a refcount reaches 0, the memory is deallocated. This is the first thing you have to think about when instantiating multiple objects, e.g. your domain entities. You have to make sure that references to them don’t hang around after they are no longer needed. A good rule of thumb is to avoid global state as much as possible. If all your object references are inside some local scope, it’s much easier to keep track of what’s happening to them than when they are scattered all over the place.

Now let’s see what happens here:


$a = new stdClass;
$b = new stdClass;

$a->b = $b;
$b->a = $a;

xdebug_debug_zval('a'); //refcount = 2
xdebug_debug_zval('b'); //refcount = 2

unset($a, $b);

After we unset both $a and $b, we have no way of accessing those objects from our application anymore, but their refcount is not 0. Since what was $a still references $b and vice versa, both their refcounts are 1, and they won’t be removed from memory. That is what a memory leak is. In a perfect world we would avoid creating such cycles as much as possible, but that can be hard to achieve. PHP tries to detect such unreachable objects using its garbage collector. When it runs, it looks through memory for loops in object references and deallocates them, saving us memory. But this process is not free: running it too often may slow down execution. The PHP manual states the impact is about 7%, which is quite considerable.
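We can actually watch the collector reclaim such a cycle: gc_collect_cycles() forces a collection run and returns the number of cycles it collected, which will be non-zero after unsetting our circular pair.

```php
<?php

// Reproduce the circular reference from above and watch the
// garbage collector reclaim it.
gc_enable();

$a = new stdClass;
$b = new stdClass;
$a->b = $b;
$b->a = $a;

// Both objects are now unreachable, but their refcounts are still 1
unset($a, $b);

// Force a collection run; a non-zero return means the cycle was
// found and its memory freed
$collected = gc_collect_cycles();
```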

I have found a nice trick that lets me have garbage collection with an almost insignificant performance impact: run the garbage collection only once, at the end of each request. You can do it like so:


gc_enable();
gc_collect_cycles();
gc_disable();

Basically we are enabling garbage collection, running it once and disabling it again. This way we ensure that it runs only a single time per request.