Replacing controllers with middleware

Middleware is now a very popular topic in the PHP community, so here are some of my thoughts on the subject. First, let’s take a quick look at what middleware is (if you already know about middleware you can skip this part):

Short intro

The idea behind it is “wrapping” your application logic with additional request processing logic, and then chaining as many of those wrappers as you like. So when your server receives a request, it is first processed by your middlewares, and then after you generate a response it is also processed by the same set:

[Figure: middleware layers wrapped around the application kernel]

It may sound complicated, but in fact it’s very simple if you look at some examples of what could be a middleware:

  • Firewall – check if requests are allowed from a particular IP
  • JSON Formatter – Parse JSON post data into parameters for your controller. Then turn your response into JSON before sending it back
  • Authentication – Redirect users who are not logged in to a login page

The coolest part of this is chaining. Since middlewares don’t know about each other, it’s simple to find the ones you need and chain them together. And the best part is that once we get PSR-7 we can have sets of middleware that are decoupled from the frameworks and easily interoperable.
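
To make the chaining idea concrete, here is a minimal sketch. It is not PSR-7: requests and responses are plain arrays, and the names `$firewall`, `$jsonFormatter`, `$kernel` and `chain()` are made up for this illustration.

```php
<?php
// Hypothetical names throughout; requests and responses are plain arrays.

// Each middleware receives the request and the next handler in the chain.
$firewall = function (array $request, callable $next): array {
    if ($request['ip'] !== '127.0.0.1') {
        return ['status' => 403, 'body' => 'Forbidden'];
    }
    return $next($request);
};

$jsonFormatter = function (array $request, callable $next): array {
    // Pre-process: decode the JSON body into parameters.
    $request['params'] = json_decode($request['body'], true) ?? [];
    $response = $next($request);
    // Post-process: encode whatever the inner layers returned.
    $response['body'] = json_encode($response['body']);
    return $response;
};

// The application kernel sits in the middle of the chain.
$kernel = function (array $request): array {
    return ['status' => 200, 'body' => ['hello' => $request['params']['name']]];
};

// Wrap the kernel from the inside out, so the first middleware runs first.
function chain(array $middlewares, callable $kernel): callable
{
    foreach (array_reverse($middlewares) as $middleware) {
        $next = $kernel;
        $kernel = function (array $request) use ($middleware, $next): array {
            return $middleware($request, $next);
        };
    }
    return $kernel;
}

$app = chain([$firewall, $jsonFormatter], $kernel);
$response = $app(['ip' => '127.0.0.1', 'body' => '{"name":"Pixie"}']);
$blocked = $app(['ip' => '10.0.0.1', 'body' => '{"name":"Pixie"}']);
```

Neither middleware knows about the other; only `chain()` decides the order, which is what makes them easy to mix and match.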

This is it for the quick intro, now here are my thoughts:

Replacing controllers
In the picture above, notice the application kernel in the middle? My initial thought was: why not consider our application as middleware too? Indeed, controllers in our frameworks already read requests and return responses, so they are pretty much middleware too, just without the chaining. The other thing that distinguishes controllers from middleware is tight coupling to the framework; apart from that they are the same. And here it dawned on me:

The application kernel in the above chart shouldn’t be our Controller, since when you follow some proper design rules it’s your models that contain application logic, not the controller. Which means next gen frameworks should dump the Controller concept entirely, and split everything into middleware layers.

Problems
There are some problems with middleware though, the biggest one coming from framework independence. The amount of things you can do without utilizing the framework is actually very small. Interoperable middleware would have no way to access your database, templating, etc. The only way to expose those things in an interoperable way would be for middleware to provide you with a set of required interfaces that you would have to satisfy. That’s cool, but it might be far too hard for Junior devs, and eventually not catch on.
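
As a rough sketch of what “a set of required interfaces” could look like: the middleware below ships with its own interface, and the application satisfies it with whatever storage it has. `UserLookup`, `TokenAuthMiddleware` and `ArrayUserLookup` are hypothetical names invented for this example, not part of any existing library.

```php
<?php
// Hypothetical interface shipped together with a framework-independent middleware.
interface UserLookup
{
    public function findByToken(string $token): ?array;
}

// The middleware depends only on the interface, never on a framework.
class TokenAuthMiddleware
{
    private $users;

    public function __construct(UserLookup $users)
    {
        $this->users = $users;
    }

    public function __invoke(array $request, callable $next): array
    {
        $user = $this->users->findByToken($request['token'] ?? '');
        if ($user === null) {
            return ['status' => 401, 'body' => 'Unauthorized'];
        }
        $request['user'] = $user;
        return $next($request);
    }
}

// The application satisfies the interface; an in-memory array
// stands in for a real database here.
class ArrayUserLookup implements UserLookup
{
    private $usersByToken;

    public function __construct(array $usersByToken)
    {
        $this->usersByToken = $usersByToken;
    }

    public function findByToken(string $token): ?array
    {
        return $this->usersByToken[$token] ?? null;
    }
}

$auth = new TokenAuthMiddleware(new ArrayUserLookup(['secret' => ['name' => 'Pixie']]));
$kernel = function (array $request): array {
    return ['status' => 200, 'body' => 'Hello ' . $request['user']['name']];
};
$allowed = $auth(['token' => 'secret'], $kernel);
$denied = $auth(['token' => 'wrong'], $kernel);
```

Every framework would need its own adapter like `ArrayUserLookup`, which is exactly the extra work that might scare people off.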

Were we using middleware all this time?
All frameworks allow you to specify in your controllers some code that will be executed before and after the action, like this:

class Controller
{
    function before()
    {
        //preprocess, check authorization, do redirects
    }

    function actionIndex()
    {
        //actual action
    }

    function after()
    {
        //postprocess, handle formatting, etc
    }
}

Well, in that case your before()/after() has always been your middleware code. And if you wrote your controllers following the “thin controller, fat model” rule, your actions are pretty much middlewares too, since all they do is format data received from your model layer.

Let’s try inverting
Another issue middleware has that old-style controllers don’t is heavy reliance on configuration. There must be some config file present that tells which middlewares to chain for a particular route. And what I learned over the years is that it’s much better to write code instead of config. You can debug code; it’s harder to debug a misconfigured system. So here I thought that if controllers and middleware are so similar, perhaps it’s possible to reverse the idea and write controllers in a middleware fashion. Consider this:

class Controller
{
    function actionIndex()
    {
        //assume that each middleware modifies
        //the request/response given
        if(!$this->auth->isLoggedIn($this->request))
            return $this->redirect($this->request);

        $this->json->processRequest($this->request);
        $response = /* call model layer and build a response */;
        $this->json->processResponse($response);
        return $response;
    }
}

I think the above is more readable, debuggable and understandable than chaining middleware in a configuration file. So maybe we don’t really need middleware, just better controller code? Maybe the whole point of middleware is to prevent programmers from writing spaghetti code in their controllers?

Is an HTTP Request enough?
One of PSR-7’s goals is to enable interoperable middleware, but it bases its standard on an HTTP Request. The question is whether data in such a representation is enough to write middlewares. What if you want to pass some additional request parameters around? In the JSON encode/decode example I mentioned earlier, it doesn’t really sound like a very good idea to create a new request by converting JSON data into form-encoded POST data for the next middleware. This decoding/encoding part is overhead that I wish we could avoid. Wouldn’t it be better if it could just decode the data and pass it along like that?

What I’m thinking is that perhaps a better idea would be to have a Request class that is more like a parameter bag, and has nothing to do with HTTP. This way it could even be used for CLI apps. The problem with it is how it would represent things like URLs and headers. I don’t know, but there must be a way.
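
Here is one way such a parameter bag could look. This is only a sketch of the idea; the class name `ParameterRequest` and its `get()`/`with()` methods are invented for this example and do not answer the URL and header question.

```php
<?php
// Hypothetical transport-agnostic request: just a bag of named parameters.
class ParameterRequest
{
    private $params;

    public function __construct(array $params = [])
    {
        $this->params = $params;
    }

    public function get(string $name, $default = null)
    {
        return array_key_exists($name, $this->params) ? $this->params[$name] : $default;
    }

    // Returns a modified copy, so earlier layers keep the request they saw.
    public function with(string $name, $value): self
    {
        $copy = clone $this;
        $copy->params[$name] = $value;
        return $copy;
    }
}

// A JSON layer decodes the body once and passes the data on as-is,
// with no re-encoding into form data.
$raw = new ParameterRequest(['body' => '{"id":5}']);
$decoded = $raw->with('data', json_decode($raw->get('body'), true));

// The same class can carry CLI arguments.
$cli = new ParameterRequest(['command' => 'migrate', 'dry-run' => true]);
```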

You owe yourself that README file

Just a few days ago I finally finished the PHPixie ORM library and wanted to release it immediately. I planned on writing only a small blog post outlining its basic usage, then switching to finishing off other PHPixie 3 components and only after that returning to writing detailed ORM docs. Then I remembered all the interesting projects I found on GitHub but never used because they didn’t even have a README file.

There are a lot of developers that don’t care about writing tests and even more that don’t care about documentation, rather they expect users to report bugs and open issues for questions. But what really happens is their code and all their hard work is being ignored as a result.

Now, when your code is ready to ship, imagine yourself as you were while writing it. Think of all the sacrifices that guy has made to arrive at where you are now, remember the sleepless nights he has had and the times when he skipped hanging out with his friends. You owe it to him to make sure his work was not in vain, to make sure that when people see his library they try it out instead of moving on to the one that actually has a README file.

It is not enough to write the best code; it is just as important to show how good it is. And no, a small description with 10 lines of example code is not enough. In fact it might be even worse, as it will give you some moral comfort and may prevent you from writing some actual in-depth documentation.

And this shouldn’t be a chore, treat it as putting icing on a cake. Even if you bake a perfect cake, if the icing looks bad nobody is going to buy it on the off chance that it might taste better. Your docs should reflect all the time and effort that went into your project.

Impostor Software Architects

Of all the different kinds of developers I have met over the years, there is one that I really hate: the impostor architect. They are an absolute plague to any development environment and the community at large.

You can easily spot one by this quote:

I don’t like working with algorithms, optimizing the database and writing regular expressions. I love designing application architectures though, making things work together.

Sometimes the person will also mention that their code is SOLID and that they use patterns extensively, but will fail to explain even 10 of those and confess to never having read the GoF book. Usually they also consider UML diagrams useless and unit tests a waste of time. Sure, UML is useless if the apps you have been designing all your life have fewer than a hundred classes. Every time you get a new legacy project on your hands, don’t you wish you had a nice annotated class UML diagram of it? Imagine how many hours of debugging that would actually save you, and such architects are the reason you don’t have it.

In such cases the reason behind not liking things like writing performant SQL queries is usually a lack of the required knowledge. Such problems require at least some theoretical background and actual experience, while talking about general application design is possible without them. In this case not liking is in fact a well-known coping mechanism, where a person tries to devalue something he or she doesn’t possess or cannot attain. For example, I have some friends who really hated the iPhone until they actually got one themselves.

As an experiment, try going to a programming IRC channel and asking a question about writing an A*-search algorithm. Usually you will get useful responses that help you to get the job done. Later ask about how to better structure your code and you will most likely start a small flame war and get your own opinion criticized to death.

I think the reason for this is the developer title inflation I blogged about earlier, which makes it easy for people with little theoretical background to end up in charge of architecture design. Logically, the person in charge of architecture should be the one who has a solid grasp of all the components used and therefore can efficiently design their interaction.

You can draw a pretty accurate parallel with actual architects. You wouldn’t trust a guy who has been building shacks all his life and says he doesn’t like math and geometry to build a cathedral, would you? How about one that considers blueprints useless?

Unit Tests are not enough

For the last half year I have been refactoring the next version of PHPixie ORM and writing unit tests for it. My goal is to bring it to 100% coverage (right now it’s at 97%). But as others have already stated, 100% coverage doesn’t really mean there are no bugs in the code; all it means is that the components are behaving in the way you intended them to.

One huge problem with unit testing is that it may not detect wrong parameters in method calls. For example take a look at this method:

//Checks whether string $a contains string $b
public function contains($a, $b) {
    ....
}

Let’s say we have it successfully unit tested and continue to a different method that relies on contains():

//Checks whether string $string contains 'cat'
public function containsCat($string) {
    return $this->stringTools->contains($string, 'cat');
}

Now we unit test the containsCat() by mocking the contains() call. Our unit tests pass and all is great.

A week after that someone decides to modify contains($a, $b) by reordering the arguments. So instead of checking whether $a contains $b it will now check whether $b contains $a. He then fixes the tests for that method and it seems everything is ok. Except that now our containsCat() method is broken, since it passes arguments in the wrong order. Our unit tests will not tell us that, because the call to contains() has been mocked.

This issue is somewhat mitigated by using type hinting, at least then if you reorder parameters of different types you may get an error stating that. This is why I really want PHP 7 to get static type hints, but even then, as with the contains() example, you still are not safe.

That is why you also need integration and functional tests, where you can check the whole system or a set of components working together. These tests are usually much easier to write than unit tests, since they use actual dependencies and require only minimal mocking. They also help you save more time, as unlike unit tests they rarely need to change after code refactoring.
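
The contains() scenario can be reproduced in a few lines. This is a sketch without a test framework: `CatDetector` and `StubStringTools` are names made up for the example, the stub is a hand-written stand-in for a mock, and the body of contains() is an assumed implementation of the reordered version.

```php
<?php
class StringTools
{
    // After the refactoring: arguments reordered, so this now
    // checks whether $b contains $a (assumed implementation).
    public function contains($a, $b)
    {
        return strpos($b, $a) !== false;
    }
}

class CatDetector
{
    private $stringTools;

    public function __construct($stringTools)
    {
        $this->stringTools = $stringTools;
    }

    public function containsCat($string)
    {
        // Still written against the old argument order.
        return $this->stringTools->contains($string, 'cat');
    }
}

// Hand-written stand-in for a mock: returns a canned answer
// and ignores its arguments.
class StubStringTools
{
    public function contains($a, $b)
    {
        return true;
    }
}

// The "unit test" stays green, because the real contains() is never called.
$unitResult = (new CatDetector(new StubStringTools()))->containsCat('concatenate');

// Wiring in the real dependency exposes the bug.
$integrationResult = (new CatDetector(new StringTools()))->containsCat('concatenate');
```

Here `$unitResult` is true while `$integrationResult` is false, even though “concatenate” clearly contains “cat”. Only the second check notices that the argument order changed.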

Actually I came to the conclusion that you should start with functional tests and only then drill down to writing unit tests. And perhaps if you manage to cover over 80% of your codebase with functional tests, you may find it fitting to skip unit tests altogether in some cases. This is especially true for websites, where having behavioral tests (like Behat) not only provides you with a means of testing the actual pages rendered, but also acts as a spec for the entire system.

Stop using PHP-FPM to argue using Nginx vs Apache

I often see “Apache vs Nginx” discussions appearing on reddit, and some of the arguments people make are plain ridiculous. So now I want to address the one that makes my eyes roll the most: PHP-FPM.

When Nginx first came into the PHP world, its popularity was mostly fueled by numerous benchmarks showcasing its speed vs a LAMP setup. You see, Nginx didn’t have anything like Apache’s mod_php and required the use of PHP-FPM, which indeed was a much faster way of processing PHP on multicore systems. The mistake people often made was to compare those setups and conclude that Nginx was just a better HTTP server.

Nginx is a great webserver, and its default setup is designed for performance, while the default Apache setup provides much more in terms of flexibility. But please don’t say that Nginx is better just because PHP-FPM is faster than mod_php, when you can easily set up Apache to use FPM too.

One of the contributing reasons is that there are so many different configuration options in Apache that a person can easily misconfigure it. Apache has 3 MPMs (prefork, worker and event), and even proxying requests to the PHP-FPM server can be done in at least 3 entirely different ways. Nginx is easier to set up if you’re looking for a “good enough” solution; it’ll run pretty well out of the box.

At that time I thought people would eventually understand the actual difference and judge webservers on their own merits, but years passed and it’s still happening! So pretty much now we have 4 categories of developers:

  • People that use Apache because it comes with Windows packages like WAMP and is very easy to set up on any Linux distro. These people don’t care how their PHP is executed at all.
  • People that read a blog post saying that Apache isn’t cool anymore, installed Nginx with PHP-FPM and consider the first group to be inferior and less tech savvy.
  • People that are supporting a legacy app, are stuck with an old version of Apache (maybe even 1.3), and think moving to Nginx would boost their performance sky-high.
  • And there obviously is a portion of developers that actually have experience in both and can select the one that fits the task the best

The problem is that most of the “Apache vs Nginx comparison” posts are written by the first 3 groups of people. The fourth group has long ago realized that there is enough info on the internet to stop talking about it over and over.

The lessons you should take from this post are:

  • Try running Apache 2.4 with mpm_event and PHP-FPM using ProxyPass and see the results you get
  • If you have only a single core, PHP-FPM won’t be faster than mod_php (it’ll take a separate blog post to explain why)
  • In tech, never pick a single side, stay flexible
  • If you need performance use HHVM

Developer title inflation

The laws of economics apply to all kinds of human relations, and their principles can easily be extrapolated to explain a great number of things, including developer titles. This is a joke I heard some time ago:

– How can a Junior developer become a Senior one?
– Simple. Just change the job twice

Sad, but true, and there are a great number of things that contribute to this.

Demand-pull inflation
The friendly HR girl that handles the hiring process often has only a vague idea of who she is actually looking for. She’s been given a checklist by the CTO and wants to do her job well by finding someone fitting as fast as possible. Why not give a guy with 2 years of experience a chance at the Senior dev vacancy? Especially if she represents a small agency that doesn’t get that many applicants in the first place.

The CTO may decide that even though the person he just interviewed isn’t quite the Senior they’re looking for he still could hire him for 80% of the salary, but the title stays. Titles, unlike money, cost the company nothing.

People conducting the interview use themselves as a standard
If the developers conducting the interview aren’t very good themselves, they are likely to overestimate the interviewee. In fact they will do everything not to hire a person who is more knowledgeable than themselves, so as not to shake their position of power.

It’s getting very easy to develop
We have a multitude of libraries and frameworks available today. Becoming “good enough” to string those together and make a CRUD app is no challenge at all. Combine this with the fact that a lot of people already consider a developer who can throw together a website on his own to be mid-level, and the Junior title is pretty much skipped entirely: people rarely consider themselves Junior PHP devs for longer than half a year.

Titles don’t get revoked
This is the worst kind of inflation to me. This happens when people who were genuinely awesome 4 years ago stop learning new stuff, stick to old practices and put a ‘hipster’ label on everything new. In PHP those are the kind of people who think namespaces suck, Composer is complicated and testing is just wasting time. They proudly state their 10+ years of experience, while actually being harmful to the team. Truth be told, their experience does sometimes come in handy when it comes to architecture. But development is evolving so rapidly that if you miss just 2 years you’re probably far behind the bandwagon.

Titles are rarely specific
Being a “Senior WordPress developer” doesn’t make you a “Senior PHP developer”, and somewhat vice versa actually.

The terrible consequences
There was a discussion on reddit recently about the questions that should be asked when interviewing a Senior developer. I was surprised at how trivial those were. “Knowing weak and strong points” of current ORMs is something a mid-level dev should easily be able to do. What happened to knowing algorithms, data structures, patterns, extensive database knowledge, cryptography etc.? What do you call a person who knows all that, then? You can’t put those on the same spot as the guy who can choose between ORMs. Perhaps we need more titles; maybe we need to start calling ourselves “exalted PHP developers of the 5th rank” from now on.

But the worst part is that people who already consider themselves to be Senior stop learning, and a person who doesn’t learn constantly will never notice how ignorant he in fact is. And one day, on your first day in a new company, you may find out that you will now be led by people much less experienced than you, and every architectural decision is going to be a battle between your knowledge and their ignorance. And that frankly sucks.

TL;DR Be modest and learn every day.

A checklist for framework developers

So, some new frameworks have been posted on reddit lately, with the expected result of receiving a lot of hate and criticism. I can kind of understand both sides here, since just about a year ago I saw similar responses towards my PHPixie. Since then the flames have calmed down and it now has a considerable userbase with 386 installs in the last 30 days. The important part here is to be able to take in constructive criticism and carry on making your project better. I’ve learned a few things the hard way as I went, and I’d like to share them with other people willing to release their brainchild to the world.

  • Tests – you must have those to be taken seriously. Even if it’s the first version, alpha or anything. The tests are the first indicator that you are putting real effort in
  • Find a niche – There has to be something about your framework that makes it stand out, and you have to focus on it. I really put a lot of effort behind the “great performance” argument. I also submitted it to the TechEmpower Framework Benchmark Suite to prove it. Don’t ever claim something you have nothing to back up with
  • Read a book – there should be no singletons, god objects, shared state or anything like that in a modern app
  • Write docs – That includes writing docblocks, generating API documentation from those docblocks, writing some tutorials and a nice about page.
  • Be prepared to compare it to existing frameworks – there has to be something better with your approach. If there isn’t don’t bother releasing it.
  • Use PSR-2 – I made the mistake of writing PHPixie in the style I liked and then had to rewrite it. The style must suit the majority, not just you. You’re not making it for yourself
  • Don’t expect any help or pull requests from the very start
  • Support your users as much as possible, at least install some forums script to be able to talk to them
  • Buy a domain and spend some time on site design. Make it look legit
  • Commit frequently – Your Github history has to indicate your project is not dead
  • Participate in the community – You can’t expect people to help you with your project if you ignore theirs
  • Don’t give up

If this sounds like too much work, just pick a smaller project. It’s better to make a smaller masterpiece than a big ball of mud.

PHP memory leaks and garbage collection

Historically memory leaks have not been a huge concern for PHP development. Since PHP processes exit after processing a request, all used memory is deallocated. But that doesn’t mean we are free to not care about memory management altogether. If your application is processing several requests at the same time, the total amount of wasted memory may add up to a hefty number.

With projects like ReactPHP becoming more popular and widely used, PHP is entering the domain of permanently running processes, and we all look forward to using those to speed up our applications. But for that to become a reality our applications and frameworks have to be carefully rewritten to prevent memory leaks, since even the smallest one can grow to eat all available memory in the span of a month.

PHP utilizes reference counting to decide when an object can be removed. Each time you assign an object to a new variable its refcount is increased by one:


$a = new stdClass;
$c = $b = $a;
xdebug_debug_zval('a'); //refcount = 3
unset($b, $c);
xdebug_debug_zval('a'); //refcount = 1

When a refcount reaches 0 the memory is deallocated. This is the first thing you have to think about when instantiating multiple objects, e.g. your domain entities. You have to make sure that the references to them don’t hang around after they are no longer needed. One good rule of thumb for doing so is avoiding global state as much as possible. If all your object references are inside some local scope, it’s much easier to keep track of what’s happening to them than when they are scattered all over the place.

Now let’s see what happens here:


$a = new stdClass;
$b = new stdClass;

$a->b = $b;
$b->a = $a;

xdebug_debug_zval('a'); //refcount = 2
xdebug_debug_zval('b'); //refcount = 2

unset($a, $b);

After we unset both $a and $b variables we have no way of accessing those objects from our application anymore, but their refcount is not 0. Since what was $a still references $b and vice versa, both their refcounts are 1, and they won’t be removed from memory. And that is what a memory leak is. In a perfect world we would try to avoid this as much as possible, but this might be hard to achieve. PHP tries to detect such unreachable objects using its garbage collector. When it runs, it looks through memory for loops in object references and deallocates them, thus saving us memory. But this process is not free: running it too often may slow down execution. The PHP manual states the impact is about 7%, which is quite considerable.
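
You can watch the collector pick up such a cycle without Xdebug. The sketch below relies only on the built-in gc_collect_cycles(), which returns the number of collected cycles; the helper name `makeCycle()` is made up for this example.

```php
<?php
gc_enable();
gc_collect_cycles(); // start from a clean slate

// Hypothetical helper: builds two objects that reference each other.
// Once the function returns, nothing in the application can reach them.
function makeCycle(): void
{
    $a = new stdClass;
    $b = new stdClass;
    $a->b = $b;
    $b->a = $a;
}

makeCycle();

// Reference counting alone cannot free the pair; the cycle collector can.
$collected = gc_collect_cycles();
```

A non-zero `$collected` means the pair was only freed because the cycle collector ran; reference counting on its own would have left it in memory.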

I have found a nice trick that works for me to both have garbage collection and keep the performance impact almost insignificant: run the garbage collection only once, at the end of each request. You can do it like so:


gc_enable();
gc_collect_cycles();
gc_disable();

Basically we are enabling garbage collection, running it once and disabling it again. This way we ensure that it runs only a single time per request.
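
In a permanently running process the same three calls would sit at the end of every handled request. Below is a rough sketch of that loop. `handleRequest()` is a hypothetical stand-in for real application code that happens to leave a reference cycle behind, and the extra gc_disable() before the loop is my assumption about how you would keep the collector from kicking in mid-request.

```php
<?php
// Hypothetical request handler that leaves a reference cycle behind.
function handleRequest(int $id): string
{
    $parent = new stdClass;
    $child = new stdClass;
    $parent->child = $child;
    $child->parent = $parent;
    return "response $id";
}

gc_disable(); // assumption: no automatic collection while handling a request

$responses = [];
$collectedPerRequest = [];
foreach ([1, 2, 3] as $id) {
    $responses[] = handleRequest($id);

    // The request is finished: collect once, then switch the collector off again.
    gc_enable();
    $collectedPerRequest[] = gc_collect_cycles();
    gc_disable();
}
```

Recording the return value of gc_collect_cycles() per request is optional, but it gives you a cheap way to spot which requests leave cycles behind.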