Should you reinvent the wheel? Programming and cakes

The “Not invented here” mindset and it’s counterpart “Don’t reinvent the wheel” mindsets are probably causing thousands of discussions every day. Almost every week you see a question pop up in programming subreddits where people ask whether they should use a framework or a library for something or roll out their own solution.

To me the answer has always been rather obvious and easily explainable with a simple analogy. What if instead of programming you would be cooking a birthday cake. How would you approach it? Your options are pretty much the same:

  • Buy a ready made one – you definitely will get an okay-ish cake that way, or even a very good one if you are willing to spend more on it.
  • Google for a nice recipe – This approach really depends on how good a cook you are, since even the best recipe can be totally botched by bad execution. And you have no way of telling whether the recipe you found is good before you spend time and effort cooking it.
  • Buy a cook book – These recipes usually have much better quality than ones you find online, since the authors have their reputation to maintain and a such a book is in itself a professional product
  • Improvise and cook your own – Should obviously be attempted only when you have some substantial experience, know where to get a backup cake if you fail, or as a learning practice.

If I told you that I’m going to improvise a cake without any prior experience and use expensive ingredients for it, you’d say I was bound to fail and waste a lot of money in the process. Can you see the analogy now? This is what I hear when a person with no substantial experience says they rolled out their own framework, router, ORM etc.

On the other hand ready made recipes only get you so far. Their very nature of catering to the majority of people makes them lean towards being simple and easy. You won’t just google a recipe for this cake (yes it is cake and is completely edible):

1432276037_torty-agzamova-7

If you are a local website “bakery” with average experience your best bet is to use established recipes which yield good results honing your skills until you can do something more complex. But if you position yourself as a premium wedding cake baker you can’t just bring a regular strawberry cheesecake to a party.

Setting up a PHP project with PHPUnit, Travis-CI, CodeClimate and Packagist


A short tutorial I did explaining the process of setting up a PHP project with continuous testing with PHPUnit and Travis-CI and quality metrics with CodeClimate.

Service Behavior Immutability

One of the things I always mention when I rant about misusing events is behavior mutation. Since listeners can be attached and sometimes even detached dynamically during application runtime it frequently results in event hell and severely impedes debugging. After some thought I believe same reasoning can be applied to services too.

In a nutshell technically a service is pretty much any class that is being instanced by some DI mechanism. From an architectural point of view it provides a facade to some part of the systems functionality. Since an instance of a service is the same for all classes using it a misuse of it may make parts of the system transiently dependent on each other.

Let’s use a router as an example:

class Router
{
    public function route($request)
    {
        //returns some sort of Callable
    }

    public function addRoute(Route $route)
    {
        //adds a new route
    }
}

Parts of the system that rely on using the route() method are transiently dependent on those that add routes to it. Meaning you can never be 100% sure that calling route() with the same request twice will return the same result. So when debugging you also have to take int account not just the service itself but also all the places it could have been modified in.

To avoid this the best approach is to write only immutable services, meaning that a service should not have methods that would modify its behavior. Our routing example could be rewritten as follows:

class Router
{
    public function __construct(array $routes)
    {
        
    }
    
    public function route($request)
    {
        //returns some sort of Callable
    }
}

Such an approach still allows for a pluggable architecture. The only difference is that it forces you to do plugin initialization before the service is actually built. A good approach for this is to use the Facade pattern by building the Routing subsystem separately and then building a limited Router service to it.

A good example of this is Doctrines EntityManager which is the Doctrine service that is used the most. It provides a limited functionality and limits the user from doing some crazy stuff like defining new Entities on the fly.

Stop using PHPMyAdmin

It makes me cringe every time I see developers use PHPMyAdmin to administer their databases, even more if its a local database. This is a relic from the old times when people used it with their remote servers since desktop applications did not provide remote access functionality yet. Here are just a few things you have to consider if you still rely on it in your development stack:

  • Running PHPmyAdmin is a potential security vulnerability. Although security bugs are patched quickly server installations usually end up running the same version for years without an upgrade. At the same time desktop applications allow you to use SSH tunneling for a secure connection.
  • If you are not running it using SSL encryption you are vulnerable to a man in the middle attack. The attacker can easily read the entire database dump as you are downloading it from the server.
  • Leaving it idle for a couple of minutes results in a session timeout.
  • If you are managing multiple servers you have to maintain PHPMyAdmin on each of them.
  • You face timeouts when dealing with uploading large dumps or executing slow queries.
  • No database visualization tools, which are vital when inspecting databases with a large amount of tables
  • For managing local databases it has to many requirements, like a preconfigured virtual host etc.

At the same time there are so many better suited desktop applications, my favorites are:

  • SQLYog – The one with the most features. I especially like its database visualization and the ability to import/export CSV and Excel files. Their license is pretty expensive but there is a community edition available which has a comparable feature set. Although it’s Windows-only it runs perfectly fine behind Wine on Linux.
  • HeidiSQL – Another Windows-only tool, opensource and a more lightweight version. Recently I found myself using it more than SQLYog actually.
  • Sequel Pro – Is pretty much the best choice you have for a Mac. I used it only a couple of times but it comes with all the necessary tools.
  • MySQL Workbench – Like SQLYog but works on all platforms and is free.
  • Emma – Works natively on Linux, although the feature set is rather limited.

The bottom line is that you really have no excuses to continue using it anymore, it is a security vulnerability, lacks advanced features and has all the issues associated with running a web application.

Replacing controllers with middleware

Middleware is now a very popular topic in the PHP community, here are some of my thougts on the subject. First, let’s take a quick look at what middleware is ( if you already know about middleware you can skip this part):

Short intro

The idea behind it is “wrapping” your application logic with additional request processing logic, and then chaining as much of those wrappers as you like. So when your server receives a request, it would be first processed by your middlewares, and then after you generate a response it will also be processed by the same set:

Middleware

Middleware

It may sound complicated, but in fact it’s very simple if you look at some examples of what could be a middleware:

  • Firewall – check if requests are a allowed from a particular IP
  • JSON Formatter – Parse JSON post data into parameters for your controller. Then turn your response into JSON before sending ti back
  • Authentication – Redirect users who are not logged in to a login page

The coolest part of this is chaining. Since middlewares don’t know about each other it’s simple to find the ones you need and chain them together. And the best part is that after we get PSR-7 we can get sets of middleware that are decoupled from the frameworks and easily interpolable.

This is it for the quick intro, now here are my thoughts:

Replacing controllers
In the picture above notice the application kernel in the middle? My initial thought was: why not consider our application as middleware too ?. Indeed, controllers in our frameworks already read requests and return responses, so pretty much they are also middleware, just without the chaining. The other thing that differs controllers from middleware is tight coupling to the framework, apart from that they are the same. And here it dawned on me:

The application kernel in the above chart shouldn’t be our Controller, since when you follow some proper design rules it’s your models that contain application logic, not the controller. Which means next gen frameworks should dump the Controller concept entirely, and split everything to middleware layers

Problems
There are some problems with middleware though, the biggest on coming from framework independence. The amount of things you can do without utilizing the framework is actualy very small. Interpolable middleware would have no way to access your database, templating, etc. The only way to expose those things in an interpolable way would be for middleware to provide you with a set of required interfaces that you would hvae to satisfy. That’s cool, but it might be far too hard for Junior devs, and eventually not catch on.

Were we using middleware all this time?
All frameworks allow you to specify in your controllers some code that would be executed before and after the action execution, likw this:

class Controller
{
    function before()
    {
        //preprocess, check authorization, do redirects
    }

    function actionIndex()
    {
        //actual action
    }

    function after()
    {
        //postprocess, handle formatting, etc
    }
}

Well in that case your before()/after() has always been your middleware code. And if you wrote your controllers following the thin controller, fat model rule your actions are pretty much middlewares too, since all they do is format data receive from your model layer.

Let’s try inverting
Another issue middleware has that old-style controllers don’t is heavy reliance on configuration. There must be some config file present that will tell which middlewares to chain for a particular route. And what I learned over the years that it’s much better to write code instead of config. You can debug code, it’s harder to debug a misconfigured system. So here I thought, that if controllers and middleware are so simple, perhaps it’s posiible to reverse the idea and write controllers in a middleware fashion, consider this:

class Controller
{
    function actionIndex()
    {
        //assume that each middleware modifies
        //the request/response given
        if(!$this->auth->isLoggedIn($this-request));
            return $this->redirect($this->request);

        $this->json->processRequest($this-request);
        $response = /* call model layer and build a response */;
        $this->json->processResponse($this-response);
        return $response;
    }
}

I think the above is more readable, debuggable and understandable then chaining middleware in a configuration file. So maybe we don’t really need middleware, just better controller code? Maybe the whole point of middleware is to prevent programmers writing spaghetti code in their controllers ?

Is a HTTP Request enough?
The PSR-7 has one of it’s goals to enable interpolable middleware, but it bases its standard on an HTTP Request. The question is whether data in such a representation is enough to writ middlewares, what if you want to pass some additional request parameters around? In the JSON encode/decode example I mentioned earlier it doesn’t really sound like a very good idea to create a new request converting JSON data into POST form encoded data for the next middleware. This decoding/encoding part is an overhead, that I wish we could avoid. Wouldn’t it be better if it could just decode data and pass it like that?

What I’m thinking is perhaps a better idea would be to have a Request class that is more like a parameter bag, and has nothing to do with HTTP. This way it could be used for even CLI apps. The problem with it is how would it represent things like URLs and headers? I don’t know, but there must be a way.

You owe yourself that README file

Just a few days ago I have finally finished the PHPixie ORM library and wanted to release it immediately. I planned on writing only a small blog post outlining its basic usage, then switching to finishing off other PHPixie 3 components and only after that returning to writing detailed ORM docs. Then I remembered all the interesting projects I found on Github but have never used because they didn’t even have a README file.

There are a lot of developers that don’t care about writing tests and even more that don’t care about documentation, rather they expect users to report bugs and open issues for questions. But what really happens is their code and all their hard work is being ignored as a result.

Now, when your code is ready to ship, imagine yourself as you were writing it, think of all the sacrifices that guy has made to arrive at where you are now, remember the sleepless nights he has had and the times when he skipped on hanging out with his friends. You owe him to make his work not in vain, to make sure that when people see his library they try it out instead of moving on to the one that actually has a README file.

It is not enough to write the best code, it is as important to show how good it is. And no, a small description with 10 lines of example code is not enough, in fact it might be even worse, as it will give you some moral comfort and may prevent from writing some actual in depth documentation.

And this shouldn’t be a chore, treat is as putting icing on a cake. Even if you bake a perfect cake, if the icing looks bad nobody is going to buy it on the off chance that it might taste better. You docs should reflect all the time and effort that went into your project.

Impostor Software Architects

From all the different kinds of developers I met over the years there is one that I really hate. The impostor architect kind. They are an absolute plague to any developer environment and the community at large.

You can easily spot one by this quote:

I don’t like working with algorithms, optimizing the database and writing regular expressions. I love designing application architectures though,
making thing work together.

Sometimes the person will also mention that their code is SOLID and that they use patterns extensively, but will fail to explain even 10 of those and confess to have never had read the GoF book. Usually the also consider UML diagrams useless and unit tests a waste of time. Sure UML is useless if the apps you have been designing all your life have less than a hundred classes. Every time you get a new legacy project on your hands, don’t you wish you had a nice annotated class UML diagram of it? Imagine how many hours of debugging that would actually save you, and such architects are the reason you don’t have it.

In such cases usually the reason behind not liking things like writing performant SQL queries comes from lack of
required knowledge. Such problems require at least some theoretical background and actual experience while talking about
general application design is possible without them. In this case not liking is in fact a well-known coping mechanism, where a person tries to devalue something he or she doesn’t posses or cannot attain. For example I have some friends who really hated the IPhone until they actually got one themselves.

As an experiment try going to a programming IRC channel and asking a question about writing an A*-search algorithm. Usually
you will get useful responses that help you to get the job done. Later ask about how to better structure your code and you will
most likely start a small flame war and get your own opinion criticized to death.

I think the reason for this is the developer title inflation I blogged about earlier, that makes easy for people with little theoretical background to end up in charge of architecture design. Logically the person in charge of architecture should be the one who has a solid grasp on all components used and therefore can efficiently design their interaction.

You can draw a pretty accurate parallel with actual architects. You wouldn’t trust a guy who has been building shacks all his life, says he doesn’t like math and geometry, build a cathedral, would you? How about one that considers blueprints useless ?

Unit Tests are not enough

For the last half a year I have been refactoring the next version of PHPixie ORM and writing unit tests for it. My goal is to bring it to 100% coverage ( right now it’s at 97% ). But as others have already stated, 100% coverage doesn’t really mean there are no bugs in the code, all it means is that the components are behaving in the way you intended them to.

One huge problem with unit testing is that it may not detect wrong parameters in method calls. For example take a look at this method:

//Checks whether string $a contains string $b
public function contains($a, $b) {
    ....
}

Let’s say we have it successfully unit tested and continue to a different method that relies on contains():

//Checks whether string $string contains 'cat'
public function containsCat($string) {
    return $this->stringTools->contains($string, 'cat');
}

Now we unit test the containsCat() by mocking the contains() call. Our unit tests pass and all is great.

A week after that someone decides to modify contains($a, $b) by reordering the arguments. So instead of checking whether $a contains $b it will now check whether $b contains $a. He then fixes the tests for that method and it seems everything is ok. Except that now our containsCat() method is broken, since it passes arguments in the wrong order. Out unit test will not tell us that because the call to contains() has been mocked.

This issue is somewhat mitigated by using type hinting, at least then if you reorder parameters of different types you may get an error stating that. This is why I really want PHP 7 to get static type hints, but even then, as with the contains() example, you still are not safe.

That is why you also need integration and functional tests where you can check the whole system or a set of components working together. These tests are usually much easier to write then unit tests, since they require using actual dependencies and only minimal mocking. They also help you save more time, as unlike unit tests they rarely need to change after code refactoring.

Actually I came to a conclusion that you should start with having functional tests first and only then drilling down to writing unit tests. And perhaps if you manage to cover over 80% of your codebase with functional tests you may find it fitting to skip unit tests altogether in some cases. This is especially true for websites where having behavioral tests ( like Behat ) not only provides you with means of testing the actual pages rendered, but also acts as a spec for the entire system.

Dev Stories #1: Crazy Job Interviews

This is going to be the first episode of me telling some interesting stories from my development career.

Stop using PHP-FPM to argue using Nginx vs Apache

I often see “Apache vs Nginx” discussions appearing on reddit and some of the arguments people make are plain ridiculous. So now I want to address one that makes my eyes roll the post: PHP-FPM.

When Nginx first came into PHP world its popularity was mostly fueled by numerous benchmarks showcasing its speed vs a LAMP setup. You see Nginx didn’t have anything like Apaches’ mod_php and required the use of PHP-FPM, which indeed was a much faster way of processing PHP on multicore systems. The mistake people often did was to compare those setups and conclude that Nginx was just a better HTTP server.

Nginx is a great webserver, and its default setup is designed for performance, while the default Apache setup provides much more in terms of flexibility. But please don’t say that Nginx is better just because PHP-FPM is faster than mod_php, when you can easily setup Apache to use FPM too

One of the contributing reasons is that there is so many different configuration options in Apache that a person can easily misconfigure it. Apache has 3 MPMs: prefork, worker and event and even proxying request to the PHP-FPM server can be done in at least 3 entirely different ways. Nginx is easier to set up if you’re looking for a “good enough” solution, it’ll run pretty good out of the box.

At that time I thought people would eventually understand the actual difference and judge webservers on their own merits, but years passed and it’s still happening! So pretty much now we have 4 categories of developers:

  • People that use Apache because it comes with windows packages like WAMP and is very easy to setup on any Linux distro. These people don’t care how their PHP is executed at all
  • People that read a blog post that Apache isn’t cool anymore, installed Nginx with PHP-FPM and consider the first group to be inferior and less tech savvy.
  • People that are supporting a legacy app, are stuck with an old version of Apache ( maybe 1.3 even ), and think moving to Nginx would boost their performance sky-high
  • And there obviously is a portion of developers that actually have experience in both and can select the one that fits the task the best

The problem is that most of the “Apache vs Nginx comparison” posts are written by the first 3 groups of people. The fourth group has long ago realized that there is enough info on the internet to stop talking about it over and over.

The lessons you should take from this post are:

  • Try running Apache 2.4 with mpm_event and PHP-FPM using ProxyPass and see the results you get
  • If you have only a single core PHP-FPM won’t be faster than mod_php ( it’ll take a separate blog post to explain why )
  • In tech, never pick a single side, stay flexible
  • If you need performance use HHVM

© 2024 Dracony

Theme by Anders NorénUp ↑