The lie of independence - or - how I learned to stop worrying and love the dependency chain.

Every so often I see a blog post or a comment that is along the lines of "Don't use X, it has too many dependencies, use Y instead, it doesn't depend on anything." I've touched on this subject before, but in different ways. I've seen this subject crop up again recently and think it warrants another look.

This time it was aimed at Rails and Catalyst, but I've seen it on other projects as well. The question behind the above statement is somewhat deceptive. It seems to be: 'Do I choose a system that has lots of dependencies, or one that has only a few (or none)?' Well, when you put it like that, it seems like a no-brainer... but it is, in reality, a trick question.

The truth is that no modern software stands on its own. There are literally hundreds of small tasks that need to be accomplished for every web request in a modern application: decoding headers, parsing values, communicating with the web server, and so on. All of these things need to be done. The question, really, is not whether you have lots of dependencies. The question is: are those dependencies internal or external?

So what's the difference between something with 'internal dependencies' and something with 'external dependencies'? The difference is that the 'internal dependency' projects try to do all those small tasks themselves, whereas the 'external dependency' projects delegate those small tasks to external modules specifically built for those particular tasks.
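To make the distinction concrete, here is a minimal sketch in Python using cookie parsing as the small task. The function names are hypothetical, invented for this illustration, and Python's standard `http.cookies` module stands in for a purpose-built module; neither function is taken from Rails, Catalyst, or any other framework.

```python
from http.cookies import SimpleCookie


def parse_cookies_inline(header):
    """'Internal dependency' style: the application hand-rolls the parsing."""
    cookies = {}
    for part in header.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            cookies[name.strip()] = value.strip()
    return cookies


def parse_cookies_delegated(header):
    """'External dependency' style: delegate to a module built for the task."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}
```

Both versions agree on a simple header such as `session=abc123; theme=dark`. They start to differ on details like quoted values, which the hand-rolled version leaves quoted and the dedicated module unquotes. That is exactly the kind of small detail the in-house version has to rediscover on its own.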

The practical differences between these two are subtle, but important. For one, it's a difference in viewpoint. Those in the 'external dependency' camp tend to prefer to focus on the problem they are trying to solve, and to hand the little tasks to tools built specifically for them. The 'internal dependency' camp tends to have a more 'do it myself so I can get exactly what I want' type view. Both of these views have merit in certain situations, but ultimately I think the 'external dependency' camp wins for all practical purposes and I'll explain why.

One of the biggest wins for the 'external dependency' camp is that they are focused on their own particular problem and spend less time Yak Shaving, writing Yet Another Templating Language, etc.

Another important fact is that when a module is published on its own, it has to stand up on its own. This generally means more documentation and a clearly defined interface. Interfaces are important because they are the contract between the code being used and the application which is using it. A clearly defined API makes the code that uses a module MORE solid, as it is clear what the expectations are on both sides.

Documentation solidifies APIs. When an API is documented, it becomes public and it adds weight to the code, making it harder to modify. When it's documented and published, it gets even more solid. It's like a sign that says 'don't pull this rug, someone might be standing on it'.

When the same small task is accomplished by some chunk of code in some sub-module within the main application, it is often without a clearly defined API and documentation. This doesn't seem like a big deal until you realize that it usually means the application author feels free to adjust the calling semantics or return values to work better in the newest release of ApplicationFramework vWhizBang.42. When that happens, any code that you wrote that used that functionality suddenly breaks. If you are lucky the change is documented somewhere so you can find out why your application doesn't work anymore. Most of the time you are not lucky.

Which brings us to the next point: the author of the 'small task' module has incentive to fix the problem without disrupting the API. As I mentioned, once documented and published, a module becomes solid. As soon as it's out there, people start depending on it, and not just the people who wrote it. This forces the author to think hard about making a change to the API. This is not motivated by some fuzzy 'goodness' feeling toward other people, it's simply the background knowledge that 'boy, if I change this, I am going to get swamped with emails and complaints'. All of this means that the author has a reason to fix whatever problem it is without disrupting the API... even if it's a little harder to do.
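As a rough illustration of fixing a problem without disrupting the API, consider a small query-string parser. The function `parse_query` and its `keep_blank` option are hypothetical, made up for this sketch. The idea is that the new behaviour is opt-in through a keyword-only argument with a default, so every existing call keeps working exactly as before.

```python
def parse_query(query, *, keep_blank=False):
    """Parse 'a=1&b=2' into a dict.

    keep_blank is the (hypothetical) later addition: it defaults to the
    old behaviour, so callers written against the first release are
    unaffected.
    """
    pairs = {}
    for chunk in query.split("&"):
        if not chunk:
            continue
        name, _, value = chunk.partition("=")
        # Original behaviour: silently drop parameters with empty values.
        if value or keep_blank:
            pairs[name] = value
    return pairs
```

Old callers still write `parse_query("a=1&b=")` and get the result they always got, while callers who need the blank values pass `keep_blank=True`. It is a little more work for the author than simply changing the return value, which is the point.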

Another aspect of this is that when a module is published, it gets used by more than one application. It almost always starts getting used for things that the original author didn't intend or didn't think of. As any seasoned developer knows, the vast majority of bugs relate to the code being used in ways the author didn't plan for.

What this means is that those bugs are exposed more quickly. When your code is deployed by many different people in many different ways, you have a much larger group of people stress testing and reporting bugs. This means that published modules tend to have their issues identified and corrected more quickly than an unpublished piece of code within another distribution.

A module that is used in many different applications also has the benefit of expanding its feature-set more quickly. Different applications and frameworks grow and develop at different rates and in different areas. When new requirements come up, the self-contained module will likely be updated to better support them. This will happen, often, without a disruption in the API.

The benefit of this, in its simplest terms, is that when you realize that your application requires some additional functionality, often you will find that the module has already gained that functionality. In the 'internal dependency' camp, in the same situation you instead get to have a nice Yak Shaving Holiday, away from working on your core feature set.

In short, having dependencies allows you to focus on your core functionality... the features that are most important to you. All the while those who wrote your dependency modules are focused on their own core functionality, that which is most important to them. It means that each small component has someone focused on making it as good and as solid as it can be.

There is another aspect to all of this that must be discussed. When your application doesn't include all its small tasks within itself, you are actually MORE able to respond to security and other problems. Most of the time, a security or bug-fix type issue can be isolated to a particular module. Because the API is defined and documented, usually that module can be updated independently of the rest of the application. The bug can usually be fixed without disrupting the rest of the code.

There is another subtle effect here. Because each small-task module is versioned independently, it only needs to be updated when it has a problem that needs fixing or when your application requires functionality that is new to that module. This means that you can update your Cookie handling module without needing to update your form handling module.
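Here is a small sketch of what independent versioning buys you. The module names, version numbers, and both helper functions are hypothetical, and the version comparison assumes simple dotted numeric versions. Given what is installed and the minimum versions you need, only the modules that actually fall short get touched.

```python
def parse_version(text):
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(piece) for piece in text.split("."))


def modules_needing_update(installed, required):
    """Return the names of modules whose installed version is below the minimum."""
    return sorted(
        name
        for name, minimum in required.items()
        if parse_version(installed.get(name, "0")) < parse_version(minimum)
    )


# Hypothetical deployment: the cookie module has a bugfix release, the form module does not.
installed = {"cookie-handler": "1.2.0", "form-handler": "2.0.1"}
required = {"cookie-handler": "1.2.1", "form-handler": "2.0.0"}
to_update = modules_needing_update(installed, required)
```

In this made-up deployment `to_update` contains only `cookie-handler`; the form handling module is left alone.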

In the 'internal dependency' camp, all the small-task code is part of the larger package. Now, if you need to update, you have a much bigger problem. Instead of just isolating and adjusting the points where the Cookie module is used, you have to look at every change made to the larger package across the board between the version you have deployed and the version that contains your Cookie bugfix. You then have to go through your code and adjust every other interaction with code that has changed whether it's related to your Cookie bugfix or not. This means every update requires much more work.

It also means that with internal dependencies, when your application is left unmodified for a while, you have a much bigger task when it comes time to bring it up to date... and you have to do so all at once, rather than working with one package update at a time.

All of these things demonstrate that, ultimately, external dependencies are the way to go. They let you focus on your core features and give you a much more solid base to stand on as you build your own functionality.

All of this said, there are still places where internal dependency options are better. Examples of these might be embedded devices or areas where you have little or no control over your deployment environment, such as shared virtual hosts.

Fortunately, the world of application deployment has progressed beyond shared virtual hosts. We now have options such as cloud servers and other virtual environments that give us flexible machines where we get to decide what is deployed and how. For most of us, using external modules is a no-brainer.

Waiting for many modules to install can be annoying. But once you see the real benefits of using external dependencies, you realize that the more you stand on the shoulders of others, the better off you will be.

3 Comments

I think these were exactly the same conclusions the guys at Bell Labs came to when they made UNIX all those years ago.

Very well said. I have nothing really to add because I think you covered almost everything in there. This post is going into my list of links to refer people to when they complain about Moose having dependencies. Thanks for the great post.

Makes sense to me! :-) Very helpful article.
