2005-12-21

What I Want from a Web Framework

As of today, it's been exactly six years since I started programming in PHP. When I started, the major players for writing web applications (to the best of my knowledge at the time) were CGI scripts written in whatever [1], mod_perl, and PHP, which was on version 3 and just getting started as a web-development language. PHP was free, open-source, and easy to get running under Apache. PHP also had good library support, which was important for interfacing in a clean way with our database at the time, Oracle 8i.

PHP wasn't, and isn't, a framework (as I understand the meaning of the term). Frameworks (such as the then-contemporary ArsDigita Community System) weren't really an option for us, since we were attacking the massive virtual hosting problem with a single set of PHP scripts that assumed the form of the website in question by, in essence, looking up the hostname in the database and spitting out HTML accordingly. For the problem we were trying to solve (in the way we were trying to solve it), it's doubtful an out-of-the-box framework would've done the trick.

Still, by writing what in some sense was a framework (albeit one with an installed base of one system targeted toward a handful of markets), I feel like I have a reasonably well-thought-out wishlist of what I'd like to see if I were creating a new web app from the ground up. So here goes:

First off, some prerequisites. Some of these are givens, but others are missed even by longtime players like PHP. This is a basic list of things that a web language/framework needs to include to even step up to the plate: ease of getting started, cross-platform capability, free and open-source licensing, decent libraries (with some caveats), and built-in support for international character sets.
  1. If a web framework isn't easy to set up, very few people will take the time to try it. It's precisely the "what should I write this in if I just want to get it running quickly" crowd that is needed to make a language succeed. These quick prototypes often accrete into larger projects, which then turn into companies (cf. PHP, for a meta-example). Also, unless there are obvious and compelling features to make me want to try a framework or language, I'm probably only going to try those that are easy to experiment in.
  2. Second, good cross-platform support for a language or framework is essential. One of the reasons the reddit guys switched from Lisp to Python was that the Lisp on their dev system didn't match the Lisp on their production system. I like to do development on my iBook, but for various performance reasons, Mac OS X isn't a particularly good server OS yet. So it would be nice to be able to deploy to Linux or FreeBSD for production use.
  3. A related reason the reddit guys switched was that although there are Lisps that work on both Mac OS X and FreeBSD, they are all commercial and, I assume, closed-source products. This ties into ease of experimentation as well. If I need to buy a license or deal with a crippleware free version, I'm less likely to try a product. Also, open source is a pretty well-proven way to develop a platform, if not consumer-oriented software like word processors.
  4. Library support has historically been quite important as well. I'm on the fence about whether an RDBMS is going to be the long-term winner for persistent storage for web apps, but at the moment it's pretty important. Lacking first-class support for things like bind variables is a pretty big unfeature, so the libraries have to be complete and up-to-date. Library support also adds the sort of candy that gets people to try your environment in the first place.
  5. Lastly (and I'm talking to you, PHP), transparent and built-in support for international character sets is an important feature. Heck, if internationalization in general could be cleanly integrated into the framework, so much the better.
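To make the bind-variable point in item 4 concrete, here is a minimal sketch. It assumes PHP's PDO extension with its SQLite driver rather than the Oracle setup described above, and the table and data are made up for illustration. The idea is that the value is handed to the database separately from the SQL text, so it is never parsed as SQL.

```php
<?php
// Hypothetical schema and data, for illustration only.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE appointments (id INTEGER PRIMARY KEY, owner TEXT, title TEXT)');
$db->exec("INSERT INTO appointments (owner, title) VALUES ('joe', 'Dentist')");
$db->exec("INSERT INTO appointments (owner, title) VALUES ('mary', 'Lunch')");

// Hostile input that would break out of the quotes if pasted into the SQL string.
$owner = "joe' OR '1'='1";

// Prepare the statement once; :owner is a bind variable (named placeholder).
$stmt = $db->prepare('SELECT title FROM appointments WHERE owner = :owner');

// The bound value is compared as a plain string, so nothing matches it.
$stmt->execute(array(':owner' => $owner));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
echo count($rows), "\n"; // prints 0

// The same prepared statement can be re-run with a different value.
$stmt->execute(array(':owner' => 'joe'));
$joe = $stmt->fetchAll(PDO::FETCH_COLUMN);
```

Had the input been concatenated into the query string instead, the `OR '1'='1'` fragment would have become part of the SQL and matched every row.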
Now onto my personal preferences. First in the list would be a clean, powerful language. This is another area where PHP falls short. I'd like to be able to have something like this work:
<?

function that_returns_an_array()
{
        return array ('foo' => 1, 'bar' => 2);
}

echo that_returns_an_array()['bar'];

?>
It should print 2, but it doesn't even parse; a little more closure would be nice. Having to type "array" to define a hash is also a little annoying; something that looks a little cleaner couldn't hurt. And I understand functional programming is supposed to have some kind of salubrious effects, so maybe a language that lends itself to functional programming would be a Good Thing.

One thing I do like about PHP is its inside-out nature. There are all sorts of ways to shoot yourself in the foot with this sort of architecture, but it sure helps with point 1 if you can mock up a site in HTML and then add the active portions in place.

Related to this, however, is a pet peeve of mine: the profusion of template languages. I can think of three arguments for the existence of templating languages:
  1. Web designers are scared of programming languages
  2. We need to separate presentation from content
  3. Bad people can do bad things in the native language
The first argument, I think, is bogus. Most sufficiently powerful templating languages are at least as complicated as the equivalent constructs in, say, PHP, and if you're in something that's effectively a template language, why spend a bunch of processor cycles str_replace'ing tags? [2] The second argument has largely fallen by the wayside with the standardization and widespread adoption of CSS, and templating doesn't even really solve the problem of internationalization. The third argument I'll deal with below, but basically an ideal web framework would use a capability model that would render it moot.

Which brings me to the only possibly-original idea in this post: using something along the lines of a capability architecture to provide transparent security. The current state of the art seems to be to use an RDBMS for persistent storage and to use SQL to query and manipulate data. Typically a connection with full privileges is made to the database, and the so-called business logic in the Web application or framework decides what level of access is afforded to the end-user.

The problem with this arrangement is that the integrity of the database is potentially no greater than the attention paid to security in the most poorly-written piece of software on the server. Furthermore, if someone manages to sneak a script onto the server, the damage done is limited only by the permissions of the Web server process. If the process has some kind of built-in persistent connection to the database, the security of the database is equally compromised.

In the ideal case, the web framework would have user accounts with fine-grained privileges (possibly including a generic "guest" account with read permission on public areas), and those privileges would trickle down to the lowest levels of the execution of the script.
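Here is a rough sketch of what privileges trickling down might look like, with two made-up users. Every name in it (AppointmentStore, ScopedAppointments, forUser, AccessDenied) is hypothetical; it is not taken from any existing framework, and it only shows the shape of the idea in plain PHP. Page code never receives the full store, only a handle bound to one user, and that handle checks ownership on every write.

```php
<?php
// All class and method names here are hypothetical, for illustration only.

class AccessDenied extends Exception {}

// Holds every appointment. In this sketch, page code is never handed this object.
class AppointmentStore
{
    private $rows = array();

    public function add($id, $owner, $title)
    {
        $this->rows[$id] = array('owner' => $owner, 'title' => $title);
    }

    // Returns the owner's name, or null if the appointment does not exist.
    public function ownerOf($id)
    {
        return isset($this->rows[$id]) ? $this->rows[$id]['owner'] : null;
    }

    public function remove($id)
    {
        unset($this->rows[$id]);
    }

    // Hands out a handle limited to one user's privileges.
    public function forUser($user)
    {
        return new ScopedAppointments($this, $user);
    }
}

// The only object the page script sees; every write is checked against $user.
class ScopedAppointments
{
    private $store;
    private $user;

    public function __construct($store, $user)
    {
        $this->store = $store;
        $this->user = $user;
    }

    public function delete($id)
    {
        if ($this->store->ownerOf($id) !== $this->user) {
            throw new AccessDenied("$this->user may not delete appointment $id");
        }
        $this->store->remove($id);
    }
}

$store = new AppointmentStore();
$store->add(1, 'joe', 'Dentist');
$store->add(2, 'mary', 'Lunch');

$joe = $store->forUser('joe');
$joe->delete(1);     // allowed: Joe owns appointment 1

$denied = false;
try {
    $joe->delete(2); // sloppy page code tries to delete Mary's appointment
} catch (AccessDenied $e) {
    $denied = true;  // the handle refuses, whatever the page code intended
}
```

In plain PHP nothing stops a hostile script from getting hold of the underlying store some other way, so real enforcement would have to live in the runtime or the storage layer. The sketch only shows where the check would sit.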
For instance, if Joe has write permissions on all appointments belonging to Joe, Joe wouldn't be able to delete Mary's appointments even if poorly-written code would otherwise allow it. A script that Joe manages to sneak onto the server (itself a much less likely scenario under this architecture) still wouldn't allow him to query or manipulate any data that the Joe account isn't authorized to access. I'm not sure if the user accounts on a UNIX machine, much less those of an off-the-shelf RDBMS, are robust and scalable enough to handle one "user" per user, and it would be interesting to see if any existing frameworks use this approach or something similar. Even then, I'm not sure if most RDBMSes have good row-level access control.

The second newish feature I'd like to see I haven't thought through well enough to describe in detail. But basically it would come down to (more) transparent persistence for user data. This is somewhat problematic in a stateless architecture like the Web, but it would be neat if I could simply flag a variable as persistent and have it stick around between requests. And not only that, but have it be the first-class way of storing data between requests. This could, of course, be shoehorned into an RDBMS and/or filesystem, but an integrated approach might be superior.

If we combine these two, then the use of UNIX user accounts and native RDBMS users sounds increasingly infeasible. In that case it might be the right approach to ditch the RDBMS and raw filesystem, or bury them under enough layers of carefully-reviewed code to make the features transparent.

I apologize if this has been a disorganized rant. I'm sure most of the concepts have been gone over by people far more knowledgeable than myself, and it's even possible that all of these are encapsulated in an already-existing framework that I just have to go out and learn.
If so, I'd love to hear about it in the comments ;)

[1] I have it on good authority that Amazon's obidos is written in C, and also that the number of requests a particular Apache process is allowed to handle before being killed (to control memory leaks and such) is in the single digits.

[2] As a stunt I'd like to see someone write a templating language in another templating language, preferably on top of PHP or some other inside-out language.
