Web Service Architecture

This document provides an overview of administration and architecture of web-based services. Most “real” systems administration work involves a fair amount of wrangling HTTP servers, so the core information in this document will likely be useful to most. As a secondary benefit, HTTP provides a good basis to address a number of key service-administration topics and issues, which are relevant to all systems administrators.

Technology Overview

HTTP Protocol

There’s more to the Internet and to network services than HTTP and “the web,” and the moment for protesting the conflation of HTTP and “the Internet,” has passed. For whatever its worth, HTTP has become the UNIX pipe of the Internet. Web browsers, of course implement HTTP, but for testing purposes you should familiarize yourself with curl, which is a versatile command line HTTP client (among other protocols.)

There are a few terms with specific meanings in the context of HTTP. Consider these basic concepts:

The software that sends requests to a server and then receives data (a resource) in response. HTTP clients (typically) place requests to servers listening on port 80. HTTPS clients pace requests to servers listening on port 443. There are many robust, native, HTTP client libraries for application developers.
A system that responds to client requests and returns resources. HTTP servers listen for requests on port 80; however, servers can run on any port.. HTTPS servers listen for requests on port 443 by default. Recently, a number of non-world-wide-web services have emerged that implement HTTP servers. (i.e. RESTful APIs, Application Servers, databases like CouchDB.)
The content returned by the server in response to a request, except for that data which is in the header.
The message sent from a client to a server HTTP defines several different types of requests, discussed in HTTP requests.
status code
All responses from an HTTP server, include a status code, described in HTTP status codes.
The metadata transmitted with a given request or response. See headers for more information.
A daemon that implements the HTTP protocol. Historically, this has refered to the NCSA httpd and its successor the Apache HTTPD Server, though the name is generic, and can refer to any HTTP server, typically multi-purpose ones running on Unix-like systems.


HTTP transmits metadata regarding content in key/value pairs called headers. Headers are largely arbitrary, though different kinds of clients have different requirements around headers. Use “curl -I” to return a list of headers. See the following headers from tychoish.com:

Server: nginx/0.7.67
Date: Sun, 06 Nov 2011 14:26:07 GMT
Content-Type: text/html
Content-Length: 8720
Last-Modified: Fri, 04 Nov 2011 13:39:02 GMT
Connection: keep-alive
Accept-Ranges: bytes

HTTP also provides for a set of headers in the request. They take the same key/value form as response headers, although the set of keys is different.

Status (Error) Codes

When you run “curl -I” the first response line provides the HTTP status, the above example omitted the following status:

HTTP/1.1 200 OK

This conveys the version of the HTTP protocol in use [1] a status code (e.g. 200,) and then some human intelligible translation of this code. You already know code 404 that servers returns when it can’t find the resource requested. 200 is, as above, is he status code for “resource returned successfully.”

There are a few dozen HTTP codes, and while some administrators have memorized the HTTP status codes, there is no need: In general, 300 series codes reflect a redirection (e.g. “the resource you’re looking for is somewhere else,”) 400 series code reflect some sort of error or problem that the server has with the request, and 500 series requests reflect some sort of “internal error,” usually related to the server’s configuration or state.

The following codes are useful to know on sight. Use a reference for everything else:

Code Meaning
200 Everything’s ok.
301 Resource Moved Permanently. Update your bookmarks.
302 Moved Temporarily. Don’t update your bookmarks.
400 Error in request syntax. Client’s fault.
401 Authorization required. Use HTTP Auth to return the resource
403 Access Denied. Bad HTTP Auth credentials or permissions.
404 Resource not found. Typo in the request, or data not found.
410 Resource removed from server. This is infrequently used.
418 I’m a tea pot. From an April Fools RFC, and socially useful.
500 Internal server error. Check configuration and error logs.
501 Requires unimplemented feature. Check config and error logs.
502 Bad gateway. Check proxy configuration. Upstream server error.
504 Gateway timeout. Proxy server not responding.

Often server logs will return more useful information regarding the state of the system.

[1]All HTTP services are or attempts to comply with version 1.1.


HTTP provides a very full featured interface for interacting with remote resources, although in common use, the GET, POST, and PUT request account for the overwhelming majority of requests. Requests types are generally refered to as “methods.” htrict semantic adherence to HTTP methods is one of the defining aspects of “REST

Fetch a resource from an HTTP server.
Upload a resource to an HTTP server. Often fails as a result of file permissions and server configurations, but don’t assume that it will fail. Less common than POST
Send a response to a web pages. Submitting web-forms are conventionally implemented as POSTS.
Remove a resource from an HTTP server. Often fails as a result of file permissions and server configurations, but don’t assume that it will fail. Used infrequently prior to RESTful web APIs.
Retrieve only the headers without fecthing the body of the request.

HTTP Abstractions

Conceptually, it’s best to think about HTTP in terms of static content conveyed directly from the server’s file system, thought the web server, to the end-user’s client. In truth, operations are more complex, and most HTTP deployments make use of a clustered server and server side processing.

There are a couple of very simple abstractions that most general purpose web servers provide that make it easier to deploy web services using HTTP: Proxy handling where a server will “pass” a request to another server and URL rewriting where the server will map incoming requests for resources to different internal paths and resources.

Load Balancing and Proxies

Most general purpose web servers have the ability to proxy or forward incoming requests to different HTTP (or CGI, FastCGI or similar) server. Proxying requires minimal overhead on the part of the front end server and makes it possible to host an entire domain or sub-domain using a cluster of machines or have a single public IP address that can access the resources of a group of machines. As a result proxies are essential for scaling web services horizontally. Most deployments of any size use these abstractions to distribute resources. Think of proxying as an instance of partitioning or horizontal scaling.

Load-balancing, then, are proxy configurations where a cluster of servers identical servers provide a single resource. The proxy server in these situations must distribute the requests among the nodes and (optionally) track connections to ensure that nodes remain responsive and in some configurations can ensure that connections from a single client are consistently routed to the same back-end server when possible. Load balancers include the ability to distribute requests unevenly among the node if systems have different capacities as well as different possible responses to node failures. Load balanced architectures are simple examples of replication or vertical scaling.

URL Rewriting

URL rewriting allows httpd programs to map request strings to actual resources, which may have different names or locations. URL rewriting engines often support regular expression matching, and can provide both transparent rewriting where URLs are rewritten so that the client is unaware, or as “redirections” where the server directs the client to requires multiple requests.

This ability to preset logical URLs to users without affecting the organization of the “back-end” can be incredibly liberating for administrators and developers. Most web development frameworks provide some level of URL abstraction but having this ability in the web server is also helpful. While there are some quirks for each server’s rewriting engine, all systems are roughly similar.

Web Server Fundamentals

HTTP and Static Content

HTTP is really designed to serve static content, and most general-purpose web servers (and browsers) and optimized for this task. Web browsers make multiple requests in parallel to download embeded content (i.e. images, style sheets, JavaScript) “all at once” rather than sequentially.” General purpose HTTPDs are also pretty good at efficiently serving content with this pattern. To serve static content:

  • Make sure that you’re not serving static content (i.e. anything that the web/application server needs to modify) from a low-volume/single-threaded application server.
  • Use some sort of caching service, if needed. It’s an additional layer of complexity, but using a front-end caching proxy like Varnish or Squid can cache data in RAM and return results more quickly, which is useful in certain kinds of high-volume situations with certain kinds of applications. Caches are great, but they don’t solve underlying problems, and they add an additional layer of complexities.
  • Make sure all resources/assets originate from the same domain, if possible. Use a virtual domain like “static.example.net” if necessary, but being consistent with your domain usage can help your browser cache things more effectively. It also makes it easier for you to understand your own setups later. Keep things simple and organized.

Serving static content with HTTP is straightforward, when you need to dynamically assemble content per-request, you must use a more complex system. The kind of dynamic content you require and the kinds of existing applications and tools that you want to use dictate your architecture–to some extent–from here.

Common Gateway Interfaces

CGI, FastCGI, SCGI, PSGI, WSGI, and Rack are all protocols used by web servers to communicate with application servers that generate content dynamically. In summary, users make HTTP requests against a web server (httpd) that passes the request to a separate process, which generates a response that it hands back to the HTTP server that returns the result to the user. While this seems a bit complex, in practice CGI and related protocols have simple designs, robust tool-sets, are commonly understood by many developers, and (generally) provide excellent process/privilege segregation.

There are fundamental differences between these protocols, even though their overall method of operation is similar. Consider the following overview:

Common Gateway Interface. CGI is the “original” script gateway protocol. CGI is simple and easy to implement, but every request requires the webserver to create a new process, or copy of the application in memory, for the length of the request. The per-request process creation and tear-down doesn’t scale well with database connections and large request loads.
FastCGI attempts to solve the process creation/tear-down overhead, by daemonizing he application. With FastCGI your application layer can distribute these costs (i.e. process initialization, database connections, etc.) across the deployment. The result is greatly increased performance over conventional CGI. However, FastCGI is more complex to implement, and typically FastCGI application instances have lower request-per-second-per-instance capacities than HTTP servers, which creates a minor architectural challenge. Also, to deploy new application code, you must restart FastCGI processes, which may interrupt client requests.
Web Server Gateway Interface (sometimes pronounced wisgy or wisky.) WSGI provides a method for web applications to communicate with conventional HTTP servers. WSGI emerged from the Python community, and is typically used by applications written in this language though the interface is not necessarily Python specific. WSGI is easy to use, though the exact method of deployment and operation varies slightly by implementation.
Perl Web Server Gateway. PSGI provides an interface, a la WSGI between Perl web applications and other CGI-like servers. Indeed, PSGI primarily describes a tool-set for writing web applications rather than a particular interface or protocol to web servers: you can run PSGI applications with CGI, FastCGI or HTTP interfaces.
Simple Common Gateway Interface. SCGI is operationally similar to FastCGI, but the protocol appears more like CGI from the application perspective.
Rack is a Ruby-centric (and inspired PSGI) web-server interface that provides an abstraction layer/interface between web servers and Ruby applications that “appears native” to Ruby developers.

While CGI and FastCGI defined dynamic applications from the earliest days of HTTP and the web, the other above mentioned interface methods seem largely emerged in the context of recent web application development frameworks such as Ruby on Rails, Django, and Catalyst.

HTTP App Servers

There are a number of recent application servers that implement HTTP itself instead of some intermediate protocol. These are efficient for serving dynamic content, they’re less efficient for serving static resources and cannot support heavy loads. In production environments these administrators will cluster these servers behind a general purpose httpd that proxies requests back to the application server. In this respect, such servers are operationally similar to FastCGI application servers, but may be easier to for administrators and developers. Examples of these kinds of application servers include:

  • Thin,
  • Mongrel,
  • Tornado,
  • Twisted, and
  • Node.js.

Embeded Interpreters

In contrast to applications servers that operate as web server or use a gateway interface, some general purpose httpd implementations make it possible to embed a programming language within the web server process. Implementations vary by language and by webserver; but are often very powerful at the expense of some idiosyncrasies. For a quite a while, these methods were the prevailing practice for deploying dynamic content.

This practice is most common in context of the Apache HTTPD Server with Perl (and mod_perl) and PHP (mod_php). While there are also Ruby (mod_ruby) and Python (mod_python) implementations of these methods, the community has abandoned these modules in preference for other application deployment methods.

With the exception of mod_php, the embeded interpreters all require you to restart Apache when deploying new code. Additionally, all code run by way of an interpreter embeded in the web server process runs with the permissions of the web server. These operational limitations make this approach less ideal for shared environments.

Even though less popular today than other options, there are still reasons to use these tools for your application in some circumstances. Consider the following:

  • mod_perl is very efficient, and not only provides a way to run CGI-style scripts, but also exposes most of the operation of Apache to Perl-scripting. In some advanced cases this level of flexibility may provide enough benefit to indicate using mod_perl and Apache over other options.
  • mod_php has comparable performance to other methods of running PHP scripts, and is significantly easier to deploy applications using mod_php than most other methods of deploying PHP. [2] Because mod_php is so easy to use, its likely that most PHP developers target this environment. The common conception that “PHP just works,” is probably due to the ease of use of mod_php
[2]In the last couple of years, PHP-FPM has made PHP much easier to run as FastCGI.

Scaling HTTP Servers and Building Distributed Systems

Often web servers are pretty straightforward and have low resource requirements. However, in a number of situations web servers face some scaling challenges, for example: when faced with extraordinary load, when they must generate dynamic content for each request can use significantly greater resources often require additional resources, and when services are critical and downtime is not an option.

In nearly every case, the database layer presents a larger scale-related challenge than the web server tier. See “Database Capacity and Scaling” for more information related to database scaling and architecture.

If HTTP becomes a bottleneck, like databases, there’s a progression of measures that you may take to help ensure that your deployment can deal with the traffic that you expect to face. In general consider the following process:

Begin by: moving the database engine to a separate system, and ensuring that your method of serving dynamic content is finely tuned. In many cases, the default configuration for dynamic content (CGI/Apache/etc.) is poorly tuned. For example, the application server or the httpd has a low maximum connection threshold and the process will review before reaching capacity. Connection timeouts, application timeouts, and approaches to concurrency (threading, forking, event-driven, etc) can all impact performance and you must understand address these problems before taking other approaches.

Then, begin by decoupling HTTP services along logical boundaries, so that it’s easy to increase application capacity and capacity for serving static content separately. If your application or “site,” depends on multiple applications and runtimes, make sure that the services run distinctly, and that all components can run independently and with minimal dependencies.

The key to making sure the site works as a whole is to use a load balancing proxy sever in front of the servers that provide your core application and content. This layer makes it possible for users of your service to have the experience of only using one system when in fact a cluster of systems supplies the service.


Because most application servers are single threaded, it makes sense to run some 2-4 application servers, per system (each on a distinct TCP port.)

Typically, run one application server, or webserver worker process per core.

The number of “worker” processes for the nginx server defaults to 1 for most distributions. As a result, unless you modify the configuration, nginx will only use one processor core.

You may architect systems with this scaling eventuality in mind from the very beginning by using virtual hosting and private networks to separate the application layer from the “front-end” HTTP servers. Once, you’ve segregated application and static HTTP content, it’s a relatively simple matter to cluster specific components and add capacity “horizontally.” As the application layer becomes saturated, deploy more instances of the application service and use the load balancer to distribute load among those nodes, and you can safely repeat this for each component service.


If your system supports or requires a higher standard of availability, it’s also good to keep at least two front-end proxy servers in rotation at any given time, by adding multiple DNS records for a single hostname that point to multiple hosts running identical front-end services.

See also

While availability and scaling are not necessarily linked tasks, consider the material covered in “High(er) Availability Is a Hoax” when thinking about architecture.

HTTPD Technologies

Until now, this document has approached HTTP and web servers in generic terms, as if all implementations are equivelent. In truth, there are some significant differences.

To be fair, HTTP is pretty simple, and strictly speaking there’s no need for a big multi-purpose web server. It’s totally possible to use inetd, and a little bit of code (you might as well do this in C) to create your own httpd. Every time a request comes in, inetd spawns a copy of your daemon, which handles the request and process terminates. The HTTP protocol is pretty straightforward, so the server is easy to implement and the binary is pretty small and you can have total control over the behavior of the server by changing some values in a header file (and recompiling, of course.) I’m aware of at least one pretty high traffic site that does things in this manner and it works. Surprisingly well. So that’s an option, but perhaps a bit beyond most mortals.

The remainder of this document, then provides an overview of a process that you might use to chose a HTTP server for your deployment, as well as an overview of contemporary open source HTTP servers.

Choosing a Web Server

You should evaluate web servers on a couple of dimensions, including RAM usage, configuration method, compatibility and interoperability with your application servers, and resource utilization under load. In turn:

  • RAM usage, addresses the amount of memory that the server uses, both without any traffic and then with traffic. You should try to know (roughly,) how much RAM a user (or a thousand) concurrent users requires.
  • Configuration, describes how easy and grokable the server’s configuration system is for you. This is a personal determination, but there are some configuration systems that seem built around particular tasks. For instance, some kinds of pragmatically configured virtual hosting schemes are considerably easier to setup and maintain with Lighttpd than other servers. Apache’s configuration is the the most well documented interface. Consider your task and be familiar with how different servers approach configuration.
  • Compatibility and Interoperability. Consider the ease with which the server can connect to or run your application or service. The advent of CGI, FastCGI and successor inter-service protocols make this nearly a non-issue. At the same time, there are some operational reasons to use certain servers over others. For instance: the uwsgi application server is considerably easier to run with nginx than with Apache. PHP code has historically been easier to run under Apache than with any other server, and the large number of Apache modules make Apache a clear winner for many cases.
  • Resource utilization under load is, in many circumstances, the most important factor in HTTP daemon. Because CPU and RAM use relates directly to cost and capacity, a web server that uses fewer resources without sacrificing required features is preferable.

Approaches to Concurrency in Web Servers

There are three basic approaches to dealing with concurrent requests used by web servers: forking, threading, and queuing. This means:

  • Forking. The example using inetd above, is a primal example of this kind of methodology. In essence the server creates a new copy of the process for every request. This is simple because there are no requirements for a shared state between any of the processes that fulfill each request. This approach is robust, but resource intensive as the server duplicates the entire process in memory per request. Apache HTTPD in the 1.x series was a forking web server, and continues to provide an optional forking implementation in the 2.x series.
  • Threading. Functionally similar to the forking approach described above; however, rather than creating a new process for every request the web server creates a new thread to track each request. This is somewhat less memory intensive than the forking approach, but is more complicated from an engineering prospective which lead to some stability issues in early stage implementations and continues to impact compatibility with some embeded interpreters. While less resource intensive than forking, threaded approaches are reasonably memory intensive. The performance advantages obtained with the threaded approach provide only modest benefits over forking and have real limitations at scale. The default “worker” “mpm module” for the 2.x series of the Apache HTTPD uses a threaded approach.
  • Queuing. This method places all requests in a queue, and a uses an asynchronous event loop to service all requests. This method uses very little memory, particularly under load. The the “mpm_event” module for Apache uses this approach (stable in 2.4,) and the Lighttpd, nginx, and Cherokee web servers all use a queuing approach. These servers are sometimes described as “event driven” or “asynchronous.”

HTTPD Implementations

Apache HTTPD

It’s difficult to talk about HTTP or even open source and Linux without considering the impact of the Apache httpd. Apache descends from the original httpd developed at the NCSA. The server is highly modular and incredibly flexible, having grown out of a series of “patches” (hence the name from, “a patchy web server”) to the original HTTPD. The Apache project consolidated in the 1990s, and is generally regarded as one of the early technological successes of open source, and likely fueled most early adoption of GNU/Linux systems.

Today, the server itself remains popular and very useful, with most administrators having some level of familiarity with Apache and its configuration. It is well documented supported on many platforms and with many tools as a result of its wide adoption. Apache is incredibly stable and robust as a result of it’s extensive use. At the same time, Apache is not very efficient, particularly in light of recent competitors. While this comparison is frequently offered in analysis, it’s probably the case that it’s overblown. The demands on the vast majority of web servers will never surpass the ability of a well tuned Apache instance on even modest hardware.


nginx (pronounced “engine x”) is my personal favorite web server. It’s simple, functionally complete, uses very little memory, and preforms reliably under all circumstances that I’ve been able to throw at it. The configuration syntax is simple and makes many of the more complicated Apache configurations simple. Many appreciate its and aptitude for serving as an HTTP proxy and software load-balancer, and it can serve as a high-volume Mail proxy for high volume mail servers. The best part of nginx is that it just works.

While there are edge cases where it makes sense to use another server (typically Apache,) and there are some edges that are more rough: plain CGI isn’t particularly smooth, [3] better authentication systems, [4] or a pluggable caching layer, there are few reasons to use something other than nginx.

[3]Admittedly, the problem is largely with CGI itself. Given an option, I tend to prefer nginx’s externalization of this and it’s configuration of FastCGI processes.
[4]nginx only supports basic HTTP authentication. This is a fundamental flaw with HTTP, but support for digest authentication would at the very least be nice.


Lighttpd (pronounced “lighty”) was one of the first to use the queuing methodology. Because Lighttpd was stable very early and deployed in a number of high profile situations, it became a favorite. In addition to generally efficient operation, it offers a minimalist Lua-based configuration which permits dynamic virtual host configuration and some additional flexibility. Lighttpd, like nginx, supports FastCGI naively, and developed the widely used “spawn-fcgi” tool for starting FastCGI servers.

Unfortunately, development on Lighttpd has stalled and there is a persistent memory-leak issue which forces administrators to restart the server every couple of days. Since late 2009, there has been little reason to use Lighttpd, except if you need the easy virtual host configuration, and can’t find similar functionality in other tools and don’t mind restarting the server arbitrarily for a known problem.


The inclusion of AntiWeb is something of an outlier, but it’s a cool project and it may be a good introduction to HTTP servers for someone interested in the technology at a lower level. AntiWeb is a Common Lisp web server that uses the event-driven/asynchronous approach like Lighttpd or nginx, but it inherits some pretty innovative ideas regarding web development and HTTP from the Lisp world. While you might not use AntiWeb as your next httpd, it’s worth investigation by anyone whose interested in web servers and web applications.


Cherokee views itself as the successor to Apache (hence the name,) and combines Apache’s ease of use with the performance of event-driven servers like nginx. It’s main selling point, is easy configuration, which it accomplishes by way of a web-based interface.