The Django Book

Chapter 15: Middleware

On occasion, youll need to run a piece of code on each and every request that Django handles. This code might need to modify the request before the view handles it, it might need to log information about the request for debugging purposes, and so forth.

You can do this with Djangos middleware framework, which is a set of hooks into Djangos request/response processing. Its a light, low-level plug-in system capable of globally altering both Djangos input and output.

Each middleware component is responsible for doing some specific function. If youre reading this book linearly (sorry, postmodernists), youve seen middleware a number of times already:

  • All of the session and user tools that we looked at in Chapter 12 are made possible by a few small pieces of middleware (more specifically, the middleware makes request.session and request.user available to you in views).

  • The sitewide cache discussed in Chapter 13 is actually just a piece of middleware that bypasses the call to your view function if the response for that view has already been cached.

  • The flatpages , redirects , and csrf contributed applications from Chapter 14 all do their magic through middleware components.

This chapter dives deeper into exactly what middleware is and how it works, and explains how you can write your own middleware.

Whats Middleware?

A middleware component is simply a Python class that conforms to a certain API. Before diving into the formal aspects of what that API is, lets look at a very simple example.

High-traffic sites often need to deploy Django behind a load-balancing proxy (see Chapter 20). This can cause a few small complications, one of which is that every requests remote IP (request.META["REMOTE_IP"] ) will be that of the load balancer, not the actual IP making the request. Load balancers deal with this by setting a special header, X-Forwarded-For , to the actual requesting IP address.

So heres a small bit of middleware that lets sites running behind a proxy still see the correct IP address in request.META["REMOTE_ADDR"] :

class SetRemoteAddrFromForwardedFor(object):
    def process_request(self, request):
        try:
            real_ip = request.META['HTTP_X_FORWARDED_FOR']
        except KeyError:
            pass
        else:
            # HTTP_X_FORWARDED_FOR can be a comma-separated list of IPs.
            # Take just the first one.
            real_ip = real_ip.split(",")[0]
            request.META['REMOTE_ADDR'] = real_ip

If this is installed (see the next section), every requests X-Forwarded-For value will be automatically inserted into request.META['REMOTE_ADDR'] . This means your Django applications dont need to be concerned with whether theyre behind a load-balancing proxy or not; they can simply access request.META['REMOTE_ADDR'] , and that will work whether or not a proxy is being used.

In fact, this is a common enough need that this piece of middleware is a built-in part of Django. It lives in django.middleware.http , and you can read a bit more about it in the next section.

Middleware Installation

If youve read this book straight through, youve already seen a number of examples of middleware installation; many of the examples in previous chapters have required certain middleware. For completeness, heres how to install middleware.

To activate a middleware component, add it to the MIDDLEWARE_CLASSES tuple in your settings module. In MIDDLEWARE_CLASSES , each middleware component is represented by a string: the full Python path to the middlewares class name. For example, heres the default MIDDLEWARE_CLASSES created by django-admin.py startproject :

MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.doc.XViewMiddleware'
)

A Django installation doesnt require any middleware MIDDLEWARE_CLASSES can be empty, if youd like but we recommend that you activate CommonMiddleware , which we explain shortly.

The order is significant. On the request and view phases, Django applies middleware in the order given in MIDDLEWARE_CLASSES , and on the response and exception phases, Django applies middleware in reverse order. That is, Django treats MIDDLEWARE_CLASSES as a sort of wrapper around the view function: on the request it walks down the list to the view, and on the response it walks back up. See the section How Django Processes a Request: Complete Details in Chapter 3 for a review of the phases.

Middleware Methods

Now that you know what middleware is and how to install it, lets take a look at all the available methods that middleware classes can define.

Initializer: __init__(self)

Use __init__() to perform systemwide setup for a given middleware class.

For performance reasons, each activated middleware class is instantiated only once per server process. This means that __init__() is called only once at server startup not for individual requests.

A common reason to implement an __init__() method is to check whether the middleware is indeed needed. If __init__() raises django.core.exceptions.MiddlewareNotUsed , then Django will remove the middleware from the middleware stack. You might use this feature to check for some piece of software that the middleware class requires, or check whether the server is running debug mode, or any other such environment situation.

If a middleware class defines an __init__() method, the method should take no arguments beyond the standard self .

Request Preprocessor: process_request(self, request)

This method gets called as soon as the request has been received before Django has parsed the URL to determine which view to run. It gets passed the HttpRequest object, which you may modify at will.

process_request() should return either None or an HttpResponse object.

  • If it returns None , Django will continue processing this request, executing any other middleware and then the appropriate view.

  • If it returns an HttpResponse object, Django wont bother calling any other middleware (of any type) or the appropriate view. Django will immediately return that HttpResponse .

View Preprocessor: process_view(self, request, view, args, kwargs)

This method gets called after the request preprocessor is called and Django has determined which view to execute, but before that view has actually been executed.

The arguments passed to this view are shown in Table 15-1.

Table 15-1. Arguments Passed to process_view()
Argument Explanation
request The HttpRequest object.
view The Python function that Django will call to handle this request. This is the actual function object itself, not the name of the function as a string.
args The list of positional arguments that will be passed to the view, not including the request argument (which is always the first argument to a view).
kwargs The dictionary of keyword arguments that will be passed to the view.

Just like process_request() , process_view() should return either None or an HttpResponse object.

  • If it returns None , Django will continue processing this request, executing any other middleware and then the appropriate view.

  • If it returns an HttpResponse object, Django wont bother calling any other middleware (of any type) or the appropriate view. Django will immediately return that HttpResponse .

Response Postprocessor: process_response(self, request, response)

This method gets called after the view function is called and the response is generated. Here, the processor can modify the content of a response; one obvious use case is content compression, such as gzipping of the requests HTML.

The parameters should be pretty self-explanatory: request is the request object, and response is the response object returned from the view.

Unlike the request and view preprocessors, which may return None , process_response() must return an HttpResponse object. That response could be the original one passed into the function (possibly modified) or a brand-new one.

Exception Postprocessor: process_exception(self, request, exception)

This method gets called only if something goes wrong and a view raises an uncaught exception. You can use this hook to send error notifications, dump postmortem information to a log, or even try to recover from the error automatically.

The parameters to this function are the same request object weve been dealing with all along, and exception , which is the actual Exception object raised by the view function.

process_exception() should return a either None or an HttpResponse object.

  • If it returns None , Django will continue processing this request with the frameworks built-in exception handling.

  • If it returns an HttpResponse object, Django will use that response instead of the frameworks built-in exception handling.

Note

Django ships with a number of middleware classes (discussed in the following section) that make good examples. Reading the code for them should give you a good feel for the power of middleware.

You can also find a number of community-contributed examples on Djangos wiki: http://code.djangoproject.com/wiki/ContributedMiddleware

Built-in Middleware

Django comes with some built-in middleware to deal with common problems, which we discuss in the sections that follow.

Authentication Support Middleware

Middleware class: django.contrib.auth.middleware.AuthenticationMiddleware .

This middleware enables authentication support. It adds the request.user attribute, representing the currently logged-in user, to every incoming HttpRequest object.

See Chapter 12 for complete details.

Common Middleware

Middleware class: django.middleware.common.CommonMiddleware .

This middleware adds a few conveniences for perfectionists:

Forbids access to user agents in the ``DISALLOWED_USER_AGENTS`` setting : If provided, this setting should be a list of compiled regular expression objects that are matched against the user-agent header for each incoming request. Heres an example snippet from a settings file:

import re

DISALLOWED_USER_AGENTS = (
    re.compile(r'^OmniExplorer_Bot'),
    re.compile(r'^Googlebot')
)

Note the import re , because DISALLOWED_USER_AGENTS requires its values to be compiled regexes (i.e., the output of re.compile() ). The settings file is regular python, so its perfectly OK to include Python import statements in it.

Performs URL rewriting based on the ``APPEND_SLASH`` and ``PREPEND_WWW`` settings : If APPEND_SLASH is True , URLs that lack a trailing slash will be redirected to the same URL with a trailing slash, unless the last component in the path contains a period. So foo.com/bar is redirected to foo.com/bar/ , but foo.com/bar/file.txt is passed through unchanged.

If PREPEND_WWW is True , URLs that lack a leading www. will be redirected to the same URL with a leading www..

Both of these options are meant to normalize URLs. The philosophy is that each URL should exist in one and only one place. Technically the URL example.com/bar is distinct from example.com/bar/ , which in turn is distinct from www.example.com/bar/ . A search-engine indexer would treat these as separate URLs, which is detrimental to your sites search-engine rankings, so its a best practice to normalize URLs.

Handles ETags based on the ``USE_ETAGS`` setting : ETags are an HTTP-level optimization for caching pages conditionally. If USE_ETAGS is set to True , Django will calculate an ETag for each request by MD5-hashing the page content, and it will take care of sending Not Modified responses, if appropriate.

Note there is also a conditional GET middleware, covered shortly, which handles ETags and does a bit more.

Compression Middleware

Middleware class: django.middleware.gzip.GZipMiddleware .

This middleware automatically compresses content for browsers that understand gzip compression (all modern browsers). This can greatly reduce the amount of bandwidth a Web server consumes. The tradeoff is that it takes a bit of processing time to compress pages.

We usually prefer speed over bandwidth, but if you prefer the reverse, just enable this middleware.

Conditional GET Middleware

Middleware class: django.middleware.http.ConditionalGetMiddleware .

This middleware provides support for conditional GET operations. If the response has an Last-Modified or ETag or header, and the request has If-None-Match or If-Modified-Since , the response is replaced by an 304 (Not modified) response. ETag support depends on on the USE_ETAGS setting and expects the ETag response header to already be set. As discussed above, the ETag header is set by the Common middleware.

It also removes the content from any response to a HEAD request and sets the Date and Content-Length response headers for all requests.

Reverse Proxy Support (X-Forwarded-For Middleware)

Middleware class: django.middleware.http.SetRemoteAddrFromForwardedFor .

This is the example we examined in the Whats Middleware? section earlier. It sets request.META['REMOTE_ADDR'] based on request.META['HTTP_X_FORWARDED_FOR'] , if the latter is set. This is useful if youre sitting behind a reverse proxy that causes each requests REMOTE_ADDR to be set to 127.0.0.1 .

Danger!

This middleware does not validate HTTP_X_FORWARDED_FOR .

If youre not behind a reverse proxy that sets HTTP_X_FORWARDED_FOR automatically, do not use this middleware. Anybody can spoof the value of HTTP_X_FORWARDED_FOR , and because this sets REMOTE_ADDR based on HTTP_X_FORWARDED_FOR , that means anybody can fake his IP address.

Only use this middleware when you can absolutely trust the value of HTTP_X_FORWARDED_FOR .

Session Support Middleware

Middleware class: django.contrib.sessions.middleware.SessionMiddleware .

This middleware enables session support. See Chapter 12 for details.

Sitewide Cache Middleware

Middleware class: django.middleware.cache.CacheMiddleware .

This middleware caches each Django-powered page. This was discussed in detail in Chapter 13.

Transaction Middleware

Middleware class: django.middleware.transaction.TransactionMiddleware .

This middleware binds a database COMMIT or ROLLBACK to the request/response phase. If a view function runs successfully, a COMMIT is issued. If the view raises an exception, a ROLLBACK is issued.

The order of this middleware in the stack is important. Middleware modules running outside of it run with commit-on-save the default Django behavior. Middleware modules running inside it (coming later in the stack) will be under the same transaction control as the view functions.

See Appendix C for more about information about database transactions.

X-View Middleware

Middleware class: django.middleware.doc.XViewMiddleware .

This middleware sends custom X-View HTTP headers to HEAD requests that come from IP addresses defined in the INTERNAL_IPS setting. This is used by Djangos automatic documentation system.

Whats Next?

Web developers and database-schema designers dont always have the luxury of starting from scratch. In the next chapter, well cover how to integrate with legacy systems, such as database schemas youve inherited from the 1980s.

Copyright 2006 Adrian Holovaty and Jacob Kaplan-Moss.
This work is licensed under the GNU Free Document License.
Hosting graciously provided by media temple
Chinese translate hosting by py3k.cn.