Dispatcher

The Dispatcher is Adobe's caching and/or load balancing tool. This section aims to help you:

  • Configure the Dispatcher for efficient operation.
  • Design your website to optimize operation of the Dispatcher when creating a high-performance website.
  • Fine-tune your Dispatcher settings should performance issues arise.

Using the Dispatcher also helps protect your application server from attack. Therefore, you can increase protection of your CQ instance by using the Dispatcher in conjunction with an industry-strength web server.

The process for deploying the Dispatcher is independent of the web server and OS platform chosen.

Why use the Dispatcher to implement Caching?

There are two basic approaches to web publishing:

  • Static web servers, such as Apache or IIS, are very simple but fast.
  • Content management servers provide dynamic, real-time, intelligent content, but require much more computation time and other resources.

The Dispatcher helps realize an environment that is both fast and dynamic. It works as part of a static HTML server, such as Apache, with the aim of:

  • storing (or "caching") as much of the site content as is possible, in the form of a static website
  • accessing the layout engine as little as possible.

This means that:

  • static content is handled with exactly the same speed and ease as on a static web server; additionally, you can use the administration and security tools available for your static web server(s).
  • dynamic content is generated as needed, without slowing the system down any more than absolutely necessary.

The Dispatcher contains mechanisms to generate, and update, static HTML based on the content of the dynamic site. You can specify in detail which documents are stored as static files and which are always generated dynamically.

This section illustrates the principles behind this.

Static Web Server

A static web server, such as Apache or IIS, serves static HTML files to visitors of your website. Static pages are created once, so the same content will be delivered for each request.

This process is very simple, and thus extremely efficient. If a visitor requests a file (e.g. an HTML page), the file is usually taken directly from memory; at worst, it is read from the local drive. Static web servers have been available for quite some time, so there is a wide range of tools for administration and security management, and they are very well integrated with network infrastructures.

Content Management Servers

If you use a Content Management Server, such as CQ, an advanced layout engine processes the request from a visitor. The engine reads content from a repository which, combined with styles, formats and access rights, transforms the content into a document that is tailored to the visitor's needs and rights.

This allows you to create richer, dynamic content, which increases the flexibility and functionality of your website. However, the layout engine requires more processing power than a static server, so this setup may be prone to slowdown if many visitors use the system.

How the Dispatcher performs Caching

The Cache Directory

For caching, the Dispatcher module uses the web server's ability to serve static content. The Dispatcher places the cached documents in the document root of the web server.

Note

Due to this, the Dispatcher stores only the HTML code of the page - it does not store the HTTP headers. This can be an issue if you use different encodings within your website, as these may get lost.

Methods for Caching

The Dispatcher has two primary methods for updating the cache content when changes are made to the website.

  • Content Updates remove the pages that have changed, as well as files that are directly associated with them.
  • Auto-Invalidation automatically invalidates those parts of the cache that may be out of date after an update; that is, it flags the relevant pages as out of date without deleting anything.

Content Updates 

In a content update, one or more CQ documents change. CQ sends a syndication request to the Dispatcher, which updates the cache accordingly:

  1. It deletes the modified file(s) from the cache.
  2. It deletes all files that start with the same handle from the cache. For example, if the file /en/index.html is updated, all the files that start with /en/index. are deleted. This mechanism allows you to design cache-efficient sites, especially in regard to picture navigations.
  3. It touches the so-called statfile; this updates the timestamp of the statfile to indicate the date of the last change.

The following points should be noted:

  • Content Updates are typically used in conjunction with an authoring system which "knows" what must be replaced.
  • Files that are affected by a content update are removed, but not replaced immediately. The next time such a file is requested, the Dispatcher fetches the new file from the CQ instance and places it in the cache, thereby overwriting the old content.
  • Typically, automatically generated pictures that incorporate text from a page are stored in picture files starting with the same handle - thus ensuring that the association exists for deletion. For example, you may store the title text of the page mypage.html as the picture mypage.titlePicture.gif in the same folder. This way the picture is automatically deleted from the cache each time the page is updated, so you can be sure that the picture always reflects the current version of the page.
  • You may have several statfiles, for example one per language folder. If a page is updated, CQ looks for the next parent folder containing a statfile, and touches that file.
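The content update steps above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the Dispatcher's actual implementation: the statfile name `.stat` and the function names `find_statfile` and `content_update` are assumptions made for this example.

```python
from pathlib import Path

# Assumption: the statfile is named ".stat"; the text above does not name it.
STATFILE_NAME = ".stat"


def find_statfile(folder: Path, docroot: Path):
    """Return the nearest statfile at or above folder, stopping at docroot."""
    current = folder
    while True:
        candidate = current / STATFILE_NAME
        if candidate.is_file():
            return candidate
        # Stop at the document root (or the filesystem root as a safety net).
        if current == docroot or current == current.parent:
            return None
        current = current.parent


def content_update(docroot: Path, handle: str) -> list:
    """Apply a content update for a page handle such as "/en/index.html".

    Returns the names of the cached files that were deleted.
    """
    page = docroot / handle.lstrip("/")
    # "/en/index.html" -> every file in /en whose name starts with "index."
    prefix = page.name.split(".")[0] + "."
    removed = []
    if page.parent.is_dir():
        for cached in sorted(page.parent.iterdir()):
            if (cached.is_file()
                    and cached.name != STATFILE_NAME
                    and cached.name.startswith(prefix)):
                cached.unlink()
                removed.append(cached.name)
    # Touch the statfile so its timestamp records the time of this update.
    statfile = find_statfile(page.parent, docroot)
    if statfile is not None:
        statfile.touch()
    return removed
```

Because deletion is driven purely by the shared handle, a generated picture such as mypage.titlePicture.gif is removed together with mypage.html without any extra bookkeeping.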

Auto-invalidation

Auto-invalidation automatically invalidates parts of the cache - without physically deleting any files. At every content update, the so-called statfile is touched, so its timestamp reflects the last content update.

The Dispatcher has a list of files that are subject to auto-invalidation. When a document from that list is requested, the Dispatcher compares the date of the cached document with the timestamp of the statfile:

  • if the cached document is newer, the Dispatcher returns it.
  • if it is older, the Dispatcher retrieves the current version from the CQ instance.

Again, certain points should be noted:

  • Auto-invalidation is typically used when the inter-relations are complex, e.g. for HTML pages. These pages contain links and navigation entries, so they usually have to be updated after a content update. If you have automatically generated PDF or picture files, you may choose to auto-invalidate those too.
  • Auto-invalidation does not involve any action by the Dispatcher at update time, except for touching the statfile. However, touching the statfile automatically renders the cache content obsolete, without physically removing it from the cache.
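The timestamp comparison behind auto-invalidation can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `is_fresh` is hypothetical, a missing statfile is taken to mean that no update has been recorded, and equal timestamps are treated as stale (the text above does not specify either case).

```python
from pathlib import Path


def is_fresh(cached: Path, statfile: Path) -> bool:
    """Return True if the cached document may be served as it is."""
    if not statfile.exists():
        # Assumption: without a statfile no content update has been recorded.
        return True
    # The cached document is valid only if it is newer than the statfile.
    return cached.stat().st_mtime > statfile.stat().st_mtime
```

When `is_fresh` returns False, the cached file is still on disk; it is simply ignored and replaced by the current version fetched from the CQ instance.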

How the Dispatcher returns Documents

Finding out whether a document is subject to caching

You can define which documents the Dispatcher caches in the configuration file. The Dispatcher checks the request against the list of cacheable documents. If the document is not in this list, the Dispatcher requests the document from the CQ instance.

The Dispatcher always requests the document directly from the CQ instance in the following cases:

  • The HTTP method is not GET. Other common methods are POST for form data and HEAD for the HTTP header.
  • The request URI contains a question mark "?". This usually indicates a dynamic page, such as a search result, which does not need to be cached.
  • The file extension is missing. The web server needs the extension to determine the document type (the MIME-type).
  • The authentication header is set (this can be configured).
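The four cases above amount to a simple check on each request. The following Python sketch shows one way to express it; the function name `must_bypass_cache` and the `allow_authorized` switch (standing in for the configurable authentication behavior) are assumptions for illustration, not Dispatcher settings.

```python
import posixpath


def must_bypass_cache(method, uri, headers, allow_authorized=False):
    """Return True if the request must go directly to the CQ instance."""
    if method.upper() != "GET":
        return True  # e.g. POST or HEAD
    if "?" in uri:
        return True  # query string, usually a dynamic page
    if posixpath.splitext(uri)[1] == "":
        return True  # no file extension, so no MIME type can be derived
    # HTTP header names are case-insensitive.
    has_auth = any(name.lower() == "authorization" for name in headers)
    if has_auth and not allow_authorized:
        return True
    return False
```

For example, `must_bypass_cache("GET", "/en/index.html", {})` returns False, whereas the same URI requested with POST, or with a query string appended, returns True.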

Finding out if a document is cached

The Dispatcher stores the cached files on the web server as if they were part of a static website. If a user requests a cacheable document the Dispatcher checks whether that document exists in the web server's file system:

  • if so, it returns the cached file.
  • if not, the Dispatcher requests the document from the CQ instance.

Finding out if a document is up-to-date

To find out if a document is up to date, the Dispatcher performs two steps:

  1. It checks whether the document is subject to auto-invalidation. If not, the document is considered up-to-date.
  2. If the document is configured for auto-invalidation, the Dispatcher checks whether it is older or newer than the last change available. If it is older, the Dispatcher requests the current version from the CQ instance and replaces the version in the cache.

Why implement Load Balancing?

Load Balancing is the practice of distributing the computational load of the website across several instances of CQ.

You gain:

  • increased processing power
    In practice this means that the Dispatcher shares document requests between several instances of CQ. Because each instance now has fewer documents to process, you have faster response times. The Dispatcher keeps internal statistics for each document category, so it can estimate the load and distribute the queries efficiently.
  • increased fail-safe coverage
    If the Dispatcher does not receive responses from an instance, it will automatically relay requests to one of the other instance(s). Thus, if an instance becomes unavailable, the only effect is a slowdown of the site, proportionate to the computational power lost. However, all services will continue.
  • you can also manage different websites on the same static web server.

Note

While load balancing spreads the load efficiently, caching helps to reduce the load. Therefore, try to optimize caching and reduce the overall load before you set up load balancing. Good caching may increase the load balancer's performance, or render load balancing unnecessary.

Caution

While a single Dispatcher can usually saturate the capacity of the available Publish instances, for some rare applications it can make sense to additionally balance the load between two Dispatcher instances. Consider configurations with multiple Dispatchers carefully: an additional Dispatcher increases the load on the available Publish instances and, in most applications, can easily decrease overall performance.

How the Dispatcher performs Load Balancing

Performance Statistics

The Dispatcher keeps internal statistics about how fast each instance of CQ processes documents. Based on this data, the Dispatcher estimates which instance will provide the quickest response time when answering a request, and so it reserves the necessary computation time on that instance.

Different types of requests may have differing average completion times, so the Dispatcher allows you to specify document categories. These are then considered when computing the time estimates. For example, you can make a distinction between HTML pages and images, as the typical response times may well differ.

If you use an elaborate search function, you may create a new category for search queries. This helps the Dispatcher send search queries to the instance that responds fastest. This prevents a slower instance from stalling when it receives several "expensive" search queries, while the others get the "cheaper" requests.
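The idea of per-category statistics can be illustrated with a small Python sketch. The Dispatcher's real estimation algorithm is not described here, so everything below is an assumption made for illustration: the class name `ResponseStats`, the use of a plain average, and the choice to treat an instance without measurements as fast so that it gets tried. The sketch also leaves out the reservation of computation time mentioned above.

```python
from collections import defaultdict


class ResponseStats:
    """Track average response times per (instance, category) pair."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._totals = defaultdict(float)
        self._counts = defaultdict(int)

    def record(self, instance, category, seconds):
        """Store one measured response time."""
        key = (instance, category)
        self._totals[key] += seconds
        self._counts[key] += 1

    def estimate(self, instance, category):
        """Return the average response time seen so far for this pair."""
        key = (instance, category)
        if self._counts[key] == 0:
            return 0.0  # assumption: unmeasured instances are tried first
        return self._totals[key] / self._counts[key]

    def pick(self, category):
        """Choose the instance with the lowest estimate for the category."""
        return min(self.instances, key=lambda i: self.estimate(i, category))
```

Keeping the averages separate per category is what allows an instance that is fast for HTML pages but slow for searches to receive only the requests it handles well.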

Personalized content (Sticky Connections)

Sticky connections ensure that documents for one user are all composed on the same instance of CQ. This is important if you use personalized pages and session data. The data is stored on the instance, so subsequent requests from the same user must return to that instance or the data is lost.

Because sticky connections restrict the Dispatcher's ability to optimize the requests, you should use them only when needed. You can specify the folder that contains the "sticky" documents, thus ensuring all documents in that folder are composed on the same instance for each user.
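As a rough illustration of the principle, the following Python sketch pins each user to one instance for requests below a sticky folder and distributes all other requests freely. The class name `StickyRouter`, the `user_id` argument and the round-robin fallback are hypothetical; how the Dispatcher actually identifies a user and selects an instance is not covered by this sketch.

```python
class StickyRouter:
    """Route requests for a sticky folder to one instance per user."""

    def __init__(self, instances, sticky_prefix):
        self.instances = list(instances)
        self.sticky_prefix = sticky_prefix
        self._assigned = {}  # user id -> instance
        self._next = 0

    def _round_robin(self):
        instance = self.instances[self._next % len(self.instances)]
        self._next += 1
        return instance

    def route(self, user_id, path):
        """Return the instance that should compose the document."""
        if not path.startswith(self.sticky_prefix):
            return self._round_robin()  # not sticky: free choice
        if user_id not in self._assigned:
            self._assigned[user_id] = self._round_robin()
        return self._assigned[user_id]
```

Only paths below the sticky prefix are pinned, which keeps the restriction on load distribution as small as possible.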

Note

For most pages that use sticky connections you have to switch off caching - otherwise the page looks the same to all users, regardless of the session content.

For a few applications, it can be possible to use both sticky connections and caching; for example, if you display a form that writes data to the session.

Installation

Each Dispatcher installation kit comes as an archive file that can be downloaded from Daycare.

The following naming convention is used:

dispatcher-<web-server>-<operating-system>-<dispatcher-release-number>.<file-format>

For example, the dispatcher-apache2.2-linux-i686-4.0.6.tgz installation kit contains the dispatcher release 4.0.6 for an Apache 2.2 web server that runs under Linux i686 and is packaged using the tar format.
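If you handle several installation kits, the naming convention can be checked mechanically. The following Python sketch splits a kit name into its parts; the function name `parse_kit_name` and the regular expression are assumptions for this example and have only been matched against the name shown above.

```python
import re

# dispatcher-<web-server>-<operating-system>-<release>.<file-format>
# The web server part contains no hyphen, except for "apache-eapi".
KIT_PATTERN = re.compile(
    r"^dispatcher-"
    r"(?P<web_server>apache-eapi|[^-]+)-"
    r"(?P<operating_system>.+)-"
    r"(?P<release>\d+(?:\.\d+)*)\."
    r"(?P<file_format>[A-Za-z][A-Za-z0-9.]*)$"
)


def parse_kit_name(filename):
    """Return the parts of an installation kit name, or None if no match."""
    match = KIT_PATTERN.match(filename)
    return match.groupdict() if match else None
```

For dispatcher-apache2.2-linux-i686-4.0.6.tgz this yields the web server apache2.2, the operating system linux-i686, the release 4.0.6 and the file format tgz.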

The naming according to web server is as follows:

 Web Server  Installation Kit
 Apache 2.2  dispatcher-apache2.2-<other parameters>
 Apache 2  dispatcher-apache2-<other parameters>
 Apache 1.3  dispatcher-apache-<other parameters>
 Apache 1.3 EAPI  dispatcher-apache-eapi-<other parameters>
 Microsoft Internet Information Server 5, 6, 7  dispatcher-iis-<other parameters>
 Sun Java Web Server / iPlanet  dispatcher-ns-<other parameters>

Note

Within the source tree of Apache 1.3 the compiler flag EAPI exists. A Dispatcher module built for Apache with EAPI should not be used inside a plain Apache server, nor vice versa.

In order to determine which version of the dispatcher you should install:

Open a shell, chdir to your apache installation directory and enter:

    # ./httpd -V

This will output information such as:

    Server version: Apache/1.3.27 (Unix)
    Server built: Feb 18 2003 17:46:31
    Server's Module Magic Number: 19990320:13
    Server compiled with.... -D EAPI -D HAVE_MMAP ...

When the flag EAPI is shown, the installation needs an apache-eapi installation kit.

Historically, Apache with EAPI is often found in binary distributions bundled with an operating system, whereas an Apache built from source is a plain one.
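The EAPI check for Apache 1.3 can also be scripted. The Python sketch below inspects text in the format of the `httpd -V` output shown above; the function name `kit_prefix_for_apache13` is hypothetical, and the sketch assumes that the flag always appears as `-D EAPI`.

```python
def kit_prefix_for_apache13(httpd_v_output: str) -> str:
    """Pick the Apache 1.3 kit prefix based on the EAPI compiler flag."""
    tokens = httpd_v_output.split()
    # The flag is printed as two tokens: "-D" followed by "EAPI".
    has_eapi = any(
        flag == "-D" and name == "EAPI"
        for flag, name in zip(tokens, tokens[1:])
    )
    return "dispatcher-apache-eapi" if has_eapi else "dispatcher-apache"
```

The input text can be captured with `subprocess.run(["./httpd", "-V"], capture_output=True, text=True).stdout` when the script is run from the Apache installation directory.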

Each archive contains the following files:

  • the Dispatcher modules
  • an example configuration file
  • the readme file with installation instructions and last-minute information; the readme file is named as README.dispatcher.<web-server-name> e.g. README.dispatcher.apache
  • release notes

Note

Please check the readme file for any last-minute changes / platform specific notes before starting the installation.

The following sections detail the web server specific installation procedures.

Microsoft Internet Information Server

Microsoft IIS - Installing IIS

For information on how to install this web server, see:

IIS 7 / 7.5

After installation:

  1. Open the Server Manager and add the role Web Server (IIS).
  2. Ensure the following Role Services are installed:
    • ISAPI Extensions
    • ISAPI Filters
    • IIS 6 Metabase Compatibility

Microsoft IIS - Installing the Dispatcher module

See Installation for information on accessing the installation files. The required archive for Microsoft Internet Information Server is:

  • dispatcher-iis-<operating-system>-<dispatcher-release-number>.zip

which contains the following files:

 File  Description
 disp_iis.dll  The Dispatcher dynamic link library file.
 disp_iis.ini  Configuration file for the IIS. This example can be updated with your requirements. Note: The ini file must have the same name-root as the dll.
 disp_iis.pdb  Program database holding program symbols. Can be used for debugging purposes.
 dispatcher.any  An example configuration file for the Dispatcher.
 README.dispatcher.iis  The readme file holding installation instructions and last-minute information. Note: Please check this file before starting the installation.
 release-notes.txt  The release notes; listing issues fixed in the current and past releases.

IIS 5 / 6

  1. Extract
    • disp_iis.dll
    • disp_iis.ini
    • dispatcher.any
    • disp_iis.pdb
    from the Dispatcher package into the executable directory of the selected website under IIS,
    i.e. <IIS_INSTALLDIR>\scripts. For example, in a standard IIS installation on a non-server product, the installation directory is C:\Inetpub, with the scripts virtual directory located in C:\Inetpub\Scripts.

IIS 7 / 7.5

To add the Dispatcher to the list of available ISAPI filters use the following steps:

  1. Using the Windows Explorer, create a directory <IIS_INSTALLDIR>/Scripts;
    for example, C:\Inetpub\Scripts.
  2. Extract all files from the scripts directory of the Dispatcher package into this directory:
    • disp_iis.dll
    • disp_iis.ini
    • dispatcher.any
    • disp_iis.pdb

Microsoft IIS - Configure the Dispatcher INI File

  1. Configure disp_iis.ini as required. The basic format of the .ini file is as follows:

    [main]
    scriptpath=/<virtual root>/disp_iis.dll
    configpath=<path to dispatcher.any>
    loglevel=1
    servervariables=1

    where:

 Parameter  Description
 scriptpath  The URL path of disp_iis.dll within the web server's virtual namespace, i.e. /scripts/disp_iis.dll.
 configpath  The location of dispatcher.any within the local file system (absolute path).
 logfile  The location of the dispatcher.log file. If this is not set, log messages go to the Windows event log.
 loglevel  Defines the log level used to output messages to the event log. The following values may be specified:
    0: error messages only.
    1: errors and warnings.
    2: errors, warnings and informational messages.
    3: errors, warnings, informational and debug messages.
    Note: It is recommended to set the log level to 3 during installation and testing, then revert to 0 when running in a production environment.
 servervariables  Defines how server variables are processed:
    0: IIS server variables are sent to neither the Dispatcher nor CQ.
    1: all IIS server variables (such as LOGON_USER, QUERY_STRING, ...) are sent to the Dispatcher, together with the request headers (and also to the CQ instance if not cached).

Server variables include AUTH_USER, LOGON_USER, HTTPS_KEYSIZE and many others. See the IIS documentation for the full list of variables, with details.

An example configuration:

    [main]
    scriptpath=/Scripts/disp_iis.dll
    configpath=C:\Inetpub\Scripts\dispatcher.any
    loglevel=1
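Before restarting IIS it can be useful to sanity-check the file. The following Python sketch reads the contents of disp_iis.ini with the standard `configparser` module. Treating `scriptpath` and `configpath` as mandatory is an assumption of this example, as is the function name `check_disp_iis_ini`; the permitted `loglevel` values are those listed above.

```python
import configparser

# Assumption for this sketch: these two entries are treated as mandatory.
REQUIRED_KEYS = ("scriptpath", "configpath")


def check_disp_iis_ini(text: str) -> dict:
    """Parse the contents of disp_iis.ini and validate the [main] section."""
    # Interpolation is disabled so that path values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("main"):
        raise ValueError("missing [main] section")
    main = dict(parser["main"])
    missing = [key for key in REQUIRED_KEYS if key not in main]
    if missing:
        raise ValueError("missing entries: " + ", ".join(missing))
    if "loglevel" in main:
        level = int(main["loglevel"])
        if level not in (0, 1, 2, 3):
            raise ValueError("loglevel must be between 0 and 3")
        main["loglevel"] = level
    return main
```

The function returns the entries of the `[main]` section as a dictionary and raises `ValueError` when an entry is missing or the log level is out of range.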

Microsoft IIS - Configure IIS

IIS 5 / 6

To add the Dispatcher to the list of available ISAPI filters use the following steps:

  1. Inside the Internet Service Manager, right click the root node of the website under which you want to add the dispatcher, then open its Properties dialog.
  2. Select the tab named ISAPI Filters.
  3. Click Add... and specify:
    • Filter Name, a name for the filter (i.e. for the Dispatcher).
    • Executable, the location of the filter (i.e. where the dll resides).
  4. Click OK to save.

Access to the cached files must be unrestricted (Anonymous Access). Otherwise the syndication request for cache flushing will be unsuccessful due to missing authentication information.

To ensure access you have to:

  1. Inside the Internet Service Manager, right click the root node of the appropriate website, then open its Properties dialog.
  2. Select the Directory Security tab.
  3. Activate Anonymous access.
  4. To activate the changes you have to restart IIS. Either from the IIS control window or from a command window:
    • net stop w3svc  - will stop the IIS web publishing service
    • net start w3svc - will start it again

IIS 7 / 7.5

To add the Dispatcher to the list of available ISAPI filters perform the following tasks.

Configure the Virtual Directory:

  1. Open Administrative Tools, then select Internet Information Services (IIS) Manager.
  2. Select your site in the tree, then using the context menu (usually right mouse click) select Add Virtual Directory....
  3. Enter the alias Scripts and the physical path of the directory created above (C:\Inetpub\Scripts).

Register the ISAPI filter:

  1. Open Administrative Tools, then select Internet Information Services (IIS) Manager.
  2. Select your site in the tree, then the tab Features View (at the bottom).
  3. Open the feature ISAPI Filters in the view.
  4. Click Add... and enter the following settings:
    • Filter Name: CQ
    • Executable: the path to disp_iis.dll (for example, C:\Inetpub\Scripts\disp_iis.dll)
  5. Click OK to save.

Register the ISAPI handler:

  1. Open Administrative Tools, then select Internet Information Services (IIS) Manager.
  2. Select your site in the tree, then the tab Features View (at the bottom).
  3. Open the feature Handler Mappings in the view.
  4. Click Add Script Map... and enter the following settings:
    • Request Path: /Scripts/disp_iis.dll
    • Executable: the path to disp_iis.dll (for example, C:\Inetpub\Scripts\disp_iis.dll)
    • Name: CQ
  5. Click OK to save.
  6. Open the feature Configuration Editor.
  7. Select the section system.webServer\handlers from Application.config.
  8. Select the first Collection (at the top), then click on ... (at the far right).
  9. The Collection Editor will appear, select the handler CQ (at the top).
  10. Change the value of the allowPathInfo flag (at the bottom) to true.
  11. Close the Collection Editor and click Apply.

Register the .json extension:

  1. Open Administrative Tools, then select Internet Information Services (IIS) Manager.
  2. Select your site in the tree, then the tab Features View (at the bottom).
  3. Open the feature MIME Types in the view.
  4. Click Add... and enter the following settings:
    • File name extension: .json
    • MIME type: application/json
  5. Click OK to save.

IIS 7.5

If your site is not completely new it might contain the hidden segment bin. To remove this:

  1. Open Administrative Tools, then select Internet Information Services (IIS) Manager.
  2. Select your site in the tree, then the tab Features View (at the bottom).
  3. Open the feature Request filtering in the view.
  4. Select the tab Hidden Segments.
  5. Remove the entry bin (if existing).

If you would like the Dispatcher to write to a log file instead of the event log, do the following:

  1. Configure a logging location in disp_iis.ini. Open the disp_iis.ini file for editing; this file is most likely stored under C:\inetpub\scripts\disp_iis.ini.
  2. Add the following value:
    logfile=C:\inetpub\logs\dispatcher\dispatcher.log
  3. Using Windows Explorer, go to the C:\inetpub\logs directory.
  4. Create a directory named dispatcher.
  5. Right-click the dispatcher directory and select Properties in the context menu.
  6. Select the Security tab.
  7. Click Edit.
  8. Click Add.
  9. Open Administrative Tools, then select Internet Information Services (IIS) Manager.
  10. In the IIS Manager, browse down the tree and select the IIS site where you have configured the Dispatcher. On the right side of the window, click Advanced Settings.
  11. Highlight and copy the value under Application Pool (copy by pressing [Ctrl]+c).
  12. Go back to the security Add dialog.
  13. In the editable box, enter IIS AppPool\ then paste the value copied in step 11. You should have entered something like IIS AppPool\DefaultAppPool.
  14. Click Check Names (this should cause the user id to be underlined).
  15. Click OK.
  16. For the newly added user, enable all access rights except Full Control.
  17. Click OK.
Next Steps

Before you can start using the Dispatcher you must now:

Apache Web Server

Caution

Instructions for installation under both Windows and Unix are covered here.

Please be careful when selecting which to execute.

Note

Running the following configuration can lead to segmentation faults and httpd problems:

Dispatcher version: 4.0.0-3 on AIX
httpd: IBM IHS 6.0.2.15 on AIX

Apache Web Server - Installing your Apache Web Server

For information about how to install an Apache Web Server, read the installation manual, either online or in the distribution.

Caution

If you are creating an Apache binary by compiling the source files yourself, make sure that you turn on dynamic modules support. This can be done by using any of the --enable-shared options. As a minimum, include the mod_so module.

More information can be found in the Apache Web Server installation manual.

Apache Web Server - Add the Dispatcher Module

The Dispatcher comes as either:

  • Windows: a Dynamic Link Library (DLL)
  • Unix: a Dynamic Shared Object (DSO)

See Installation for information on accessing the installation archive files, in particular the specific file to select for your environment.

The installation archive files contain the following files, depending on whether you have selected Windows or Unix:

 File  Description
 disp_apache<x.y>.dll  Windows: The Dispatcher dynamic link library file.
 dispatcher-apache<x.y>-<rel-nr>.so  Unix: The Dispatcher shared object library file.
 mod_dispatcher.so  Unix: An example link.
 httpd.conf.disp<x>  An example configuration file for the Apache server.
 dispatcher.any  An example configuration file for the Dispatcher.
 README.dispatcher.apache  Readme file holding installation instructions and last-minute information. Note: Please check this file before starting the installation.
 release-notes.txt  The release notes; listing issues fixed in the current and past releases.

Use the following steps to add the Dispatcher to your Apache Web Server:

  1. Place the Dispatcher file in the appropriate Apache module directory:
    • Windows:
      Place disp_apache<x.y>.dll in <APACHE_ROOT>/modules
    • Unix:
      Locate either the <APACHE_ROOT>/libexec or <APACHE_ROOT>/modules directory according to your installation.
      Copy dispatcher-apache<options>.so into this directory.
      To simplify long-term maintenance you can also create a symbolic link named mod_dispatcher.so to the Dispatcher:
          ln -s dispatcher-apache<x>-<os>-<rel-nr>.so mod_dispatcher.so

Apache Web Server - Configure your Apache Web Server for the Dispatcher

Note

The ModMimeUsePathInfo parameter has been added as of version 4.0.9. So it should only be used and configured if you are using this version, or higher.

The Apache Web Server needs to be configured, using httpd.conf. In the Dispatcher installation kit you will find an example configuration file named httpd.conf.disp<x>.

These steps are compulsory:

  1. Navigate to <APACHE_ROOT>/conf.
  2. Open httpd.conf for editing.
  3. The following configuration entries must be added, in the order listed:
    • LoadModule to load the module on start up.
    • AddModule to enable the module. (Apache 1.3 only).
    • Dispatcher-specific configuration entries, including DispatcherConfig, DispatcherLog and DispatcherLogLevel.
    • SetHandler to activate the Dispatcher.
    • ModMimeUsePathInfo to configure behavior of mod_mime.

The following configuration steps are optional, but recommended:

  1. Change the owner of the htdocs directory:
    • The Apache server starts as root, though the child processes start as daemon (for security purposes). The DocumentRoot (<APACHE_ROOT>/htdocs) must belong to the user daemon:
          cd <APACHE_ROOT>
          chown -R daemon:daemon htdocs
  2. For Apache 1.3 only:
    Remove the Multiviews option for directories handled by the dispatcher.
    • When a file is requested and its parent directory does not yet exist, the negotiation module returns a 403 (FORBIDDEN) response before the Dispatcher has the opportunity to create the file and its parent directories. Therefore, you should disable the MultiViews option inside directories that are handled by the Dispatcher.
      To do this, remove the following lines in httpd.conf:
          Options Indexes FollowSymLinks
          Options Indexes MultiViews

LoadModule

The following table lists examples that can be used; the exact entries are according to your specific Apache Web Server:

 Windows
    ...
    LoadModule dispatcher_module modules\disp_apache.dll
    ...
 Unix (assuming symbolic link)
    ...
    LoadModule dispatcher_module libexec/mod_dispatcher.so
    ...

Note

The first parameter of each statement must be written exactly as in the above examples.

See the example configuration files provided and the Apache Web Server documentation for full details about this command.

AddModule (Apache 1.3 only)

A list of AddModule statements enables modules within the Apache Web Server.

The order of AddModule statements in the configuration file is important if there are interdependencies between the modules. It is recommended to position the AddModule statement for the Dispatcher as the last entry; for example:

AddModule disp_apache.c

Note

The first parameter of each statement must be written exactly as in the above examples.

See the example configuration files provided and the Apache Web Server documentation for full details about this command.

Dispatcher specific configuration entries

The Dispatcher-specific configuration entries are placed after the LoadModule entry. The following table lists an example configuration that is applicable for both Unix and Windows:

Windows and Unix
...
<IfModule disp_apache2.c>
  DispatcherConfig conf/dispatcher.any
  DispatcherLog    logs/dispatcher.log
  DispatcherLogLevel 3
  DispatcherNoServerHeader 0
  DispatcherDeclineRoot 0
  DispatcherUseProcessedURL 0
  DispatcherPassError 0
</IfModule>
...

The individual configuration parameters:

 Parameter  Description
 DispatcherConfig  Location and name of the configuration file.
 DispatcherLog  Location and name of the log file.
 DispatcherLogLevel  Log level for the log file:
    0 - Errors
    1 - Warnings
    2 - Infos
    3 - Debug
    Note: It is recommended to set the log level to 3 during installation and testing, then to 0 when running in a production environment.
 DispatcherNoServerHeader  Defines the Server Header to be used:
    undefined or 0 - the HTTP server header contains the CQ version.
    1 - the Apache server header is used.
 DispatcherDeclineRoot  Defines whether to decline requests to the root "/":
    0 - accept requests to /
    1 - requests to / are not handled by the Dispatcher; use mod_alias for the correct mapping.
 DispatcherUseProcessedURL  Defines whether to use pre-processed URLs:
    0 - use the original URL passed to the web server.
    1 - the Dispatcher uses the URL already processed by the handlers that precede the Dispatcher (i.e. mod_rewrite) instead of the original URL passed to the web server.
    See the Apache web site documentation for information about mod_rewrite; for example, Apache 2.2. When using mod_rewrite, it is advisable to use the flag 'passthrough|PT' (pass through to next handler) to force the rewrite engine to set the uri field of the internal request_rec structure to the value of the filename field.
 DispatcherPassError  Defines how to support 40x error codes for ErrorDocument handling:
    0 - the Dispatcher spools all error responses to the client.
    1 - the Dispatcher does not spool an error response to the client (where the status code is greater than or equal to 400), but passes the status code to Apache, which e.g. allows an ErrorDocument directive to process such a status code.

Note

Path entries are relative to the root directory of the Apache Web Server.

Note

The default settings for the Server Header are:
    ServerTokens               Full

    DispatcherNoServerHeader   0

These settings show the CQ version in the header (for statistical purposes). If you do not want this information to be available in the header, you can set:
    ServerTokens               Prod

See the Apache Documentation about ServerTokens Directive (for example, for Apache 2.2) for more information.

SetHandler

After these entries you must add a SetHandler statement to the context of your configuration (<Directory>, <Location>) for the Dispatcher to handle the incoming requests. The following example configures the Dispatcher to handle requests for the complete website:

Windows and Unix
...
<Directory />
  <IfModule disp_apache2.c>
    SetHandler dispatcher-handler
  </IfModule>

  Options FollowSymLinks
  AllowOverride None
</Directory>
...

The following example configures the Dispatcher to handle requests for a virtual domain:

Windows
...
<VirtualHost 123.45.67.89>
  ServerName www.mycompany.com
  DocumentRoot <cache-path>\docs
  <Directory <cache-path>\docs>
    <IfModule disp_apache2.c>
      SetHandler dispatcher-handler
    </IfModule>
    AllowOverride None
  </Directory>
</VirtualHost>
...
Unix:
...
<VirtualHost 123.45.67.89>
  ServerName www.mycompany.com
  DocumentRoot /usr/apachecache/docs
  <Directory /usr/apachecache/docs>
    <IfModule disp_apache2.c>
      SetHandler dispatcher-handler
    </IfModule>
    AllowOverride None
  </Directory>
</VirtualHost>
...

Note

The parameter of the SetHandler statement must be written exactly as in the above examples, as this is the name of the handler defined in the module.

See the example configuration files provided and the Apache Web Server documentation for full details about this command.

ModMimeUsePathInfo

After the SetHandler statement you should also add the ModMimeUsePathInfo definition.

Note

The ModMimeUsePathInfo parameter was added in Dispatcher version 4.0.9. Use it only with this version or higher.

The ModMimeUsePathInfo parameter should be set to On for all Apache configurations:

    ModMimeUsePathInfo On

The mod_mime module (see for example, Apache Module mod_mime) is used to assign content metadata to the content selected for an HTTP response. The default setup means that when mod_mime determines the content type, only the part of the URL that maps to a file or directory will be considered.

When On, the ModMimeUsePathInfo parameter specifies that mod_mime is to determine the content type based on the complete URL; this means that virtual resources will have metainformation applied based on their extension.

The following example activates ModMimeUsePathInfo:

Windows and Unix:

...
<Directory />
  <IfModule disp_apache2.c>
    SetHandler dispatcher-handler
    ModMimeUsePathInfo On
  </IfModule>

  Options FollowSymLinks
  AllowOverride None
</Directory>
...


Sun Java System Web Server / iPlanet

Note

Instructions for both Windows and Unix environments are covered here.

Please be careful when selecting which to execute.

Sun Java System Web Server / iPlanet - Installing your Web Server

For full information on how to install these web servers, please refer to their respective documentation:

  • Sun Java System Web Server
  • iPlanet Web Server

Sun Java System Web Server / iPlanet - Add the Dispatcher Module

The Dispatcher comes as either:

  • Windows: a Dynamic Link Library (DLL)
  • Unix: a Dynamic Shared Object (DSO)

See Installation for information on accessing the installation files; in particular the specific file to select for your environment.

The installation archive contains the following files, depending on whether you have selected Windows or Unix:

File                  Description
disp_ns.dll           Windows: The Dispatcher dynamic link library file.
dispatcher.so         Unix: The Dispatcher shared object library file.
dispatcher.so         Unix: An example link.
obj.conf.disp         An example configuration file for the iPlanet / Sun Java System web server.
dispatcher.any        An example configuration file for the Dispatcher.
README.dispatcher.ns  Readme file holding installation instructions and last-minute information. Note: Please check this file before starting the installation.
release-notes.txt     The release notes, listing issues fixed in the current and past releases.

Use the following steps to add the Dispatcher to your web server:

  1. Place the Dispatcher file in the web server's plugin directory:
    • Windows:
      Place disp_ns.dll in the directory <WEBSERVER_ROOT>/plugins.
    • Unix:
      Place dispatcher.so in the directory <WEBSERVER_ROOT>/plugins.

Sun Java System Web Server / iPlanet - Configure for the Dispatcher

The web server needs to be configured, using obj.conf. In the Dispatcher installation kit you will find an example configuration file named obj.conf.disp.

  1. Navigate to <WEBSERVER_ROOT>/config.
  2. Open obj.conf for editing.
  3. Copy the line that starts:
        Service fn="dispService"
    from obj.conf.disp to the initialization section of obj.conf.
  4. Save the changes.
  5. Open magnus.conf for editing.
  6. Copy the two lines that start:
      Init funcs="dispService, dispInit"
      and
      Init fn="dispInit"
    from obj.conf.disp to the initialization section of magnus.conf.
  7. Save the changes.

Note

Each of the following configuration entries must be written on a single line, and $(SERVER_ROOT) and $(PRODUCT_SUBDIR) must be replaced by the respective values.

Init

The following table lists examples that can be used; the exact entries depend on your specific web server:

Windows and Unix:
...
Init funcs="dispService,dispInit" fn="load-modules" shlib="$(SERVER_ROOT)/plugins/dispatcher.so"
Init fn="dispInit" config="$(PRODUCT_SUBDIR)/dispatcher.any" loglevel="1" logfile="$(PRODUCT_SUBDIR)/logs/dispatcher.log"
...
where:
Parameter  Description
config     Location and name of the configuration file dispatcher.any.
logfile    Location and name of the log file.
loglevel   Log level for messages written to the log file:
             0 Errors
             1 Warnings
             2 Infos
             3 Debug
           Note: It is recommended to set the log level to 3 during installation and testing and to 0 when running in a production environment.

Depending on your requirements you can define the Dispatcher as a service for your objects. To configure the Dispatcher for your entire website modify the default object:

Windows:
...
NameTrans fn="document-root" root="$(PRODUCT_SUBDIR)\dispcache"
...
Service fn="dispService" method="(GET|HEAD|POST)" type="*/*"
...
Unix:
...
NameTrans fn="document-root" root="$(PRODUCT_SUBDIR)/dispcache"
...
Service fn="dispService" method="(GET|HEAD|POST)" type="*/*"
...


Configuring the Dispatcher

The following sections describe how to configure various aspects of the Dispatcher.

IPv4 and IPv6

All elements of CQ (for example, CQ, CQSE, CRX, and the Dispatcher) can be installed in both IPv4 and IPv6 networks.

Operation is seamless, as no special configuration is required; when needed, you simply specify an IP address in the format appropriate to your network type.

This means that when an IP address needs to be specified you can select (as required) from:

  • an IPv6 address
    for example http://[ab12::34c5:6d7:8e90:1234]:4502
  • an IPv4 address
    for example http://123.1.1.4:4502
  • a server name
    for example, http://www.yourserver.com:4502
  • the default case of localhost will be interpreted for both IPv4 and IPv6 network installations
    for example, http://localhost:4502
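
The address formats listed above can be told apart programmatically. The following is a minimal sketch in Python (not part of the Dispatcher; the function name describe_host is hypothetical) showing that an IPv6 literal is written in square brackets inside a URL, while IPv4 addresses and server names are written as-is:

```python
from urllib.parse import urlsplit
import ipaddress


def describe_host(url):
    """Return (kind, host, port), where kind is 'IPv4', 'IPv6' or 'name'."""
    parts = urlsplit(url)
    host = parts.hostname  # the brackets around an IPv6 literal are stripped here
    try:
        kind = "IPv%d" % ipaddress.ip_address(host).version
    except ValueError:
        # Not an IP literal, so treat it as a server name (e.g. localhost).
        kind = "name"
    return kind, host, parts.port
```

For instance, describe_host("http://[ab12::34c5:6d7:8e90:1234]:4502") reports an IPv6 host on port 4502.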

Including Configuration File(s)

By default the Dispatcher configuration is stored in dispatcher.any, though you can change the name and location of this file during installation.

You can also include other files in the configuration, for example:

  • To split a large configuration file into several smaller files that are easier to manage.
  • To include files that are generated automatically.

For example, to include the file myFarm.any in the /farms configuration use the following code:

/farms
  {
  $include "myFarm.any"
  }

You can also use the asterisk ("*") as a wildcard to specify a range of files to include.

For example, if the files farm_1.any through to farm_5.any contain the configuration of farms one to five, you can include them as follows:

/farms
  {
  $include "farm_*.any"
  }

Configuration Parameters

One dispatcher can be used with multiple websites; for example:

  • intranet.myCompany.com
  • internet.myCompany.com
  • www.myFlagshipProduct.com

To configure your Dispatcher you:

  • assign a unique /name
  • configure the caching and rendering parameters for each /farm
  • configure logging

An example configuration is structured as follows:

# name of the dispatcher
/name "internet-server"

# each farm configures a set of (load-balanced) renders
/farms
  {
  # first farm entry (label is not important, just for your convenience)
  /website 
    {  
    /clientheaders
      {
      # List of headers that are passed on
      }
    /virtualhosts
      {
      # List of URLs for this Web site
      }
    /sessionmanagement 
      {
      # settings for user authentication
      }
    /renders
      {
      # List of CQ instances that render the documents
      }
    /filter
      {
      # List of filters
      }
    /cache
      {
      # Cache configuration
      /rules
        {
        # List of cacheable documents
        }
      /invalidate
        {
        # List of auto-invalidated documents
        }
      }
    /statistics
      {
      /categories
        {
        # The document categories that are used for load balancing estimates
        }
      }
    /stickyConnectionsFor "/myFolder"
    /health_check
      {
      # Page gets contacted when an instance returns a 500
      }
    /retryDelay "1"
    /numberOfRetries "5"
    /unavailablePenalty "1"
    }
  }

/name

With the /name parameter you assign a unique name (of your choice) to your Dispatcher instance.

/ignoreEINTR

Caution

This option is not usually needed. You only need to use this when you see the following log messages:

    Error while reading response: Interrupted system call

Any file-system-oriented system call can be interrupted (EINTR) if the object of the system call is located on a remote system accessed via NFS. Whether these system calls can time out or be interrupted depends on how the underlying file system was mounted on the local machine.

Use the parameter if your instance has such a configuration and you have seen the log message shown above.

Internally the dispatcher reads the response from the remote server (for example, CQ) using a loop that can be represented as:

while (response not finished) {
     read more data
}

Such messages can be generated when the EINTR occurs in the "read more data" section and are caused by the reception of a signal before any data was received.

To ignore such interrupts you can add the following parameter to dispatcher.any (before /farms):

/ignoreEINTR "1"

Setting /ignoreEINTR to "1" will instruct the dispatcher to stay in the loop for reading data until the complete response has been read. The default value is 0 and deactivates the option.
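
The effect of the option can be illustrated with a minimal Python sketch of such a read loop. This is not the Dispatcher's implementation; read_chunk is a hypothetical callable that returns the next piece of the response, returns empty bytes at the end, and raises InterruptedError when the read is interrupted:

```python
def read_response(read_chunk, ignore_eintr=False):
    """Collect chunks until read_chunk() returns b"" (end of the response)."""
    data = b""
    while True:
        try:
            chunk = read_chunk()
        except InterruptedError:
            if ignore_eintr:
                continue  # stay in the loop and try to read again
            raise  # corresponds to the "Interrupted system call" error
        if not chunk:
            return data
        data += chunk
```

With ignore_eintr left at False the interruption ends the read, which corresponds to the default value of "0"; with True the loop continues until the complete response has been read.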

/farms (Farms or Website Global Settings)

The /farms section defines a list of farms or websites; they can have any alphanumeric (a-z, 0-9) name.

Each /farms section defines:

  • A set of load-balanced renderers.
  • The IP addresses and ports of the publish instances to serve and cache content from.
  • Further characteristics including where to cache files, what to cache.

For each farm you can specify separate caching and rendering parameters, some of which have sub-parameters:

Purpose                                                Parameter               Sub-parameters
Default homepage (optional)                            /homepage
Client Headers                                         /clientheaders
Virtual Host                                           /virtualhosts
Session Management and Authentication                  /sessionmanagement      /directory (mandatory)
                                                                               /encode (optional)
                                                                               /header (optional)
                                                                               /timeout (optional)
Rendering Allocation                                   /renders
Filters                                                /filter
Forward Syndication Requests                           /propagateSyndPost
Cache                                                  /cache                  /docroot
                                                                               /statfile
                                                                               /statfileslevel
                                                                               /allowAuthorized
Cacheable Documents                                    /cache                  /rules
Autoinvalidated Files                                  /cache                  /invalidate
Flush Requests (only allowed from authorized clients)  /cache                  /allowedClients
Internal Response Statistics                           /statistics
Sticky Connection Folder                               /stickyConnectionsFor
Health Check                                           /health_check
Retry Delay                                            /retryDelay
Unavailable Penalty                                    /unavailablePenalty

The following example shows the skeleton definition for two farms named /daycom and /docdaycom:

#name of dispatcher
/name "day sites"

#farms section defines a list of farms or sites
/farms
{
/daycom
{
...
}
/docdaycom
{
...
}
}

If you use more than one render farm, the list is evaluated bottom-up. This is particularly relevant when defining Virtual Hosts for your websites.

/homepage (IIS only; optional)

Caution

This parameter applies to IIS only and has no effect on other web servers.

With Apache, use mod_rewrite instead. See the Apache web site documentation for information about mod_rewrite; for example, Apache 2.2. When using mod_rewrite, it is advisable to use the flag 'passthrough|PT' (pass through to next handler) to force the rewrite engine to set the uri field of the internal request_rec structure to the value of the filename field.

This specifies the page that the Dispatcher returns when no specific target page or file is requested.

Typically this is the page returned when a user specifies a URL such as www.myCompany.com. The /homepage parameter is required if there is no automatic redirection from the server (for example IIS) or from CQ (for example, if you shut CQ down after the content is cached). To display the index.html page in such circumstances, use the following setting:

/homepage "/index.html"

This is defined within the /farms section, for example:

#name of dispatcher
/name "day sites"

#farms section defines a list of farms or sites
/farms
{
/daycom
{
/homepage "/index.html"
...
}
/docdaycom
{
...
}
}

/clientheaders (Client Headers)

This defines a list of all HTTP headers passed from the client to the CQ instance.

By default the Dispatcher forwards the standard HTTP headers to the CQ instance. In some instances, you might want to:

  • add headers: for example, custom headers
  • remove headers: for example, authentication headers which are only relevant to the web server

If you need to customize the set of headers, you have to specify the entire set of headers to be forwarded, including those forwarded by default. Such a list might look as follows:

/clientheaders
  {
  "referer"
  "user-agent"
  "authorization"
  "from"
  "content-type"
  "content-length"
  "accept-charset"
  "accept-encoding"
  "accept-language"
  "accept"
  "host"
  "if-match"
  "if-none-match"
  "if-range"
  "if-unmodified-since"
  "max-forwards"
  "proxy-authorization"
  "proxy-connection"
  "range"
  "cookie"
  "cq-action"
  "cq-handle"
  "handle"
  "action"
  "cqstats"
  }

/virtualhosts (Virtual Hosts)

The virtual hosts section is a list of all hostname/URI combinations that the Dispatcher accepts for this website. You can also use the asterisk ("*") character as a wildcard. For example, the section:

   /virtualhosts
    {
    "www.myCompany.com"
    "www.myCompany.ch"
    "www.mySubDivision.*"
    }

handles the requests for:

  • myCompany - both the .com and the .ch domains
  • mySubDivision - all domains

To configure the Dispatcher to handle all requests:

   /virtualhosts
    {
    "*"
    }

If you use more than one render farm, the list is evaluated bottom-up.  In the following example (which shows only the relevant sections):

  • all the pages in the /products folder are sent to server 2
  • all other pages to server 1

/farms
  {
  /myProducts 
    { 
    /virtualhosts
      {
      "www.mycompany.com"
      }
    /renders
      {
      /hostname "server1.myCompany.com"
      /port "80"
      }
    }
  /myCompany 
    { 
    /virtualhosts
      {
      "www.mycompany.com/products/*"
      }
    /renders
      {
      /hostname "server2.myCompany.com"
      /port "80"
      }
    }
  }
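
The bottom-up evaluation in the example above can be sketched in Python as follows. This is an illustration only, not the Dispatcher's matching code; the function name select_render and the rule that a pattern containing "/" is compared against host plus URI (and otherwise against the host alone) are assumptions made for the sketch:

```python
from fnmatch import fnmatchcase

# (farm label, virtual host patterns, render hostname), in /farms order.
FARMS = [
    ("myProducts", ["www.mycompany.com"], "server1.myCompany.com"),
    ("myCompany", ["www.mycompany.com/products/*"], "server2.myCompany.com"),
]


def select_render(host, uri, farms=FARMS):
    """Scan the farms bottom-up and return the render of the first match."""
    for _label, patterns, render in reversed(farms):
        for pattern in patterns:
            # Assumption: patterns with "/" cover host + URI, others the host only.
            target = host + uri if "/" in pattern else host
            if fnmatchcase(target.lower(), pattern.lower()):
                return render
    return None
```

Because the last farm is checked first, a request for /products/... reaches server 2 before the more general entry for www.mycompany.com is considered.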

/sessionmanagement (Session Management and Authentication)

Caution

/allowAuthorized must be set to "0" in the /cache section in order to enable this feature.

This feature allows you to create a secure session for access to the render farm so that users:

  • need to log in before they can access any page in the farm
  • have access to all pages in the farm after logging in

/sessionmanagement is defined within /farms.

Caution

If your website has sections with different access requirements, you will need to specify multiple farms.

/sessionmanagement has several sub-parameters:

/directory (mandatory)

The directory that stores the session information. If the directory does not exist, it is created.

/encode (optional)

How the session information is encoded. Use "md5" for encryption using the md5 algorithm, or "hex" for hexadecimal encoding. If you encrypt the session data, a user with access to the file system cannot read the session contents. The default is "md5".

/header (optional)

The name of the HTTP header or cookie that stores the authorization information. If you store the information in the HTTP header, use HTTP:<header-name>. To store the information in a cookie, use Cookie:<header-name>. If you do not specify a value, HTTP:authorization is used.

/timeout (optional)

The number of seconds until the session times out after it was last used. If not specified, "800" is used, so the session times out a little over 13 minutes after the user's last request.

An example configuration looks as follows:

/sessionmanagement 
  { 
  /directory "/usr/local/apache/.sessions" 
  /encode "md5" 
  /header "HTTP:authorization" 
  /timeout "800" 
  }

/renders (Rendering Allocations)

This section defines where the Dispatcher will allocate requests to render a document.

If you use a single CQ instance for rendering, you can specify it as in the following example:

/renders
  {
    /myRenderer
      {
      # hostname or IP of the renderer
      /hostname "cq.myCompany.com"
      # port of the renderer
      /port "80"
      # connection timeout in milliseconds, "0" (default) waits indefinitely
      /timeout "0"
      }
  }

If the CQ instance runs on the same computer as the Dispatcher, you can specify it as in the following example:

/renders
  {
    /myRenderer
     {
     /hostname "127.0.0.1"
     /port "3402"
     }
  }

To distribute the workload equally among multiple CQ instances use:

/renders
  {
    /myFirstRenderer
      {
      /hostname "cq.myCompany.com"
      /port "80"
      }
    /mySecondRenderer
      {
      /hostname "127.0.0.1"
      /port "3402"
      }
  }

/filter (Filters)

Using filters, you can specify which requests are accepted by the Dispatcher module. All other requests are sent back to the server, where they are offered to the other modules that run on the web server. 

Caution

Please see the Security Checklist for further considerations when restricting access by using the Dispatcher.

If you want the Dispatcher to handle all files, use:

/filter
  {
  /0001
    {
    /glob "*"
    /type "allow"
    }
  }

Filters also allow you to exclude (deny) access to various elements; for example:

  • ASP pages
  • sensitive areas within a publish instance

Requests to an explicitly denied area result in a 404 error code (page not found) being returned.

The following filter denies access to ASP pages:

/filter
  {
  /0001
    {
    /glob "*"
    /type "allow"
    }
  /0002
    {
    /glob "*.asp *"
    /type "deny"
    }
  }

A match is needed for the entire request line, not just the URI. Therefore the above defines the match as "*.asp *", because the full request line is GET /home.asp HTTP/1.0.

You can also match other parts of the request. The following filter denies access to form data submitted by the POST method:

/filter
  {
  /0001
    {
    /glob "*"
    /type "allow"
    }
  /0002
    {
    /glob "POST *"
    /type "deny"
    }
  }

The following example shows a filter used to deny external access to the Workflow console:

...
/filter
  {
  /0001
    {
    /glob "*"
    /type "allow"
    }
  /0002
    {
    # deny access to CQ Workflow console
    /glob "* /libs/cq/workflow/content/console*"
    /type "deny"
    }
  }
...

If your publish instance uses a web application context (for example publish) this can also be added to your filter definition.

...
/filter
  {
  /0001
    {
    /glob "*"
    /type "allow"
    }
  /0003
    {
    # deny access to CQ Workflow console
    /glob "* /publish/libs/cq/workflow/content/console/archive*"
    /type "deny"
    }
  }
...

If you still need to access single pages within the restricted area, you can allow access to them. For example, to allow access to the Archive tab within the Workflow console add the following section after the previous example:

...
/0004
{
/glob "* /libs/cq/workflow/content/console/archive*"
/type "allow"
}
...

Note

When multiple /glob patterns apply to a request, the last /glob pattern that applies is effective.
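
The "last match wins" behaviour can be illustrated with a small Python sketch. It is not the Dispatcher's implementation: it uses Python's fnmatch module as a stand-in for the Dispatcher's glob matching, the function name is_allowed is hypothetical, and the starting decision of "deny" is an assumption. The rules mirror the Workflow console examples above:

```python
from fnmatch import fnmatchcase

# (glob, type) pairs in the order they appear in the /filter section.
FILTER = [
    ("*", "allow"),
    ("* /libs/cq/workflow/content/console*", "deny"),
    ("* /libs/cq/workflow/content/console/archive*", "allow"),
]


def is_allowed(request_line, rules=FILTER):
    """Apply every rule in order; the last pattern that matches decides."""
    decision = "deny"  # assumption: nothing is handled unless a rule allows it
    for glob, rule_type in rules:
        if fnmatchcase(request_line, glob):
            decision = rule_type
    return decision == "allow"
```

Note that the patterns are matched against the complete request line (method, URI and protocol), which is why each URI pattern starts with "* ".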

/propagateSyndPost (Forward Syndication Requests)

Syndication requests are usually intended for the Dispatcher only, so by default they are not sent to the CQ instance.

If necessary, you can forward syndication requests to the CQ instance by setting this parameter to "1". If set, you must make sure that POST requests are not denied in the filter section.

/cache (Cache)

The /cache section specifies some general aspects of how and where the Dispatcher caches documents.

Various sub-parameters can be defined:

/docroot

This path points to the document root of the web server. This is where the Dispatcher stores the cached documents, and this is where the web server looks for them. If you use multiple render farms, you have to define a different document root on the web server for each farm, and specify the corresponding path here.

/statfile

This path points to the statfile, which the Dispatcher uses to keep track of the last content update. This can be any file on the web server. The file itself is empty; only the timestamp is used.

/statfileslevel

Creates statfiles in all folders down to the level you specify. Use this if you only want to invalidate sections of the cache, not the entire cache. For a default structure with language folders (for example, the English site under /content/myWebsite/en), use /statfileslevel "2" to invalidate per language. This means that when the Dispatcher invalidates the English part of the website, other sections are not affected (for example, the German and French sections). If you use this parameter, do not specify the /statfile parameter.

/allowAuthorized

By default, requests that carry an authentication header are not cached. This is because authentication is not checked when a document is returned from the cache - so the document would be displayed for a user who does not have the necessary rights. However, in some setups it can be permissible to cache authenticated documents.

Note

/allowAuthorized must be set to "0" in order to enable the /sessionmanagement feature.

An example cache section might look as follows:

/cache
  {
  /docroot "/opt/dispatcher/cache"
  /statfile  "/tmp/dispatcher-website.stat"          
  /allowAuthorized "0"
      
  /rules
    {
    # List of files that are cached
    }

  /invalidate
    {
    # List of files that are auto-invalidated
    }
  }

/rules (within /cache) (Cacheable Documents)

The list of cacheable documents determines which documents are cached, though the Dispatcher never caches a document in the following circumstances:

  • If the HTTP method is not GET.
    Other common methods are POST for form data and HEAD for the HTTP header.
  • If the request URI contains a question mark ("?").
    This usually indicates a dynamic page, such as a search result that does not need to be cached.
  • If the file extension is missing.
    The web server needs the extension to determine the document type (the MIME-type).
  • If the authentication header is set (this can be configured with /allowAuthorized).
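
The four exclusions listed above can be summarized in a short Python sketch. This is an illustration, not the Dispatcher's code; the function name may_be_cached and its parameters are hypothetical, and the /rules section still has to allow a document that passes these checks:

```python
import posixpath


def may_be_cached(method, uri, has_auth_header, allow_authorized=False):
    """Return False if any of the four exclusions applies to the request."""
    if method != "GET":
        return False  # POST, HEAD and other methods are never cached
    if "?" in uri:
        return False  # a query string usually indicates a dynamic page
    if not posixpath.splitext(uri)[1]:
        return False  # no file extension, so no MIME type can be derived
    if has_auth_header and not allow_authorized:
        return False  # authenticated requests are skipped unless allowed
    return True
```

For example, a plain GET request for /content/en.html passes all checks, whereas /content/search.html?q=cq does not.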

If you do not have dynamic pages (beyond those already excluded by the above rules), you can let the Dispatcher cache everything. The rules section for this looks as follows:

/rules
  {
  /0000
    {
    /glob "*"
    /type "allow"
    }
  }

If there are some sections of your page that are dynamic (for example a news application) or within a closed user group, you can define exceptions:

Note

Closed user groups must not be cached as user rights are not checked for cached pages.

/rules
  {
  /0000
    {
    /glob "*"
    /type "allow"
    }
  /0001
    {
    /glob "/en/news/*"
    /type "deny"
    }
  /0002
    {
    /glob "*/private/*"
    /type "deny"
    }
  }

Compression (Apache 1.3 only)

On Apache 1.3 web servers you can also compress the cached documents. This allows Apache to return the document in a compressed form if so requested by the client.

Note

Currently only the gzip format is supported.

Only applicable for Apache 1.3.

The following rule caches all documents in compressed form; Apache can return either the uncompressed or the compressed form to the client:

/rules
  {
  /rulelabel
    {
    /glob "*"
    /type "allow"
    /compress "gzip"
    }
  }

/invalidate (within /cache) (Autoinvalidated Files)

This defines a list of all documents that are automatically rendered invalid after a content update.

With auto-invalidation the Dispatcher does not physically delete the pages after a content update, but checks them for validity when they are next requested. Documents in the cache that are not auto-invalidated will remain in the cache until they are deleted by a content update.

Auto-invalidate is typically used for HTML pages. As these often contain navigation elements and links to other pages, it is very hard to determine whether a page is affected by a content update. To stay on the safe side, you usually auto-invalidate all HTML pages. An example configuration for this looks as follows:

  /invalidate
  {
  /0000
    {
    /glob "*"
    /type "deny"
    }
  /0001
    {
    /glob "*.html"
    /type "allow"
    }
  }

With this configuration, when /content/geometrixx/en.html is replicated:

  • all the files with pattern en.* are removed from the /content/geometrixx/ folder
  • the /content/geometrixx/en/jcr_content folder is removed
  • all the other files that match the /invalidate configuration are not deleted: they will be deleted with the next request. In our example /content/geometrixx.html is not deleted, it will be deleted when /content/geometrixx.html is requested.

 

If you offer automatically generated PDF and ZIP files for download, you might have to auto-invalidate these as well. An example configuration for this looks as follows:

/invalidate
  {
  /0000
    {
    /glob "*"
    /type "deny"
    }
  /0001
    {
    /glob "*.html"
    /type "allow"
    }
  /0002
    {
    /glob "*.zip"
    /type "allow"
    }
  /0003
    {
    /glob "*.pdf"
    /type "allow"
    }
  }

/allowedClients (in /cache) (Flush Requests only allowed from authorized Clients)

An ACL defines specific clients that are allowed to flush the cache. The globbing patterns are matched against the IP address of the client.

The following example:

  1. denies access to any client
  2. explicitly allows access to the localhost

/allowedClients
  {
  /0001
    {
    /glob "*.*.*.*"
    /type "deny"
    }
  /0002
    {
    /glob "127.0.0.1"
    /type "allow"
    }
  }

Caution

It is recommended that you define the /allowedClients.

If this is not done, any client can issue a call to clear the cache; if this is done repeatedly it can severely impact the site performance.

/statistics (Internal Response Statistics)

The /statistics section defines the categories that the Dispatcher uses for load balancing estimates.

Note

If you do not use load balancing, you can omit this section.

The Dispatcher keeps a list of typical response times per CQ instance and per request category. When the Dispatcher needs to render a document, it:

  • checks these lists to determine which CQ instance currently has the most resources available to handle it
  • reserves these resources when considering future requests.

The statistics are internal and cannot be accessed from outside. By default, the Dispatcher makes a distinction between HTML documents and everything else. The default /statistics section looks as follows:

/statistics
  {
  /categories
    {
    /html
      {
      /glob "*.html"
      }
    /others
      {
      /glob "*"
      }
    }
  }

If you want to add a special category for search pages, the section would look as follows:

Note

The more specific requests must be stated first.

/statistics
  {
  /categories
    {
    /search
      {
      /glob "*search.html"
      }
    /html
      {
      /glob "*.html"
      }
    /others
      {
      /glob "*"
      }
    }
  }
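
Why the more specific category has to come first can be seen in the following Python sketch. It assumes that a request is assigned to the first category whose pattern matches, which is an inference from the note above rather than a documented algorithm; the function name categorize is hypothetical and fnmatch stands in for the Dispatcher's glob matching:

```python
from fnmatch import fnmatchcase

# (category name, glob) pairs in the order they appear in /categories.
CATEGORIES = [
    ("search", "*search.html"),
    ("html", "*.html"),
    ("others", "*"),
]


def categorize(uri, categories=CATEGORIES):
    """Return the name of the first category whose glob matches the URI."""
    for name, glob in categories:
        if fnmatchcase(uri, glob):
            return name
    return None
```

If /html were listed before /search, a request for a search page would already match "*.html" and the /search category would never be used.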

/stickyConnectionsFor (Sticky Connection Folder)

You can define one folder that contains sticky documents; this will be accessed using the URL.

The Dispatcher sends all requests from a single user for documents in this folder to the same render instance. This ensures that session data is present and consistent for all documents. This mechanism uses the renderid cookie. To define sticky connections (which will also set the cookie as necessary) for the folder /myFolder, use the following line:

/stickyConnectionsFor "/myFolder"

/health_check (Health Check with URL Probing)

When a 500 status code occurs the specified "health check" page is checked. If this page also returns a 500 status code the instance is considered to be unavailable and a configurable time penalty is applied before retrying.

/health_check
{
# Page gets contacted when an instance returns a 500
/url "/health_check.html"
}

/retryDelay (Retry Delay)

/retryDelay "1"

retryDelay is the intermediate sleep time (defined in seconds) that is applied after one render was unresponsive and before contacting the next one in the list of "renders".

"1" is the default value used if not explicitly defined. The default should be applicable in most cases, so this setting should hardly ever need changing.

/numberOfRetries (Number of Retries)

/numberOfRetries "5"

numberOfRetries is the number of times that the dispatcher should try to reach any render from the farm before returning a failure. This number could be greater than the number of renders.

"5" is the default value used if not explicitly defined.

/unavailablePenalty (Unavailable Penalty)

/unavailablePenalty "1"

unavailablePenalty is the number of seconds that is applied to the Dispatcher statistics after a render is unresponsive. In the Dispatcher statistics the total response time for this render is then increased by the appropriate number of seconds, which in turn moves it to the end of the list of renders as they are sorted by increasing response time.

This might happen, for example, when the TCP/IP connection to the designated hostname/port cannot be established, either because CQ5 is not running (and not listening) or because of a network-related problem.
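
The effect of the penalty on the ordering of renders can be sketched in Python as follows. The Dispatcher's statistics are internal, so the dictionary of total response times, the render names, and the function apply_penalty are all hypothetical and serve only to illustrate the description above:

```python
def apply_penalty(total_response_time, render, penalty_seconds=1):
    """Add the penalty to one render and return the render names, fastest first."""
    total_response_time[render] += penalty_seconds
    # Renders are sorted by increasing response time, so a penalized render
    # moves towards the end of the list.
    return sorted(total_response_time, key=total_response_time.get)
```

With two renders at 0.2 and 0.5 seconds, penalizing the faster one by 1 second raises it to 1.2 seconds, so the other render is preferred for subsequent requests.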

Logging

In the web server configuration, you can set:

  • The location of the Dispatcher log file.
  • The log level.

Refer to the web server documentation and the readme file of your Dispatcher instance for more information.

Apache Rotated / Piped Logs

If using an Apache web server you can use the standard functionality for rotated and/or piped logs. For example, using piped logs:

    DispatcherLog "| /usr/apache/bin/rotatelogs logs/dispatcher.log%Y%m%d 604800"

This will automatically rotate:

  • the dispatcher log file; with a timestamp in the extension (logs/dispatcher.log%Y%m%d).
  • on a weekly basis (60 x 60 x 24 x 7 = 604800 seconds).
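
The two arguments passed to rotatelogs can be checked with a few lines of Python. This sketch only reproduces the arithmetic and the strftime-style expansion of the file name; it does not model when rotatelogs actually switches files, and the function name rotated_name is hypothetical:

```python
from datetime import datetime

ROTATION_SECONDS = 60 * 60 * 24 * 7  # one week, as passed to rotatelogs


def rotated_name(when, template="logs/dispatcher.log%Y%m%d"):
    """Expand the strftime-style template for the given point in time."""
    return when.strftime(template)
```

For a log period starting on 30 May 2002, the template expands to logs/dispatcher.log20020530.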

Please see the Apache web server documentation on Log Rotation and Piped Logs; for example Apache 2.2.

Note

Upon installation the default log level is high (i.e. level 3 = Debug), so that the Dispatcher logs all errors and warnings. This is very useful in the initial stages.

However, this requires additional resources, so when the Dispatcher is working smoothly according to your requirements, you should lower the log level.

Confirm Basic Operation

To confirm basic operation and interaction of the web server, dispatcher and CQ instance you can use the following steps:

  1. Set the loglevel to 3.
  2. Start the web server; this also starts the Dispatcher.
  3. Start the CQ instance.
  4. Check the log and error files for your web server and the Dispatcher.
    Depending on your web server you should see messages such as:
           [Thu May 30 05:16:36 2002] [notice] Apache/2.0.50 (Unix) configured
    and:
           [Fri Jan 19 17:22:16 2001] [I] [19096] Dispatcher initialized (build XXXX)
  5. Surf the website via the web server. Confirm that content is being shown as required.
    For example, on a local installation where CQ runs on port 4502 and the web server on port 80, access the Websites console using both:
        http://localhost:4502/libs/wcm/core/content/siteadmin.html
        http://localhost:80/libs/wcm/core/content/siteadmin.html
    The results should be identical. Confirm access to other pages with the same mechanism.
  6. Check the contents of the cache directory - is it being filled?
  7. Activate a page - check that the cache is being flushed correctly.
  8. If everything is operating correctly you can reduce the loglevel to 0 again.
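
The log check in step 4 can be automated with a small script. The following is a minimal sketch, assuming the Dispatcher log contains a line of the form shown above; the function name is hypothetical and the regular expression is derived only from that sample line.

```python
import re

# Matches Dispatcher log lines such as the one shown in step 4.
INIT_PATTERN = re.compile(r"Dispatcher initialized \(build (?P<build>[^)]*)\)")


def find_dispatcher_init(log_lines):
    """Return the build string of the first 'Dispatcher initialized' entry, or None."""
    for line in log_lines:
        match = INIT_PATTERN.search(line)
        if match:
            return match.group("build")
    return None


# The two sample messages from step 4.
sample = [
    "[Thu May 30 05:16:36 2002] [notice] Apache/2.0.50 (Unix) configured",
    "[Fri Jan 19 17:22:16 2001] [I] [19096] Dispatcher initialized (build XXXX)",
]
```

In practice you would pass the lines of your own Dispatcher log file instead of the sample list.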

Integration with CQ

If the Dispatcher is being used with CQ the interaction must be configured to ensure clean cache management.

Dependent on your environment, the configuration selected can also increase performance.

Disabling External Access to Specific Folders

Note

Since Dispatcher 4.0.11 it is no longer necessary to set DispatcherUseProcessedURL 1 (in httpd.conf when using with Apache).

The filter section will always be applied to the unprocessed, sanitized URL, as received by the web server. 

When configuring the Dispatcher you should restrict external access as far as possible; for example, so that only the following are available to external visitors:

  • /content
  • miscellaneous content such as designs and client libraries; for example:
    • /etc/designs/default*
    • /etc/designs/mydesign*

The following /filter section (dispatcher.any) can be used as a basis in your Dispatcher configuration file; the recommended principle is to deny access to everything, then allow access to specific (limited) elements:

# only handle the requests in the following acl. default is 'none'
# the glob pattern is matched against the first request line
/filter
{
# deny everything and allow specific entries
/0001 { /type "deny" /glob "*" }

# open consoles
# /0011 { /type "allow" /glob "* /admin/*" } # allow servlet engine admin
# /0012 { /type "allow" /glob "* /crx/*" } # allow content repository
# /0013 { /type "allow" /glob "* /system/*" } # allow OSGi console

# allow non-public content directories
# /0021 { /type "allow" /glob "* /apps/*" } # allow apps access
# /0022 { /type "allow" /glob "* /bin/*" }
/0023 { /type "allow" /glob "* /content*" } # disable this rule to allow mapped content only

# /0024 { /type "allow" /glob "* /libs/*" }
# /0025 { /type "deny" /glob "* /libs/shindig/proxy*" } # if you enable /libs close access to proxy

# /0026 { /type "allow" /glob "* /home/*" }
# /0027 { /type "allow" /glob "* /tmp/*" }
# /0028 { /type "allow" /glob "* /var/*" }

# enable specific mime types in non-public content directories
/0041 { /type "allow" /glob "* *.css *" } # enable css
/0042 { /type "allow" /glob "* *.gif *" } # enable gifs
/0043 { /type "allow" /glob "* *.ico *" } # enable icos
/0044 { /type "allow" /glob "* *.js *" } # enable javascript
/0045 { /type "allow" /glob "* *.png *" } # enable png
/0046 { /type "allow" /glob "* *.swf *" } # enable flash

# enable features
/0061 { /type "allow" /glob "POST /content/[.]*.form.html" } # allow POSTs to form selectors under content
/0062 { /type "allow" /glob "* /libs/cq/personalization/*" } # enable personalization

# deny content grabbing
/0081 { /type "deny" /glob "GET *.infinity.json*" }
/0082 { /type "deny" /glob "GET *.tidy.json*" }
/0083 { /type "deny" /glob "GET *.sysview.xml*" }
/0084 { /type "deny" /glob "GET *.docview.json*" }
/0085 { /type "deny" /glob "GET *.docview.xml*" }
/0086 { /type "deny" /glob "GET *.*[0-9].json*" }
# /0087 { /type "allow" /glob "GET *.1.json*" } # allow one-level json requests

# deny query
/0090 { /type "deny" /glob "* *.query.json*" }
}

Note

The configuration above is based on the default configuration file delivered with the Dispatcher.

It is intended as an example for use on a production environment.

Items prefixed with # are deactivated (commented out). Take care if you decide to activate any of these (by removing the # on that line), as this can have a security impact.
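
To see how a deny-everything rule combines with later allow and deny rules, the evaluation can be approximated in Python. This is a sketch under two assumptions that are not stated in this section: that the last matching rule decides, and that Python's fnmatch globbing is close enough to the Dispatcher's glob syntax for these patterns. The rule list is a subset of the /filter section above and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the /filter rules, in order: (type, glob).
RULES = [
    ("deny", "*"),
    ("allow", "* /content*"),
    ("allow", "* *.css *"),
    ("deny", "GET *.infinity.json*"),
    ("deny", "GET *.*[0-9].json*"),
]


def is_allowed(request_line, rules=RULES):
    """Check every rule in order; the last rule that matches decides (assumption)."""
    decision = "deny"  # assumed outcome when no rule matches
    for rule_type, glob in rules:
        # The glob is matched against the complete first request line.
        if fnmatchcase(request_line, glob):
            decision = rule_type
    return decision == "allow"
```

For example, `is_allowed("GET /content/geometrixx/en.html HTTP/1.1")` is true in this model, whereas the same page requested with the `.infinity.json` extension is rejected by the later deny rule.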

Consider the following recommendations if you do choose to extend access:

  • External access to /admin should always be completely disabled.
  • Care must be taken when allowing access to files in /libs. Access should be allowed on an individual basis.
  • Deny access to the replication configuration so it cannot be seen:
    • /etc/replication.xml*
    • /etc/replication.infinity.json*
  • Deny access to the Google Gadgets reverse proxy:
    • /libs/shindig/proxy*

Depending on how restrictive your filter configuration is, you may also need to explicitly allow each individual vanity URL; in the following example, /my/vanity/url:

# enable vanity URLs
/0071 { /type "allow" /glob "* /my/vanity/url *" } # enable individual vanity url (the glob is matched against the whole request line)

Depending on your installation, there might be additional resources under /libs, /apps or elsewhere, that must be made available. You can use the access.log file as one method of determining resources that are being accessed externally.

Caution

Access to consoles and directories can present a security risk for production environments. Unless you have explicit justifications they should remain deactivated (commented out).

Caution

If you are using reports in a publish environment you should configure the Dispatcher so that access to /etc/reports is not possible for external visitors.

Invalidating Dispatcher Cache from the Authoring Environment

A replication agent is used to send a cache invalidation request from the CQ author environment to the Dispatcher when a page is published; this removes the old page content from the cache when new content is published.

To set up a CQ authoring environment, so that it invalidates the cache upon publication of a page:

  1. Open the CQ Tools console.
  2. Open the required replication agent; for example the Dispatcher Flush agent that is included in a standard installation.
  3. In the Settings tab ensure that Enabled is active.
  4. Open the Transport tab and enter the URI needed to access the dispatcher.
    If you are using the standard Dispatcher Flush agent you will probably need to update the hostname and port; for example, http://<dispatcherHost>:<portApache>/dispatcher/invalidate.cache
  5. Configure other parameters as required.
  6. Click OK to activate the agent.

Note

The agent for flushing dispatcher cache does not have to have a user name and password, but if configured they will be sent with basic authentication.

There are two potential issues with this approach:

  • The Dispatcher must be reachable from the authoring instance. If your network (e.g. the firewall) is configured such that access between the two is restricted, this may not be the case.
  • Publication and cache invalidation take place at the same time. Depending on the timing, a user may request a page just after it was removed from the cache, and just before the new page is published. CQ now returns the old page, and the Dispatcher caches it again. This is more of an issue for large sites.
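
For testing connectivity between the authoring instance and the Dispatcher, it can help to build an invalidation request by hand. The following Python sketch is an illustration only: the CQ-Action and CQ-Handle header names are assumptions that are not described in this section, and the function names, host, port and page path are hypothetical placeholders.

```python
import urllib.request


def invalidation_request(dispatcher_host, port, handle):
    """Return the URL and headers of a cache invalidation request for one page."""
    url = f"http://{dispatcher_host}:{port}/dispatcher/invalidate.cache"
    headers = {
        "CQ-Action": "Activate",  # assumed header: the replication action
        "CQ-Handle": handle,      # assumed header: path of the published page
        "Content-Length": "0",
    }
    return url, headers


def send_invalidation(url, headers):
    """Send the request to the Dispatcher and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Building the request does not contact the server; only send_invalidation does.
url, headers = invalidation_request("localhost", 80, "/content/geometrixx/en")
```

If the authoring instance cannot reach the Dispatcher (the first issue above), `send_invalidation` raises an error instead of returning a status code.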

Setting up CQ User Accounts

Various default accounts are included within the CQ installation. Those related to the Dispatcher must be changed, configured or deleted as appropriate.

For more information see Default Users and Groups.

Invalidating Dispatcher Cache from a Publishing Instance

Under certain circumstances performance gains can be made by transferring cache management from the authoring environment to a publishing instance. It will then be the publishing environment (not the CQ authoring environment) that sends a cache invalidation request to the Dispatcher when a published page is received.

Such circumstances include:

Note

The decision to use this method should be made by an experienced CQ administrator.

The dispatcher flush is controlled by a replication agent operating on the publish instance. However, the configuration is made on the authoring environment and then transferred by activating the agent:

  1. Open the CQ Tools console.
  2. Open the required replication agent; for example the Dispatcher Flush agent under Agents on Publish that is included in a standard installation.
  3. In the Settings tab ensure that Enabled is active.
  4. Open the Transport tab and enter the URI needed to access the dispatcher.
    If you are using the standard Dispatcher Flush agent you will probably need to update the hostname and port; for example, http://<dispatcherHost>:<portApache>/dispatcher/invalidate.cache
  5. Configure other parameters as required.
  6. Repeat for every publish instance affected.
  7. If you now activate a page from author to publish, you can see that this agent initiates a standard replication. The log will include entries indicating requests coming from your publish server; for example:
        <publishserver> 13:29:47 127.0.0.1 POST /dispatcher/invalidate.cache 200

Using Multiple Dispatchers

In complex setups, you may use multiple Dispatchers. For example, you may use:

  • one Dispatcher to publish a website on the Intranet
  • a second Dispatcher, under a different address and with different security settings, to publish the same content on the Internet.

In such a case, make sure that each request goes through only one Dispatcher. A Dispatcher does not handle requests that come from another Dispatcher. Therefore, make sure that both Dispatchers access the CQ website directly.

Deny Access with Dispatcher Configuration

You can use the Dispatcher to seal off sensitive areas. If a dispatcher is in front of a publish instance, you can define a filter that refuses all requests to specified sensitive areas. Requests to the sensitive area then result in a 404 error code (page not found).

See how to define filters in the dispatcher.any file.

Optimizing a Website for Cache Performance

The Dispatcher offers a number of built-in mechanisms that you can use to optimize performance if your website takes advantage of them. This section tells you how to design your web site to maximize the benefits of caching.

Note

It may help you to remember that the Dispatcher stores the cache on a standard web server. This means that you:

  • can cache everything that you can store as a page and request using a URL
  • cannot store other things, such as HTTP headers, cookies, session data and form data.

In general, a lot of caching strategies involve selecting good URLs and not relying on this additional data.

HTTP Headers

HTTP headers are not cached, which may be an issue if you store the encoding in them. Taken from the cache, the page will have the default encoding for the web server. There are two ways of avoiding this problem:

  • If you use only one encoding, make sure that the encoding you use on the web server is the same as the default encoding of the CQ website.
  • Use a <META> tag in the HTML header to set the encoding, such as:
        <META http-equiv="Content-Type" content="text/html; charset=EUC-JP">

Avoid URL Parameters

If possible, avoid URL parameters for pages that you want to cache. For example, if you have a picture gallery, the following URL is never cached:

    www.myCompany.com/pictures/gallery.html?event=christmas&page=1

However, you can put these parameters into the page URL, as follows:

    www.myCompany.com/pictures/gallery.christmas.1.html

Note

This URL calls the same page and the same template as gallery.html. In the template definition, you can specify which script renders the page, or you can use the same script for all pages.
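
The rewriting of the gallery URL can be sketched in Python. This is a minimal illustration of the naming scheme only, assuming that the parameter values are used as selectors in the order in which they appear; the function name is hypothetical.

```python
import posixpath
from urllib.parse import parse_qsl


def selectors_from_query(url):
    """Move the query parameter values into the page name as selectors."""
    path, _, query = url.partition("?")
    base, extension = posixpath.splitext(path)
    # Keep only the values, in the order given in the query string.
    values = [value for _, value in parse_qsl(query)]
    return ".".join([base] + values) + extension


cacheable = selectors_from_query(
    "www.myCompany.com/pictures/gallery.html?event=christmas&page=1"
)
```

A URL without parameters is returned unchanged.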

Customize by URL

If you allow users to change the font size (or do any other customization on the layout), make sure that the different layouts are reflected in the URL as well.

If you store them in a cookie (or any comparable mechanism) then the version which will be cached cannot be predicted.

As a result, the Dispatcher will return documents, of any font size, at random.

Making the font size part of the URL specification avoids this problem:

www.myCompany.com/news/main.large.html

Note

For most layout aspects, it is also possible to use style sheets and/or client side scripts. These will usually work very well with caching.

This is also useful for a print version, where you can use a URL such as:

    www.myCompany.com/news/main.print.html

Using the script globbing of the template definition, you can specify a separate script that renders the print pages.
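
How a layout variant is read from such a URL can be sketched in Python. This is not the template definition syntax of CQ; it only illustrates splitting a file name such as main.print.html into its parts. The function names and the "default" fallback are hypothetical.

```python
def split_page_name(filename):
    """Split 'main.print.html' into (name, selectors, extension)."""
    parts = filename.split(".")
    if len(parts) < 2:
        # No extension at all: nothing to split.
        return filename, [], ""
    return parts[0], parts[1:-1], parts[-1]


def layout_variant(filename, default="default"):
    """Return the first selector as the layout variant, or the default."""
    _, selectors, _ = split_page_name(filename)
    return selectors[0] if selectors else default
```

Because the variant is part of the file name, each variant is stored as its own file in the cache.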

Picture Titles

If you render page titles, or other text, as pictures, then it is recommended to store the files so that they are deleted upon a content update on the page:

  1. Place the picture file in the same folder as the page itself.
  2. Use the naming format: <same name as the filename of the page>.<picture name>.

For example, you can store the title of the page myPage.html in the file myPage.title.gif. This file is automatically deleted if the page is updated, so any change to the page title is automatically reflected in the cache.

Note

The file does not have to physically exist on the CQ instance. You can use a script that dynamically creates and outputs the picture. The Dispatcher then stores the picture on the web server.
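
The naming convention can be illustrated with a simplified Python model of the cache folder. It assumes, as described above, that a content update removes every cached file that shares the page's name; the function name and the list of cached files are hypothetical.

```python
def files_flushed_with(page_filename, cached_files):
    """Cached files that share the page's name (the part before the first dot)."""
    handle = page_filename.split(".")[0]
    return [name for name in cached_files if name.split(".")[0] == handle]


# Hypothetical contents of one cache folder.
cache = ["myPage.html", "myPage.title.gif", "otherPage.html", "logo.gif"]

flushed = files_flushed_with("myPage.html", cache)
```

In this model an update of myPage.html removes myPage.title.gif as well, while logo.gif, which does not follow the naming format, stays in the cache.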

Picture Navigation

If you use pictures for the navigation entries, the method is basically the same as with titles, just slightly more complex. The trick is to store all the navigation images with the target pages. If you use two pictures for normal and active, you can use the following scripts:

  • A script that displays the page, as normal.
  • A script that processes ".normal" requests and returns the normal picture.
  • A script that processes ".active" requests and returns the activated picture.

It is important that you create these pictures with the same naming handle as the page, to ensure that a content update deletes these pictures as well as the page.

For pages that are not modified, the pictures still remain in the cache, although the pages themselves are usually auto-invalidated.

Personalization

The Dispatcher cannot cache personalized data, so it is recommended that you limit personalization to where it is necessary. To illustrate why:

  • If you use a freely customizable start page, that page has to be composed every time a user requests it.
  • If, in contrast, you offer a choice of 10 different start pages, you can cache each one of them, thus improving performance.

Note

If you personalize each page (for example by putting the user's name into the title bar) you cannot cache it, which can cause a major performance impact.

However, if you have to do this, you can:

 

  • use iFrames to split the page into one part that is the same for all users and one part that is the same for all pages of the user. You can then cache both of these parts.
  • use client-side JavaScript to display personalized information. However, you have to make sure that the page still displays correctly if a user turns JavaScript off.

Sticky Connections

Sticky connections ensure that the documents for one user are all composed on the same server. This impacts load balancing if you use personalized pages and session data. Define one folder to hold all documents that require sticky connections for the website, and try not to have other documents in it. If a user leaves this folder and later returns to it, the connection still sticks.

MIME Types

There are two ways in which a browser can determine the type of a file:

  1. By its extension (e.g. .html, .gif, .jpg, etc)
  2. By the MIME-type that the server sends with the file.

For most files, the MIME-type is implied by the file extension; for example:

  • files ending in .html have the MIME-type "text/html"
  • files ending in .jpg have the MIME-type "image/jpeg".

If the file has no extension, it is displayed as plain text.

The MIME-type is part of the HTTP header, and as such, the Dispatcher does not cache it. If your CQ application returns files that do not have a recognized file ending, but rely on the MIME-type instead, these files may be incorrectly displayed.

To make sure that files are cached properly, follow these guidelines:

  • Make sure that files always have the proper extension.
  • Avoid generic file serve scripts, which have URLs such as download.jsp?file=2214. Re-write the script to use URLs containing the file specification; for the previous example this would be download.2214.pdf.
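
The relationship between extension and MIME-type can be checked with Python's standard mimetypes module. This sketch only shows what an extension-based lookup returns; it does not reproduce how your web server is configured, and the function name is hypothetical.

```python
import mimetypes


def mime_type_for(filename):
    """MIME type implied by the file extension, or None if it cannot be determined."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type
```

With this lookup `download.2214.pdf` resolves to a PDF type, whereas a file name without an extension, such as `download`, yields no type at all.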

Troubleshooting

Note

Please check the Dispatcher Knowledge Base for further information.

Check the Basic Configuration

As always the first steps are to check the basics:

  • Confirm Basic Operation
  • Check all log files for your web server and dispatcher. If necessary increase the loglevel used for the dispatcher logging.
  • Check your configuration:
    • Do you have multiple Dispatchers? 
      • Have you determined which Dispatcher is handling the website / page you are investigating?
    • Have you implemented filters?
      • Are these impacting the matter you are investigating?

IIS Diagnostic Tools

IIS provides various trace tools, dependent on the actual version:

  • IIS 6 - IIS diagnostic tools can be downloaded and configured
  • IIS 7 - tracing is fully integrated

These can help you monitor activity.

IIS and 404 Not Found

When using IIS you might experience 404 Not Found being returned in various scenarios. If so, see the following Knowledge Base articles.

You should also check that the dispatcher cache root and the IIS document root are set to the same directory.

Problems Deleting Workflow Models

Symptoms

Problems trying to delete workflow models when accessing a CQ author instance through the Dispatcher.

Steps to reproduce:

  1. Log in to your CQ author instance (confirm that requests are being routed through the dispatcher).
  2. Create a new workflow; for example, with the Title set to workflowToDelete.
  3. Confirm that the workflow was successfully created.
  4. Select and right click on the workflow, then click Delete.
  5. Click Yes to confirm.
  6. An error message box will appear showing:
          "ERROR 'Could not delete workflow model!!".

Resolution

Add the following headers to the /clientheaders section of your dispatcher.any file:

  • x-http-method-override
  • x-requested-with

/clientheaders
  {
  ...
  "x-http-method-override"
  "x-requested-with"
  }
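
Why the missing entries matter can be sketched in Python, assuming that only the headers listed in /clientheaders are passed on to CQ and that header names are compared case-insensitively. The function name and the sample request are hypothetical.

```python
# Headers named in the resolution above; a real list contains further entries.
ALLOWED = {"x-http-method-override", "x-requested-with"}


def forwarded_headers(request_headers, allowed=ALLOWED):
    """Keep only the headers whose lower-cased name is in the allow list."""
    return {
        name: value
        for name, value in request_headers.items()
        if name.lower() in allowed
    }


# Hypothetical request headers sent by the browser.
incoming = {"X-HTTP-Method-Override": "DELETE", "X-Custom": "1"}
passed_on = forwarded_headers(incoming)
```

With an empty allow list neither header would reach CQ in this model, which is consistent with the delete request failing until the two entries are added.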

Interference with mod_dir (Apache)

This describes how the dispatcher interacts with mod_dir inside the Apache webserver, as this can lead to various, potentially unexpected effects:

Apache 1.3

In Apache 1.3 mod_dir handles every request where the URL maps to a directory in the file system.

It will either:

  • redirect the request to an existing index.html file 
  • generate a directory listing

When the dispatcher is enabled, it processes such requests by registering itself as a handler for the content type httpd/unix-directory.

Apache 2.x

In Apache 2.x things are different. A module can handle different stages of the request, such as URL fixup. mod_dir handles this stage by redirecting a request (when the URL maps to a directory) to the URL with a / appended.

The dispatcher does not intercept the mod_dir fixup, but it will completely handle the request to the redirected URL (i.e. with / appended). This might pose a problem if the remote server (e.g. CQ5) handles requests to /a_path differently to requests to /a_path/ (when /a_path maps to an existing directory).

If this happens you must either:

  • disable mod_dir for the Directory or Location subtree handled by the dispatcher
  • use DirectorySlash Off to configure mod_dir not to append /


COMMENTS

  • By Anonymous - 12:17 PM on Mar 16, 2012
    Hi ,


    Here is my question.

    I am storing all “ *.png” files under the “/content/dam/geometrixx/PN/f/” path.
    When I published the “1234.png” file from author stage to publish and dispatcher stages, the file is published to publish stage and to dispatcher at the location
    < /opt/apache2.2/htdocs//content/dam/geometrixx/PN/f/>
    I am trying to prevent end users from accessing the .png files, using the syntax below in the “dispatcher.any” file, but this filter condition does not seem to work.

    /001
    {
    /glob "*.png*"
    /type "deny"

    }

    I am still able to access the file using the URL below. What could be the issue, and how can I restrict certain file formats?
    http://localhost:8080/content/dam/geometrixx/PN/f/1234.png
    • By Alexandre COLLIGNON - 11:09 AM on Mar 29, 2012
      Hi,

      Your rule is correct and applied by the dispatcher. However, the dispatcher checks the rules of the dispatcher.any file only if the resource is not in the cache folder (the DocumentRoot of your webserver).

      So you have to either remove the png files of your cache or simply clean your whole cache folder.

      Hope it's helpful

      Alex
