Using Dispatcher with Multiple Domains

You are reading the Adobe Experience Manager 5.6.1 version of Using Dispatcher with Multiple Domains.

Note

Dispatcher versions are independent of AEM; however, the Dispatcher documentation is embedded in the AEM documentation. Always use the Dispatcher documentation that is embedded in the documentation for the latest version of AEM.

Use Dispatcher to process page requests in multiple web domains while supporting the following conditions:

  • Web content for both domains is stored in a single AEM repository. 
  • The files in the Dispatcher cache can be invalidated separately for each domain. 

For example, a company publishes websites for two of its brands: Brand A and Brand B. The content for the website pages is authored in AEM and stored in the same repository workspace:

/
| - content
     | - sitea
     |    | - content nodes
     | - siteb
          | - content nodes

Pages for BrandA.com are stored below /content/sitea. Client requests for the URL http://BrandA.com/en.html return the rendered page for the /content/sitea/en node. Similarly, pages for BrandB.com are stored below /content/siteb.

When using Dispatcher to cache content, associations must be made between the page URL in the client HTTP request, the path of the corresponding file in the cache, and the path of the corresponding file in the repository.
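The three-way association described above can be sketched in Java for the Brand A and Brand B example. This is only an illustration: the DomainPaths class and its methods are hypothetical and are not part of AEM or Dispatcher, the document root is the one used in the example environment later on this page, and the handling of the file extension is deliberately simplified.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

/* Hypothetical helper: relates a page URL to a cache file and a repository node. */
class DomainPaths {
    /* Assumed example values: one site root per domain, and the Apache document root. */
    static final Map<String, String> SITE_ROOTS = Map.of(
            "branda.com", "/content/sitea",
            "brandb.com", "/content/siteb");
    static final String DOCROOT = "/usr/lib/apache/httpd-2.4.3/htdocs";

    /* Site root of the domain followed by the URL path, e.g. /content/sitea/en.html */
    static String contentPath(String url) {
        URI uri = URI.create(url);
        String siteRoot = SITE_ROOTS.get(uri.getHost().toLowerCase(Locale.ROOT));
        if (siteRoot == null) {
            throw new IllegalArgumentException("Unknown domain: " + uri.getHost());
        }
        return siteRoot + uri.getPath();
    }

    /* Path of the cached file below the document root. */
    static String cacheFile(String url) {
        return DOCROOT + contentPath(url);
    }

    /* Path of the repository node: the content path without selectors and extension. */
    static String repositoryNode(String url) {
        String path = contentPath(url);
        int dot = path.indexOf('.', path.lastIndexOf('/'));
        return dot < 0 ? path : path.substring(0, dot);
    }

    public static void main(String[] args) {
        String url = "http://BrandA.com/en.html";
        System.out.println(cacheFile(url));
        System.out.println(repositoryNode(url));
    }
}
```

For http://BrandA.com/en.html, the sketch yields a cache file below htdocs/content/sitea and the repository node /content/sitea/en, which is the relationship that the remaining configuration on this page has to establish.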

Client requests

When clients send HTTP requests to the web server, the URL of the requested page must resolve to the content in the Dispatcher cache, and eventually to the content in the repository.

  1. The domain name system discovers the IP address of the web server which is registered for the domain name in the HTTP request.
  2. The HTTP request is sent to the web server. 
  3. The HTTP request is passed to Dispatcher. 
  4. Dispatcher determines whether the cached files are valid. If valid, the cached files are served to the client.
  5. If cached files are not valid, Dispatcher requests newly-rendered pages from the AEM publish instance.

Cache Invalidation

When Dispatcher Flush replication agents request that Dispatcher invalidate cached files, the path of the content in the repository must resolve to the content in the cache.

  1. A page is activated on the AEM author instance and the content is replicated to the publish instance.
  2. The Dispatcher Flush Agent calls Dispatcher to invalidate the cache for the replicated content.
  3. Dispatcher touches one or more .stat files to invalidate the cached files.

To use Dispatcher with multiple domains, you need to configure AEM, Dispatcher, and your web server. The solutions described on this page are general and apply to most environments. Due to the complexity of some AEM topologies, your solution can require further custom configurations to resolve particular issues. You will likely need to adapt the examples to satisfy your existing IT infrastructure and management policies.

URL Mapping

To enable domain URLs and content paths to resolve to cached files, at some point in the process a file path or page URL must be translated. Descriptions of the following common strategies are provided, where path or URL translations occur at different points in the process:

  • (Recommended) The AEM publish instance uses Sling mapping for resource resolution to implement internal URL rewriting rules. Domain URLs are translated to content repository paths. (See AEM Rewrites Incoming URLs.)
  • The web server uses internal URL rewriting rules that translate domain URLs to cache paths. (See The Web Server Rewrites Incoming URLs.)

It is generally desirable to use short URLs for web pages. Typically, page URLs mirror the structure of the repository folders that contain the web content. However, the URLs do not reveal the topmost repository nodes, such as /content. The client is not necessarily aware of the structure of the AEM repository. 

General Requirements

Your environment must implement the following configurations to support Dispatcher working with multiple domains: 

  • Content for each domain resides in separate branches of the repository (see the example environment below).
  • The Dispatcher Flush replication agent is configured on the AEM publish instance. (See Invalidating Dispatcher Cache from a Publishing Instance.)
  • The domain name system resolves the domain names to the IP address of the web server.
  • The Dispatcher cache mirrors the directory structure of the AEM content repository. The file paths below the document root of the web server are the same as the paths of the files in the repository.

Environment for the Provided Examples

The example solutions that are provided apply to an environment with the following characteristics:

  • The AEM author and publish instances are deployed on Linux systems.
  • Apache HTTPD is the web server, deployed on a Linux system. 
  • The AEM content repository and the document root of the web server use the following file structures (the document root of the Apache web server is /usr/lib/apache/httpd-2.4.3/htdocs):

    Repository

    | - /content
         | - sitea
         |    | - content nodes
         | - siteb
              | - content nodes

    Document root of the web server

    | - /usr
         | - lib
             | - apache
                 | - httpd-2.4.3
                     | - htdocs
                         | - content
                             | - sitea
                             |   | - content nodes
                             | - siteb
                                 | - content nodes

AEM Rewrites Incoming URLs

Sling mapping for resource resolution enables you to associate incoming URLs with AEM content paths. Create mappings on the AEM publish instance so that render requests from Dispatcher resolve to the correct content in the repository.

Dispatcher requests for page rendering identify the page using the URL that it is passed from the web server. When the URL includes a domain name, Sling mappings resolve the URL to the content. The following graphic illustrates a mapping of the branda.com/en.html URL to the /content/sitea/en node.

[Figure: the branda.com/en.html URL mapped to the /content/sitea/en node]

The Dispatcher cache mirrors the repository node structure. Therefore, when page activations occur, the resulting requests for invalidating the cached page require no URL or path translations.


Define virtual hosts on the web server

Define virtual hosts on the web server so that a different document root can be assigned to each web domain:

  • The web server must define a virtual host for each of your web domains.
  • For each domain, configure the document root to coincide with the folder in the repository that contains the domain's web content.
  • Each virtual host must also include Dispatcher-related configurations, as described on the Installing Dispatcher page.

The following example httpd.conf file configures two virtual domains for an Apache web server:

  • The server names (which coincide with the domain names) are branda.com (line 16) and brandb.com (line 30).
  • The document root of each virtual domain is the directory in the Dispatcher cache that contains the site's pages. (lines 17 and 31)

With this configuration, the web server performs the following actions when it receives a request for http://branda.com/en/products.html:

  • Associates the URL with the virtual host that has a ServerName of branda.com.
  • Forwards the URL to Dispatcher.
httpd.conf
# load the Dispatcher module
LoadModule dispatcher_module modules/mod_dispatcher.so
# configure the Dispatcher module
<IfModule disp_apache2.c>
	DispatcherConfig conf/dispatcher.any
	DispatcherLog    logs/dispatcher.log 	
	DispatcherLogLevel 3
	DispatcherNoServerHeader 0	
	DispatcherDeclineRoot 0
	DispatcherUseProcessedURL 0
	DispatcherPassError 0
</IfModule>

# Define virtual host for brandA.com
<VirtualHost *:80>
  ServerName branda.com
  DocumentRoot /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea
   <Directory /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea>
     <IfModule disp_apache2.c>
       SetHandler dispatcher-handler
       ModMimeUsePathInfo On
     </IfModule>
     Options FollowSymLinks
     AllowOverride None
   </Directory>
</VirtualHost>

# define virtual host for brandB.com
<VirtualHost *:80>
  ServerName brandB.com
  DocumentRoot /usr/lib/apache/httpd-2.4.3/htdocs/content/siteb
   <Directory /usr/lib/apache/httpd-2.4.3/htdocs/content/siteb>
     <IfModule disp_apache2.c>
       SetHandler dispatcher-handler
       ModMimeUsePathInfo On
     </IfModule>
     Options FollowSymLinks
     AllowOverride None
   </Directory>
</VirtualHost>

# document root for web server
DocumentRoot "/usr/lib/apache/httpd-2.4.3/htdocs"
        

Configure Dispatcher to Handle Multiple Domains

To support URLs that include domain names and their corresponding virtual hosts, define the following Dispatcher farms:

  • Configure a Dispatcher farm for each virtual host. These farms process requests from the web server for each domain, check for cached files, and request pages from the renders.
  • Configure a Dispatcher farm that is used for invalidating content in the cache, regardless of which domain the content belongs to. This farm handles file invalidation requests from Dispatcher Flush replication agents.

Create Dispatcher farms for virtual hosts

Farms for virtual hosts must have the following configurations so that the URLs in client HTTP requests are resolved to the correct files in the Dispatcher cache:

  • The /virtualhosts property is set to the domain name. This property enables Dispatcher to associate the farm with the domain.
  • The /filter property allows access to the part of the request URL that follows the domain name. For example, for the http://branda.com/en.html URL, the path is interpreted as /en.html, so the filter must allow access to this path.
  • The /docroot property is set to the path of the root directory of the domain's site content in the Dispatcher cache. Dispatcher prepends this path to the path of the request URL. For example, the docroot of /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea causes the request for http://branda.com/en.html to resolve to the /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea/en.html file.

Additionally, the AEM publish instance must be designated as the render for the virtual host. Configure other farm properties as required. The following code is an abbreviated farm configuration for the branda.com domain:

/farm_sitea  {     
    ...
    /virtualhosts { "branda.com" }
    /renders {
      /rend01  { /hostname "127.0.0.1"  /port "4503" }
    }
    /filter {
      /0001 { /type "deny"  /glob "*" }
      /0023 { /type "allow" /glob "*/en*" }  
      ...
     }
    /cache {
      /docroot "/usr/lib/apache/httpd-2.4.3/htdocs/content/sitea"
      ...
   }
   ...
}
        

Create a Dispatcher farm for cache invalidation

A Dispatcher farm is required for handling requests for invalidating cached files. This farm must be able to access .stat files in the docroot directories of each virtual host.

The following property configurations enable Dispatcher to resolve files in the AEM content repository from files in the cache:

  • The /docroot property is set to the default docroot of the web server. Typically, this is the directory where the /content folder is created. An example value for Apache on Linux is /usr/lib/apache/httpd-2.4.3/htdocs.
  • The /filter property allows access to files below the /content directory.

The /statfileslevel property must be high enough so that .stat files are created in the root directory of each virtual host. This property enables the cache of each domain to be invalidated separately. For the example setup, a /statfileslevel value of 2 creates .stat files in the docroot/content/sitea directory and the docroot/content/siteb directory.

Additionally, the publish instance must be designated as the render for the virtual host. Configure other farm properties as required. The following code is an abbreviated configuration for the farm that is used for invalidating the cache:

/farm_flush {  
    ...
    /virtualhosts   { "invalidation_only" }
    /renders  {
      /rend01  { /hostname "127.0.0.1" /port "4503" }
    }
    /filter   {
      /0001 { /type "deny"  /glob "*" }
      /0023 { /type "allow" /glob "*/content*" } 
      ...
      }
    /cache  {
       /docroot "/usr/lib/apache/httpd-2.4.3/htdocs"
       /statfileslevel "2"
       ...
   }
   ...
}
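The effect of the /statfileslevel value can be illustrated with a simplified Java model. The StatFileLevels class is hypothetical, and the model assumes that invalidating a page touches the .stat files in the folders between the docroot and the configured level along the path of that page; the actual Dispatcher behaviour can differ in detail, so treat this as a sketch and not as a description of the implementation.

```java
import java.util.ArrayList;
import java.util.List;

/* Hypothetical model: which cache folders hold the .stat files affected by an invalidation. */
class StatFileLevels {
    /*
     * docroot: the /docroot of the invalidation farm.
     * handle: repository path of the activated page, starting with "/".
     * statfileslevel: the /statfileslevel value of the farm.
     */
    static List<String> touchedStatDirs(String docroot, String handle, int statfileslevel) {
        String[] segments = handle.substring(1).split("/");
        /* The page itself is a file, so only its parent folders count as levels. */
        int folderDepth = segments.length - 1;
        int depth = Math.min(statfileslevel, folderDepth);

        List<String> dirs = new ArrayList<>();
        StringBuilder dir = new StringBuilder(docroot);
        dirs.add(dir.toString()); /* level 0: the docroot itself */
        for (int i = 0; i < depth; i++) {
            dir.append('/').append(segments[i]);
            dirs.add(dir.toString());
        }
        return dirs;
    }

    public static void main(String[] args) {
        String docroot = "/usr/lib/apache/httpd-2.4.3/htdocs";
        for (String d : touchedStatDirs(docroot, "/content/sitea/en", 2)) {
            System.out.println(d + "/.stat");
        }
    }
}
```

In this model, activating /content/sitea/en with a level of 2 reaches the content/sitea folder but never the content/siteb folder, which is why the two domains can be invalidated separately.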
        

When you start the web server, the Dispatcher log (in debug mode) indicates the initialization of all farms:

Dispatcher initializing (build 4.1.2)
[Fri Nov 02 16:27:18 2012] [D] [24974(140006182991616)] farms[farm_sitea].cache.docroot = /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea
[Fri Nov 02 16:27:18 2012] [D] [24974(140006182991616)] farms[farm_siteb].cache.docroot = /usr/lib/apache/httpd-2.4.3/htdocs/content/siteb
[Fri Nov 02 16:27:18 2012] [D] [24974(140006182991616)] farms[farm_flush].cache.docroot = /usr/lib/apache/httpd-2.4.3/htdocs
[Fri Nov 02 16:27:18 2012] [I] [24974(140006182991616)] Dispatcher initialized (build 4.1.2)
        

Configure Sling Mapping for Resource Resolution

Use Sling mapping for resource resolution so that domain-based URLs resolve to content on the AEM publish instance. The resource mapping translates the incoming URLs from Dispatcher (originally from client HTTP requests) to content nodes.

To learn about Sling resource mapping, see Mappings for Resource Resolution in the Sling documentation. 

Typically, mappings are required for the following resources, although additional mappings can be necessary:

  • The root node of the content page (below /content)
  • The design node that the pages use (below /etc/designs)
  • The /libs folder

After you create the mapping for the content page, use a web browser to open a page on the web server to discover additional required mappings. In the error.log file of the publish instance, locate messages about resources that are not found. The following example message indicates that a mapping for /etc/clientlibs is required:

01.11.2012 15:59:24.601 *INFO* [10.36.34.243 [1351799964599] GET /etc/clientlibs/foundation/jquery.js HTTP/1.1] org.apache.sling.engine.impl.SlingRequestProcessorImpl service: Resource /content/sitea/etc/clientlibs/foundation/jquery.js not found
        

Note

The linkchecker transformer of the default Apache Sling rewriter automatically modifies hyperlinks in the page to prevent broken links. However, link rewriting is performed only when the link target is an HTML or HTM file. To update links to other file types, create a transformer component and add it to an HTML rewriter pipeline. (See Rewriting Links to Non-HTML Files.)

Example resource mapping nodes

The following table lists the nodes that implement resource mapping for the branda.com domain. Similar nodes are created for the brandb.com domain, such as /etc/map/http/brandb.com. In all cases, mappings are required when references in the page HTML do not resolve correctly in the context of Sling.

Node path                                  Type           Property (name, type, value)
/etc/map/http/branda.com                   sling:Mapping  sling:internalRedirect, String, /content/sitea
/etc/map/http/branda.com/libs              sling:Mapping  sling:internalRedirect, String, /libs
/etc/map/http/branda.com/etc               sling:Mapping  (none)
/etc/map/http/branda.com/etc/designs       sling:Mapping  sling:internalRedirect, String, /etc/designs
/etc/map/http/branda.com/etc/clientlibs    sling:Mapping  sling:internalRedirect, String, /etc/clientlibs
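To see how the mapping nodes for branda.com interact, the following Java sketch models them as a longest-prefix lookup. This is a simplification for illustration only: Sling evaluates the /etc/map tree with its own matching rules, and the MappingSketch class and its resolve method are hypothetical names, not Sling API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/* Hypothetical model of the sling:internalRedirect entries listed for branda.com. */
class MappingSketch {
    static final Map<String, String> REDIRECTS = new LinkedHashMap<>();
    static {
        REDIRECTS.put("branda.com", "/content/sitea");
        REDIRECTS.put("branda.com/libs", "/libs");
        REDIRECTS.put("branda.com/etc/designs", "/etc/designs");
        REDIRECTS.put("branda.com/etc/clientlibs", "/etc/clientlibs");
    }

    /* Picks the longest matching entry and replaces that prefix with its redirect target. */
    static String resolve(String host, String path) {
        String request = host.toLowerCase(Locale.ROOT) + path;
        String best = null;
        for (String prefix : REDIRECTS.keySet()) {
            boolean matches = request.equals(prefix) || request.startsWith(prefix + "/");
            if (matches && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        if (best == null) {
            return path; /* no mapping for this host: leave the path unchanged */
        }
        return REDIRECTS.get(best) + request.substring(best.length());
    }

    public static void main(String[] args) {
        System.out.println(resolve("branda.com", "/en.html"));
        System.out.println(resolve("branda.com", "/etc/clientlibs/foundation/jquery.js"));
    }
}
```

Without the entry for /etc/clientlibs, the second request in this model would fall through to the site root and become /content/sitea/etc/clientlibs/foundation/jquery.js, which matches the "not found" message shown earlier in the error.log example.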

Configuring the Dispatcher Flush replication agent

The Dispatcher Flush replication agent on the AEM publish instance must send invalidation requests to the correct Dispatcher farm. To target a farm, use the URI property of the Dispatcher Flush replication agent (on the Transport tab). Include the value of the /virtualhosts property for the Dispatcher farm that is configured for invalidating the cache:

http://webserver_name:port/virtual_host/dispatcher/invalidate.cache

For example, to use the farm_flush farm of the previous example, the URI is http://localhost:80/invalidation_only/dispatcher/invalidate.cache.
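The URI pattern can be assembled as in the following Java sketch, which also builds (but does not send) an invalidation request for testing a farm by hand. The FlushRequestSketch class is hypothetical. The CQ-Action and CQ-Handle headers and the POST method are assumptions based on the general format of Dispatcher invalidation requests and are not described on this page, so verify them against the documentation for your Dispatcher version.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/* Hypothetical helper for composing a flush URI and a matching test request. */
class FlushRequestSketch {
    /* Follows the pattern http://webserver_name:port/virtual_host/dispatcher/invalidate.cache */
    static String flushUri(String webServer, int port, String virtualHost) {
        return "http://" + webServer + ":" + port + "/" + virtualHost
                + "/dispatcher/invalidate.cache";
    }

    /* Assumption: the content path to invalidate is passed in the CQ-Handle header. */
    static HttpRequest invalidationRequest(String flushUri, String contentPath) {
        return HttpRequest.newBuilder(URI.create(flushUri))
                .header("CQ-Action", "Activate")
                .header("CQ-Handle", contentPath)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        String uri = flushUri("localhost", 80, "invalidation_only");
        HttpRequest request = invalidationRequest(uri, "/content/sitea/en");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```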


The Web Server Rewrites Incoming URLs

Use the internal URL rewriting feature of your web server to translate domain-based URLs to file paths in the Dispatcher cache. For example, client requests for the http://brandA.com/en.html page are translated to the content/sitea/en.html file in the document root of the web server.


The Dispatcher cache mirrors the repository node structure. Therefore, when page activations occur, the resulting requests for invalidating the cached page require no URL or path translations.


Define virtual hosts and rewrite rules on the Web server

Configure the following aspects on the web server:

  • Define a virtual host for each of your web domains.
  • For each domain, configure the document root to coincide with the folder in the repository that contains the domain's web content.
  • For each virtual host, create a URL rewrite rule that translates the incoming URL to the path of the cached file.
  • Each virtual host must also include Dispatcher-related configurations, as described on the Installing Dispatcher page.
  • The Dispatcher module must be configured to use the URL that the web server has rewritten. (See the DispatcherUseProcessedURL property in Installing Dispatcher.)

The following example httpd.conf file configures two virtual hosts for an Apache web server:

  • The server names (which coincide with the domain names) are brandA.com (line 16) and brandB.com (line 32).
  • The document root of each virtual host is the directory in the Dispatcher cache that contains the site's pages. (lines 17 and 33)
  • The URL rewrite rule for each virtual host is a regular expression that prefixes the path of the requested page with the path to the pages in the cache. (lines 19 and 35)
  • The DispatcherUseProcessedURL property is set to 1. (line 10)

For example, the web server performs the following actions when it receives a request with the http://brandA.com/en/products.html URL:

  • Associates the URL with the virtual host that has a ServerName of brandA.com.
  • Rewrites the URL to be /content/sitea/en/products.html.
  • Forwards the URL to Dispatcher.
httpd.conf
# load the Dispatcher module
LoadModule dispatcher_module modules/mod_dispatcher.so
# configure the Dispatcher module
<IfModule disp_apache2.c>
	DispatcherConfig conf/dispatcher.any
	DispatcherLog    logs/dispatcher.log 	
	DispatcherLogLevel 3
	DispatcherNoServerHeader 0	
	DispatcherDeclineRoot 0
	DispatcherUseProcessedURL 1
	DispatcherPassError 0
</IfModule>

# Define virtual host for brandA.com
<VirtualHost *:80>
  ServerName branda.com
  DocumentRoot /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea
  RewriteEngine  on
  RewriteRule    ^/(.*)\.html$  /content/sitea/$1.html [PT]
   <Directory /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea>
     <IfModule disp_apache2.c>
       SetHandler dispatcher-handler
       ModMimeUsePathInfo On
     </IfModule>
     Options FollowSymLinks
     AllowOverride None
   </Directory>
</VirtualHost>

# define virtual host for brandB.com
<VirtualHost *:80>
  ServerName brandB.com
  DocumentRoot /usr/lib/apache/httpd-2.4.3/htdocs/content/siteb
  RewriteEngine  on
  RewriteRule    ^/(.*)\.html$  /content/siteb/$1.html [PT]
   <Directory /usr/lib/apache/httpd-2.4.3/htdocs/content/siteb>
     <IfModule disp_apache2.c>
       SetHandler dispatcher-handler
       ModMimeUsePathInfo On
     </IfModule>
     Options FollowSymLinks
     AllowOverride None
   </Directory>
</VirtualHost>

# document root for web server
DocumentRoot "/usr/lib/apache/httpd-2.4.3/htdocs"
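The effect of the RewriteRule directives in the listing above can be reproduced with the same regular expression in Java. This sketch only mimics the pattern and substitution for the branda.com virtual host; the RewriteSketch class is a hypothetical name, and mod_rewrite features such as the PT flag are not modelled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/* Hypothetical stand-in for: RewriteRule ^/(.*)\.html$ /content/sitea/$1.html [PT] */
class RewriteSketch {
    static final Pattern HTML_PAGE = Pattern.compile("^/(.*)\\.html$");

    /* Prefixes the path of an .html request with the site root; other paths are unchanged. */
    static String rewrite(String urlPath, String siteRoot) {
        Matcher matcher = HTML_PAGE.matcher(urlPath);
        if (!matcher.matches()) {
            return urlPath;
        }
        return siteRoot + "/" + matcher.group(1) + ".html";
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/en/products.html", "/content/sitea"));
        System.out.println(rewrite("/en/products.navimage.png", "/content/sitea"));
    }
}
```

Because the pattern matches only paths that end in .html, requests for other file types pass through with their original path, which is one reason why references to non-HTML files can need separate handling.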
        

Configure a Dispatcher Farm

When the web server rewrites URLs, Dispatcher requires a single farm defined according to Configuring Dispatcher. The following configurations are required to support the web server virtual hosts and URL rewrite rules:

  • The /virtualhosts property must include the ServerName values for all of the VirtualHost definitions.
  • The /statfileslevel property must be high enough to create .stat files in the directories that contain the content files for each domain.

The following example configuration file is based on the example dispatcher.any file that is installed with Dispatcher. The following changes are required to support the web server configurations of the previous httpd.conf file:

  • The /virtualhosts property causes Dispatcher to handle requests for the brandA.com and brandB.com domains. (line 12)
  • The /statfileslevel property is set to 2, so that stat files are created in each directory that contains the domain's web content (line 41): /statfileslevel "2"

As usual, the document root of the cache is the same as the document root of the web server (line 40): /usr/lib/apache/httpd-2.4.3/htdocs

dispatcher.any
/name "testDispatcher"
/farms
  {
  /dispfarm0
    {  
    /clientheaders
      {
      "*"
      }      
    /virtualhosts
      {
      "brandA.com" "brandB.com"
      }
    /renders
      {
      /rend01    {  /hostname "127.0.0.1"   /port "4503"  }
      }
    /filter
      {
      /0001 { /type "deny"  /glob "*" }
      /0023 { /type "allow" /glob "*/content*" }  # disable this rule to allow mapped content only
      /0041 { /type "allow" /glob "* *.css *"   }  # enable css
      /0042 { /type "allow" /glob "* *.gif *"   }  # enable gifs
      /0043 { /type "allow" /glob "* *.ico *"   }  # enable icos
      /0044 { /type "allow" /glob "* *.js *"    }  # enable javascript
      /0045 { /type "allow" /glob "* *.png *"   }  # enable png
      /0046 { /type "allow" /glob "* *.swf *"   }  # enable flash
      /0061 { /type "allow" /glob "POST /content/[.]*.form.html" }  # allow POSTs to form selectors under content
      /0062 { /type "allow" /glob "* /libs/cq/personalization/*"  }  # enable personalization
      /0081 { /type "deny"  /glob "GET *.infinity.json*" }
      /0082 { /type "deny"  /glob "GET *.tidy.json*"     }
      /0083 { /type "deny"  /glob "GET *.sysview.xml*"   }
      /0084 { /type "deny"  /glob "GET *.docview.json*"  }
      /0085 { /type "deny"  /glob "GET *.docview.xml*"  }      
      /0086 { /type "deny"  /glob "GET *.*[0-9].json*" }
      /0090 { /type "deny"  /glob "* *.query.json*" }
      }
    /cache
      {
      /docroot "/usr/lib/apache/httpd-2.4.3/htdocs"
      /statfileslevel "2"
      /allowAuthorized "0"
      /rules
        {
        /0000  { /glob "*"     /type "allow"  }
        }
      /invalidate
        {
        /0000  {   /glob "*" /type "deny"  }
        /0001 {  /glob "*.html" /type "allow"  }
        }
      /allowedClients
        {
        }     
      }
    /statistics
      {
      /categories
        {
        /html  { /glob "*.html" }
        /others  {  /glob "*"  }
        }
      }
    }
  }
        

Note

Because a single Dispatcher farm is defined, the Dispatcher Flush replication agent on the AEM publish instance requires no special configurations.

Rewriting Links to Non-HTML Files

To rewrite references to files that have extensions other than .html or .htm, create a Sling rewriter transformer component and add it to the default rewriter pipeline. 

Rewrite references when resource paths do not resolve correctly in the web server context. For example, a transformer is required when image-generating components create links such as /content/sitea/en/products.navimage.png. The topnav component described in How to Create a Fully Featured Internet Website creates such links.

The Sling rewriter is a module that post-processes Sling output. SAX pipeline implementations of the rewriter consist of a generator, zero or more transformers, and a serializer:

  • Generator: Parses the Sling output stream (HTML document) and generates SAX events when it encounters specific element types. 
  • Transformer: Listens for SAX events and consequently modifies the event target (an HTML element). A rewriter pipeline contains zero or more transformers. Transformers are executed in sequence, passing the SAX events to the next transformer in the sequence.
  • Serializer: Serializes the output, including the modifications from each transformer.

The AEM Default Rewriter Pipeline

AEM uses a default rewriter pipeline that processes documents of type text/html:

  • The generator parses HTML documents and generates SAX events when it encounters a, img, area, form, base, link, script, and body elements. The generator alias is htmlparser.
  • The pipeline includes the following transformers: linkchecker, mobile, mobiledebug, contentsync. The linkchecker transformer externalizes paths to referenced HTML or HTM files to prevent broken links.
  • The serializer writes the HTML output. The serializer alias is htmlwriter.

The /libs/cq/config/rewriter/default node defines the pipeline.

Creating a Transformer

Perform the following tasks to create a transformer component and use it in a pipeline:

  1. Implement the org.apache.sling.rewriter.TransformerFactory interface. This class creates instances of your transformer class. Specify a value for the pipeline.type property (the transformer alias) and configure the class as an OSGi service component.
  2. Implement the org.apache.sling.rewriter.Transformer interface. To minimize the work, you can extend the org.apache.cocoon.xml.sax.AbstractSAXPipe class. Override the startElement method to customize the rewriting behavior. This method is called for every SAX event that is passed to the transformer.
  3. Bundle and deploy the classes.
  4. Add a configuration node to your AEM application to add the transformer to the pipeline.

Tip: You can instead configure the TransformerFactory so that the transformer is inserted into every rewriter that is defined. In that case, you do not need to configure a pipeline:

  • Set the pipeline.mode property to global.
  • Set the service.ranking property to a positive integer.
  • Do not include a pipeline.type property.

Note

Use the multimodule archetype of the Content Package Maven Plugin to create your Maven project. The POMs automatically create and install a content package.

The following examples implement a transformer that rewrites references to image files. 

  • The MyRewriterTransformerFactory class instantiates MyRewriterTransformer objects. The pipeline.type property sets the transformer alias to mytransformer. To include the transformer in a pipeline, add this alias to the list of transformers in the pipeline configuration node.
  • The MyRewriterTransformer class overrides the startElement method of the AbstractSAXPipe class. The startElement method rewrites the value of src attributes for img elements.

The examples are not robust and should not be used in a production environment.

Example TransformerFactory implementation
package com.adobe.example;

import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Service;
import org.apache.felix.scr.annotations.Property;

import org.apache.sling.rewriter.Transformer;
import org.apache.sling.rewriter.TransformerFactory;

@Component
@Service
public class MyRewriterTransformerFactory implements TransformerFactory {
    /* Define the alias */
    @Property(value="mytransformer")
    static final String PIPELINE_TYPE ="pipeline.type";
 
    public Transformer createTransformer() {
        
        return new MyRewriterTransformer ();
    }
}
        
Example Transformer implementation
package com.adobe.example;

import java.io.IOException;

import org.apache.cocoon.xml.sax.AbstractSAXPipe;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.rewriter.ProcessingComponentConfiguration;
import org.apache.sling.rewriter.ProcessingContext;
import org.apache.sling.rewriter.Transformer;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

import javax.servlet.http.HttpServletRequest;

public class MyRewriterTransformer extends AbstractSAXPipe implements Transformer {

	private static final Logger log = LoggerFactory.getLogger(MyRewriterTransformer.class);
	private SlingHttpServletRequest httpRequest;	
	/* The element and attribute to act on  */
	private static final String ATT_NAME = "src";
	private static final String EL_NAME = "img";

	public MyRewriterTransformer () {
	}
	public void dispose() {
	}
	public void init(ProcessingContext context, ProcessingComponentConfiguration config) throws IOException {
		this.httpRequest = context.getRequest();
		log.debug("Transforming request {}.", httpRequest.getRequestURI());
	}
	@Override
	public void startElement (String nsUri, String localname, String qname, Attributes atts) throws SAXException {
		/* copy the element attributes */
		AttributesImpl linkAtts = new AttributesImpl(atts); 
		/* Only interested in EL_NAME elements */
		if(EL_NAME.equalsIgnoreCase(localname)){


			/* iterate through the attributes of the element and act only on ATT_NAME attributes */
			for (int i=0; i < linkAtts.getLength(); i++) {
				if (ATT_NAME.equalsIgnoreCase(linkAtts.getLocalName(i))) {
					String path_in_link = linkAtts.getValue(i);

					/* use the resource resolver of the http request to reverse-resolve the path  */
					String mappedPath = httpRequest.getResourceResolver().map(httpRequest, path_in_link);

					log.info("Transformed {} to {}.", path_in_link, mappedPath);

					/* update the attribute value */
					linkAtts.setValue(i,mappedPath);
				}
			}

		}
		/* return updated attributes to super and continue with the transformer chain */
		super.startElement(nsUri, localname, qname, linkAtts);
	}
}
        

Adding the Transformer to a Rewriter Pipeline

Create a JCR node that defines a pipeline that uses your transformer. The following node definition creates a pipeline that processes text/html files. The default AEM generator (htmlparser) and serializer (htmlwriter) for HTML are used.

Note

If you set the pipeline.mode property of the TransformerFactory to global, you do not need to configure a pipeline. The global mode inserts the transformer into all pipelines.

Rewriter configuration node - XML representation
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    jcr:primaryType="nt:unstructured"
    contentTypes="[text/html]"
    enabled="{Boolean}true"
    generatorType="htmlparser"
    order="5"
    serializerType="htmlwriter"
    transformerTypes="[mytransformer]">
</jcr:root>
        

The following graphic shows the CRXDE Lite representation of the node:

[Figure: CRXDE Lite representation of the rewriter configuration node]


COMMENTS

  • By Yogesh - 11:56 PM on Feb 25, 2013   Reply
    For Multiple virtual host to work you need NameVirtual host entry in your configuration. Something like NameVirtualHost *:80

    I would also suggest to use different rewrite files for each virtual host settings,

    <VirtualHost *:80>
    ServerName Site1
    Include Server_1_rewrite
    ....
    </VirtualHost>

    <VirtualHost *:80>
    ServerName Site2
    Include Server_2_rewrite
    ....
    </VirtualHost>

    Yogesh

    • By ppiegaze - 3:22 PM on Feb 27, 2013   Reply
      Thanks for the suggestion. An issue has been logged to include this in the documentation
    • By Vineet Kumar - 3:23 AM on Sep 19, 2013   Reply
      I tried to follow these steps but invalidation doesn't work as expected. Even it I am using expression like http://webserver_name:port/virtual_host/dispatcher/invalidate.cache, it always triggers the last farm in the .any file. Can you please help with this?
      • By alvawb - 11:43 AM on Sep 19, 2013   Reply
        Please post your issue including any information about what you're trying to achieve and any valuable messages from error logs to our user forum at http://help-forums.adobe.com/content/adobeforums/en/experience-manager-forum/adobe-experience-manager.html You'll get a quicker response and reach a wider audience.
