Backup and Restore

You are reading the CRX 2.3 version of Backup and Restore.
This documentation is also available for the following versions: CRX 2.2, CRX 2.1, and CRX 2.0 (for CQ 5.3).

There are two ways to back up and restore CRX repository content: 

  • You can create an external backup of the repository and store it in a safe location. If the repository breaks down, you can restore it to the previous state.
  • You can create internal versions of the repository content. These versions are stored in the repository along with the content, so you can quickly restore nodes and trees you have changed or deleted.

General

The approach described here applies to system backup and recovery.

If you only need to back up or recover a small amount of lost content, a full system recovery is not necessarily required. Instead, you can either:

  • fetch the data from another system via a package, or
  • restore the backup on a temporary system, create a content package, and deploy it on the system where the content is missing.

For details, see Package Backup below.

Timing

Do not run a backup in parallel with data store garbage collection, as this might compromise the results of both processes.

It is generally recommended to run the Tar Optimizer after the backup has completed.

Offline Backup

You can always perform an offline backup. This requires CRX downtime, but it can take considerably less time than an online backup.

In most cases you will use a filesystem snapshot to create a read-only copy of the storage at that point in time. To create an offline backup, perform these steps:

  • Stop the application.
  • Take a snapshot backup.
  • Start the application.

As the snapshot itself usually takes only a few seconds, the total downtime is typically no more than a few minutes.
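The three steps above can be scripted so that CRX is always restarted, even if the snapshot fails. The following Python sketch is illustrative only: it assumes a Unix installation under the hypothetical path /opt/crx/crx-quickstart with the server/stop and server/start scripts, and a hypothetical LVM logical volume /dev/vg0/crx. It also assumes that the stop script returns only once CRX has shut down. Commands go through a replaceable runner so the order can be checked without touching a real system.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical locations: adjust to your installation.
CRX_HOME = "/opt/crx/crx-quickstart"
STOP_CMD = [f"{CRX_HOME}/server/stop"]
START_CMD = [f"{CRX_HOME}/server/start"]
# Hypothetical LVM volume holding the CRX installation and snapshot size.
SNAPSHOT_CMD = ["lvcreate", "--snapshot", "--size", "5G",
                "--name", "crx-backup-snap", "/dev/vg0/crx"]

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


def offline_backup(runner: Runner = run) -> None:
    """Stop CRX, take a file system snapshot, and always start CRX again.

    Assumes the stop script returns only after CRX has shut down.
    """
    runner(STOP_CMD)
    try:
        runner(SNAPSHOT_CMD)
    finally:
        # Restart even if the snapshot failed, so downtime stays short.
        runner(START_CMD)
```

Mounting the snapshot, copying it to the backup system, and removing it afterwards (for example with lvremove) are left out of this sketch.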

Note

Instead of stopping CRX, you can also use calls to JMX MBeans to prevent CRX from writing to the disk, which reduces the time during which CRX cannot write to a few seconds. This is described under Filesystem Snapshot Backup below.

Online Backup

This backup method creates a backup of the entire repository, including any applications deployed under it, such as CQ5. The backup includes content, version history, configuration, software, hotfixes, custom applications, log files, search indexes, and so on. If you are using clustering and the shared folder is a subdirectory of crx-quickstart (either physically or via a softlink), the shared directory is also backed up.

You can restore the entire repository (and any applications) at a later point.

This method operates as a hot (online) backup, so it can be performed while the repository is running, and the repository remains usable while the backup is running. This method works for the default, TarPM-based CRX instances.

When creating a backup, you have the following options:

  • Backing up to a directory using CRX's integrated backup tool.
  • Backing up to a directory using a filesystem snapshot

In either case, the backup creates an image (or snapshot) of the repository. The system's backup agent should then transfer this image to a dedicated backup system (such as a tape drive).

Caution

The online backup only backs up the file system. If you store the repository content and/or the repository files in a database, that database needs to be backed up separately.

CRX Online Backup

An online backup of your repository lets you create, download, and delete backup files. It is a "hot" or "online" backup feature, so it can be executed while the repository is being used normally in read-write mode.

When starting a backup you can specify a Target Path and/or a Delay.

Target Path

The backup files are usually saved in the parent folder of the folder holding the quickstart jar file (.jar). For example, if the CRX jar file is located under /InstallationKits/CRX, the backup is generated under /InstallationKits. You can also specify a target location of your choice.

If the TargetPath is a directory, the image of the repository is created in this directory. If the same directory is used multiple times (or always) to store backups:

  • modified files in the repository are modified accordingly in the TargetPath
  • deleted files in the repository are deleted in the TargetPath
  • created files in the repository are created in the TargetPath
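The effect on a reused target directory amounts to a one-way mirror. The following Python sketch is a simplified model of that behavior, not CRX's implementation: the function name mirror is hypothetical, and treating a file as changed when its size or modification time differs is an assumption of the sketch.

```python
import shutil
from pathlib import Path
from typing import Set


def mirror(source: Path, target: Path) -> None:
    """Mirror source into target: create new files, update changed ones, delete removed ones.

    Simplified model: a file counts as changed when its size or modification
    time differs. Empty directories left behind in target are not removed.
    """
    target.mkdir(parents=True, exist_ok=True)
    seen: Set[Path] = set()
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        seen.add(rel)
        dst = target / rel
        s = src.stat()
        if dst.exists():
            d = dst.stat()
            if d.st_size == s.st_size and d.st_mtime_ns == s.st_mtime_ns:
                continue  # unchanged since the previous backup
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also copies the modification time
    # Files that no longer exist in the source are deleted from the target.
    for dst in [p for p in target.rglob("*") if p.is_file()]:
        if dst.relative_to(target) not in seen:
            dst.unlink()
```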

Note

If TargetPath is set to a filename with the extension .zip, the repository is backed up to a temporary directory, and the content of this temporary directory is then compressed and stored in the ZIP file.

This approach is discouraged, because:

  • It requires additional disk storage during the backup process (temporary directory plus the zip file).
  • The compression is done by CRX and might affect its performance.
  • It delays the backup process.
  • Up to Java 1.6, Java can only create ZIP files up to 4 gigabytes in size.

If you need a ZIP file as the backup format, back up to a directory and then use a compression program to create the zip file.
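As an example of the recommended approach, the following Python sketch compresses a finished directory backup outside CRX using the standard library, whose zipfile module writes ZIP64 archives when needed, so the 4 GB limit does not apply to it. The function name and paths are hypothetical. The sketch refuses to run while backupInProgress.txt (described under How CRX Online Backup Works) is still present in the backup directory.

```python
import shutil
from pathlib import Path

# Present in a directory target while CRX is still writing the backup.
MARKER = "backupInProgress.txt"


def zip_directory_backup(backup_dir: Path, archive_base: Path) -> Path:
    """Compress a finished directory backup into archive_base + '.zip'."""
    if (backup_dir / MARKER).exists():
        raise RuntimeError(f"{backup_dir} is still being written by CRX")
    # make_archive uses zipfile, which switches to ZIP64 for large archives.
    archive = shutil.make_archive(str(archive_base), "zip", root_dir=str(backup_dir))
    return Path(archive)
```

Running the compression on the backup host (or at a low priority) keeps the work out of the CRX process, which is the point of the recommendation above.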

Delay

Specifies a delay (in milliseconds) that slows the backup down, so that repository performance is affected less.

By default, the CRX backup runs at full speed. You can slow down creating an online backup, so that it does not slow down other tasks.

A delay of 1 millisecond typically results in 10% CPU usage, and a delay of 10 milliseconds usually results in less than 3% CPU usage. The total delay in seconds can be estimated as follows:

    Total delay in seconds = repository size in MB * delay in milliseconds / 2   (backing up to a ZIP file)
    Total delay in seconds = repository size in MB * delay in milliseconds / 4   (backing up to a directory)

That means a backup to a directory of a 200 MB repository with 1 ms delay increases the backup time by about 50 seconds.
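The rule of thumb above can be written as a small helper, which makes the effect of different delay values easy to compare. The function name is hypothetical, and the result is only as accurate as the estimate it encodes.

```python
def extra_backup_seconds(repository_mb: float, delay_ms: float, to_zip: bool) -> float:
    """Estimated extra backup time (seconds) caused by the Delay setting."""
    divisor = 2 if to_zip else 4  # ZIP target: / 2, directory target: / 4
    return repository_mb * delay_ms / divisor
```

For the 200 MB example, a 1 ms delay gives 50 seconds for a directory target and 100 seconds for a ZIP target; raising the delay to 10 ms gives 500 seconds for a directory target.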

Note

See How CRX Online Backup Works below for internal details of the process.

To create a backup:

  1. Log in to CRX as the administrator (admin).

  2. Select Backup from the Welcome screen.

  3. The backup console opens. Specify the Target Path and Delay as required.

    Note

    The backup console is also available using:

        http://<hostname>:<port-number>/libs/granite/backup/content/admin.html

  4. Click Start. A progress bar indicates the progress of the backup.

    Note

    You can Cancel a running backup at any time.

  5. When the backup is complete, the zip files are listed in the left pane of the console.

    Note

    Backup files that are no longer needed can be removed using the console. Select the backup file in the left pane, then click Delete.

    Note

    If you have backed up to a directory, CRX no longer writes to the target directory once the backup process has finished.

Automating CRX Online Backup

If possible, run the online backup when there is little load on the system, for example early in the morning. By default, Tar PM optimization runs between 2 am and 5 am and also slows down the system, so a good time to run the online backup is 5 am.

Backups can be automated using the wget or curl HTTP clients. The following examples show how to automate backups using curl.

Backing up to the default Target Directory

Caution

In the following example, various parameters in the curl command might need to be configured for your instance; for example, the hostname (localhost), port (4502), admin user and password (admin:admin), and file name (backup.zip).

curl -u admin:admin -X POST "http://localhost:4502/system/console/jmx/com.adobe.granite:type=Repository/op/startBackup/java.lang.String?target=backup.zip"
        

The backup file/directory is created on the server in the parent folder of the folder containing the crx-quickstart folder (the same as if you were creating the backup using the browser). For example, if you have installed CRX in the directory /InstallationKits/crx-quickstart/, then the backup is created in the /InstallationKits directory.
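Where curl is not available, the same POST can be sent from Python's standard library. This is a sketch: the helper names are hypothetical, the host, port, credentials, and target are the placeholder values from the curl example, and the request simply mirrors that command (HTTP basic authentication, empty POST body) rather than any documented client API.

```python
import base64
import urllib.parse
import urllib.request


def start_backup_request(host: str, port: int, user: str, password: str,
                         target: str) -> urllib.request.Request:
    """Build the POST that calls startBackup on com.adobe.granite:type=Repository."""
    url = (f"http://{host}:{port}/system/console/jmx/"
           "com.adobe.granite:type=Repository/op/startBackup/java.lang.String"
           f"?target={urllib.parse.quote(target, safe='/:')}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, method="POST",
                                  headers={"Authorization": f"Basic {token}"})


def start_backup(host: str, port: int, user: str, password: str,
                 target: str, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status; the backup keeps running afterwards."""
    request = start_backup_request(host, port, user, password, target)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```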

The curl command returns immediately, so you must monitor this directory to see when the zip file is ready. While the backup is being created, a temporary directory (named after the final zip file) is visible; at the end, its contents are zipped. For example:

  • name of resulting zip file: backup.zip
  • name of temporary directory: backup.f4d5.temp
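Because the call returns immediately, a script has to detect completion itself. The following Python sketch polls for the conditions described in this document: for a ZIP target, the zip file exists and the temporary directory is gone; for a directory target, backupInProgress.txt has been deleted. The function names are hypothetical, and the temporary directory pattern <name>.*.temp is inferred from the example above. Use a new zip file name for each run, and for directory targets start polling only after backupInProgress.txt has appeared, since a reused directory without the marker would otherwise look finished.

```python
import time
from pathlib import Path

MARKER = "backupInProgress.txt"


def backup_finished(target: Path) -> bool:
    """Heuristic completion check for a backup started with startBackup.

    ZIP target: the zip file exists and no '<stem>.*.temp' directory is left
    next to it (naming pattern assumed from the example above).
    Directory target: the directory exists and backupInProgress.txt is gone.
    """
    if target.suffix == ".zip":
        leftovers = list(target.parent.glob(f"{target.stem}.*.temp"))
        return target.is_file() and not leftovers
    return target.is_dir() and not (target / MARKER).exists()


def wait_for_backup(target: Path, poll_seconds: float = 30.0,
                    timeout_seconds: float = 6 * 3600) -> None:
    """Poll until backup_finished(target) is True or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while not backup_finished(target):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"backup to {target} did not finish in time")
        time.sleep(poll_seconds)
```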

Backing up to a non-default Target Directory

Usually the backup file/directory is created on the server in the parent folder of the folder containing the crx-quickstart folder.

If you want to save your backup (of either sort) to a different location you can set an absolute path to the target parameter in the curl command.

For example, to generate backupJune.zip in the directory /Backups/2012:

curl -u admin:admin -X POST "http://localhost:4502/system/console/jmx/com.adobe.granite:type=Repository/op/startBackup/java.lang.String?target=/Backups/2012/backupJune.zip"

        

Caution

When using a different application server (such as JBoss), the online backup may not work as expected, because the target directory is not writable. In this case, please contact Support.

Note

A backup can also be triggered using the MBeans provided by CRX.

Filesystem Snapshot Backup

With CRX 2.3 it is possible to prevent writing to disk via a JMX call. While this is used internally by the CRX backup tool, it can also be used by an external process.

The process described here is especially suited to large repositories.

Note

If you want to use this backup approach, your system must support filesystem snapshots. For example, on Linux this means your filesystems should be placed on a logical volume.

  1. Using JMX, call the operation blockRepositoryWrites on the MBean com.adobe.granite:type=Repository.

    This prevents CRX from writing to disk. Instead, all write requests are blocked.

  2. Take a snapshot of the filesystem CRX is deployed on.

  3. Using JMX, call the operation unblockRepositoryWrites on the MBean com.adobe.granite:type=Repository.

    The blocked write requests then continue.

  4. Mount the filesystem snapshot, back up from there, and then unmount the snapshot.

Caution

To call unblockRepositoryWrites, you need to use a locally attached JMX client.

The JMX console is inappropriate in this case, as its availability is not guaranteed after blockRepositoryWrites has been called.
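The important property of steps 1 to 3 is that writes are always unblocked, even if the snapshot fails. The following Python sketch expresses only that control flow. The three callables are hypothetical hooks: block_writes and unblock_writes are expected to call blockRepositoryWrites and unblockRepositoryWrites through a locally attached JMX client, which is not shown, and take_snapshot might create an LVM snapshot. The function also reports how long writes were blocked.

```python
import time
from typing import Callable


def snapshot_with_blocked_writes(block_writes: Callable[[], None],
                                 take_snapshot: Callable[[], None],
                                 unblock_writes: Callable[[], None]) -> float:
    """Run steps 1-3 above and return how long writes were blocked, in seconds."""
    block_writes()  # hypothetical hook: blockRepositoryWrites via local JMX client
    started = time.monotonic()
    try:
        take_snapshot()  # hypothetical hook: e.g. create a filesystem snapshot
    finally:
        # Always unblock, even if the snapshot fails; otherwise writes stay blocked.
        unblock_writes()  # hypothetical hook: unblockRepositoryWrites
    return time.monotonic() - started
```

Step 4 (mounting the snapshot and backing it up) happens after this function returns, while CRX is writing normally again.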

Backing Up the Data Store Separately

If the file data store has been configured outside the main repository, it is not included in the backup. This reduces the size of the online backup and the backup directory. However, the data store needs to be backed up as well. Because files in the file data store directory are immutable, they can be backed up incrementally (for example, using rsync) or after running the online backup.
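Since data store files never change once written, an incremental backup only has to copy files that are not yet in the backup, which is what rsync with --ignore-existing does. The following Python sketch shows the same idea with the standard library; the function name and directory layout are hypothetical.

```python
import shutil
from pathlib import Path


def copy_new_datastore_files(datastore: Path, backup: Path) -> int:
    """Copy data store files that are missing from the backup; return how many were copied.

    Because data store files are immutable, a file already present in the backup
    is never copied again. Files removed by garbage collection are not deleted
    from the backup.
    """
    copied = 0
    for src in datastore.rglob("*"):
        if not src.is_file():
            continue
        dst = backup / src.relative_to(datastore)
        if dst.exists():
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        partial = dst.with_name(dst.name + ".part")
        shutil.copy2(src, partial)  # copy under a temporary name first ...
        partial.replace(dst)        # ... so an interrupted run leaves no truncated file under the real name
        copied += 1
    return copied
```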

Note

Do not run the data store backup and garbage collection concurrently.

How CRX Online Backup Works

CRX Online Backup consists of a series of internal actions that ensure the integrity of the data being backed up and of the backup file(s) being created. These are listed below for those interested.

The online backup uses the following algorithm:

  1. The first step is to create or locate the target directory.
    1. If backing up to a zip file, a temporary directory is created. The directory name starts with backup. and ends with .temp; for example, backup.f4d3.temp.
    2. If backing up to a directory, the name specified in the target path is used. An existing directory can be used; otherwise, a new directory is created.
      An empty file named backupInProgress.txt is created in the target directory when the backup starts. This file is deleted when the backup is finished.
  2. All files are copied from the source directory to the target directory (or to the temporary directory when creating a zip file). The progress bar indicator for this sub-process runs from 0% to 70% when creating a zip file, or from 0% to 100% if no zip file is created.
  3. If the backup is being made to a pre-existing directory, "old" files in the target directory are deleted. Old files are files that do not exist in the source directory.
  4. The files are copied to the target directory in four stages.
    1. In the first copy stage (progress indicator 0% - 63% when creating a zip file or 0% - 90% if no zip file is created), all files are copied concurrently while the repository is running normally.
    2. In the second copy stage (progress indicator 63% - 66.5% when creating a zip file or 90% - 95% if no zip file is created), only files that were created or modified in the source directory since the first copy stage was started are copied. Depending on the activity of the repository, this might range from no files at all up to a significant number of files (because the first file copy stage usually takes a lot of time).
    3. In the third copy stage (progress indicator 66.5% - 68.6% when creating a zip file or 95% - 98% if no zip file is created), only files that were created or modified in the source directory since the second copy stage was started are copied. Depending on the activity of the repository, there might be no files to copy, or a very small number of files (because the second file copy stage is usually fast).
    4. File copy stages one to three are all done concurrently while the repository is running. The fourth and last file copy stage first locks repository write operations (write operations are paused; they do not throw an exception, but wait). Only files that were created or modified in the source directory since the third copy stage was started are copied. Depending on the activity of the repository, there might be no files to copy, or a very, very small number of files (because the third file copy stage is usually very fast). After that, repository access continues. Progress indicator 68.6% - 70% when creating a zip file or 98% - 100% if no zip file is created.
  5. Depending on the target:
    1. If a zip file was specified, it is now created from the temporary directory (progress indicator 70% - 100%). The temporary directory is then deleted.
    2. If the target was a directory, the empty file named backupInProgress.txt is deleted to indicate that the backup is finished.
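The staged copy can be modeled in a few lines. The following Python sketch is a simplified model for illustration, not CRX's code: it selects changed files by modification time, subtracts a small safety margin for coarse filesystem timestamps (an assumption of the sketch), and represents the write lock of the final stage as a hypothetical pause_writes context manager. Deleting old files and progress reporting are omitted.

```python
import shutil
import time
from contextlib import AbstractContextManager
from pathlib import Path
from typing import Callable, List

# Safety margin for coarse filesystem timestamps (an assumption of this sketch).
MTIME_MARGIN_SECONDS = 2.0


def copy_changed_since(source: Path, target: Path, since: float) -> int:
    """Copy every file modified at or after `since` (epoch seconds); return the count."""
    count = 0
    for src in source.rglob("*"):
        if src.is_file() and src.stat().st_mtime >= since:
            dst = target / src.relative_to(source)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            count += 1
    return count


def staged_copy(source: Path, target: Path,
                pause_writes: Callable[[], AbstractContextManager]) -> List[int]:
    """Simplified model of the four copy stages; returns files copied per stage."""
    counts: List[int] = []
    since = 0.0  # stage 1 copies everything
    for _ in range(3):  # stages 1-3 run while the repository keeps writing
        started = time.time() - MTIME_MARGIN_SECONDS
        counts.append(copy_changed_since(source, target, since))
        since = started
    with pause_writes():  # stage 4: writes wait instead of failing
        counts.append(copy_changed_since(source, target, since))
    return counts
```

Each stage only has to catch up with changes made during the previous, shorter stage, which is why the final stage, the only one that pauses writes, is normally very brief.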

Restoring the Backup

Note

This kind of restore restores the complete repository, including the application, all content, log files, and so on.

To restore the repository from a backup:

  1. Restore a backup image on the system. If you back up the data store separately, make sure that the data store is also restored to the correct location.

    If you created the backup as a zip file, unpack it using:

    jar -xvf backupJune.zip
            
  2. On Unix systems, the zip file does not preserve the execute ("x") bit of the following scripts:

    • server/start
    • server/stop
    • server/serverctl

    You have to make these scripts executable again manually (for example, with chmod +x) after restoring the backup.

  3. The repository is now ready to use. You can start it using the regular start scripts.
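As an alternative to jar -xvf followed by a manual chmod, the following Python sketch unpacks a ZIP backup and sets the execute bits on the three scripts listed above. Python's zipfile does not restore Unix permissions either, so the chmod step is still needed. The function name is hypothetical, and the location of crx-quickstart inside the archive is an assumption; adjust the quickstart parameter to match your backup.

```python
import stat
import zipfile
from pathlib import Path

# Scripts whose execute bit is lost in the zip file (relative to crx-quickstart).
SCRIPTS = ("server/start", "server/stop", "server/serverctl")


def restore_zip_backup(archive: Path, destination: Path,
                       quickstart: str = "crx-quickstart") -> None:
    """Unpack a zip backup and make the server scripts executable again."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)
    for rel in SCRIPTS:
        script = destination / quickstart / rel  # assumed layout inside the archive
        if script.exists():
            mode = script.stat().st_mode
            script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```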

The filesystem snapshot approach described under Filesystem Snapshot Backup has some unique features:

  • Additional disk consumption is low. A snapshot only consumes space if the original data is changed.
  • Snapshots are very fast and efficient; the time required to create a snapshot is therefore low (much lower than copying the data), as only metadata is duplicated.

Package Backup

To back up and restore content, you can use one of the following:

  • Package Manager, which uses the Content Package format to back up and restore content. The Package Manager provides more flexibility in defining and managing packages.
  • Content Zipper, which uses the CRX Package, XML Sys View Package, XML Doc View, or ZIP format to back up content. You restore content in these package formats with the Content Loader. See Creating a Backup using the Content Zipper and Restoring a Backup using the Content Loader.

For details on the features and tradeoffs of each of these individual content package formats, see Importing and Exporting Content in the User Guide.

Scope of Backup

When you back up nodes using either the Package Manager or the Content Zipper, CRX saves the following information: 

  • The CRX repository content below the tree you have selected.
  • The Node type definitions that are used for the content you back up.
  • The Namespace definitions that are used for the content you back up.

When backing up, CRX does not save the following information:

  • The version history.

Creating a Backup using the Content Zipper

To create the backup using the Content Zipper:

  1. Lock the top node of the tree you want to back up or a parent node of that node.
  2. In the Content Zipper, type the path of the tree you want to back up. For a format, click CRX package.
  3. Click Submit Query. Your Web browser now offers the package file as a download. Save the download on your computer.
  4. Unlock the node again.

The file you have downloaded contains the current version of the tree you have exported, including the node type and namespace definitions, but without the version history.

Note

The CRX package file is Adobe’s proprietary file format for CRX node information. It is optimized for a small file size and optimal performance. If you prefer a standard XML file for further processing, click XML sys view in step 2. If you use the file only for archiving, use the CRX package format. Importing and Exporting Content describes the various file formats and their uses.

Restoring a Backup using the Content Loader

To restore the backup using the Content Loader:

  1. Lock the node you want to restore. You can still modify the node and the nodes below it, but others cannot.
  2. In the Content Loader, load the CRX package that you want to restore.
  3. Unlock the node again.

Note

You cannot restore the versioning history using the previous steps. CRX allows you to save the version history, but it does not currently support restoring it.


Your comments are welcome!
Did you notice a way we could improve the documentation on this page? Is something unclear or insufficiently explained? Please leave your comments below and we will make the appropriate changes. Comments that have been addressed, by improving the documentation accordingly, will then be removed.

COMMENTS

  • By dirk.heider - 2:15 PM on Jun 23, 2010
    Hello,

    we have a problem creating online Backups.
    The command "curl -b login.txt -f -o progress.txt "http://localhost:7402/crx/config/backup.jsp?action=add&zipFileName=backup.zip"" does not work. There is no visible zip file created!
    Do you have any suggestions for us?
    • By tmueller - 11:03 AM on Jun 28, 2010
      > we have a problem creating online Backups.

      Could you search for the file in the parent folder of the folder that contains the crx-quickstart folder (two directories up)? So if you have a directory named /day/crx-quickstart/, then the backup is created in the root directory.
    • By Backup & restore - 2:48 PM on Aug 17, 2010
      This is a really useful article. Thanks for sharing! Great read!
      • By alvawb - 7:16 PM on Sep 17, 2010
        We're happy that it helped.
      • By wcarpent - 10:13 PM on Oct 19, 2010
        I agree, a very useful article, but I was wondering if there's any way to do a "Directory Only" backup via curl with a custom directory path? This strategy would be extremely useful for implementations needing scheduled incremental backups.
        • By aheimoz - 10:31 AM on Nov 09, 2010
          We've updated the documentation to cover:
            • Backing up from a non-default Source Directory
            • Backing up to a non-default Target Directory
            • Backing up a Shared Directory
          We hope this helps.
          • By Greg - 1:38 AM on Feb 02, 2011
            This does not quite address the question - there is no example given to show what the curl command to perform the incremental backup would be. Should we omit the filename, and supply the directory path, as we would do in the console? For example, this does not work:
            curl -b login.txt -f -o progress.txt "http://localhost:4502/crx/config/backup.jsp?action=add&targetDir=E:\backup\incremental"
            • By Sheldon - 2:18 PM on Sep 07, 2011
              Greg,

              You still need to add the "zipFileName=" with no value.


        • By Anonymous - 8:19 PM on Dec 16, 2010
          It would be nice to put a link to how to separate the data store in the "Backing Up the Data Store Separately" section. Also, maybe you can post the commands to automate the backups with wget:
          wget --keep-session-cookies --save-cookies=login.txt "http://localhost:7402/crx/login.jsp?UserId=admin&Password=xyz&Workspace=crx.default"
          wget --load-cookies=login.txt --output-document=progress.txt "http://localhost:7402/crx/config/backup.jsp?action=add&zipFileName=backup.zip"
          • By John Francis - 2:14 AM on Mar 25, 2011
            Thought this might help someone out... If you would like to DELETE a server backup using curl, you can use this command:
            curl -b login.txt -f -o progress.txt "http://[RepositoryURL]/crx/config/backup.jsp?action=remove&fileName=/[DayPath]/[DayBackupName.zip]"
            • By Dushyant - 8:41 PM on Jan 23, 2012
              For backing up a large Repository, where are all the optimized *.tar files? We have a repository over 30GB, and I can only see one cache.lock file ... Is this the only lock file that needs to be deleted? Also, in how many locations could lock.properties be?
              • By aheimoz - 3:14 PM on May 02, 2012
                Can we ask for more information about what you are trying to achieve?
                It's not usually recommended practice to delete lock files, so we would advise caution.
                • By efish - 8:24 PM on Mar 06, 2012
                  How do you specify the delay via the command line?
                  curl -b login.txt -f -o progress.txt "http://localhost:7402/crx/config/backup.jsp?action=add&zipFileName=&installDir=C:\cq5author\crx-quickstart\repository&targetDir=C:\backup&delay=10"
                  Is this correct?

                  • By alvawb - 4:35 AM on Mar 07, 2012
                    curl -b login.txt -f -o progress.txt -d "action=add&installDir=<SOURCE_DIR>&targetDir=<DESTINATION_DIR>&zipFileName=&delay=10" http://localhost:4502/crx/config/backup.jsp should work (put in your directories and ports). Hope that helps.
                  • By Anon - 5:12 PM on Mar 09, 2012
                    Hi, recovery from online backup (to a directory, not zip) is quite unreliable for author environments. We follow the following process after we have replicated backup data to a remote host:

                    1) Delete all files matching “listener.properties” under /crx-quickstart/repository
                    2) Delete all instances of "cluster_node.id" and ".lock" under /crx-quickstart/repository/
                    3) Rename any files matching "index*.tar" under "/crx-quickstart/repository/workspaces/crx.default/copy".

                    However, this usually results in the following error during initialisation:

                    Unable to create repository.
                    javax.jcr.RepositoryException: org.apache.jackrabbit.core.state.ItemStateException: Failed to read bundle: deadbeef-face-babe-cafe-babecafebabe: java.lang.IllegalArgumentException: Invalid namespace index: 1
                    ...
                    caused by: java.lang.IllegalArgumentException: Invalid namespace index: 1

                    What's everybody else's experience of recovering from backup for author environments? Do you do anything else prior to start up? We don't seem to encounter the same issues for Publish instances (only steps 1-2 required).

                    I'd like to see this article extended to cover the 'house keeping' tasks that should be performed prior to start up if they're required.

                    Thanks.
                    • By Bill - 6:31 PM on Mar 27, 2012
                      Why are you deleting / renaming files? Is it necessary to do this if moving the instance to a new machine?

                      Why not just recover on the same instance you failed on?

                      Are you online backing up to a local filesystem?

                      Does the repo start on the instance before changing files, moving it, etc.?

                      An online backup directory should contain a full copy of your instance, and except for changing permissions to +x on some crx-quickstart/server/ files (start, stop, serverctl, if *NIX), other than moving the bad instance out of the way, and re-naming your backup instance folder to be the new instance, you should be able to start up without issue.

                      We have had issues with repositories not starting when SCPing millions of files across the wire - we would recommend creating a tar file before transferring the backup (if necessary / required).

                      Bill
                      • By aheimoz - 6:43 AM on Mar 28, 2012
                        Thanks for your feedback. You might also find the forum:
                        http://forums.adobe.com/community/digital_marketing_suite/cq5
                        a good place for finding information and exchanging ideas and experiences.
                    • By jitendra - 10:39 AM on Jun 12, 2012
                      Hi,
                      whenever I hit the URL below, I'm getting an internal server error (500) in response.
                      curl -b login.txt -f -o progress.txt "http://localhost:4502/crx/config/backup.jsp?action=add&zipFileName=backup.zip"

                      Am I missing something here? Kindly let me know.
                      Thanks in advance.
                      • By ppiegaze - 2:25 PM on Jun 19, 2012
                        The command should work. Are you sure you aren't having some other problem with your server?
                      • By Mohamed ismail - 2:20 PM on Aug 24, 2012
                        The backup zip file is getting stored in the default location with the default file name even though I tried to explicitly specify the non-default target directory and the customized file name. I have used the following command to take the backup.

                        curl -u admin:passwd -X POST http://<hostname>:<portno>/crx/config/backup.jsp?action=add&zipFileName=backupjune.zip&targetDir=ATT_DL/backup_repository

                        backup_repository is the directory which I have created to store the backup file. But it is never stored there; instead, it is in the directory "domains" (which is the parent folder of ATT_DL; crx_quickstart is inside ATT_DL). Do I have to change the URL to make it work properly? Please help to resolve it.
                        • By aheimoz - 3:39 PM on Aug 24, 2012
                          (See "Backing up to a non-default Target Directory".)
                          Usually the backup file/directory is created on the server in the parent folder of the folder containing the crx-quickstart folder.

                          The command to backup to a non-default directory is:
                          curl -u admin:admin -X POST "http://localhost:4502/system/console/jmx/com.adobe.granite:type=Repository/op/startBackup/java.lang.String?target=/Backups/2012/backupJune.zip"

                          This looks different from your command. Could you try this format instead?

                          If you still have problems, we'd suggest that you post the details (exactly what you're trying to achieve, what you've done so far) to our dedicated CQ5 forum:
                          http://forums.adobe.com/community/digital_marketing_suite/cq5
                          Hope that helps.
                        • By Anoop - 12:07 PM on Sep 07, 2012
                          Hi,

                          I am trying to automate the backup process for CQ 5.5 deployed in a WebLogic server. The command
                          curl -u admin:admin -X POST "http://localhost:4502/system/console/jmx/com.adobe.granite:type=Repository/op/startBackup/java.lang.String?target=/Backups/2012/backupJune.zip" works with no issues.

                          The above script takes the backup of the parent folder where crx-quickstart is present. But my requirement is to take the backup of crx-quickstart only. Can you tell me what extra parameter needs to be added?

                          The script curl -b login.txt -f -o progress.txt "http://localhost:7402/crx/config/backup.jsp?action=add&zipFileName=&installDir=C:\cq5author\crx-quickstart\repository&targetDir=C:\backup" is not working in CQ 5.5, since the page /crx/config/backup.jsp is not present.

                          • By Alexandre COLLIGNON - 3:58 PM on Oct 02, 2012
                            Hi Anoop,
                            The jmx backup URL supports both absolute and relative (to crx-quickstart) path. You can use a URL similar to the following.

                            curl -u admin:admin -X POST http://localhost:4502/system/console/jmx/com.adobe.granite:type=Repository/op/startBackup/java.lang.String?target=C:/Backups/2012/backupJune.zip

                            Hope that helps, Alex.
                          • By Samuel - 10:50 AM on Dec 11, 2012
                            Hi,
                            How can you stop a backup started with a cURL command?
                            The backup process has been running for over 25 hours now and is not useful anymore.
                            Is there a command to stop this?
                            When browsing the /libs/granite/backup/content/admin.html backupsite I can't see any cancel button.
                            (I guess this is since the backup was started with the cURL command)
                            • By Pavel Kuchin - 1:46 PM on Mar 20, 2013
                              Where are the internal hyperlinks?

                              >>From GENERAL
                              >>For details, see Package Backup below.
                              What is it? Why should I spend my time searching for the Package Backup section?
                              I am not just saying that. A lot of documents do not have hyperlinks.
                              • By Alexandre COLLIGNON - 5:35 PM on Mar 20, 2013
                                Hi Pavel,
                                Many thanks for your feedback. I added a link. Feel free to report any other documentation issue.
                                Hope that helps, Alex.
