This content describes how watched folders are affected by different
backup and recovery scenarios, the limitations and outcomes of these
scenarios, and how to minimize data loss.
Watched Folder is a file system-based application that
invokes configured service operations that manipulate the file within
one of the following folders in the watched folder hierarchy:
Input
Stage
Output
Failure
Preserve
A user or client application first drops the file or folder in
the input folder. The service operation then moves the file into
the stage folder for processing. After the service performs the
specified operation, it saves the modified file in the output folder.
Successfully processed source files are moved to the preserve folder,
and source files that fail processing are moved to the failure folder.
When the Preserve On Failure attribute for the watched folder is
enabled, source files that fail processing are moved to the
preserve folder. (See Configuring watched folder endpoints.)
You can back up watched folders by backing up the file system.
Note: This backup is independent of the database or
document storage backup and recovery process.
How watched folders work
This content describes the watched folder file manipulation
process. It is important to understand this process before developing
a recovery plan. In this example, the Preserve On Failure attribute
for the watched folder is enabled. The files are processed in the
order in which they arrive.
The following table describes the file manipulation of five sample
files (file1, file2, file3, file4, file5) throughout the process.
In the table, the x axis represents time, such as Time 1 or T1,
and the y axis represents folders within the watched folder hierarchy,
such as Input.
| Folder | T1 | T2 | T3 | T4 | T5 | T6 | T7 |
|---|---|---|---|---|---|---|---|
| Input | file1, file2, file3, file4 | file2, file3, file4 | file3, file4 | file4 | empty | file5 | empty |
| Stage | empty | file1 | file2 | file3 | file4 | empty | file5 |
| Output | empty | empty | file1_out | file1_out, file2_out | file1_out, file2_out | file1_out, file2_out, file4_out | file1_out, file2_out, file4_out |
| Failure | empty | empty | empty | empty | file3_fail, file3 | file3_fail, file3 | file3_fail, file3 |
| Preserve | empty | empty | file1 | file1, file2 | file1, file2 | file1, file2, file4 | file1, file2, file4 |
The following text describes file manipulation for each time:
T1: The four sample files are placed in the input folder.
T2: The service operation moves file1 into the stage folder
for manipulation.
T3: The service operation moves file2 into the stage folder
for manipulation. It places the results of file1 in the output folder,
and it moves file1 to the preserve folder.
T4: The service operation places file3 in the stage folder
for manipulation. It places the results of file2 in the output folder,
and it places file2 in the preserve folder.
T5: The service operation places file4 in the stage folder
for manipulation. The manipulation of file3 fails, and the service
operation places it in the failure folder.
T6: A user or client application places file5 in the input folder.
The service operation places the results of file4 in the output
folder and moves file4 to the preserve folder.
T7: The service operation places file5 in the stage folder
for manipulation.
Backing up watched folders
It is recommended that you back up the entire watched folder
file system to another file system.
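As a rough illustration of this recommendation, the following Python sketch copies a whole watched folder tree into a new timestamped directory on another file system. The function name and both paths are hypothetical and not part of the product; the sketch also assumes that a plain file copy is acceptable, although files can change while the copy runs.

```python
import shutil
import time
from pathlib import Path


def backup_watched_folder(watched_root: Path, backup_root: Path) -> Path:
    """Copy the entire watched folder tree into a new timestamped directory.

    Both arguments are hypothetical paths chosen by the administrator.
    """
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = Path(backup_root) / f"{Path(watched_root).name}-{stamp}"
    # copytree creates missing parent directories and raises FileExistsError
    # if the target already exists, so an earlier backup is never overwritten.
    shutil.copytree(watched_root, target)
    return target
```

A scheduler could call this function at whatever interval suits the processing time of the jobs, passing a backup root that is mounted from a different file system.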
Restoring watched folders
This section describes how to restore watched folders.
Watched folders often invoke short-lived processes that complete
within a minute. In such cases, restoring the watched folder with
a backup that is done hourly will not prevent data loss.
For example, if a backup is taken at time T1 and the server fails
at T7, then file1, file2, file3, and file4 are already manipulated.
Restoring the watched folder with a backup taken at T1 does not
prevent data loss.
If a more recent backup was taken, you can restore the files.
When restoring the files, consider which folder in the watched
folder hierarchy each file resides in:
Stage: Files in this folder are processed again after
the watched folder is restored.
Input: Files in this folder are processed again after
the watched folder is restored.
Failure: Files in this folder are not processed.
Output: Files in this folder are not processed.
Preserve: Files in this folder are not processed.
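The folder rules above can be sketched as a small Python helper that lists which files in a restored watched folder will be picked up again. The folder names used here (input, stage, output, failure, preserve) are assumptions and must match the names configured for the endpoint; the sketch also assumes that files in the failure folder are left alone, like those in the output and preserve folders.

```python
from pathlib import Path

# Assumed folder names; replace them with the names configured for the endpoint.
REPROCESSED_FOLDERS = ("input", "stage")
UNTOUCHED_FOLDERS = ("output", "failure", "preserve")


def classify_restored_files(watched_root: Path) -> dict[str, list[str]]:
    """Group the files of a restored watched folder by what happens to them."""
    result: dict[str, list[str]] = {"reprocessed": [], "not_processed": []}
    groups = (
        ("reprocessed", REPROCESSED_FOLDERS),
        ("not_processed", UNTOUCHED_FOLDERS),
    )
    for key, folders in groups:
        for folder in folders:
            base = Path(watched_root) / folder
            if not base.is_dir():
                continue  # a folder missing from the backup is simply skipped
            for path in sorted(base.rglob("*")):
                if path.is_file():
                    result[key].append(path.relative_to(watched_root).as_posix())
    return result
```

Running the helper against a restored copy before the service is restarted gives a quick view of which jobs will run again.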
Strategies to minimize data loss
The following strategies can minimize output and input
folder data loss when restoring a watched folder:
Back up output and failure folders frequently, such as
hourly, to avoid loss of result and failure files.
Back up the input files in a folder other than the watched
folder. This ensures file availability after recovery in case you
cannot find the files in either the output or failure folder. Ensure
that your file naming scheme is consistent.
For example, if you save the output with %F.extension,
the output file has the same name as the input file. This
helps you determine which input files were manipulated and which
ones must be resubmitted. If you see only file1_out in the
output folder and not file2_out, file3_out, and file4_out, you must
resubmit file2, file3, and file4.
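The naming-scheme check described above can be automated. The following Python sketch is only an illustration: the function name is hypothetical, and the default mapping from an input name to an output name (appending _out, as in the sample files) is an assumption that must be replaced with the output pattern actually configured.

```python
from typing import Callable, Iterable


def files_to_resubmit(
    submitted: Iterable[str],
    outputs: Iterable[str],
    output_name: Callable[[str], str] = lambda name: f"{name}_out",
) -> list[str]:
    """Return the submitted files whose expected output file is missing.

    submitted:   names from the separate copy of the input files
    outputs:     names found in the output folder after recovery
    output_name: assumed naming scheme that maps an input name to its output name
    """
    found = set(outputs)
    return [name for name in submitted if output_name(name) not in found]
```

With the sample files, passing file1 through file4 as submitted and only file1_out as outputs returns file2, file3, and file4. When the output keeps the input file name, pass `output_name=lambda name: name`.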
If the available watched folder backup is older than
the time it takes to process the job, allow the system
to create a new watched folder and automatically place the files
in the input folder.
If the latest available backup is not recent enough (it is older
than the time it takes to process the files) and the watched folder
is restored from it, a file could have been in any one of the
following stages:
Stage 1: In the input folder
Stage 2: Copied to the stage folder, but the process
is not yet invoked
Stage 3: Copied to the stage folder, and the process
is invoked
Stage 4: Manipulation in progress
Stage 5: Results returned
If files are in Stage 1, they will be manipulated. If files are
in Stage 2 or 3, place them in the input folder so that manipulation
takes place again.
Note: If manipulation of a file occurs
more than once, data loss will be prevented but results may be duplicated.
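Moving files from the stage folder back to the input folder, as described for Stage 2 and Stage 3, might look like the following Python sketch. The folder names stage and input and the function name are assumptions, and the sketch skips any file whose name already exists in the input folder rather than overwriting it.

```python
import shutil
from pathlib import Path


def requeue_staged_files(watched_root: Path) -> list[str]:
    """Move everything in the stage folder back to the input folder.

    Returns the names that were moved. Folder names are assumed.
    """
    stage = Path(watched_root) / "stage"
    input_dir = Path(watched_root) / "input"
    moved: list[str] = []
    if not stage.is_dir():
        return moved
    input_dir.mkdir(exist_ok=True)
    for item in sorted(stage.iterdir()):
        target = input_dir / item.name
        if target.exists():
            continue  # never overwrite a file that is already waiting in input
        shutil.move(str(item), str(target))
        moved.append(item.name)
    return moved
```

Because requeued files are manipulated again, the results can be duplicated, as the note above points out.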
Conclusion
Due to the dynamic and constantly changing nature of a
watched folder, restoration of watched folders should be done with
files that are backed up within a day. A best practice would be
backing up the results, storing the input folder on a server, and
tracking input files so that you can resubmit the job in case of
failure.