Evaluating a Disk for Hosting the Deduplication Database

You can use the -simulateddb option of the SIDB2 tool to evaluate the disk on which you plan to host the deduplication database (DDB). The simulation helps you determine how much application data, and how large a DDB, the disk can support.

You can also use the user-interface version of this tool. See Deduplication Database Simulator for more details and usage.

Procedure

  1. Log on to the MediaAgent computer on which you plan to host the DDB.

  2. From the command prompt, go to the software_installation_directory/Base folder and run the sidb2 command with one or more of the following parameters.

    Option          Description

    -simulateddb    The keyword that runs the DDB simulation to evaluate whether the disk is suitable for hosting the DDB.

    -p              The path where the DDB files are located during the simulation. For example: D:\DDB01

    -e              Uses the existing DDB files for the simulation.
                    • If you use -e, copy the files from the existing DDB into a folder named DDBSimulation, and set -p to the location of the Primary.dat or Primary.idx files. For example: D:\DDBSimulation
                    • If you do not use -e, the command creates a new SIDB_Folder_n folder under the path specified by -p.

    -in             The software instance that runs the tool.

    -datasize       The application data size, in GB.

    -threads        The number of threads that access the DDB. Default: 8. Range: 1-8.

    -dratio         The expected deduplication ratio. Default: 5.

    -blocksize      The deduplication data block size, in KB. Default: 128 KB.

    -tlimit         The query and insert (Q&I) time threshold, in microseconds. The simulation runs until the average Q&I time reaches this value. Default: 1000. You cannot use -tlimit and -datasize together.

    -cleanddb       Deletes the files created during the simulation after the simulation completes.

    -noprunesim     Disables pruning simulation on the DDB. By default, pruning simulation runs along with the DDB simulation, but only when the -tlimit parameter is specified.

    -outfile        The path of the output file that stores the DDB simulation results.

    Syntax

    • Windows

      sidb2 -simulateddb {-p <DDBLocation> [-e]} -in <instance#> [-datasize <number>] [-threads <number>] [-dratio <number>] [-blocksize <number>] [-tlimit <number>] [-cleanddb] [-noprunesim] [-outfile <output file path>]

    • Linux

      ./sidb2 -simulateddb {-p <DDBLocation> [-e]} -in <instance#> [-datasize <number>] [-threads <number>] [-dratio <number>] [-blocksize <number>] [-tlimit <number>] [-cleanddb] [-noprunesim] [-outfile <output file path>]
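If you script repeated simulation runs (for example, to compare candidate disks), a small helper that assembles the documented options and enforces the documented constraints can prevent invalid invocations. The following Python sketch is hypothetical: build_simulateddb_args is not part of the product, and it only uses the options listed above. It checks the documented -threads range (1-8) and the rule that -tlimit and -datasize cannot be combined.

```python
from typing import List, Optional


def build_simulateddb_args(
    ddb_path: str,
    instance: str,
    *,
    existing: bool = False,
    datasize_gb: Optional[int] = None,
    threads: Optional[int] = None,
    dratio: Optional[int] = None,
    blocksize_kb: Optional[int] = None,
    tlimit_us: Optional[int] = None,
    cleanddb: bool = False,
    noprunesim: bool = False,
    outfile: Optional[str] = None,
    executable: str = "sidb2",
) -> List[str]:
    """Assemble an argument list for `sidb2 -simulateddb` (hypothetical helper)."""
    # Documented rule: -tlimit and -datasize cannot be used together.
    if datasize_gb is not None and tlimit_us is not None:
        raise ValueError("-tlimit and -datasize cannot be used together")
    # Documented range for -threads is 1-8.
    if threads is not None and not 1 <= threads <= 8:
        raise ValueError("-threads must be between 1 and 8")

    # Required parts of the syntax: -simulateddb, -p (optionally -e), -in.
    args = [executable, "-simulateddb", "-p", ddb_path]
    if existing:
        args.append("-e")
    args += ["-in", instance]

    # Optional numeric parameters are emitted only when a value is given.
    for flag, value in (
        ("-datasize", datasize_gb),
        ("-threads", threads),
        ("-dratio", dratio),
        ("-blocksize", blocksize_kb),
        ("-tlimit", tlimit_us),
    ):
        if value is not None:
            args += [flag, str(value)]

    # Switches without values.
    if cleanddb:
        args.append("-cleanddb")
    if noprunesim:
        args.append("-noprunesim")
    if outfile is not None:
        args += ["-outfile", outfile]
    return args
```

To run the tool, you could pass the resulting list to Python's subprocess.run, setting executable to the full path of sidb2 in the software_installation_directory/Base folder. Building a list rather than a single string avoids quoting problems with paths that contain spaces.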

Examples

  • To project the average query or insert transaction time in the DDB for a given size of backed-up application data, use the -simulateddb and -datasize options.

    sidb2 -simulateddb -in instance001 -p d:\DDB -datasize 500 -outfile D:\simulationresults.txt
  • To get a recommendation for the maximum application data size that the DDB can support, based on the average access time per record, use -simulateddb without -datasize.

    The simulation runs until it reaches the default Q&I time threshold of 1000 microseconds.

    sidb2 -simulateddb -in instance001 -p d:\DDB -outfile D:\simulationresults.txt
  • To run the DDB simulation on existing DDB files, use the -e option. This example runs until it reaches a Q&I time threshold of 150 microseconds.

    SIDB2.exe -simulateddb -p f:\DDBSimulation\CV_SIDB\2\n\Split00 -e -in instance002 -tlimit 150 -outfile D:\simulationresults.txt
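The threshold-based examples stop once the Q&I time "reaches" the limit. The sample output in the Output section reports a moving average over the last 500000 operations and stops when that value passes -tlimit. The following Python sketch models that stop condition under those assumptions. It is illustrative only, not SIDB2's implementation: the function name, the window handling, and the choice to compare only once the window is full are all assumptions.

```python
from collections import deque
from typing import Deque, Iterable, Optional


def first_threshold_crossing(
    qi_times_us: Iterable[float],
    tlimit_us: float = 1000.0,
    window: int = 500_000,
) -> Optional[int]:
    """Return the operation count at which the moving average of the last
    `window` Q&I times first reaches `tlimit_us`, or None if it never does.

    Assumed model of the -tlimit stop condition, not the tool's code.
    """
    recent: Deque[float] = deque()
    total = 0.0
    for count, t in enumerate(qi_times_us, start=1):
        recent.append(t)
        total += t
        # Drop the oldest sample so the window keeps a fixed size.
        if len(recent) > window:
            total -= recent.popleft()
        # Assumption: compare only once a full window of samples exists.
        if len(recent) == window and total / window >= tlimit_us:
            return count
    return None
```

A fixed-size window makes the stop condition track the DDB's current performance rather than its lifetime average, so early fast inserts into an empty DDB do not mask the slowdown as the DDB grows.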

Output

The simulation details are written to the output file specified by the -outfile parameter. The following is a sample of the contents of an output file.

SIDB2.exe -simulateddb -p f:\simulateDDB -in instance002 -tlimit 150
 Warning!!
 SIDB tool will create a new DDB now. It may take long for the tool to get finished. You can cancel the operation and use -e option to use existing DDB instead.
 Creating new DDB files under: [f:\simulateDDB\SIDB_FOLDER_1]
 Performing QueryInsert ... [Wed Jun 25 22:59:28 2014]
 [Parameters Used]
 Threshold Time Limit -> [150.0] microseconds
 Dedupe Ratio -> [5]
 Block Size -> [128] KB
 No. of threads -> [10]
 Simulate pruning -> [YES]
 No. of records already present:
 [0] Primary records, [0] Secondary records.
 Iteration [36430000] [Thu Jun 26 00:15:37 2014]
 Total Primary records - [70742821]
 Total Secondary records - [353713929]
 Total QueryInsert time - [3866.471682] secs
 Total Commit time - [180.877113] secs
 Average time for last [10000] operations:
 QueryInsert - [1753.26] microseconds
 Commit - [4.94] microseconds
 QueryInsert + Commit - [1758.20] microseconds
 Moving average for last [500000] operations:
 Moving average - [161.55] microseconds
 ----
 Pruning iteration [320126]
 Total Primary records pruned - [5172325]
 Total Secondary records pruned - [25861695]
 Total (Pri + Sec) pruning time - [4355.65] seconds
 Total ZeroRef records pruned - [5111740]
 Total ZeroRef pruning time - [133.59] seconds
 Total Archive Files pruned - [78]
 Time for last iteration
 (Pri + Sec) - [0.013] seconds
 ZeroRef - [0.000] seconds
 (Pri + Sec) + ZeroRef - [0.014] seconds
 Moving Average for last [50] iterations
 Moving Average - [0.017] seconds
 Pruning thread exiting as QI threads have exited.
 Simulation threshold [QI time] reached. Number of QI threads [10].
 Threshold = [150.00] microseconds.
 Current value = [161.55] microseconds.
 No. of records at threshold limit:
 [70756183] Primary records, [353780725] Secondary records.
 QueryInsert time taken per connection = [4187.761398] secs
 Max. QueryInsert time taken = [4226.064840] secs
 Commit time taken per connection = [189.973720] secs
 Max. Commit time taken = [193.188206] secs
 QueryInsert + Commit Time per connection = [4377.735118] secs
 Deduplication DB Simulation Completed [Thu Jun 26 00:15:38 2014]
 The disk is capable of hosting a Deduplication DB for:
 42.174 TB of Application Data size
 8.435 TB of data on disk
 5.623 TB of front end application size
 115.3 microseconds average Query & Insert time per block
 Throughput for DDB server 35514 GB per Hour
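In this sample, the first two capacity figures are consistent with the record counts at the threshold limit multiplied by the 128 KB block size, using binary units (1 TB = 1024^3 KB). The secondary records give 42.174 TB of application data, and the primary records give 8.435 TB of data on disk. Their ratio is about 5, which matches the default -dratio. This relationship is inferred from the sample, not documented behavior, and the sketch does not cover the front-end size or throughput figures. The following Python sketch parses the threshold record counts from the output text and reproduces the arithmetic. The function names are hypothetical, and the regular expression assumes the exact line format shown in the sample.

```python
import re
from typing import Tuple

# Matches the counts that follow "No. of records at threshold limit:".
# The format is assumed from the sample output above.
_THRESHOLD_RECORDS = re.compile(
    r"No\. of records at threshold limit:\s*"
    r"\[(\d+)\] Primary records, \[(\d+)\] Secondary records"
)

KB_PER_TB = 1024 ** 3  # assumption: binary units, 1 TB = 1024^3 KB


def threshold_records(output_text: str) -> Tuple[int, int]:
    """Return (primary, secondary) record counts at the threshold limit."""
    match = _THRESHOLD_RECORDS.search(output_text)
    if match is None:
        raise ValueError("threshold record counts not found in output")
    return int(match.group(1)), int(match.group(2))


def records_to_tb(records: int, block_size_kb: int = 128) -> float:
    """Convert a record count to TB, assuming one block per record."""
    return records * block_size_kb / KB_PER_TB


sample = """No. of records at threshold limit:
 [70756183] Primary records, [353780725] Secondary records."""

primary, secondary = threshold_records(sample)
print(f"{records_to_tb(secondary):.3f} TB of application data")  # 42.174
print(f"{records_to_tb(primary):.3f} TB of data on disk")        # 8.435
print(f"observed dedupe ratio {secondary / primary:.2f}")       # 5.00
```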
