DR for a Replication Server

Restoring a Primary Database

________________________________________________________________________

Phase I: Clean up the old system


1) Log on to your primary Replication Server and suspend the DSI connection to the

primary database.


suspend connection to srv1.pdb

2) Re-create the primary database.

  • · If your primary database is irretrievably damaged, drop it. If the database is

marked suspect, you must use the dbcc dbrepair command. Drop database will

not execute on a suspect database.

drop database pdb

dbcc dbrepair (pdb,dropdb)


  • · Drop any damaged database devices using sp_dropdevice.
  • · Create new database devices using disk init (if necessary).
  • · Create the new database using create database with the for load option.


3) Restore the primary database.

  • · Load the database from the database dump.

load database pdb from <dump file>

  • · Restore any transaction log dumps.

load transaction pdb from <dump file>

  • · Activate the database using the online database command.

online database pdb

Phase II: Re-Establish Replication

If you have done up-to-the-minute recovery, your primary database should be current, and

you can go directly to Step 8. Steps 4-7 are only necessary if you have not recovered all

of the data in the database.


4) Zero out the LTM truncation point. Use the RSSD for your primary Replication

Server, and run the following command:

rs_zeroltm srv1, pdb

5) Log on to the primary Replication Server and get the generation number for the

primary database. Record this number.

admin get_generation, srv1, pdb

6) Use your primary database, and update the generation number. The new number

should be higher than the old number.

dbcc settrunc(’ltm’, ’gen_id’, <new number>)

The reason for doing this is that the generation number will differ between the RSSD on the Repserver controlling this db and the actual db The reason for doing this is that if you are restoring a primary database to an earlier state,  you have to increment the database generation number so that the Replication  Server does not ignore log records submitted after the database  is reloaded.


7) Recover any lost or inconsistent data.

  • · Recover any data modification lost from the primary database.
  • · Re-synchronize the replicate database:
  • · Re-materialize the replicate database.
  • · Run rs_subcmp.
  • · Replay the transaction log from a log backup.


8) Once the primary and replicate databases are correct, you can resume replication. Log

on to your primary Replication Server and resume the DSI connection to your primary

database.

resume connection to srv1.pdb

9) Restart the RepAgent on your primary database.


10) Review your error logs to ensure that no errors have occurred.












Re-playing the Transaction Log

________________________________________________________________________


1) Log on to your Adaptive Server. Create a new database that has the same size and

segments as your primary database. This will be your temporary recovery database.


2) Shut down the Replication Server and restart it in single-user mode. To do this, you

should have a run server file that contains the –M option.

Once the Replication Server comes up, check the error log to ensure that the server

came up in stand-alone mode.


3) Put the Replication Server into log recovery mode by executing the following

commands:

set log recovery for srv1.pdb

allow connections


4) Configure a RepAgent for the temporary recovery database that you created in Step 1.

Log into your primary data server, use the temporary recovery database, and execute

the following commands:

exec sp_config_rep_agent temp_rep, ‘enable’, ‘rep1’,

’sa’,’’


5) Log on to your primary Adaptive Server and load the temporary database you created

with the database dump of your primary database.

load database temp_rep from ’<dump file>’

go

online database temp_rep


6) Start the RepAgent for the temporary recovery database in recovery mode. Although

you are starting the RepAgent on the temporary recovery database, the ‘connect

database’ is your original primary database:

exec sp_start_rep_agent temp_rep, recovery,

‘srv1’,‘pdb’,’rep1’


7) Verify that the RepAgent has replayed everything that you have loaded. There are

two ways to do this:

  • · Examine the error log of your primary Adaptive Server, looking for a message

that states that recovery of the transaction log is complete.

  • · Run sp_help_rep_agent on the RepAgent for the temporary recovery database:

exec sp_help_rep_agent temp_rep, ‘recovery’

The recovery status should be “not running” or “end of log”.


8) Log on to your primary Adaptive Server and load the temporary database you created

with the first transaction dump of your primary database:

load transaction temp_rep

from ’<dump file>’

9) Restoring the transaction dump will invalidate the LTM truncation point. Reset it by

using the temporary recovery database and running the following command:

dbcc settrunc(‘ltm’,’valid’)


10) Start the RepAgent for the temporary recovery database in recovery mode:

exec sp_start_rep_agent temp_rep, recovery,

‘srv1’,‘pdb’,’rep1’


11) Verify that the RepAgent has replayed everything that you have loaded. There are

two ways to do this:

  • · Examine the error log of your primary Adaptive Server, looking for a message

that states that recovery of the transaction log is complete.

  • · Run sp_help_rep_agent on the RepAgent for the temporary recovery database:

exec sp_help_rep_agent temp_rep, ‘recovery’

The recovery status should be “not running” or “end of log”.


12) Repeat Steps 8, 9, 10, and 11 for each transaction log dump until you have played all

transaction log dumps through the replication system.


13) Check the Replication Server error log for any loss detection messages. If you find

any, then this process has failed, and you must find another way of re-synchronizing

your data in the primary and replicate databases. Your other options are:

  • · Re-materializing the replicate database.
  • · Running rs_subcmp.


14) If you did not find loss detection messages in the previous step, shut down your

Replication Server, and restart it in normal mode. To do this, you should have a run

server file that does not contain the –M option.

If you accidentally bring the server up in single-user mode, there will be a message to

this effect in the error log. Shut Replication Server down, and start it up using a

correct run server file.


15) Re-establish the LTM truncation point in the original primary database. Use the

primary database, and run the following command:

dbcc settrunc(‘ltm’,’valid’)


16) Restart the RepAgent for the original primary database in normal mode.

exec sp_start_rep_agent pdb


17) Log on to your Replication Server and check for any open transactions

admin who,sqt


18) Drop the temporary recovery database, and remove its database devices.




Rebuilding Stable Queues

________________________________________________________________________


1) Log on to your replication server, and drop the damaged partition.

drop partition <partition name>


2) If necessary, add a new partition to replace the disk space. Remember, the file or raw

partition must already exist before you execute this command.

  • · In UNIX, you can use the touch command to create a file.
  • · In Windows NT, you can create a file using a utility program such as Notepad.

add partition <partition name>

on ’<physical_name>’

with size <size>


3) Check the disk space on your Replication Server. Your failed partition should contain

the status DROPPED, while your new partition should be listed as ONLINE.

admin disk_space


4) Rebuild the stable device:

rebuild queues


5) If the connection to your replicate database is suspended, resume it.


6) Examine your Replication Server error log. Wait for the log to display the message

“Rebuild Queues: Complete”.

You may see the error message, “DSI: detecting loss for database ’srv1.pdb’…” If

your RepAgent is not running, this message can be ignored.


7) If the RepAgent for the primary database is not running, start it up. Use your primary

database and run the following:

exec sp_start_rep_agent pdb


8) Examine your Replication Server error log. Check for loss detection messages. If a

data loss was detected, then this process has not fully succeeded, and you have lost

transactions.

  • · Compare the primary and replicate databases, and replace any lost data.
  • · Use the ignore loss command to restart replication.

ignore loss from srv1.pdb to srv1.rdb

If no data loss was detected, Replication Server will end loss detection mode, and

normal replication may continue.


9) Check the disk space used by your Replication Server’s stable device.

admin disk_space

If the damaged partition no longer appears in this display, the file or raw partition that

was used by the damaged partition may be dropped.


Restoring an RSSD Database

________________________________________________________________________

Phase I: Clean up the old system


1) Shut down all RepAgents that connect to the current Replication Server.


2) If the Replication Server is still running, log on to it and shut it down.


3) Re-create the RSSD database.

  • · If your RSSD database is irretrievably damaged, drop it. If the database is marked

suspect, you must use the dbcc dbrepair command. Drop database will not

execute on a suspect database.

drop database rep1_RSSD

dbcc dbrepair (rep1_RSSD,dropdb)

  • · Drop any damaged database devices using sp_dropdevice.
  • · Create new database devices using disk init (if necessary).
  • · Create the new RSSD database using create database with the for load option.


4) Restore the RSSD database.

  • · Load the database from the database dump.

load database rep1_RSSD

from ‘<dump device>’

  • · Load the database from any transaction log dumps. Be sure to apply the

transaction log dumps in order.

load tran rep1_RSSD

from ‘<dump device>’

  • · Activate the database using the online database command.

online database rep1_RSSD


5) If your Replication Server’s RSSD had a RepAgent and you were able to recover the

database up-to-the-minute, re-establish its truncation point and restart your

RepAgents. Use the RSSD database and run the following command:

dbcc settrunc(’ltm’, ’valid’)


6) If your Replication Server’s RSSD had a RepAgent and there was an active route from

this replication server to another one, but you were unable to recover up-to-the-minute,

get the generation number.

admin get_generation, srv1, rep1_RSSD

Phase II: Rebuild the RSSD

If you have done up-to-the-minute recovery, your RSSD database should be current.

Restart your Replication Server and RepAgents. Steps 7-12 are only necessary if you

have not been able to recover the RSSD database to its current state.


7) Shut down the Replication Server and restart it in single-user mode. To do this, you

should have a run server file that contains the –M option. Once the Replication

Server comes up, check the error log to ensure that the server came up in stand-alone

mode.


8) Rebuild the stable device:

rebuild queues


9) Start up the RepAgents for your primary databases in recovery mode. To do this, do

the following:

sp_start_rep_agent pdb1, recovery


10) Examine your Replication Server error log, and check for loss detection messages. If

a data loss was detected:

  • · Compare the primary and replicate databases, and replace any lost data.
  • · Use the ignore loss command to restart replication.

ignore loss from srv1.pdb to srv1.rdb


If no data loss was detected, Replication Server will end loss detection mode, and

normal replication may continue.


11) If this Replication Server is a primary Replication Server, with a route to a

downstream Replication Server, then you must clean up the truncation point in the

RSSD. Remember that a Primary Replication Server’s RSSD is a primary database,

replicating schema information to other RSSDs.

If your Replication Server is the only one in the domain, or if it is a replicate

Replication Server, you do not need to run this last step.

To clean up the truncation point, do the following:

  • · Clear the RSSD’s LTM truncation point:

dbcc settrunc(’ltm’, ’ignore’)

  • · Move the transaction log forward. Execute the following batch multiple times (40

times, for example):

begin tran

checkpoint

commit

If the number of log records does not increase, create a table in the database and

drop it, and then try the above batch again.

  • · Reset the RSSD’s LTM truncation point:

dbcc settrunc(’ltm’, ’valid’)

  • · Increment the generation id number in the RSSD:

dbcc settrunc(’ltm’, ’gen_id’, <new number>)

12) Shut down your Replication Server and your RepAgents, and restart them in normal

mode.

  • · To start your Replication Server in normal mode, you should have a run server file

that does not contain the –M option. If you accidentally bring the server up in single-user

mode, there will be a message to this effect in the error log. Shut Replication

Server down, and start it up using a correct run server file.

  • · To start a RepAgent in normal mode, run the following command:

sp_start_rep_agent pdb







































Avoiding Disaster

________________________________________________________________________

To avoid disasters on your RSSD database:

1) Put your RSSD on database devices that are not being used by any other databases. If

you can, put them on separate disk drives.


2) Separate your RSSD data and log portions on to separate database devices and, if

possible, separate disk drives.


3) Mirror your RSSD database devices.


4) Keep current backups of your RSSD database.


5) Perform transaction backups of your RSSD database to permit up-to-the-minute

recovery.

To avoid disasters on your stable device:

6) Mirror your stable device.

Note: Sybase does not provide a mechanism to mirror your stable device. Your disk

drives or operating system must do this.


7) Always have more disk space in your stable device than is strictly necessary. If

replication fails, having more disk space gives you time to fix the problem before the

stable device fills up.

To avoid disasters on your application databases:

8) Keep current backups of your primary database.


9) If your replicate database is significantly different than your primary database, and

would be difficult to rebuild, keep a current backup of it, also.

Miscellaneous suggestions

10) Document your system, so that you know the names and locations of all important

objects.


11) Create rs_subcmp parameter files for every subscription you are replicating, so that

you don’t need to create them during a crisis.


12) On a regular basis, run rs_subcmp against your primary and replicate tables to ensure

that they truly are synchronized.