Add tests for DR · Issue #865 · ClusterLabs/anvil · GitHub

Add tests for DR #865

Open

digimer opened this issue Mar 12, 2025 · 2 comments

digimer (Member) commented Mar 12, 2025

Instructions for DR (ignoring proxy [aka long-throw] for now);

  1. Show the list of nodes and DR hosts;
# anvil-manage-dr --show
Anvil! Nodes
- Node Name: [an-anvil-01], Description: [Demo VM Anvil! Node 1]
 - No linked DR hosts yet.
- Node Name: [an-anvil-02], Description: [Demo VM Anvil! Node 2]
 - No linked DR hosts yet.
- Node Name: [an-anvil-03], Description: [Demo VM Anvil! Node 3]
 - No linked DR hosts yet.

-=] DR Hosts
- Name: [an-a01dr01.alteeve.com]
- Name: [an-a03dr01.alteeve.com]

-=] Servers
- Server name: [srv01-sql] on Anvil! Node: [an-anvil-01]
- Server name: [srv02-reports] on Anvil! Node: [an-anvil-02]
- Server name: [srv05-load-balancer] on Anvil! Node: [an-anvil-02]
- Server name: [srv03-app1] on Anvil! Node: [an-anvil-03]
- Server name: [srv04-app2] on Anvil! Node: [an-anvil-03]
  2. Link an-a01dr01 to an-anvil-01;
# anvil-manage-dr --dr-host an-a01dr01 --link --anvil an-anvil-01

The DR host: [an-a01dr01] has been linked to the Anvil! node: [an-anvil-01].
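As a quick check (a sketch; the exact wording of the linked-host line may differ), re-running the command from step 1 should now show an-a01dr01 listed under an-anvil-01 instead of "No linked DR hosts yet.";
# anvil-manage-dr --show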
  3. Make sure the DR's VG has been grown; (the example output shows the prompt that won't be there with --confirm)
# anvil-manage-host --auto-grow-pv --confirm
Searching for free space to grow PVs into.

[ Warning ] - Auto-growing the LVM physical volumes could, in some case, leave the system unbootable.
The steps that will taken are;
- LVM Physical volumes will be found.
- For each found, 'parted' is used to see if there is > 1GiB of free space available.
- If so, and if no other partitions are after it, it will be grown to use the free space.
- The PV itself will then be resized to use the new space

This is generally used just after initializing a new subnode or DR host. If this host has real data
on it, please proceed with caution. 

The partition table will be backed up, and if the partition resize fails, the partition table will be
reloaded automatically. If this host has real data, ensure a complete backup is available before 
proceeding.

Proceed? [y/N]
y
Thank you, proceeding.
Enabling maintenance mode.
Found: [170.53 GiB] free space after the PV partition: [/dev/vda:3]! Will grow the partition to use the free space.
- [ Note ] - The original partition table for: [/dev/vda] has been saved to: [/tmp/vda.partition_table_backup]
             If anything goes wrong, we will attempt to recover automatically. If needed, you can try
             recovering with: [/usr/sbin/sfdisk /dev/vda < /tmp/vda.partition_table_backup --force]
The partition: [/dev/vda3] appears to have been grown successfully. The new partition scheme is:
====
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 268435456000B
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start        End            Size           File system  Name                  Flags
        17408B       1048575B       1031168B       Free Space
 1      1048576B     630194175B     629145600B     fat32        EFI System Partition  boot, esp
 2      630194176B   1703935999B    1073741824B    xfs
 3      1703936000B  268435439103B  266731503104B                                     lvm
====

The resize appears to have been successful. The physical volume: [/dev/vda3] details are now:
====
  --- Physical volume ---
  PV Name               /dev/vda3
  VG Name               an-a01dr01_vg0
  PV Size               248.41 GiB / not usable 1.98 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              63593
  Free PE               43657
  Allocated PE          19936
  PV UUID               tZ2e9M-2AD0-gmI0-a1EL-0hjD-WgR9-AX9D0V
   
====

The physical volume: [/dev/vda3] has been resized!
Disabling maintenance mode.
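If you want to confirm the result from the LVM side, the standard LVM tools (not anvil-specific) will show the grown PV and the new free space in the VG; the device and VG names below are taken from the output above;
# pvs /dev/vda3
# vgs an-a01dr01_vg0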
  4. Link the DR's appropriate VG to the SG (see issue);
# anvil-manage-storage-groups --anvil an-anvil-01 --group "Storage group 1" --add --member KE03fH-9jp2-szdN-LcnL-fAfw-lG1A-pCqMsC
Added the volume group: [an-a01dr01_vg0] on the host: [an-a01dr01] to the storage group: [Storage group 1]. The new member UUID is: [a1f94e96-3022-4bf4-8b39-a644a09c745f].
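Note that the value passed to --member is in LVM's internal UUID format; assuming it is the DR host's volume group UUID (an assumption, it isn't spelled out above), it can be looked up on the DR host with the standard LVM tools;
# vgs -o +vg_uuid an-a01dr01_vg0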
  5. Protect the server;
# anvil-manage-dr --server srv01-sql --protect --dr-host an-a01dr01 --Yes
Sanity checks complete!
Beginning to protect the server: [srv01-sql]!
Verified that there is enough space on DR to proceed.
* The connection protocol will be: ..... [short-throw]
* We will update the DRBD resource file: [/etc/drbd.d/srv01-sql.res]
The following LV(s) will be created:

- Resource: [srv01-sql], Volume: [0]
 - The LV: [/dev/an-a01dr01_vg0/srv01-sql_0] with the size: [10.00 GiB (10,737,418,240 Bytes)] will be created.
The job has been recorded with the UUID: [0611d233-f002-4749-a671-0e79a5d3df7a], it will start in just a moment if anvil-daemon is running.
  6. When the job is finished, verify that the DR host now exists in the DRBD resource file.
# cat /etc/drbd.d/srv01-sql.res | grep -A13 connection | grep -B2 -A11 an-a01dr01
	connection {
		host an-a01n01 address 10.101.10.1:7789;
		host an-a01dr01 address 10.101.10.3:7789;
		disk {
			# The variable bit rate caps at 100 MiB/sec, setting this changes the maximum 
			# variable rate.
			c-max-rate 500M;
		}
		net {
			protocol A;
			verify-alg md5;
                	fencing dont-care;
		}
	}
--
	connection {
		host an-a01n02 address 10.101.10.2:7790;
		host an-a01dr01 address 10.101.10.3:7790;
		disk {
			# The variable bit rate caps at 100 MiB/sec, setting this changes the maximum 
			# variable rate.
			c-max-rate 500M;
		}
		net {
			protocol A;
			verify-alg md5;
                	fencing dont-care;
		}
	}

A single run of anvil-watch-drbd would also show that it's connected, assuming the DR host was alive to pick up and run its side of the job.
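For example (plain DRBD 9 tooling, not anvil-specific; the resource name comes from the transcript above), this should show the an-a01dr01 peer along with its connection and disk states;
# drbdadm status srv01-sql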

From here, you can do # anvil-manage-dr --server srv01-sql --disconnect --Yes, which should stop the replication, and --connect to reconnect. The --update switch should connect until the resource is UpToDate and then disconnect. You can use --remove to remove the replication to DR entirely, and --unlink to remove the DR host as a candidate for the node entirely.
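Roughly, that sequence would look like this (a sketch built from the switches described above; whether --connect and --update also want --Yes, and the exact --unlink syntax, are assumptions on my part);
# anvil-manage-dr --server srv01-sql --disconnect --Yes
# anvil-manage-dr --server srv01-sql --connect --Yes
# anvil-manage-dr --server srv01-sql --update --Yes
# anvil-manage-dr --server srv01-sql --remove --Yes
# anvil-manage-dr --dr-host an-a01dr01 --unlink --anvil an-anvil-01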

The --protocol switch defaults to short-throw (protocol A in DRBD); using --protocol sync will set 'protocol C' instead. Ignore "proxy / long-throw" for now, as it's going to stay an undocumented feature until we see how DRBD Proxy v4 works.
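For example, to protect the same server with synchronous replication instead (assuming --protocol is passed alongside --protect, which isn't shown above);
# anvil-manage-dr --server srv01-sql --protect --dr-host an-a01dr01 --protocol sync --Yes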

digimer added the 'pending test automation' label Mar 12, 2025
digimer added this to the 3.2 Beta milestone Mar 12, 2025
fabbione (Member) commented

@digimer one thing is not clear to me: what is going to set up the DR host to be a DR host?

Is it something we need to do manually, or are you planning to fix striker-auto-install for that?

If manual, I need all the steps, though I would prefer auto.

digimer (Member, Author) commented Mar 13, 2025

A DR host is identified as such simply by installing the anvil-dr RPM. No other steps are needed save for those listed above.
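So, for a manual setup, something like this on the machine that will act as DR should be all that's needed (assuming the Anvil! repos are already configured on the host; that part isn't covered here);
# dnf install anvil-dr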
