Discussion:
[ovirt-users] Can't remove snapshot
David Johnson
2021-05-28 16:08:06 UTC
Hi all,

I patched one of my Windows VMs yesterday. I started by snapshotting the
VM, then applied the Windows update. Now that the patch has been tested, I
want to remove the snapshot. I get this message:

Error while executing action:

win-sql-2019:

- Cannot remove Snapshot. The following attached disks are in ILLEGAL
status: win-2019-tmpl_Disk1 - please remove them and try again.


Does anyone have any thoughts on how to recover from this? I really don't want
to keep this snapshot hanging around.

Thanks in advance,

*David Johnson*
Roman Bednar
2021-06-03 10:03:08 UTC
Ok, sounds good. Forgot to include the mailing list, doing it now.


---------- Forwarded message ---------
From: David Johnson <***@maxistechnology.com>
Date: Thu, Jun 3, 2021 at 11:17 AM
Subject: Re: [ovirt-users] Can't remove snapshot
To: Roman Bednar <***@redhat.com>


Thanks, I'll check it out.

Since my business is replatforming and transforming databases, digging
around the DB is something I will be very comfortable with.

I won't be able to do anything until Friday. I'll let you know how it goes.

David Johnson
Digging a bit further, I found this is a known issue. A discrepancy can
occur between vdsm and the engine DB when removing a snapshot.
It has already been discussed [1] and a bug has been filed [2]. The discussion
includes a workaround: removing the snapshot manually.
Don't forget to back up the engine database by running the 'engine-backup' tool
on the engine node before making any changes:
# engine-backup
If you need to roll back, the backup can be restored with:
# engine-backup --file=/var/lib/ovirt-engine-backup/ovirt-engine-backup-20210602055605.backup --mode=restore --provision-all-databases
To check whether the discrepancy occurred, compare what the DB contains with
what vdsm sees (vdsm is the source of truth).
The example below shows a consistent setup from my environment with one
snapshot. If your DB contains anything extra, it should be removed and the
parent ID changed accordingly [3].
image_group_id (db) == image (vdsm)
image_guid (db) == logical volume on host (vdsm)
# su - postgres
# psql
postgres=# \c engine
engine=# select image_guid, image_group_id, parentid from images where image_group_id = 'e75318bf-c563-4d66-99e4-63645736a418';
              image_guid              |            image_group_id            |               parentid
--------------------------------------+--------------------------------------+--------------------------------------
 1955f6de-658a-43c3-969b-79db9b4bf14c | e75318bf-c563-4d66-99e4-63645736a418 | 00000000-0000-0000-0000-000000000000
 d6662661-eb87-4c01-a204-477919e65221 | e75318bf-c563-4d66-99e4-63645736a418 | 1955f6de-658a-43c3-969b-79db9b4bf14c
# vdsm-tool dump-volume-chains <STORAGE_DOMAIN_ID>
Images volume chains (base volume first)
   image: e75318bf-c563-4d66-99e4-63645736a418
      - 1955f6de-658a-43c3-969b-79db9b4bf14c
        LEGAL, type: PREALLOCATED, capacity: 5368709120, truesize: 5368709120
      - d6662661-eb87-4c01-a204-477919e65221
        status: OK, voltype: LEAF, format: COW, legality: LEGAL, type: SPARSE, capacity: 5368709120, truesize: 3221225472
...
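The comparison described above can be sketched in Python. `compare_chains` is a hypothetical helper, not part of oVirt or vdsm; it assumes you copy the `(image_guid, parentid)` rows from the psql query and the volume list (base volume first) from `vdsm-tool dump-volume-chains` by hand, and it treats vdsm as the source of truth. It only reports differences; it does not touch the database.

```python
# Hypothetical helper, not part of oVirt: report differences between the
# engine DB rows for one disk and the vdsm volume chain of the same image.
from typing import Dict, List, Tuple

NULL_UUID = "00000000-0000-0000-0000-000000000000"  # parentid of the base volume


def compare_chains(
    db_rows: List[Tuple[str, str]], vdsm_chain: List[str]
) -> Tuple[List[str], List[str], Dict[str, Tuple[str, str]]]:
    """Compare (image_guid, parentid) DB rows with a base-first vdsm chain.

    Returns (extra_in_db, missing_in_db, wrong_parent), where wrong_parent
    maps image_guid -> (parentid in the DB, parent expected from vdsm).
    """
    db_ids = {guid for guid, _ in db_rows}
    vdsm_ids = set(vdsm_chain)

    # Volumes the engine still lists but vdsm no longer has.
    extra_in_db = sorted(db_ids - vdsm_ids)
    # Volumes vdsm has but the engine does not know about.
    missing_in_db = [vol for vol in vdsm_chain if vol not in db_ids]

    # In a linear chain each volume's parent is the previous volume;
    # the base volume's parent is the null UUID.
    expected_parent = dict(zip(vdsm_chain, [NULL_UUID] + vdsm_chain[:-1]))
    wrong_parent = {
        guid: (parent, expected_parent[guid])
        for guid, parent in db_rows
        if guid in expected_parent and parent != expected_parent[guid]
    }
    return extra_in_db, missing_in_db, wrong_parent


# The consistent example from this message: one snapshot, both sides agree.
BASE = "1955f6de-658a-43c3-969b-79db9b4bf14c"
TOP = "d6662661-eb87-4c01-a204-477919e65221"
rows = [(BASE, NULL_UUID), (TOP, BASE)]
print(compare_chains(rows, [BASE, TOP]))  # ([], [], {})
```

A non-empty `extra_in_db` corresponds to the rows that would need removing, and `wrong_parent` to the parent IDs that would need changing, as described above.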
I hope this helps a bit. If you need further assistance, let us know.
Changing the DB manually like this is not very convenient, but a fix should
be on the way :)
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1948599
[2]
[3]
Yes, I got the same error on the second try.
You can see it happening in the engine log starting at 2021-05-31 07:49.
*David Johnson*
*Director of Development, Maxis Technology*
844.696.2947 ext 702 (o) | 479.531.3590 (c)
<https://www.linkedin.com/in/pojoguy/>
<https://maxistechnology.com/wp-content/uploads/vcards/vcard-David_Johnson.vcf>
<https://maxistechnology.com/>
*Follow us:* <https://www.linkedin.com/company/maxis-tech-inc/>
Hi David,
awesome, thanks for the reply. Looking at the logs, there does not seem to be
anything suspicious on the vdsm side, and as you said, the snapshots are really
gone when looking from vdsm. I tried to reproduce this without much success,
but it looks like a problem on the engine side.
Did you get the same error saying that the disks are illegal on the
second try? There should be more in the engine log, so try checking it as
well to confirm this is really on the engine side.
It would be great to have a reproducer for this and file a bug so we
can track it and provide a fix.
-Roman
On Mon, May 31, 2021 at 3:20 PM David Johnson wrote:
Hi Roman,
Thank you for your assistance.
I found another snapshot that needed collapsing, and deleted that.
These logs include that execution.
Before the deletion, the vdsm dump listed the snapshot volumes.
Afterwards, the snapshot volumes were absent. That suggests to me that
the snapshot was actually removed, but oVirt is confused.
*David Johnson*
Hello David,
there are quite a few reasons a volume could be marked illegal, e.g. a
failed operation that left the volume in this state. This is done in vdsm,
so please provide the vdsm log from the host running the VM so we can check
exactly what went wrong. The state of the storage domain would also help to
see which volumes are present and which are marked illegal; you can get it with:
# vdsm-client StorageDomain dump sd_id=<SD_ID>
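To pick the illegal volumes out of that dump, a small Python filter could look like the sketch below. It rests on an assumption about the output shape: a top-level `volumes` object keyed by volume UUID, with `image`, `parent` and `legality` fields per volume. Please check that against your own output first. `illegal_volumes` and the sample data are made up for illustration.

```python
# Minimal sketch: list ILLEGAL volumes from the JSON printed by
#   vdsm-client StorageDomain dump sd_id=<SD_ID>
# ASSUMPTION: the JSON has a top-level "volumes" object keyed by volume UUID,
# and each entry has "image", "parent" and "legality" fields.
import json
from typing import Any, Dict, List


def illegal_volumes(dump: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return image, volume and parent IDs for each volume marked ILLEGAL."""
    found = []
    for vol_id, info in dump.get("volumes", {}).items():
        if info.get("legality") == "ILLEGAL":
            found.append({
                "image": info.get("image", "?"),
                "volume": vol_id,
                "parent": info.get("parent", "?"),
            })
    # Sort by image so the volumes of one disk are listed together.
    return sorted(found, key=lambda v: (v["image"], v["volume"]))


# Made-up sample with placeholder IDs, only to show the assumed input shape.
SAMPLE = json.loads("""
{"volumes": {
  "vol-base": {"image": "img-1",
               "parent": "00000000-0000-0000-0000-000000000000",
               "legality": "LEGAL"},
  "vol-top": {"image": "img-1", "parent": "vol-base", "legality": "ILLEGAL"}
}}
""")

for vol in illegal_volumes(SAMPLE):
    # prints: image img-1: volume vol-top (parent vol-base) is ILLEGAL
    print(f"image {vol['image']}: volume {vol['volume']} "
          f"(parent {vol['parent']}) is ILLEGAL")
```

On a real host you would save the dump to a file and pass the result of `json.load` on that file to `illegal_volumes`.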
-Roman
_______________________________________________
Privacy Statement: https://www.ovirt.org/privacy-policy.html
https://www.ovirt.org/community/about/community-guidelines/
Nir Soffer
2021-06-03 13:16:49 UTC
The engine ILLEGAL state means that you started a delete snapshot operation
and the data in this snapshot has changed. This snapshot can no longer be used
to restore the VM to the state it was in when the snapshot was created.

In this situation you can retry the delete snapshot operation. If the first
delete failed because of a temporary error, the retry should succeed and
the snapshot will be deleted.

Nir
David Johnson
2021-06-03 17:51:13 UTC
If you check Roman's update, there is a known bug in the engine. The
snapshot was successfully deleted, but the engine database is out of sync.
Hey Nir,
you said that the data in the snapshot changed?
I always thought that snapshots are read-only.
How is that possible?
Best Regards,
Strahil Nikolov
Nir Soffer
2021-06-03 19:46:32 UTC
Indeed, a snapshot is read-only until you start deleting it. This is why we
mark the snapshot as illegal once the delete snapshot operation has started.

It works like this:

1. Before snapshot

Snapshots: (none)
Volumes: A (active)

A is a read-write volume, changing while the VM is running.

2. After snapshot

Snapshot: snap1 (disk snapshot A)
Volumes: A <- B (active)

A is now a read-only image and will never change
B is read-write, modified by the vm
B backing file is A

3. Start delete snapshot 1

Snapshot: snap1 (disk snapshot A, illegal)
Volumes: A <- B (active)

On the host running the VM, we perform a block commit job,
copying data from B into A.

When the job completes, A contains all the data in B, and any new
data written to B is mirrored to A.

4. Pivoting to volume A

When the block commit has completed, we switch the VM to use volume A
instead of volume B.

At this point the VM is writing to volume A again, and volume B is unused.

Snapshot: snap1 (disk snapshot A, illegal)
Volumes: A (active) <- B

5. Cleanup

On the engine side, snapshot 1 is deleted
On the host, volume B is deactivated
On the SPM host, volume B is deleted

Snapshot: (none)
Volumes: A (active)
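The steps above can be traced in a toy Python model. It only illustrates the chain bookkeeping under simplifying assumptions: volumes are dictionaries of blocks, a read falls through to the backing volume as with a qcow2 overlay, and the block commit and mirroring are plain copies. It does not call libvirt, qemu or vdsm. It also shows why the snapshot is marked illegal before the commit: once B's data lands in A, A no longer holds what the snapshot captured.

```python
# Toy model of the delete-snapshot flow; an illustration, not vdsm code.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Volume:
    name: str
    backing: Optional["Volume"] = None
    blocks: Dict[int, str] = field(default_factory=dict)
    read_only: bool = False

    def read(self, block: int) -> Optional[str]:
        # Like a qcow2 overlay: own data if present, else the backing file's.
        if block in self.blocks:
            return self.blocks[block]
        return self.backing.read(block) if self.backing is not None else None

    def write(self, block: int, data: str) -> None:
        if self.read_only:
            raise PermissionError(f"{self.name} is read-only")
        self.blocks[block] = data


def chain(active: Volume) -> str:
    """Render the chain base first, e.g. 'A <- B'."""
    names = []
    vol: Optional[Volume] = active
    while vol is not None:
        names.append(vol.name)
        vol = vol.backing
    return " <- ".join(reversed(names))


snapshots: Dict[str, str] = {}

# Before snapshot: A is read-write and used by the VM.
a = Volume("A")
a.write(0, "boot")

# After snapshot: A is frozen and the VM writes to the new leaf B.
a.read_only = True
snap1_data = dict(a.blocks)          # what snap1 is supposed to preserve
snapshots["snap1"] = "legal"
b = Volume("B", backing=a)
active = b
active.write(1, "update")
print(chain(active))                 # A <- B

# Start delete: mark snap1 illegal first, because the commit modifies A.
snapshots["snap1"] = "illegal"
a.read_only = False
for blk, data in b.blocks.items():   # block commit: copy B's data into A
    a.write(blk, data)
b.write(2, "late")                   # a new guest write to B...
a.write(2, "late")                   # ...is mirrored to A until the pivot

# Pivot: the VM writes to A again; B is unused.
active = a

# Cleanup: the engine drops snap1, B is deactivated and deleted.
del snapshots["snap1"]
b = None

print(chain(active), snapshots)      # A {}
print(a.blocks != snap1_data)        # True: A no longer matches snap1
```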

I hope this is clearer now.

Nir
David Johnson
2021-06-03 19:48:10 UTC
Thanks!