Discussion:
[ovirt-users] Creating Snapshots failed
jb
2021-05-27 14:58:06 UTC
Hello Community,

since I upgraded our cluster to oVirt 4.4.6.8-1.el8 I am no longer able
to create snapshots on certain VMs. For example, I have two Debian 10
VMs; on one I can create a snapshot, but on the other one I cannot.

Both are up to date and use the same qemu-guest-agent version.

I tried to create snapshots via the API and via the web GUI; both give
the same result.
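
For reference, a snapshot can be created through the API for example
with the Python SDK (ovirt-engine-sdk4). This is only a minimal sketch;
the engine URL, credentials and VM name are placeholders:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details - adjust to your engine
connection = sdk.Connection(
    url='https://engine.example.org/ovirt-engine/api',
    username='admin@internal',
    password='***',
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=debian10-test')[0]  # placeholder VM name
snapshots_service = vms_service.vm_service(vm.id).snapshots_service()

# The snapshot request itself; this is the step that fails on the affected VMs
snapshots_service.add(
    types.Snapshot(description='test snapshot', persist_memorystate=False),
)
connection.close()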

Attached you will find a snippet from the engine.log.

Any help would be wonderful!


Regards,

Jonathan
Liran Rotenberg
2021-05-27 15:21:18 UTC
Hi,
The error happened in VDSM (or even in the platform), but we need the
VDSM log to see what went wrong.

Regards,
Liran.
jb
2021-05-27 15:57:48 UTC
Hi Liran,

here are the VDSM logs from all 3 nodes.


Regards

Jonathan
Liran Rotenberg
2021-05-31 07:44:16 UTC
Thanks!
The real error is:
2021-05-27 16:46:35,539+0200 ERROR (virt/487072f9) [storage.VolumeManifest] [Errno 116] Stale file handle (fileVolume:172)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileVolume.py", line 170, in getMetadata
    data = self.oop.readFile(metaPath, direct=True)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 369, in readFile
    return self._ioproc.readfile(path, direct=direct)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 574, in readfile
    "direct": direct}, self.timeout)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 116] Stale file handle
2021-05-27 16:46:35,539+0200 INFO (virt/487072f9) [vdsm.api] FINISH prepareImage error=Error while processing volume meta data: ("('/rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/images/ad23c0db-1838-4f1f-811b-2b213d3a11cd/15259a3b-1065-4fb7-bc3c-04c5f4e14479',): [Errno 116] Stale file handle",) from=internal, task_id=67405e50-503c-4b44-822f-4a7cea33ab84 (api:52)
2021-05-27 16:46:35,539+0200 ERROR (virt/487072f9) [storage.TaskManager.Task] (Task='67405e50-503c-4b44-822f-4a7cea33ab84') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileVolume.py", line 170, in getMetadata
    data = self.oop.readFile(metaPath, direct=True)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 369, in readFile
    return self._ioproc.readfile(path, direct=direct)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 574, in readfile
    "direct": direct}, self.timeout)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 116] Stale file handle

- [Errno 116] Stale file handle
I can also see that you are using GlusterFS; maybe there is a bug there
(from a quick look I found
https://bugzilla.redhat.com/show_bug.cgi?id=1708121, which from my
understanding results in the same error on the file).
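
For completeness, errno 116 is ESTALE. A quick sanity check is to read
the same metadata file that VDSM trips over, directly on the host that
logged the error. A minimal sketch; the path is taken from the log
above, and unlike VDSM/ioprocess it uses a plain buffered read rather
than O_DIRECT:

import errno

# Volume path from the prepareImage error above; getMetadata() fails on its .meta file
VOL = ('/rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/'
       '3cf83851-1cc8-4f97-8960-08a60b9e25db/images/'
       'ad23c0db-1838-4f1f-811b-2b213d3a11cd/'
       '15259a3b-1065-4fb7-bc3c-04c5f4e14479')

try:
    with open(VOL + '.meta', 'rb') as f:
        f.read(512)
    print('metadata read OK - no stale handle at the moment')
except OSError as e:
    if e.errno == errno.ESTALE:  # 116, "Stale file handle"
        print('still ESTALE - stale FUSE handle / gfid problem on this mount')
    else:
        raise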

Kotresh, can you tell us whether I am right, and how to work around it?
If not, could you point us to someone who can look into it?

Regards,
Liran.
jb
2021-06-01 08:10:55 UTC
I thought that I had sent it to all, but I was wrong.
What is the output of ls -l
/rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/images/ad23c0db-1838-4f1f-811b-2b213d3a11cd/15259a3b-1065-4fb7-bc3c-04c5f4e14479
?
Here is the output:

-rw-rw----. 1 vdsm kvm 5249171456 Jun  1 09:39 /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/images/ad23c0db-1838-4f1f-811b-2b213d3a11cd/15259a3b-1065-4fb7-bc3c-04c5f4e14479

And the folder content:

# ls -l /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/images/ad23c0db-1838-4f1f-811b-2b213d3a11cd/
total 5136997
-rw-rw----. 1 vdsm kvm 5249171456 Jun  1 09:39 15259a3b-1065-4fb7-bc3c-04c5f4e14479
-rw-rw----. 1 vdsm kvm    1048576 Dec  4 14:50 15259a3b-1065-4fb7-bc3c-04c5f4e14479.lease
-rw-r--r--. 1 vdsm kvm        301 May 28 23:02 15259a3b-1065-4fb7-bc3c-04c5f4e14479.meta

As I said before, the interesting thing is that on some other VMs (with
bigger disks) I am able to make snapshots.
Best Regards,
Strahil Nikolov
On Tue, Jun 1, 2021 at 9:49, Liran Rotenberg
On Mon, May 31, 2021 at 7:08 PM Strahil Nikolov
This stale file handle can happen when there is a gfid mismatch
between bricks, causing some kind of split-brain.
ls -l
/rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/images/ad23c0db-1838-4f1f-811b-2b213d3a11cd/15259a3b-1065-4fb7-bc3c-04c5f4e14479
could show the file status.
Hi Strahil,
Can you respond to the thread? Luckily for me, I don't deal with
glusterFS so much :)
Best Regards,
Strahil Nikolov
On Mon, May 31, 2021 at 10:46, Liran Rotenberg
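
For anyone hitting the same thing: one way to check for the gfid
mismatch Strahil mentions above is to compare the trusted.gfid xattr of
the file on each brick, directly on the Gluster nodes. A minimal
sketch; the brick root is an assumption and has to be adjusted to the
real volume layout:

import os

# Assumed brick root of the vmstore volume on this node - adjust to your setup
BRICK = '/gluster_bricks/vmstore/vmstore'
REL = ('3cf83851-1cc8-4f97-8960-08a60b9e25db/images/'
       'ad23c0db-1838-4f1f-811b-2b213d3a11cd/'
       '15259a3b-1065-4fb7-bc3c-04c5f4e14479')

path = os.path.join(BRICK, REL)
# Reading the trusted.* xattr namespace requires root
gfid = os.getxattr(path, 'trusted.gfid')
print(os.uname().nodename, gfid.hex())
# Run this on every node: if the hex values differ between bricks,
# the file is in a gfid split-brain and needs to be healed.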
jb
2021-06-01 14:24:56 UTC
I don't know if this is related, but I see now that the storage domain
Jonathan Baecker
2021-06-02 19:57:33 UTC
Most probably it does.
systemctl restart ovirt-engine
Thank you for the hint!

I did, and now I can create a snapshot. But I still cannot delete the
old locked snapshot disks.
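
To see which snapshots the engine still considers locked, the API can
be queried the same way as for creating them; a minimal sketch with the
same kind of placeholder connection details as above:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details - adjust to your engine
connection = sdk.Connection(
    url='https://engine.example.org/ovirt-engine/api',
    username='admin@internal',
    password='***',
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)

vms_service = connection.system_service().vms_service()
for vm in vms_service.list():
    snaps = vms_service.vm_service(vm.id).snapshots_service().list()
    for snap in snaps:
        # snapshot_status is OK, LOCKED or IN_PREVIEW
        if snap.snapshot_status == types.SnapshotStatus.LOCKED:
            print(vm.name, snap.description, snap.snapshot_status)
connection.close()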

Jonathan
Jonathan Baecker
2021-06-02 21:26:53 UTC
You can try to use the unlock_entity.sh script:
# cd /usr/share/ovirt-engine/setup/dbutils
# source /etc/ovirt-engine/engine.conf.d/10-setup-database.conf
# export PGPASSWORD=$ENGINE_DB_PASSWORD
# ./unlock_entity.sh -h
# ./unlock_entity.sh -u engine -t disk -q
Source: https://access.redhat.com/solutions/396753
Sadly this did not help.

This:

# ./unlock_entity.sh -t all -q

also shows no locked results.
Jonathan Baecker
2021-06-02 21:33:44 UTC
Ok, ./unlock_entity.sh -t all worked. Thanks again!