Discussion:
[ovirt-users] Creating Snapshots failed
jb
2021-05-27 14:58:06 UTC
Hello Community,

since I upgraded our cluster to oVirt 4.4.6.8-1.el8 I am no longer able
to create snapshots on certain VMs. For example, I have two Debian 10
VMs; on one I can create a snapshot, but on the other one I cannot.

Both are up to date and use the same qemu-guest-agent version.

I tried to create snapshots via the API and via the web GUI; both give
the same result.
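
For reference, a snapshot can be created through the API for example
with the Python SDK (ovirt-engine-sdk4). This is only a minimal sketch;
the engine URL, credentials and VM name are placeholders:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details - adjust to your engine
connection = sdk.Connection(
    url='https://engine.example.org/ovirt-engine/api',
    username='admin@internal',
    password='***',
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=debian10-test')[0]  # placeholder VM name
snapshots_service = vms_service.vm_service(vm.id).snapshots_service()

# The snapshot request itself; this is the step that fails on the affected VMs
snapshots_service.add(
    types.Snapshot(description='test snapshot', persist_memorystate=False),
)
connection.close()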

Attached you will find a snippet from the engine.log.

Any help would be wonderful!


Regards,

Jonathan
Liran Rotenberg
2021-05-27 15:21:18 UTC
Hi,
The error happened in VDSM (or even in the platform), but we need the
VDSM log to see what went wrong.

Regards,
Liran.
jb
2021-05-27 15:57:48 UTC
Hi Liran,

here are the VDSM logs from all 3 nodes.


Regards

Jonathan
Liran Rotenberg
2021-05-31 07:44:16 UTC
Thanks!
The real error is:
2021-05-27 16:46:35,539+0200 ERROR (virt/487072f9) [storage.VolumeManifest] [Errno 116] Stale file handle (fileVolume:172)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileVolume.py", line 170, in getMetadata
    data = self.oop.readFile(metaPath, direct=True)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 369, in readFile
    return self._ioproc.readfile(path, direct=direct)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 574, in readfile
    "direct": direct}, self.timeout)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 116] Stale file handle
2021-05-27 16:46:35,539+0200 INFO (virt/487072f9) [vdsm.api] FINISH prepareImage error=Error while processing volume meta data: ("('/rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/images/ad23c0db-1838-4f1f-811b-2b213d3a11cd/15259a3b-1065-4fb7-bc3c-04c5f4e14479',): [Errno 116] Stale file handle",) from=internal, task_id=67405e50-503c-4b44-822f-4a7cea33ab84 (api:52)
2021-05-27 16:46:35,539+0200 ERROR (virt/487072f9) [storage.TaskManager.Task] (Task='67405e50-503c-4b44-822f-4a7cea33ab84') Unexpected error (task:880)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/storage/fileVolume.py", line 170, in getMetadata
    data = self.oop.readFile(metaPath, direct=True)
  File "/usr/lib/python3.6/site-packages/vdsm/storage/outOfProcess.py", line 369, in readFile
    return self._ioproc.readfile(path, direct=direct)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 574, in readfile
    "direct": direct}, self.timeout)
  File "/usr/lib/python3.6/site-packages/ioprocess/__init__.py", line 479, in _sendCommand
    raise OSError(errcode, errstr)
OSError: [Errno 116] Stale file handle

- [Errno 116] Stale file handle
I can also see that you are using GlusterFS; maybe there is a bug there
(from a quick look I found
https://bugzilla.redhat.com/show_bug.cgi?id=1708121, which from my
understanding results in the same error on the file).
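
For completeness, errno 116 is ESTALE. A quick sanity check is to read
the same metadata file that VDSM trips over, directly on the host that
logged the error. A minimal sketch; the path is taken from the log
above, and unlike VDSM/ioprocess it uses a plain buffered read rather
than O_DIRECT:

import errno

# Volume path from the prepareImage error above; getMetadata() fails on its .meta file
VOL = ('/rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/'
       '3cf83851-1cc8-4f97-8960-08a60b9e25db/images/'
       'ad23c0db-1838-4f1f-811b-2b213d3a11cd/'
       '15259a3b-1065-4fb7-bc3c-04c5f4e14479')

try:
    with open(VOL + '.meta', 'rb') as f:
        f.read(512)
    print('metadata read OK - no stale handle at the moment')
except OSError as e:
    if e.errno == errno.ESTALE:  # 116, "Stale file handle"
        print('still ESTALE - stale FUSE handle / gfid problem on this mount')
    else:
        raise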

Kotresh, can you tell us whether I am right, and how to work around it?
If not, could you point us to someone who can look into it?

Regards,
Liran.
jb
2021-06-01 08:10:55 UTC
I thought that I had sent it to all, but I was wrong.
What is the output of ls -l
/rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/images/ad23c0db-1838-4f1f-811b-2b213d3a11cd/15259a3b-1065-4fb7-bc3c-04c5f4e14479
?
Here is the output:

-rw-rw----. 1 vdsm kvm 5249171456 Jun  1 09:39 /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/images/ad23c0db-1838-4f1f-811b-2b213d3a11cd/15259a3b-1065-4fb7-bc3c-04c5f4e14479

And the folder content:

# ls -l /rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/images/ad23c0db-1838-4f1f-811b-2b213d3a11cd/
total 5136997
-rw-rw----. 1 vdsm kvm 5249171456 Jun  1 09:39 15259a3b-1065-4fb7-bc3c-04c5f4e14479
-rw-rw----. 1 vdsm kvm    1048576 Dec  4 14:50 15259a3b-1065-4fb7-bc3c-04c5f4e14479.lease
-rw-r--r--. 1 vdsm kvm        301 May 28 23:02 15259a3b-1065-4fb7-bc3c-04c5f4e14479.meta

As I said before, the interesting thing is that on some other VMs (with
bigger disks) I am able to make snapshots.
Best Regards,
Strahil Nikolov
On Tue, Jun 1, 2021 at 9:49, Liran Rotenberg
On Mon, May 31, 2021 at 7:08 PM Strahil Nikolov
This stale file handle can happen when there is a gfid mismatch
between bricks, causing some kind of split-brain.
ls -l
/rhev/data-center/mnt/glusterSD/onode1.example.org:_vmstore/3cf83851-1cc8-4f97-8960-08a60b9e25db/images/ad23c0db-1838-4f1f-811b-2b213d3a11cd/15259a3b-1065-4fb7-bc3c-04c5f4e14479
could show the file status.
Hi Strahil,
Can you respond to the thread? Luckily for me, I don't deal with
glusterFS so much :)
Best Regards,
Strahil Nikolov
On Mon, May 31, 2021 at 10:46, Liran Rotenberg
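
For anyone hitting the same thing: one way to check for the gfid
mismatch Strahil mentions above is to compare the trusted.gfid xattr of
the file on each brick, directly on the Gluster nodes. A minimal
sketch; the brick root is an assumption and has to be adjusted to the
real volume layout:

import os

# Assumed brick root of the vmstore volume on this node - adjust to your setup
BRICK = '/gluster_bricks/vmstore/vmstore'
REL = ('3cf83851-1cc8-4f97-8960-08a60b9e25db/images/'
       'ad23c0db-1838-4f1f-811b-2b213d3a11cd/'
       '15259a3b-1065-4fb7-bc3c-04c5f4e14479')

path = os.path.join(BRICK, REL)
# Reading the trusted.* xattr namespace requires root
gfid = os.getxattr(path, 'trusted.gfid')
print(os.uname().nodename, gfid.hex())
# Run this on every node: if the hex values differ between bricks,
# the file is in a gfid split-brain and needs to be healed.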
jb
2021-06-01 14:24:56 UTC
I don't know if this is related, but I see now that the storage domain
Jonathan Baecker
2021-06-02 19:57:33 UTC
Most probably it does.
systemctl restart ovirt-engine
Thank you for the hint!

I did, and now I can create a snapshot. But I still cannot delete the
old locked snapshot disks.
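
To see which snapshots the engine still considers locked, the API can
be queried the same way as for creating them; a minimal sketch with the
same kind of placeholder connection details as above:

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details - adjust to your engine
connection = sdk.Connection(
    url='https://engine.example.org/ovirt-engine/api',
    username='admin@internal',
    password='***',
    ca_file='/etc/pki/ovirt-engine/ca.pem',
)

vms_service = connection.system_service().vms_service()
for vm in vms_service.list():
    snaps = vms_service.vm_service(vm.id).snapshots_service().list()
    for snap in snaps:
        # snapshot_status is OK, LOCKED or IN_PREVIEW
        if snap.snapshot_status == types.SnapshotStatus.LOCKED:
            print(vm.name, snap.description, snap.snapshot_status)
connection.close()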

Jonathan
Jonathan Baecker
2021-06-02 21:26:53 UTC
You can try to use the unlock_entity.sh script:
# cd /usr/share/ovirt-engine/setup/dbutils
# source /etc/ovirt-engine/engine.conf.d/10-setup-database.conf
# export PGPASSWORD=$ENGINE_DB_PASSWORD
# ./unlock_entity.sh -h
# ./unlock_entity.sh -u engine -t disk -q
Source: https://access.redhat.com/solutions/396753
Sadly this did not help.

This:

# ./unlock_entity.sh -t all -q

also shows no locked results.
Jonathan Baecker
2021-06-02 21:33:44 UTC
Ok, ./unlock_entity.sh -t all worked. Thanks again!