Discussion:
[ovirt-users] Seamless SAN HA failovers with oVirt?
Matthew Trent
2017-06-05 21:47:59 UTC
I'm using two TrueNAS HA SANs (FreeBSD-based ZFS) to provide storage via NFS to 7 oVirt boxes and about 25 VMs.

For SAN system upgrades I've always scheduled a maintenance window, shut down all the oVirt stuff, upgraded the SANs, and spun everything back up. It's pretty disruptive, but I assumed that was the thing to do.

However, in talking with the TrueNAS vendor they said the majority of their customers are using VMware and they almost always do TrueNAS updates in production. They just upgrade one head of the TrueNAS HA pair, then fail over to the other head and upgrade it too. There's a 30-ish second pause in I/O while the disk arrays are taken over by the other HA head, but VMware just tolerates it and continues without skipping a beat. They say this is standard procedure in the SAN world and virtualization systems should tolerate 30-60 seconds of I/O pause for HA failovers seamlessly.

It sounds great to me, but I wanted to pick this list's brain -- is anyone doing this with oVirt? Are you able to fail over your HA SAN with 30-60 seconds of no I/O without oVirt freaking out?

If not, are there any tunables relating to this? I see the default NFS mount options look fairly tolerant (proto=tcp,timeo=600,retrans=6), but are there VDSM or sanlock or other oVirt timeouts that will kick in and start putting storage domains into error states, fencing hosts, or something before that? I've never timed anything, but I want to say my past experience is that the oVirt hosted engine started showing errors almost immediately when we've had SAN issues in the past.
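For my own sanity, the arithmetic on those defaults (a sketch only -- timeo is in tenths of a second, and the exact retry behaviour differs between TCP and UDP mounts):

```shell
# proto=tcp,timeo=600,retrans=6 -- timeo is in deciseconds, so the client
# waits 60 seconds before its first retransmission, and retrans=6 allows
# several retries before the kernel logs "server not responding".  At the
# NFS-client layer alone, that window spans a 30-60 second head failover.
timeo=600
retrans=6
echo "first retransmit after $((timeo / 10))s"
```

(The live options on a host are visible with "nfsstat -m".)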

Thanks!

--
Matthew Trent
Network Engineer
Lewis County IT Services
Dan Yasny
2017-06-05 23:55:41 UTC
As soon as your NAS goes down, qemu running the VMs will start getting EIO
errors and the VMs will pause, so as not to lose any data. If the NAS upgrade
isn't a very long procedure, you might as well complete the updates, bring
the NAS back up, and unpause the VMs.

_______________________________________________
Users mailing list
***@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
Sven Achtelik
2017-06-06 06:42:05 UTC
Hi Matthew,

I'm also using an HA TrueNAS as the storage. I have NFS as well as iSCSI shares and did some in-place upgrades. The failover went more or less smoothly; it was more of an issue on the TrueNAS side, where the different VLANs didn't come up. This caused the engine to take down the storage domain, and things took some time until everything was up again. The VMs in oVirt did go into paused mode and started to work again as soon as the failover was done. I was failing over by rebooting one of the TrueNAS nodes, and this took some time for the other node to take over. I was thinking about asking the TN guys if there is a command or procedure to speed up the failover. In all, I didn't stop any VMs, although the VMs paused. Depending on the criticality of the VMs, you might want to move them to other storage first.

Sven

Chris Adams
2017-06-06 13:06:53 UTC
Once upon a time, Sven Achtelik <***@eps.aero> said:
> I was failing over by rebooting one of the TrueNas nodes and this took some time for the other node to take over. I was thinking about asking the TN guys if there is a command or procedure to speed up the failover.

That's the way TrueNAS failover works; there is no "graceful" failover,
you just reboot the active node.

--
Chris Adams <***@cmadams.net>
Juan Pablo
2017-06-06 13:39:21 UTC
I think it's not related to anything on the TrueNAS side. If you are using
iscsi multipath you should be using round-robin; if one of the paths goes
down you still have the other path to your data, so no sanlock trouble.
Unfortunately, if you want iscsi mpath on oVirt, it's preferred to edit the
config by hand and test. Also, with multipath, you can tell the OS to 'stop
using' one of the paths (represented as a disk).
So, for example, multipath -ll with failed paths looks* like this:
36589cfc000000341111968eacf965e3c dm-17 FreeNAS ,iSCSI Disk
size=50G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
|- 31:0:0:5 sdl 8:176 failed faulty running
|- 32:0:0:5 sdm 8:192 failed faulty running
|- 35:0:0:5 sdo 8:224 failed faulty running
`- 34:0:0:5 sdn 8:208 failed faulty running

and working correctly like this:
36589cfc000000ee205ed6757fa724bac dm-2 FreeNAS ,iSCSI Disk
size=5.5T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 13:0:0:10 sdi 8:128 active ready running
|- 15:0:0:10 sdk 8:160 active ready running
|- 28:0:0:10 sdg 8:96 active ready running
`- 29:0:0:10 sdj 8:144 active ready running
(yes, they are different maps; I won't disconnect a path just to show an
example =) )

Hope I clarified a bit.

Can't tell how it would work on NFS, or if it works at all.
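If you want to script that check before pulling a head, a small helper (my own naming; just a sketch) counts the healthy paths in "multipath -ll" output:

```shell
# Count the paths a multipath map reports as healthy.  Pipe the map's
# "multipath -ll <wwid>" output into it, e.g.
#   multipath -ll 36589cfc000000ee205ed6757fa724bac | count_active
# On the working example above that prints 4; on the failed one, 0.
count_active() {
  grep -c 'active ready running'
}
```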


Chris Adams
2017-06-06 13:44:53 UTC
Once upon a time, Juan Pablo <***@gmail.com> said:
> I think its not related to something on the trueNAS side. if you are using
> iscsi multipath you should be using round-robin

TrueNAS HA is active/standby, so multipath has nothing to do with
rebooting/upgrading a TrueNAS.

--
Chris Adams <***@cmadams.net>
Juan Pablo
2017-06-06 13:53:35 UTC
I'm saying you can do it with multipath and not rely on TrueNAS/FreeNAS:
with an active/active configuration on the virt side instead of
active/passive on the storage side.



Chris Adams
2017-06-06 14:03:01 UTC
Once upon a time, Juan Pablo <***@gmail.com> said:
> Im saying you can do it with multipath and not rely on truenas/freenas.
> with an active/active configuration on the virt side...instead of
> active/passive on the storage side.

But there's still only one active system (the active TrueNAS node)
connected to the hard drives, and the only way to upgrade is to reboot
it. Multipath doesn't bypass that.

--
Chris Adams <***@cmadams.net>
Juan Pablo
2017-06-06 14:10:16 UTC
Chris, if you have active-active with multipath: you upgrade one system,
reboot it, check it came active again, then upgrade the other.
-seamless.
-no service interruption.
-not locked to any storage solution.

Multipath was designed exactly for that.


Chris Adams
2017-06-06 14:21:48 UTC
Once upon a time, Juan Pablo <***@gmail.com> said:
> Chris, if you have active-active with multipath: you upgrade one system,
> reboot it, check it came active again, then upgrade the other.

Yes, but that's still not how a TrueNAS (and most other low- to
mid-range SANs) works, so is not relevant. The TrueNAS only has a
single active node talking to the hard drives at a time, because having
two nodes talking to the same storage at the same time is a hard problem
to solve (typically requires custom hardware with active cache coherency
and such).

You can (and should) use multipath between servers and a TrueNAS, and
that protects against NIC, cable, and switch failures, but does not help
with a controller failure/reboot/upgrade. Multipath is also used to
provide better bandwidth sharing between links than ethernet LAGs.

--
Chris Adams <***@cmadams.net>
Matthew Trent
2017-06-06 17:45:19 UTC
Thanks for the replies, all!

Yep, Chris is right. TrueNAS HA is active/passive and there isn't a way around that when failing between heads.

Sven: In my experience with iX support, they have directed me to reboot the active node to initiate failover. There are "hactl takeover" and "hactl giveback" commands, but reboot seems to be their preferred method.

VMs going into a paused state and resuming when storage is back online sounds great. As long as oVirt's pause/resume isn't significantly slower than the 30-or-so seconds the TrueNAS takes to complete its failover, that's a pretty tolerable interruption for my needs. So my next questions are:

1) Assuming the SAN failover DOES work correctly, can anyone comment on their experience with oVirt pausing/thawing VMs in an NFS-based active/passive SAN failover scenario? Does it work reliably without intervention? Is it reasonably fast?

2) Is there anything else in the oVirt stack that might cause it to "freak out" rather than gracefully pause/unpause VMs?

2a) Particularly: I'm running hosted engine on the same TrueNAS storage. Does that change anything WRT to timeouts and oVirt's HA and fencing and sanlock and such?

2b) Is there a limit to how long oVirt will wait for storage before doing something more drastic than just pausing VMs?
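For anyone wanting to measure rather than guess, here's the crude probe I plan to run on a host during the next failover (the file path in the usage line is a placeholder for something on the NFS data domain):

```shell
# Time one O_DIRECT write in milliseconds; run it in a loop during the
# TrueNAS failover to see how long I/O actually stalls.
probe_once() {
  f=$1
  t0=$(date +%s%N)
  # write errors are ignored on purpose -- we only time the attempt
  dd if=/dev/zero of="$f" bs=4096 count=1 oflag=direct,sync 2>/dev/null || true
  t1=$(date +%s%N)
  echo $(( (t1 - t0) / 1000000 ))
}
# e.g.: while :; do echo "$(date +%T) $(probe_once /path/on/nfs/_probe)ms"; sleep 1; done
```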

--
Matthew Trent
Network Engineer
Lewis County IT Services
360.740.1247 - Helpdesk
360.740.3343 - Direct line

Alex Crow
2017-06-06 19:11:55 UTC
I use Open-E in production on standard Intel (Supermicro) hardware. It
can work in A/A (only in respect of oVirt, i.e. one LUN normally active on
one server while the other LUN normally stays on the other node) or A/P
mode with multipath. Even in A/P mode it fails over quickly enough to
avoid VM pauses, using virtual IPs that float between the nodes. These
modes are supported for both iSCSI and NFS.

I've also successfully implemented the same kind of rapid failover using
standard linux HA tools (pacemaker and corosync). I've had migration
times under 2s.

NFS has the added complication of filesystem locking. Maybe some of the
docs on the CTDB site will help, as they ensure that NFS will be running
on the same ports on each host and locking DBs will be shared between
the two hosts. I have no idea if TrueNAS supports CTDB or similar
distributed locking mechanisms.

Caveat: this is with iSCSI resources. I've not really run VMs in oVirt
in anger against any kind of NFS storage yet. My boss wants to try
Tintri, so I'll see how that works.

Cheers

Alex


Doug Ingham
2017-06-06 23:41:25 UTC
Hey Matthew,
I think it's VDSM that handles the pausing & resuming of the VMs.

An analogous small-scale scenario: the Gluster layer for one of our
smaller oVirt clusters temporarily lost quorum the other week, locking all
I/O for about 30 minutes. The VMs all went into pause and then resumed
automatically when quorum was restored.

To my surprise/relief, not a single one of the 10-odd VMs reported any
errors.

YMMV

Doug




--
Doug
Yaniv Kaul
2017-06-09 00:34:05 UTC
On Tue, Jun 6, 2017 at 1:45 PM, Matthew Trent <
***@lewiscountywa.gov> wrote:

> Thanks for the replies, all!
>
> Yep, Chris is right. TrueNAS HA is active/passive and there isn't a way
> around that when failing between heads.
>

General comment: 30 seconds is A LOT. Much application-level I/O might
time out. Most storage systems strive to stay well below that.


>
> Sven: In my experience with iX support, they have directed me to reboot
> the active node to initiate failover. There's "hactl takeover" and "hactl
> giveback" commends, but reboot seems to be their preferred method.
>
> VMs going into a paused state and resuming when storage is back online
> sounds great. As long as oVirt's pause/resume isn't significantly slower
> than the 30-or-so seconds the TrueNAS takes to complete its failover,
> that's a pretty tolerable interruption for my needs. So my next questions
> are:
>
> 1) Assuming the SAN failover DOES work correctly, can anyone comment on
> their experience with oVirt pausing/thawing VMs in an NFS-based
> active/passive SAN failover scenario? Does it work reliably without
> intervention? Is it reasonably fast?
>

oVirt is not pausing the VMs. qemu-kvm pauses the specific VM whose I/O is
stuck. The reason is that the VM cannot reliably continue without a risk
of data loss (the data is in flight somewhere, right? Host kernel, NIC
buffers, etc.)
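You can see that state from libvirt on the host: "virsh -r domstate --reason <vm>" reports something like "paused (I/O error)". The exact reason string can vary between libvirt versions, so treat this helper (my own naming) as a sketch:

```shell
# Succeeds when domstate output indicates a VM paused on a storage error,
# e.g.:  virsh -r domstate --reason myvm | io_paused && echo "paused on EIO"
io_paused() {
  grep -q 'paused (I/O error)'
}
```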


>
> 2) Is there anything else in the oVirt stack that might cause it to "freak
> out" rather than gracefully pause/unpause VMs?
>

We do monitor storage domain health regularly. We are working on ignoring
short hiccups (see https://bugzilla.redhat.com/show_bug.cgi?id=1459370 for
example).


>
> 2a) Particularly: I'm running hosted engine on the same TrueNAS storage.
> Does that change anything WRT to timeouts and oVirt's HA and fencing and
> sanlock and such?
>
> 2b) Is there a limit to how long oVirt will wait for storage before doing
> something more drastic than just pausing VMs?
>

As explained above, generally no. We can't do much, to be honest, and we'd
like to ensure there is no data loss.
That being said, in extreme cases hosts may become unresponsive; if you
have fencing, they may even be fenced (there's an option to fence a host
that cannot renew its storage lease). We have not seen that happen for
quite some time, though, and I don't anticipate short storage hiccups
causing it.
Depending on your application, that may even be the right thing to do, btw.
Y.
Y.

