config/kernel: enforce kernel max version, with escape hatch #15986

Closed
wants to merge 2 commits into from

robn
Member

@robn robn commented Mar 12, 2024

Motivation and Context

It's possible for OpenZFS to build correctly against a newer kernel than it is supported for, but then not work correctly. This invariably results in disappointment, confusion and/or anger. See #15930/#15931 for a recent example.

Since it's not feasible for us to match Linux's release frequency, the next best thing seems to be to warn the user that they're entering the Nightmare Realm, so they aren't surprised when the wolves get them.

Description

[EDIT 2024-05-05: this PR has been updated based on discussion here. See this comment below for description of the new version. I'll leave this top comment here for context]

Check the kernel version we're configuring against and bail out if the kernel is too new.

Sometimes however we do actually want to compile against a newer kernel than is supported, usually when testing a pre-release kernel. Add the deliberately-verbose --disable-supported-linux-version-check option to disable this check. This is lots to type, and so hopefully can be taken as a very explicit signal that the user knows what they're doing.

Finally, if an unsupported kernel is used and the option is used, a big warning message is displayed at the end of the configure run to really try and make the point.

How Has This Been Tested?

Configuring as normal against a kernel in the supported range does what it always has:

...
checking kernel source version... 5.10.170
checking for kernel config option compatibility... done
...
...
checking kernel source version... 6.7.9
checking for kernel config option compatibility... done
...

Configuring against a kernel version higher than the max supported throws an error:

checking kernel source version... 6.8.0-rc3
configure: error:
	*** Cannot build against kernel version 6.8.0-rc3.
	*** The maximum supported kernel version is 6.7.

Overriding allows it to continue:

$ ./configure ... --disable-supported-linux-version-check
...
checking kernel source version... 6.8.0-rc3
checking for kernel config option compatibility... done
...

Should configure succeed after overriding the check (as currently happens on 6.8), it throws a big warning at the end:

...
config.status: executing depfiles commands
config.status: executing libtool commands
config.status: executing po-directories commands
configure: WARNING:

	You are building OpenZFS against Linux version 6.8.0-rc3.

	This combination IS NOT SUPPORTED by the OpenZFS project. Even if it
	appears to build and run correctly, there may be bugs that can cause
	SERIOUS DATA LOSS.

	YOU HAVE BEEN WARNED!

	If you choose to continue, we'd appreciate if you could report your
	results on the OpenZFS issue tracker at:

	   https://github.com/openzfs/zfs/issues/new

	Your feedback will help us prepare a new OpenZFS release that supports
	this version of Linux.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@satmandu
Contributor

As the reporter for #15930 , I am obligated to note that running OpenZFS on not yet officially supported kernels is exactly how I find bugs to report.

As such, I would suggest some sort of ALL CAPS DANGER WILL ROBINSON warning as opposed to an error out condition that I then have to patch around for testing.

Having said that, it would be nice to have a 2.2.x proposed branch with back ported patches known to be needed for new kernel support merged in (even if a release isn't ready to be tagged), for those of us who are actively testing new kernel support.

@darkbasic

darkbasic commented Mar 12, 2024

There is zfs-2.2.4-staging but it's a bit annoying because you will have to change the branch each and every time a new minor version gets released. Would be nice to have a zfs-2.2.x-staging or a zfs-stable-staging branch to follow.

@robn
Member Author

robn commented Mar 12, 2024

As the reporter for #15930 , I am obligated to note that running OpenZFS on not yet officially supported kernels is exactly how I find bugs to report.

As such, I would suggest some sort of ALL CAPS DANGER WILL ROBINSON warning as opposed to an error out condition that I then have to patch around for testing.

I'm confused; that's literally what the --disable-supported-linux-version-check option is for. Were you thinking of something different?

@satmandu
Contributor

There is zfs-2.2.4-staging but it's a bit annoying because you will have to change the branch each and every time a new point version gets released. Would be nice to have a zfs-stable-staging branch to follow.

Yes but I would note that https://github.com/openzfs/zfs/tree/zfs-2.2.4-staging does not yet have #15931 backported.

I've been using #15931 in my own 2.2.3-based PPA, and do use it with kernel 6.8.0, but I label that as experimental on purpose.

As I understand it, the current OpenZFS workflow is to add commits to the staging branch at the point when it is decided that it is probably time to tag a release.

I have no problems with that, but it does mean that someone wanting to use a newer kernel has to keep on top of the PRs (submitted and/or accepted) to figure out what patches might need to be applied to get that additional kernel support working properly.

Would it be nice to have some subset of OpenZFS tagged releases follow the kernel release cycle? Sure. But I'm not funding development of this well-honed software project, so I don't get a say in that.

@rincebrain
Contributor

I would also remark that if people rely on this warning to tell them when something is a bad idea, they may be burned by assuming the inverse: that anything which doesn't trigger the warning is known to work. Then someone cherry-picks a breaking change into Linux LTS, or a distro cherry-picks something from the future, and people are very surprised indeed.

@darkbasic

@satmandu that's because it still hasn't been merged in master either. Once it lands in master it will get backported. I don't know why it's taking so long. I usually look at commits to backport in my own branch where I add compatibility patches for newer kernels but I've missed that one, I will start looking at open PRs as well.

@satmandu
Contributor

As the reporter for #15930 , I am obligated to note that running OpenZFS on not yet officially supported kernels is exactly how I find bugs to report.

As such, I would suggest some sort of ALL CAPS DANGER WILL ROBINSON warning as opposed to an error out condition that I then have to patch around for testing.

I'm confused; that's literally what the --disable-supported-linux-version-check option is for. Were you thinking of something different?

My apologies for being obtuse. I just meant that in my PPA I would have to add a patch to apply the --disable-supported-linux-version-check flag in dkms. It's not a big deal on my end, and I agree with the intention of this PR!

@darkbasic

How is this patch supposed to work? I've asked the Arch Linux maintainer to apply it to their dkms but I've seen reports that it doesn't work.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Mar 21, 2024
@robn
Member Author

robn commented Mar 27, 2024

@darkbasic

How is this patch supposed to work?

If configure detects you trying to build against a kernel newer than the Linux-Maximum version declared in META, it will abort with an error.

If you set --disable-supported-linux-version-check, then it will reduce that to emitting a warning.
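
As a rough illustration of the behaviour described above (not the actual configure macro), the decision can be sketched in plain POSIX shell. The function names `kver_exceeds_max` and `check_kernel` are hypothetical, and the sketch assumes the maximum is given as `major.minor`, like the `6.7` shown in the error output:

```shell
# Illustrative sketch only; the real check lives in the configure macros.

# kver_exceeds_max KVER MAX: succeed when KVER's major.minor is above MAX.
# KVER may carry a patch level and a suffix (6.8.0-rc3, 6.8.1-arch1-1).
kver_exceeds_max() {
    _kver=${1%%-*}            # drop any suffix such as -rc3
    _kmajor=${_kver%%.*}
    _krest=${_kver#*.}
    _kminor=${_krest%%.*}
    _mmajor=${2%%.*}
    _mminor=${2#*.}           # assumes MAX has exactly two components
    [ "$_kmajor" -gt "$_mmajor" ] ||
        { [ "$_kmajor" -eq "$_mmajor" ] && [ "$_kminor" -gt "$_mminor" ]; }
}

# check_kernel KVER MAX CHECK_ENABLED(yes|no): print ok, error or warning.
check_kernel() {
    if ! kver_exceeds_max "$1" "$2"; then
        echo "ok"
    elif [ "$3" = "yes" ]; then
        echo "error"          # configure aborts
    else
        echo "warning"        # check disabled: continue, but warn
    fi
}

check_kernel 6.7.9 6.7 yes        # within the supported range
check_kernel 6.8.0-rc3 6.7 yes    # too new, check enforced
check_kernel 6.8.0-rc3 6.7 no     # too new, check disabled
```

Only the major and minor numbers are compared in this sketch, so 6.7.9 is still within a maximum of 6.7, while any 6.8 release candidate is not.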

I've asked the Arch Linux maintainer to apply it to their dkms but I've seen reports that it doesn't work.

"It doesn't work" needs details, preferably a build log.

@robn
Member Author

robn commented Mar 27, 2024

I will say, if we do land this patch and vendor distributions simply add --disable-supported-linux-version-check and move on with life, I will be far more inclined to either ignore bug reports on unsupported kernels or require a lot more work from the bug submitter (at least, reconfirming on a supported Linux version).

I definitely do want to know about breakages specifically because of changes in newer kernel versions, but I am far less interested in standard OpenZFS operational issues on unsupported configurations. If downstreams are going to actively ignore recommendations, then they need to do more to support their users directly, or assist them when they turn up here.

I get that some distributions' whole purpose is to ship bleeding-edge everything, but that must be balanced by at least making it clear that OpenZFS' traditional stability and reliability guarantees may not hold in those situations. Quietly hiding this fact is just dishonest. At the very least, I would like those distributions to inform their users of this by some other mechanism, if showing output from OpenZFS configure is not possible or appropriate.

If that's a non-starter, I'm happy to take suggestions on alternate methods. I don't want to be a dick about it, but it's already incredibly difficult to support the range of kernels we do. Distributors adding even more combinations without also getting involved in their support and upkeep just seems rude.

@robn robn mentioned this pull request Mar 27, 2024
13 tasks
@rrevans
Contributor

rrevans commented Mar 27, 2024

The supported Linux version range today is the fully supported range.

Is there value in having this flag only enable a list of versions that are known to be stable enough for bleeding edge testing?

As a disincentive to shipping this to unsuspecting users, this flag could also imply debug build and/or emit warnings to kmsg on import and when invoking tools.

@darkbasic

If configure detects you trying to build against a kernel newer than the Linux-Maximum version declared in META, it will abort with an error.

Ok, just wanted to be sure that the check aborts and doesn't simply show a warning.

"It doesn't work" needs details, preferably a build log.

They didn't provide any, I've just checked myself and it looks like it's working as expected:

(2/3) Install DKMS modules
==> dkms install --no-depmod zfs/2.2.3.r1.g58211157bf -k 6.8.1-arch1-1
configure: error: 
	*** Cannot build against kernel version 6.8.1-arch1-1.
	*** The maximum supported kernel version is 6.7.
			
Error! Bad return status for module build on kernel: 6.8.1-arch1-1 (x86_64)
Consult /var/lib/dkms/zfs/2.2.3.r1.g58211157bf/build/make.log for more information.
==> WARNING: `dkms install --no-depmod zfs/2.2.3.r1.g58211157bf -k 6.8.1-arch1-1' exited 10

They probably filed the report against the wrong package and the one they're using didn't have this patch backported.

@robn
Member Author

robn commented Mar 27, 2024

@darkbasic I'm confused. That build does have this patch - they've taken the patch, and then not used the option? I don't even know why you would.

@darkbasic

@darkbasic I'm confused. That build does have this patch - they've taken the patch, and then not used the option? I don't even know why you would.

That log is from my build, which does indeed have the patch. What happened is that a user commented on the AUR about being able to successfully build the dkms against 6.8. The package on the AUR has backported the patch, so I started wondering if this PR is supposed to downright fail or just add a warning. I've then tried it myself and it does indeed do the former. What I guess has happened is that the user uses the zfs Arch repository, which contrary to the AUR probably didn't backport this PR. Why he commented on the AUR remains a mystery.

@robn
Member Author

robn commented Mar 27, 2024

Ok, I'm sensing confusion about the intent of this change. Maybe that's me confused, or maybe I'm doing fine but haven't explained it properly. So I'll try explaining again, and if we're still no good, I'll let someone point out that I'm confused, and then I'll quietly withdraw into the hedges.

We publish a "maximum supported" kernel version in META. Currently, that says 6.7. However, we do nothing to enforce this.

In 2.2.3 we shipped experimental support for 6.8. This was buggy/incomplete. People tried it, noticed problems, reported them. That's great! But, there was also a certain amount of pressure in those requests (and elsewhere) to have it fixed quickly, which I felt was unreasonable.

I wondered if maybe the problem was that it wasn't easily discoverable that kernels beyond 6.7 were unsupported/experimental, and thus this patch: a message to let you know and a way to explicitly opt-in to potential carnage.

In my opinion, if a vendor distributes OpenZFS for a kernel with higher version than the maximum version listed in META, then that vendor is explicitly opting their users into an experimental/unsupported configuration, and if that breaks (incl. data loss), the responsibility is mostly on the vendor, not on the OpenZFS project itself. This patch doesn't change that theory, but would at least make it very clear that the vendor is making that decision also.

(not that I don't want to hear about such breakage; but I don't want an irate end-user blaming us for shipping broken software when we didn't, at least not knowingly).

The alternatives to this seem to be either to never put experimental patches anywhere near a release series (even with warnings), or to make experimental builds available ahead of time. Builds are probably ideal but require time and infrastructure we mostly don't have, so shipping experimental support with warnings on it at least is a straightforward way to get it into people's hands.

I dunno, this felt like a light touch :)

@robn
Member Author

robn commented Mar 27, 2024

@rrevans

Is there value in having this flag only enable a list of versions that are known to be stable enough for bleeding edge testing?

Maybe, except most of the time OpenZFS won't even compile against a new kernel version, due to their perpetual API churn. So this was mostly intended as a gate for when we do ship early support but don't yet know if it's complete.

Maybe instead META could have Linux-Maximum-Experimental: 6.8, and you have to --enable-linux-experimental to build up to that, and beyond that is just a hard rejection.

As a disincentive to shipping this to unsuspecting users, this flag could also imply debug build and/or emit warnings to kmsg on import and when invoking tools.

A warning to the kernel log seems reasonable and benign. I like the idea of building with debug, though I wonder if it's too heavy-handed so long as failed assertions panic the whole module. I'm pretty sure I don't want that if vendors are going to be quietly opting users into this option; that's very much a "you are definitely testing now" and I don't think that's entirely fair to spring on people unknowingly. On the other hand, maybe a vendor is going to be far less inclined to enable this option if the result is "kernel crashes" vs "mild inconvenience".

@rrevans
Contributor

rrevans commented Mar 28, 2024

Maybe instead META could have Linux-Maximum-Experimental: 6.8, and you have to --enable-linux-experimental to build up to that, and beyond that is just a hard rejection.

Thanks for explaining. Yes this is exactly the sort of idea. In that approach it is nice and clear where supported ends and experimental starts, and it's also opt-in only.

I like the idea of building with debug, though I wonder if it's too heavy-handed so long as failed assertions panic the whole module. I'm pretty sure I don't want that if vendors are going to be quietly opting users into this option; that's very much a "you are definitely testing now" and I don't think that's entirely fair to spring on people unknowingly. On the other hand, maybe a vendor is going to be far less inclined to enable this option if the result is "kernel crashes" vs "mild inconvenience".

What do you think about a middle of the road option where assertions are built and executed but print warnings instead?

The outcome is then identical for the user - corruption maybe or other unknown problems the assertions are intended to catch - but they get an actionable report to file if desired.

@robn
Member Author

robn commented Mar 28, 2024

Thanks for explaining. Yes this is exactly the sort of idea. In that approach it is nice and clear where supported ends and experimental starts, and it's also opt-in only.

I'm persuaded. It's not miles away from this PR in function, but it feels a lot more solid.

What do you think about a middle of the road option where assertions are built and executed but print warnings instead?

I... rather like this. I'll have a play with it.

@robn robn marked this pull request as draft March 28, 2024 09:33
@Gendra13

What I am wondering at this point:
How much are the distribution vendors even aware of the fact that there is a maximum supported kernel version and that there might be still serious issues left, even when the building process itself completes successfully?

As a current example:
The next Ubuntu LTS 24.04 release happens to be shipped with kernel 6.8 and includes zfs-2.2.2. Even though they cherry-picked an arbitrary selection of 6.8-compat-patches from the early 2.2.3-staging-branch to get the kernel module building, there are still a lot of problems left including #15930.
But since the module builds correctly it is deemed to be fine and is about to be shipped with the next LTS release in April (and that’s not a rolling-release where I might expect some hiccups but an LTS version where I would expect extra stability).

@tonyhutter
Contributor

tonyhutter commented Mar 28, 2024

@robn I'm fine with your overall approach. You might want to add another line to give the user a hint that --disable-supported-linux-version-check is available, like:

checking kernel source version... 6.8.0-rc3
configure: error:
	*** Cannot build against kernel version 6.8.0-rc3.
	*** The maximum supported kernel version is 6.7.
	*** Use --disable-supported-linux-version-check to bypass this check.

Also, is this still "Draft" or are you ready for it to be approved?

@robn
Member Author

robn commented Mar 28, 2024

@Gendra13

How much are the distribution vendors even aware of the fact that there is a maximum supported kernel version and that there might be still serious issues left, even when the building process itself completes successfully?

Yeah, I have no idea. That's part of why I want this!

But since the module builds correctly it is deemed to be fine and is about to be shipped with the next LTS release in April (and that’s not a rolling-release where I might expect some hiccups but an LTS version where I would expect extra stability).

Yep, I'd hope they're paying attention.

@robn
Member Author

robn commented Mar 28, 2024

@tonyhutter

Also, is this still "Draft" or are you ready for it to be approved?

Hold off for the moment, I'm going to try a different approach first and see if it feels nicer.

@ahesford
Contributor

ahesford commented Apr 10, 2024

While I hate that this is necessary, I think it's a good idea. ZFS support is critical to Void Linux, and we hold off bumping our generic linux meta-package (which pulls in the [approximately] most recent stable kernel series) until ZFS and other important out-of-tree modules work with that series. We try to be conservative in backporting patches, and the risk of unexpected incompatibilities is further mitigated by offering users a choice between several active kernel series as well as a zfs-lts package. Nevertheless, having an additional sanity check on compatibility will help us avoid any major missteps.

@robn robn force-pushed the enforce-max-kernel-version branch 2 times, most recently from 1385d6e to 08b6eff Compare May 5, 2024 06:20
@robn
Member Author

robn commented May 5, 2024

Alright, here's a rework based on the discussion:

  • META now has Linux-Maximum-Experimental: 6.9
  • Linux-Maximum: is now enforced, unless configure is run with...
  • --enable-linux-experimental, then Linux-Maximum-Experimental: is enforced instead
  • If the build is against a kernel > max but <= max-experimental, then HAVE_LINUX_EXPERIMENTAL is defined
  • If HAVE_LINUX_EXPERIMENTAL is defined, extra warnings are logged when the module is loaded
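
The rules in the list above amount to a three-way classification. A minimal shell sketch, assuming both limits are `major.minor` values; `kver_key` and `classify_kernel` are hypothetical names and not part of the OpenZFS build system:

```shell
# kver_key VERSION: print major*1000+minor, ignoring patch level and suffix,
# so that versions can be compared as plain integers.
kver_key() {
    _v=${1%%-*}               # 6.9.0-rc3 -> 6.9.0
    _major=${_v%%.*}
    _rest=${_v#*.}
    _minor=${_rest%%.*}
    echo $((_major * 1000 + _minor))
}

# classify_kernel KVER MAX MAX_EXPERIMENTAL ENABLE_EXPERIMENTAL(yes|no)
# Prints one of: supported, experimental, rejected.
classify_kernel() {
    _k=$(kver_key "$1")
    _max=$(kver_key "$2")
    _maxexp=$(kver_key "$3")
    if [ "$_k" -le "$_max" ]; then
        echo "supported"
    elif [ "$4" = "yes" ] && [ "$_k" -le "$_maxexp" ]; then
        echo "experimental"   # the case where HAVE_LINUX_EXPERIMENTAL is defined
    else
        echo "rejected"       # configure aborts
    fi
}

classify_kernel 6.8.2     6.8 6.9 no
classify_kernel 6.9.0-rc3 6.8 6.9 no
classify_kernel 6.9.0-rc3 6.8 6.9 yes
classify_kernel 6.10.0    6.8 6.9 yes
```

Note that the opt-in only widens the accepted range up to the experimental limit; a kernel beyond that is rejected either way.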

More detailed output:

By default, configure will abort when trying to compile against any kernel higher than Linux-Maximum:. Consider a future Linux 6.10:

$ ./configure
...
checking kernel source version... 6.10.0
configure: error:
	*** Cannot build against kernel version 6.10.0
	*** The maximum supported kernel version is 6.8.

With --enable-linux-experimental, Linux-Maximum-Experimental: is checked instead. If the kernel is still too new, both supported versions are shown in the output:

$ ./configure --enable-linux-experimental
...
configure: error:
	*** Cannot build against kernel version 6.10.0.
	*** The maximum supported kernel version is 6.8.
	*** The maximum supported experimental kernel version is 6.9.

For a kernel version higher than Linux-Maximum: but within Linux-Maximum-Experimental, again, it will be rejected by default:

$ ./configure
...
checking kernel source version... 6.9.0-rc3
configure: error:
	*** Cannot build against kernel version 6.9.0-rc3.
	*** The maximum supported kernel version is 6.8.

With --enable-linux-experimental, it will be permitted, and a warning shown at the end of configure:

$ ./configure --enable-linux-experimental
...
checking kernel source version... 6.9.0-rc3
...
configure: WARNING:

	You are building OpenZFS against Linux version 6.9.0-rc3.

	This combination is considered EXPERIMENTAL by the OpenZFS project.
	Even if it appears to build and run correctly, there may be bugs that
	can cause SERIOUS DATA LOSS.

	YOU HAVE BEEN WARNED!

	If you choose to continue, we'd appreciate if you could report your
	results on the OpenZFS issue tracker at:

	   https://github.com/openzfs/zfs/issues/new

	Your feedback will help us prepare a new OpenZFS release that supports
	this version of Linux.

This will define HAVE_LINUX_EXPERIMENTAL, which will then cause additional warnings to be emitted to the kernel log at module load time:

$ grep HAVE_LINUX_EXPERIMENTAL zfs_config.h
#define HAVE_LINUX_EXPERIMENTAL 1
[    4.940299] ZFS: Loaded module v2.2.99-476_ge915cb5d0 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
[    4.940380] ZFS: Using ZFS with kernel 6.9.0-rc3 is EXPERIMENTAL and SERIOUS DATA LOSS may occur!
[    4.940458] ZFS: Please report your results at: https://github.com/openzfs/zfs/issues/new

Finally, kernel versions at or below Linux-Maximum: will be accepted without fanfare:

$ ./configure
checking kernel source version... 6.8.2

Note that HAVE_LINUX_EXPERIMENTAL is only set if --enable-linux-experimental is supplied and the build needs to go above Linux-Maximum: to support it. This effectively allows a "don't care" option of always providing the option.

$ ./configure
checking kernel source version... 6.8.2
$ grep HAVE_LINUX_EXPERIMENTAL zfs_config.h
/* #undef HAVE_LINUX_EXPERIMENTAL */

Finally, I have a prototype of "soft assertions", but I want to think about it more, because it's a little more invasive than I would like for this PR, and might actually be a thing we want to make more widely available. So I'm not going to consider it further for this PR.

Contributor

@rrevans rrevans left a comment

Looks quite good

Two notes

  1. For the hard stop, maybe have configure print how to bump the max version in the local repo for bleeding edge development e.g. "Contributors: See ... for supporting newer versions"
  2. META or somewhere else should write down guidance on what Linux-Maximum-Experimental should be. Is it meant to be builds, builds and passes tests, fairly stable, or just bumped as soon as the porting effort starts...?

@robn robn mentioned this pull request Jul 17, 2024
13 tasks
@tonyhutter
Contributor

META now has Linux-Maximum-Experimental: 6.9

Unfortunately, updating META with new kernel versions is already a maintenance burden, and an additional field will add to that burden. As @rrevans mentioned, the criteria for "experimentally supported but not officially supported" are still undefined. Overall, I don't see a benefit over the simpler --disable-supported-linux-version-check discussed earlier:

checking kernel source version... 6.8.0-rc3
configure: error:
	*** Cannot build against kernel version 6.8.0-rc3.
	*** The maximum supported kernel version is 6.7.
	*** Use --disable-supported-linux-version-check to bypass this check.

@robn robn force-pushed the enforce-max-kernel-version branch from 08b6eff to 43d2c39 Compare July 17, 2024 23:32
@robn
Member Author

robn commented Jul 17, 2024

Ahh sorry @rrevans, I had missed your comment.

I've been using "experimental" through this PR to mean a vague statement of "we are not ready to commit to OpenZFS' traditional guarantees of reliability and stability, but it's probably fine". Maybe in other contexts that would be "beta" or something. Regardless, I'll agree that it's not specific enough to "enforce".

But, I would still prefer to keep --enable-linux-experimental, because it sounds a little bit scary, and my hope is to make it clear that the user is making a choice to run with possibly reduced safety, and they should take care. It's an explicit opt-in and statement of intent, rather than --disable-supported-linux-version-check, which is easy to read as a workaround for something without being obviously dangerous.

So, update pushed. I've removed Linux-Maximum-Experimental: and associated tests, but kept the rest, so:

  • Linux-Maximum: is now enforced, unless configure is run with...
  • --enable-linux-experimental, then the maximum is not enforced and configure proceeds
  • If the build is against a kernel > max, then HAVE_LINUX_EXPERIMENTAL is defined, and configure emits a warning at the end of its run.
  • If HAVE_LINUX_EXPERIMENTAL is defined, extra warnings are logged when the module is loaded
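
To make this final behaviour concrete, here is a small shell sketch of reading `Linux-Maximum:` from a META-style file and deciding what would end up in `zfs_config.h`. The helper names and the sample file contents are assumptions for illustration; the real logic is in the configure macros:

```shell
# meta_field FILE KEY: print the value of the first "KEY: value" line.
meta_field() {
    sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1
}

# experimental_status KVER MAX ENABLE(yes|no)
# Prints the zfs_config.h line that would result, or returns non-zero
# when configure would abort. MAX is assumed to be major.minor.
experimental_status() {
    _v=${1%%-*}               # drop any suffix such as -rc3
    _rest=${_v#*.}
    _k=$(( ${_v%%.*} * 1000 + ${_rest%%.*} ))
    _m=$(( ${2%%.*} * 1000 + ${2#*.} ))
    if [ "$_k" -le "$_m" ]; then
        echo "/* #undef HAVE_LINUX_EXPERIMENTAL */"
    elif [ "$3" = "yes" ]; then
        echo "#define HAVE_LINUX_EXPERIMENTAL 1"
    else
        return 1
    fi
}

# Sample META-style content; the values are illustrative only.
meta=$(mktemp)
printf '%s\n' 'Name:          zfs' 'Linux-Maximum: 6.8' > "$meta"
max=$(meta_field "$meta" Linux-Maximum)
rm -f "$meta"

experimental_status 6.8.2 "$max" yes       # within max: the flag is a no-op
experimental_status 6.9.0-rc3 "$max" yes   # above max, opted in
experimental_status 6.9.0-rc3 "$max" no || echo "configure would abort"
```

The first call shows why always passing the flag is harmless: the define only appears when the kernel is actually above the maximum.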

@robn robn force-pushed the enforce-max-kernel-version branch from 43d2c39 to cc6b9d7 Compare July 26, 2024 01:40
@robn robn force-pushed the enforce-max-kernel-version branch from cc6b9d7 to be32257 Compare August 23, 2024 05:10
@robn
Member Author

robn commented Aug 23, 2024

@tonyhutter what do you reckon?

@robn robn force-pushed the enforce-max-kernel-version branch from be32257 to b4f3c4e Compare September 4, 2024 22:38
@robn robn force-pushed the enforce-max-kernel-version branch from b4f3c4e to 1048eec Compare September 16, 2024 10:11
Contributor

@behlendorf behlendorf left a comment

Personally, I like --enable-linux-experimental and the slightly scary warning. It makes it quite clear what you're opting in to.

- AS_VERSION_COMPARE([$kernsrcver], [$ZFS_META_KVER_MIN], [
- AC_MSG_ERROR([
+ AX_COMPARE_VERSION([$kernsrcver], [ge], [$ZFS_META_KVER_MIN], [], [
+ AC_MSG_ERROR([
Contributor

Now that the pre-4.18 kernel changes have been merged it looks like there's only one place left which uses AS_VERSION_COMPARE. Once this merges we should switch the AS_VERSION_COMPARE call in config/kernel-blkdev.m4 over to AX_COMPARE_VERSION.

Member Author

Oh yeah, good spot. Noted.

META lists the maximum kernel version we consider to be fully supported.
However, we don't enforce this.

Sometimes we ship experimental patches for a newer kernel than we're
ready to support or, less often, we compile just fine against a newer
kernel. Invariably, something doesn't quite work properly, and it's
difficult for users to understand that they're actually running against
a kernel that we're not yet ready to support.

This commit tries to improve this situation. First, it simply enforces
Linux-Maximum, by having configure bail out if you try to compile
against a newer version than that.

Then, it adds the --enable-linux-experimental switch to configure. When
supplied, this disables enforcing the maximum version, allowing the user
to attempt to build against a kernel with version higher than
Linux-Maximum.

Finally, if the switch is supplied _and_ configure is run against a
higher kernel version, it shows a big warning message when configure
finishes, and defines HAVE_LINUX_EXPERIMENTAL for the build. This allows
us to add code to modify runtime behaviour as well.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/

Since the person using the kernel may not be the person who built it,
show a warning at module load too, in case they aren't aware that it
might be weird.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
@robn robn force-pushed the enforce-max-kernel-version branch from 1048eec to 4623240 Compare September 19, 2024 02:31
@robn
Member Author

robn commented Sep 19, 2024

Rebased, retested, pushed.

@tonyhutter
Contributor

--enable-linux-experimental makes it sound like the version of the Linux kernel itself is experimental. Like an experimental RC kernel or a kernel from Linus's master branch.

I like --disable-kernel-version-safety-check since:

  1. It describes exactly what it's doing
  2. It implies it's unsafe

I would also mention the flag in the build failure as an aid to the user, like:

       *** Cannot build against kernel version $kernsrcver.
       *** The maximum supported kernel version is $ZFS_META_KVER_MAX.
       *** Use --disable-kernel-version-safety-check to bypass this check.

@yurikoles

I would also mention the flag in the build failure as an aid to the user

Let people make at least a basic effort to research how to bypass this. Your suggestion sounds to me like a warning:

Kids, please don't play with fire! But, if you want to do so, lighter is there, and fuel is elsewhere.

@tonyhutter
Contributor

Kids, please don't play with fire! But, if you want to do so, lighter is there, and fuel is elsewhere.

😄 I wish it was that exciting! 99% of the time it's just not going to build.

Honestly, if you're building ZFS from source with a bleeding edge kernel then you're probably used to playing with fire. And if that isn't the case, the new warning message does make things clear:

	You are building OpenZFS against Linux version $kernsrcver.

	This combination is considered EXPERIMENTAL by the OpenZFS project.
	Even if it appears to build and run correctly, there may be bugs that
	can cause SERIOUS DATA LOSS.

	YOU HAVE BEEN WARNED!

	If you choose to continue, we'd appreciate if you could report your
	results on the OpenZFS issue tracker at:

	    https://github.com/openzfs/zfs/issues/new

	Your feedback will help us prepare a new OpenZFS release that supports
	this version of Linux.

@robn
Member Author

robn commented Sep 20, 2024

Generally, I prefer "enable danger mode" over "disable safe mode", because I really want it to be clear that the user is opting into something bad, and all the info they need is in front of them. Similarly, I don't like "press the red button to turn off the safety", because the user hasn't had to make any effort to think about the situation and assess the risks.

I won't die on this hill; having the option at all is more important to me than how it's named, and I know that in practice the risks in this case are likely low - it's more likely that something just doesn't work than does the wrong thing. But as a rule, I do think that when it comes to anything where the cost of a mistake is data loss, we should be very very clear what is happening.

So, I will not argue against rewording --enable-linux-experimental as long as it remains "enable danger mode", but I may argue against --disable-kernel-version-safety-check or like it, because it is "disable safe mode". But I'm also not an idiot; ultimately I just want to be sure the user understands what they're getting, so good words that work to that will be fine.

(and of course I can just be straight up overruled; at the end of the day I'm just Some Guy 🐶)

Contributor

@tonyhutter tonyhutter left a comment

@robn I hear what you're saying, let's just stick with --enable-linux-experimental then 👍

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Sep 23, 2024
behlendorf pushed a commit that referenced this pull request Sep 23, 2024
Since the person using the kernel may not be the person who built it,
show a warning at module load too, in case they aren't aware that it
might be weird.

Reviewed-by: Robert Evans <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #15986