Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a suitable {-# WARNING in "x-partial" ... #-} to Data.List.{init,last} #292

Open
mpilgrem opened this issue Oct 6, 2024 · 17 comments
Open

Comments

@mpilgrem
Copy link

mpilgrem commented Oct 6, 2024

Following the past decisions in respect of Data.List.head and Data.List.tail in:

implemented in base-4.19.0.0 (GHC 9.8.1), which also added Data.List.unsnoc, I propose, for consistency, that the same idea is extended to Data.List.init and Data.List.last. Their Haddock documentation has warned that they are partial and suggested alternatives since base-4.17.0.0 (GHC 9.4.1).

The basis for the proposal is purely consistency of treatment of Data.List.{head,tail,init,last}. It seems to me that the principal merits and drawbacks applicable to head and tail, discussed at some length in the past approved proposals, are equally applicable to init and last.

@Bodigrim, on introducing #87 (at #87 (comment)), gave as a reason that it was not extended then to init or last that doing so would require Data.List.unsnoc first. That barrier no longer applies.

In case it is relevant, I note:

  1. that other (total) functions in base are defined in terms of Data.List.init or Data.List.last. For example, Data.List.NonEmpty.init:
    init :: NonEmpty a -> [a]
    init ~(a :| as) = List.init (a : as)
  2. that Cabal's autogenerated Paths_<package_name>.hs module makes use of Data.List.last - see https://hackage.haskell.org/package/Cabal-3.14.0.0/src/src/Distribution/Simple/Build/PathsModule/Z.hs.

EDIT: The resolution of this issue may depend on the resolution of the following issue, given that this issue provides advice (as does the existing Haddock documentation) about the use of unsnoc:

@phadej
Copy link

phadej commented Oct 6, 2024

The Paths_*.hs usage originates from filepath https://github.com/haskell/filepath/blob/e794e7376755fdaecae6fc23bd34556d46c46307/System/FilePath/Internal.hs#L747 (joinFilePath is "inlined" to have only dependencies on base).

It feels that

foo [] = something
foo xs = somethingElseWith (last xs)

it not uncommon pattern in older code, because writing that safely was quite inconvenient using just base. Now we can rewrite that as

foo [] = something
foo (x : xs) = somethingElseWith (NonEmpty.last (x :| xs))

i.e. above Data.List.init could be written in terms of Data.List.NonEmpty.init, not other way around; That refactoring may cause some module reorganization to avoid circular dependencies.


It would be really helpful to see in how many modules in base one would actually need to disable x-partial warning, if it's added to init and last. In other words, make a MR.

@Bodigrim
Copy link
Collaborator

Bodigrim commented Oct 6, 2024

Nowadays one can write foo = maybe something (somethingElseWith . snd) . Data.List.unsnoc.

@phadej
Copy link

phadej commented Oct 6, 2024

unsnoc is since base-4.19, i.e. GHC-9.8; quite recent addition. On the other hand from #228 (comment)

In April 2024 CLC approved #262 to clarify that we do not recognise warnings as breaking changes, dilluting meaning of 3-release policy even further.


In that light, simply adding a proposed warning seems to be inline with recent-ish CLC views. filepath and Cabal and others just need to adapt, or disable warnings. No big deal.

@Bodigrim
Copy link
Collaborator

Bodigrim commented Oct 6, 2024

We are past GHC 9.12 / base-4.21 fork, so the proposed warning can land no sooner than base-4.22, making unsnoc compatible with the 3-release policy, even in its stricter version.

@Bodigrim
Copy link
Collaborator

Bodigrim commented Nov 5, 2024

@mpilgrem would you like to prepare an MR for this?
(Besides boot libraries, you'd have to clean up init and last throughout GHC itself or, alternatively, tweak GHC.Prelude.Basic to re-export them without warnings)

@mpilgrem
Copy link
Author

mpilgrem commented Nov 7, 2024

@Bodigrim, I have created:

On avoiding warnings within the GHC repository, I added {-# OPTIONS_GHC -Wno-x-partial #-} in various places, following the existing example for head used in module GHC.Internal.Control.Monad.Fix. I think the task of the respository changing code to heed the warnings in base is a distinct task.

@mpilgrem
Copy link
Author

mpilgrem commented Nov 9, 2024

@Bodigrim, may I ask for a bit a advice? I have run into package time using last, so the CI pipeline of my ghc/ghc respository MR will not pass.

There is a time repository at https://github.com/haskell/time (referenced on Hackage). However, it seems to me that the https://gitlab.haskell.org/ghc/ghc/ repository may refer to https://gitlab.haskell.org/ghc/packages/time at a specific commit (because there is a time @ e5c5d198 file under https://gitlab.haskell.org/ghc/ghc/libraries that goes there when I click it).

I am not sure what I should be proposing to change, to try to make the changes that will allow GHC's CI to pass for my MR. I think that all is needed, in the immediate instance, is for the relevant module in time to have the addition of:

{-# LANGUAGE CPP #-}

#if MIN_VERSION_GLASGOW_HASKELL(9,8,1,0)
-- For init and last in trimTrailing
{-# OPTIONS_GHC -Wno-x-partial #-}
#endif

@Bodigrim
Copy link
Collaborator

Bodigrim commented Nov 9, 2024

  1. You don't need CPP, the incantation is {-# OPTIONS_GHC -Wno-unrecognized-warning-flags -Wno-x-partial #-}.

  2. https://gitlab.haskell.org/ghc/packages/time is a mirror of https://github.com/haskell/time, it will be autoupdated once the source repo is.

  3. Instead of updating time, you can pass -Wno-x-partial via hadrian/src/Settings/Warnings.hs.

@meooow25
Copy link

meooow25 commented Nov 10, 2024

I see that the proposed warning directs users to unsnoc, but there is a performance penalty to using unsnoc in place of init or last, because of all the Maybes and (,)s. It is particularly bad for last because allocations increase from zero to O(n).

If you would like to check the Core: https://play.haskell.org/saved/ySiaRuho

@mpilgrem
Copy link
Author

For information: the origin of the text of the proposed compiler warning is the existing warning in the Haddock documentation for init and last in base >= 4.19.0.0 (which is unqualified as to performance).

@mpilgrem
Copy link
Author

mpilgrem commented Nov 11, 2024

I have no experience in measuring, or optimising, Haskell code performance, but (in the light of @meooow25's comments) I tried to apply the criterion library in the following example:

module Main (main) where

import Criterion.Main
import Data.List ( unsnoc )
import Data.List.NonEmpty ( NonEmpty (..) )
import qualified Data.List.NonEmpty as NE

partial :: String -> Maybe (String, Char)
partial "" = Nothing
partial s = Just (init s, last s)

total1 :: String -> Maybe (String, Char)
total1 = unsnoc

total2 :: String -> Maybe (String, Char)
total2 "" = Nothing
total2 (c:cs) = let s = c :| cs in Just (NE.init s, NE.last s)

main :: IO ()
main = defaultMain
  [ bgroup "last" [ bench "partial" $ nf partial "Hello! World"
                  , bench "total1" $ nf total1 "Hello! World"
                  , bench "total2" $ nf total2 "Hello! World"
                  ]
  ]

The results were typically like the following:

benchmarking last/partial
time                 164.2 ns   (161.4 ns .. 169.2 ns)
                     0.992 R²   (0.988 R² .. 0.996 R²)
mean                 179.7 ns   (174.9 ns .. 185.7 ns)
std dev              18.33 ns   (16.79 ns .. 20.59 ns)
variance introduced by outliers: 91% (severely inflated)

benchmarking last/total1
time                 392.6 ns   (377.3 ns .. 405.8 ns)
                     0.993 R²   (0.989 R² .. 0.996 R²)
mean                 368.4 ns   (361.7 ns .. 377.5 ns)
std dev              26.58 ns   (19.20 ns .. 36.09 ns)
variance introduced by outliers: 82% (severely inflated)

benchmarking last/total2
time                 167.8 ns   (164.8 ns .. 171.7 ns)
                     0.996 R²   (0.994 R² .. 0.999 R²)
mean                 169.3 ns   (166.8 ns .. 174.0 ns)
std dev              11.74 ns   (7.172 ns .. 20.49 ns)
variance introduced by outliers: 82% (severely inflated)

I was surprised that unsnoc ('total1') was so much worse than 'total2'. It made me wonder why 'unsnoc' is not specified like 'total2'. 'unsnoc' has a code comment: -- Expressing the recursion via 'foldr' provides for list fusion. Perhaps that has something to do with it or perhaps my example does not reflect how init and last are used 'in the wild'.

The introduction of unsnoc was discussed here:

@mpilgrem
Copy link
Author

I repeated my criterion experiment with polymorphic functions and, instead of "Hello! World", a list :: [Int]; list = replicate 10000 1. The results had the same pattern:

benchmarking last/partial
time                 78.17 μs   (75.80 μs .. 80.59 μs)
                     0.994 R²   (0.992 R² .. 0.997 R²)
mean                 77.59 μs   (76.08 μs .. 79.32 μs)
std dev              5.037 μs   (4.412 μs .. 5.981 μs)
variance introduced by outliers: 66% (severely inflated)

benchmarking last/total1
time                 332.6 μs   (318.8 μs .. 349.7 μs)
                     0.991 R²   (0.986 R² .. 0.997 R²)
mean                 317.6 μs   (313.6 μs .. 325.3 μs)
std dev              18.25 μs   (12.06 μs .. 28.19 μs)
variance introduced by outliers: 53% (severely inflated)

benchmarking last/total2
time                 79.66 μs   (77.45 μs .. 81.44 μs)
                     0.996 R²   (0.994 R² .. 0.998 R²)
mean                 79.62 μs   (78.34 μs .. 81.08 μs)
std dev              4.555 μs   (3.885 μs .. 5.611 μs)
variance introduced by outliers: 60% (severely inflated)

@mpilgrem
Copy link
Author

By way of update, the merge request pipeline for the MR is now all green (following some false starts).

The existing unsnoc performance issue remains. @Bodigrim, as the originator of #165, may I draw your attention to the discussion above? In short, if the tests above are not wrong, it seems existing unsnoc is somewhat slower than the 'naïve':

import Data.List.NonEmpty ( NonEmpty (..) )
import qualified Data.List.NonEmpty as NE

unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc (x:xs) = let l = x :| xs in Just (NE.init l, NE.last l)

If they are not wrong, it may be that there should be a new issue for the CLC on that topic, with this issue dependent on the resolution of that issue.

@Bodigrim
Copy link
Collaborator

I see that the proposed warning directs users to unsnoc, but there is a performance penalty to using unsnoc in place of init or last, because of all the Maybes and (,)s. It is particularly bad for last because allocations increase from zero to O(n).

These are cheap, short-living allocations in the nursery, so they are not materially worse than O(n) time.


@mpilgrem The existing implementation of unsnoc wraps and unwraps both Maybe and (,) on each iteration, which is expensive and probably explains your benchmarks. If we make a special case for unsnoc [] = Nothing (thus sacrificing list fusion), we can avoid wrapping / unwrapping of Maybe. If we additionally sacrifice the requirement of traversing the list only once, we can indeed avoid wrapping / unwrapping of (,) as in Just (NE.init l, NE.last l) above.

Sacrificing list fusion is probably reasonable: in cases where list fusion used to happen we will lose one allocation on materializing (:) but save one allocation on Maybe, which is no worse; and we simply win in cases where fusion didn't use to happen.

Traversing the list twice is less palatable to my taste, because it can double maximum memory residence: we would materialise the input in full while computing NE.init l, then keep both l and NE.init l in memory while computing NE.last l (after which l can be finally freed).

Could you possibly benchmark an implementation of unsnoc with unsnoc [] = Nothing; unsnoc xs = Just $ go xs? Is it far off the mark?

@mpilgrem
Copy link
Author

@Bodigrim, I did not follow what go function you intended, but I tried this:

unsnocGo :: [a] -> Maybe ([a], a)
unsnocGo [] = Nothing
unsnocGo (x : xs) = Just $ go (x :| xs)
 where
  go (y :| []) = ([], y)
  go (y1 :| (y2 : ys)) = (\((a, b)) -> (y1 : a, b)) (go (y2 :| ys))
{-# INLINABLE unsnocGo #-}

I also noticed, by accident, for the base unsnoc if ~(a, b) is replaced by just (a, b) it is about 15% faster.

My complete experiment (with GHC 9.8.3) is below. I am now using a long list ([1 .. 10000] :: Int). I copied the unsnoc from base, rather than importing it.

module Main (main) where

import Criterion.Main
import Data.List.NonEmpty ( NonEmpty (..) )
import qualified Data.List.NonEmpty as NE

partial :: [a] -> Maybe ([a], a)
partial [] = Nothing
partial s = Just (init s, last s)

naive ::  [a] -> Maybe ([a], a)
naive [] = Nothing
naive (x : xs) = let l = x :| xs in Just ( NE.init l, NE.last l)

unsnoc :: [a] -> Maybe ([a], a)
unsnoc = foldr (\x -> Just . maybe ([], x) (\(~(a, b)) -> (x : a, b))) Nothing
{-# INLINABLE unsnoc #-}

unsnocNoTilde :: [a] -> Maybe ([a], a)
unsnocNoTilde = foldr (\x -> Just . maybe ([], x) (\((a, b)) -> (x : a, b))) Nothing
{-# INLINABLE unsnocNoTilde #-}

unsnocGo :: [a] -> Maybe ([a], a)
unsnocGo [] = Nothing
unsnocGo (x : xs) = Just $ go (x :| xs)
 where
  go (y :| []) = ([], y)
  go (y1 :| (y2 : ys)) = (\((a, b)) -> (y1 : a, b)) (go (y2 :| ys))
{-# INLINABLE unsnocGo #-}

list :: [Int]
list = [1 .. 10000]

main :: IO ()
main = defaultMain
  [ bgroup "unsnoc" [ bench "partial" $ nf partial list
                    , bench "naive" $ nf naive list
                    , bench "unsnoc" $ nf unsnoc list
                    , bench "unsnocNoTilde" $ nf unsnocNoTilde list
                    , bench "unsnocGo" $ nf unsnocGo list
                    ]
  ]

Typical results are:

benchmarking unsnoc/partial
time                 86.54 μs   (83.61 μs .. 89.08 μs)
                     0.992 R²   (0.989 R² .. 0.995 R²)
mean                 87.59 μs   (85.35 μs .. 91.38 μs)
std dev              10.66 μs   (5.786 μs .. 17.82 μs)
variance introduced by outliers: 88% (severely inflated)

benchmarking unsnoc/naive
time                 83.51 μs   (81.15 μs .. 86.66 μs)
                     0.994 R²   (0.993 R² .. 0.996 R²)
mean                 85.34 μs   (83.90 μs .. 86.86 μs)
std dev              4.928 μs   (4.369 μs .. 5.827 μs)
variance introduced by outliers: 60% (severely inflated)

benchmarking unsnoc/unsnoc
time                 320.9 μs   (315.8 μs .. 326.5 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 320.9 μs   (317.6 μs .. 325.3 μs)
std dev              12.70 μs   (8.529 μs .. 16.35 μs)
variance introduced by outliers: 35% (moderately inflated)

benchmarking unsnoc/unsnocNoTilde
time                 269.5 μs   (265.0 μs .. 278.3 μs)
                     0.996 R²   (0.991 R² .. 0.999 R²)
mean                 266.3 μs   (263.9 μs .. 270.5 μs)
std dev              11.94 μs   (8.054 μs .. 20.08 μs)
variance introduced by outliers: 42% (moderately inflated)

benchmarking unsnoc/unsnocGo
time                 115.9 μs   (114.1 μs .. 117.9 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 114.8 μs   (113.9 μs .. 116.4 μs)
std dev              3.864 μs   (2.812 μs .. 5.289 μs)
variance introduced by outliers: 32% (moderately inflated)

That is, the 'naive' total version is about as quick as a 'partial' version (as before). The copy of base unsnoc is well over three times slower (as before). Dropping the tilde saves about 15% in time. Finally, my 'go' version is about 1.4 times slower than the 'naive' total version.

@mpilgrem
Copy link
Author

@meooow25, @Bodigrim I have separated out the question of the time performance of existing Data.List.unsnoc into its own issue at:

@Bodigrim
Copy link
Collaborator

@mpilgrem given that #307 was withdrawn, how would you like to proceed here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants