Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't specify Data.List.NonEmpty.{init,last} in terms of partial ... .List.{init,last}` #293

Closed
mpilgrem opened this issue Oct 7, 2024 · 14 comments
Labels
approved Approved by CLC vote base-4.22 Implemented in base-4.22 (GHC 9.14)

Comments

@mpilgrem
Copy link

mpilgrem commented Oct 7, 2024

Motivation:

At present, Data.List.NonEmpty has:

import qualified GHC.Internal.Data.List           as List

-- | Extract the last element of the stream.
last :: NonEmpty a -> a
last ~(a :| as) = List.last (a : as)

-- | Extract everything except the last element of the stream.
init :: NonEmpty a -> [a]
init ~(a :| as) = List.init (a: as)

I propose:

last :: NonEmpty a -> a
last (a :| []) = a
last (_ :| (a : as)) = last (a :| as)

init :: NonEmpty a -> [a]
init (_ :| []) = []
init (a1 :| (a2 : as)) = a1 : init (a2 :| as)

Although this could be written more succinctly with the functions promised by the Data.Foldable1.Foldable1 class (see here), module Data.Foldable1 itself imports Data.List.NonEmpty.

@mpilgrem
Copy link
Author

mpilgrem commented Oct 7, 2024

There is one other use of List.{init,last} in existing Data.List.NonEmpty, namely:

-- | The 'tails1' function takes a 'NonEmpty' stream @xs@ and returns all the
-- non-empty suffixes of @xs@, starting with the longest.
--
-- > tails1 (1 :| [2,3]) == (1 :| [2,3]) :| [2 :| [3], 3 :| []]
-- > tails1 (1 :| []) == (1 :| []) :| []
--
-- @since 4.18
tails1 :: NonEmpty a -> NonEmpty (NonEmpty a)
tails1 =
  -- fromList is an unsafe function, but this usage should be safe, since:
  -- * `tails xs = [xs, tail xs, tail (tail xs), ..., []]`
  -- * If `xs` is nonempty, it follows that `tails xs` contains at least one nonempty
  --   list, since `head (tails xs) = xs`.
  -- * The only empty element of `tails xs` is the last one (by the definition of `tails`)
  -- * Therefore, if we take all but the last element of `tails xs` i.e.
  --   `init (tails xs)`, we have a nonempty list of nonempty lists
  fromList . Prelude.map fromList . List.init . List.tails . Foldable.toList

To eliminate that also, I think you could specify:

tails1 :: NonEmpty a -> NonEmpty (NonEmpty a)
tails1 (a :| []) = (a :| []) :| []
tails1 as@(_ :| (a2 : as')) = as <| tails1 (a2 :| as')

@mixphix
Copy link
Collaborator

mixphix commented Oct 7, 2024

I'm open to removing uses of partial functions from Data.List from Data.List.NonEmpty. This would promote good programming styles within base, of which I am in favour. I just want to gently remind the proposal author that votes occur on merge requests.

@mpilgrem
Copy link
Author

mpilgrem commented Oct 7, 2024

@mixphix, am I doing something that is not in accordance with PROPOSALS.md? I understood: 'raise issue first', 'implementation only if requested'.

@Bodigrim
Copy link
Collaborator

Bodigrim commented Oct 8, 2024

To eliminate that also, I think you could specify:

It would be simpler to use Data.List.tails1, to be released in base-4.21.

In the same vein, one can use Data.List.unsnoc to impement Data.List.NonEmpty.{init,last}. Not sure it it's worth it and does not cause circular dependency.

@mpilgrem
Copy link
Author

mpilgrem commented Oct 8, 2024

EDIT: The following is a bad point - I had not understood correctly what was meant (see below).

If you use Data.List.tails1 :: [a] -> [NonEmpty a] to implement Data.List.NonEmpty.tails1 :: NonEmpty a -> NonEmpty (NonEmpty a), don't you still need to use a partial function (fromList :: [NonEmpty a] -> NonEmpty (NonEmpty a)) - and apply the 'off-compiler' reasoning that it is safe to do so reflected in the code comment to the existing Data.List.NonEmpty.tails1? Whereas the 'recursive' specification uses types to check that all is well?

@Bodigrim
Copy link
Collaborator

Bodigrim commented Oct 8, 2024

I meant something like

Data.List.NonEmpty.tails1 :: NonEmpty a -> NonEmpty (NonEmpty a)
Data.List.NonEmpty.tails1 xs = xs :| Data.List.tails1 (Data.List.NonEmpty.tail xs)

@Bodigrim
Copy link
Collaborator

Bodigrim commented Oct 9, 2024

@mpilgrem I think it would be helpful for futher evaluation if you prepare a GHC MR.

@Bodigrim Bodigrim added the awaits-MR No GHC MR (https://gitlab.haskell.org/ghc/ghc/-/merge_requests) has been raised yet label Oct 9, 2024
@mpilgrem
Copy link
Author

mpilgrem commented Oct 9, 2024

@Bodigrim, I've tried to so so with https://gitlab.haskell.org/ghc/ghc/-/merge_requests/13402.

I targetted the master branch. I don't know if that is correct; I do not know how the GHC project uses branches.

@Bodigrim Bodigrim removed the awaits-MR No GHC MR (https://gitlab.haskell.org/ghc/ghc/-/merge_requests) has been raised yet label Oct 17, 2024
@Bodigrim
Copy link
Collaborator

Bodigrim commented Nov 1, 2024

Let's compare Core using {-# OPTIONS_GHC -O2 -ddump-simpl -dsuppress-all -dno-suppress-type-signatures #-} before and after https://gitlab.haskell.org/ghc/ghc/-/merge_requests/13402.

last

Before:

init :: forall a. NonEmpty a -> [a]
init
  = \ (@a_a11E) (ds_X1 :: NonEmpty a_a11E) ->
      case ds_X1 of { :| a1_aD2 as_aD3 ->
      GHC.Internal.List.init1 a1_aD2 as_aD3
      }

where GHC.Internal.List.init1 is

  init' _ []     = []
  init' y (z:zs) = y : init' z zs

Two things to notice:

  • Despite lazy pattern match in last ~(a :| as) it has no effect indeed: Core is strict in the constructor.
  • With -O2 GHC was able to specialise constructors and actually avoids any references to error.

After:

Rec {
$winit :: forall a. a -> [a] -> [a]
$winit
  = \ (@a_sEm) (ww_sEp :: a_sEm) (ww1_sEq :: [a_sEm]) ->
      case ww1_sEq of {
        [] -> [];
        : a2_aBa as_aBb -> : ww_sEp ($winit a2_aBa as_aBb)
      }
end Rec }

init :: forall a. NonEmpty a -> [a]
init
  = \ (@a_sEm) (ds_sEn :: NonEmpty a_sEm) ->
      case ds_sEn of { :| ww_sEp ww1_sEq -> $winit ww_sEp ww1_sEq }

Here $winit is equivalent to GHC.Internal.List.init1 above.

Conclusion:

  • We did not lose any performance in Core.
  • We no longer use unsafe functions in the surface Haskell.

init

Before:

Rec {
last_$spoly_go1 :: forall a. a -> [a] -> a -> a
last_$spoly_go1
  = \ (@a_a11U) (sc_s1jP :: a_a11U) (sc1_s1jQ :: [a_a11U]) _ ->
      poly_go1_r1km sc1_s1jQ sc_s1jP

poly_go1_r1km :: forall a. [a] -> a -> a
poly_go1_r1km
  = \ (@a_a11U) (ds_a1ir :: [a_a11U]) (eta_B0 :: a_a11U) ->
      case ds_a1ir of {
        [] -> eta_B0;
        : y_a1iu ys_a1iv -> poly_go1_r1km ys_a1iv y_a1iu
      }
end Rec }

last :: forall a. NonEmpty a -> a
last
  = \ (@a_a11U) (ds_X1 :: NonEmpty a_a11U) ->
      last_$spoly_go1
        (case ds_X1 of { :| a1_alR as_alS -> a1_alR })
        (case ds_X1 of { :| a1_alR as_alS -> as_alS })
        last1

where last1 is essentially error.

This looks bad:

  • last pattern matches on case ds_X1 of twice because of lazy pattern matching in the surface Haskell. However, last_$spoly_go1 would not return anything lazily before forcing its first argument, so we just waste our time.
  • last_$spoly_go1 takes three arguments only to discard the third one immediately.

After:

Rec {
$wlast :: forall a. a -> [a] -> a
$wlast
  = \ (@a_sEd) (ww_sEg :: a_sEd) (ww1_sEh :: [a_sEd]) ->
      case ww1_sEh of {
        [] -> ww_sEg;
        : a1_alJ as_alK -> $wlast a1_alJ as_alK
      }
end Rec }

last :: forall a. NonEmpty a -> a
last
  = \ (@a_sEd) (ds_sEe :: NonEmpty a_sEd) ->
      case ds_sEe of { :| ww_sEg ww1_sEh -> $wlast ww_sEg ww1_sEh }

This is nice and clean, I cannot spot any inefficiency.

Conclusion:

  • We improved the performance of Core.
  • We no longer use unsafe functions in the surface Haskell.

tails1

Before:

Rec {
tails3 :: forall a. [a] -> [NonEmpty a]
tails3
  = \ (@a_s1iU) (ds_a1ay :: [a_s1iU]) ->
      case ds_a1ay of {
        [] -> [];
        : x_a1aB xs_a1aC -> : (:| x_a1aB xs_a1aC) (tails3 xs_a1aC)
      }
end Rec }

tails1 :: forall a. NonEmpty a -> NonEmpty (NonEmpty a)
tails1
  = \ (@a_s1iU) (x_s1iV :: NonEmpty a_s1iU) ->
      case tails3 ($fFoldableNonEmpty_$ctoList x_s1iV) of {
        [] -> case tails2 of {};
        : a1_a1ap as_a1aq -> :| a1_a1ap as_a1aq
      }

where tails2 is essentially error.

This is not too bad, but:

  • Passing around dead code for error does not help anyone.
  • Instead of simply pattern matching on x_s1iV we go through $fFoldableNonEmpty_$ctoList (not even inlined!) only to pattern match on the resulting list immediately.

After:

tails1 :: forall a. NonEmpty a -> NonEmpty (NonEmpty a)
tails1
  = \ (@a_aC0) (xs_aBc :: NonEmpty a_aC0) ->
      :| xs_aBc (case xs_aBc of { :| ds1_aE2 as_aE3 -> Data.List.tails1 as_aE3 })

All heavy lifting is outsourced to Data.List.tails1, which we discussed at length recently in #252 and implemented as efficiently as possibly.

Conclusion:

  • We improved the performance of Core.
  • We no longer use unsafe functions in the surface Haskell.

Altogether I see no drawbacks in the proposal: on contrary, it's an improvement in every aspect.

@Bodigrim
Copy link
Collaborator

Bodigrim commented Nov 2, 2024

Dear CLC members, let's vote on the proposal to avoid partial functions in the definitions of Data.List.NonEmpty.{init,last,tails1}. The MR is available at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/13402. This is not a breaking change.

@tomjaguarpaw @mixphix @velveteer @parsonsmatt @angerman @hasufell


+1 from me, see my analysis above.

@Bodigrim Bodigrim added the vote-in-progress The CLC vote is in progress label Nov 2, 2024
@mixphix
Copy link
Collaborator

mixphix commented Nov 2, 2024

+1, this is great!

@hasufell
Copy link
Member

hasufell commented Nov 3, 2024

+1

1 similar comment
@tomjaguarpaw
Copy link
Member

+1

@Bodigrim
Copy link
Collaborator

Bodigrim commented Nov 3, 2024

Thanks all, that's enough votes to approve.

@Bodigrim Bodigrim closed this as completed Nov 3, 2024
@Bodigrim Bodigrim added approved Approved by CLC vote and removed vote-in-progress The CLC vote is in progress labels Nov 3, 2024
hubot pushed a commit to ghc/ghc that referenced this issue Nov 5, 2024
See haskell/core-libraries-committee#293

`List.init` had already been driven out of `tails1` by 21fc180 but this specification also avoided partial `fromList`, so I preferred it.

The `changelog.md` for `base` is updated, with an entry added under `base-4.22.0.0`.
hubot pushed a commit to ghc/ghc that referenced this issue Nov 7, 2024
See haskell/core-libraries-committee#293

`List.init` had already been driven out of `tails1` by 21fc180 but this specification also avoided partial `fromList`, so I preferred it.

The `changelog.md` for `base` is updated, with an entry added under `base-4.22.0.0`.
hubot pushed a commit to ghc/ghc that referenced this issue Nov 7, 2024
See haskell/core-libraries-committee#293

`List.init` had already been driven out of `tails1` by 21fc180 but this specification also avoided partial `fromList`, so I preferred it.

The `changelog.md` for `base` is updated, with an entry added under `base-4.22.0.0`.
@Bodigrim Bodigrim added the base-4.22 Implemented in base-4.22 (GHC 9.14) label Dec 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Approved by CLC vote base-4.22 Implemented in base-4.22 (GHC 9.14)
Projects
None yet
Development

No branches or pull requests

5 participants