todo list now in a separate file, instead of carrying it over in the log
every couple of days.
large & detailed final report TODO list.
Title
- think about, maybe change
Abstract
- Write last
Introduction
- Background: same content but nicer
- Problem setting: a clear, short statement
- Related work:
  incorporate Lutter's continuous fitted value iteration?
  and a few others about RL + improving with TO or similar (BC, IL, ...)
  disclaimer that here "dynamic programming" is used in a broad sense
  (PINN, infinite-dimensional LP, SoS, etc.)
  place "level set methods" less awkwardly?
  make a separate section for the backward integration/backward
  reachability material
Fundamentals
- find a nicer place for the regularity assumptions?
- manifolds: shorten if possible
- active learning: give to lenart & bhavya; this probably sounds amateurish
- Sobolev training: also shorten? esp. the manifolds & uncertainty-estimate
  part. split? move some parts to implementation
Proposed Methods
- importantly: make a clear disclaimer about when concepts are introduced
  in the simplified setting (without any consideration of finite data or
  smooth approximation), where they are easier to understand and reason
  about, and when we make the step to the more realistic setting.
- go over 3.1 again, but only after another day of distance from it.
- clean up 3.2.7 training & pruning
Results:
- standardise plots. do the same thing for both experiments.
  (line plot; change scatter/cdf plots to closed-loop / reference cost
  rather than predicted)
- sweeps: change plots to (cost - ref cost) / ref cost (percentiles of
  that), but also move them to the appendix. don't spend loads of time.
  (metric sketch below)
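rough sketch of that relative-suboptimality metric, for reference; the
function name and percentile choices are placeholders, not from the code:

    import numpy as np

    def relative_cost_percentiles(cost, ref_cost, qs=(50, 90, 99)):
        # (cost - ref cost) / ref cost per closed-loop rollout;
        # drop nan/inf rollouts so they don't distort the percentiles.
        rel = (np.asarray(cost) - np.asarray(ref_cost)) / np.asarray(ref_cost)
        rel = rel[np.isfinite(rel)]
        return {q: float(np.percentile(rel, q)) for q in qs}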
Discussion
- proper draft DONE! check whether anything is missing, and whether the
  structure makes sense
appendix:
- go over the implementation things so they roughly make sense
- write the unwritten parts at the end
~~~ from here on very old ~~~
report.
- write a structure proposal. what goes where?
- look over the examples sent.
- think about baselines. as per the meeting on 2024-04-12:
  - the obvious one: uniform proposals vs max-sigma(ish) ones
  - other ones, maybe far-fetched:
    - compare the result with another trajectory optimiser.
      easy win if we can give a poignant example of the local/global issue
      being solved
    - compare with a neural fitted Bellman recursion (discrete-time) type
      thing? is there an easy reference implementation?
    - compare with any RL algo?
theory.
- write down everything cleanly once, for a start.
- theory & problem setting are probably the easiest to start with.
- frame it as a "clean" active learning problem by separating the
  acquisition function from the sampling/optimisation method.
- grok that paper by Holzmüller; check whether our implementation is correct
- think about whether measuring uncertainty only in terms of V (not vxx)
  makes sense.
  - or not.
- address the pruning with a bit more rigour & respect. see log 2024-04-22.
implementation, most important.
- spend some time thinking about the prune&train function and try to see
  it working on an appropriately selected test case. make sure it works
  both from a theory standpoint and in practice, at least for easy cases.
- always be on the lookout for magic constants that can be eliminated
- save the results somehow and prepare for evaluation?
- evaluate and use the test error somehow? e.g. only accept the value
  level up to where the test error stops being good. if uncertainty is low
  but the test error is high, we should probably not trust the value
  estimate, but currently we do. (gating sketch after this list)
- try it on other systems.
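one possible shape for that test-error gating, as a sketch; the names
v_levels, test_err and the tolerance are made up for illustration:

    import numpy as np

    def accepted_value_level(v_levels, test_err, tol):
        # v_levels: sorted candidate value levels; test_err[i]: held-out
        # error of the fit restricted to the band up to v_levels[i].
        # accept the largest level before the test error first exceeds tol.
        bad = np.nonzero(np.asarray(test_err) > tol)[0]
        if bad.size == 0:
            return v_levels[-1]
        if bad[0] == 0:
            return None  # even the smallest level fails; trust nothing
        return v_levels[bad[0] - 1]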
implementation, mid/low importance.
(done)
- handle the manifold correctly.
  projecting back after each simulation step is probably good enough;
  thoughts in log 2024-04-02.
  update: did a basic version that projects to the manifold and cotangent
  space before each backward sol, and still ignores the manifold within
  the ODE solver. works fine; |m(x)| stays around 1e-5 at most.
  (projection sketch below)
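minimal sketch of such a projection step, assuming the manifold is the
zero set of a constraint function m; the function and step count are
illustrative, and the costate would get an analogous tangent projection:

    import jax
    import jax.numpy as jnp

    def project_to_manifold(m, x, newton_steps=3):
        # m: R^n -> R^k, returning a 1-D array of constraint residuals.
        # Newton iteration for m(x) = 0: at each step take the
        # minimum-norm correction dx solving J(x) dx = -m(x).
        for _ in range(newton_steps):
            r = m(x)
            J = jax.jacobian(m)(x)
            dx, *_ = jnp.linalg.lstsq(J, -r)
            x = x + dx
        return x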
(not done, but there is an alternative using the known-points buffer)
- reject estimated known value levels above the last value target. this
  sometimes happens in the early stages of a run, especially with large
  active-learning batch sizes. it means we "trust" the extrapolation
  because the NNs agree, but I don't think we should. the fix is a simple
  clipping operation (one-liner below).
  or not? for simple control problems, or very well-designed NNs with
  inductive biases *just right* for the solutions at hand, this might
  reasonably happen. it also fits the paradigm of assuming a well
  calibrated model and treating it as a black box initially.
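the clipping itself, as a trivial sketch (variable names hypothetical):

    import jax.numpy as jnp

    def clip_known_level(v_known_est, v_target_prev):
        # never trust an estimated known value level beyond the last
        # value target that the training data actually covered
        return jnp.minimum(v_known_est, v_target_prev)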
(not done, hack accepted)
- make "prior" without problem specific hacks?
currently we push up Vnn(x) at the point where it empirically tends to
come close to zero again, which is the equilibrium but with the quad
upside down. is there a way to find this out automatically, maybe based
on failed forward simulations? probably yes. unclear if better.
(done)
- in general, make it more "transparent": output much more data so we can
  locate failures more quickly. also some global "output verbosity"
  setting? it would probably be wise to also use wandb for this stuff at
  some point (logging sketch below)... for example:
  - "straying off" the manifold into ambient space
  - time duration / time-step stats of all trajectories
  - distance between the proposed point and the (closest point on the)
    optimal trajectory?
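what that per-iteration logging could look like; the metric names and
project name are made up, while wandb.init/wandb.log are the actual API:

    import numpy as np
    import wandb

    run = wandb.init(project="value-active-learning")  # name hypothetical

    def log_diagnostics(step, m_of_x, dts, dist_to_opt):
        # aggregate per-trajectory diagnostics into a few scalars
        wandb.log({
            "manifold/abs_m_max": float(np.max(np.abs(m_of_x))),
            "trajectories/dt_mean": float(np.mean(dts)),
            "trajectories/dt_max": float(np.max(dts)),
            "proposal/dist_to_opt_traj": float(dist_to_opt),
        }, step=step)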
- think more about the "practical tricks", such as estimating a slightly
  bigger level set of low-enough sigma & simulating forward until sigma is
  low enough, or the "buffer" marking points as optimal despite them maybe
  going above the sigma limit at some point.
  are they necessary? can we eliminate them and replace them with
  something smarter? if not, at least make sure through close monitoring
  that they don't allow any wildly unexpected behaviour.
- is there a nice way to simulate a tiny bit further and use lower-sigma
  information if possible? right now we stop as soon as the sigma
  threshold is reached, which is still not very great.
- more jit where possible.
- adapt the visualisation to the "embedded manifold" representation.
  (transforming to the old representation should work just fine)
- system state as a dict too? probably not worth the effort, though.
(done, unclear if advantageous, ditched again)
- instead of uniform backward shooting for initialisation, try to get a
  better trajectory distribution by forward-simulating with the LQR
  solution, starting from uniformly distributed points inside some LQR
  value level set (= an ellipsoid), and then backward shooting from there.
  (sampling sketch below)
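the ellipsoid sampling part might look as follows, assuming the LQR value
function is V(x) = x^T S x with cost-to-go matrix S (names illustrative):

    import jax
    import jax.numpy as jnp

    def sample_in_lqr_level_set(key, S, level, n):
        # uniform samples in {x : x^T S x <= level}: draw uniformly from
        # the unit ball, then map through the ellipsoid transform.
        dim = S.shape[0]
        k1, k2 = jax.random.split(key)
        u = jax.random.normal(k1, (n, dim))
        u = u / jnp.linalg.norm(u, axis=1, keepdims=True)
        r = jax.random.uniform(k2, (n, 1)) ** (1.0 / dim)  # ball radius law
        ball = u * r
        # with S = L L^T, x = sqrt(level) * L^{-T} z maps the unit ball
        # onto the desired sublevel set
        L = jnp.linalg.cholesky(S)
        xs = jax.scipy.linalg.solve_triangular(L.T, ball.T, lower=False)
        return jnp.sqrt(level) * xs.T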
(done, calling the function with ONLY data from the relevant value band)
- in the calibration plot, exclude nan/inf points to make the plots look
  better
(done, using projection after each step now)
- improve the manifold handling a bit. during forward simulation we do
  sometimes stray off the manifold into the ambient space, by a LOT. if
  this only happens for simulations that don't reach the lower set anyway,
  it's not a problem. otherwise we should fix it: if we evaluate the NN
  ensemble mean/std (significantly) outside of the manifold, it is
  meaningless...
- adapt the kernel length scale (in the proposal) to the current data
  extent? or use some cosine-distance-type kernel in the first place?
  (length-scale sketch below)
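one standard option for the first idea would be the median heuristic,
recomputing the RBF length scale from the current dataset each round; a
sketch, not what the code currently does:

    import jax.numpy as jnp

    def median_heuristic_lengthscale(xs):
        # set the length scale to the median pairwise distance of the
        # data, so the kernel tracks the current data extent
        d2 = jnp.sum((xs[:, None, :] - xs[None, :, :]) ** 2, axis=-1)
        d = jnp.sqrt(d2[jnp.triu_indices(xs.shape[0], k=1)])
        return jnp.median(d)

    def rbf_kernel(x, y, ell):
        return jnp.exp(-jnp.sum((x - y) ** 2) / (2.0 * ell ** 2))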
implementation, long-term "extra" goals, probably overkill though:
- find a nice "tailored" regularisation scheme. optimally it should
  alleviate the slowing progress as the value level (and the volume of the
  value bands) grows. this is probably closely connected to the intuition
  that the optimal solution of "well-behaved" (= smooth, with some amount
  of timescale separation) problems is "almost invariant" in the
  directions of the slow states.
- a nicer way to handle the u* convex optimisation.
  - a general explicit QP solver (still brute force, or with simple
    pruning of never-optimal active sets; see the sketch after this list)
- constrained ODE -> switched DAE reformulation like in that one paper
https://link.springer.com/article/10.1007/s10957-020-01744-4
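the brute-force explicit solver could enumerate active sets of a
box-constrained QP directly; a sketch assuming a positive definite
Hessian H (function and variable names made up):

    import itertools
    import numpy as np

    def box_qp_bruteforce(H, g, lb, ub, tol=1e-9):
        # minimise 0.5 u^T H u + g^T u  s.t.  lb <= u <= ub.
        # enumerate all 3^n active-set patterns: each coordinate is free
        # (0), at its lower bound (-1), or at its upper bound (+1).
        H, g, lb, ub = map(np.asarray, (H, g, lb, ub))
        n = len(g)
        for pattern in itertools.product((-1, 0, 1), repeat=n):
            s = np.array(pattern)
            free = s == 0
            u = np.where(s < 0, lb, ub).astype(float)
            if free.any():
                # stationarity on the free block:
                # H_ff u_f = -(g_f + H_fc u_c)
                rhs = -(g[free] + H[np.ix_(free, ~free)] @ u[~free])
                u[free] = np.linalg.solve(H[np.ix_(free, free)], rhs)
            grad = H @ u + g
            primal_ok = np.all(u >= lb - tol) and np.all(u <= ub + tol)
            dual_ok = np.all(grad[s < 0] >= -tol) and np.all(grad[s > 0] <= tol)
            if primal_ok and dual_ok:
                return u  # KKT point = global optimum for a convex QP
        raise RuntimeError("no KKT point found; is H positive definite?")

pruning never-optimal active sets would then amount to skipping patterns
whose KKT conditions can be ruled out in advance.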
general rough time plan until hand-in, always for the week that follows.
this middle section is still quite dumb...
~~~ 6 may ~~~
decide definitively: better pruning & smooth approximation, or nonsmooth
approximation with slightly suboptimal data too.
-> smooth approximation it is, definitely.
go over the active learning formulation & make it "proper" à la Holzmüller.
-> ditched. comparing a simple diversity mechanism with an assumed kernel
   function against uniform sampling is probably enough.
find the simplest configuration that half works, which is probably:
- conservative pruning
- a huber-type loss to "reject" outliers
- throwing away "outliers" in a second pruning step.
to that end: put the loss function params (huber width, switching of the
vx loss for "suboptimal" data) into algoparams & run a sweep on euler.
(loss sketch below)
start writing the easy-to-write parts
- Problem setting
- Related work
- Fundamentals
~~~ 13 may ~~~
define & implement 2-3 simpler 2D examples to show the basic principle.
think about 1-2 other not entirely trivial systems for examples.
  bhavya suggests: acrobot/cartpole.
  acrobot is not strictly a stabilisation problem; modify it somehow?
  very cool too: the plane landing from the LQRTrees paper...
implement some sort of metric, like closed-loop performance
somehow log the dataset and/or the nn params
start writing the "central ideas" section, or at least plan it in more
detail.
continue writing those "easy" parts.
~~~ 20 may ~~~
update for this week. spent lots of the previous time on parameter
tuning; I think I am finally half satisfied with how it works. immediate
next tasks in terms of coding:
- implement some nice way of storing the resulting dataset. could be
  literally just a pickle of the data on $SCRATCH, named after the wandb
  run id
- make a separate thing that takes this data, fits one NN, and does some
  closed-loop sims
- think about the best (reasonably feasible) "reference solution" for the
  2D examples; implement, compare
- think about interesting parameter sweeps/ablations to actually include
  in the report?
  - the implicit function-smoothness tradeoff is probably the central one
- try one or two other nontrivial systems besides flatquad?
if there is time and appetite for trying that "one last" idea:
- value substeps w/ checkpointing, from log 2024-05-21
do writing
- 1. introduction can/should be done with the last 10% of mental battery
  each day
- 2. fundamentals is now in a relatively nice state
- 3. the "implementation" part of central ideas is still empty; the rest
  needs some work
- 4. results and everything after: nada. maybe come up with a rough
  structure sometime? but first, obviously, we need to collect results ^^^
~~~ 27 may ~~~
collect results
- find a good set of test cases/metrics.
  - incurred cost vs learned value?
  - expected/mean cost w.r.t. fixed x0s?
  - an intuitive, qualitative showcase of "works where the policy is
    smooth, fails close to the watershed"
- find a nice-ish way to track those metrics. right in the run itself too?
do writing
- goal this week: central ideas done in terms of structure & basic
content. polish later.
procrastination tasks:
- intro writing
- making the plots nice, with makefiles and stuff
~~~ 3. jun ~~~
~~~ 10. jun ~~~
central tasks:
- continue finding good params, but stop once returns diminish.
- do the parameter sweeps for the results section
- make the controlcost_lines figure (and other interesting ones) for the
  parameter sweep.
- WRITE!
intro: start writing something.
fundamentals & methods: small incremental improvements.
results: continuously make nicer.
discussion/conclusion: make some type of skeleton to avoid writer's block.
generally: unify notation & terminology.
if motivated:
- TO refsol...
- other system???????????
~~~ 17. jun ~~~
prepare final draft
~~~ 24. jun ~~~
send final draft for feedback
writing & figures
~~~ 1 jul ~~~
writing polish
~~~ 2 jul ~~~
hand in !!!!!