todo list now in a separate file, instead of carrying it over in the log
every couple of days.
large & detailed final report TODO list.
Title
- think about, maybe change
Abstract
- Write last
Introduction
- Background: same content but nicer
- Problem setting: a clear, short statement
- Related work:
  incorporate Lutter's continuous fitted value iteration?
  and a few others about RL + improving with TO or similar (BC, IL, ...)
  disclaimer that here "dynamic programming" is used in a broad sense
  (PINN, infinite-dimensional LP, SoS, etc.)
  place "level set methods" less awkwardly?
  make a separate section for the backward integration/backward
  reachability material
Fundamentals
- find a nicer place for the regularity assumptions?
- manifolds: shorten if possible
- active learning: give to lenart & bhavya; this probably sounds amateurish
- Sobolev training: also shorten? esp. the manifolds & uncertainty-estimate
  part. split? move some parts to implementation
Proposed Methods
- importantly: make a clear disclaimer about when concepts are introduced
  in the simplified setting (without any consideration of finite data or
  smooth approximation), where they are easier to understand and reason
  about, and when we make the step to the more realistic setting.
- go over 3.1 again, but only after another day of distance from it.
- clean up 3.2.7 training & pruning
Results:
- standardise plots. do the same thing for both experiments.
  (line plot; change scatter/cdf plots to closed-loop / reference cost
  rather than predicted)
- sweeps: change plots to (cost - ref cost) / ref cost (percentiles of
  that), but also move them to the appendix. don't spend loads of time.
  (metric sketch below)
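rough sketch of that relative-suboptimality metric, for reference; the
function name and percentile choices are placeholders, not from the code:

    import numpy as np

    def relative_cost_percentiles(cost, ref_cost, qs=(50, 90, 99)):
        # (cost - ref cost) / ref cost per closed-loop rollout;
        # drop nan/inf rollouts so they don't distort the percentiles.
        rel = (np.asarray(cost) - np.asarray(ref_cost)) / np.asarray(ref_cost)
        rel = rel[np.isfinite(rel)]
        return {q: float(np.percentile(rel, q)) for q in qs}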
Discussion
- proper draft DONE! check whether anything is missing, and whether the
  structure makes sense
appendix:
- go over the implementation things so they roughly make sense
- write the unwritten parts at the end
~~~ from here on very old ~~~
report.
- write a structure proposal. what goes where?
- look over the examples sent.
- think about baselines. as per the meeting on 2024-04-12:
  - the obvious one: uniform proposals vs max-sigma(ish) ones
  - other ones, maybe far-fetched:
    - compare the result with another trajectory optimiser.
      easy win if we can give a poignant example of the local/global issue
      being solved
    - compare with a neural fitted Bellman recursion (discrete-time) type
      thing? is there an easy reference implementation?
    - compare with any RL algo?
theory.
- write down everything cleanly once, for a start.
- theory & problem setting are probably the easiest to start with.
- frame it as a "clean" active learning problem by separating the
  acquisition function from the sampling/optimisation method.
- grok that paper by Holzmüller; check whether our implementation is correct
- think about whether measuring uncertainty only in terms of V (not vxx)
  makes sense.
  - or not.
- address the pruning with a bit more rigour & respect. see log 2024-04-22.
implementation, most important.
- spend some time thinking about the prune&train function and try to see
  it working on an appropriately selected test case. make sure it works
  both from a theory standpoint and in practice, at least for easy cases.
- always be on the lookout for magic constants that can be eliminated
- save the results somehow and prepare for evaluation?
- evaluate and use the test error somehow? e.g. only accept the value
  level up to where the test error stops being good. if uncertainty is low
  but the test error is high, we should probably not trust the value
  estimate, but currently we do. (gating sketch after this list)
- try it on other systems.
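one possible shape for that test-error gating, as a sketch; the names
v_levels, test_err and the tolerance are made up for illustration:

    import numpy as np

    def accepted_value_level(v_levels, test_err, tol):
        # v_levels: sorted candidate value levels; test_err[i]: held-out
        # error of the fit restricted to the band up to v_levels[i].
        # accept the largest level before the test error first exceeds tol.
        bad = np.nonzero(np.asarray(test_err) > tol)[0]
        if bad.size == 0:
            return v_levels[-1]
        if bad[0] == 0:
            return None  # even the smallest level fails; trust nothing
        return v_levels[bad[0] - 1]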
implementation, mid/low importance.
(done)
- handle the manifold correctly.
  projecting back after each simulation step is probably good enough;
  thoughts in log 2024-04-02.
  update: did a basic version that projects to the manifold and cotangent
  space before each backward sol, and still ignores the manifold within
  the ODE solver. works fine; |m(x)| stays around 1e-5 at most.
  (projection sketch below)
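minimal sketch of such a projection step, assuming the manifold is the
zero set of a constraint function m; the function and step count are
illustrative, and the costate would get an analogous tangent projection:

    import jax
    import jax.numpy as jnp

    def project_to_manifold(m, x, newton_steps=3):
        # m: R^n -> R^k, returning a 1-D array of constraint residuals.
        # Newton iteration for m(x) = 0: at each step take the
        # minimum-norm correction dx solving J(x) dx = -m(x).
        for _ in range(newton_steps):
            r = m(x)
            J = jax.jacobian(m)(x)
            dx, *_ = jnp.linalg.lstsq(J, -r)
            x = x + dx
        return x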
(not done, but there is an alternative using the known-points buffer)
- reject estimated known value levels above the last value target. this
  sometimes happens in the early stages of a run, especially with large
  active-learning batch sizes. it means we "trust" the extrapolation
  because the NNs agree, but I don't think we should. the fix is a simple
  clipping operation (one-liner below).
  or not? for simple control problems, or very well-designed NNs with
  inductive biases *just right* for the solutions at hand, this might
  reasonably happen. it also fits the paradigm of assuming a well
  calibrated model and treating it as a black box initially.
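the clipping itself, as a trivial sketch (variable names hypothetical):

    import jax.numpy as jnp

    def clip_known_level(v_known_est, v_target_prev):
        # never trust an estimated known value level beyond the last
        # value target that the training data actually covered
        return jnp.minimum(v_known_est, v_target_prev)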
(not done, hack accepted)
- make "prior" without problem specific hacks?
currently we push up Vnn(x) at the point where it empirically tends to
come close to zero again, which is the equilibrium but with the quad
upside down. is there a way to find this out automatically, maybe based
on failed forward simulations? probably yes. unclear if better.
(done)
- in general, make it more "transparent": output much more data so we can
  locate failures more quickly. also some global "output verbosity"
  setting? it would probably be wise to also use wandb for this stuff at
  some point (logging sketch below)... for example:
  - "straying off" the manifold into ambient space
  - time duration / time-step stats of all trajectories
  - distance between the proposed point and the (closest point on the)
    optimal trajectory?
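what that per-iteration logging could look like; the metric names and
project name are made up, while wandb.init/wandb.log are the actual API:

    import numpy as np
    import wandb

    run = wandb.init(project="value-active-learning")  # name hypothetical

    def log_diagnostics(step, m_of_x, dts, dist_to_opt):
        # aggregate per-trajectory diagnostics into a few scalars
        wandb.log({
            "manifold/abs_m_max": float(np.max(np.abs(m_of_x))),
            "trajectories/dt_mean": float(np.mean(dts)),
            "trajectories/dt_max": float(np.max(dts)),
            "proposal/dist_to_opt_traj": float(dist_to_opt),
        }, step=step)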
- think more about the "practical tricks", such as estimating a slightly
  bigger level set of low-enough sigma & simulating forward until sigma is
  low enough, or the "buffer" marking points as optimal despite them maybe
  going above the sigma limit at some point.
  are they necessary? can we eliminate them and replace them with
  something smarter? if not, at least make sure through close monitoring
  that they don't allow any wildly unexpected behaviour.
- is there a nice way to simulate a tiny bit further and use lower-sigma
  information if possible? right now we stop as soon as the sigma
  threshold is reached, which is still not very great.
- more jit where possible.
- adapt the visualisation to the "embedded manifold" representation.
  (transforming to the old representation should work just fine)
- system state as a dict too? probably not worth the effort, though.
(done, unclear if advantageous, ditched again)
- instead of uniform backward shooting for initialisation, try to get a
  better trajectory distribution by forward-simulating with the LQR
  solution, starting from uniformly distributed points inside some LQR
  value level set (= an ellipsoid), and then backward shooting from there.
  (sampling sketch below)
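the ellipsoid sampling part might look as follows, assuming the LQR value
function is V(x) = x^T S x with cost-to-go matrix S (names illustrative):

    import jax
    import jax.numpy as jnp

    def sample_in_lqr_level_set(key, S, level, n):
        # uniform samples in {x : x^T S x <= level}: draw uniformly from
        # the unit ball, then map through the ellipsoid transform.
        dim = S.shape[0]
        k1, k2 = jax.random.split(key)
        u = jax.random.normal(k1, (n, dim))
        u = u / jnp.linalg.norm(u, axis=1, keepdims=True)
        r = jax.random.uniform(k2, (n, 1)) ** (1.0 / dim)  # ball radius law
        ball = u * r
        # with S = L L^T, x = sqrt(level) * L^{-T} z maps the unit ball
        # onto the desired sublevel set
        L = jnp.linalg.cholesky(S)
        xs = jax.scipy.linalg.solve_triangular(L.T, ball.T, lower=False)
        return jnp.sqrt(level) * xs.T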
(done, calling the function with ONLY data from the relevant value band)
- in the calibration plot, exclude nan/inf points to make the plots look
  better
(done, using projection after each step now)
- improve the manifold handling a bit. during forward simulation we do
  sometimes stray off the manifold into the ambient space, by a LOT. if
  this only happens for simulations that don't reach the lower set anyway,
  it's not a problem. otherwise we should fix it: if we evaluate the NN
  ensemble mean/std (significantly) outside of the manifold, it is
  meaningless...
- adapt the kernel length scale (in the proposal) to the current data
  extent? or use some cosine-distance-type kernel in the first place?
  (length-scale sketch below)
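one standard option for the first idea would be the median heuristic,
recomputing the RBF length scale from the current dataset each round; a
sketch, not what the code currently does:

    import jax.numpy as jnp

    def median_heuristic_lengthscale(xs):
        # set the length scale to the median pairwise distance of the
        # data, so the kernel tracks the current data extent
        d2 = jnp.sum((xs[:, None, :] - xs[None, :, :]) ** 2, axis=-1)
        d = jnp.sqrt(d2[jnp.triu_indices(xs.shape[0], k=1)])
        return jnp.median(d)

    def rbf_kernel(x, y, ell):
        return jnp.exp(-jnp.sum((x - y) ** 2) / (2.0 * ell ** 2))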
implementation, long-term "extra" goals, probably overkill though:
- find a nice "tailored" regularisation scheme. optimally it should
  alleviate the slowing progress as the value level (and the volume of the
  value bands) grows. this is probably closely connected to the intuition
  that the optimal solution of "well-behaved" (= smooth, with some amount
  of timescale separation) problems is "almost invariant" in the
  directions of the slow states.
- a nicer way to handle the u* convex optimisation.
  - a general explicit QP solver (still brute force, or with simple
    pruning of never-optimal active sets; see the sketch after this list)
- constrained ODE -> switched DAE reformulation like in that one paper
https://link.springer.com/article/10.1007/s10957-020-01744-4
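the brute-force explicit solver could enumerate active sets of a
box-constrained QP directly; a sketch assuming a positive definite
Hessian H (function and variable names made up):

    import itertools
    import numpy as np

    def box_qp_bruteforce(H, g, lb, ub, tol=1e-9):
        # minimise 0.5 u^T H u + g^T u  s.t.  lb <= u <= ub.
        # enumerate all 3^n active-set patterns: each coordinate is free
        # (0), at its lower bound (-1), or at its upper bound (+1).
        H, g, lb, ub = map(np.asarray, (H, g, lb, ub))
        n = len(g)
        for pattern in itertools.product((-1, 0, 1), repeat=n):
            s = np.array(pattern)
            free = s == 0
            u = np.where(s < 0, lb, ub).astype(float)
            if free.any():
                # stationarity on the free block:
                # H_ff u_f = -(g_f + H_fc u_c)
                rhs = -(g[free] + H[np.ix_(free, ~free)] @ u[~free])
                u[free] = np.linalg.solve(H[np.ix_(free, free)], rhs)
            grad = H @ u + g
            primal_ok = np.all(u >= lb - tol) and np.all(u <= ub + tol)
            dual_ok = np.all(grad[s < 0] >= -tol) and np.all(grad[s > 0] <= tol)
            if primal_ok and dual_ok:
                return u  # KKT point = global optimum for a convex QP
        raise RuntimeError("no KKT point found; is H positive definite?")

pruning never-optimal active sets would then amount to skipping patterns
whose KKT conditions can be ruled out in advance.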
general rough time plan until hand-in, always for the week that follows.
this middle section is still quite dumb...
~~~ 6 may ~~~
decide definitively: better pruning & smooth approximation, or nonsmooth
approximation with slightly suboptimal data too.
-> smooth approximation it is, definitely.
go over the active learning formulation & make it "proper" à la Holzmüller.
-> ditched. comparing a simple diversity mechanism with an assumed kernel
   function against uniform sampling is probably enough.
find the simplest configuration that half works, which is probably:
- conservative pruning
- a huber-type loss to "reject" outliers
- throwing away "outliers" in a second pruning step.
to that end: put the loss function params (huber width, switching of the
vx loss for "suboptimal" data) into algoparams & run a sweep on euler.
(loss sketch below)
start writing the easy-to-write parts
- Problem setting
- Related work
- Fundamentals
~~~ 13 may ~~~
define & implement 2-3 simpler 2D examples to show the basic principle.
think about 1-2 other not entirely trivial systems for examples.
  bhavya suggests: acrobot/cartpole.
  acrobot is not strictly a stabilisation problem; modify it somehow?
  very cool too: the plane landing from the LQRTrees paper...
implement some sort of metric, like closed-loop performance
somehow log the dataset and/or the nn params
start writing the "central ideas" section, or at least plan it in more
detail.
continue writing those "easy" parts.
~~~ 20 may ~~~
update for this week. spent lots of the previous time on parameter
tuning; I think I am finally half satisfied with how it works. immediate
next tasks in terms of coding:
- implement some nice way of storing the resulting dataset. could be
  literally just a pickle of the data on $SCRATCH, named after the wandb
  run id
- make a separate thing that takes this data, fits one NN, and does some
  closed-loop sims
- think about the best (reasonably feasible) "reference solution" for the
  2D examples; implement, compare
- think about interesting parameter sweeps/ablations to actually include
  in the report?
  - the implicit function-smoothness tradeoff is probably the central one
- try one or two other nontrivial systems besides flatquad?
if there is time and appetite for trying that "one last" idea:
- value substeps w/ checkpointing, from log 2024-05-21
do writing
- 1. introduction can/should be done with the last 10% of mental battery
  each day
- 2. fundamentals is now in a relatively nice state
- 3. the "implementation" part of central ideas is still empty; the rest
  needs some work
- 4. results and everything after: nada. maybe come up with a rough
  structure sometime? but first, obviously, we need to collect results ^^^
~~~ 27 may ~~~
collect results
- find a good set of test cases/metrics.
  - incurred cost vs learned value?
  - expected/mean cost w.r.t. fixed x0s?
  - an intuitive, qualitative showcase of "works where the policy is
    smooth, fails close to the watershed"
- find a nice-ish way to track those metrics. right in the run itself too?
do writing
- goal this week: central ideas done in terms of structure & basic
content. polish later.
procrastination tasks:
- intro writing
- making the plots nice, with makefiles and stuff
~~~ 3. jun ~~~
~~~ 10. jun ~~~
central tasks:
- continue finding good params, but stop once returns diminish.
- do the parameter sweeps for the results section
- make the controlcost_lines figure (and other interesting ones) for the
  parameter sweep.
- WRITE!
intro: start writing something.
fundamentals & methods: small incremental improvements.
results: continuously make nicer.
discussion/conclusion: make some type of skeleton to avoid writer's block.
generally: unify notation & terminology.
if motivated:
- TO refsol...
- other system???????????
~~~ 17. jun ~~~
prepare final draft
~~~ 24. jun ~~~
send final draft for feedback
writing & figures
~~~ 1 jul ~~~
writing polish
~~~ 2 jul ~~~
hand in !!!!!