IMPORTANT: the historic distinction between MARS Light and the future
MARS Full has been dropped. Now all versions are simply called "mars".
Old tagnames light* will remain valid, but newer names will follow the
convention s/light/mars/g (this means that the old version number counting
will be continued, only the "light" is substituted).
Meaning of stable tagnames
--------------------------
Example: mars0.1stable01:
0 = version of on-disk data structures
(only incremented when downgrades are impossible)
(not incremented on backwards-compatible upgrades)
1 = version of feature set
stable = feature set is frozen during this series
01 = bugfix revision
Example: mars0.2beta2.3:
The general idea is as before.
"beta" means that new features are roughly tested
in the lab, but not in production, so there may be
some bugs. New features may be added during
the beta phase.
Example: mars0.3alpha*:
Never use this for production. Only for historic
code inspection.
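The tagname scheme above can be sketched as a small parser. This is
only an illustration and not part of MARS: the regular expression, the
MarsTag type and parse_tag() are hypothetical names, and treating the
letter after the feature number (as in 0.1a) as a series suffix is an
assumption derived from the tagnames listed in this ChangeLog.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern, derived from the examples in this ChangeLog:
#   mars0.1stable01, mars0.1astable116, mars0.2beta2.3, light0.1abeta5
TAG_RE = re.compile(
    r"^(?P<prefix>mars|light)"
    r"(?P<disk>\d+)\.(?P<feature>\d+)(?P<series>[a-z]?)"
    r"(?P<phase>stable|beta|alpha)"
    r"(?P<revision>[\d.]*)$"
)


class MarsTag(NamedTuple):
    prefix: str        # "mars", or the historic "light"
    disk_format: int   # version of on-disk data structures
    feature_set: int   # version of feature set
    series: str        # assumed series suffix, e.g. "a" in 0.1a
    phase: str         # "stable", "beta" or "alpha"
    revision: str      # bugfix revision, kept as text ("01", "2.3")


def parse_tag(tag: str) -> Optional[MarsTag]:
    """Split a tagname into its parts, or return None if it does not match."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    return MarsTag(
        m["prefix"],
        int(m["disk"]),
        int(m["feature"]),
        m["series"],
        m["phase"],
        m["revision"],
    )
```

For example, parse_tag("mars0.1stable01") yields disk_format 0,
feature_set 1, phase "stable" and revision "01", while branch names
such as WIP-* yield None.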
Release Conventions / Branches / Tagnames
-----------------------------------------
mars0.1 series (now EOL):
- Unstable tagnames: light0.1beta%d.%d (obsolete)
- Stable branch: mars0.1.y (obsolete)
- Stable tagnames: mars0.1stable%02d (obsolete)
mars0.1a series (stable):
New master branch. Now stable.
This branch is operational for several years on
several thousands of servers, and several petabytes
of data.
- Unstable tagnames: light0.1abeta%d (obsolete)
- Stable branch: mars0.1a.y
- Stable tagnames: mars0.1astable%02d
mars1.0 series (planned):
- Replace symlink tree by transactional status files
(future-proof)
This is required for upstream merging to the kernel.
It has further advantages, such as better scalability.
- Trying to additionally address public needs.
- Potentially for Linux kernel upstream.
- Unstable tagnames: mars1.0beta%d.%d (planned)
- Stable branch: mars1.0.y (planned)
- Stable tagnames: mars1.0stable%02d (planned)
WIP-* branches are for development and may be rebased onto anything
at any time without notice. They will disappear eventually.
Never use them for production!
*stable* branches mean the following:
- Heavily tested. Has to obey an HA SLA of 99.98% end-to-end,
including network outages and HumanError(tm) at 1&1 Ionos
ShaHoLin. Thus the _component_ SLA of MARS must be much better.
- There is always an upgrade path. Simply install the new
version, obeying the below compatibility rules.
- Rolling upgrades (temporarily different MARS kernel module
versions at primary vs secondary side) are supported.
Typically, do "rmmod mars; modprobe mars" at the secondary
side first, then handover, then do the same at the former
primary side.
Or, of course, you may combine it with (typically security-
triggered) rolling kernel reboots.
I am putting high effort into maintaining rolling upgrades
of kernel modules. The network protocols are designed to
support this.
- COMPATIBILITY RULES:
Ensure that $marsadm_version >= $module_version.
This is the safe side of your update strategy.
Update marsadm first, before updating the kernel module.
This way, the controls for newer features are already in
place when the new kernel module is activated (no blind
flight).
Since marsadm is a plain Perl script with _no_ dependencies
on anything else, this is something I can reasonably expect
from users.
REASON: ensuring forever backwards compatibility to stone-aged
marsadm versions would make me ill. I cannot change old versions
anymore, but just provide new versions. I cannot ensure and
test all possible O(n^2) combinations of marsadm versions with
kernel module versions to work eternally for all times when
marsadm would be frozen, or even all O(n^3) combinations of
frozen marsadm with mixed-operations kernel modules.
The development of MARS would be hindered by too old marsadm
versions, since my effort would grow quadratically or
even worse.
Hint: nevertheless, many combinations of old marsadm with newer
kernel module version are working anyway, in particular when
the gap is a small $epsilon. But I cannot guarantee in general.
If you want to violate the above rule, you must test the
combination yourself.
- Best practice in bigger installations: first test your upgrade
or downgrade at some test clusters.
If you have a separate pre-live stage, it definitely is
your friend.
- As long as $marsadm_version + $epsilon >= $module_version
remains true (at least "approximately") and has been tested
in pre-live, marsadm may be upgraded and downgraded
independently from kernel, and during operations
(best via your favorite package manager).
Of course, no magic will happen: newer features are only
available when newer versions of both the userspace tool and
the kernel modules are installed.
- Please check this ChangeLog for any upgrade / downgrade
incompatibility bugs. In case they are detected, they will
be fixed. But I cannot retrospectively change already released
versions and their bugs. Fixes are only possible in newer
versions.
- Downgrade is possible *inside* of the same stable branch
series.
- Downgrade to _prior_ *stable* branches may be restricted,
or may require some extraordinary actions.
Please read this ChangeLog for details.
Example: a new future-proof internal deletion format has been
introduced in mars0.1astable88. It is off by default.
If you never activate it, you can downgrade inside of mars0.1astable*
as you like.
Only if you actually activate it, you have to obey the
downgrade instructions documented below.
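The compatibility rule $marsadm_version >= $module_version from the
list above could be checked before a rollout roughly as follows. This
is a minimal sketch under assumptions: the ChangeLog does not define a
formal version ordering, so comparing (on-disk version, feature
version, series letter, bugfix revision) as a tuple is my
interpretation, and version_key() and marsadm_first_rule_holds() are
hypothetical helper names. Only stable tagnames are handled.

```python
import re

# Matches stable tagnames such as mars0.1stable74 or mars0.1astable116.
_STABLE_RE = re.compile(r"^mars(\d+)\.(\d+)([a-z]?)stable(\d+)$")


def version_key(tag):
    """Turn a stable tagname into a comparable tuple (assumed ordering)."""
    m = _STABLE_RE.match(tag)
    if m is None:
        raise ValueError(f"not a stable tagname: {tag!r}")
    disk, feature, series, bugfix = m.groups()
    return (int(disk), int(feature), series, int(bugfix))


def marsadm_first_rule_holds(marsadm_tag, module_tag):
    """True if marsadm is at least as new as the kernel module."""
    return version_key(marsadm_tag) >= version_key(module_tag)
```

A deployment script could refuse to load a newer kernel module when
this returns False, and update marsadm first instead.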
-----------------------------------
Changelog for series 0.1a:
This is the new master branch, starting January 2019.
The old stable branch mars 0.1.y is EOL,
now fully superseded by this branch.
mars0.1astable116
* Critical fix, only relevant for cluster naming schemes where hostname A
may be a _prefix_ of hostname B: such naming schemes could have led to
a multitude of bizarre and unexplainable confusions and problems.
Example: hostnames icpu-bs6 and icpu-bs60 .
Please UPDATE when such hostnames may occur in the _same_ cluster.
I was unable to find the bug via the test suite because such hostnames
were not deployed at the test machines. Thanks to Stephan Chistiany who
pointed me at the problem.
* Fix annoying bug: during long-lasting sync (several TB), the automatic
flipping between replay and sync could sometimes get stuck in sync mode,
and then /mars could fill up because replay was starving unnecessarily
(waiting for sync to finish, which could take a long time).
Workaround was possible by "pause-sync"; wait until replay has
caught up; "resume-sync".
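Why hostnames like icpu-bs6 and icpu-bs60 are dangerous in general can
be illustrated with a generic sketch. It does not show the actual MARS
code: the ChangeLog does not describe how names were matched
internally, and the "status-" key format as well as both functions are
purely hypothetical.

```python
def naive_match(key, host):
    """Prefix comparison: also matches hosts whose name merely starts with `host`."""
    return key.startswith("status-" + host)


def exact_match(key, host):
    """Full comparison: matches only the key of exactly this host."""
    return key == "status-" + host


key_of_bs60 = "status-icpu-bs60"

# The prefix comparison wrongly attributes the key of icpu-bs60 to icpu-bs6.
confused = naive_match(key_of_bs60, "icpu-bs6")

# The exact comparison keeps the two hosts apart.
separated = not exact_match(key_of_bs60, "icpu-bs6")
```

Any lookup that compares only a leading part of a name can mix up the
two hosts in this way.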
mars0.1astable115
* Critical regression from mars0.1astable113 / 114, only relevant
when the new ssh-less peer operations are actually used:
Race on peer thread creation could lead to kernel memory corruption.
For maximum safety, please avoid the affected kernel module versions.
* dkms improvements from Gabriel Francisco.
mars0.1astable114
* Major usability: ssh-less {merge,split}-cluster.
Now all cluster operations should work without ssh and
its agent forwarding.
Of course, you will need to update mars.ko and marsadm
on all of your machines first.
* Doc update: describe new options and behaviour.
* Some smaller fixes / safeguards / improvements.
mars0.1astable113
* Critical fix: deadlock was possible after receiving _corrupted_
data over the network. Very unlikely to trigger, since there
is a lot of other magic checking, but anyway.
Now treated like any other communication error.
* Major improvement: "marsadm primary" (also with --force) now does
the equivalent of "up", after the operation has succeeded.
This should be useful for people who forget to do the "up" manually
after a manual unplanned failover.
* Minor fix: race at {wait,update}-cluster leading to unnecessary
abort.
* Minor fix: update-cluster did not always transfer directories.
* Minor fix: new join-cluster method could sometimes fail
at the first try. Workaround by repetition.
* Minor fix: primitive macros wait-todo-primary-{on,off} were
documented, but not implemented.
* Minor improvement: by the way, all missing combinations from
{is,nr,todo}-secondary and
wait-{is,todo}-{primary,secondary}-{on,off} are also implemented.
* Minor improvement: try to automatically fetch any unknown
peer info. May help after failed join-cluster & co.
* Minor improvement: speedup new join-cluster method.
* Minor doc update: describe new primitives.
* Some smaller fixes and improvements.
mars0.1astable112
* Critical fix: generic mars_readlink() did not work with an
extremely low probability, so it slipped through years
of testing. My reproducer indicates that it "fixed" itself
after a while, just leading to some unnecessary delays.
Nevertheless, I mark it "critical" from an HA viewpoint,
although most people likely might have never noticed it.
Recommendation: please update.
* Major fix, only relevant for k > 2 replicas:
fetch could get stuck in cyclic dependencies for some
time, making only slow progress.
* Major fix: join-resource could loop when
old method is selected and ssh was not working.
* Minor fix: do not produce alive-timestamp & co on a fresh
/mars, before {create,join}-cluster has been executed.
* Several smaller fixes and improvements.
mars0.1astable111
* Minor fix, only relevant for new deletions:
Split-brain cleanup was sometimes stumbling over
deleted logfiles. Workaround by cron.
mars0.1astable110
* Minor improvement: new disk-error for better diagnosing
any problems with disk setup / LVM etc.
* Doc update (new macros etc).
mars0.1astable109
* Regression from mars0.1astable106: when the old
deletions were active, logfiles could be unlinked
unnecessarily (displayed as Orphan).
It did not really harm due to automatic re-fetching, but
caused unnecessary network traffic.
mars0.1astable108
* Improved metadata scalability.
* Some smaller fixes and improvements.
mars0.1astable107
* Critical regression from mars0.1astable106: use-after-free.
* Fix use-after-free at rmmod.
mars0.1astable106
* Major regression from mars0.1astable97: marsadm primitive
disk-present erroneously reported the disk name in place of
boolean value 0 or 1.
* Minor fix for new deletions (beta):
invalidate / re- join-resource were sometimes hanging
in Orphan due to a conflict with the new deletions.
* Minor improvements: somewhat more improved scalability both
in #resources and in #hosts.
mars0.1astable105
* Minor marsadm regression from mars0.1astable104: race on
_old_ deletions could lead to lost deletions. Workaround
by repeating any affected commands, e.g. leave-resource.
mars0.1astable104
* Major fix: marsadm did not obey an abort of certain phased
commands when a single resource argument was given. As a result,
a wrong exit code could be returned in such a case.
* Minor fix: when beta feature logfile digests were disabled
_during_ operations, already existing old logfiles were
not always checked correctly at the secondary,
reporting DefectiveLog (although they were healthy).
Workaround by just enabling again and invalidate.
With the fix, you may now replay the old logfiles :)
* Minor fix: inherent race between join-resource and log-rotate
(unavoidable in the Distributed System) could lead to split brain,
or to hanging replay. Now compensated.
* Minor fix: join-cluster without ssh was sometimes not
updating the local link tree immediately.
* Usability (BETA feature): improved scalability in #hosts.
The below BETA feature warnings apply.
Do not exceed the "officially documented" limits too much.
* Usability: join-resource avoids unnecessary fallback
to ssh / rsync.
IMPORTANT: please update marsadm first, before updating the
kernel module. See the above compatibility rules.
This time the compatibility rules are important. I know that
marsadm < 0.1astable85 does no reliable join-resource anymore,
while combinations with old 0.1astable95 appear to work. There is
no merit in bisecting old marsadm releases, instead of just
fucking update the old userspace script in a controlled manner.
* Usability: more accurate IOPS and friends.
* Several smaller fixes and improvements.
mars0.1astable103
* Major regression from mars0.1astable99: secondary replay could
hang unnecessarily due to a cascade of race conditions.
AFAICS consistency was not affected (thanks to md5 checksumming).
Observed with a specific load pattern at less than 1% of resources,
or on average after ~ 120 operation hours when logrotate
was 12 times per hour. Unfortunately, it slipped through all my
release tests due to relatively low trigger probability.
Workaround by "invalidate". Which is however no good solution.
Please avoid kernel module versions between *99 and *102
for production.
mars0.1astable102
* Major usability (BETA): scalability in number of hosts.
It should have no visible side effect in functionality,
but better non-functional properties.
Tested in the _lab_ with 1000 additional dummy hosts
and additionally 8000 dummy resources in total.
BETA WARNING: at the moment, there are no practical experiences.
There might be problems which might not show up during lab tests.
Do not blindly rollout or merge-cluster big masses in production.
I will tell you when practical experiences allow for raising
the "official" limits as documented in the user manual.
mars0.1astable101
* Major usability: join-cluster now works without ssh.
Of course, you need to rollout the new marsadm and
the new mars.ko first, and to modprobe it at any
pre-existing cluster.
The new feature is automatically activated when you
modprobe _before_ doing join-cluster. By running
join-cluster first (without modprobe), you can fall back
to the old ssh + rsync based method.
Important: now you can modprobe before /mars/uuid is
created or retrieved. Previously, you could accidentally
try the wrong sequence "modprobe mars; mount /mars"
without harm because it was denied by missing uuid, but now
such illegal attempts would result in a big fuckup.
Suchalike fuckup is now prevented by always insisting on
/mars being a mountpoint.
This might break old ill-behaved scripts or buggy /etc/fstab
or racy systemd dependencies, which need to be fixed.
Always ensure that no modprobe is attempted before /mars
has been mounted in a race-free and reboot-safe manner.
Notice: merge-cluster and split-cluster are not yet
ssh-free zones. This will be addressed in a later release.
* Minor usability: show age of any hanging /dev/mars/
IO requests. This is useful for diagnosing faulty RAID
controllers etc.
* Lots of further minor fixes and improvements.
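The requirement from the mars0.1astable101 notes that no modprobe is
attempted before /mars has been mounted can be guarded in a wrapper
script, roughly like this. It is a sketch under assumptions:
modprobe_command() is a hypothetical helper, and it only decides
whether the command may run; actually running it, and making the mount
race-free and reboot-safe, remains the job of your init or systemd
setup.

```python
import os


def mars_is_mounted(path="/mars"):
    """True if `path` is currently a mountpoint."""
    return os.path.ismount(path)


def modprobe_command(path="/mars"):
    """Return the modprobe argv if `path` is a mountpoint, else None."""
    if not mars_is_mounted(path):
        return None
    return ["modprobe", "mars"]
```

A caller would pass the returned list to subprocess.run() only when it
is not None, and otherwise abort with an error instead of loading the
module.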
mars0.1astable100
* Minor fix: UpToDate was not reported in a very weird
corner case.
* Minor fix, only relevant when the new deletion method
is enabled: leave-resource did sometimes not delete
all superfluous logfiles at the other peers, sometimes
not clearing a split brain situation immediately.
Workaround by cron which did the cleanup later.
* Minor usability: reduced speakiness of "marsadm view all"
with respect to the new compression / digest features.
Full info can be obtained with --verbose.
* Minor fix, only observed at join-cluster without ssh:
Not all symlink infos were transferred in a corner case.
* Further minor fixes and improvements.
mars0.1astable99
* Minor fixes: some more corner cases of unnecessary
split brain rarely occurring after fatal primary
crashes.
mars0.1astable98
* Minor regression from mars0.1astable97: when old
kernel modules < mars0.1astable97 were combined with
exactly that marsadm version, the presence of
/dev/mars/$resource was detected incorrectly.
Do not use exactly that combination. Simply skip
the marsadm version mars0.1astable97.
Other version combinations are still possible for independent
and rolling updates of kernel and marsadm.
Best practice: first update marsadm to mars0.1astable98
or newer, so this bug is fixed, and then your rolling
kernel updates will work again for updating or even
downgrading old kernels.
* Minor fix: in a hardly reachable corner case, detach
was hanging. Workaround by rmmod was possible.
* Minor fix: spurious races at join-resource without ssh could
occur, so it sometimes did not notice that a new resource
was added in the meantime. Usage of ssh, or just retrying
was helpful. Thus hardly relevant in practice.
* Various minor fixes and improvements. Some masked bugs,
not visible, only triggerable by a future version of MARS.
mars0.1astable97
* Critical fix: when logfile is damaged (e.g. after a
primary crash), some corner cases of primary recovery
could hang. Workaround by "detach ; attach" seemed
possible (as far as observed during testing).
* Critical fix for BETA feature network compression only:
Memory deallocation could fail under certain circumstances,
resulting in a memory leak, or potentially memory corruption.
Only relevant when network transport compression is enabled.
* Major fix: when a primary crash was occurring exactly during
a very short log-rotate time window, a race condition could
sometimes lead to unnecessary split brain (secondaries could
bypass the primary).
* Several minor fixes and improvements.
mars0.1astable96
* Minor improvement: auto-correct defective symlink
timestamps which are too far in the future.
This can happen when running with a defective CMOS
hardware clock, e.g. after a fatal hardware failure, and
before ntpd has corrected the local clock.
* Minor usability: more pretty formatting of compression
and digest flags in "marsadm view".
mars0.1astable95
* Minor fix: sometimes, in a hardly relevant corner case,
join-resource could abort unnecessarily.
* Minor improvement: marsadm view now distinguishes role ForcedPrimary
from plain Primary. This could help a larger team of sysadmins
earlier noticing potentially upcoming SplitBrain even while the
network is interrupted, so any actual SplitBrain cannot be
detected, although it can be suspected.
* Reduce footprint of some deprecated marsadm functions
and macros.
mars0.1astable94
* Major regression from mars0.1astable86:
Memory leak in remote communication.
This could accumulate over a longer time. Please update when
affected.
mars0.1astable93
* Minor improvement: in some special cases, secondaries
may now follow primaries having a damaged logfile.
mars0.1astable92
* Major improvement from an operational perspective:
"marsadm view all" now reports the current status of
/dev/mars/mydata in human-readable form, including
the Open status, the current IOPS, the number of currently
flying IO requests = IO queue length = indicator for IO problems
or overload, and any error information.
mars0.1astable91
* Major features, disabled by default:
- Network transport compression.
May improve network bottlenecks.
- Transaction logfile payload compression.
May improve the filling speed of /mars.
* Major feature, enabled by default:
- More logfile checksumming digests, some
consuming less CPU.
* Rough benchmarks, supporting your activation decisions.
Please read mars-user-manual.pdf for instructions.
Rolling updates with mixed versions are supported.
mars0.1astable90
* Minor improvement: more reactiveness. This release
is meant as an anchor point in case you would need
a downgrade.
mars0.1astable89
* Minor improvement: better kernel module reactiveness.
More on scalability is in the dev pipeline.
For now, use marsadm --timeout=300 or similar when
stretching the official limits (but don't stretch too
much until I have improved all relevant parts).
mars0.1astable88
* New experimental scalability feature, deactivated
by default:
New deletion method, uses the special symlink value
".deleted" as a marker for logically deleted symlinks.
This leads to a _massive_ simplification of code,
and improves scalability for future masses of
resources and/or cluster hosts.
After updating both mars.ko and marsadm, you may
activate it via marsadm option --delete-method=0
but ONLY FOR TESTING.
I will tell you when it will be stable enough for
production. Sometime in the future, it will hopefully
become the default, and eventually the old complex code
can be hopefully purged after the whole world
uses the new method.
Note: when never activated, it should not have any
influence on old-style production. Both methods
can be used in parallel on different clusters.
So you can activate it on some test clusters first.
Do not _directly_ rollback to old mars.ko and/or marsadm versions
after activation. First deactivate the feature via
--delete-method=1, then wait for a few hours until marsadm cron
has done purging. "find /mars -type l -ls" must no longer report
any "-> .deleted" values anywhere in the entire cluster.
Then you can roll back to old releases.
* Doc: small update on new marsadm command link-purge-all.
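The rollback precondition from the mars0.1astable88 notes (no symlink
may point to ".deleted" anymore) can also be checked programmatically
instead of reading the output of "find /mars -type l -ls". The
following is a minimal sketch; find_deleted_markers() is a
hypothetical helper, and it inspects only the local tree, so it would
have to be run on every host of the cluster.

```python
import os


def find_deleted_markers(root="/mars"):
    """Return all symlinks below `root` whose value is exactly ".deleted"."""
    hits = []
    # os.walk() does not follow symlinks by default. Symlinks can show up
    # in either name list, so both are inspected.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) and os.readlink(full) == ".deleted":
                hits.append(full)
    return sorted(hits)
```

An empty result on all hosts corresponds to the condition stated
above; any remaining entry means that the purging has not finished
yet.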
mars0.1astable87
* Minor fix: unnecessary split brain could result from a race
between handover and log-rotate / cron.
mars0.1astable86
* Minor improvement: speedup metadata traffic avoiding
some O(n^2) internal algorithms.
mars0.1astable85
* Minor improvement: avoid ssh / rsync at join-resource.
Only when ordinary communication over port 7777 (default)
fails, fall back to ssh connections.
* Minor marsadm speedup by avoidance of unnecessary
sleep times.
* Minor fix: ensure that primary --force works even when a
logfile was truncated forcefully.
* Minor fix: use-after-free reported by KASAN, only
triggerable with a future development version, not
observed with the current stable version.
I include it here for safeguarding.
* Minor doc updates. Explain fundamental requirements for
geo-redundancy, and some background on cost comparisons.
mars0.1astable84
* Major improvement: try to automatically self-repair
any defective logfile at secondaries, by fetching again
from primary.
This can only work when the version at the primary is
healthy.
When successful, "invalidate" is no longer necessary.
mars0.1astable83
* Major improvement: new marsadm option --parallel can drastically
speed up handover, provided that the rest of your infrastructure
can deal with parallelism. Several cluster managers are
known to have problems with that. So be careful, do not
blindly use this feature!
Future releases will try to improve the systemd interface
such that parallelism is possible without problems.
* Doc updates: describe dimensioning of storage networks
and its realtime behaviour, at the background of Kirchhoff's
law. Neglecting this may lead to much higher cost than
necessary, and may lead to a variety of operational problems,
up to failures of projects.
Also, working with wrong definitions of Cloud Storage can lead
to a similar effect.
Recommended reading!
mars0.1astable82
* Major improvement: the mars_main kernel thread is now working
non-blocking in practically all relevant cases. Some more cases
will be addressed in future.
Testing with 32 resources in parallel is now working, and even
64 resources appear to work in the lab, although somewhat slower
(on typical server iron).
"marsadm primary all" is now much faster.
More future improvements to come. Currently, "marsadm primary all"
uses an internal barrier synchronisation model, which may lead
to unnecessary waiting time for faster resources. There are
plans to address this in future releases.
ATTENTION! You will need NEW VERSIONS of your pre-patch.
This will automatically adjust /proc/sys/fs/aio-max-nr to higher
values when needed. If you don't use the new pre-patch, you will
need to tune /proc/sys/fs/aio-max-nr yourself. Otherwise
you will get serious operational deadlocks due to virtual
resource limitations, even with only 32 resources, but a
higher number of replicas.
Since there is no practical experience yet (the biggest known
productive installation uses only 24 resources), I do not yet
increase the official limits as documented in the appendix of
mars-user-manual.pdf.
Although very slow due to some O(n^2) algorithms, 128 resources
are just surviving now, without bombing or deadlocking, but are
not yet really usable.
Therefore, do not try to stretch the official limits too much.
Please report any success stories (or problems) in case you
are using some more resources _productively_.
* Minor doc improvements. New slides from LCA2020 added.
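If the new pre-patch mentioned in the mars0.1astable82 notes is not
used, /proc/sys/fs/aio-max-nr has to be tuned manually. Inspecting the
current value could look like the sketch below. Note the assumptions:
the ChangeLog does not state which value is required, so the `wanted`
threshold is a parameter you have to choose yourself, and both
function names are hypothetical.

```python
def read_aio_max_nr(path="/proc/sys/fs/aio-max-nr"):
    """Read the current system-wide AIO request limit as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def aio_limit_sufficient(wanted, path="/proc/sys/fs/aio-max-nr"):
    """True if the current limit is at least `wanted` (a self-chosen value)."""
    return read_aio_max_nr(path) >= wanted
```

A monitoring check could call aio_limit_sufficient() with a site
specific value and raise an alert before the number of resources or
replicas is increased.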
mars0.1astable81
* Minor doc improvement: explain why running MARS inside of VMs
is a bad idea. Explain fully managed geo-location transparency
of VMs.
mars0.1astable80
* Compatibility up to kernels <= 4.14.
Attention! There is a bug in upstream kernels >= 4.11, leading
to an endless loop in kernel mode under certain preconditions.
The fix is in pre-patches/vanilla-4.14/0001-sched-wait-fix-*
If you _forget_ to apply this fix for _affected_ kernels, you may
get "operational fun" at the wrong moment: ordinary operations
will likely be unaffected, but a _silent_ network outage at the
wrong moment (race condition) may hang up your kernel at the
secondary site, just in the moment when you probably want to do
a failover.
LTS kernels 4.9 and earlier are not affected by the bug, although
potentially present also there, but it is a _masked_ (sleeping)
bug there.
I already submitted the fix to LKML, but unfortunately it has been
ignored up to now.
mars0.1astable79
* Critical fix: in a multiple-failure scenario which is hard
to reach, and then acting badly by disregarding
heavy warnings from marsadm and from mars-user-manual.pdf,
data consistency could be violated. Detected by testing
(the situation has not been observed in practice up to now).
When unsure, better update to this fixed version.
* Minor fix: in a scarce corner case plus an additional
scarce race, primary handover could hang.
* Major systemd interface fixes and improvements:
- When handover fails due to failed systemd stopping at
the old primary (e.g. hanging umount etc), the application
stack will be automatically restarted before the handover
operation reports timeout. The idea is to keep your
applications running whenever possible.
- New commands marsadm set-systemd-want and get-systemd-want
for a temporary shutdown of the systemd unit stack.
This is useful e.g. for performing an fsck.
- Implemented transitive closure of indirectly referenced
further systemd units.
- Attach / detach now automatically starts / stops the
systemd unit stack.
- Improved reliability of systemd handover.
- Fixed many bugs in the systemd template macro processor.
- Updated doc accordingly.
mars0.1astable78
* Major or minor fix: memory leak, triggered under scarce conditions.
Observed cases were a few kilobytes. However, it could accumulate
over a very long time. When unsure, better update to this version.
* Minor usability: report each resource size.
mars0.1astable77
* Major doc update: the old mars-manual.pdf has been split into
- mars-user-manual.pdf (for sysadmins)
- mars-architecture-guide.pdf (for managers and architects)
- mars-for-kernel-developers.lyx (unfinished)
- football-user-manual.lyx
The first two manuals have been heavily rewritten and
extended!
* Minor fix: after primary crash without failover, the secondaries
could get stuck because a version symlink was forgotten to
update under scarce preconditions.
* Minor improvement: emergency space calculation is now more
accurate.
* Minor usability: hint when marsadm resize would be possible.
* Several minor cosmetic improvements.
mars0.1astable76
* Major fix: when the primary was dead and the
secondary had an incomplete logfile which was
not recognized as being damaged, "primary --force"
did not always work under all circumstances.
* Minor fix: some config information was not
replicated throughout the cluster.
Ordinary users were typically not affected.
* Minor improvement: marsadm view now shows
the replication degree [$x/$y] at each individual
resource.
* Added slides from FrOSCon2019.
mars0.1astable75
* Major fix, only relevant for a scarce corner case:
When overflowing the kernel fscache with gigabytes of
data, and when a few more weird preconditions were met,
it was possible to potentially eat up the whole kernel
memory and to trigger OOM.
Notice: depending on kernel version, and depending on various
overload scenarios, you may trigger OOM anyway, independently
from MARS.
* Minor fix: marsadm now reports the amount of
Writeback data (as necessary for the Recovery phase after
a crash) more precisely.
* Minor improvement: speed up IOPS by better internal
hash dimensioning.
mars0.1astable74
* Full merge of EOL branch mars0.1.stable74,
which was the last stable release in EOL branch
mars0.1.y.
* Major fix, only relevant for a corner case:
Writeback made no human-visible progress under
multiple weird preconditions.
* Minor fix: ssh connections should be more robust
when clumsy firewalls cause ssh hangs.
* Minor usability improvement: marsadm view shows
more fancy details on logfile numbers.
* Minor speedups in internal infrastructure.
* Football subproject: update to Football-2.0
mars0.1astable73 (merged from mars0.1stable73)
* Critical fix, only relevant for kernels >= 4.2.x:
NULL deref occurs systematically when more than 64
file handles are being allocated.
There is already an upstream bugfix in linux-next
(missing initializer for resize_wait in fs/file.c).
Since this fix is missing in many LTS and distro kernels
(at the moment), I added a workaround in MARS.
Recommendation: anyone operating MARS on newer kernels
should update to mars0.1astable73 for safe operations.
Don't leave this unfixed. It can explode at the worst
moment, and restoring operations may only be possible
by completely giving up a secondary host, or with a fix.
mars0.1astable72 (merged from mars0.1stable72)
* Minor fix: writeback improved in a corner case.
* Minor improvement: display WriteBack data amount in
marsadm view.
* Major doc improvement: describe IO performance tuning.
mars0.1astable71 (merged from mars0.1stable71)
* Major fix: writeback at the primary was unnecessarily
slow in certain situations.
mars0.1astable70 (merged from mars0.1stable70)
* Critical fix: a few upper-layer kernel components are
allocating struct bio on the stack. This led to stack memory
corruption. If you ever had this problem, you certainly have
noticed it ;) Thus it should not have affected your data.
Unfortunately, I got no bug reports about this for several years.
Discovered when testing compatibility with very new kernels,
and now hopefully fixed.
* Major fixes: the systemd interface was not in a mature state.
Now improved a lot. More improvements are likely to follow
in the next months.
* Minor clarification: build for ancient kernel 2.6.32 was broken.
Fixing the build was no problem, but then the resulting kernel
deadlocked in certain situations (sb_mount mutex and sisters).
The reason is that stacking of filesystem instances (like
/vol/mydata relying on IO to /mars) is a pain in the very old
kernel architecture.
Any upstream kernel before 3.16 is EOL right now. Nevertheless,
I am officially supporting 3.2 at the moment, and have tested it.
Anyway, production use of ancient kernels is not
recommended, for various reasons.
Notice that you also need old gcc versions for building such
EOL kernels.
Thus I decided to remove support for 2.6.32 officially.
If somebody _really_ needs it, please contact me.
mars0.1astable69 (merged from mars0.1stable69)
* Major improvement: compatibility with upstream kernel 4.9.x.
mars0.1astable68 (merged from mars0.1stable68)
* Minor fix: sometimes sync was advancing only slowly.
* Minor fix: in extremely rare cases and under further conditions,
detach could hang due to a race.
Workaround was possible by re-attaching.
* Minor improvement: /dev/mars/mydata now disappears only after
writeback has finished. Although the old behaviour was correct,
certain userspace tools could have erroneously concluded that
the primary has finished working. The new behaviour is
hopefully closer to user expectations.
* Minor improvement: propagate physical and logical sector
sizes from the underlying disk to /dev/mars/mydata.
This can help mkfs and other tools make better
decisions about their internal parameters.
* Minor safeguard: disallow manual --ignore-sync override
when the target primary is inconsistent, only relevant
for (non-existent) sysadmins who absolutely don't know what
they are doing when they are combining this with --force.
Sysadmins who really know what they are doing can use
fake-sync in front of it, and then they are explicitly stating
once again that they really want to force a defective system,
and that they really know the fact that it is defective.
* Minor improvement: additional warning when network connections
are interrupted (asymmetrically), such as by mis-configuration
of network interfaces / routing / firewall rules / etc.
mars0.1astable67 (merged from mars0.1stable67)
* Minor fix: don't unnecessarily alert sysadmins when no systemd
unit files are installed.
* Minor doc update: new slides from LCA2019, updated old
slides from FrOSCon2018.
* Minor doc update: describe some more use cases, add some
advice for managers.
mars0.1astable66
* Merge mars0.1stable66. In detail:
* Critical fix, only relevant for kernels 4.3 to 4.4:
Due to a forgotten adaptation to newer kernels,
some userspace tools like xfs_repair could read/write
wrong data upon _large_ IO requests, and/or kernel memory
corruption could occur. Kernel-level filesystems
are typically _not_ affected because they typically use 4k
pages at maximum.
If you are operating such a kernel, please upgrade to
minimize any risks. You probably want userspace tools like
xfs_repair to not crash your kernel ;)
The problem was reproducibly detected at lab regression testing,
_before_ updating a big installation from kernel 3.16 to 4.4.
It did not show up with the old kernel.
Notice: kernels >4.6 are not yet supported,
but work on them is likely to continue during the next
months. Stay tuned.
* Minor doc updates.
mars0.1abeta18
* Merge mars0.1stable65.
mars0.1abeta17
* Merge mars0.1stable64.
* Fix compiler warning at certain kernel versions.
mars0.1abeta16
* Merge mars0.1stable63.
mars0.1abeta15
* Merge mars0.1stable62.
mars0.1abeta14
* Merge mars0.1stable61.
mars0.1abeta13
* Minor feature: marsadm takes comma-separated list of
resource names in place of "all".
* Merge mars0.1stable60.
mars0.1abeta12
* Merge mars0.1stable59.
mars0.1abeta11
* Merge mars0.1stable58.
mars0.1abeta10
* Make IP_TOS compile-time configurable.
* Update doc on IP_TOS.
mars0.1abeta9
* Major feature: lowlevel TCP tuning, separately for traffic
types MARS_TRAFFIC_META (default port 7777),
and MARS_TRAFFIC_REPLICATION (default port 7778),
and MARS_TRAFFIC_SYNC (default port 7779).
* Merge mars0.1stable57.
mars0.1abeta8
* Merge mars0.1stable56.
mars0.1abeta7
* Merge mars0.1stable55.
mars0.1abeta6
* Merge mars0.1stable54.
mars0.1abeta5
* Merge mars0.1stable53.
mars0.1abeta4
* Merge mars0.1stable52.
mars0.1abeta3
* Merge mars0.1stable51.
mars0.1abeta2
* Merge mars0.1stable50.
* Silence annoying false-positive network interruption messages.
mars0.1abeta1
* Merge mars0.1stable49.
* Several smaller fixes.
mars0.1abeta0
Forked off from 0.1balpha4.
Merge 0.1stable48 (in several intermediate steps).
Some infrastructure for version detection.
Backport of selected fixes from branch 0.1b.y.
Add marsadm split-cluster.
-----------------------------------
Changelog for the deprecated series 0.1b:
(only the part which has been merged with branch mars0.1a)
(notice that there were a few more historic branches which
were not really usable, and never went into production)
mars0.1balpha4
--------
* First improvements for scalability to thousands of nodes.
Not yet tested with really huge masses of nodes, only
with relatively small clusters.
* Merge fixes from mars0.1stable41 (see there)
* Doc update on socket bundling.
mars0.1balpha3.4
--------
* Merge fix from mars0.1stable40 (see there)
mars0.1balpha3.3
--------
* Merge fixes from mars0.1stable39
* Major fix: copy was sometimes hanging.
* Minor fix: unnecessary delay of metadata propagation.
* Performance improvements / bottleneck enhancements:
- Lamport clock
- Network
- md5 checksumming
* Userspace: faster logfile deletion via cron job.
mars0.1balpha3.2
--------
* Merge mars0.1stable38: now compiles without pre-patch
on certain kernel versions. Please read ChangeLog there.
mars0.1balpha3.1
--------
* Minor fix: deadlock on termination of copy thread.
mars0.1balpha3
--------
* Some tuning (more to come later):
* Speedup network by better corking.
* New scalable Lamport clock implementation.
mars0.1balpha2
--------
* Socket bundling (cherry-picked from mars0.2.y).
* Speedup copy processes (sync, logfile transfer).
* Speedup bio and md5 checksumming.
mars0.1balpha1
--------
* First improvements for scalability to more than 10 resources
per node. Already tested with 128 resources on a pair of nodes.
More improvements to come later.
No functional changes otherwise (from a sysadmin perspective).
Rollback to stable series 0.1 should be possible at
any time.
* Include fix from 0.1stable37.
mars0.1balpha0
--------
* Minor fix: the 1&1 specific feature set-sync-pref-list was
not used at all. Without it, the limitation feature for the sync
parallelism degree did not work correctly (without causing harm,
apart from suboptimal sync throughput / performance).
Removed the old _obsolete_ feature (for formal reasons,
this cannot be done in the 0.1stable branch).
Re-implemented the feature in a very simple form,
which is hopefully "obviously correct" now.
* Minor feature: please use "marsadm cron" as a fool-proof short form,
in particular in cron jobs.
-----------------------------------
Changelog for series 0.1:
Attention! This branch is now EOL.
Everything has been merged into branch mars0.1a.y which
is also the master branch.
PLEASE UPGRADE to the new branch.
Upgrade is easy: just roll out the new marsadm version,
install the new kernel modules, and load them where possible.
Mixed operation of different versions is no problem,
but is of course not the desired state, so keep this period
as short as possible.
Rollback is also easy.
Motivation: branch 0.1a has been in production for several years at 1&1.
Experiences: now runs provably better than 0.1.y with
better performance, smoother, etc.
mars0.1stable74 (last stable release in branch mars0.1.y)
* Major fix, only relevant for a corner case:
Writeback made no human-visible progress under
multiple weird preconditions.
* Minor usability improvement: marsadm view shows
more fancy details on logfile numbers.
mars0.1stable73
* Critical fix, only relevant for kernels >= 4.2.x:
NULL deref occurs systematically when more than 64
file handles are being allocated.
There is already an upstream bugfix in linux-next
(missing initializer for resize_wait in fs/file.c).
Since this fix is missing in many LTS and distro kernels
(at the moment), I added a workaround in MARS.
Recommendation: anyone operating MARS on newer kernels
should update to mars0.1astable73 for safe operations.
Don't leave this unfixed. It can explode at the worst
moment, and restoring operations may only be possible
by completely giving up a secondary host, or with a fix.
mars0.1stable72
* Minor fix: writeback improved in a corner case.
* Minor improvement: display WriteBack data amount in
marsadm view.
* Major doc improvement: describe IO performance tuning.
mars0.1stable71
* Major fix: writeback at the primary was unnecessarily
slow in certain situations.
mars0.1stable70
* Critical fix: a few upper-layer kernel components are
allocating struct bio on the stack. This led to stack memory
corruption. If you ever had this problem, you certainly have
noticed it ;) Thus it should not have affected your data.
Unfortunately, I got no bug reports about this for several years.
Discovered when testing compatibility with very new kernels,
and now hopefully fixed.
* Major fixes: the systemd interface was not in a mature state.
Now improved a lot. More improvements are likely to follow
in the next months.
* Minor clarification: build for ancient kernel 2.6.32 was broken.
Fixing the build was no problem, but then the resulting kernel
deadlocked in certain situations (sb_mount mutex and sisters).
The reason is that stacking of filesystem instances (like
/vol/mydata relying on IO to /mars) is a pain in the very old
kernel architecture.
Any upstream kernel before 3.16 is EOL right now. Nevertheless,
I am officially supporting 3.2 at the moment, and have tested it.
Anyway, production use of ancient kernels is not
recommended, for various reasons.
Notice that you also need old gcc versions for building such
EOL kernels.
Thus I decided to remove support for 2.6.32 officially.
If somebody _really_ needs it, please contact me.