#!/bin/bash
function zaloha_docu {
less << 'ZALOHADOCU'
###########################################################
MIT License
Copyright (c) 2019 Fitus
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
###########################################################
OVERVIEW
Zaloha is a small and simple directory synchronizer:
* Zaloha is a BASH script that uses only FIND, SORT and AWK. All you need
is THIS file. For documentation, also read THIS file.
* Cyber-secure: No new binary code, no new open ports, no interaction with
the Internet, easily reviewable.
* Three operation modes are available: Local Mode, Remote Source Mode and
Remote Backup Mode
* Local Mode: Both <sourceDir> and <backupDir> are available locally
(local HDD/SSD, flash drive, mounted Samba or NFS volume).
* Remote Source Mode: <sourceDir> is on a remote source host that can be
reached via SSH/SCP, <backupDir> is available locally.
* Remote Backup Mode: <sourceDir> is available locally, <backupDir> is on a
remote backup host that can be reached via SSH/SCP.
* Zaloha does not lock files while copying them. No writing on either directory
may occur while Zaloha runs.
* Zaloha always copies whole files via the operating system's CP command
or the SCP command (= no delta-transfer like in RSYNC).
* Zaloha is not limited by memory (metadata is processed as CSV files,
no limits for huge directory trees).
* Zaloha has optional reverse-synchronization features (details below).
* Zaloha can optionally compare the contents of files (details below).
* Zaloha prepares scripts for the case of a restore (this can optionally be
switched off to shorten the analysis phase, details below).
To detect which files need synchronization, Zaloha compares file sizes and
modification times. Such detection is clearly not 100% reliable.
A fully reliable solution requires comparing file contents, e.g. via "byte by
byte" comparison or via SHA-256 hashes. However, such a comparison increases the
processing time by orders of magnitude. Therefore, it is not enabled by default.
Section Advanced Use of Zaloha describes two ways to enable it.
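The limitation of the size and modification time check can be illustrated by a
minimal sketch. It is not part of Zaloha: it assumes GNU stat (option -c) and
the standard cmp utility, and the helper name same_size_and_mtime is
hypothetical. Two files get identical sizes and modification times but
different contents, so a metadata-only check regards them as synchronized,
while a content comparison does not:

```shell
# Illustrative sketch only (not Zaloha's code). Assumes GNU stat.

# Succeeds if both files have equal sizes (bytes) and equal mtimes (seconds)
same_size_and_mtime() {
  [ "$(stat -c '%s %Y' "$1")" = "$(stat -c '%s %Y' "$2")" ]
}

demoDir=$(mktemp -d)
printf 'AAAA\n' > "$demoDir/source.txt"
printf 'BBBB\n' > "$demoDir/backup.txt"
touch -r "$demoDir/source.txt" "$demoDir/backup.txt"   # copy the mtime

if same_size_and_mtime "$demoDir/source.txt" "$demoDir/backup.txt"; then
  metadataVerdict='identical'
else
  metadataVerdict='different'
fi

if cmp -s "$demoDir/source.txt" "$demoDir/backup.txt"; then
  contentVerdict='identical'
else
  contentVerdict='different'
fi

echo "metadata check: $metadataVerdict, content check: $contentVerdict"
rm -rf -- "$demoDir"
```

The content check has to read both files completely, which is why it is so
much slower than a check that reads metadata only.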
Zaloha asks to confirm actions before they are executed, i.e. prepared actions
can be skipped, exceptional cases manually resolved, and Zaloha re-run.
For automatic operations, use the "--noExec" option to tell Zaloha to not ask
and to not execute the actions (but still prepare the scripts).
<sourceDir> and <backupDir> can be on different filesystem types if the
filesystem limitations are not hit. Such limitations are (e.g. in case of
ext4 -> FAT): not allowed characters in filenames, filename uppercase
conversions, file size limits, etc.
No writing on either directory may occur while Zaloha runs (no file locking is
implemented). In high-availability IT operations, a higher class of backup
solution should be deployed, based on taking filesystem snapshots at times when
writing processes are stopped for a short instant (i.e. functionality that must
be supported by the underlying OS). If either directory contains data files
of running databases, then they must be excluded from backups on file level.
Databases have their own logic of backups, replications and failovers, usually
based on transactional logs, and it is plainly wrong to intervene with generic
tools that operate on files and directories. Dedicated tools provided by the
database vendor shall be used.
Handling of "weird" characters in filenames was a special focus during
development of Zaloha (details below).
On Linux/Unix, Zaloha runs natively. On Windows, Cygwin is needed.
Repository: https://github.com/Fitus/Zaloha2.sh
An add-on script that creates hardlink-based snapshots of the backup directory
exists. It makes it possible to build "Time Machine"-like backup solutions:
Repository of add-on script: https://github.com/Fitus/Zaloha2_Snapshot.sh
###########################################################
MORE DETAILED DESCRIPTION
The operation of Zaloha can be partitioned into five steps, in which the
following actions are performed:
Exec1:  unavoidable removals from <backupDir> (objects of conflicting types
        which occupy needed namespace)
-----------------------------------
RMDIR      regular remove directory from <backupDir>
REMOVE     regular remove file from <backupDir>
REMOVE.!   remove file from <backupDir> which is newer than the
           last run of Zaloha
REMOVE.l   remove symbolic link from <backupDir>
REMOVE.x   remove other object from <backupDir>, x = object type (p/s/c/b/D)
Exec2:  copy files/directories to <backupDir> which exist only in <sourceDir>,
        or files which are newer in <sourceDir>
-----------------------------------
MKDIR      regular create new directory in <backupDir>
NEW        regular create new file in <backupDir>
UPDATE     regular update file in <backupDir>
UPDATE.!   update file in <backupDir> which is newer than the
           last run of Zaloha
UPDATE.?   update file in <backupDir> by a file in <sourceDir> which is not
           newer (or not newer by 3600 secs if option "--ok3600s" is given
           plus an eventual 2 secs FAT tolerance)
unl.UP     unlink file in <backupDir> + UPDATE (can be switched off via the
           "--noUnlink" option, see below)
unl.UP.!   unlink file in <backupDir> + UPDATE.! (can be switched off via the
           "--noUnlink" option, see below)
unl.UP.?   unlink file in <backupDir> + UPDATE.? (can be switched off via the
           "--noUnlink" option, see below)
SLINK.n    create new symbolic link in <backupDir> (if synchronization of
           symbolic links is activated via the "--syncSLinks" option)
SLINK.u    update (= unlink+create) a symbolic link in <backupDir> (if
           synchronization of symbolic links is activated via the
           "--syncSLinks" option)
ATTR:ugmT  update only attributes in <backupDir> (u=user ownership,
           g=group ownership, m=mode, T=modification time)
           (optional features, see below)
Exec3:  reverse-synchronization from <backupDir> to <sourceDir> (optional
        feature, can be activated via the "--revNew" and "--revUp" options)
-----------------------------------
REV.MKDI   reverse-create parent directory in <sourceDir> due to REV.NEW
REV.NEW    reverse-create file in <sourceDir> (if a standalone file in
           <backupDir> is newer than the last run of Zaloha)
REV.UP     reverse-update file in <sourceDir> (if the file in <backupDir>
           is newer than the file in <sourceDir>)
REV.UP.!   reverse-update file in <sourceDir> which is newer
           than the last run of Zaloha (or newer than the last run of Zaloha
           minus 3600 secs if option "--ok3600s" is given)
Exec4:  remaining removals of obsolete files/directories from <backupDir>
        (can be optionally switched off via the "--noRemove" option)
-----------------------------------
RMDIR      regular remove directory from <backupDir>
REMOVE     regular remove file from <backupDir>
REMOVE.!   remove file from <backupDir> which is newer than the
           last run of Zaloha
REMOVE.l   remove symbolic link from <backupDir>
REMOVE.x   remove other object from <backupDir>, x = object type (p/s/c/b/D)
Exec5:  updates resulting from optional comparing contents of files
        (optional feature, can be activated via the "--byteByByte" or
        "--sha256" options)
-----------------------------------
UPDATE.b   update file in <backupDir> because its contents are not identical
unl.UP.b   unlink file in <backupDir> + UPDATE.b (can be switched off via the
           "--noUnlink" option, see below)
(internal use, for completeness only)
-----------------------------------
OK         object without needed action in <sourceDir> (either files or
           directories already synchronized with <backupDir>, or other objects
           not to be synchronized to <backupDir>). These records are necessary
           for preparation of shellscripts for the case of restore.
OK.b       file proven identical byte by byte (in CSV metadata file 555)
KEEP       object to be kept only in <backupDir>
uRMDIR     unavoidable RMDIR which goes into Exec1 (in CSV files 380 and 390)
uREMOVE    unavoidable REMOVE which goes into Exec1 (in CSV files 380 and 390)
###########################################################
INDIVIDUAL STEPS IN FULL DETAIL
Exec1:
------
Unavoidable removals from <backupDir> (objects of conflicting types which occupy
needed namespace). This must be the first step, because objects of conflicting
types in <backupDir> would prevent synchronization (e.g. a file cannot overwrite
a directory).
Unavoidable removals are prepared regardless of the "--noRemove" option.
Exec2:
------
Files and directories which exist only in <sourceDir> are copied to <backupDir>
(action codes NEW and MKDIR).
Further, Zaloha "updates" files in <backupDir> (action code UPDATE) if files
exist under the same paths in both <sourceDir> and <backupDir> and the
comparison of file sizes and modification times shows that they need
synchronization. If the files in <backupDir> are multiply linked (hardlinked),
Zaloha removes (unlinks) them first (action code unl.UP), to prevent "updating"
multiply linked files, which could lead to follow-up effects. This unlinking
can be switched off via the "--noUnlink" option.
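The follow-up effect of updating a multiply linked file can be reproduced with
a minimal sketch. It is not Zaloha's code: it uses only ln, cp, rm and cat, and
all file names are hypothetical. Copying over an existing hardlinked file
changes the contents seen through every link, whereas unlinking first leaves
the other link untouched:

```shell
# Illustrative sketch only (not Zaloha's code); file names are hypothetical.
demoDir=$(mktemp -d)
printf 'new contents\n' > "$demoDir/source.txt"

# Case 1: update in place. Both names share one inode, so both change.
printf 'old contents\n' > "$demoDir/backup1.txt"
ln "$demoDir/backup1.txt" "$demoDir/snapshot1.txt"
cp "$demoDir/source.txt" "$demoDir/backup1.txt"
inPlaceSnapshot=$(cat "$demoDir/snapshot1.txt")

# Case 2: unlink first, then copy. The other link keeps the old contents.
printf 'old contents\n' > "$demoDir/backup2.txt"
ln "$demoDir/backup2.txt" "$demoDir/snapshot2.txt"
rm -f "$demoDir/backup2.txt"
cp "$demoDir/source.txt" "$demoDir/backup2.txt"
unlinkedSnapshot=$(cat "$demoDir/snapshot2.txt")

echo "in place:     the other link now reads '$inPlaceSnapshot'"
echo "unlink first: the other link still reads '$unlinkedSnapshot'"
rm -rf -- "$demoDir"
```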
Optionally, Zaloha can also synchronize attributes (u=user ownerships,
g=group ownerships, m=modes (permission bits)). This functionality can be
activated by the options "--pUser", "--pGroup" and "--pMode". The selected
attributes are then preserved during each MKDIR, NEW, UPDATE and unl.UP
action. Additionally, if these attributes differ on files and directories
for which no action is prepared, special action codes ATTR:ugm are prepared to
synchronize (only) the differing attributes.
Synchronization of attributes is an optional feature, because:
(1) the filesystem of <backupDir> might not be capable of storing these
attributes, or (2) it may be wanted that all files and directories in
<backupDir> are owned by the user who runs Zaloha.
Regardless of whether attributes are synchronized or not, an eventual restore
of <sourceDir> from <backupDir> including these attributes is possible thanks
to the restore scripts which Zaloha prepares in its Metadata directory
(see below).
Zaloha contains an optional feature to detect multiply linked (hardlinked) files
in <sourceDir>. If this feature is switched on (via the "--detectHLinksS"
option), Zaloha internally flags the second, third, etc. links to the same file as
"hardlinks", and synchronizes to <backupDir> only the first link (the "file").
The "hardlinks" are not synchronized to <backupDir>, but Zaloha prepares a
restore script for them (file 830). If this feature is switched off
(no "--detectHLinksS" option), then each link to a multiply linked file is
treated as a separate regular file.
The detection of hardlinks brings two risks: Zaloha might not detect that a file
is in fact a hardlink, or Zaloha might falsely detect a hardlink while the file
is in fact a unique file. The second risk is more severe, because the contents
of the unique file will not be synchronized to <backupDir> in such case.
For that reason, Zaloha contains additional checks against falsely detected
hardlinks (see code of AWKHLINKS). Generally, use this feature only after proper
testing on your filesystems. Be cautious as inode-related issues exist on some
filesystems and network-mounted filesystems.
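The idea of inode-based hardlink detection can be sketched as follows. This is
NOT the logic of AWKHLINKS and contains none of its additional checks. It
assumes GNU find (-printf) and filenames without tabs or newlines, and the
function name list_files_and_hardlinks is hypothetical. The first path found
for each device and inode pair is reported as "file", every further path as
"hardlink":

```shell
# Illustrative sketch only; this is NOT the logic of AWKHLINKS.
# Assumes GNU find (-printf) and filenames without tabs or newlines.
list_files_and_hardlinks() {
  find "$1" -type f -printf '%D:%i\t%p\n' |
    LC_ALL=C sort |
    awk -F '\t' '{
      if ($1 in seen) { print "hardlink\t" $2 }   # further link to a known inode
      else            { seen[$1] = 1; print "file\t" $2 }
    }'
}

demoDir=$(mktemp -d)
printf 'data\n' > "$demoDir/a.txt"
ln "$demoDir/a.txt" "$demoDir/b.txt"
printf 'data\n' > "$demoDir/c.txt"     # same contents, but a separate inode

result=$(list_files_and_hardlinks "$demoDir")
printf '%s\n' "$result"
rm -rf -- "$demoDir"
```

Here b.txt is reported as a hardlink of a.txt, while c.txt is reported as a
file of its own although its contents are the same. A filesystem that reports
unreliable inode numbers would mislead such a detection, which is the risk
described above.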
Symbolic links in <sourceDir>: There are two dimensions. The first dimension is
whether to follow them or not (the "--followSLinksS" option). If they are
followed, then the referenced files and directories are synchronized to
<backupDir> and only the broken symbolic links stay as symbolic links. If they
are not followed, then all symbolic links stay as symbolic links. See section
Following Symbolic Links for details. The second dimension is what to do with
the symbolic links that stay as symbolic links: They are always kept in the
metadata and Zaloha prepares a restore script for them (file 820). Additionally,
if the option "--syncSLinks" is given, Zaloha synchronizes them to <backupDir>
(action codes SLINK.n or SLINK.u).
Zaloha does not synchronize other types of objects in <sourceDir> (named pipes,
sockets, special devices, etc). These objects are considered to be part of the
operating system or parts of applications, and dedicated scripts for their
(re-)creation should exist.
It was a conscious decision for a default behaviour to synchronize to
<backupDir> only files and directories and keep other objects in metadata only.
This gives more freedom in the choice of filesystem type for <backupDir>,
because every filesystem type is able to store files and directories,
but not necessarily the other objects.
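To see which objects in a directory tree are neither files nor directories, a
scan like the following can be used. It is a sketch only and not the FIND
command that Zaloha runs. It assumes GNU find (-mindepth and -printf), and the
function name list_other_objects is hypothetical. The type letters printed by
%y (l, p, s, c, b, D) appear to correspond to the object types named in the
REMOVE.l and REMOVE.x action codes:

```shell
# Illustrative sketch only (not Zaloha's FIND command). Assumes GNU find.
# Prints type letter and relative path of every non-file, non-directory object.
list_other_objects() {
  find "$1" -mindepth 1 ! -type f ! -type d -printf '%y\t%P\n' | LC_ALL=C sort
}

demoDir=$(mktemp -d)
printf 'data\n' > "$demoDir/regular.txt"
mkdir "$demoDir/subdir"
ln -s regular.txt "$demoDir/link"      # symbolic link (type l)
mkfifo "$demoDir/pipe"                 # named pipe (type p)

otherObjects=$(list_other_objects "$demoDir")
printf '%s\n' "$otherObjects"
rm -rf -- "$demoDir"
```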
Exec3:
------
This step is optional and can be activated via the "--revNew" and "--revUp"
options.
Why is this feature useful? Imagine you use a Windows notebook while working in
the field. At home, you have a Linux server to which you regularly
synchronize your data. However, sometimes you work directly on the Linux server.
That work should be "reverse-synchronized" from the Linux server (<backupDir>)
back to the Windows notebook (<sourceDir>) (assuming, of course, that there is
no conflict between the work on the notebook and the work on the server).
REV.NEW: If standalone files in <backupDir> are newer than the last run of
Zaloha, and the "--revNew" option is given, then Zaloha reverse-copies those
files to <sourceDir> (action code REV.NEW). This might require creating the
missing parent directories (REV.MKDI).
REV.UP: If files exist under the same paths in both <sourceDir> and <backupDir>,
and the files in <backupDir> are newer, and the "--revUp" option is given,
then Zaloha uses those files to reverse-update the older files in <sourceDir>
(action code REV.UP).
Optionally, to preserve attributes during the REV.MKDI, REV.NEW and REV.UP
actions: use options "--pRevUser", "--pRevGroup" and "--pRevMode".
If reverse-synchronization is not active: If no "--revNew" option is given,
then each standalone file in <backupDir> is considered obsolete (and removed,
unless the "--noRemove" option is given). If no "--revUp" option is given, then
files in <sourceDir> always update files in <backupDir> if their sizes and/or
modification times differ.
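The choice between UPDATE and REV.UP for a file that exists under the same path
on both sides can be summarized by a simplified sketch. It is not Zaloha's AWK
logic: it ignores the ".!" variants, the "--ok2s" and "--ok3600s" tolerances
and the handling of hardlinks, and the function name choose_action is
hypothetical:

```shell
# Simplified illustration only (not Zaloha's AWK logic). It ignores the
# ".!" variants, the "--ok2s"/"--ok3600s" tolerances and hardlink handling.
# Arguments: size and mtime (epoch seconds) of the source file, size and
# mtime of the backup file, and revUp (1 if "--revUp" is given, else 0).
choose_action() {
  local srcSize=$1 srcTime=$2 bakSize=$3 bakTime=$4 revUp=$5
  if [ "$srcSize" -eq "$bakSize" ] && [ "$srcTime" -eq "$bakTime" ]; then
    echo 'OK'          # sizes and modification times agree
  elif [ "$srcTime" -gt "$bakTime" ]; then
    echo 'UPDATE'      # source file is newer
  elif [ "$bakTime" -gt "$srcTime" ] && [ "$revUp" -eq 1 ]; then
    echo 'REV.UP'      # backup file is newer and "--revUp" is given
  else
    echo 'UPDATE.?'    # source file is not newer, but it still updates
  fi
}

choose_action 100 2000 100 2000 0   # prints OK
choose_action 120 2000 100 1000 0   # prints UPDATE
choose_action 100 1000 120 2000 1   # prints REV.UP
choose_action 100 1000 120 2000 0   # prints UPDATE.?
```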
Please note that the reverse-synchronization is NOT a full bi-directional
synchronization where <sourceDir> and <backupDir> would be equivalent.
In particular, there is no REV.REMOVE action. It was a conscious decision to not
implement it, as any removals from <sourceDir> would introduce unacceptable
risks.
Reverse-synchronization to <sourceDir> increases the overall complexity of the
solution. Use it only in the interactive regime of Zaloha, where human oversight
and confirmation of the prepared actions are in place.
Do not use it in automatic operations.
Exec4:
------
Zaloha removes all remaining obsolete files and directories from <backupDir>.
This function can be switched off via the "--noRemove" option.
Why are removals from <backupDir> split into two steps (Exec1 and Exec4)?
The unavoidable removals must unconditionally occur first, i.e. in step Exec1.
But what about the remaining (avoidable) removals? Imagine a scenario in which a
directory is renamed in <sourceDir>: If all removals were executed in Exec1,
then <backupDir> would transition through a state (namely between Exec1 and
Exec2) where the backup copy of the directory is already removed (under the old
name), but not yet created (under the new name). To minimize the chance for such
transient states to occur, the avoidable removals are postponed to Exec4.
Advice on this topic: In case of bigger reorganizations of <sourceDir>, e.g.
when a directory with large contents is renamed, it is much better to prepare
a rename script (more generally speaking: a migration script) and apply it to
both <sourceDir> and <backupDir>, instead of letting Zaloha perform massive
copying followed by massive removing.
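Such a rename script could look like the following minimal sketch for the Local
Mode. The function name rename_in_both and all directory names are
hypothetical, and "cp --preserve=timestamps" assumes GNU cp. Renaming a
directory with mv does not change the sizes and modification times of the files
inside it, so they should appear as already synchronized on the next run:

```shell
# Illustrative sketch only; function and directory names are hypothetical.
# Renames the same relative path under both directory roots.
rename_in_both() {
  local sourceDir=$1 backupDir=$2 oldRel=$3 newRel=$4
  # Check first, so that a foreseeable failure leaves both sides untouched
  [ -e "$sourceDir/$oldRel" ] && [ -e "$backupDir/$oldRel" ] || return 1
  [ ! -e "$sourceDir/$newRel" ] && [ ! -e "$backupDir/$newRel" ] || return 1
  mv -- "$sourceDir/$oldRel" "$sourceDir/$newRel" &&
    mv -- "$backupDir/$oldRel" "$backupDir/$newRel"
}

demoDir=$(mktemp -d)
mkdir -p "$demoDir/source/photos_old" "$demoDir/backup/photos_old"
printf 'x\n' > "$demoDir/source/photos_old/a.jpg"
cp --preserve=timestamps "$demoDir/source/photos_old/a.jpg" \
                         "$demoDir/backup/photos_old/a.jpg"

if rename_in_both "$demoDir/source" "$demoDir/backup" 'photos_old' 'photos'; then
  renameStatus='renamed on both sides'
else
  renameStatus='rename failed, check both sides'
fi
echo "$renameStatus"

newPaths=$(cd "$demoDir" && find . -name 'a.jpg' | LC_ALL=C sort)
printf '%s\n' "$newPaths"
rm -rf -- "$demoDir"
```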
Exec5:
------
Zaloha updates files in <backupDir> for which the optional comparisons of their
contents revealed that they are in fact not identical (despite appearing
identical by looking at their file sizes and modification times).
The action codes are UPDATE.b and unl.UP.b (the latter is update with prior
unlinking of multiply linked target file, as described under Exec2).
Please note that these actions might indicate deeper problems like storage
corruption (or even a cyber security issue), and should be actually perceived
as surprises.
This step is optional and can be activated via the "--byteByByte" or "--sha256"
options.
Metadata directory of Zaloha
----------------------------
Zaloha creates a Metadata directory: <backupDir>/.Zaloha_metadata. Its location
can be changed via the "--metaDir" option.
The purposes of the individual files are described as comments in program code.
Briefly, they are:
* AWK program files (produced from "here documents" in Zaloha)
* Shellscripts to run FIND commands
* CSV metadata files
* Exec1/2/3/4/5 shellscripts
* Shellscripts for the case of restore
* Touchfile 999 marking execution of actions
Files persist in the Metadata directory until the next invocation of Zaloha.
To obtain information about what Zaloha did (counts of removed/copied files,
total counts, etc), do not parse the screen output: Query the CSV metadata files
instead. Query the CSV metadata files after AWKCLEANER. Do not query the raw
CSV outputs of the FIND commands (before AWKCLEANER) and the produced
shellscripts, because due to eventual newlines in filenames, they may contain
multiple lines per "record".
In some situations, the existence of the Zaloha metadata directory is unwanted
after Zaloha finishes. In such cases, add a command that removes it to the
wrapper script that invokes Zaloha. At the same time, use the option
"--noLastRun" to prevent Zaloha from running FIND on file 999 in the Zaloha
metadata directory to obtain the time of the last run of Zaloha.
Please note that by not keeping the Zaloha metadata directory, you sacrifice
some functionality (see "--noLastRun" option below), you lose the CSV
metadata for a possible analysis of problems, and you lose the shellscripts
for the case of restore (especially the scripts to restore the symbolic links
and hardlinks, which may be kept in metadata only).
Temporary Metadata directory of Zaloha
--------------------------------------
In the Remote Source Mode, Zaloha needs a temporary Metadata directory on the
remote source host to copy scripts there, execute them, and retrieve the CSV
file produced by the FIND scan of <sourceDir>.
In the Remote Backup Mode, Zaloha performs its main metadata processing in a
temporary Metadata directory on the local (= source) host and then copies only
select metadata files to the Metadata directory on the remote (= backup) host.
The default location of the temporary Metadata directory is
<sourceDir>/.Zaloha_metadata_temp and can be changed via the "--metaDirTemp"
option.
Shellscripts for case of restore
--------------------------------
Zaloha prepares shellscripts for the case of restore in its Metadata directory
(scripts 800 through 870). Each type of operation is contained in a separate
shellscript, to give maximum freedom (= for each script, decide whether to apply
it or not). Further, each shellscript has a header part where key variables
for the whole script are defined (and can be adjusted as needed).
The production of the shellscripts for the case of restore may cause increased
processing time and/or storage space consumption. It can be switched off by the
"--noRestore" option.
In case of need, the shellscripts for the case of restore can also be prepared
manually by running the AWK program 700 on the CSV metadata file 505:
awk -f "<AWK program 700>" \
    -v backupDir="<backupDir>" \
    -v restoreDir="<restoreDir>" \
    -v remoteBackup=<0 or 1> \
    -v backupUserHost="<backupUserHost>" \
    -v remoteRestore=<0 or 1> \
    -v restoreUserHost="<restoreUserHost>" \
    -v scpExecOpt="<scpExecOpt>" \
    -v cpRestoreOpt="<cpRestoreOpt>" \
    -v f800="<script 800 to be created>" \
    -v f810="<script 810 to be created>" \
    -v f820="<script 820 to be created>" \
    -v f830="<script 830 to be created>" \
    -v f840="<script 840 to be created>" \
    -v f850="<script 850 to be created>" \
    -v f860="<script 860 to be created>" \
    -v f870="<script 870 to be created>" \
    -v noR800Hdr=<0 or 1> \
    -v noR810Hdr=<0 or 1> \
    -v noR820Hdr=<0 or 1> \
    -v noR830Hdr=<0 or 1> \
    -v noR840Hdr=<0 or 1> \
    -v noR850Hdr=<0 or 1> \
    -v noR860Hdr=<0 or 1> \
    -v noR870Hdr=<0 or 1> \
    "<CSV metadata file 505>"
Note 1: All filenames/paths should begin with a "/" (if absolute) or with a "./"
(if relative), and <backupDir> and <restoreDir> must end with a terminating "/".
Note 2: If any of the filenames/paths passed into AWK as variables (<backupDir>,
<restoreDir> and the <scripts 8xx to be created>) contain backslashes as "weird
characters", replace them by ///b. The AWK program 700 will convert ///b back
to backslashes internally.
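The replacement asked for in Note 2 can be done with a small helper like the one
sketched below. The helper name escape_backslashes and the example path are
hypothetical. The replacement is presumably needed because AWK interprets
escape sequences in values assigned via "-v", so a literal backslash would not
arrive unchanged:

```shell
# Illustrative sketch only; the helper name is hypothetical.
# Replaces each backslash by ///b, as Note 2 asks for values passed via "awk -v".
escape_backslashes() {
  printf '%s\n' "$1" | sed 's|\\|///b|g'
}

restoreDir='/mnt/restore\dir/'
escape_backslashes "$restoreDir"    # prints /mnt/restore///bdir/
```

The result can then be passed on, e.g. as
-v restoreDir="$(escape_backslashes "$restoreDir")".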
###########################################################
INVOCATION
Zaloha2.sh --sourceDir=<sourceDir> --backupDir=<backupDir> [ other options ... ]
--sourceDir=<sourceDir> is mandatory. <sourceDir> must exist, otherwise Zaloha
throws an error (except when the "--noDirChecks" option is given).
In Remote Source mode, this is the source directory on the remote source
host. If <sourceDir> is relative, then it is relative to the SSH login
directory of the user on the remote source host.
--backupDir=<backupDir> is mandatory. <backupDir> must exist, otherwise Zaloha
throws an error (except when the "--noDirChecks" option is given).
In Remote Backup mode, this is the backup directory on the remote backup
host. If <backupDir> is relative, then it is relative to the SSH login
directory of the user on the remote backup host.
--sourceUserHost=<sourceUserHost> indicates that <sourceDir> resides on a remote
source host to be reached via SSH/SCP. Format: user@host
--backupUserHost=<backupUserHost> indicates that <backupDir> resides on a remote
backup host to be reached via SSH/SCP. Format: user@host
--sshOptions=<sshOptions> are additional command-line options for the
SSH commands, separated by spaces. Typical usage is explained in section
Advanced Use of Zaloha - Remote Source and Remote Backup Modes.
--scpOptions=<scpOptions> are additional command-line options for the
SCP commands, separated by spaces. Typical usage is explained in section
Advanced Use of Zaloha - Remote Source and Remote Backup Modes.
--scpExecOpt=<scpExecOpt> can be used to override <scpOptions> specially for
the SCP commands used during the execution phase.
--findSourceOps=<findSourceOps> are additional operands for the FIND command
that scans <sourceDir>, to be used to exclude files or subdirectories in
<sourceDir> from synchronization to <backupDir>. This is a complex topic,
described in full detail in section FIND operands to control FIND commands
invoked by Zaloha.
The "--findSourceOps" option can be passed in several times. In such case
the final <findSourceOps> will be the concatenation of the several
individual <findSourceOps> passed in with the options.
--findGeneralOps=<findGeneralOps> are additional operands for the FIND commands
that scan both <sourceDir> and <backupDir>, to be used to exclude "Trash"
subdirectories, regardless of where they exist, from Zaloha's scope.
This is a complex topic, described in full detail in section FIND operands
to control FIND commands invoked by Zaloha.
The "--findGeneralOps" option can be passed in several times. In such case
the final <findGeneralOps> will be the concatenation of the several
individual <findGeneralOps> passed in with the options.
--findParallel ... in the Remote Source and Remote Backup Modes, run the FIND
scans of <sourceDir> and <backupDir> in parallel. As the FIND scans run on
different hosts in the remote modes, this will save time.
--noExec ... needed if Zaloha is invoked automatically: do not ask,
do not execute the actions, but still prepare the scripts. The prepared
scripts then will not contain shell tracing and the "set -e" instruction.
This means that the scripts will ignore individual failed commands and try
to do as much work as possible, which is a behavior different from the
interactive regime, where scripts are traced and halt on the first error.
--noRemove ... do not remove files, directories and symbolic links that
are standalone in <backupDir>. This option is useful when <backupDir> should
hold "current" plus "historical" data whereas <sourceDir> holds only
"current" data.
Please keep in mind that if objects of conflicting types in <backupDir>
prevent synchronization (e.g. a file cannot overwrite a directory),
removals are unavoidable and will be prepared regardless of this option.
In such case Zaloha displays a warning message in the interactive regime.
In automatic operations, the calling process should query the CSV metadata
file 510 to detect this case.
--revNew ... enable REV.NEW (= if standalone file in <backupDir> is
newer than the last run of Zaloha, reverse-copy it
to <sourceDir>)
--revUp ... enable REV.UP (= if file in <backupDir> is newer than
file in <sourceDir>, reverse-update the file in <sourceDir>)
--detectHLinksS ... perform hardlink detection (inode-deduplication)
on <sourceDir>
--ok2s ... tolerate +/- 2 seconds differences due to FAT rounding of
modification times to nearest 2 seconds (special case
[SCC_FAT_01] explained in Special Cases section below).
This option is necessary only if Zaloha is unable to
determine the FAT file system from the FIND output
(column 6).
--ok3600s ... additional tolerable offset of modification time differences
of exactly +/- 3600 seconds (special case [SCC_FAT_01]
explained in Special Cases section below)
--byteByByte ... compare "byte by byte" files that appear identical (more
precisely, files for which either "no action" (OK) or just
"update of attributes" (ATTR) has been prepared).
(Explained in the Advanced Use of Zaloha section below).
This comparison might dramatically slow down Zaloha.
If additional updates of files result from this comparison,
they will be executed in step Exec5. This option is
available only in the Local Mode.
--sha256 ... compare contents of files via SHA-256 hashes. It is almost
100% certain that files are identical if they have
equal sizes and SHA-256 hashes. Calculation of the hashes
might dramatically slow down Zaloha. If additional updates
of files result from this comparison, they will be executed
in step Exec5. Moreover, if files have equal sizes and
SHA-256 hashes but different modification times, copying of
such files will be prevented and only the modification times
will be aligned (ATTR:T). This option is available in all
three modes (Local, Remote Source and Remote Backup).
--noUnlink ... never unlink multiply linked files in <backupDir> before
writing to them
--extraTouch ... use cp + touch -m instead of cp --preserve=timestamps
(special case [SCC_OTHER_01] explained in Special Cases
section below). This also has a subtle impact on access
times (atime): cp --preserve=timestamps obtains mtime and
atime from the source file (before it reads it and changes
its atime) and applies the obtained mtime and atime to the
target file. In contrast, plain cp keeps atime of the target
file intact and touch -m just sets the correct mtime on the
target file.
--cpOptions=<cpOptions> can be used to override the default command-line options
for the CP commands used in the Local Mode (which are
"--preserve=timestamps" (or none if option "--extraTouch"
is given)).
This option can be used if the CP command needs different
options to preserve timestamps during copying, or e.g. to
instruct CP to preserve extended attributes during copying
as well, or the like:
--cpOptions='--preserve=timestamps,xattr'
--cpRestoreOpt=<cpRestoreOpt> can be used to override <cpOptions> specially for
the CP commands used in the restore scripts.
--pUser ... preserve user ownerships, group ownerships and/or modes
--pGroup (permission bits) during MKDIR, NEW, UPDATE and unl.UP
--pMode actions. Additionally, if these attributes differ on files
and directories for which no action is prepared, synchronize
the differing attributes (action codes ATTR:ugm).
The options "--pUser" and "--pGroup" also apply to symbolic
links if their synchronization is active ("--syncSLinks").
--pRevUser ... preserve user ownerships, group ownerships and/or modes
--pRevGroup (permission bits) during REV.MKDI, REV.NEW and REV.UP
--pRevMode actions
--followSLinksS ... follow symbolic links on <sourceDir>
--followSLinksB ... follow symbolic links on <backupDir>
Please see section Following Symbolic Links for details.
--syncSLinks ... synchronize symbolic links from <sourceDir> to <backupDir>
--noWarnSLinks ... suppress warnings related to symbolic links
--noRestore ... do not prepare scripts for the case of restore (= saves
processing time and disk space, see optimization note below). The scripts
for the case of restore can still be produced ex-post by manually running
the respective AWK program (700 file) on the source CSV file (505 file).
--optimCSV ... optimize space occupied by CSV metadata files by removing
intermediary CSV files after use (see optimization note below).
If intermediary CSV metadata files are removed, a later analysis of
any problems may be impossible.
--metaDir=<metaDir> allows placing the Zaloha metadata directory in a different
location than the default (which is <backupDir>/.Zaloha_metadata).
The reasons for using this option might be:
a) non-writable <backupDir> (if Zaloha is used to perform comparison only
(i.e. with "--noExec" option))
b) a requirement to have Zaloha metadata on a separate storage
c) Zaloha is operated in the Local Mode, but <backupDir> is not available
locally (which means that the technical integration options described
under the section Advanced Use of Zaloha are utilized). In that case
it is necessary to place the Metadata directory in a location
accessible to Zaloha.
If <metaDir> is placed in a different location inside of <backupDir>, or
inside of <sourceDir> (in Local Mode), then it is necessary to explicitly
pass a FIND expression to exclude the Metadata directory from the respective
FIND scan via <findGeneralOps>.
If Zaloha is used for multiple synchronizations, then each such instance
of Zaloha must have its own separate Metadata directory.
In Remote Backup Mode, if <metaDir> is relative, then it is relative to the
SSH login directory of the user on the remote backup host.
--metaDirTemp=<metaDirTemp> may be used only in the Remote Source or Remote
Backup Modes, where Zaloha needs a temporary Metadata directory too. This
option allows placing it in a different location than the default
(which is <sourceDir>/.Zaloha_metadata_temp).
If <metaDirTemp> is placed in a different location inside of <sourceDir>,
then it is necessary to explicitly pass a FIND expression to exclude it
from the respective FIND scan via <findGeneralOps>.
If Zaloha is used for multiple synchronizations in the Remote Source or
Remote Backup Modes, then each such instance of Zaloha must have its own
separate temporary Metadata directory.
In Remote Source Mode, if <metaDirTemp> is relative, then it is relative to
the SSH login directory of the user on the remote source host.
--noDirChecks ... switch off the checks for existence of <sourceDir> and
<backupDir>. (Explained in the Advanced Use of Zaloha section below).
--noLastRun ... do not obtain time of the last run of Zaloha by running
FIND on file 999 in Zaloha metadata directory.
This makes Zaloha state-less, which might be a desired
property in certain situations, e.g. if you do not want to
keep the Zaloha metadata directory. However, this sacrifices
features based on the last run of Zaloha: REV.NEW and
distinction of actions on files newer than the last run
of Zaloha (e.g. distinction between UPDATE.! and UPDATE).
--noIdentCheck ... do not check if objects on identical paths in <sourceDir>
and <backupDir> are identical (= identical inodes). This
check brings to attention cases where objects in <sourceDir>
and corresponding objects in <backupDir> are in reality
the same objects (possibly via hardlinks), which violates
the logic of backup. Switching off this check might be
necessary in some special uses of Zaloha.
--noFindSource ... do not run FIND (script 210) to scan <sourceDir>
and use externally supplied CSV metadata file 310 instead
--noFindBackup ... do not run FIND (script 220) to scan <backupDir>
and use externally supplied CSV metadata file 320 instead
(Explained in the Advanced Use of Zaloha section below).
--no610Hdr ... do not write header to the shellscript 610 for Exec1
--no621Hdr ... do not write header to the shellscript 621 for Exec2
--no622Hdr ... do not write header to the shellscript 622 for Exec2
--no623Hdr ... do not write header to the shellscript 623 for Exec2
--no631Hdr ... do not write header to the shellscript 631 for Exec3
--no632Hdr ... do not write header to the shellscript 632 for Exec3
--no633Hdr ... do not write header to the shellscript 633 for Exec3
--no640Hdr ... do not write header to the shellscript 640 for Exec4
--no651Hdr ... do not write header to the shellscript 651 for Exec5
--no652Hdr ... do not write header to the shellscript 652 for Exec5
--no653Hdr ... do not write header to the shellscript 653 for Exec5
These options can be used only together with the "--noExec" option.
(Explained in the Advanced Use of Zaloha section below).
--noR800Hdr ... do not write header to the restore script 800
--noR810Hdr ... do not write header to the restore script 810
--noR820Hdr ... do not write header to the restore script 820
--noR830Hdr ... do not write header to the restore script 830
--noR840Hdr ... do not write header to the restore script 840
--noR850Hdr ... do not write header to the restore script 850
--noR860Hdr ... do not write header to the restore script 860
--noR870Hdr ... do not write header to the restore script 870
(Explained in the Advanced Use of Zaloha section below).
--noProgress ... suppress progress messages during the analysis phase (less
screen output). If "--noProgress" is used together with
"--noExec", Zaloha does not produce any output on stdout
(traditional behavior of Unix tools).
--color ... use color highlighting (can be used on terminals which
support ANSI escape codes)
--mawk ... use mawk, the very fast AWK implementation based on a
bytecode interpreter. Without this option, awk is used,
which usually maps to GNU awk (but not always).
(Note: If you know that awk on your system maps to mawk,
use this option to make the mawk usage explicit, as this
option also turns off mawk's i/o buffering in places where
progress of commands is displayed, i.e. in places where
i/o buffering causes confusion and is unwanted).
--lTest ... (do not use in real operations) support for lint-testing
of AWK programs
--help ... show Zaloha documentation (using the LESS program) and exit
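The two copy strategies described under "--extraTouch" above can be compared
with a small experiment. This is only a sketch, not Zaloha's actual code: the
helper names copy_preserve and copy_extra_touch are hypothetical, the use of
"touch -m -r" (take mtime from a reference file) is an assumption about how the
touch step could be done, and GNU cp is assumed.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the two strategies (not Zaloha's code).

# Strategy 1: let cp carry over the timestamps (GNU cp assumed).
copy_preserve() {
  cp --preserve=timestamps "$1" "$2"
}

# Strategy 2: plain cp, then set only mtime, taken from the source file.
# Using "-r <reference file>" is an assumption made for this sketch.
copy_extra_touch() {
  cp "$1" "$2" && touch -m -r "$1" "$2"
}

demo_dir=$(mktemp -d)
printf 'data\n' > "$demo_dir/src"
touch -m -t 202001011200 "$demo_dir/src"    # give the source an old mtime

copy_preserve    "$demo_dir/src" "$demo_dir/dst1"
copy_extra_touch "$demo_dir/src" "$demo_dir/dst2"

# In both cases the target should be neither newer nor older than the source.
for f in dst1 dst2; do
  if [ "$demo_dir/$f" -nt "$demo_dir/src" ] || [ "$demo_dir/$f" -ot "$demo_dir/src" ]; then
    echo "$f: mtime differs from source"
  else
    echo "$f: mtime matches source"
  fi
done
```

Both targets end up with the source's mtime; the strategies differ only in how
atime is treated, as described under "--extraTouch".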
Optimization note: If Zaloha operates on directories with huge numbers of files,
especially small ones, then the size of metadata plus the size of scripts for
the case of restore may exceed the size of the files themselves. If this leads
to problems, use options "--noRestore" and "--optimCSV".
Zaloha must be run by a user with sufficient privileges to read <sourceDir> and
to write and perform other required actions on <backupDir>. In case of the REV
actions, privileges to write and perform other required actions on <sourceDir>
are required as well. Zaloha does not contain any internal checks as to whether
privileges are sufficient. Failures of commands run by Zaloha must be monitored
instead.
Zaloha does not contain protection against concurrent invocations with
conflicting <backupDir> (and for REV also conflicting <sourceDir>): this is
the responsibility of the invoker, especially due to the fact that Zaloha may
conflict with other processes as well.
In case of failure: resolve the problem and re-run Zaloha with the same parameters.
In the second run, Zaloha should not repeat the actions completed by the first
run: it should continue from the action on which the first run failed. If the
first run completed successfully, no actions should be performed in the second
run (this is an important test case, see below).
Typically, Zaloha is invoked from a wrapper script that does the necessary
directory mounts, then runs Zaloha with the required parameters, then unmounts
the directories.
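A wrapper of the kind described above might look like the following sketch.
Everything in it is illustrative: the function name run_backup, the variables
MOUNT_CMD, UMOUNT_CMD and ZALOHA_CMD, and the script path ./zaloha2.sh are
hypothetical, and the options "--sourceDir" and "--backupDir" are assumed to be
the ones documented elsewhere in this text. The commands are kept overridable so
that the flow can be dry-run without mounting anything.

```shell
#!/bin/bash
# Hypothetical wrapper: mount, run Zaloha, then always unmount.
# The three commands can be overridden from the environment for dry runs.
: "${MOUNT_CMD:=mount}"
: "${UMOUNT_CMD:=umount}"
: "${ZALOHA_CMD:=./zaloha2.sh}"

run_backup() {
  # $1 = source directory, $2 = mount point holding the backup directory
  rb_rc=0

  # Do not run Zaloha at all if the backup medium could not be mounted.
  "$MOUNT_CMD" "$2" || return 1

  # Remember a failure of Zaloha, but still unmount afterwards.
  "$ZALOHA_CMD" --sourceDir="$1" --backupDir="$2" || rb_rc=$?

  "$UMOUNT_CMD" "$2" || rb_rc=$?
  return "$rb_rc"
}

# Example call (commented out, as it would really mount and synchronize):
# run_backup /home/user/data /mnt/backup
```

Propagating the exit status matters because Zaloha itself does not check
privileges: failures of the commands it runs have to be monitored by the caller.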
###########################################################
FIND OPERANDS TO CONTROL FIND COMMANDS INVOKED BY ZALOHA
Zaloha obtains information about the files and directories via the FIND command.
Regarding the FIND command itself: it must support the -printf operand, as this
allows obtaining all needed information from a directory in one scan (= one
process), which is efficient. GNU find supports the -printf operand, but some
older FIND implementations don't, so they cannot be used with Zaloha.
The FIND scans of <sourceDir> and <backupDir> can be controlled by two options:
The option "--findSourceOps" supplies additional operands for the FIND command
that scans <sourceDir> only, and the option "--findGeneralOps" supplies
additional operands for both FIND commands (scans of both <sourceDir> and
<backupDir>).
Both options "--findSourceOps" and "--findGeneralOps" can be passed in multiple
times. This allows constructing the final <findSourceOps> and <findGeneralOps>
in Zaloha part-wise, e.g. expression by expression.
Difference between <findSourceOps> and <findGeneralOps>
-------------------------------------------------------
<findSourceOps> applies only to <sourceDir>. If files in <sourceDir> are
excluded by <findSourceOps> and files exist in <backupDir> under the same paths,
then Zaloha evaluates the files in <backupDir> as obsolete (= removes them,
unless the "--noRemove" option is given, or possibly even attempts to
reverse-synchronize them (which leads to corner case [SCC_FIND_01]
(see the Corner Cases section))).
On the contrary, the files excluded by <findGeneralOps> are not visible to
Zaloha at all, neither in <sourceDir> nor in <backupDir>, so Zaloha will not
act on them.
The main use of <findSourceOps> is to exclude files or subdirectories in
<sourceDir> from synchronization to <backupDir>.
The main use of <findGeneralOps> is to exclude "Trash" subdirectories,
regardless of where they exist, from Zaloha's scope.
Rules and limitations
---------------------
Both <findSourceOps> and <findGeneralOps> must consist of one or more
FIND expressions in the form of an OR-connected chain:
expressionA -o expressionB -o ... expressionN -o
Adherence to this convention assures that Zaloha is able to correctly combine
<findSourceOps> with <findGeneralOps> and with its own FIND expressions.
The OR-connected chain works so that if an earlier expression in the chain
evaluates to TRUE, FIND does not evaluate the following expressions, i.e. it
will not evaluate the final -printf operand, so no output will be produced. In
other words, matching by any of the expressions leads to exclusion.
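The effect of the OR-connected chain can be tried out on a scratch directory.
The sketch below assumes GNU find (for -printf and -mindepth); the directory
layout, the file names and the "%P" output format are made up for the
demonstration and are not what Zaloha itself passes to FIND.

```shell
#!/bin/bash
# Scratch directory with one file to keep and two *.tmp files to exclude.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/keep.txt" "$demo_dir/scratch.tmp" "$demo_dir/sub/other.tmp"

# No exclusion: every object below the start point reaches -printf.
all=$(find "$demo_dir" -mindepth 1 -printf '%P\n' | LC_ALL=C sort)

# One expression in front of the final -printf. For objects where the
# expression is TRUE, FIND short-circuits the -o and never reaches -printf.
kept=$(find "$demo_dir" -mindepth 1 \
         '(' -type f -a -name '*.tmp' ')' -o -printf '%P\n' | LC_ALL=C sort)

echo "all objects:"
printf '%s\n' "$all"
echo "after exclusion:"
printf '%s\n' "$kept"
```

Only keep.txt and sub remain in the second listing, because both *.tmp files
matched the expression placed before the -o.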
Further, the internal logic of Zaloha imposes the following limitations:
* Exclusion of files by the "--findSourceOps" option: No limitations exist
here, all expressions supported by FIND can be used (but make sure the
exclusion applies only to files). Example: exclude all files smaller than
1000 bytes:
--findSourceOps='( -type f -a -size -1000c ) -o'
* Exclusion of subdirectories by the "--findSourceOps" option: One limitation
must be obeyed: If a subdirectory is excluded, all its contents must be
excluded too. Why? If Zaloha sees the contents but not the subdirectory
itself, it will prepare commands to create the contents of the subdirectory,
but they will fail as the command to create the subdirectory itself (mkdir)
will not be prepared. Example: exclude all subdirectories owned by user fred
and all their contents:
--findSourceOps='( -type d -a -user fred ) -prune -o'
The -prune operand instructs FIND to not descend into directories matched
by the preceding expression.
* Exclusion of files by the "--findGeneralOps" option: As <findGeneralOps>
applies to both <sourceDir> and <backupDir>, and the objects in both
directories are "matched" by their paths, only expressions with -path or
-name operands make sense. Why? If objects exist under the same paths in both
directories, Zaloha should either see both of them or neither of them.
Both -path and -name expressions assure this, but not necessarily the
expressions based on other operands like -size, -user and so on.
Example: exclude core dumps (files named core) wherever they exist:
--findGeneralOps='( -type f -a -name core ) -o'
Note 1: GNU find supports the -ipath and -iname operands for case-insensitive
matching of paths and names. They fulfill the above described "both or none"
criterion as well and hence are allowed too. The same holds for the -regex
and -iregex operands supported by GNU find, as they act on paths as well.
Note 2: As <findGeneralOps> acts on both <sourceDir> and <backupDir> and the
paths differ in the start point directories, the placeholder ///d/ must be
used in the involved path patterns. This is described further below.
* Exclusion of subdirectories by the "--findGeneralOps" option: Both limitations
described above must be obeyed: Only expressions with -path or -name
operands are allowed, and if subdirectories are excluded, all their contents
must be excluded too. Notes 1 and 2 from the previous bullet apply too.
Example: exclude subdirectories lost+found wherever they exist:
--findGeneralOps='( -type d -a -name lost+found ) -prune -o'
If you do not care if an object is a file or a directory, you can abbreviate:
--findGeneralOps='-name unwanted_name -prune -o'
--findGeneralOps='-path unwanted_path -prune -o'
*** CAUTION <findSourceOps> AND <findGeneralOps>: Zaloha does not validate
whether the described rules and limitations are indeed obeyed. Wrong
<findSourceOps> and/or <findGeneralOps> can break Zaloha. On the other hand,
advanced use by knowledgeable users is not prevented. Some <findSourceOps>
and/or <findGeneralOps> errors might be detected in the directory hierarchy
check in AWKCHECKER.
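The rule that an excluded subdirectory must take all its contents with it can be
made visible by running the same expression with and without -prune. This is a
sketch on a scratch directory, assuming GNU find; the directory names cache and
docs are made up for the demonstration.

```shell
#!/bin/bash
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/cache/inner" "$demo_dir/docs"
touch "$demo_dir/cache/inner/a.bin" "$demo_dir/docs/readme.txt"

# Without -prune: the directory "cache" itself is not printed, but FIND still
# descends into it, so its contents are printed without their parent.
without_prune=$(find "$demo_dir" -mindepth 1 \
                  '(' -type d -a -name cache ')' -o -printf '%P\n' | LC_ALL=C sort)

# With -prune: FIND does not descend into "cache", so its contents are
# excluded together with the directory.
with_prune=$(find "$demo_dir" -mindepth 1 \
               '(' -type d -a -name cache ')' -prune -o -printf '%P\n' | LC_ALL=C sort)

echo "without -prune (orphaned contents visible):"
printf '%s\n' "$without_prune"
echo "with -prune:"
printf '%s\n' "$with_prune"
```

The first listing contains cache/inner and cache/inner/a.bin but not cache,
which is exactly the situation that would make the prepared commands fail.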
Troubleshooting
---------------
If FIND operands do not work as expected, debug them using FIND alone.
Let's assume that this does not work as expected:
--findSourceOps='( -type f -a -name *.tmp ) -o'
The FIND command to debug this is:
find <sourceDir> '(' -type f -a -name '*.tmp' ')' -o -printf 'path: %P\n'
Beware of interpretation by your shell
--------------------------------------
Your shell might interpret certain special characters contained in the command
line. If these characters are to be passed to the called program (= Zaloha)
uninterpreted, they must be quoted or escaped.
The BASH shell does not interpret any characters in strings quoted by single
quotes. In strings quoted by double-quotes, the situation is more complex.
Please see the respective shell documentation for more details.
Parsing of FIND operands by Zaloha
----------------------------------
<findSourceOps> and <findGeneralOps> are passed into Zaloha as single strings.
Zaloha has to split these strings into individual operands (words) and pass them
to FIND, each operand as a separate command line argument. Zaloha has a special
parser (AWKPARSER) to do this.
The trivial case is when each (space-delimited) word is a separate FIND operand.
However, if a FIND operand contains spaces, it must be enclosed in double-quotes
(") to be treated as one operand. Moreover, if a FIND operand itself contains
double-quotes, then it too must be enclosed in double-quotes (") and each
original double-quote must be escaped by doubling it ("").
Examples (for BASH for both single-quoted and double-quoted strings):
* exclude all objects named Windows Security
* exclude all objects named My "Secret" Things
--findSourceOps='-name "Windows Security" -prune -o'
--findSourceOps='-name "My ""Secret"" Things" -prune -o'
--findSourceOps="-name \"Windows Security\" -prune -o"
--findSourceOps="-name \"My \"\"Secret\"\" Things\" -prune -o"
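The quoting rules above can be illustrated with a small splitter. This is only
a sketch of the rules as stated in this section, not Zaloha's AWKPARSER: the
function name split_ops is hypothetical, and details such as error handling for
unterminated quotes are left out.

```shell
#!/bin/bash
# Hypothetical splitter (not Zaloha's AWKPARSER): prints one operand per line.
# Rules: words are space-delimited; "..." groups a word that contains spaces;
# inside "...", a doubled "" stands for one literal double-quote.
split_ops() {
  printf '%s\n' "$1" | awk '
  {
    n = length($0); word = ""; have = 0; in_q = 0
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (in_q) {
        if (c == "\"") {
          if (substr($0, i + 1, 1) == "\"") {
            word = word "\""; i++        # doubled quote: literal quote
          } else {
            in_q = 0                     # closing quote
          }
        } else {
          word = word c                  # spaces inside quotes are kept
        }
      } else if (c == "\"") {
        in_q = 1; have = 1               # opening quote starts a word
      } else if (c == " ") {
        if (have) { print word; word = ""; have = 0 }
      } else {
        word = word c; have = 1
      }
    }
    if (have) print word
  }'
}

split_ops '-name "My ""Secret"" Things" -prune -o'
```

For the example string, the splitter yields four operands, the second of which
is My "Secret" Things with its spaces and inner double-quotes intact.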
Interpretation of special characters by FIND itself
---------------------------------------------------
In the patterns of the -path and -name expressions, FIND itself interprets the
following characters specially (see FIND documentation): *, ?, [, ], \.
If these characters are to be taken literally, they must be handed over to
FIND backslash-escaped.
Examples (for BASH for both single-quoted and double-quoted strings):
* exclude all objects whose names begin with abcd (i.e. FIND pattern abcd*)
* exclude all objects named exactly mnop* (literally including the asterisk)
--findSourceOps='-name abcd* -prune -o'
--findSourceOps='-name mnop\* -prune -o'
--findSourceOps="-name abcd* -prune -o"
--findSourceOps="-name mnop\\* -prune -o"
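The difference between the wildcard pattern and the backslash-escaped pattern
can be checked with FIND alone. The sketch below assumes GNU find and a file
system that permits an asterisk in file names; the file names are made up. Note
that it prints the matching objects (positive matching), whereas in
<findSourceOps> the same patterns would select objects for exclusion.

```shell
#!/bin/bash
demo_dir=$(mktemp -d)
# One of the files is literally named mnop* (asterisk included).
touch "$demo_dir/abcd1" "$demo_dir/abcd2" "$demo_dir/mnop*" "$demo_dir/mnopq"

# Unescaped asterisk: FIND treats it as a wildcard.
wild=$(find "$demo_dir" -mindepth 1 -name 'abcd*' -printf '%P\n' | LC_ALL=C sort)

# Backslash-escaped asterisk: FIND takes it literally, so mnopq is not matched.
literal=$(find "$demo_dir" -mindepth 1 -name 'mnop\*' -printf '%P\n')

echo "pattern abcd* matches:"
printf '%s\n' "$wild"
echo "pattern mnop\\* matches:"
printf '%s\n' "$literal"
```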
The placeholder ///d/ for the start point directories
-----------------------------------------------------
If expressions with the "-path" operand are used in <findSourceOps>, the
placeholder ///d/ should be used in place of <sourceDir>/ in their path
patterns.
If expressions with the "-path" operand are used in <findGeneralOps>, the
placeholder ///d/ must (not should) be used in place of <sourceDir>/ and
<backupDir>/ in their path patterns, unless, perhaps, the <sourceDir> and
<backupDir> parts of the paths are matched by a FIND wildcard.
Zaloha will replace ///d/ by the start point directory that is passed to FIND
in the given scan, with any FIND pattern special characters properly
escaped (which relieves you of doing so yourself).
Example: exclude <sourceDir>/.git
--findSourceOps="-path ///d/.git -prune -o"
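The substitution of the placeholder can be pictured as follows. This is a
sketch of the idea only, not Zaloha's implementation: the function names
escape_find_pattern and expand_placeholder are hypothetical, and the set of
escaped characters is taken from the list of FIND pattern special characters
given earlier in this section.

```shell
#!/bin/bash
# Hypothetical helper: backslash-escape the FIND pattern special characters
# *, ?, [, ] and \ in a directory name.
escape_find_pattern() {
  printf '%s\n' "$1" | sed 's/[][\\*?]/\\&/g'
}

# Hypothetical helper: replace every ///d/ in an operands string by the
# escaped start point directory followed by a slash.
# $1 = operands string, $2 = start point directory without trailing slash
expand_placeholder() {
  ep_esc=$(escape_find_pattern "$2")/
  ep_out=''
  ep_rest=$1
  while :; do
    case $ep_rest in
      *///d/*)
        ep_out=$ep_out${ep_rest%%///d/*}$ep_esc   # text before the placeholder
        ep_rest=${ep_rest#*///d/}                 # continue after it
        ;;
      *)
        break
        ;;
    esac
  done
  printf '%s\n' "$ep_out$ep_rest"
}

expand_placeholder '-path ///d/.git -prune -o' '/data/my[1]'
```

With the start point directory /data/my[1], the brackets are escaped in the
result, so FIND would match the directory name literally rather than as a
character class.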
Internally defined default for <findGeneralOps>
-----------------------------------------------
<findGeneralOps> has an internally defined default, used to exclude:
<sourceDir or backupDir>/$RECYCLE.BIN
... Windows Recycle Bin (assumed to exist directly under <sourceDir> or
<backupDir>)
<sourceDir or backupDir>/.Trash_<number>*
... Linux Trash (assumed to exist directly under <sourceDir> or
<backupDir>)
<sourceDir or backupDir>/lost+found
... Linux lost + found filesystem fragments (assumed to exist directly
under <sourceDir> or <backupDir>)
To replace this internal default with your own <findGeneralOps>:
--findGeneralOps=<your replacement>
To switch off this internal default:
--findGeneralOps=
To extend (= combine, not replace) the internal default with your own extension
(note the plus (+) sign):
--findGeneralOps=+<your extension>
If several "--findGeneralOps" options are passed in, the plus (+) sign mentioned
above should be passed in only with the first instance, not with the second,
third (and so on) instances.
Known traps and problems
------------------------
Beware of matching the start point directories <sourceDir> or <backupDir>
themselves by the expressions and patterns.
In some FIND versions, name patterns starting with the asterisk (*)
wildcard do not match objects whose names start with a dot (.).