Handle symlinks (#4)
jotaen authored Jan 28, 2024
1 parent 05d5848 commit 86af5a6
Showing 9 changed files with 194 additions and 109 deletions.
2 changes: 1 addition & 1 deletion Cargo.lock

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "snapdiff"
version = "0.0.1"
version = "0.0.2"
edition = "2021"
rust-version = "1.70.0"

43 changes: 6 additions & 37 deletions README.md
@@ -1,9 +1,8 @@
# snapdiff

snapdiff can compare two snapshots of a directory tree, which have been captured at different points in time.
snapdiff compares two snapshots of a directory tree, captured at different points in time.
(Think of a “snapshot” as a backup of the original directory tree, in the sense of a full copy.)
It diffs the two snapshots, and summarizes how many files are identical, and how many have been moved, modified, added, or deleted.
That way, you get a high-level insight into how the data evolved between both snapshots.
That way, it gives a high-level insight into how the directory tree has evolved over time.

Learn more in [this blog post](https://www.jotaen.net/iE3XC).

@@ -34,6 +33,8 @@ The categories are defined as:
- **Deleted**: the first snapshot contains a file whose path or contents is not present in the second snapshot.
- **Modified**: both snapshots contain a file at the same path, but with different contents.

Note: the file count doesn’t include folders.
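
The category definitions above can be illustrated with a small Rust sketch. It assumes each snapshot has already been reduced to a map from relative path to a content hash. `Summary`, `classify`, and the `u64` hash type are hypothetical names chosen for illustration; they are not taken from snapdiff's source. Duplicate contents are handled naively here, so the real tool may count such cases differently.

```rust
use std::collections::{HashMap, HashSet};

/// One counter per category (hypothetical type, for illustration only).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Summary {
    pub identical: usize,
    pub moved: usize,
    pub added: usize,
    pub deleted: usize,
    pub modified: usize,
}

/// Classifies files, given `path -> content hash` maps for both snapshots.
pub fn classify(snap1: &HashMap<String, u64>, snap2: &HashMap<String, u64>) -> Summary {
    let mut s = Summary::default();
    let hashes1: HashSet<u64> = snap1.values().copied().collect();
    let hashes2: HashSet<u64> = snap2.values().copied().collect();

    for (path, hash) in snap2 {
        match snap1.get(path) {
            // Same path, same contents.
            Some(h) if h == hash => s.identical += 1,
            // Same path, different contents.
            Some(_) => s.modified += 1,
            // New path, but the contents existed somewhere in snapshot 1.
            None if hashes1.contains(hash) => s.moved += 1,
            // Neither path nor contents were present in snapshot 1.
            None => s.added += 1,
        }
    }
    for (path, hash) in snap1 {
        // Neither path nor contents are present in snapshot 2.
        if !snap2.contains_key(path) && !hashes2.contains(hash) {
            s.deleted += 1;
        }
    }
    s
}
```

For example, with `a.txt` unchanged, `b.txt` relocated to `x/b.txt`, `c.txt` removed, `d.txt` rewritten, and `e.txt` new, `classify` reports one file in each category.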

## Usage

```
@@ -46,45 +47,13 @@ snapdiff
SNAP1 SNAP2
```

`SNAP1` and `SNAP2` are in “chronological” order, so snapshot 1 is assumed to precede snapshot 2.

See also `snapdiff --help` for info.

### `--report PATH`

Example: `--report ./my-report.txt`

Print a detailed report to a file.

The file is newly created, so the command fails if a file already exists at the target path.

### `--include-dot-paths`

Include files and folders whose names start with a dot (`.`), instead of ignoring them (which is the default).

### `--include-symlinks`

Resolve symlinks, instead of ignoring them (which is the default).

### `--workers N`

Example: `--workers 4` or `--workers 1:8`

The number of workers (CPU cores) to utilize.

`0` means that it detects the number of available CPU cores automatically (which is the default).

You can specify two different values, separated by a colon (`:`), to differentiate between the first and the second snapshot.

### `--no-color`

Print output in plain text, without colouring.
Run `snapdiff --help` for all details.

## Build from Sources

Prerequisites: Rust toolchain (see [`Cargo.toml`](./Cargo.toml) for required version).

Compile via `cargo build`. (Produces binary to `target/debug/snapdiff`.)
Compile via `cargo build --release`. (Produces binary to `target/release/snapdiff`.)

## About

50 changes: 32 additions & 18 deletions demo/basic.sh
@@ -10,7 +10,7 @@ rnd() {
if [[ -z "${size}" ]]; then
size="$(( ( RANDOM % 5000 ) + 100 ))"
fi
openssl rand -base64 "${size}" > "${path}"
openssl rand "${size}" > "${path}"
}

rm -rf "${DIR_1}" "${DIR_2}"
@@ -19,28 +19,42 @@ mkdir "${DIR_1}"
pushd "${DIR_1}" > /dev/null
{
mkdir folder
rnd folder/asd.id.txt
rnd folder/uiu.md.txt 2000
rnd bbb.mv.txt
rnd ghq.dl.txt
rnd foo.id.txt
rnd tst.md.txt 500
rnd wop.mv.txt
rnd xyz.id.txt
rnd za1.dl.txt
rnd identical1.txt 4716
rnd identical2.txt 8712
cp identical2.txt identical222.duplicate.txt
touch identical3.empty.txt
rnd .identical4.dot 57612
rnd folder/identical5.txt 911
touch folder/identical6.empty.txt
rnd moved1.txt 474
rnd moved2.txt 60091
rnd deleted1.txt 38620
rnd deleted2.txt 1098
rnd modified1.more.txt 541
rnd folder/modified2.less.txt 2762
ln -s identical1.txt link1.txt
ln -s identical2.txt link2.txt
}
popd > /dev/null

cp -R "${DIR_1}" "${DIR_2}"
pushd "${DIR_2}" > /dev/null
{
mv bbb.mv.txt xxx.mv.txt
mv wop.mv.txt folder/wop.mv.txt
rm ghq.dl.txt
rm za1.dl.txt
rnd tst.md.txt 600
rnd folder/uiu.md.txt 700
rnd pqr.ad.txt
rnd 123.ad.txt
# Move files:
mv moved1.txt moved111.txt
mv moved2.txt folder/moved222.mv.txt
# Delete files:
rm deleted1.txt
rm deleted2.txt
# Modify files:
rnd modified1.more.txt 90031
rnd folder/modified2.less.txt 327
# Add files:
rnd added1.txt
rnd added2.txt
# (Add) Duplicate files:
cp identical1.txt identical111.duplicate.txt
# Relink files:
rm link2.txt && ln -s identical3.empty.txt link2.txt
}
popd > /dev/null
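
The "Relink files" step above re-points `link2.txt` from `identical2.txt` to `identical3.empty.txt`, while `link1.txt` keeps its target. The sketch below shows one way such a change could be detected with the standard library. The function names `symlink_modified` and `scheduled_size` are hypothetical, and comparing the raw stored targets is an assumption: this diff does not show whether snapdiff compares raw or resolved targets.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns `true` if the two symlinks store different targets.
/// `fs::read_link` does not follow the link, so dangling links are fine;
/// it returns an error if either path is not a symlink.
pub fn symlink_modified(link_in_snap1: &Path, link_in_snap2: &Path) -> io::Result<bool> {
    Ok(fs::read_link(link_in_snap1)? != fs::read_link(link_in_snap2)?)
}

/// Byte count contributed by an entry: symlinks count as one file with
/// zero bytes, regular files contribute their length.
pub fn scheduled_size(path: &Path) -> io::Result<u64> {
    // `symlink_metadata` inspects the link itself rather than its target.
    let meta = fs::symlink_metadata(path)?;
    Ok(if meta.file_type().is_symlink() { 0 } else { meta.len() })
}
```

Applied to the demo directories, `symlink_modified` would be expected to return `true` for `link2.txt` and `false` for `link1.txt`.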
2 changes: 1 addition & 1 deletion run.sh
@@ -14,7 +14,7 @@ run::cli() {

# Compile project
run::build() {
cargo build
cargo build --release
}

# Run one of the demo folders
49 changes: 43 additions & 6 deletions src/cli.rs
@@ -32,42 +32,79 @@ impl CtrlCSignal {
}
}

/// snapdiff compares two snapshots of a directory tree, captured at different points
/// in time. That way, it gives a high-level insight into how the directory tree has
/// evolved over time. It summarises the difference between both snapshots based on
/// the following categories:
/// - Identical: both snapshots contain a file at the same path with the same contents.
/// - Moved: both snapshots contain a file with the same contents, but at different
/// paths.
/// - Added: the second snapshot contains a file whose path or contents is not
/// present in the first snapshot.
/// - Deleted: the first snapshot contains a file whose path or contents is not
/// present in the second snapshot.
/// - Modified: both snapshots contain a file at the same path, but with different
/// contents.
#[derive(Parser, Debug)]
#[command(author, version, about, long_about = None)]
#[command(author, version, about, long_about = None, verbatim_doc_comment)]
struct Args {
/// Path to the first snapshot (the older one).
#[arg(verbatim_doc_comment)]
snap1_path: String,

/// Path to the second snapshot (the more recent one).
#[arg(verbatim_doc_comment)]
snap2_path: String,

#[arg(long = "report", short = 'r', help = "Print a detailed report to file")]
/// Print a detailed report to a file. The report lists
/// all captured file names (one per line, for all but
/// identical files).
#[arg(long = "report", short = 'r', verbatim_doc_comment)]
report_file: Option<String>,

/// Include files or folders whose names start with a dot,
/// instead of ignoring them (which is the default). When
/// a dot-folder is ignored, so is its entire (sub-)directory
/// tree, with all files and folders it may contain.
#[arg(
long = "include-dot-paths",
short = 'd',
default_value_t = false,
help = "Ignore files or folders that start with a dot"
verbatim_doc_comment
)]
include_dot_paths: bool,

/// Include symlinks, instead of ignoring them (which is
/// the default). If symlinks are included, it counts one
/// file per symlink, without increasing the byte count.
/// If the symlink target has changed between snapshots,
/// it counts the symlink file as modified.
#[arg(
long = "include-symlinks",
short = 's',
default_value_t = false,
help = "Ignore paths that are symlinks"
verbatim_doc_comment
)]
include_symlinks: bool,

/// Number of CPU cores to utilise. A value of `0` means
/// that all available cores are used (which is the
/// default). A separate value can be given for each
/// snapshot, separated by a colon, e.g. `1:4`.
#[arg(
long = "workers",
alias = "worker",
value_delimiter = ':',
help = "Number of CPU cores to utilise"
verbatim_doc_comment
)]
workers: Option<Vec<usize>>,

/// Disable output colouring.
#[arg(
long = "no-color",
alias = "no-colour",
default_value_t = false,
help = "Disable colouring of output"
verbatim_doc_comment
)]
no_color: bool,
}
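
The `workers` field above is parsed into an `Option<Vec<usize>>`, split on `:`. A minimal sketch of how those values might be turned into one worker count per snapshot follows. The function name `resolve_workers` and the decision to reject more than two values are assumptions, since this diff does not show how snapdiff itself validates the list.

```rust
use std::thread;

/// Resolves the parsed `--workers` values into `(snap1, snap2)` worker counts.
/// `0` (or no flag at all) falls back to the number of available cores.
pub fn resolve_workers(raw: Option<&[usize]>) -> Result<(usize, usize), String> {
    // Fall back to a single worker if the core count cannot be determined.
    let auto = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let pick = |n: usize| if n == 0 { auto } else { n };
    match raw {
        // Flag omitted: auto-detect for both snapshots.
        None => Ok((auto, auto)),
        // Single value, e.g. `--workers 4`: same count on both sides.
        Some([n]) => Ok((pick(*n), pick(*n))),
        // Two values, e.g. `--workers 1:8`: one count per snapshot.
        Some([n1, n2]) => Ok((pick(*n1), pick(*n2))),
        // Anything else is rejected in this sketch.
        Some(other) => Err(format!("expected 1 or 2 values, got {}", other.len())),
    }
}
```

With the struct above, a call could look like `resolve_workers(args.workers.as_deref())`, since `Option<Vec<usize>>::as_deref` yields an `Option<&[usize]>`.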
45 changes: 35 additions & 10 deletions src/dir_iter.rs
@@ -1,6 +1,6 @@
use crate::error::Error;
use crate::file::SizeBytes;
use crate::filter::Filter;
use crate::filter::{Filter, MatchReason};
use crate::printer::TerminalPrinter;
use crate::progress::Progress;
use crate::snapper::{open_file, CHUNK_SIZE};
@@ -14,6 +14,7 @@ pub struct DirIterator {
pub root: path::PathBuf,
pub scheduled: Stats,
filters: Filter,
skipped: SkippedStats,
num_workers: usize,
}

@@ -33,6 +34,7 @@ impl DirIterator {
small_files: PathList::new(),
scheduled: Stats::new(),
filters,
skipped: SkippedStats::new(),
num_workers,
};
dir_it.scan_dir(root)?;
@@ -48,11 +50,7 @@ impl DirIterator {
Ordering::Equal
};
});
progress.scan_done(
dir_it.scheduled.count,
dir_it.filters.skipped_files,
dir_it.filters.skipped_folders,
);
progress.scan_done(dir_it.scheduled.count, dir_it.skipped);
return Ok(dir_it);
}

@@ -61,7 +59,7 @@ impl DirIterator {
return Err(Error::new(format!("not a directory: {}", path.display())));
}
let read_dir_result = fs::read_dir(path).map_err(|e| {
self.filters.track_skipped_file(1);
self.skipped.no_opener += 1;
return Error::from(
format!("cannot read directory: {}", path.display()),
e.to_string(),
@@ -79,8 +77,16 @@ impl DirIterator {
);
})
.map(|r| (r.path(), r.file_name()))?;
if self.filters.is_filtered(&p, &name) {
self.filters.track_skipped(&p);
let shall_skip = self
.filters
.matches(&p, &name)
.map(|r| match r {
MatchReason::IsSymlink => self.skipped.symlinks += 1,
MatchReason::IsDotPath => self.skipped.dot_paths += 1,
})
.map(|_| true)
.unwrap_or(false);
if shall_skip {
continue;
}
if p.is_dir() {
@@ -92,8 +98,10 @@ impl DirIterator {
self.push(p, m.len());
})
.unwrap_or_else(|_| {
self.filters.track_skipped_folder(1);
self.skipped.no_opener += 1;
});
} else if p.is_symlink() {
self.push(p, 0);
}
}
return Ok(());
@@ -141,3 +149,20 @@ impl PathList {
return Some(p.to_path_buf());
}
}

#[derive(Debug, Copy, Clone)]
pub struct SkippedStats {
pub dot_paths: u64,
pub symlinks: u64,
pub no_opener: u64,
}

impl SkippedStats {
pub fn new() -> SkippedStats {
return SkippedStats {
dot_paths: 0,
symlinks: 0,
no_opener: 0,
};
}
}
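
The change in `dir_iter.rs` replaces the filter's own skip counters with a `SkippedStats` value that is updated according to the `MatchReason` returned by `Filter::matches`. The following self-contained sketch illustrates that pattern. `MatchReason` and `SkippedStats` mirror the names in the diff, but the `Filter` fields, the simplified `matches` signature (name and symlink flag instead of paths), the order of the checks, and the `scan` helper are all assumptions, because `filter.rs` is not part of this excerpt.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchReason {
    IsSymlink,
    IsDotPath,
}

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct SkippedStats {
    pub dot_paths: u64,
    pub symlinks: u64,
    pub no_opener: u64,
}

/// Hypothetical stand-in for the real filter.
pub struct Filter {
    pub include_dot_paths: bool,
    pub include_symlinks: bool,
}

impl Filter {
    /// Returns why an entry should be skipped, or `None` to keep it.
    pub fn matches(&self, name: &str, is_symlink: bool) -> Option<MatchReason> {
        if !self.include_symlinks && is_symlink {
            return Some(MatchReason::IsSymlink);
        }
        if !self.include_dot_paths && name.starts_with('.') {
            return Some(MatchReason::IsDotPath);
        }
        None
    }
}

/// Walks `(name, is_symlink)` entries, keeping some and tallying the rest.
/// `no_opener` stays at zero because this sketch never opens any files.
pub fn scan(filter: &Filter, entries: &[(&str, bool)]) -> (Vec<String>, SkippedStats) {
    let mut skipped = SkippedStats::default();
    let mut kept = Vec::new();
    for &(name, is_symlink) in entries {
        match filter.matches(name, is_symlink) {
            Some(MatchReason::IsSymlink) => skipped.symlinks += 1,
            Some(MatchReason::IsDotPath) => skipped.dot_paths += 1,
            None => kept.push(name.to_string()),
        }
    }
    (kept, skipped)
}
```

Matching directly on the `Option<MatchReason>` keeps the counting and the skip decision in one place, which is one alternative to the `map(..).map(|_| true).unwrap_or(false)` chain used in the diff.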