re-create all nautilus ZIMs #999
Comments
As discussed live, I agree that we should expand the ZIP on the drive, re-encode the videos, create a JSON listing the URLs of all individual files, and update the recipe. This is a task for a developer (probably me), since it is too cumbersome and error-prone to do by hand.
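The "JSON with all individual file URLs" step could be sketched roughly as follows. This is only an illustration: the `build_url_listing` and `write_listing` helpers, the `path`/`size`/`url` keys and the base URL are assumptions made for the example, not the format nautilus or the drive actually uses.

```python
import json
import pathlib
from urllib.parse import quote


def build_url_listing(root: pathlib.Path, base_url: str) -> list[dict]:
    """Return one entry per file under root, with its size and public URL.

    The entry layout (path/size/url) is hypothetical, not a nautilus schema.
    """
    entries = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories
        relative = path.relative_to(root).as_posix()
        entries.append(
            {
                "path": relative,
                "size": path.stat().st_size,
                # percent-encode the path; quote() leaves "/" untouched
                "url": f"{base_url.rstrip('/')}/{quote(relative)}",
            }
        )
    return entries


def write_listing(root: pathlib.Path, base_url: str, dest: pathlib.Path) -> None:
    """Serialize the listing as indented JSON into dest."""
    listing = build_url_listing(root, base_url)
    dest.write_text(json.dumps(listing, indent=2), encoding="utf-8")
```

Something like `write_listing(pathlib.Path("expanded"), "https://example.org/data", pathlib.Path("files.json"))` would then produce a file that can be reviewed before the recipe is updated.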
Indeed. FYI, here is a sample re-encode script that can be applied to the drive root:

import argparse
import logging
import pathlib
import sys

import humanfriendly
from zimscraperlib.video.encoding import reencode
from zimscraperlib.video.presets import VideoWebmLow

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s")
logger = logging.getLogger(__name__)
ROOT = pathlib.Path(__file__).parent


def disk_usage(folder):
    # total size in bytes of all regular files below folder
    return sum(file.stat().st_size for file in folder.glob("**/*") if file.is_file())


def hsize(size):
    return humanfriendly.format_size(size, binary=True)


def main(root: pathlib.Path):
    du = disk_usage(root)
    logger.info(f"re-encoding videos from {root} ({hsize(du)})")
    ffmpeg_args = VideoWebmLow().to_ffmpeg_args()
    errored = []
    # every .webm file is re-encoded in place (same source and destination)
    for video_fpath in root.rglob("*.webm"):
        logger.info(f"** {video_fpath}")
        if reencode(
            src_path=video_fpath,
            dst_path=video_fpath,
            ffmpeg_args=ffmpeg_args,
            delete_src=True,
            with_process=False,
            failsafe=True,
        ):
            logger.info(" OK")
        else:
            logger.error(" ERROR")
            errored.append(video_fpath)
    final_du = disk_usage(root)
    logger.info(f"new disk-usage: {hsize(final_du)} (diff: {hsize(final_du - du)})")
    if not errored:
        logger.info("ALL OK")
        return
    # paths must be converted to str before being joined
    logger.error(
        f"{len(errored)} files failed to re-encode:\n- "
        + "\n- ".join(str(fpath) for fpath in errored)
    )


def entrypoint():
    parser = argparse.ArgumentParser(
        prog="re-encode",
        description="re-encode videos using scraperlib",
    )
    parser.add_argument(
        help="Source file path",
        dest="src_path",
    )
    args = parser.parse_args()
    try:
        sys.exit(main(pathlib.Path(args.src_path).expanduser().resolve()))
    except Exception as exc:
        logger.error(f"FAILED. An error occurred: {exc}")
        logger.exception(exc)
        raise SystemExit(1) from exc


if __name__ == "__main__":
    entrypoint()
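Since the script above overwrites each video in place and the source videos are no longer available, a dry-run pass that only lists what would be touched may be worth running first. A minimal sketch using only the standard library; `list_candidates` and `report` are hypothetical helpers, not part of scraperlib:

```python
import pathlib


def list_candidates(root: pathlib.Path, pattern: str = "*.webm"):
    """Return (path, size) pairs for files matching pattern, largest first."""
    found = [
        (path, path.stat().st_size)
        for path in root.rglob(pattern)
        if path.is_file()
    ]
    return sorted(found, key=lambda item: item[1], reverse=True)


def report(root: pathlib.Path) -> int:
    """Print every candidate with its size and return the total in bytes."""
    candidates = list_candidates(root)
    for path, size in candidates:
        print(f"{size:>12} {path.relative_to(root)}")
    total = sum(size for _, size in candidates)
    print(f"{len(candidates)} file(s), {total} bytes in total")
    return total
```

Nothing is modified by this pass, so it can be run safely against the drive root before the actual re-encode.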
Can we please just do that (redoing the ZIM files) programmatically? This is a priority. The rest should be handled separately, and I'm not in favour of rewriting the ZIP unless really necessary; see for example openzim/nautilus#23
Following live discussion:
It is understood that we'll re-encode those because we know they are broken WebM files and because we no longer have the source videos. In a normal situation, we would store the source video on the drive and the (yet to be implemented) nautilus-included encoder would optimize it.
I have rescheduled all 47 recipes based on the "nautilus" tag and after fixing a few (Nautilus
All nautilus ZIMs would benefit from being re-run:
Now that nautilus supports URL entries, we may want to switch to URL-based collections and drop the ZIP archive. The advantage is that all files are individually available and replaceable; collections are also easy to extend.
On the other hand, it makes it difficult for someone to download a full recipe's data and run nautilus locally. @benoit74 WDYT?
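The local-run concern could be softened with a small mirroring helper that fetches every URL of a collection into a folder. A rough sketch, assuming all file URLs share a common base URL; `local_path_for` and `mirror` are hypothetical names and nothing here is part of nautilus:

```python
import pathlib
import shutil
import urllib.request
from urllib.parse import unquote


def local_path_for(url: str, base_url: str, dest: pathlib.Path) -> pathlib.Path:
    """Map a file URL located below base_url to a path below dest."""
    base = base_url.rstrip("/") + "/"
    if not url.startswith(base):
        raise ValueError(f"{url} is not below {base}")
    relative = pathlib.PurePosixPath(unquote(url[len(base):]))
    # refuse anything that could escape the destination folder
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"unsafe path in {url}")
    return dest.joinpath(*relative.parts)


def mirror(urls, base_url: str, dest: pathlib.Path) -> list[pathlib.Path]:
    """Download each URL to its mirrored location and return the paths written."""
    written = []
    for url in urls:
        target = local_path_for(url, base_url, dest)
        target.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url) as response, open(target, "wb") as fh:
            shutil.copyfileobj(response, fh)
        written.append(target)
    return written
```

This sketch does no retry, resume or checksum verification and ignores query strings, so it is a starting point rather than a replacement for downloading a single archive.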
Here's the list of all nautilus recipes (zimfarm has no filter for it)