Update example #19

Open: wants to merge 6 commits into main
1 change: 1 addition & 0 deletions .gitignore
@@ -20,3 +20,4 @@ tests/test_files/chips/
tests/test_files/full_moon/
tests/test_files/new_moon/
src/feedback_images/
src/feedback_model/
72 changes: 45 additions & 27 deletions .pre-commit-config.yaml
@@ -1,55 +1,60 @@
repos:
# Standard pre-commit hooks
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
rev: v4.5.0
hooks:
- id: check-yaml
# File formatting
- id: end-of-file-fixer
- id: trailing-whitespace
- id: check-json
- id: mixed-line-ending
- id: requirements-txt-fixer
- id: pretty-format-json
args: ["--autofix"]

# Syntax checking
- id: check-yaml
- id: check-json
- id: check-toml
- id: check-ast
- id: check-case-conflict
- id: check-docstring-first
- id: check-added-large-files
- id: check-ast
- id: check-byte-order-marker
- id: check-executables-have-shebangs
- id: check-merge-conflict
- id: check-toml
- id: debug-statements
- id: check-byte-order-marker

# Content validation
- id: pretty-format-json
args: ["--autofix"]
- id: requirements-txt-fixer
- id: check-added-large-files
args: ["--maxkb=1000"]

# Security
- id: detect-aws-credentials
args: [--allow-missing-credentials]
- id: detect-private-key
- id: debug-statements

# Type checking
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.1.1
rev: v1.8.0 # Updated to latest
hooks:
- id: mypy
args:
[
--install-types,
--ignore-missing-imports,
--disallow-untyped-defs,
--ignore-missing-imports,
--non-interactive,
--exclude=(__init__.py)$,
]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: detect-private-key
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-added-large-files

# Security scanning
- repo: https://github.com/PyCQA/bandit
rev: "1.7.5"
rev: 1.7.7 # Updated to latest
hooks:
- id: bandit
exclude: ^tests/
args:
- -s
- B101
args: [-s, B101]

# Documentation coverage
- repo: local
hooks:
- id: interrogate
@@ -70,8 +75,21 @@ repos:
.ipynb_checkpoints/,
--fail-under=90,
]
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: "v0.0.257"

# Python linting
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.1.9 # Updated to latest
hooks:
- id: ruff
exclude: docs/openapi.json

# Dockerfile linting
- repo: https://github.com/hadolint/hadolint
rev: v2.12.0 # Updated to latest
hooks:
- id: hadolint-docker
name: Lint Dockerfiles
description: Runs hadolint Docker image to lint Dockerfiles
language: docker_image
types: ["dockerfile"]
entry: ghcr.io/hadolint/hadolint hadolint
55 changes: 40 additions & 15 deletions Dockerfile
@@ -1,20 +1,45 @@
FROM ubuntu:22.04@sha256:67211c14fa74f070d27cc59d69a7fa9aeff8e28ea118ef3babc295a0428a6d21

RUN apt-get update -y
RUN apt-get install ffmpeg libsm6 libxext6 -y

RUN apt-get install libhdf5-serial-dev netcdf-bin libnetcdf-dev -y

RUN apt-get update && apt-get install -y \
python3-pip

COPY requirements/requirements.txt requirements.txt

RUN pip3 install --no-cache-dir --upgrade -r requirements.txt

# Use an official Python runtime as a parent image with SHA for reproducibility
# hadolint ignore=DL3008
FROM python:3.12-slim@sha256:2a6386ad2db20e7f55073f69a98d6da2cf9f168e05e7487d2670baeb9b7601c5

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
PIP_NO_CACHE_DIR=1 \
PYTHONPATH=/src

# Install all required system packages in one RUN statement to reduce image layers
# hadolint ignore=DL3008
RUN apt-get update && apt-get install -y --no-install-recommends \
# Original required packages
ffmpeg \
libsm6 \
libxext6 \
libhdf5-dev \
netcdf-bin \
libnetcdf-dev \
# Additional geospatial packages
gdal-bin \
libgdal-dev \
libproj-dev \
libgeos-dev \
gcc \
g++ \
build-essential \
&& rm -rf /var/lib/apt/lists/* # Clean up to reduce image size

# Copy requirements to leverage Docker cache
COPY requirements/requirements.txt /tmp/requirements.txt

# Fix urllib3 version specifier and install watchdog instead of pathtools
RUN pip install --no-cache-dir --upgrade -r /tmp/requirements.txt

# Set the working directory
WORKDIR /src

# Copy the source code in one layer
COPY ./src /src
COPY ./tests /src/tests

CMD ["python3", "main.py"]
# Specify the default command to run
CMD ["python", "main.py"]
24 changes: 7 additions & 17 deletions data.md
@@ -1,64 +1,54 @@
## Data for inference

There are two required datasets for inference, the light intensity data (\*DNB_NRT) and supporting data including geolocation, moonlight illumination, and other files used during inference. In addition to these two data sources, there are several optional datasets that are used to improve the quality of the detections. The optional datasets are cloud masks (CLDMSK_NRT) and additional bands (MOD_NRT) used for gas flare identification and removal. The DNB and MOD datasets are provided in near real time through [earthdata](https://www.earthdata.nasa.gov/learn/find-data/near-real-time/viirs) and the cloud masks are provided in near real time through [sips-data](https://sips-data.ssec.wisc.edu/nrt/). The urls for each dataset and satellite is below. Note that downloads require a token, if using the API. Register for the API and create a token at [earthdata](https://urs.earthdata.nasa.gov/).
## Data for inference
There are two required datasets for inference: the light intensity data (*DNB_NRT) and supporting data including geolocation, moonlight illumination, and other files used during inference. In addition to these two data sources, there are several optional datasets that are used to improve the quality of the detections. The optional datasets are cloud masks (CLDMSK_NRT) and additional bands (MOD_NRT) used for gas flare identification and removal. The DNB and MOD datasets are provided in near real time through [earthdata](https://www.earthdata.nasa.gov/learn/find-data/near-real-time/viirs) and the cloud masks are provided in near real time through [sips-data](https://sips-data.ssec.wisc.edu/nrt/). The URLs for each dataset and satellite are below. Note that downloads require a token if using the API. Register for the API and create a token at [earthdata](https://urs.earthdata.nasa.gov/).
Suomi NPP (NOAA/NASA Suomi National Polar-orbiting Partnership)
| File | SUOMI-NPP | NOAA-20 |
| File | SUOMI-NPP | NOAA-20 |
|-------------------------------|-----------------------------------------------------------------------|----------|
| Day/Night Band (DNB) | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP02DNB_NRT) | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VJ102DNB_NRT) |
| Terrain Corrected Geolocation (DNB) | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP03DNB_NRT)| [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VJ103DNB_NRT)|
| Clear sky confidence | [url](https://sips-data.ssec.wisc.edu/nrt/CLDMSK_L2_VIIRS_SNPP_NRT) | [url](https://sips-data.ssec.wisc.edu/nrt/CLDMSK_L2_VIIRS_NOAA20_NRT)|
| Clear sky confidence | [url](https://sips-data.ssec.wisc.edu/nrt/CLDMSK_L2_VIIRS_SNPP_NRT) | [url](https://sips-data.ssec.wisc.edu/nrt/CLDMSK_L2_VIIRS_NOAA20_NRT)|
| Gas Flares Band | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP02MOD_NRT/) | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VJ102MOD_NRT/)|
| Terrain Corrected Geolocation (MOD) | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP03MOD_NRT/)| [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VJ103DNB_NRT/)|

## Downloading data

1. Register an account on earthdata and download a token: https://www.earthdata.nasa.gov/learn/find-data
2. Set this token in your environment, e.g. `export EARTHDATA_TOKEN=$DOWNLOADED_TOKEN`
3. Download data for each img_path (DNB, GEO data, and cloud masks are required with the default configuration on and around full moons)

```python
TOKEN = f"{os.environ.get('EARTHDATA_TOKEN')}"
with open(dnb_path, "w+b") as fh:
    utils.download_url(img_path, TOKEN, fh)
```
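
For context, a slightly fuller sketch of the same download step is below. It only assumes the `utils.download_url` helper referenced above and an `img_path` URL built from the archive links in the table; the specific granule name is a hypothetical placeholder.

```python
# Hedged sketch: the granule URL below is a hypothetical example; build
# img_path from the archive URLs in the table above for the granule you need.
import os

from src import utils

TOKEN = os.environ.get("EARTHDATA_TOKEN", "")
img_path = (
    "https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP02DNB_NRT/"
    "VNP02DNB_NRT.A2023300.1136.002.2023300154339.nc"  # placeholder granule
)
dnb_path = "tests/test_files/" + img_path.rsplit("/", 1)[-1]  # local destination

with open(dnb_path, "w+b") as fh:
    utils.download_url(img_path, TOKEN, fh)
```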

Sample data can be found in the test_files directory. The example requests reference data within test_files.

## API documentation

The API schema is automatically generated from src.utils.autogen_api_schema. The schema is written to docs/openapi.json (open it in an OpenAPI editor such as Swagger: https://editor.swagger.io/). Documentation and additional examples are available at http://0.0.0.0:5555/redoc after starting the server. Example data is located in test_files.

To regenerate the schema:

```bash
python -c 'from src import utils; utils.autogen_api_schema()'
```
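
To sanity-check the regenerated schema without opening an editor, a small script along the following lines works; it only assumes docs/openapi.json exists at the repository root and follows the standard OpenAPI layout.

```python
# List the endpoints documented in the generated schema.
import json

with open("docs/openapi.json") as fh:
    schema = json.load(fh)

print(schema.get("info", {}).get("title", "<untitled>"))
for path, methods in schema.get("paths", {}).items():
    print(path, ", ".join(sorted(methods)))
```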

## Tuning the model

Parameters are defined in src/config/config.yml. Within that config, there are inline comments for the most important parameters, along with recommendations on appropriate ranges for tuning those values to achieve higher precision or higher recall.

By default, the model filters out a variety of light sources and image artifacts that cause false positive detections. These filters are defined in the pipeline section of the config and can be turned on or off there. By default, there are filters for aurora-lit clouds, moonlit clouds, image artifacts (bowtie/noise smiles, edge noise), near-shore detections, non-max suppression, lightning, and gas flares.
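
As an illustration of that workflow, a short script can flip individual filters between runs. Only the config path is taken from this document; the key names below are hypothetical placeholders, so check src/config/config.yml for the real ones.

```python
# Hedged sketch: toggle a pipeline filter in the config, then re-run inference.
import yaml

CONFIG_PATH = "src/config/config.yml"

with open(CONFIG_PATH) as fh:
    config = yaml.safe_load(fh)

# Hypothetical key names -- the real filter names live in the pipeline section.
config["pipeline"]["moonlit_clouds"] = False

with open(CONFIG_PATH, "w") as fh:
    yaml.safe_dump(config, fh, sort_keys=False)
```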

## Generate a labeled dataset

There are two types of training datasets. The first contains bounding box annotations for each detection in a frame. The second contains image-level labels (crops of detected vessels) for training the supervised CNN referenced in src/postprocessor.

To generate a new object detection dataset:

1. Create account at https://nrt3.modaps.eosdis.nasa.gov/
2. Download earthdata token by clicking on profile icon and "Download token"
3. Build and run the docker container with an optional mounted volume:

```bash
docker run -d -m="50g" --cpus=120 --mount type=bind,source="$(pwd)"/target,target=/src/raw_data ghcr.io/allenai/vessel-detection-viirs:latest
docker run -d -m="50g" --cpus=120 --mount type=bind,source="$(pwd)"/target,target=/src/raw_data skylight-vvd-service:latest
```
4. Set this token in your environment: e.g. ```export EARTHDATA_TOKEN=YOUR_DOWNLOADED_TOKEN_FROM_STEP_2```
5. Annotate the data from within the docker container using ```python src/gen_object_detection_dataset.py```

4. Set this token in your environment: e.g. `export EARTHDATA_TOKEN=YOUR_DOWNLOADED_TOKEN_FROM_STEP_2`
5. Annotate the data from within the docker container using `python src/gen_object_detection_dataset.py`

To generate a new image label dataset:

1. Use src/gen_image_labeled_dataset.py. Sample imagery to train the feedback model is contained within the feedback_model/viirs_classifier folder.

Note that a sample dataset of ~1000 detections (<1 GB) has been provided within this repository.
10 changes: 5 additions & 5 deletions docs/openapi.json
@@ -28,7 +28,7 @@
"type": "number"
},
"nanowatts": {
"title": "Nanowatts",
"title": "radiance_nw",
"type": "number"
},
"orientation": {
@@ -82,6 +82,10 @@
"output_dir": "output"
},
"properties": {
"cloud_maskname": {
"title": "Phys Filename",
"type": "string"
},
"dnb_filename": {
"title": "Dnb Filename",
"type": "string"
@@ -113,10 +117,6 @@
"output_dir": {
"title": "Output Dir",
"type": "string"
},
"phys_filename": {
"title": "Phys Filename",
"type": "string"
}
},
"required": [
18 changes: 7 additions & 11 deletions example/sample_request.py
@@ -1,27 +1,25 @@
""" Use this script to inference the API with locally stored data"""

import json
import os
import time

import requests

PORT = os.getenv("VVD_PORT", default=5555)
VVD_ENDPOINT = f"http://localhost:{PORT}/detections"
SAMPLE_INPUT_DIR = "/example/"
SAMPLE_OUTPUT_DIR = "/example/chips/"
SAMPLE_INPUT_DIR = "tests/test_files/"
SAMPLE_OUTPUT_DIR = "tests/test_files/chips/"
TIMEOUT_SECONDS = 600
DNB_FILENAME = "VJ102DNB_NRT_2023_310_VJ102DNB_NRT.A2023310.0606.021.2023310104322.nc"
GEO_FILENAME = "VJ103DNB_NRT_2023_310_VJ103DNB_NRT.A2023310.0606.021.2023310093233.nc"


def sample_request() -> None:
"""Sample request for files stored locally"""
start = time.time()

REQUEST_BODY = {
"input_dir": SAMPLE_INPUT_DIR,
"output_dir": SAMPLE_OUTPUT_DIR,
"dnb_filename": DNB_FILENAME,
"geo_filename": GEO_FILENAME

"dnb_filename": "VNP02DNB_NRT.A2023300.1136.002.2023300154339.nc",
"geo_filename": "VNP03DNB_NRT.A2023300.1136.002.2023300145841.nc",
}

response = requests.post(VVD_ENDPOINT, json=REQUEST_BODY, timeout=TIMEOUT_SECONDS)
@@ -31,8 +29,6 @@ def sample_request() -> None:
if response.ok:
with open(output_filename, "w") as outfile:
json.dump(response.json(), outfile)
end = time.time()
print(f"elapsed time: {end-start}")


if __name__ == "__main__":
6 changes: 0 additions & 6 deletions example/sample_request_cloud.py
@@ -1,8 +1,5 @@
"""Runs a sample request for VIIRS detections from running server for images in cloud
"""
import json
import os
import time

import requests

@@ -22,7 +19,6 @@ def sample_request(sample_image_data: str) -> None:
sample_image_data : str

"""
start = time.time()

REQUEST_BODY = {
"gcp_bucket": GCP_BUCKET,
@@ -38,8 +34,6 @@
if response.ok:
with open(output_filename, "w") as outfile:
json.dump(response.json(), outfile)
end = time.time()
print(f"elapsed time for {sample_image_data} is: {end-start}")


if __name__ == "__main__":