Merge pull request #134 from SinaAboutalebi/main

Enhance ASN Fetching with Multi-Country Support and Improved Docker Functionality
hatamiarash7 · Nov 8, 2024 · a178266
2 parents cb3076a + b1ef15a
commit a178266

Showing 8 changed files with 189 additions and 77 deletions.
diff --git a/.dockerignore b/.dockerignore
@@ -1,4 +1,8 @@
 .git*
 LICENSE
 Makefile
-README.md
+README.md
+*.txt
+*.csv
+.vscode
+.venv
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
@@ -20,7 +20,7 @@ jobs:
     needs: init
     steps:
       - name: Checkout
-        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
+        uses: actions/checkout@v4.2.2
 
       - name: Get repository info
         uses: gacts/github-slug@v1
@@ -45,10 +45,6 @@ jobs:
           username: ${{ github.actor }}
           password: ${{ secrets.GH_TOKEN }}
 
-      - name: Get Current Date
-        id: date
-        run: echo "::set-output name=date::$(date +'%Y-%m-%d')"
-
       - name: Build & Push Docker image
         uses: docker/build-push-action@v6
         with:
@@ -57,8 +53,7 @@ jobs:
           push: true
           platforms: linux/amd64,linux/386,linux/arm64,linux/arm/v6,linux/arm/v7
           build-args: |
-            APP_VERSION=${{ steps.slug.outputs.version }}"
-            DATE_CREATED=${{ steps.date.outputs.date }}
+            APP_VERSION=${{ steps.slug.outputs.version }}
           tags: |
             hatamiarash7/asn-by-country:${{ steps.slug.outputs.version }}
             hatamiarash7/asn-by-country:latest

diff --git a/.gitignore b/.gitignore
@@ -155,3 +155,4 @@ cython_debug/
 asn_list.csv
 ranges.txt
 .vscode
+output_data
diff --git a/Dockerfile b/Dockerfile
@@ -2,22 +2,21 @@ FROM --platform=$BUILDPLATFORM python:3.11.4-slim-buster
 
 ARG APP_VERSION="undefined@docker"
 
-LABEL org.opencontainers.image.title="asn-by-country"
-LABEL org.opencontainers.image.description="Get ASN delegations list of specific country"
-LABEL org.opencontainers.image.url="https://github.com/hatamiarash7/ASN-By-Country"
-LABEL org.opencontainers.image.source="https://github.com/hatamiarash7/ASN-By-Country"
-LABEL org.opencontainers.image.vendor="hatamiarash7"
-LABEL org.opencontainers.image.author="hatamiarash7"
-LABEL org.opencontainers.version="$APP_VERSION"
-LABEL org.opencontainers.image.created="$DATE_CREATED"
-LABEL org.opencontainers.image.licenses="MIT"
+LABEL org.opencontainers.image.title="asn-by-country" \
+      org.opencontainers.image.description="Get ASN delegations list of specific country" \
+      org.opencontainers.image.url="https://github.com/hatamiarash7/ASN-By-Country" \
+      org.opencontainers.image.source="https://github.com/hatamiarash7/ASN-By-Country" \
+      org.opencontainers.image.vendor="hatamiarash7" \
+      org.opencontainers.image.author="hatamiarash7" \
+      org.opencontainers.version="$APP_VERSION" \
+      org.opencontainers.image.created="$(date --iso-8601=seconds)" \
+      org.opencontainers.image.licenses="MIT"
 
 WORKDIR /app
 
-COPY ./requirements.txt /app/
+COPY requirements.txt .
 
-RUN pip3 install --no-cache-dir pip \
-    && pip3 install --no-cache-dir -r requirements.txt
+RUN pip3 install --no-cache-dir -r requirements.txt
 
 COPY . .
 

diff --git a/Makefile b/Makefile
@@ -2,8 +2,7 @@ install: ## Install the requirements
 	@python3 -m pip install -r requirements.txt
 
 clean: ## Clean the results
-	@rm -f ranges.txt
-	@rm -f asn_list.csv
+	@rm -rf output_data
 
 run: clean ## Run the script
 	@python3 main.py IR

diff --git a/README.md b/README.md
@@ -7,27 +7,51 @@ It's a simple script to get ASN delegations list of specific country. I'm using
 ## Usage
 
 ```bash
-python main.py <country>
+python main.py <country_code_1> <country_code_2> ... [options]
 ```
 
-Example:
+### Optional Arguments
+
+- `--data-type <type>`:
+  Specify which type of data to fetch. The options are:
+
+  - `asn`: Retrieve only AS numbers (default).
+  - `ipv4`: Retrieve only IPv4 addresses.
+  - `ipv6`: Retrieve only IPv6 addresses.
+  - `all`: Retrieve AS numbers, IPv4 addresses, and IPv6 addresses.
+
+  **Default**: `asn`
+
+## Examples
 
 ```bash
-python main.py IR
+python main.py IR US FR
+
+python main.py IR --data-type asn
+
+python main.py IR US --data-type all
 ```
 
 ### Docker
 
 ```bash
-docker run --rm  -v /results:/app hatamiarash7/asn-by-country:latest <country>
+docker run --rm  -v /results:/app/output_data hatamiarash7/asn-by-country:latest <country_code_1> <country_code_2> ... [options]
 ```
 
 ## Result
 
-This script will generate two file:
+The script writes its results to the `output_data` directory. Which files are created depends on the country codes and the `--data-type` option you pass.
+
+### File List and Descriptions
 
-- `asn_list.csv`: Contains the information
-- `ranges.txt`: Contains all ASN ranges
+| File Name                 | Description                                                  |
+| ------------------------- | ------------------------------------------------------------ |
+| `{Country}_asn_list.csv`  | ASN delegations for the specified country.                   |
+| `{Country}_ipv4_list.csv` | IPv4 delegations for the specified country.                  |
+| `{Country}_ipv6_list.csv` | IPv6 delegations for the specified country.                  |
+| `asn_ranges.txt`          | Allocated AS numbers, one comma-separated line per country.  |
+| `ipv4_ranges.txt`         | Allocated IPv4 ranges, one comma-separated line per country. |
+| `ipv6_ranges.txt`         | Allocated IPv6 ranges, one comma-separated line per country. |
 
 ---
 

diff --git a/main.py b/main.py
@@ -1,59 +1,147 @@
-"""This is a simple script to get all AS numbers of a given country
+"""
+This script retrieves AS numbers, IPv4, and/or IPv6 addresses
+with prefixes for one or more given country codes.
 """
 
 import argparse
-import time
+import os
 import warnings
+from concurrent.futures import ThreadPoolExecutor, as_completed
 
 import pandas as pd
 import requests
 from bs4 import BeautifulSoup
 from rich.console import Console
 from rich.progress import track
 
+# Suppress FutureWarnings
 warnings.simplefilter(action="ignore", category=FutureWarning)
 
-console = Console()
-
-parser = argparse.ArgumentParser(description="Get AS numbers of a country")
-parser.add_argument("country", help="Country code")
+console = Console(log_path=False)
+
+# Argument parsing
+parser = argparse.ArgumentParser(
+    description="Get AS numbers, IPv4, and/or IPv6 allocations of one or more countries"  # noqa: E501
+)
+parser.add_argument(
+    "countries",
+    nargs="+",
+    help="Country codes (e.g., 'FR', 'US')",
+)
+parser.add_argument(
+    "--data-type",
+    choices=["asn", "ipv4", "ipv6", "all"],
+    default="asn",
+    help="Specify which data to fetch: 'asn', 'ipv4', 'ipv6', or 'all'",
+)
 args = parser.parse_args()
 
-country = args.country.upper()
-
-# Create object page - pylint: disable=line-too-long
-console.log("\t[blue]Downloading data ...[/blue]")
-url = f"https://www-public.imtbs-tsp.eu/~maigron/RIR_Stats/RIR_Delegations/Delegations/ASN/{country}.html"  # noqa E501
-response = requests.get(url)
-response.raise_for_status()
-
-# Obtain page's information
-soup = BeautifulSoup(response.text, "lxml")
-
-# Obtain information from tag <table>
-table = soup.find("table", attrs={"class": "delegs asn ripencc"})
-
-# Obtain headers
-headers = [header.text for header in table.find_all("th")]
-
-# Create a data frame
-data = pd.DataFrame(columns=headers[1:])
-with open(file="ranges.txt", mode="w", encoding="UTF-8") as ranges_file:
-    rows = table.find_all("tr")[2:]
-
-    for row_index, row in enumerate(track(rows, description="Reading ...")):
-        row_data = row.find_all("td")
-        row = [i.text for i in row_data]
-
-        if row[6] == "Allocated":
-            SEP = "," if row_index != len(rows) - 1 else ""
-            ranges_file.write(row[3] + SEP)
-
-        data = data._append(dict(zip(headers[1:], row)), ignore_index=True)
-        time.sleep(0.001)
-
-    console.log(f"Found\t[green]{len(data)}[/green] ASNs")
-
-
-# Export to csv
-data.to_csv("asn_list.csv", index=False)
+# Set Headers
+HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; ASNumberFetcher/1.0)"}
+
+# Base URLs
+BASE_URLS = {
+    "asn": "https://www-public.imtbs-tsp.eu/~maigron/rir-stats/rir-delegations/delegations/asn/{country}-asn-delegations.html",  # noqa: E501
+    "ipv4": "https://www-public.imtbs-tsp.eu/~maigron/rir-stats/rir-delegations/delegations/ipv4/{country}-ipv4-delegations.html",  # noqa: E501
+    "ipv6": "https://www-public.imtbs-tsp.eu/~maigron/rir-stats/rir-delegations/delegations/ipv6/{country}-ipv6-delegations.html",  # noqa: E501
+}
+
+
+def fetch_data(country_code, data_type):
+    """Fetch ASN, IPv4, or IPv6 data for a given country code."""
+    url = BASE_URLS[data_type].format(country=country_code.lower())
+    try:
+        response = requests.get(url, headers=HEADERS, timeout=30)
+        response.raise_for_status()
+        soup = BeautifulSoup(response.text, "lxml")
+
+        # Locate the table
+        table = soup.find(
+            "table",
+            attrs={"class": f"delegs {data_type} ripencc"},
+        )
+        if not table:
+            console.log(
+                f"[yellow]No data table found for {data_type.upper()} in {country_code}.[/yellow]"  # noqa: E501
+            )
+            return country_code, data_type, None, None
+
+        # Extract headers and rows
+        headers = [header.text.strip() for header in table.find_all("th")]
+        rows = table.find_all("tr")[2:]
+
+        # Collect data rows
+        data_rows = []
+        allocations = []
+        for row in rows:
+            columns = [td.text.strip() for td in row.find_all("td")]
+            if columns:
+                row_data = dict(zip(headers[1:], columns))
+                data_rows.append(row_data)
+
+                if data_type == "asn" and columns[6] == "Allocated":
+                    allocations.append(columns[3])  # Collect allocated ASNs
+                elif data_type in ["ipv4", "ipv6"] and columns[7] == "Allocated":
+                    ip_with_prefix = f"{columns[3]}{columns[4].strip()}"
+                    allocations.append(ip_with_prefix)
+
+        return country_code, data_type, data_rows, allocations
+
+    except requests.exceptions.RequestException as e:
+        console.log(
+            f"[red]Error fetching {data_type.upper()} data for {country_code}: {e}[/red]"  # noqa: E501
+        )
+        return country_code, data_type, None, None
+
+
+# Run fetch requests in parallel
+country_data = {}
+output_dir = "output_data"
+os.makedirs(output_dir, exist_ok=True)
+
+console.log("[blue]Fetching data for countries...[/blue]")
+
+# Prepare to fetch all specified data types
+data_types = [args.data_type] if args.data_type != "all" else ["asn", "ipv4", "ipv6"]
+
+# Create a new console instance for cleaner logging within the progress bar
+console_no_time = Console(log_path=False, log_time=False)
+
+with ThreadPoolExecutor() as executor:
+    futures = [
+        executor.submit(fetch_data, country, data_type)
+        for country in args.countries
+        for data_type in data_types
+    ]
+
+    for future in track(
+        as_completed(futures),
+        total=len(futures),
+        description="Processing data...",
+    ):
+        country_code, data_type, data_rows, allocations = future.result()
+        if data_rows is not None:
+            # Save data to CSV for each country and type
+            df = pd.DataFrame(data_rows)
+            csv_filename = os.path.join(
+                output_dir, f"{country_code}_{data_type}_list.csv"
+            )
+            df.to_csv(csv_filename, index=False)
+
+            # Improve readability with a new console instance
+            console_no_time.log(
+                f" [green]Data saved for {data_type.upper()} in {country_code}[/green]"  # noqa: E501
+            )
+
+            # Write IP or ASN ranges to ranges file
+            if data_type in ["asn", "ipv4", "ipv6"]:
+                range_file_path = os.path.join(
+                    output_dir,
+                    f"{data_type}_ranges.txt",
+                )
+                with open(range_file_path, "a", encoding="UTF-8") as range_file:
+                    range_file.write(",".join(allocations) + "\n")
+
+console.log(
+    f"[green]Completed fetching data for {len(args.countries)} countries.[/green]"  # noqa: E501
+)
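
The fan-out/fan-in pattern that `main.py` adopts here can be tried in isolation. The sketch below is only an illustration: `fake_fetch` and `collect` are hypothetical names that do not appear in the PR, and `fake_fetch` stands in for `fetch_data` so that nothing touches the network.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_fetch(country_code, data_type):
    """Hypothetical stand-in for fetch_data(); performs no network I/O."""
    if country_code == "ZZ":
        # Mirror the diff's convention: a failed fetch returns None payloads.
        return country_code, data_type, None, None
    rows = [{"country": country_code, "type": data_type}]
    allocations = [f"{country_code}-{data_type}-1"]
    return country_code, data_type, rows, allocations


def collect(countries, data_types, fetch=fake_fetch):
    """Submit one task per (country, data type) pair and gather the results."""
    results = {}
    with ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(fetch, country, data_type)
            for country in countries
            for data_type in data_types
        ]
        # as_completed yields futures in completion order, not submission
        # order, so results are keyed by the identifiers each task returns.
        for future in as_completed(futures):
            country_code, data_type, rows, allocations = future.result()
            if rows is not None:
                results[(country_code, data_type)] = allocations
    return results


if __name__ == "__main__":
    print(collect(["IR", "ZZ"], ["asn", "ipv4"]))
```

Because completion order is not deterministic, the lines appended to each `*_ranges.txt` file may appear in a different country order from one run to the next.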
diff --git a/requirements.txt b/requirements.txt
@@ -1,5 +1,7 @@
-beautifulsoup4==4.12.3
-requests==2.32.3
 pandas==2.2.3
-lxml==5.3.0
-rich==13.9.4
+requests==2.32.3
+beautifulsoup4==4.12.3
+rich==13.9.4
+numpy==1.24.3
+pybind11>=2.12
+lxml==5.3.0
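
For anyone consuming the results: `main.py` opens each `*_ranges.txt` file in append mode and writes one comma-separated line per country, so repeated runs accumulate lines unless `output_data` is cleaned first. A minimal reader sketch under that assumption follows. `read_ranges` is a hypothetical helper, not part of the PR, and the sample values are made up.

```python
import tempfile
from pathlib import Path


def read_ranges(path):
    """Return the unique, non-empty entries of a ranges file, in file order."""
    seen = set()
    entries = []
    for line in Path(path).read_text(encoding="UTF-8").splitlines():
        for item in line.split(","):
            item = item.strip()
            # Skip blanks (a country with no allocations yields an empty
            # line) and duplicates left behind by repeated runs.
            if item and item not in seen:
                seen.add(item)
                entries.append(item)
    return entries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "ipv4_ranges.txt"
        # Made-up sample data: two lines plus one empty line.
        sample.write_text(
            "192.0.2.0/24,198.51.100.0/24\n\n198.51.100.0/24\n",
            encoding="UTF-8",
        )
        print(read_ranges(sample))
```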