fix: make logs persistent + other fixes (#140)
- remove root logs dir
- revert CI env var in embedding server proc
- remove unsupported vectordb configs
- update readme: remove beta status
- update readme for logs
- fix: better handling of embedding server failure

---------

Signed-off-by: Anupam Kumar <[email protected]>
kyteinsky authored Jan 20, 2025
1 parent fdeac86 commit b689509
Showing 12 changed files with 40 additions and 35 deletions.
8 changes: 5 additions & 3 deletions .github/workflows/integration-test.yml
@@ -163,7 +163,7 @@ jobs:
           pip install -r requirements.txt
           cp example.env .env
           echo "NEXTCLOUD_URL=http://localhost:8080" >> .env
-          python3 -u ./main.py > logs/backend_logs 2>&1 &
+          python3 -u ./main.py > backend_logs 2>&1 &
           echo $! > ../pid.txt # Save the process ID (PID)
       - name: Register backend
@@ -233,7 +233,9 @@ jobs:
       - name: Show logs
         if: always()
         run: |
-          tail data/nextcloud.log
+          cat data/nextcloud.log
           echo '--------------------------------------------------'
-          tail -v -n +1 context_chat_backend/logs/* || echo "No logs in logs directory"
+          cat context_chat_backend/backend_logs || echo "No backend logs"
+          echo '--------------------------------------------------'
+          tail -v -n +1 context_chat_backend/persistent_storage/logs/* || echo "No logs in logs directory"
1 change: 0 additions & 1 deletion .gitignore
@@ -5,4 +5,3 @@ __pycache__/
 .env
 persistent_storage/*
 .vscode/
-logs/
3 changes: 0 additions & 3 deletions Dockerfile
@@ -40,9 +40,6 @@ RUN python3 -m pip install --no-cache-dir https://github.com/abetlen/llama-cpp-p
 RUN sed -i '/llama_cpp_python/d' requirements.txt
 RUN python3 -m pip install --no-cache-dir -r requirements.txt && python3 -m pip cache purge

-# Create an empty logs dir
-RUN mkdir logs
-
 # Copy application files
 COPY context_chat_backend context_chat_backend
 COPY main.py .
7 changes: 5 additions & 2 deletions README.md
@@ -7,8 +7,6 @@
 [![REUSE status](https://api.reuse.software/badge/github.com/nextcloud/context_chat_backend)](https://api.reuse.software/info/github.com/nextcloud/context_chat_backend)

 > [!NOTE]
-> This is a beta software. Expect breaking changes.
->
 > Be mindful to install the backend before the Context Chat php app (the Context Chat php app sends all the user-accessible files to the backend for indexing in the background. It is not an issue even if the request fails to an uninitialised backend, since those files would be tried again in the next background job run.)
 >
 > The HTTP request timeout is 50 minutes for all requests and can be changed with the `request_timeout` app config for the php app `context_chat` using the occ command (`occ config:app:set context_chat request_timeout --value=3000`, value is in seconds). The same also needs to be done for docker socket proxy. See [Slow responding ExApps](https://github.com/cloud-py-api/docker-socket-proxy?tab=readme-ov-file#slow-responding-exapps)
@@ -101,6 +99,11 @@ volumes:
-v /var/run/docker.sock:/var/run/docker.sock:ro
```

+## Logs
+Logs are stored in the `logs/` directory in the persistent directory. In a docker container, it should be at `/nc_app_context_chat_backend/logs/`. The log file is named `ccb.log` and is set to rotate at 20 MB with 10 backups. These logs are in JSONL format, i.e. each line is a valid JSON object.
+Only logs at warning level and above are printed to the console. All the debug logs are written to the log file if `debug` is set to `true` in the config file.
+The logs of the embedding server are written to `logs/embedding_server_[date].log` in the persistent directory. This file rotates when the date changes and is not in JSONL format; it contains the raw stdout and stderr of the embedding server's process.
+
## Configuration
Configuration resides inside the persistent storage as `config.yaml`. The location is `$APP_PERSISTENT_STORAGE`. By default it would be at `/nc_app_context_chat_backend_data/config.yaml` inside the container.

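Note on the new `## Logs` README section above: the behaviour it describes (a `ccb.log` file in JSONL format that rotates at 20 MB and keeps 10 backups) can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the backend's actual setup, which lives in `logger_config.yaml` and is not part of this diff. `JsonLineFormatter`, `build_file_logger`, `read_jsonl` and the logger name `ccb_sketch` are hypothetical, and "20 MB" is assumed to mean 20 * 1024 * 1024 bytes.

```python
import json
import logging
import logging.handlers
import os
import tempfile


class JsonLineFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per log line (JSONL)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        })


def build_file_logger(log_dir: str, debug: bool = False) -> logging.Logger:
    """Create a logger writing JSONL to <log_dir>/ccb.log with size-based rotation."""
    os.makedirs(log_dir, exist_ok=True)
    handler = logging.handlers.RotatingFileHandler(
        os.path.join(log_dir, 'ccb.log'),
        maxBytes=20 * 1024 * 1024,  # assumption: 20 MB == 20 MiB
        backupCount=10,             # keep ccb.log.1 ... ccb.log.10
    )
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger('ccb_sketch')  # hypothetical logger name
    # Simplification: debug records are only accepted when debug=True
    logger.setLevel(logging.DEBUG if debug else logging.WARNING)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def read_jsonl(path: str) -> list:
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because every line is a standalone JSON object, the file can be filtered line by line, for example `[r for r in read_jsonl(path) if r['level'] == 'WARNING']`.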
11 changes: 0 additions & 11 deletions config.cpu.yaml
@@ -14,17 +14,6 @@ vectordb:
   pgvector:
     # 'connection' overrides the env var 'CCB_DB_URL'

-  chroma:
-    is_persistent: true
-    # chroma_server_host:
-    # chroma_server_http_port:
-    # chroma_server_ssl_enabled:
-    # chroma_server_api_default_path:
-
-  weaviate:
-    # auth_client_secret:
-    # url: http://localhost:8080
-
 embedding:
   protocol: http
   host: localhost
11 changes: 0 additions & 11 deletions config.gpu.yaml
@@ -14,17 +14,6 @@ vectordb:
   pgvector:
     # 'connection' overrides the env var 'CCB_DB_URL'

-  chroma:
-    is_persistent: true
-    # chroma_server_host:
-    # chroma_server_http_port:
-    # chroma_server_ssl_enabled:
-    # chroma_server_api_default_path:
-
-  weaviate:
-    # auth_client_secret:
-    # url: http://localhost:8080
-
 embedding:
   protocol: http
   host: localhost
2 changes: 2 additions & 0 deletions context_chat_backend/controller.py
@@ -351,6 +351,8 @@ def _(sources: list[UploadFile]):

     try:
         added_sources = exec_in_proc(target=embed_sources, args=(vectordb_loader, app.extra['CONFIG'], sources))
+    except (DbException, EmbeddingException) as e:
+        raise e
     except Exception as e:
         raise DbException('Error: failed to load sources') from e
     finally:
12 changes: 8 additions & 4 deletions context_chat_backend/dyn_loader.py
@@ -11,6 +11,7 @@
 import signal
 import subprocess
 from abc import ABC, abstractmethod
+from datetime import datetime
 from time import sleep, time
 from typing import Any

@@ -22,7 +23,7 @@

 from .models.loader import init_model
 from .network_em import NetworkEmbeddings
-from .types import LoaderException, TConfig
+from .types import EmbeddingException, LoaderException, TConfig
 from .vectordb.base import BaseVectorDB
 from .vectordb.loader import get_vector_db
 from .vectordb.types import DbException
@@ -43,7 +44,11 @@ def offload(self):
 class EmbeddingModelLoader(Loader):
     def __init__(self, config: TConfig):
         self.config = config
-        self.logfile = open('logs/embedding_server.log', 'a+')
+        logfile_path = os.path.join(
+            os.environ['EM_SERVER_LOG_PATH'],
+            f'embedding_server_{datetime.now().strftime("%Y-%m-%d")}.log',
+        )
+        self.logfile = open(logfile_path, 'a+')

     def load(self):
         global pid
@@ -84,8 +89,7 @@ def load(self):
             try_ += 1
             sleep(3)

-        logger.error('Error: failed to start the embedding server')
-        os.kill(os.getpid(), signal.SIGTERM)
+        raise EmbeddingException('Error: the embedding server is not responding')

def offload(self):
global pid
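
Note on the `dyn_loader.py` changes above: two things change here. The embedding server's log file gets a date-stamped name under `EM_SERVER_LOG_PATH`, and a server that never becomes ready now raises `EmbeddingException` instead of sending `SIGTERM` to the backend's own process. A minimal sketch of both ideas follows, assuming a stand-in exception class and hypothetical helper names (`embedding_log_path`, `wait_for_server`); the retry count and the readiness check are placeholders, since the real loop is only partly visible in the hunk.

```python
import os
from datetime import datetime
from time import sleep
from typing import Callable, Optional


class EmbeddingException(Exception):
    """Stand-in for the class the diff imports from `.types`."""


def embedding_log_path(log_dir: str, now: Optional[datetime] = None) -> str:
    # Same naming scheme as the diff: embedding_server_YYYY-MM-DD.log
    stamp = (now or datetime.now()).strftime('%Y-%m-%d')
    return os.path.join(log_dir, f'embedding_server_{stamp}.log')


def wait_for_server(
    is_ready: Callable[[], bool],
    max_tries: int = 5,     # placeholder, not taken from the diff
    delay: float = 3.0,     # the hunk shows sleep(3) between tries
) -> None:
    # Poll a few times, then raise so the caller decides what to do,
    # rather than terminating the whole process.
    for _ in range(max_tries):
        if is_ready():
            return
        sleep(delay)
    raise EmbeddingException('Error: the embedding server is not responding')
```

As far as the hunk shows, the file name is computed once in `__init__`, so a new dated file would start only when a loader is constructed on a later date. Raising also lets `controller.py` pass the `EmbeddingException` through unchanged, which is what the added `except (DbException, EmbeddingException)` clause in this commit does.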
14 changes: 14 additions & 0 deletions context_chat_backend/logger.py
@@ -9,6 +9,7 @@
 import logging
 import logging.config
 import logging.handlers
+import os
 from time import gmtime

 from ruamel.yaml import YAML
@@ -88,6 +89,19 @@ def get_logging_config() -> dict:
         try:
             yaml = YAML(typ='safe')
             config: dict = yaml.load(f)
+
+            persistent_storage = os.getenv('APP_PERSISTENT_STORAGE', 'persistent_storage')
+            if (config.get('handlers', {}).get('file_json', {}).get('filename')):
+                if (
+                    not config['handlers']['file_json']['filename'].startswith(persistent_storage)
+                    and not config['handlers']['file_json']['filename'].startswith('/')
+                ):
+                    config['handlers']['file_json']['filename'] = os.path.join(
+                        persistent_storage,
+                        config['handlers']['file_json']['filename'],
+                    )
+                # create logs directory if it doesn't exist
+                os.makedirs(os.path.dirname(config['handlers']['file_json']['filename']), exist_ok=True)
         except Exception as e:
             raise AssertionError('Error: could not load config from logger_config.yaml file') from e

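Note on the `logger.py` hunk above: the rule it applies to the `file_json` handler's `filename` can be isolated into a small function for clarity. The sketch below is an illustration of that rule only; `resolve_log_filename` is a hypothetical name and is not part of the commit.

```python
import os


def resolve_log_filename(filename: str, persistent_storage: str) -> str:
    """Leave absolute paths and paths already starting with the persistent
    storage prefix untouched; place every other path under persistent storage."""
    if filename.startswith('/') or filename.startswith(persistent_storage):
        return filename
    return os.path.join(persistent_storage, filename)


examples = [
    'logs/ccb.log',                     # relative: gets the prefix
    'persistent_storage/logs/ccb.log',  # already prefixed: unchanged
    '/var/log/ccb.log',                 # absolute: unchanged
]
resolved = [resolve_log_filename(name, 'persistent_storage') for name in examples]
```

One property worth keeping in mind: `startswith` is a plain string-prefix test, so a sibling path such as `persistent_storage_old/ccb.log` also counts as "already inside" and is returned unchanged.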
5 changes: 5 additions & 0 deletions context_chat_backend/setup_functions.py
@@ -44,9 +44,14 @@ def setup_env_vars():

     config_path = os.path.join(persistent_storage, 'config.yaml')

+    em_server_log_path = os.path.join(persistent_storage, 'logs')
+    if not os.path.exists(em_server_log_path):
+        os.makedirs(em_server_log_path, 0o750, True)
+
     os.environ['APP_PERSISTENT_STORAGE'] = persistent_storage
     os.environ['VECTORDB_DIR'] = vector_db_dir
     os.environ['MODEL_DIR'] = model_dir
     os.environ['SENTENCE_TRANSFORMERS_HOME'] = os.getenv('SENTENCE_TRANSFORMERS_HOME', model_dir)
     os.environ['HF_HOME'] = os.getenv('HF_HOME', model_dir)
     os.environ['CC_CONFIG_PATH'] = os.getenv('CC_CONFIG_PATH', config_path)
+    os.environ['EM_SERVER_LOG_PATH'] = os.getenv('EM_SERVER_LOG_PATH', em_server_log_path)
1 change: 1 addition & 0 deletions example.env
@@ -8,6 +8,7 @@
 # HF_HOME=persistent_storage/model_files
 # VECTORDB_DIR=persistent_storage/vector_db_data
 # CC_CONFIG_PATH=persistent_storage/config.yaml
+# EM_SERVER_LOG_PATH=persistent_storage/logs

# Huggingface offline mode
#TRANSFORMERS_OFFLINE=1
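
Note on `EM_SERVER_LOG_PATH`, introduced in `setup_functions.py` and documented in `example.env` above: the variable defaults to `<persistent storage>/logs` and is only overridden when it is already set in the environment. The sketch below reproduces that default-or-override pattern in isolation; `setup_log_dir` is a hypothetical helper, not a function from the repository.

```python
import os
import tempfile


def setup_log_dir(persistent_storage: str) -> str:
    """Create the default logs directory and export EM_SERVER_LOG_PATH,
    keeping any value that is already present in the environment."""
    default_path = os.path.join(persistent_storage, 'logs')
    # Mirrors the diff: the default directory is created with mode 0o750
    os.makedirs(default_path, mode=0o750, exist_ok=True)
    os.environ['EM_SERVER_LOG_PATH'] = os.getenv('EM_SERVER_LOG_PATH', default_path)
    return os.environ['EM_SERVER_LOG_PATH']
```

Going by the `setup_functions.py` hunk alone, only the default directory is created; a custom `EM_SERVER_LOG_PATH` would presumably need to exist already, because `dyn_loader.py` opens the log file there without creating the directory.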
Empty file removed logs/.gitkeep