Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTPX transport node class response.raise_for_status() causes misbehavior of 404-based methods in the client. #174

Closed
Thijsvandepoll opened this issue Jun 6, 2024 · 3 comments

Comments

@Thijsvandepoll
Copy link
Contributor

The Sync/Async Elasticsearch client relies on 404 status codes for some of it's functionality. Some examples are:

  • client.indices.exists()
  • client.indices.alias_exists(), or
  • client.get()

The current implementation of the HttpxAsyncHttpNode is:

    async def perform_request(  # type: ignore[override]
        self,
        method: str,
        target: str,
        body: Optional[bytes] = None,
        headers: Optional[HttpHeaders] = None,
        request_timeout: Union[DefaultType, Optional[float]] = DEFAULT,
    ) -> NodeApiResponse:
        resolved_headers = self._headers.copy()
        if headers:
            resolved_headers.update(headers)

        if body:
            if self._http_compress:
                resolved_body = gzip.compress(body)
                resolved_headers["content-encoding"] = "gzip"
            else:
                resolved_body = body
        else:
            resolved_body = None

        try:
            start = time.perf_counter()
            if request_timeout is DEFAULT:
                resp = await self.client.request(
                    method,
                    target,
                    content=resolved_body,
                    headers=dict(resolved_headers),
                )
            else:
                resp = await self.client.request(
                    method,
                    target,
                    content=resolved_body,
                    headers=dict(resolved_headers),
                    timeout=request_timeout,
                )
            resp.raise_for_status()
            response_body = resp.read()
            duration = time.perf_counter() - start
        except RERAISE_EXCEPTIONS + BUILTIN_EXCEPTIONS:
            raise
        except Exception as e:
            err: Exception
            if isinstance(e, (TimeoutError, httpx.TimeoutException)):
                err = ConnectionTimeout(
                    "Connection timed out during request", errors=(e,)
                )
            elif isinstance(e, ssl.SSLError):
                err = TlsError(str(e), errors=(e,))
            else:
                err = ConnectionError(str(e), errors=(e,))

            self._log_request(
                method=method,
                target=target,
                headers=resolved_headers,
                body=body,
                exception=err,
            )
            raise err from None

        meta = ApiResponseMeta(
            resp.status_code,
            resp.http_version,
            HttpHeaders(resp.headers),
            duration,
            self.config,
        )

        self._log_request(
            method=method,
            target=target,
            headers=resolved_headers,
            body=body,
            meta=meta,
            response=response_body,
        )

        return NodeApiResponse(meta, response_body)

This means that if a 404 is returned because a resource does not exist, then an error is raised because of the response.raise_for_status(). This causes other code to fail as well. For example this piece (in _async.client._base.py):

meta, resp_body = await self.transport.perform_request(
            method,
            target,
            headers=request_headers,
            body=body,
            request_timeout=self._request_timeout,
            max_retries=self._max_retries,
            retry_on_status=self._retry_on_status,
            retry_on_timeout=self._retry_on_timeout,
            client_meta=self._client_meta,
            otel_span=otel_span,
        )

        # HEAD with a 404 is returned as a normal response
        # since this is used as an 'exists' functionality.
        if not (method == "HEAD" and meta.status == 404) and (
            not 200 <= meta.status < 299
            and (
                self._ignore_status is DEFAULT
                or self._ignore_status is None
                or meta.status not in self._ignore_status
            )
        ):
            message = str(resp_body)

            # If the response is an error response try parsing
            # the raw Elasticsearch error before raising.
            if isinstance(resp_body, dict):
                try:
                    error = resp_body.get("error", message)
                    if isinstance(error, dict) and "type" in error:
                        error = error["type"]
                    message = error
                except (ValueError, KeyError, TypeError):
                    pass

            raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
                message=message, meta=meta, body=resp_body
            )

The mentioned functionality (client.indices.exists(), client.indices.alias_exists(), client.get()) then stops working as expected. The fix seems easy:

    async def perform_request(  # type: ignore[override]
        self,
        method: str,
        target: str,
        body: Optional[bytes] = None,
        headers: Optional[HttpHeaders] = None,
        request_timeout: Union[DefaultType, Optional[float]] = DEFAULT,
    ) -> NodeApiResponse:
        resolved_headers = self._headers.copy()
        if headers:
            resolved_headers.update(headers)

        if body:
            if self._http_compress:
                resolved_body = gzip.compress(body)
                resolved_headers["content-encoding"] = "gzip"
            else:
                resolved_body = body
        else:
            resolved_body = None

        try:
            start = time.perf_counter()
            if request_timeout is DEFAULT:
                resp = await self.client.request(
                    method,
                    target,
                    content=resolved_body,
                    headers=dict(resolved_headers),
                )
            else:
                resp = await self.client.request(
                    method,
                    target,
                    content=resolved_body,
                    headers=dict(resolved_headers),
                    timeout=request_timeout,
                )
            response_body = resp.read()
            duration = time.perf_counter() - start
        except RERAISE_EXCEPTIONS + BUILTIN_EXCEPTIONS:
            raise
        except Exception as e:
            err: Exception
            if isinstance(e, (TimeoutError, httpx.TimeoutException)):
                err = ConnectionTimeout(
                    "Connection timed out during request", errors=(e,)
                )
            elif isinstance(e, ssl.SSLError):
                err = TlsError(str(e), errors=(e,))
            else:
                err = ConnectionError(str(e), errors=(e,))

            self._log_request(
                method=method,
                target=target,
                headers=resolved_headers,
                body=body,
                exception=err,
            )
            raise err from None

        meta = ApiResponseMeta(
            resp.status_code,
            resp.http_version,
            HttpHeaders(resp.headers),
            duration,
            self.config,
        )

        self._log_request(
            method=method,
            target=target,
            headers=resolved_headers,
            body=body,
            meta=meta,
            response=response_body,
        )

        return NodeApiResponse(meta, response_body)
@pquentin
Copy link
Member

Thanks for the report @Thijsvandepoll! Is your fix simply to stop raising for status? Difficult to tell like this. Would you be interested in opening a pull request so that it can be included in 8.15?

@Thijsvandepoll
Copy link
Contributor Author

@pquentin yes, raise_for_status gives problems for the downstream Elasticsearch client functionality. I have created a pull request for this fix.

@miguelgrinberg
Copy link
Contributor

Fixed by #182

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants