handling gaps where time is not consecutive / contiguous #16

Open
joseortiz3 opened this issue Jan 1, 2025 · 0 comments

Consider this example: you have an OHLC(t) price time series for t in [0, 2, 4], where the prices for t=1 and t=3 are missing (e.g. the asset was not trading at times t=1 and t=3).

As far as I understand it, EDGE requires at least two pairs of contiguous (consecutive) times (e.g. t, t+1, t+c, t+c+1 for some c>0) to estimate the bid-ask spread for time t. For example, t = [0, 1, 5, 6] or t = [1, 2, 3] would qualify.

So in the example, EDGE cannot (correctly) produce an estimate because there are no contiguous/consecutive times. Is there a proper way around this?
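To make the contiguity condition concrete, here is a minimal sketch that lists the consecutive (t, t+1) pairs in a set of integer times. The helper name `contiguous_pairs` is hypothetical and is not part of the EDGE code; it only illustrates the condition as I understand it.

```python
def contiguous_pairs(times):
    """Return every (t, t+1) pair for which both times are present."""
    present = set(times)
    return [(t, t + 1) for t in sorted(present) if t + 1 in present]


print(contiguous_pairs([0, 2, 4]))     # no consecutive times at all
print(contiguous_pairs([0, 1, 5, 6]))  # two separate pairs
print(contiguous_pairs([1, 2, 3]))     # two overlapping pairs
```

Under this reading, [0, 2, 4] yields no pairs, while the other two inputs each yield two.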

It seems it would be incorrect to simply feed the data for t=0, 2, 4 into the EDGE estimator code (without NaNs), because the differences between successive log prices ("log returns") would be exaggerated. These are used in quantities like:

    # compute log-returns
    r1 = m - o
    r2 = o - m1
    r3 = m - c1
    r4 = c1 - m1
    r5 = o - c1

r2, r3, and r5 above would be exaggerated whenever t skips over a time period, because they span the time gap.
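As a rough illustration of the exaggeration, the sketch below simulates a driftless Gaussian random walk for the log price (an assumption made only for this example, with arbitrary parameters), drops every other observation, and compares the variance of the resulting returns with the variance of the one-period returns.

```python
import random
import statistics

random.seed(0)
n = 20000
sigma = 0.01  # assumed per-period volatility of the log price

# Log price following a driftless Gaussian random walk.
x = [0.0]
for _ in range(n):
    x.append(x[-1] + random.gauss(0.0, sigma))

# Returns between consecutive periods.
one_step = [x[i] - x[i - 1] for i in range(1, len(x))]

# Keep only even times (t = 0, 2, 4, ...) and difference the survivors,
# which is what happens if the missing rows are simply dropped.
kept = x[::2]
two_step = [kept[i] - kept[i - 1] for i in range(1, len(kept))]

var_ratio = statistics.pvariance(two_step) / statistics.pvariance(one_step)
print(var_ratio)  # roughly 2 for a random walk
```

Each return across a gap covers two periods, so its variance is about twice that of a one-period return.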

On the other hand, including NaN / missing values in the data being fed to EDGE doesn't seem right either. For instance, if you replace every odd row of the test dataset with NaN (missing values), then the EDGE estimate would be undefined because r2, r3, and r5 above would all be NaN. Is it true that there is no valid bid-ask estimate in such a case?
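A small sketch of the NaN propagation, using made-up open and close log prices (the numbers are illustrative only) and an r5-style return, i.e. the current open minus the previous close:

```python
import math

nan = float("nan")

# Hypothetical open and close log prices; every odd row is missing.
o = [1.00, nan, 1.02, nan, 1.04, nan]
c = [1.01, nan, 1.03, nan, 1.05, nan]

# r5-style return: current open minus previous close.
r5 = [o[i] - c[i - 1] for i in range(1, len(o))]

# Every difference touches one missing row, so every value is NaN.
all_nan = all(math.isnan(v) for v in r5)
print(all_nan)  # True
```

Each lagged difference pairs an even row with an odd row, so no finite value survives.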

If one excludes the missing rows and instead passes a new integer time column t, a correction factor could be applied to the affected returns. I have experimented with this, unsuccessfully, for example:

    r2 = (o - m1) / f(t - t1)
    # same for r3, r5

Here f(t - t1) is a correction factor such that f(1) = 1, and presumably f is monotonically increasing. For volatility estimators, I know that f(x) = sqrt(x) is usually the appropriate correction factor for log returns (this follows from the Wiener process). I tried sqrt and other powers (like 0 or 1), but all of them introduce bias.
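One possible reason why sqrt scaling still leaves a bias can be sketched with a simple Roll-style model, which is an assumption of this example and not necessarily the model behind EDGE: the observed log price is an efficient random-walk price plus or minus half the spread, with the side chosen independently each period. The diffusion part of a return grows with the gap, but the bounce part does not, so dividing by sqrt(gap) shrinks the bounce component. All parameter values below are arbitrary.

```python
import math
import random
import statistics

random.seed(1)
n = 20000
sigma = 0.01   # assumed per-period volatility of the efficient log price
spread = 0.02  # assumed bid-ask spread in log terms

# Observed price = efficient random walk +/- half the spread (Roll-style bounce).
efficient = 0.0
observed = []
for _ in range(n):
    efficient += random.gauss(0.0, sigma)
    side = random.choice((-1.0, 1.0))
    observed.append(efficient + side * spread / 2.0)


def scaled_return_variance(prices, gap):
    """Variance of returns over `gap` periods after dividing by sqrt(gap)."""
    kept = prices[::gap]
    scaled = [(kept[i] - kept[i - 1]) / math.sqrt(gap) for i in range(1, len(kept))]
    return statistics.pvariance(scaled)


var_gap1 = scaled_return_variance(observed, 1)
var_gap2 = scaled_return_variance(observed, 2)
ratio = var_gap2 / var_gap1
print(var_gap1, var_gap2, ratio)
```

In this toy model the one-period return variance is sigma^2 + spread^2/2, while the sqrt-scaled two-period variance is sigma^2 + spread^2/4, so the ratio comes out near 2/3 rather than 1 for these parameters. If something similar holds for the moments used by EDGE, a single power-law factor would not correct both components at once.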

Is this something where the math behind EDGE breaks down and there is no appropriate correction to be made for non-consecutive t? If so, the same concern applies to any gaps in trading times, like whenever the market is closed.
