-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk handling of logs #49
Comments
How about: type bufferedBulkHandler struct {
buffer []*Record // the buffer is flushed when len(buffer) == cap(buffer)
handler BulkHandler
flushInterval time.Duration
forceFlushLevel Level // buffer is flushed directly (while blocking the logging call) after a message with this level (or higher) is added.
} We get rid of the bufferSize field as it is redundant with the slice capacity.
My preference is for the argument type to be BulkHandler. It is more explicit, less magical, and as you pointed out, it doesn't make sense to buffer the records and then forward them one at a time. Alternative DesignI have a bigger design concern though. I'm not sure that buffering the log.Records is the right scope. The LogglyHandler wants to batch up to a maximum number of bytes which we only know after the records have been formatted. Likewise, it makes more sense for a FileHandler to wrap the os.File with a bufio.Buffer to combine Write calls. On the other hand we also want the capability to force a flush on certain events, such as the passage of time or the logging of an important message. This situation makes me think that a better solution involves some mechanism for sending a flush signal (or hint) through the handler chain. Some Handlers may respond to the flush signal, and others may generate it. In this approach we would add a LvlFlushHandler and a TimerFlushHandler to generate flush signals. Meanwhile BufferedFileHandler (new) and LogglyHandler would collect formatted log data in a bufio.Buffer and flush when the buffer is full or when a flush signal is received from upstream. The flush signal is implemented as an additional |
I like the idea here; the I'll think some more about this, maybe there's an even better solution that combines the best of both.. |
Just like all buffered IO APIs wrap the IO interfaces and add a type BulkHandler interface {
Handler
Flush() error
} Applications can then determine their own flushing strategies, and we can provide a few sane and easy ones. func LevelFlusher(lvl Lvl, h BulkHandler) BulkHandler {
return FuncHandler(func (r *Record) error {
h.Log(r)
if r.Lvl > lvl {
h.Flush()
}
return nil
})
}
func TimedFlusher(d time.Duration, h BulkHandler) BulkHandler {
go func() {
for _ = range time.Tick(d) { h.Flush() }
}()
return h
}
func SyncFlusher(h BulkHandler) Handler {
return FuncHandler(func (r *Record) error {
h.Log(r)
return h.Flush()
})
} Obviously, this means each BulkHandler will have to responsible for doing its own internal buffering of pending messages. |
But that doesn't allow you to set the max buffer size, and I still think it's a bit strange to have a signal on the handler tree. It also can be quite hard to implement good buffering with guaranteed non-blocking calls to Log (also when the buffer is flushing). And I think it doesn't make much sense to do that for each Handler that wants to be buffered. What if we introduce a new type Small example, I left out the synchronization for now. // Buffer is exported so it can be used by subpackages
type Buffer struct {
bufferSize int
flushInterval time.Duration
flushLevel Lvl
currentNode *bufferNode
bufferPool sync.Pool // pool with used buffer slices to avoid allocations and gc
//++ and other synchonization related fields
convert func(*Record) []byte
flush func([]byte)
}
type bufferNode struct {
bufferBytes []byte // slice cap == Buffer.bufferSize
//++ other fields for synchronization and timeout
}
func NewBuffer(size int, interval time.Duration, convert func(*Record) []byte, flush func([]byte)) *Buffer {
b := &Buffer{
bufferSize: size,
flushInterval: interval,
flushLevel: LvlCrit, // default value
convert: convert,
flush: flush,
}
//++ setup sync.Pool
//++ create first bufferNode
//++ setup synchronization and buffer rotation (Log() shouldn't block when buffer is being flushed)
return b
}
// LogBuffered is not "Log" as we don't want *buffer to implement Handler
func (b *Buffer) LogBuffered(r *Record) {
newBytes := b.convert(r)
//++ starting here there should be synchronization to avoid races
//++ calculate if newBytes will overflow the buffer. if yes flush that buffer
//++ also should add newBytes to new buffer taken from the bufferPool
_ = append(b.currentNode.bufferBytes, newBytes...)
//++ force a flush when r.Lvl < b.flushLevel
}
func (b *Buffer) SetBufferSize(size int) {
//++ if buffer was used already: return error
b.bufferSize = size
}
func (b *Buffer) SetFlushInterval(interval time.Duration) {
//++ if buffer was used already: return error
b.flushInterval = interval
}
func (b *Buffer) SetFlushLevel(lvl Lvl) {
//++ if buffer was used already: return error
b.flushLevel = lvl
}
// ##### and now how the buffer is used to create a buffered Handler
type bufferedFileHandler struct {
buffer *Buffer
filename string
}
func BufferedFileHandler(filename string) Handler {
h := &bufferedFileHandler{
filename: filename,
}
// create new buffer with defaults that make sense for this Handler
h.buffer = NewBuffer(1*1024*1024, 5*time.Second, h.convert, h.flush)
return h
}
func (h *bufferedFileHandler) Log(r *Record) {
h.buffer.LogBuffered(r)
}
func (h *bufferedFileHandler) convert(r *Record) []byte { return nil }
func (h *bufferedFileHandler) flush([]byte) {}
// in this example we allow the user to change flushInterval and flushLevel but not bufferSize
func (h *bufferedFileHandler) SetFlushInterval(interval time.Duration) error {
if interval == 42*time.Second {
return errors.New("for a mysterious reason the BufferedFileHandler cannot flush exactly every 42 seconds")
}
h.buffer.SetFlushInterval(interval)
return nil
}
func (h *bufferedFileHandler) SetFlushLevel(lvl Lvl) {
h.buffer.SetFlushLevel(lvl)
} |
@inconshreveable Unfortunately this doesn't compose well with things like MultiHandler and FailoverHandler. It also just misses describing ext/SpeculativeHandler, which has a @GeertJohan In general I think we should be trying really hard to fit in with the Go standard library when it comes to buffering bytes. For example type WriteFlusher interface {
io.Writer
Available() int // helps avoid splitting log records
Flush() error
} With that we could implement a import "gopkg.in/natefinch/lumberjack.v2"
lj := &lumberjack.Logger{
Filename: "/path/to/file.log",
}
h := log15.StreamFlushHandler(bufio.NewWriter(lj), log15.JsonFormat())
h = log15.LvlFlusher(log15.LvlError, h)
log15.Root().SetHandler(h) Furthermore, anyone can implement the WriteFlusher interface if they want to optimize buffering differently and it will just plug right in. Time based flushing could be implemented as a wrapper around |
If new Record-based reasons to flush a log arrise those can be added to Buffer and (when applicable) exposed through the Handlers using the Buffer. I still don't like that the level-based flushing is a Handler pipeline. In the end the pipeline will contain a lot of lines like: Furthermore, any buffering handler should by default flush at LvlCrit as a convenience to both new and advanced users. It is very likely that the application will exit or crash after a LvlCrit log. So these things should be enabled by default. This is also the case for interval based flushing. How about a combination of the ideas we've gathered so far? I wouldn't always want to use an external buffer (e.g. |
I disagree. I view Handlers as a highly composable set of tools for building arbitrary log record routing and filtering trees. I think h := log15.FuncHandler(func (r *log15.Record) error {
if r.Msg == "fizzbuzz" {
r.Flush = true
}
}) This approach does not prevent someone from writing a monolithic Finally, I usually write my handler chains like this: h := log15.BufferedFileHandler("filename.log")
h = log15.IntervalFlusher(5*time.second, h)
h = log15.LvlFlusher(log15.LvlError, h)
log15.Root().SetHandler(h) I see no problem with that. It is done once during set up and is easy to understand and maintain. |
Hmm... But then the Handler would look like the one @inconshreveable proposed? type BulkHandler interface {
Handler
Flush() error
} And what about MultiHandler or FailoverHandler? I think I have one more idea that I'd like to present, and maybe you'll even like it. It's different from what we've seen so far, but I'll need to work it out a bit better.. I'll get back at that tomorrow. After that lets just make the call and implement one of the solutions. We've probably spent enough time thinking about this already. |
@GeertJohan My example pipeline was just a copy of yours reformated. I am not in favor of BufferedFileHandler. My preferred approach remains the one I described in my comment above that defines the func StreamFlushHandler(wf WriteFlusher, fmtr Format) Handler {
h := FuncHandler(func(r *Record) error {
_, err := wr.Write(fmtr.Format(r))
if err != nil || !r.Flush {
return err
}
return wf.Flush()
})
return LazyHandler(SyncHandler(h))
} Which is a very slightly modified copy of |
In this case the |
I don't think all buffering Handlers should work with a WriteFlusher, Flush doesn't always apply to a stream of bytes. WriteFlusher won't work for the LogglyHandler. So this only solves the case for StreamFlushHandler, right? I think we should aim Flush() to be a signal that all records must be flushed, it shouldn't matter in what form, it could even be Am now working on that new approach that I think combines all our requirements and most wishes. |
Yes. I still consider that idea viable.
It might work if we had a LogglyWriter though. Then LogglyHandler becomes: func LogglyHandler(...) Handler {
w := loggly.NewWriter(...)
f := loggly.NewFormat(...)
return StreamFlushHandler(w, f)
} And if we had a LogglyWriter it could be used to do batch uploads like so: f, err := os.Open("/path/to/log-file")
if err != nil {
// handle error
}
io.Copy(loggly.NewWriter(...), f) At that point the LogglyWriter might as well go in it's own repo because it is useful in other contexts than log15.
In my mind the Flush bool on a Record is a hint that the record should be flushed as soon as possible, which implies that all unflushed records ahead of it in the pipe also get flushed. Any Handler can inspect the flush flag and take appropriate action, so I think it works in the general case.
I look forward to what you come up with. |
It's been a while, sorry for not responding.. Busy times.. I still see some problems with h1 := log15.SomeHandler()
h1 = log15.LvlFlusher(log15.LvlDebug, h1)
h2 := log15.SomeHandler()
h2 = log15.LvlFlusher(log15.LvlError, h2)
h12 := log.MultiHandler(h1, h2)
log15.Root().SetHandler(h12) Although the Interval-based flushing still doesn't make sense in the handler tree, it's goal is to fire a flush when none took place for duration . So it should be notified of flushes. Flushing in general isn't a concept all handlers could implement. Lets say there is a BeepHandler (beeps when Record.Lvl <= log15.LvlError). It can't do anything with the flush flag. What I'm trying to point out here is that I think the flusher shouldn't go in the handler tree, but instead be plugged right before or onto compatible (flushable) handlers. Anyway, the idea I had.... I have some actual code about how this would work internally, but lets skip that for now and just look at how the user would use/implement this. Just to see how it feels and discuss if this might be a good way to set it all up. h := log15.BufferedFileHandler("filename.log")
h.AddFlusher(LvlFlusher(log15.LvlError))
h.AddFlusher(IntervalFlusher(15*time.Second))
log15.Root().SetHandler(h) AddFlusher is a way to add generic flushers to a flushable handler. The handler itself could also have specific flushing rules. For example: the bulk upload limit at loggly must cause a flush. This would work with some channels to communicate between the Flusher and the Handler. There is also be a method to pass each *Record by the Flusher for inspection. The It's a while ago I worked on it but the code I have for this is almost a working example/proof-of-concept so I could work this out... Please let me know what you think. |
Sorry for the long delay.
This is the strongest argument against a Your suggestion seems like a lot of API surface to support flushing. At this point I think the whole concept is best left out of the core library until a simpler solution is found. |
This issue originates from #44 where an external dependency (go-loggly) did buffering of log messages.
One of the key features of log15 is the way Handlers connect and create a handling tree. I think properties such as async, buffering (with max buffersize and flush interval) should be handled by a Handler, and the hook to a logging service shouldn't have to implement buffering itself and should just send logs to the logging service.
Now ofcourse the whole purpose of the buffering in go-loggly is so that can send multiple log entries in a single call to loggly. And this is actually very important, especially when working with huge amounts of logs. But buffering and bulk writes are also important in other handlers. For instance, a colleague of mine (@michaelvlaar) found that some of our application spend a huge amount of time in the kernel due to writes to files and
os.Stdout
. So a more generic approach to buffering and handling of bulk log records seems to be an interesting topic. We brainstormed a bit about it and came up with the following idea:Add an extra interface type:
Some Handlers such as the
LogglyHandler
and theFileHandler
could implement this interface, taking a slice of logs and sending/writing them all at once, or at least as much as possible/efficient per call/write. For instance: Loggly's bulk endpoint states "You can send a maximum of 5MB per batch and up to 1MB per event", so theLogglyHandler
will need to take care of that.Then a new Handler would be added:
The
ForceFlushLevel
is actually quite important, there's a good chance a program is crashing when a log withLvlCrit
is added. And you wouldn't want that message to be stuck in the buffer when the program crashes. Normal flushing takes place asynchronously, not directly blocking aLog
call.Maybe in this case (because of the number of options) the
BufferedBulkHandler
should be an exported type with exported fields, and not a function.In summary, what I propose is:
BulkHandler
interfaceBufferedBulkHandler
(either as func or exported struct)LogBulk([] *log15.Record) error
for theFileHandler
andLogglyHandler
, and possibly more.What I'm not sure about is whether
BufferedBulkHandler
should accept a Handler as argument, and check with a type assertion if theHandler
also happens to implementBulkHandler
. Or maybe it should just acceptBulkHandler
's (it doesn't really make sense to first buffer messages, and then handle them one by one through a normalHandler
, or does it?).The text was updated successfully, but these errors were encountered: