I would like to use fms-hf-tuning to collect system-level metrics while fine-tuning models. These include model load time as well as device-related metrics (like those that AIM collects).
One way would be to rely only on the measurements that AIM collects, by pointing fms-hf-tuning to an AIM server and then querying AIM to retrieve the data. However, this is restrictive: we cannot collect custom data, we cannot sample system metrics at a period other than the 30 seconds that AIM uses, and we cannot collect any data at all unless we spin up an AIM server.
A more convenient solution would be to add an optional parameter to train() that accepts a list of callbacks.
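As a rough illustration of what such a callback could look like, here is a minimal sketch that samples a user-supplied probe at a caller-chosen period rather than a fixed 30 seconds. `TrainerCallback` and its `on_step_end` hook come from Hugging Face transformers. The class name `PeriodicMetricsCallback` and its `probe`, `period_s`, and `clock` arguments are hypothetical, as is the idea of passing it through a `callbacks` parameter on train(), which does not exist yet.

```python
import time

try:
    from transformers import TrainerCallback
except ImportError:
    # Fallback so this sketch can be run without transformers installed.
    TrainerCallback = object


class PeriodicMetricsCallback(TrainerCallback):
    """Hypothetical callback: records probe() at most once every period_s seconds."""

    def __init__(self, probe, period_s=5.0, clock=time.monotonic):
        self.probe = probe        # zero-argument function returning a metric value
        self.period_s = period_s  # minimum spacing between two samples, in seconds
        self.clock = clock        # injectable clock, handy for testing
        self.samples = []         # list of (timestamp, value) pairs
        self._last = None         # timestamp of the most recent sample

    def on_step_end(self, args, state, control, **kwargs):
        # Called by the Trainer after each training step; sample only if
        # enough time has passed since the previous sample.
        now = self.clock()
        if self._last is None or now - self._last >= self.period_s:
            self.samples.append((now, self.probe()))
            self._last = now
```

Because sampling happens on step boundaries, the effective period is never shorter than one training step. A background thread would be needed for strictly periodic sampling.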
In the same spirit, I'd like to get access to the TrainingOutput object that the trainer.train() call inside sft_trainer.train() produces (input_tokens_per_second, train_runtime, etc.).
Ideally, I'd also like to measure the time it takes to load the model, so that I can predict how long it would take to instantiate the weights that I produce. This would probably involve returning both the output of trainer.train() and some extra information that the sft_trainer.train method measures (e.g. model_load_time).
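To make the shape of that return value concrete, here is a small sketch of timing the model load and returning it alongside the training output. Everything in it is an assumption for illustration: `timed`, `train_with_timing`, `load_model`, and `run_training` are hypothetical names and not part of fms-hf-tuning. Only the `model_load_time` key is taken from the request above.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def train_with_timing(load_model, run_training):
    """Hypothetical wrapper: load the model, train, and report both results.

    load_model:   zero-argument callable that returns the model
    run_training: callable taking the model and returning the training output
    """
    model, load_seconds = timed(load_model)
    train_output = run_training(model)
    # Return the trainer's output unchanged, plus the extra measurements.
    return train_output, {"model_load_time": load_seconds}
```

Returning the extra measurements in a separate dictionary keeps the trainer's own output untouched, so callers who only want the standard metrics are unaffected.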
Issue
A more convenient solution would be to add an optional parameter to train() that accepts a list of callbacks; see fms-hf-tuning/tuning/sft_trainer.py, lines 29 to 34 in fc07060.
In the same spirit, I'd like to get access to the TrainingOutput object that the trainer.train() call inside sft_trainer.train() produces (input_tokens_per_second, train_runtime, etc.); see fms-hf-tuning/tuning/sft_trainer.py, line 168 in fc07060. This could be done just by returning the output of trainer.train() as the output of train().
Done when
train() returns the output of trainer.train() to the caller of train()