-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add analytics logging to MosaicMLLogger
#3106
base: main
Are you sure you want to change the base?
Conversation
…a-to-metadata-for-analytics
…nterval` to analytics logs
…a-to-metadata-for-analytics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide a brief description of what we are trying to enable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments!
…:mosaicml/composer into angel/add-data-to-metadata-for-analytics
…:mosaicml/composer into angel/add-data-to-metadata-for-analytics
Added some tests and adjusted the way we display |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly lgtm! a few minor follow ups.
…:mosaicml/composer into angel/add-data-to-metadata-for-analytics
backend, _, _ = parse_uri(self.analytics_data.save_folder) | ||
metrics['composer/cloud_provided_save_folder'] = backend if backend else 'local' | ||
|
||
# Save interval can be passed in w/ multiple types. If the type is a function, then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some idea for utility of analytics on save interval? Nothing is coming to mind for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was requested in a Slack thread a while back. this included some less-helpful metrics (i.e. num_workers
) so if it isn't useful i'll take it out. @dakinggg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i think just drop this one
…alytics logging fails
@@ -96,10 +115,57 @@ def log_hyperparameters(self, hyperparameters: Dict[str, Any]): | |||
def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None: | |||
self._log_metadata(metrics) | |||
|
|||
def log_analytics(self, state: State, loggers: Tuple[LoggerDestination, ...]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a callback instead? It seems like all loggers and experiment tracking tools would benefit from having these extra information for navigation or reproducibility purposes.
def init(self, state: State, logger: Logger) -> None: | ||
try: | ||
self.log_analytics(state, logger.destinations) | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we log the exception here? We should also probably only log once. if its failing consistently, we don't want to spam the run with warnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Sorry, lost track of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just log this all callbacks? Make it a separate callback?
@mvpatel2000 are you suggesting a new logger function |
@mvpatel2000 @angel-ruiz7 @dakinggg is this one basically done? or are there still concerns? |
Add
log_analytics
function toMosaicMLLogger
This adds a
log_analytics
function to theMosaicMLLogger
, which records these metrics about composer runs on the Mosaic platform:fsdp_config
sharding_strategy
state_dict_type
activation_checkpointing
forward_prefetch
backward_prefetch
mixed_precision
device_mesh
autoresume
algorithms
cloud_providers
cloud_provider_data
cloud_provider_checkpoints
save_interval
loggers
precision
optimizer
By logging these fields from the
MosaicMLLogger
into a run's metadata, we can then parse the metadata in SQL and add these metrics into UC. This will enable anyone on the genai team to create dashboards / queries for analytical purposes and view the patterns in these metrics across runs on the platform. (see the genai_runs table for more info)Output of the
Quick Start
Example