Naming complex flows #293

Open
joaquinvanschoren opened this issue Nov 23, 2016 · 5 comments

@joaquinvanschoren
Contributor

Hi, we had quite a long discussion in the Python team about how to name flows. The current approach (where we use an auto-generated name for each flow) will cause problems and limitations in the future: there is no guarantee that the name is unique (two different flows could end up with the same name), and we lose information on the exact names of the components.

To fix this we propose to add two new fields to the flow description:

  • custom_name: a new field that can be set by the user at will to name her flow (can be NULL)
  • class_name: a new field with the name of the (top-level) class, e.g. 'classif.RandomForest'. This must be provided by the client (e.g. mlr).

If the randomforest is wrapped in a randomsearch operator, the top-level flow would have class_name 'TuneControlRandom' (is this correct?), and the subflow would have class_name 'classif.RandomForest'. For weka, we can auto-fill these names from the database. I'm less sure about mlr.
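
To make the proposal concrete, here is a minimal sketch of a flow description carrying the two new fields, written in Python. The `FlowDescription` type, its `components` list, and the `class_names` helper are hypothetical and only illustrate the idea; they are not the actual OpenML XML schema or any client's API. The 'TuneControlRandom' value is the guess from the paragraph above and still needs confirmation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FlowDescription:
    # Existing field: human-readable name generated by the client.
    name: str
    # Proposed field: name of the (top-level) class, provided by the client.
    class_name: str
    # Proposed field: free-form name set by the user; None models NULL.
    custom_name: Optional[str] = None
    # Hypothetical container for subflows (illustration only).
    components: List["FlowDescription"] = field(default_factory=list)


def class_names(flow: FlowDescription) -> List[str]:
    """Collect class_name values from the top-level flow down to its subflows."""
    names = [flow.class_name]
    for sub in flow.components:
        names.extend(class_names(sub))
    return names


# The subflow: a plain random forest, no custom name given.
forest = FlowDescription(
    name="classif.randomforest",
    class_name="classif.RandomForest",
)

# The top-level flow: the random forest wrapped in a random search operator.
# 'TuneControlRandom' is the tentative class name from the discussion.
tuned = FlowDescription(
    name="classif.randomforest.tuned",
    class_name="TuneControlRandom",
    custom_name="my tuned forest",
    components=[forest],
)

print(class_names(tuned))
```

With this shape, the exact class of every component stays available even when the human-readable name is ambiguous.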

The current 'name' remains as is. It is generated by the client as a human-readable flow name. In the Python API they will make this a unique descriptor. This is a good idea because, if one is not careful, one could end up with two flows that have identical names but different structures. The server will recognize the difference and auto-version them, but it may still cause confusion.
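
The name collision described above can be illustrated with a toy model. This sketch is not the server's actual logic: `ToyFlowRegistry` and its `register` method are hypothetical, and the structure of a flow is reduced to a plain string for the sake of the example.

```python
from typing import Dict, List


class ToyFlowRegistry:
    """Toy stand-in for a server that versions flows sharing a name."""

    def __init__(self) -> None:
        # Maps a flow name to the distinct structures seen under that name.
        self._structures_by_name: Dict[str, List[str]] = {}

    def register(self, name: str, structure: str) -> int:
        """Return the 1-based version assigned to this (name, structure) pair."""
        known = self._structures_by_name.setdefault(name, [])
        if structure not in known:
            # Same name, new structure: assign the next version number.
            known.append(structure)
        return known.index(structure) + 1


registry = ToyFlowRegistry()
v_random = registry.register("classif.randomforest.tuned", "random search + forest")
v_grid = registry.register("classif.randomforest.tuned", "grid search + forest")
v_again = registry.register("classif.randomforest.tuned", "random search + forest")
print(v_random, v_grid, v_again)
```

Both flows end up under one name and differ only by version number, which is the source of confusion a unique descriptor would avoid.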

Can you give an example of a complex mlr flow that was recently uploaded, to help explain it? Is this still how you do things: http://www.openml.org/api/v1/flow/3880 ? Are you uploading R objects with the flow?

We will likely add these two fields as optional fields to the XML soon. Would it be hard for you to provide at least the class_name in the near future?

Thanks.

@joaquinvanschoren
Contributor Author

Hi, Any comment on this? It would be good to at least discuss it because we need to coordinate on this. Thanks.

@giuseppec
Member

Although I see that we haven't solved this perfectly in mlr, this is the current status:

  1. A flow is determined by its external.version (which is a crc32 hash generated from the R-object that contains the flow) and its flow.name which is something like 'classif.randomforest'.
  2. If we have a complex flow, e.g. a randomforest wrapped in a randomsearch operator, we handle this as a new (separate) flow (we don't make use of "subflows" or "components"). In that case, the flow.name is 'classif.randomforest.tuned' and the external.version is the crc32 hash generated from the R object that contains the random forest wrapped with a randomsearch operator. If we, for example, change the tuning algorithm and use, e.g., a grid search operator, the flow.name will still be 'classif.randomforest.tuned'; however, this will be a new flow, since the crc32 hash is generated from a different R object.
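
The scheme in the two points above can be sketched in Python as a rough analogy. This is not the mlr implementation: the real hash is computed in R over the R object, whereas this sketch serializes a plain dictionary to JSON and hashes it with `zlib.crc32`. The helper names `external_version` and `flow_key` and the dictionary contents are hypothetical.

```python
import json
import zlib
from typing import Any, Dict, Tuple


def external_version(flow_object: Dict[str, Any]) -> str:
    """Return a CRC32 hash of the serialized flow object as 8 hex digits."""
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    payload = json.dumps(flow_object, sort_keys=True).encode("utf-8")
    return format(zlib.crc32(payload) & 0xFFFFFFFF, "08x")


def flow_key(name: str, flow_object: Dict[str, Any]) -> Tuple[str, str]:
    """A flow is identified by its name together with its external version."""
    return (name, external_version(flow_object))


# Stand-ins for the wrapped learner objects (assumed contents).
random_search = {"learner": "classif.randomforest", "tuner": "random search"}
grid_search = {"learner": "classif.randomforest", "tuner": "grid search"}

key_random = flow_key("classif.randomforest.tuned", random_search)
key_grid = flow_key("classif.randomforest.tuned", grid_search)

# Same flow name, but the hashes differ, so these count as two flows.
print(key_random)
print(key_grid)
```

Note that a 32-bit hash cannot rule out collisions between different objects in general, so the pair of name and hash is a practical identifier rather than a guaranteed unique one.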

If you extend the XML, you will do this for the OpenML API version 2.0 since the API 1.0 is now fixed and will remain untouched for such "big" changes, right?

I guess in our case the "class_name" will still be something like 'classif.randomforest.tuned', which we can autofill. However, we still need something like the external.version. And I would like to test this new API change before we publish this to the main server.

@joaquinvanschoren
Contributor Author

joaquinvanschoren commented Nov 29, 2016 via email

@giuseppec
Member

Which XML fields would be the primary key after that change?
Currently it seems to be the flow.name and external.version.

@HeidiSeibold
Member
