Naming complex flows #293

joaquinvanschoren · 2016-11-23T23:38:20Z

Hi, we had quite a long discussion in the Python team about how to name flows. The current approach (where each flow gets an auto-generated name) will cause problems and limitations in the future: there is no guarantee that the name is unique (two different flows could end up with the same name), and we lose information about the exact names of the components.

To fix this we propose to add two new fields to the flow description:

custom_name: a new field that can be set by the user at will to name her flow (can be NULL)
class_name: a new field with the name of the (top-level) class, e.g. 'classif.RandomForest'. This must be provided by the client (e.g. mlr).
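To make the proposal concrete, here is a minimal sketch of how a client could serialize a flow description that carries the two new optional fields. The element names, their order, and the `oml` namespace URI are assumptions for illustration; the proposal does not fix a schema, and `flow_description` is a hypothetical helper, not part of any existing client.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace for OpenML XML documents; adjust to the real schema.
OML_NS = "http://openml.org/openml"
ET.register_namespace("oml", OML_NS)


def flow_description(name: str, external_version: str,
                     class_name: Optional[str] = None,
                     custom_name: Optional[str] = None) -> str:
    """Serialize a minimal (hypothetical) flow description.

    custom_name and class_name are optional: when they are None (NULL),
    the element is simply left out of the document.
    """
    flow = ET.Element(f"{{{OML_NS}}}flow")
    fields = (
        ("name", name),                  # client-generated, human-readable
        ("custom_name", custom_name),    # free label chosen by the user
        ("class_name", class_name),      # top-level class, e.g. 'classif.RandomForest'
        ("external_version", external_version),
    )
    for tag, value in fields:
        if value is not None:
            ET.SubElement(flow, f"{{{OML_NS}}}{tag}").text = value
    return ET.tostring(flow, encoding="unicode")


print(flow_description("classif.randomforest", "v1",
                       class_name="classif.RandomForest"))
```

Omitting unset fields (rather than emitting empty elements) keeps documents from older clients valid if the fields are added as optional, as suggested below.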

If the random forest is wrapped in a random search operator, the top-level flow would have class_name 'TuneControlRandom' (is this correct?), and the subflow would have class_name 'classif.RandomForest'. For Weka, we can auto-fill these names from the database. I'm less sure about mlr.

The current 'name' remains as is. It is generated by the client as a human-readable flow name. In the Python API, it will be made a unique descriptor. This is a good idea because, without care, one could end up with two flows that have identical names but different structures. The server will recognize the difference and auto-version them, but this may still cause confusion.

Can you give an example of a complex mlr flow that was recently uploaded, to help explain it? Is this still how you do things: http://www.openml.org/api/v1/flow/3880 ? Are you uploading R objects with the flow?

We will likely add these two fields as optional fields to the XML soon. Would it be hard for you to provide at least the class_name in the near future?

Thanks.

joaquinvanschoren · 2016-11-28T10:11:47Z

Hi, any comments on this? It would be good to at least discuss it, because we need to coordinate on this. Thanks.

giuseppec · 2016-11-29T13:46:42Z

Although I see that we haven't solved this perfectly in mlr, this is the current status:

1. A flow is determined by its external.version (a crc32 hash generated from the R object that contains the flow) and its flow.name, which is something like 'classif.randomforest'.
2. If we have a complex flow, e.g. a random forest wrapped in a random search operator, we handle it as a new (separate) flow (we don't make use of "subflows" or "components"). In that case, the flow.name is 'classif.randomforest.tuned' and the external.version is the crc32 hash generated from the R object that contains the random forest wrapped with the random search operator. If we, for example, change the tuning algorithm to a grid search operator, the flow.name will still be 'classif.randomforest.tuned'; however, this will be a new flow, since the crc32 hash is generated from a different R object.
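The scheme above can be illustrated with a small Python sketch. mlr hashes the actual R object; here a canonical JSON serialization of a plain dictionary stands in for it, which is an assumption made only so the example runs on its own. The dictionary keys and the `external_version` helper are hypothetical.

```python
import json
import zlib


def external_version(flow_object: dict) -> str:
    """crc32 over a canonical serialization of the flow object.

    sort_keys=True makes the byte string independent of key insertion
    order, so the same configuration always yields the same hash.
    """
    payload = json.dumps(flow_object, sort_keys=True).encode("utf-8")
    return format(zlib.crc32(payload), "08x")


# Same learner and wrapper, different tuning control (stand-ins for R objects).
random_search = {"learner": "classif.randomforest", "wrapper": "tuned", "control": "random"}
grid_search = {"learner": "classif.randomforest", "wrapper": "tuned", "control": "grid"}

flow_name = "classif.randomforest.tuned"  # identical for both objects
print(flow_name, external_version(random_search))
print(flow_name, external_version(grid_search))
```

Both lines print the same flow.name but different hashes, which is why switching from random search to grid search produces a new flow even though the name does not change.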

If you extend the XML, you will do this for the OpenML API version 2.0, since API 1.0 is now fixed and will remain untouched by such "big" changes, right?

I guess in our case the "class_name" would still be something like 'classif.randomforest.tuned', which we can autofill. However, we still need something like the external.version. And I would like to test this API change before we publish it to the main server.

joaquinvanschoren · 2016-11-29T14:38:08Z

Thanks. class_name would be something like 'tuneParams', as that is the top-level component. Ideally, that component will have subcomponents such as the learner and the control method. name would remain something like 'classif.randomforest.tuned': a sort of human-readable expression of the flow, devised by the client API. custom_name could be something like 'Bernd's Automatic Tuner Making Awesomeness Now'; it is just a label for reference (and facilitates search).

The main goal is to better understand what a flow is about (e.g. when you want to benchmark over many flows). I might also want to show the user a tree graph of the flow, e.g.

tuneParams ('randomsearch')
|- classif.RandomForest

And someday I might want to rebuild a flow by piecing it together from that information. Basically, it would allow the flow to be deserialized (even though this is not implemented yet).

We could do it in the v2 version of the API.

Cheers,
Joaquin
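A minimal sketch of how nested components could drive the tree view described above. The `Flow` dataclass and its `setting` field (the short note shown in brackets, here the tuning method) are hypothetical; they only mirror the three naming fields discussed in this thread plus a list of subcomponents.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Flow:
    class_name: str                    # top-level class, e.g. 'tuneParams'
    name: str                          # client-generated human-readable name
    custom_name: Optional[str] = None  # free user label, may be NULL
    setting: Optional[str] = None      # hypothetical: note shown in brackets
    components: List["Flow"] = field(default_factory=list)


def render_tree(flow: Flow, depth: int = 0) -> List[str]:
    """Return one line per flow, indenting subcomponents with '|- '."""
    label = flow.class_name
    if flow.setting is not None:
        label += f" ('{flow.setting}')"
    prefix = "" if depth == 0 else "   " * (depth - 1) + "|- "
    lines = [prefix + label]
    for child in flow.components:
        lines.extend(render_tree(child, depth + 1))
    return lines


tuned = Flow(
    class_name="tuneParams",
    name="classif.randomforest.tuned",
    custom_name="Bernd's Automatic Tuner Making Awesomeness Now",
    setting="randomsearch",
    components=[Flow(class_name="classif.RandomForest", name="classif.randomforest")],
)
print("\n".join(render_tree(tuned)))
# tuneParams ('randomsearch')
# |- classif.RandomForest
```

Because class_name is stored per component, the same structure is what a client would need to walk in order to rebuild (deserialize) the flow later.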

giuseppec · 2016-11-29T16:47:26Z

Which XML fields would be the primary key after that change?
Currently it seems to be the flow.name and external.version.
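As a toy model of the current behaviour (not the actual server code, which the thread does not describe), the sketch below treats the pair (flow.name, external.version) as the unique key and assigns a per-name version number whenever an unseen pair arrives, matching the auto-versioning mentioned in the opening comment. `FlowRegistry` and the hex strings are made up for illustration.

```python
from typing import Dict, Tuple


class FlowRegistry:
    """Toy flow store keyed by (name, external_version)."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str], int] = {}
        self._versions: Dict[Tuple[str, str], int] = {}
        self._per_name: Dict[str, int] = {}

    def register(self, name: str, external_version: str) -> Tuple[int, int]:
        """Return (flow_id, version); known pairs return their existing entry."""
        key = (name, external_version)
        if key not in self._ids:
            # Unseen pair: new flow id, and the next version number for this name.
            self._per_name[name] = self._per_name.get(name, 0) + 1
            self._ids[key] = len(self._ids) + 1
            self._versions[key] = self._per_name[name]
        return self._ids[key], self._versions[key]


reg = FlowRegistry()
print(reg.register("classif.randomforest.tuned", "1a2b3c4d"))  # (1, 1)
print(reg.register("classif.randomforest.tuned", "5e6f7a8b"))  # (2, 2)
print(reg.register("classif.randomforest.tuned", "1a2b3c4d"))  # (1, 1)
```

Under this key, two structurally different flows that share a name never collide, but they are only told apart by a version number, which is the source of confusion the proposal aims to reduce.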

HeidiSeibold · 2016-12-06T12:10:04Z

Test new ctree on OpenML

HeidiSeibold mentioned this issue Dec 6, 2016

Learners with same name: party::ctree, partykit::ctree and partyNG::ctree mlr-org/mlr#1377 (closed)

giuseppec added the APIv2 label Feb 27, 2017
