Naming complex flows #293
Hi, any comment on this? It would be good to at least discuss it because we need to coordinate on this. Thanks.
Although I see that we haven't solved this perfectly in mlr, this is the current status:
If you extend the XML, you will do this for the OpenML API version 2.0 since the API 1.0 is now fixed and will remain untouched for such "big" changes, right? I guess in our case the "class_name" will still be something like 'classif.randomforest.tuned', which we can autofill. However, we still need something like the external.version. And I would like to test this new API change before we publish this to the main server.
Thanks.
class_name would be something like 'tuneParams', as that is the top-level
component. Ideally, that guy will have components such as the learner and
control method.
name would remain something like 'classif.randomforest.tuned'. It is a sort
of human-readable expression of the flow, devised by the client API.
custom_name could be something like 'Bernd's Automatic Tuner Making
Awesomeness Now'. It is just a label for reference (and facilitates search).
The main goal would be to better understand what a flow is about (e.g. you
want to do benchmarking over many flows). Maybe I also want to show the
user a tree graph of the flow, e.g.
tuneParams ('randomsearch')
|- classif.RandomForest
And maybe I want to someday re-build my flow by piecing it together based
on that information.
Basically, it allows us to deserialize the flow (even though this is not
implemented yet).
We could do it in v2 of the API.
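To make the three fields concrete, here is a minimal sketch in Python of a flow with nested components, a text rendering of the tree, and a dictionary round trip standing in for deserialization. The `Flow` class, its `to_dict`/`from_dict` methods, and `render_tree` are hypothetical names used for illustration only; they are not part of the OpenML API or of any client.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Flow:
    """Hypothetical flow description; field names follow the discussion above."""

    class_name: str          # top-level component, e.g. 'tuneParams'
    name: str                # human-readable name devised by the client API
    custom_name: str = ""    # free-form label for reference and search
    components: list[Flow] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Serialize the flow and all of its components to plain dicts."""
        return {
            "class_name": self.class_name,
            "name": self.name,
            "custom_name": self.custom_name,
            "components": [c.to_dict() for c in self.components],
        }

    @classmethod
    def from_dict(cls, data: dict) -> Flow:
        """Rebuild a flow (and its components) from the output of to_dict."""
        return cls(
            class_name=data["class_name"],
            name=data["name"],
            custom_name=data.get("custom_name", ""),
            components=[cls.from_dict(c) for c in data.get("components", [])],
        )


def render_tree(flow: Flow, depth: int = 0) -> str:
    """Render the class_name of each component as an indented text tree."""
    prefix = "   " * (depth - 1) + "|- " if depth else ""
    lines = [prefix + flow.class_name]
    for child in flow.components:
        lines.append(render_tree(child, depth + 1))
    return "\n".join(lines)


learner = Flow(class_name="classif.RandomForest", name="classif.randomforest")
tuned = Flow(
    class_name="tuneParams",
    name="classif.randomforest.tuned",
    custom_name="Bernd's Automatic Tuner Making Awesomeness Now",
    components=[learner],
)
print(render_tree(tuned))
```

Because `class_name` is stored per component, the tree can be rebuilt from the serialized form alone, which is what piecing a flow back together would rely on.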
Cheers,
Joaquin
On Tue, Nov 29, 2016 at 2:46 PM giuseppec wrote:
Although I see that we haven't solved this perfectly in mlr, this is the
current status:
1. A flow is determined by its external.version (which is a crc32 hash
generated from the R-object that contains the flow) and its flow.name
which is something like 'classif.randomforest'.
2. If we have a complex flow, e.g. a randomforest wrapped in a
randomsearch operator, we handle this as a new (separate) flow (we don't
make use of "subflows" or "components"). In that case, the flow.name
is 'classif.randomforest.tuned' and the external.version is the crc32 hash
generated from the R object that contains the random forest wrapped with
a randomsearch operator. If we, for example, change the tuning algorithm
and use, e.g., a grid search operator, the flow.name will still be
'classif.randomforest.tuned'; however, this will be a new flow since the
crc32 hash is generated from a different R object.
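The identification scheme in the two points above can be sketched in Python as follows. This is an illustration under assumptions, not the mlr implementation: mlr hashes the R object itself, whereas here a plain dictionary serialized to JSON stands in for that object, and `external_version` and `flow_key` are hypothetical helper names.

```python
from __future__ import annotations

import json
import zlib


def external_version(flow_object: dict) -> str:
    """Return a crc32 hash of the flow object as 8 hex digits.

    Assumption: a dict stands in for the R object. Keys are sorted so that
    equal objects always serialize, and therefore hash, identically.
    """
    payload = json.dumps(flow_object, sort_keys=True).encode("utf-8")
    return format(zlib.crc32(payload), "08x")


def flow_key(flow_name: str, flow_object: dict) -> tuple[str, str]:
    """A flow is identified by its name together with its external version."""
    return (flow_name, external_version(flow_object))


random_search = {"learner": "classif.randomforest", "tuner": "randomsearch"}
grid_search = {"learner": "classif.randomforest", "tuner": "gridsearch"}

key_random = flow_key("classif.randomforest.tuned", random_search)
key_grid = flow_key("classif.randomforest.tuned", grid_search)

# Same flow.name, different wrapped object, hence two different keys.
print(key_random[0] == key_grid[0], key_random == key_grid)
```

Note that a 32-bit checksum can collide, so a scheme like this distinguishes flows in practice but does not guarantee uniqueness.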
If you extend the XML, you will do this for the OpenML API version 2.0
since the API 1.0 is now fixed and will remain untouched for such "big"
changes, right?
I guess in our case the "class_name" will still be something like
'classif.randomforest.tuned', which we can autofill. However, we still need
something like the external.version. And I would like to test this new API
change before we publish this to the main server.
Which XML fields would be the primary key after that change?
Hi, we had quite a long discussion in the Python team about how to name flows. The current approach (where we use an auto-generated name for each flow) will cause problems and limitations in the future: there is no guarantee that the name is unique (two different flows could end up with the same name), and we lose information on the exact names of the components.
To fix this we propose to add two new fields to the flow description:
If the randomforest is wrapped in a randomsearch operator, the top-level flow would have class_name 'TuneControlRandom' (is this correct?), and the subflow would have class_name 'classif.RandomForest'. For weka, we can auto-fill these names from the database. I'm less sure about mlr.
The current 'name' remains as is. It is generated by the client as a human-readable flow name. In the Python API they will make this a unique descriptor. This is a good idea, because if not careful, one could end up with 2 flows with identical names and different structure. The server will recognize the difference and auto-version them, but it still may cause confusion.
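To illustrate the name collision described above, here is a small Python sketch of auto-versioning. How the server actually detects structural differences is not described in this thread, so everything here is an assumption: `FlowRegistry` is a hypothetical class, and the structure strings are made-up fingerprints of a flow's components.

```python
from __future__ import annotations

from collections import defaultdict


class FlowRegistry:
    """Hypothetical registry that versions flows sharing the same name."""

    def __init__(self) -> None:
        # Maps a flow name to the structure fingerprints seen so far,
        # in the order they were first registered.
        self._by_name: dict[str, list[str]] = defaultdict(list)

    def register(self, name: str, structure: str) -> int:
        """Return the version for (name, structure), creating one if new."""
        known = self._by_name[name]
        if structure not in known:
            known.append(structure)
        return known.index(structure) + 1


registry = FlowRegistry()
name = "classif.randomforest.tuned"

# Two flows with an identical name but a different structure.
v_random = registry.register(name, "TuneControlRandom(classif.RandomForest)")
v_grid = registry.register(name, "TuneControlGrid(classif.RandomForest)")
print(v_random, v_grid)
```

The versions keep the two flows apart, but a user who only sees the name cannot tell them apart, which is the confusion a unique descriptor is meant to avoid.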
Can you give an example of a complex mlr flow that was recently uploaded to help explain it? Is this still how you do things: http://www.openml.org/api/v1/flow/3880 ? Are you uploading R objects with the flow?
We will likely add these two fields as optional fields to the XML soon. Would it be hard for you to provide at least the class_name in the near future?
Thanks.