Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic export is triggered multiple times per process #6140

Open
BartChris opened this issue Jul 23, 2024 · 9 comments · May be fixed by #6141
Open

Automatic export is triggered multiple times per process #6140

BartChris opened this issue Jul 23, 2024 · 9 comments · May be fixed by #6141
Labels
bug export Export mappings and configurations

Comments

@BartChris
Copy link
Collaborator

BartChris commented Jul 23, 2024

Describe the bug
When a lot of automatic exports are triggered for multiple processes (e.g. after multiple automatic processing steps), it seems that the export for the processes is sometimes triggered two times and the export of the same process appears two times in the task manager.

image

I tried to trace the behaviour and it seems as if the automatic export is triggered two times, when the previous task is closed and the next task is started and executed automatically. Maybe i am reading the code wrong, but the current implementation seems wrong.

When a task is closed activateTasksForClosedTask is called, which will activate the following tasks.

private void activateTasksForClosedTask(Task closedTask) throws DataException, IOException, DAOException {

The code inside activateTask is then calling processAutomaticTask

private void activateTask(Task task) throws DataException, IOException, DAOException {

which will add all automatic tasks to the list automaticTasks

For every task on the list Kitodo will start a TaskScriptThread

for (Task automaticTask : automaticTasks) {
automaticTask.setProcessingBegin(new Date());
TaskScriptThread thread = new TaskScriptThread(automaticTask);
TaskManager.addTask(thread);

When this TaskScriptThread is executed it will get evaluated for the type of task, in the case of the export this will lead to the execution of executeDmsExport (taskService.executeDmsExport(this.task);)

This is were the problems might begin, because that method will start another TaskScriptThread

public boolean startExport(Process process, URI unused) {
if (ConfigCore.getBooleanParameterOrDefaultValue(ParameterCore.ASYNCHRONOUS_AUTOMATIC_EXPORT)) {
TaskManager.addTask(new ExportDmsTask(this, process));
Helper.setMessage(TaskSitter.isAutoRunningThreads() ? "DMSExportByThread" : "DMSExportThreadCreated",
process.getTitle());
return false;

In my understanding we now have two threads which will end up doing the exact same thing: Exporting the process.

This is different from generateImages which seems to get executed in the already existing thread.

public void generateImages(EmptyTask executingThread, Task task, boolean automatic) throws DataException {

We probably should handle the export similar to the generateImages and not start a Thread, when there is already a thread for the export.

Expected behavior
The export should only be triggered one time per process.

Release
3.6, Master

@BartChris BartChris added the bug label Jul 23, 2024
@BartChris BartChris linked a pull request Jul 23, 2024 that will close this issue
@henning-gerhardt
Copy link
Collaborator

henning-gerhardt commented Jul 23, 2024

I think this behavior is correct but misleading :-( The first shown task in the task manager is the task from the workflow. The second task in the task manager is the executed action from the workflow task. As both have using the same words it is confusing.

edit: if you have a process with a lot of media data (50gb or more, depending on how fast a file copy is): you will see that on execution of the first task no export is done / no data get copied. Only on execution of the second task the data get exported / data get copied. Look even on the time stamp information on the exported data.

@apiller
Copy link

apiller commented Jul 23, 2024

But should the task be executed several times? As I understand the first one is the automatic script and the second one the execution of the script.
In this case the script is executed more than once.
Export

@henning-gerhardt
Copy link
Collaborator

Did you execute tasks in taskmanager parallel (config taskManager.autoRunLimit with a value greater than 1)?

An other reason could be on exporting year issues of newspapers with all their daily issues at once. We discovered in this case a crazy, time and resource consuming behavior until Java.out.OfMemoryException.

@apiller
Copy link

apiller commented Jul 23, 2024

Normal export by selecting issues an choosing export in the actions on the left is ok and working. Just finishing a whole year at once or even a month or less with "set status of a process up" is causing this. It even starts to export allready finished processes.

@henning-gerhardt
Copy link
Collaborator

Just finishing a whole year at once or even a month or less with "set status of a process up" is causing this. It even starts to export allready finished processes.

We have the same discovering but without "set status of a process up" (as we did not use this on our workflow as it could break more things than it solve things after our experience). I think that this usage scenario is really not full covered by the current code.

I digged into this and the code in the startExport() method (https://github.com/kitodo/kitodo-production/blob/master/Kitodo/src/main/java/org/kitodo/export/ExportDms.java#L102-L122) plays a important role at this export behavior of exporting year level newspapers from the ui. It is more complicated as already exported and not exported issues of this year influence this.

Finding a better and more robust way to export all kinds of processes and including of all kind of hierarchy levels is a complicated thing.

@BartChris
Copy link
Collaborator Author

BartChris commented Jul 24, 2024

I think this behavior is correct but misleading :-( The first shown task in the task manager is the task from the workflow. The second task in the task manager is the executed action from the workflow task. As both have using the same words it is confusing.

edit: if you have a process with a lot of media data (50gb or more, depending on how fast a file copy is): you will see that on execution of the first task no export is done / no data get copied. Only on execution of the second task the data get exported / data get copied. Look even on the time stamp information on the exported data.

I think you are right. There are two threads started, but the first probably only acts as a wrapper thread which triggers the actual exporting thread. I suppose we do not need the wrapper thread and can directly trigger the export thread (#6141) but i suppose this will not adress the large problems with the export as discussed here.

@henning-gerhardt
Copy link
Collaborator

One more note: the second thread is only created and started if the automatic export configuration is set to true. If this is not the case the export will immediately started. So this configuration option has influence of exporting.

@BartChris
Copy link
Collaborator Author

BartChris commented Jul 25, 2024

No hard evidence yet, but setting "ASYNCHRONOUS_AUTOMATIC_EXPORT" to false seem to bring some improvements on systems which try to combine multiple "set status of a process up" with autoRunLimit greater than 1.
In that scenario the export operation still runs in it's own thread (the thread which was only a "wrapper" thread before but now does the actual work).
The problem is that when you do that there is no mechanism to close the export task so it stays open.

return startExport(process, (ExportDmsTask) null);

So maybe it is worth investigating whether the instantiation of a thread, which does not much more than triggering another thread, might lead to problems.

@henning-gerhardt
Copy link
Collaborator

Using "set status of a process" up is may the cause. It is working different on different configured tasks.

  • F.e. on an automatic tasks with a script the script get executed and the state of the tasks depends on how the script execution is completed. If the script is executed with an return value of 0 the task get closed and in any other case the task remain open.
  • On non automatic tasks using this will only change the state from open to "in work" and you must call this action again to get the state to done / closed.
  • Using this with an export task its depends - how you already discovered - on the asynchronous export configuration. Is this configuration set to false then the export task of the workflow get started but never closed automatically. In this case you must use this action again to close this task as you have the control over the task and if you discover that the export is failing then you would not close this task or. This is a behavior which already exists in 2.x. Only if the asynchronous export is set to true and the export is successful then the task would go to done / get closed.
  • Then you have even the task open "complete on take over" (or similar as I did not have a running instance near by me while writing this). This will close the task if you assign in to you / taking it over. I don't know who need this but the option is existing and will influence the behavior too.

This different behavior of this action decided us to not use this action as a regular user action and even using this as an administrative action is dangerous act as you must know which task will work in which way on using this action and in some situations (exporting is such an action) we did not use it.

@solth solth added the export Export mappings and configurations label Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug export Export mappings and configurations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants