Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests: Add retries to NbTestCase#deleteFile to stabilize unittests #8187

Closed
wants to merge 1 commit into from

Conversation

matthiasblaesing
Copy link
Contributor

@matthiasblaesing matthiasblaesing commented Jan 24, 2025

For the groovy.editor unittests several fails were observed, where the testfailure did not result from the test itself, but by the cleanup routine. The cleanup removes the work directory to prepare for the next test run.

The recursive deletion can fail with an IOException, because new files are created, while the directory tree is traversed. Observed where archives.properties and segments from parsing API. Both are saved delayed.

Instead of failing, the deletion is retried to continue testing.

The observed stacktrace:

at java.base/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:246) 
at java.base/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
at java.base/java.nio.file.Files.delete(Files.java:1152)
at org.netbeans.junit.NbTestCase$4.postVisitDirectory(NbTestCase.java:1006)
at org.netbeans.junit.NbTestCase$4.postVisitDirectory(NbTestCase.java:998)
at java.base/java.nio.file.Files.walkFileTree(Files.java:2828)
at java.base/java.nio.file.Files.walkFileTree(Files.java:2882)
at org.netbeans.junit.NbTestCase.deleteFile(NbTestCase.java:998)
at org.netbeans.junit.NbTestCase.deleteSubFiles(NbTestCase.java:1017)
at org.netbeans.junit.NbTestCase.clearWorkDir(NbTestCase.java:1031)
at org.netbeans.modules.groovy.editor.api.parser.GroovyParserTest.tearDown(GroovyParserTest.java:79)
at org.netbeans.junit.NbTestCase.runBare(NbTestCase.java:530)
at org.netbeans.junit.NbTestCase.run(NbTestCase.java:298)

For the groovy.editor unittests several fails were observed, where the
testfailure did not result from the test itself, but by the cleanup
routine. The cleanup removes the work directory to prepare for the next
test run.

The recursive deletion can fail with an `IOException`, because new files
are created, while the directory tree is traversed. Observed where
`archives.properties` and `segments` from parsing API. Both are saved
delayed.

Instead of failing, the deletion is retried to continue testing.

The observed stacktrace:

at java.base/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:246)
at java.base/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)
at java.base/java.nio.file.Files.delete(Files.java:1152)
at org.netbeans.junit.NbTestCase$4.postVisitDirectory(NbTestCase.java:1006)
at org.netbeans.junit.NbTestCase$4.postVisitDirectory(NbTestCase.java:998)
at java.base/java.nio.file.Files.walkFileTree(Files.java:2828)
at java.base/java.nio.file.Files.walkFileTree(Files.java:2882)
at org.netbeans.junit.NbTestCase.deleteFile(NbTestCase.java:998)
at org.netbeans.junit.NbTestCase.deleteSubFiles(NbTestCase.java:1017)
at org.netbeans.junit.NbTestCase.clearWorkDir(NbTestCase.java:1031)
at org.netbeans.modules.groovy.editor.api.parser.GroovyParserTest.tearDown(GroovyParserTest.java:79)
at org.netbeans.junit.NbTestCase.runBare(NbTestCase.java:530)
at org.netbeans.junit.NbTestCase.run(NbTestCase.java:298)
@matthiasblaesing matthiasblaesing added tests ci:all-tests [ci] enable all tests labels Jan 24, 2025
@matthiasblaesing matthiasblaesing added this to the NB25 milestone Jan 24, 2025
@matthiasblaesing matthiasblaesing changed the base branch from master to delivery January 24, 2025 17:59
@mbien
Copy link
Member

mbien commented Jan 24, 2025

if something is still writing to the folder while the test is trying to delete it, the test probably should have to make sure first that this task is stopped.

Iterating on delete until the directory is gone does hide the problem, since the next test case could end up seeing unexpected files still.

If this is specific to the groovy tests, lets give the groovy tests their own delete method - the retry loop shouldn't run everywhere IMO since it can hide more problems.

@matthiasblaesing
Copy link
Contributor Author

matthiasblaesing commented Jan 24, 2025

What I see in the directories are the files created by indexing. Sure, I can mess with the production code to make these stoppable, but if retrying deletion in only test code, I think that is the better idea.

if something is still writing to the folder while the test is trying to delete it, the test probably should have to make sure first that this task is stopped.

Sure, you just need to wait. At some point the parsing/scanning infrastructure will settle and you can delete.

Iterating on delete until the directory is gone does hide the problem, since the next test case could end up seeing unexpected files still.

If this is specific to the groovy tests, lets give the groovy tests their own delete method - the retry loop shouldn't run everywhere IMO since it can hide more problems.

This is the cleanup code. Not the test code. It is best effort and if it improves the situation I don't see a problem.

Take it leave it, I'm willing to step away from this and close it.

@mbien
Copy link
Member

mbien commented Jan 25, 2025

@matthiasblaesing concurrent write is no good sign even during testing. The lucene index is built while the dir is cleared - this could cause all kinds of exceptions in the log etc - and there is no guarantee that the folder is empty after the teardown is done.

some observations: no other groovy test clears the workdir on teardown, only on setup - lets do the same in GroovyParserTest (#8191).

If this would be just about the groovy tests it would be less of a problem, but NbTestCase is used everywhere. Knowing when concurrent writes during recursive delete happen can be useful information.

@matthiasblaesing
Copy link
Contributor Author

I question that a failing cleanup is a good test indicator. For me this is a brittle test setup. Anyway, you made your point, so I'll step away and lets you fix the tests in the manor you prefer.

@mbien
Copy link
Member

mbien commented Jan 25, 2025

I question that a failing cleanup is a good test indicator.

@matthiasblaesing the problem is that the method is not exclusively called during cleanup and if cleanup fails, tests shouldn't run. Iterating on cleanup does not assert that the folder is empty once it finishes, the concurrent write is outside of the control of that method. Having half-cleaned up folders is an even worse race condition in the making. Detecting concurrent writes is a feature not a bug in my opinion.

@matthiasblaesing
Copy link
Contributor Author

I won't start a discussion why I think this assessment is wrong, I made my point and will stop wasting my time.

@mbien
Copy link
Member

mbien commented Jan 25, 2025

thats unfortunate - I would have liked to know why my assessment is wrong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci:all-tests [ci] enable all tests tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants