Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed the native crash happening on Android. #116

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

MohitMaliFtechiz
Copy link
Collaborator

See issue kiwix/kiwix-android#3956

  • The issue on the Android side was a null pointer dereference, meaning that when accessing the getEntryByPath method, the archive object had already been disposed of, which caused the crash.
  • We are correctly handling and disposing of the previous archive and web views for the previously set archive. However, web views load content on their own thread. When we change the ZIM file, we dispose of the previous archive and stop the ongoing web view processes. In certain cases, the web view internally tries to get content via the getEntryByPath method because it takes some time to cancel all the previous calls on the background thread. As a result, the web view attempts to call the method on a disposed archive, which causes the native crash.
  • To fix this, I refactored the getPtr method to return nullptr if the archive has already been disposed of. I also handled this in the utils.h class by checking if the object is null (meaning it has already been disposed). If so, an error is thrown (which should be handled by the caller, in this case, the Kiwix app), instead of crashing the program.
  • Handling this scenario on the Android side is not feasible since the WebView operates on a separate thread over which we have no control. However, addressing this issue on the java-libkiwix side is a better approach. If an object is already disposed, it should throw an error that can be caught by the caller, instead of crashing the entire program. Handling such errors is the responsibility of the user of java-libkiwix (e.g., Kiwix Android). We are already handling all errors thrown by java-libkiwix in the Kiwix Android app.
  • Currently, I have created the SAFE_THIS macro to gather feedback on this approach. Changing the existing THIS macro to incorporate the new behavior would require updating all the places where it is being used. To validate this fix before making those changes, I implemented the new function separately. I explicitly disposed of the archive and attempted to access its content via the getEntryByPath method. Now, instead of crashing the entire program, it correctly throws an error, which is handled by Kiwix.

The issue sometimes occurs on the CI see https://github.com/kiwix/kiwix-android/actions/runs/12809338048/job/35713951704?pr=4177#step:8:9560

* The issue on the Android side was a `null pointer dereference`, meaning that when accessing the getEntryByPath method, the archive object had already been disposed of, which caused the crash.
* We are correctly handling and disposing of the previous archive and web views for the previously set archive. However, web views load content on their own thread. When we change the ZIM file, we dispose of the previous archive and stop the ongoing web view processes. In certain cases, the web view internally tries to get content via the getEntryByPath method because it takes some time to cancel all the previous calls on the background thread. As a result, the web view attempts to call the method on a disposed archive, which causes the native crash.
* To fix this, I refactored the getPtr method to return nullptr if the archive has already been disposed of. I also handled this in the utils.h class by checking if the object is null (meaning it has already been disposed). If so, an error is thrown (which should be handled by the caller, in this case, the Kiwix app), instead of crashing the program.
* Handling this scenario on the Android side is not feasible since the WebView operates on a separate thread over which we have no control. However, addressing this issue on the java-libkiwix side is a better approach. If an object is already disposed, it should throw an error that can be caught by the caller, instead of crashing the entire program. Handling such errors is the responsibility of the user of `java-libkiwix` (e.g., Kiwix Android). We are already handling all errors thrown by `java-libkiwix` in the Kiwix Android app.
* Currently, I have created the `SAFE_THIS` macro to gather feedback on this approach. Changing the existing `THIS` macro to incorporate the new behavior would require updating all the places where it is being used. To validate this fix before making those changes, I implemented the new function separately. I explicitly disposed of the archive and attempted to access its content via the getEntryByPath method. Now, instead of crashing the entire program, it correctly throws an error, which is handled by Kiwix.
@MohitMaliFtechiz
Copy link
Collaborator Author

@kelson42, @mgautierfr Can you please review the PR and validate the idea? then I will refactore the remaining code.

Copy link

codecov bot commented Jan 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.23%. Comparing base (4c6dadb) to head (5fd942d).

Additional details and impacted files
@@            Coverage Diff            @@
##               main     #116   +/-   ##
=========================================
  Coverage     93.23%   93.23%           
  Complexity      237      237           
=========================================
  Files            47       47           
  Lines           325      325           
  Branches          3        3           
=========================================
  Hits            303      303           
  Misses           19       19           
  Partials          3        3           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kelson42 kelson42 requested review from veloman-yunkan and mgautierfr and removed request for mgautierfr January 17, 2025 17:37
Comment on lines +31 to +38
({ \
auto ptr = getPtr<NATIVE_TYPE>(env, thisObj); \
if (!ptr) { \
throwException(env, "java/lang/IllegalStateException", "The native object has already been disposed."); \
return nullptr; \
} \
ptr; \
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see how this could work in the pre-C++-14 language standard that I am more or less familiar with. SAFE_THIS has to be an expression but it doesn't seem to expand to code that can be considered an expression according to the language specification. I tried to quickly skim through https://en.cppreference.com/w/cpp/language/expressions but couldn't find anything that allows to consider this code valid in modern c++.

In other words, I am puzzled how this code compiles (in the context of the change in lib/src/main/cpp/libzim/archive.cpp).

Copy link
Collaborator Author

@MohitMaliFtechiz MohitMaliFtechiz Jan 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@veloman-yunkan This code is made by @mgautierfr, he can better explain this, and i am not very much familiar with C++.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that @veloman-yunkan understand how it works :)

#define FOO BAR make every instance of FOO literally replaced by BAR.

So something like THIS->getEntryByPath(...) is replaced by getPtr<NATIVE_TYPE>(env, thisObj)->getEntryByPath(...). And it compiles because getPtr is a function returning a pointer.

With your define of SAFE_THIS we have SAFE_THIS->getEntryByPath(...) replaced by:

 ({ 
        auto ptr = getPtr<NATIVE_TYPE>(env, thisObj); 
        if (!ptr) { 
            throwException(env, "java/lang/IllegalStateException", "The native object has already been disposed."); 
            return nullptr; 
        } 
        ptr; 
    })->getEntryByPath(...)

And, as @veloman-yunkan, I don't know how this code can be compiled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your define of SAFE_THIS we have SAFE_THIS->getEntryByPath(...) replaced by:

@mgautierfr SAFE_THIS also uses the getPtr<NATIVE_TYPE>(env, thisObj) to get the pointer, and if this method returns the nullptr it throws the error which will be caught in the archive.cpp.

METHOD(jobject, getEntryByPath__Ljava_lang_String_2, jstring path) {
  return BUILD_WRAPPER("org/kiwix/libzim/Entry", SAFE_THIS->getEntryByPath(TO_C(path)));
} CATCH_EXCEPTION(nullptr)

Also, in the archive.cpp SAFE_THIS returns the archive.

Screenshot from 2025-01-21 17-41-04

I think that's why this code was compiled without any error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SAFE_THIS is not a function. It is a macro (ie. textual replacement)

It appears that we are facing a statement expression (https://stackoverflow.com/questions/8344101/what-is-it-called-when-a-block-returns-a-value) which is not a standard C++ code but a gcc extension.

This explain why it works. However, I not really confident to use a non standard c++ here, especially as we have a better alternative.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh okay got it. @mgautierfr So we can remove this SAFE_THIS as it is non-standard. In this PR I have updated the getPtr method if that is null it returns the nullptr which we can handle before calling the functions on this.

e.g. The updated getPtr method is:

// Return a shared_ptr for the handle
template<typename T>
shared_ptr<T> getPtr(JNIEnv* env, jobject thisObj, const char* handleName = "nativeHandle")
{
  jclass thisClass = env->GetObjectClass(thisObj);
  jfieldID fidNumber = env->GetFieldID(thisClass, handleName, "J");
  auto handle = reinterpret_cast<shared_ptr<T>*>(env->GetLongField(thisObj, fidNumber));
    if (handle == nullptr) {
        return nullptr;
    }
  return *handle;
}

and we are getting this like this in archive.cpp:

METHOD(jobject, getEntryByPath__Ljava_lang_String_2, jstring path) {
  if (THIS == nullptr) {
     throwException(env, "java/lang/IllegalStateException", "The Archive object has already been disposed.");
     return nullptr;
  }
  return BUILD_WRAPPER("org/kiwix/libzim/Entry", THIS->getEntryByPath(TO_C(path)));
} CATCH_EXCEPTION(nullptr)

and will update the other code to use this approach. By using this approach we can handle the native crash for now without upgrading the minimum SDK version. If you have a better approach to not writing this check in all places, something like extension function that avoids writing this check everywhere please let me know.

In the screenshot: I have compiled the code with this new approach and tried to explicitly dispose the archive and then tried to access the content on the disposed archive and instead of crashing the program, it normally throws the exception. That we are already handling in Kiwix app.

image

I am not in favor of upgrading the minimum SDK to 33 because we will lose most of the devices, SDK 33 means Android 13 which is one of the most recent versions of Android.

lib/src/main/cpp/utils.h Outdated Show resolved Hide resolved
Copy link
Member

@mgautierfr mgautierfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are correctly handling and disposing of the previous archive and web views for the previously set archive.

I would say that you are not correctly disposing the previous archive if you are disposing it while something is still using it.

However, I understand it can be difficult to do such memory management in a java world where everything is expected to be managed by GC.

I have a clean solution for this problem and it is pr #25. The PR remove the need to call dispose on libzim object (and even remove this method) and make resource automatically clean up when GC destroy the object.

@MohitMaliFtechiz
Copy link
Collaborator Author

I would say that you are not correctly disposing the previous archive if you are disposing it while something is still using it.

@mgautierfr You are right, and I mentioned this thing in the PR description please see point 4(We have done all possible things to remove loading of webView at the Android level).

I have a clean solution for this problem and it is pr #25. The PR remove the need to call dispose on libzim object (and even remove this method) and make resource automatically clean up when GC destroy the object.

Could we complete that PR so we get rid of this error?

@mgautierfr
Copy link
Member

Could we complete that PR so we get rid of this error?

The PR is complete (at least at the time of submission, we may have few merge conflict to fix).
It hasn't been merge because we need an update the minimun sdk version to 33 and it will (was at least, don't know now) drop compatibility with too many "old" android devices.

@MohitMaliFtechiz
Copy link
Collaborator Author

It hasn't been merge because we need an update the minimun sdk version to 33 and it will (was at least, don't know now) drop compatibility with too many "old" android devices.

@mgautierfr I am not in favor of upgrading the minimum SDK version to 33 since it drops the compatibility for most of the devices. Since SDK 33 means Android 13 which is a very new device which is few users using this and the above devices, and most of the users still using the old devices like Android 12, Android 11, etc.

Is there any other alternative to use the new PR without upgrading the minimum SDK version? If currently not, what do you think about my this proposal #116 (comment)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants