feat: handle SIGBUS / SIGSEGV only when originating from YARA's memory blocks #2020

secDre4mer · 2023-12-04T16:17:34Z

YARA currently registers its own signal handler unless instructed otherwise with SCAN_FLAGS_NO_TRYCATCH. This is dangerous for several reasons:

The current implementation is racy; when multiple threads scan simultaneously, the original signal handler may never be restored.
If the calling application is multithreaded, and one thread scans with YARA while a signal occurs in another thread, then that signal is ignored by the current signal handler.
If a callback from YARA causes a segmentation fault, YARA assumes this to be due to the data, potentially causing havoc if the caller (which may be unaware of YARA's signal handler) expects its own signal handler to be called.

This PR tries to improve this situation by adding several safety mechanisms when handling signals:

Creating a signal handler in the (racy) POSIX way is now secured with a mutex
Signals are only handled if the fault address lies within the currently inspected memory block
Unhandled signals are forwarded to the signal handler that was installed before YARA (analogous to the Windows implementation, which uses EXCEPTION_CONTINUE_SEARCH)

If multiple threads were running in parallel (e.g. each using a YR_SCANNER), race conditions could cause the original signal handler to not be restored correctly. Add a mutex that controls access to the (process wide) signal handler to ensure that the YARA signal handler is only installed once and only removed when it is no longer needed.

If a signal is received that does not originate from YARA (which is quite possible in multi-threaded applications using YARA), it is currently ignored. Especially for SIGSEGV, this may cause unexpected behaviour in the caller, even in threads that do not use YARA. Add logic to call the original signal handler to handle this case more cleanly. It is not perfect - with only a process wide signal handler, it's probably impossible to have a perfect solution - but it's better than the current behaviour.

Some callbacks from YARA code are done from YR_TRYCATCH. Currently, YARA assumes that SIGBUS / SIGSEGV from these callbacks are due to the scanned memory range, but this is not necessarily true. Add logic to identify the currently scanned memory block and only catch the signal if the SIGBUS / SIGSEGV originated from it.

secDre4mer · 2023-12-04T16:22:54Z

This is related to hillu/go-yara#137, which tries to deal with these issues with an original signal handler (installed by the Golang runtime) versus YARA's signal handler. Thanks to @plusvic for initially identifying the problems with a process wide signal handler, and to @hillu for discussing possible solutions.

vthib · 2023-12-04T18:56:42Z

Looks nice! Would have allowed us to catch #2016 much more easily.

Not entirely related to your changes, but I wonder if there is actually a reason to catch SIGSEGV?

From my understanding, this signal handler is to handle issues that can arise when reading from an mmap'ed memory region. But afaik, those issues can only trigger SIGBUS signals (for example, mmap'ing a file, shortening it, then reading past the new end of the file), and are not supposed to trigger SIGSEGV. Are there expected scenarios where reading from a mmap region raises a SIGSEGV signal?

hillu · 2023-12-04T20:11:21Z

libyara/exception.h

+      old_handler = old_sigsegv_exception_handler;
+    pthread_mutex_unlock(&exception_handler_mutex);
+
+    if (old_handler.sa_flags & SA_SIGINFO)


Actually calling the original handler should probably be handled together. I suggest the following simplification:

if (old_handler.sa_handler == SIG_DFL) { // Old handler is the default action. To do this, set the signal handler back to default and raise the signal. // This is fairly volatile - since this is not an atomic operation, signals from other threads might also // cause the default action while we're doing this. However, the default action will typically cause a // process termination anyway. pthread_mutex_lock(&exception_handler_mutex); struct sigaction current_handler; sigaction(sig, &old_handler, &current_handler); raise(sig); sigaction(sig, &current_handler, NULL); pthread_mutex_unlock(&exception_handler_mutex); } else if (old_handler.sa_handler == SIG_IGN) { // SIG_IGN wants us to ignore the signal return; } else if (old_handler.sa_flags & SA_SIGINFO) { old_handler.sa_sigaction(sig, info, context); } else { old_handler.sa_handler(sig); }

SIG_DFL is defined as zero, though. If I have a signal handler with flag SA_SIGINFO set, which therefore only sets sa_sigaction and not sa_handler, this proposal would incorrectly use the SIG_DFL part. I agree that an if / else structure makes it easier to read, but if we do so, we need the SA_SIGINFO check at the start.

Correct, so that would be:

if (old_handler.sa_flags & SA_SIGINFO) { old_handler.sa_sigaction(sig, info, context); } else if (old_handler.sa_handler == SIG_DFL) { // Old handler is the default action. To do this, set the signal handler back to default and raise the signal. // This is fairly volatile - since this is not an atomic operation, signals from other threads might also // cause the default action while we're doing this. However, the default action will typically cause a // process termination anyway. pthread_mutex_lock(&exception_handler_mutex); struct sigaction current_handler; sigaction(sig, &old_handler, &current_handler); raise(sig); sigaction(sig, &current_handler, NULL); pthread_mutex_unlock(&exception_handler_mutex); } else if (old_handler.sa_handler == SIG_IGN) { // SIG_IGN wants us to ignore the signal return; } else { old_handler.sa_handler(sig); }

Implemented like this in the current version.

hillu · 2023-12-04T20:14:30Z

libyara/exception.h

+    {
+      if (old_handler.sa_handler == SIG_DFL)
+      {
+        // Old handler is the default action. To do this, set the signal handler back to default and raise the signal.


We might get away without locking and having to worry about non-atomic operations if we use SA_RESETHAND when installing our handler over SIG_DFL handlers.

SA_RESETHAND comes with additional complications, though. If we use it, we are inherently racy when two threads run (within YARA) into a SIGBUS in parallel: the first occurrence will reset the signal handler and the second one will therefore no longer be caught by our signal handler, and instead cause a crash.

secDre4mer · 2023-12-05T07:49:05Z

Looks nice! Would have allowed us to catch #2016 much more easily.

Not entirely related to your changes, but I wonder if there is actually a reason to catch SIGSEGV?

From my understanding, this signal handler is to handle issues that can arise when reading from an mmap'ed memory region. But afaik, those issues can only trigger SIGBUS signals (for example, mmap'ing a file, shortening it, then reading past the new end of the file), and are not supposed to trigger SIGSEGV. Are there expected scenarios where reading from a mmap region raises a SIGSEGV signal?

SIGSEGV is caught, I believe, because it's used on FreeBSD / OpenBSD instead of SIGBUS in the scenario you described, see issue #551.
I agree though that catching SIGSEGV on other POSIX systems is unnecessary. Maybe we can add a differentiation by OS so we only catch the signal that is necessary. Similarly, I don't think catching EXCEPTION_ACCESS_VIOLATION (which is the SIGSEGV equivalent) on windows is necessary - EXCEPTION_IN_PAGE_ERROR (the SIGBUS equivalent) should be sufficient.

secDre4mer added 4 commits December 4, 2023 17:09

fix: make exception test more verbose

ff9804c

secDre4mer marked this pull request as ready for review December 4, 2023 16:33

hillu reviewed Dec 4, 2023

View reviewed changes

secDre4mer added 2 commits December 5, 2023 09:04

chore: simplify signal handler forwarding

9618f7b

feat: reduce caught signals to required for known OS

359f71f

plusvic approved these changes Dec 7, 2023

View reviewed changes

plusvic merged commit 19da77e into VirusTotal:master Dec 7, 2023
9 checks passed

secDre4mer deleted the forwardsignals branch December 8, 2023 08:16

vojone mentioned this pull request Apr 29, 2024

Update of documentation - added info about yr_fetch_block_data #2071

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: handle SIGBUS / SIGSEGV only when originating from YARA's memory blocks #2020

feat: handle SIGBUS / SIGSEGV only when originating from YARA's memory blocks #2020

secDre4mer commented Dec 4, 2023

secDre4mer commented Dec 4, 2023

vthib commented Dec 4, 2023

hillu Dec 4, 2023 •

edited

Loading

secDre4mer Dec 5, 2023

hillu Dec 5, 2023

secDre4mer Dec 5, 2023

hillu Dec 4, 2023

secDre4mer Dec 5, 2023

secDre4mer commented Dec 5, 2023

feat: handle SIGBUS / SIGSEGV only when originating from YARA's memory blocks #2020

feat: handle SIGBUS / SIGSEGV only when originating from YARA's memory blocks #2020

Conversation

secDre4mer commented Dec 4, 2023

secDre4mer commented Dec 4, 2023

vthib commented Dec 4, 2023

hillu Dec 4, 2023 • edited Loading

Choose a reason for hiding this comment

secDre4mer Dec 5, 2023

Choose a reason for hiding this comment

hillu Dec 5, 2023

Choose a reason for hiding this comment

secDre4mer Dec 5, 2023

Choose a reason for hiding this comment

hillu Dec 4, 2023

Choose a reason for hiding this comment

secDre4mer Dec 5, 2023

Choose a reason for hiding this comment

secDre4mer commented Dec 5, 2023

hillu Dec 4, 2023 •

edited

Loading