Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Echoserver Select Timeouts #612

Merged
merged 1 commit into from
Nov 3, 2023

Conversation

ejohnstown
Copy link
Contributor

A couple changes to keep the echoserver from spin-locking. (Should fix #607.)

  1. The SFTP worker should wait for data if the channel window is full. If the client isn't sending a window adjust, it might be because of a human time scale issue. New timeout is 60 seconds.
  2. When the echoserver is waiting for something from the user and the select times out, wait a second instead.

A couple changes to keep the echoserver from spin-locking.

1. The SFTP worker should wait for data if the channel window is full.
   If the client isn't sending a window adjust, it might be because of a
   human time scale issue. New timeout is 60 seconds.
2. When the echoserver is waiting for something from the user and the
   select times out, wait a second instead.
@falemagn
Copy link
Contributor

falemagn commented Oct 31, 2023

Given this PR spurred from #607, and given that #607 spurred from a problem we're having in our own worker loop, which could be replicated in the echoserver's worker loop as well, I'd like to ask some questions about it to better understand what is going on.

It is my understanding that when WS_WINDOW_FULL is returned, it's because the peer cannot accept any more data until it has consumed what was already sent to it - is this right?

If so, how's it different, from the library user's point of you, than WS_WANT_WRITE?

Given the fact that our loop doesn't use timeouts, it's rather event-based and the events are about being able to read from or write to the socket, how do you suggest we deal with WS_WINDOW_FULL? What does the library expect the user to do, when it returns that value?

@ejohnstown
Copy link
Contributor Author

This is how I understand the intention of RFC 4254 sections 5.1 and 5.2.

When a channel is opened, the init message and reply message both indicate to its peer how much buffer space it has allocated to receive for the channel. This is the receive window size for the channel. wolfSSH allocates a fixed buffer for receiving channel data, per channel. This buffer size is set by DEFAULT_WINDOW_SZ and starts at 128kb. As an endpoint sends data to the peer's channel, it knows how much it has sent and how much the peer will accept for the channel. Periodically, as peer takes data out of this buffer, it sends a Window Adjust message, indicating how much more space is available in the buffer. I used fixed buffers because I was originally targeting an embedded device, and I was trying to minimize allocations. (And I know, there's still a lot going on.)

The echoserver is getting WS_WINDOW_FULL as it is trying to send data to the client, but it thinks the client's channel's receive buffer is full. It has sent enough to fill it. The echoserver doesn't know why the client isn't sending messages back, it only knows the client allows 2MB on the channel and it sent 2MB. It didn't even try to send the channel data, it knows it is full. The server is waiting for the client to send the Channel Adjust Window message.

When you foreground the client, it receives all the pending messages, nearly fills its channel's receive buffer, and sends adjust messages back to the echoserver, who then picks up sending the file.

The echoserver is spin locking on a select call on the socket. I just lengthened the timeout.

@JacobBarthelmeh JacobBarthelmeh merged commit 218a2fd into wolfSSL:master Nov 3, 2023
3 checks passed
@ejohnstown ejohnstown deleted the select-timeout branch November 30, 2023 21:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

If the client is suspended, the server enters a busy loop with WS_WINDOW_FULL
4 participants