Timeouts with vsync #145

andamian · 2020-08-10T21:50:15Z

John's observations from an email thread:

"Like I said in the last message, I managed to transfer Michael’s data, but discovered that the timeouts that he was seeing were not coming from the network.

Michael gave me access to the node with the data on it, so I was able to test the transfers, and try to complete the transfer for him. When using either vcp or vsync (version 3.1.1), the timeouts would always occur. If a file failed on the original vsync, it seemed that it always failed. Although I was seeing the GETs from vcp/vsync in the service logs, I did not see any POSTs or PUTs related to these files – this is what led me to believe that the problem was network related.

To test the network hypothesis, I transferred the data to gimli3 (using scp to pull the data) – no errors there, of course. The interesting thing was that I saw the same read timeouts vsync’ing from gimli3 – those connections never leave our network, never go through bluecoat, etc. Since I was still not seeing the connections on the servers, I am pretty sure that the problem is with the client.

Although the vsync was failing to move data, if a file did not exist in a container node, a data node would be created – I tested this by deleting files (nodes, actually, since the data had never been transferred) from the vospace then trying vcp and vsync again. Same thing: the node would get created, but no data would get transferred.

I started wondering if the problem was with Michael’s vospace (low probability, but…). There was enough space for all of the data, so I then tried vsync’ing the entire data set to my vospace – worked without a single error.
I then tried deleting a directory structure of Michael’s vospace (just of the data that Michael had on hand and was trying to transfer – most of the files in there were still just empty nodes, anyway) and retransferred it. No joy – nodes were created, but no data transferred.

Given where the read timeouts were occurring, I started to suspect that the problem was occurring when the client was trying to parse the directory structure of the vospace – Michael’s VOSpace was deep and had lots of files, mine isn’t and I had only transferred the DR1 directory structure from gimli3 to the top level of my vospace (i.e. vos:jouellet/DR1, not vos:GOGREEN/Data/Releases/Public/DR1). I tried vsyncing the data to vos:GOGREEN/Data – no joy, lots of timeouts. Then I tried vsyncing to the top level – vos:GOGREEN – aaaand bingo: the entire directory structure and data set transferred without a single error.

I was able to use vmv to move the data to the correct place in Michael’s vospace this morning.

Given where the timeouts occurred, my hypothesis is that the client threads are thrashing or hanging when they try to do something with the directory structure of the vospace. Like I said, too, this seems to be repeatable, so a test could probably be written.

John"
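
A rough sketch of how the repeatable test John mentions might be structured: run the same transfer against each destination depth and record where it starts failing. The helper names `destination_prefixes` and `probe_depths` are hypothetical, and `transfer` is an assumed callable that would wrap a real vsync invocation and return whether it succeeded; none of this is part of the vos client.

```python
from typing import Callable, Dict, List


def destination_prefixes(target: str) -> List[str]:
    """Return each ancestor of a vos: URI, shallowest first, ending with the target."""
    scheme, _, path = target.partition(":")
    parts = [p for p in path.split("/") if p]
    return [f"{scheme}:{'/'.join(parts[:i])}" for i in range(1, len(parts) + 1)]


def probe_depths(target: str, transfer: Callable[[str], bool]) -> Dict[str, bool]:
    """Call the (hypothetical) transfer once per destination depth and record success."""
    return {dest: transfer(dest) for dest in destination_prefixes(target)}


if __name__ == "__main__":
    # Stand-in for a real vsync wrapper: it only mimics the behaviour described
    # in the email (top level succeeds, deeper destinations time out).
    def fake_transfer(dest: str) -> bool:
        return "/" not in dest

    for dest, ok in probe_depths("vos:GOGREEN/Data/Releases/Public", fake_transfer).items():
        print(dest, "ok" if ok else "timeout")
```

In a real test, `fake_transfer` would be replaced by a function that runs vsync against the given destination and reports whether any read timeouts occurred.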

andamian added the BUG label Aug 10, 2020
