-
-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix test 1238 causing shortly following tests 1242 and 1243 to fail #5364
Conversation
The problem can be observed on AppVeyor (Windows) and CirrusCI (FreeBSD): |
So the hypothesis is that the server spends more time on the test than the client. The relevant tftpd server code can be found here: Lines 1167 to 1215 in 54fa68b
Based on an analysis of the logs, the server code and a packet capture, the test seems misconfigured. It has one iteration too much. output for: ./runtests.pl -n -p -v 1238
log/tftp_server.log
During the test I ran Packet capture summary33042 is taken from the debug output above. The columns shown are: 1) time since begin of the capture, 2) unique ip/port tuple identifier, 3) UDP source port, 4) UDP destination port, 5) extra info.
However if for whatever reason (timing, failure to deliver ICMP, etc.) the server receives an ACK, then the server will continue to read the next. In total the server could be spending about 6 seconds whereas the client fails within a second. I believe that another way to fix this issue is to reduce the test data for test1238 from 1200 bytes (3 blocks) to two blocks (e.g. 600 bytes). Assuming that |
Yes, but I would not call it a hypothesis since I observed it on 2 CIs and in my local tests.
That "read: fail" does not happen on FreeBSD or Windows. The server is stuck reading and will timeout after 5 tries of 5 second waits using the alarm functionality instead of read timeouts.
Why would there be an ICMP deliver? curl just does not send the data via UDP packet anymore and the server is stuck waiting for that. Since there is no connection handling in UDP unlike TCP, the server will never get a "connection close event" from the client and read will not return 0.
Anyway, I will try your suggestion about reducing the transfer size. But are you sure that doesn't change the purpose of test 1238 timing out during the transfer instead of after the transfer has already finished and reached the server? |
setjmp.h should only be included if HAVE_SETJMP_H is defined. Add additional log statements to see wether reads and writes are blocking or finishing before an alarm signal is received. Assisted-by: Peter Wu Part of #5364
Just for reference here is the the output of tftp_server.log after applying the logging patch:
As you can see curl is timing out during the 2nd block (as you described with Round 2), but there is no "read: fail" during test 1238 which can lead to tftp still waiting for 1238 when 1242 starts. |
I have now also tried your proposal and this does not help with the TFTP server still being busy:
Also the test explicitly states the size of 1200 as part of the test behavior:
Therefore I would propose my original patch with an additional comment about the reason why killing the server is required in order to never affect other afterwards following TFTP tests. |
The tftpd server may still be busy if the total timeout of 25 seconds has not been reached or no sread error was received during or after the execution of the timeout test 1238. Once the next TFTP test comes around (eg. 1242 or 1243), those will fail because the tftpd server is still waiting on data from curl due to the UDP protocol being stateless and having no connection close. On Linux this error may not happen, because ICMP errors generated due to a swrite error can also be returned async on the next sread call instead. Therefore we will now just kill the tftpd server after test 1238 to make sure that the following tests are not affected. This enables us to no longer ignore tests 1242, 1243, 2002 and 2003 on the CI platforms CirrusCI and AppVeyor. Assisted-by: Peter Wu Closes curl#5364
734d404
to
95c60d8
Compare
More background why you may be seeing a "read: fail" on Linux while that is not related to the read itself, but the write instead: As far as I can tell FreeBSD and Windows do not behave like that. Also see comments here: 000f721.
Regarding my quoted above comment of course, I now understand that you most likely refer to the ICMP error messages regarding send and not read failure according to what I described above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for testing, LGTM!
@Lekensteyn thanks @bagder what do you think? |
Some TFTP tests seem to enter into a loop and maybe hang? E.g. 1007, 1009, 1238 Try fixing it by skipping all TFTP tests. Ref: https://github.com/curl/curl/actions/runs/9141987545/job/25137038249?pr=13698 Also drop mingw-w64 test exclusions copy-pasted from MSYS jobs. Possibly related: cffbcc3 curl#5364 Close #xxxxx
Some TFTP tests seem to enter into a loop and maybe hang? E.g. 1007, 1009, 1238 Try fixing it by skipping all TFTP tests. Ref: https://github.com/curl/curl/actions/runs/9141987545/job/25137038249?pr=13698 Also drop mingw-w64 test exclusions copy-pasted from MSYS jobs. Possibly related: cffbcc3 #5364 Close #13699
At the moment tests 1242 and 1243 fail if the TFTP server is still busy with test 1238.
Therefore the TFTP server will now be killed after test 1238. This PR also includes other TFTP fixes.