Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redirect 307 / 308 correctly #187

Open
oberien opened this issue Oct 16, 2020 · 10 comments
Open

Redirect 307 / 308 correctly #187

oberien opened this issue Oct 16, 2020 · 10 comments

Comments

@oberien
Copy link

oberien commented Oct 16, 2020

This feature was already tried to be implemented. However, when sending the request, the body is consumed (due to being dyn Read), so it can't be sent again for the 307 or 308 redirect, which requires the same request to be sent again.

ureq/src/unit.rs

Lines 237 to 238 in 5b75dec

// reinstate this with expect-100
// 307 | 308 | _ => connect(unit, method, use_pooled, redirects - 1, body),

From what I can tell so far after looking at the source, we can reset any Payload back to the beginning, except the generic Reader(Box<dyn Read + 'static>). If this reader also implemented Seek, we would be able to get its position before starting the body-write, and reset it to that position on a 307 / 308 to read from it again. However, this would be a breaking API change to Request::send(&mut self, reader: impl Read + 'static) -> Response as the argument now also needs to implement Seek. (Additionally, io::Empty currently doesn't implement Seek, for which I opened rust-lang/rust#78029.)
Another option would be to require the reader to be Clone, however I'd see that as a strictly worse version of Seek.

@jsha
Copy link
Collaborator

jsha commented Oct 16, 2020

Thanks for filing, and welcome to the repo! I've been wanting to get this done for a while. It sounds like your proposal has two parts:

  • For all input types other than Reader(Box<dyn Read + 'static>), just reset in the straightforward way.
  • Add another input type like Reader(Box<dyn Read + Seek + 'static>) so that certain Read inputs (like files) can be seek'ed to reset.

Is that right?

@algesten
Copy link
Owner

Can we solve this with a buffer of configurable size? (This is an idea @jsha started)

The default buffer size would be some multiple of a probable initial TCP window size, meaning the server would have a chance to send the 307/308 response while the client sending more than the buffer size is "held back" by TCP flow control.

Obviously probable initial TCP window size is a bit moot, so the user should be able to configure this for Agent and/or Request.

In addition, such a buffer could means for small Read bodies, we can pre-buffer the entire contents into memory, avoiding transfer-encoding chunked.

@oberien
Copy link
Author

oberien commented Oct 17, 2020

For all input types other than Reader(Box<dyn Read + 'static>), just reset in the straightforward way.

A straightforward way could be added. However, I think that all creations of the dyn Read type passed to the SizedReader already implement Seek, so the bound could just be added to SizedReader and work for all non-Reader payload variants (once stdlib implements Seek for io::Empty; if the fix should be done before, it might make sense to introduce an additional trait and implement that for io::Empty).

Add another input type like Reader(Box<dyn Read + Seek + 'static>) so that certain Read inputs (like files) can be seek'ed to reset.

I would have replaced the current Reader type by adding the Seek bound. This ensures that all variants of Payload are resettable. Thus the Seek bound can also be added to SizedReader, which allows the unit::connect function to internally reset the SizedReader it got as argument when redirecting.

the server would have a chance to send the 307/308 response while the client sending more than the buffer size is "held back" by TCP flow control.

This might be a very interesting idea. I'm not too sure about the intricacies of HTTP and if the server must respond with 307 / 308 when only receiving a partial payload, or if it might be allowed to hold back the response until the whole payload is received. I could imagine a reverse proxy like nginx buffering several chunks before forwarding them as a single chunk to the original server. Otherwise I think this could be an interesting idea.
Another problem with this approach would be latency. How do you make sure to stop sending until you receive the response? What about long fat pipes with a delay of several seconds but high bandwidth? You'd need to wait for a whole round-trip without necessarily knowing how long one such round-trip would be.
Or maybe I'm already overthinking this and it's an already solved problem ¯\_(ツ)_/¯

@algesten
Copy link
Owner

I'm not too sure about the intricacies of HTTP and if the server must respond with 307 / 308 when only receiving a partial payload, or if it might be allowed to hold back the response until the whole payload is received.

I suspect that's entirely up to the HTTP server, but it would be kinda awkward to support 307/308 without also considering the case of aborting uploads of large bodies.

A related technique that we might want to consider is expect 100-continue. Curl has a default 1 second delay before sending the body, in which the server has a chance to respond with some other code aborting the body send.

That's another way to do it, which could be done for 307/308 in addition to buffering.

  1. Send headers,
  2. flush
  3. wait 1 sec
  4. send buffered body
  5. send unbuffered body

During steps 3-4 the server can successfully respond with 307/308.

@jsha
Copy link
Collaborator

jsha commented Oct 17, 2020

I would have replaced the current Reader type by adding the Seek bound.

Part of our user-facing API is:

pub fn send(&mut self, reader: impl Read + 'static) -> Response

I'd prefer not to add a Seek bound to that, because we want to support sending a streaming body, where the user doesn't know ahead of time how big it will be.

I'm not too sure about the intricacies of HTTP and if the server must respond with 307 / 308 when only receiving a partial payload, or if it might be allowed to hold back the response until the whole payload is received.

Personally I've never seen any HTTP server do something like this. It would require some fairly custom code in most HTTP server implementations, but more importantly it would require client cooperation. Specifically, the client would have to send some of the body, then do a non-blocking read from the response stream to see if the server responded early. That's pretty unusual - though perhaps more common in the HTTP/2 world?

I think algesten's on the right track, that this is the sort of problem Expect: 100-continue is designed to solve.

One thing that would help us reach a good design: Can you give examples of real-world APIs or websites that return 307 or 308? If we design around making those real use cases work well, I think we'll wind up with something good.

@oberien
Copy link
Author

oberien commented Oct 17, 2020

Expect: 100-continue looks like the correct fix here. And if curl does something, it should be perfect do copy its behaviour.

I noticed the missing 308 redirection when making a GET request to the gitlab.com API. If you include a double-slash in the request, e.g. try to request https://gitlab.com/api/v4//groups, it returns a 308 redirecting to the URL without the double-slash.

@algesten
Copy link
Owner

That's pretty unusual - though perhaps more common in the HTTP/2 world?

Lol.. And here I was thinking that would be the norm, cause how would you otherwise detect "early" response headers? … Any early 3xx, 4xx or 5xx is an opportunity to abort sending a big body. Maybe I'm tainted by h2 :D

@jsha
Copy link
Collaborator

jsha commented Oct 18, 2020

On the flip side (a little off-topic but it's entertaining IMO), there are some servers out there that will send you a response as soon as you connect, without even waiting for a request! This gives up any opportunity to serve different responses for different requests, but the hosts I've seen it on seem to be serving analytics JS at very large scale. It's probably a significant performance win, and they're always only serving the same JS anyhow.

@jyn514
Copy link
Contributor

jyn514 commented Nov 23, 2020

I'm not sure the current status of this, but deadlinks would like this to avoid having to deal with redirects itself: deadlinks/cargo-deadlinks#110. It would be fine by me to treat it exactly the same as 301/302.

@jyn514
Copy link
Contributor

jyn514 commented Jan 10, 2021

Note that since #288 307/308 are followed for GET/HEAD, which should help the majority of use cases. For PUT/POST/DELETE you still have to handle the error yourself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants