Bug 17147 - [GTK] API: Stream-based loader API
Summary: [GTK] API: Stream-based loader API
Status: RESOLVED WONTFIX
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK
Version: 528+ (Nightly build)
Hardware: All All
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: Gtk
Depends on:
Blocks: 15843
Reported: 2008-02-01 17:37 PST by Alp Toker
Modified: 2014-04-08 17:56 PDT

Attachments
Add signal before an image load, and add in API function to load in an image (17.95 KB, patch)
2008-09-30 22:32 PDT, Andrew May

Description Alp Toker 2008-02-01 17:37:30 PST
We need to support applications that load custom resources into WebView. Evolution, Tinymail, Yelp, and Monodoc are the apps that come to mind.

Something like WebDataSource, or perhaps the ability to register URL protocol handlers, or both. GIO could be useful for stream classes, perhaps as a counterpart to NSData in the Mac API.

The Mac Obj-C code is quite tangly and looks like parts are deprecated. We don't necessarily want to copy everything there.

(Bug #15843 is a request from application authors that we provide additional metadata to the loader application, such as image scale which might be worth keeping in mind when designing a stream-based loader API.)
Comment 1 José Dapena Paz 2008-07-15 05:17:16 PDT
I would be very interested in helping with this. Unfortunately, I still don't know enough about the parts I would need to touch to add this feature (the WebCore and WebKit architecture elements involved, the interaction among them, etc.).

Some additional points for things we need in tinymail and modest:
   * We need to be able to feed a custom stream.
   * We also need WebKit to offer a way to supply a custom stream for specific URIs (for example, for loading images).

The implementation we currently use, gtkhtml, provides:
   * A signal "url-requested". This signal offers a GtkHTMLStream you can feed (from your own stream or whatever you need). It is used for loading images or other resources included inside the document (we hook into this for cid: URIs, and also for internally managing the fetching or blocking of external image URIs).
   * An API to obtain a GtkHTMLStream from the gtkhtml widget that you can write to. This API is used for loading the document itself into the HTML widget.

Maybe it would be good if we could use a standard GIO channel for this, so that we can use as much standard glib/gtk api as possible.
Comment 2 Andrew May 2008-09-30 22:32:43 PDT
Created attachment 23967
Add signal before an image load, and add in API function to load in an image

I don't do any glib/gtk devel or WebKit stuff, so I don't expect this to be perfect or correct. There is some left over junk in the patch with "notimplemented" macro to help me see what else is missing.
But I did try to dive into the classes and get something working. I did get this working with a patch to the unreleased mayflower plugin for claws mail.

So I am looking to see if this is anywhere close to being the correct approach and what needs to be done to fix it.
Comment 3 Peter Bloomfield 2008-11-26 05:55:02 PST
I'm exploring WebKitWebView as an alternative to gtkhtml{2,3} in an email UA (Balsa).  We would need some feature like this, to (a) meet "cid:" requests, and (b) optionally block loading of images or anything else from remote sites--some users have privacy concerns.

I'm completely unfamiliar with the WebKit codebase, so I'm not going to be much help with coding, but I'd be very interested in helping out with testing.
Comment 4 Patrick Mueller 2009-09-12 16:03:23 PDT
I ran into Peter Bloomfield at a conference, and had another idea about how to solve a particular problem related to this bug.  Peter described the problem of dealing with html email with embedded images.  My suggestion was to make use of the data: url to avoid having to do anything more complicated.  Reference here:

   http://en.wikipedia.org/wiki/Data_URI_scheme

So the basic idea would be to extract the embedded images out of the email payload, and replace the <img src="xxx"> with <img src="data:yyy"> in the actual html email section.  This could be done in either the primary code dealing with the message (C?), or could probably be done in JavaScript with something like an onload handler.

This may be off-base, but it sounded like an easy way to work around this particular problem, though not a general-purpose solution (I don't think).
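
A minimal sketch of the data: URI idea in plain C, assuming the embedded image has already been extracted from the message as raw bytes. The function name make_data_uri and its parameters are hypothetical, chosen only for illustration; the function builds the string that would replace the original src attribute.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "data:<mime>;base64,<payload>" from raw bytes.
 * Returns a malloc'd string the caller must free, or NULL if allocation fails. */
char *make_data_uri(const char *mime, const unsigned char *data, size_t len)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    assert(mime != NULL && (data != NULL || len == 0));

    size_t prefix_len = strlen("data:") + strlen(mime) + strlen(";base64,");
    size_t b64_len = 4 * ((len + 2) / 3);   /* base64 emits 4 chars per 3 bytes */
    char *uri = malloc(prefix_len + b64_len + 1);
    if (uri == NULL)
        return NULL;

    int n = sprintf(uri, "data:%s;base64,", mime);
    char *out = uri + n;

    for (size_t i = 0; i < len; i += 3) {
        size_t remaining = len - i;
        /* Pack up to three bytes into one 24-bit group. */
        unsigned long chunk = (unsigned long)data[i] << 16;
        if (remaining > 1) chunk |= (unsigned long)data[i + 1] << 8;
        if (remaining > 2) chunk |= (unsigned long)data[i + 2];

        *out++ = tbl[(chunk >> 18) & 0x3F];
        *out++ = tbl[(chunk >> 12) & 0x3F];
        /* Pad with '=' when the final group is short. */
        *out++ = remaining > 1 ? tbl[(chunk >> 6) & 0x3F] : '=';
        *out++ = remaining > 2 ? tbl[chunk & 0x3F] : '=';
    }
    *out = '\0';
    return uri;
}
```

Because base64 grows the payload by roughly a third and the result is interpolated into the HTML source, this probably suits small inline images better than large attachments.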
Comment 5 Peter Bloomfield 2009-12-28 14:12:46 PST
(In reply to comment #4)
> I ran into Peter Bloomfield at a conference, and had another idea about how to
> solve a particular problem related to this bug.

Hi Patrick:

I enjoyed the opportunity to chat with you about this.  I finally got around to trying some rewriting of the HTML source text.  To avoid interpolating potentially large amounts of data, I first tried saving the matching message part as a temporary file, then replacing the "cid:" protocol with "file:///tmp/".  But that ran afoul of WebKit's resolute refusal to use "file:" links.

So then I tried your suggestion of a "data:" URI, and it works!  Well, in a small number of tests.  But at least it requires *zero* patches to WebKit, which means we don't have to wait indefinitely to see a fix.
Comment 6 Alexander Butenko 2009-12-28 20:04:57 PST
WebView has the "resource-request-starting" and "navigation-policy-decision-requested" signals, which can be used to redirect a request.

webkit_network_request_set_uri(request, "file:///tmp/blabla.jpg") should do the job, as I understand it.
Comment 7 Peter Bloomfield 2009-12-29 12:21:02 PST
(In reply to comment #6)
> WebView has the "resource-request-starting" and "navigation-policy-decision-requested" signals,
> which can be used to redirect a request.
> 
> webkit_network_request_set_uri(request, "file:///tmp/blabla.jpg") should do the job,
> as I understand it.

Thanks for pointing out the "resource-request-starting" signal--I must have started working with WebKit before it appeared in 1.1.14!  It does indeed provide a very clean solution to the cid: problem.
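
The decision a "resource-request-starting" handler has to make for a mail client can be sketched in plain C, independent of WebKit. Everything named here (decide_resource_uri, the load_decision enum, the /tmp/ location for extracted parts, and the use of about:blank as the blocking target) is an assumption for illustration, not WebKit API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef enum { LOAD_UNCHANGED, LOAD_REWRITTEN, LOAD_BLOCKED } load_decision;

/* Case-insensitive prefix test; URI schemes are case-insensitive. */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical policy: map "cid:" parts to files assumed to have been saved
 * under /tmp/, and block remote http(s) loads unless allow_remote is set.
 * 'out' is written only when the result is not LOAD_UNCHANGED. */
load_decision decide_resource_uri(const char *uri, int allow_remote,
                                  char *out, size_t out_size)
{
    assert(uri != NULL && out != NULL && out_size >= sizeof "about:blank");

    if (has_prefix_nocase(uri, "cid:")) {
        const char *id = uri + 4;
        /* Refuse content IDs that could escape the directory. */
        if (*id != '\0' && strchr(id, '/') == NULL) {
            int n = snprintf(out, out_size, "file:///tmp/%s", id);
            if (n > 0 && (size_t)n < out_size)
                return LOAD_REWRITTEN;
        }
        snprintf(out, out_size, "about:blank");
        return LOAD_BLOCKED;
    }

    if (!allow_remote &&
        (has_prefix_nocase(uri, "http://") || has_prefix_nocase(uri, "https://"))) {
        snprintf(out, out_size, "about:blank");
        return LOAD_BLOCKED;
    }

    return LOAD_UNCHANGED;
}
```

In a real handler, the string in out would presumably be passed to webkit_network_request_set_uri() whenever the result is not LOAD_UNCHANGED.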
Comment 8 Martin Robinson 2010-10-21 17:11:13 PDT
If I'm not mistaken, this functionality is provided will be provider by SoupURILoader in the future. Sergio, can you comment?
Comment 9 Martin Robinson 2010-10-21 17:14:54 PDT
(In reply to comment #8)
> If I'm not mistaken, this functionality is provided will be provider by SoupURILoader in the future. Sergio, can you comment?

That came out mangled: If I'm not mistaken this functionality will be provided by SoupURILoader in the future.
Comment 10 Sergio Villar Senin 2010-10-22 00:47:54 PDT
(In reply to comment #9)
> (In reply to comment #8)
> > If I'm not mistaken, this functionality is provided will be provider by SoupURILoader in the future. Sergio, can you comment?
> 
> That came out mangled: If I'm not mistaken this functionality will be provided by SoupURILoader in the future.

Well, actually we *do* support this functionality right now, as we imported the SoupURILoader code into WebKit as a basis for the new HTTP cache. So we currently have a stream-based loader API for all the protocols we support.
Comment 11 Martin Robinson 2010-10-22 08:34:15 PDT
> Well, actually we *do* support this functionality right now, as we imported the SoupURILoader code into WebKit as a basis for the new HTTP cache. So we currently have a stream-based loader API for all the protocols we support.

Correct me if I'm wrong, but my understanding is that WebKit doesn't expose an API for it and it isn't officially part of libsoup yet.
Comment 12 Sergio Villar Senin 2010-10-22 08:42:06 PDT
(In reply to comment #11)
> > Well, actually we *do* support this functionality right now, as we imported the SoupURILoader code into WebKit as a basis for the new HTTP cache. So we currently have a stream-based loader API for all the protocols we support.
> 
> Correct me if I'm wrong, but my understanding is that WebKit doesn't expose an API for it and it isn't officially part of libsoup yet.

Oh my fault, I didn't read the bug properly. As you said, we do not currently expose any API.
Comment 13 talby 2011-10-07 16:47:31 PDT
I have been using the "resource-request-starting" signal to call webkit_network_request_set_uri() updating the uri to a data: url, which works wonderfully to embed media resources, however embedding html is problematic.

I am attempting to sandbox a browser display of html content, but since this technique alters the urls visible to the DOM, onDomain policies get mangled and relative urls can not be resolved properly.
Comment 14 Martin Robinson 2011-10-07 16:55:56 PDT
(In reply to comment #13)
> I have been using the "resource-request-starting" signal to call webkit_network_request_set_uri() updating the uri to a data: url, which works wonderfully to embed media resources, however embedding html is problematic.
> 
> I am attempting to sandbox a browser display of html content, but since this technique alters the urls visible to the DOM, onDomain policies get mangled and relative urls can not be resolved properly.

Have you tried using http://webkitgtk.org/reference/webkitgtk-webkitwebview.html#webkit-web-view-load-string ?
Comment 15 talby 2011-10-08 12:21:50 PDT
(In reply to comment #14)
> 
> Have you tried using http://webkitgtk.org/reference/webkitgtk-webkitwebview.html#webkit-web-view-load-string ?

That's a great suggestion, and yes I am, for the main frame load.  However my trouble shows up when the page has child frames (or child windows).  If I use a "resource-request-starting" handler to rewrite the url to a data: url, the DOM is unable to resolve relative links and onDomain policies are not honored.  If instead I use a "resource-request-starting" handler to call webkit_web_frame_load_string(), it seems to mangle the WebKitWebFrame and lead to segfaults.  I think the DocumentLoader ends up in a bad state where a new load attempt has partially initiated, yet the last attempt has not finished failing.  So, it doesn't seem like webkit_web_frame_load_string() is intended for use in signal handlers, at least not "resource-request-starting" (or "navigation-policy-decision-requested").

I can't find a way to satisfy a pending load attempt with a *_load_string() call, so it doesn't seem to help me with child frames where the load is initiated as a side effect of a parent load.
Comment 16 Martin Robinson 2011-10-08 12:26:11 PDT
Hrm. Perhaps you'll have better luck handling the load-started signal. The documentation claims it's deprecated, but I'm in favor of undeprecating it.

Also the fact that calling load_data in a signal handler causes a crash, sounds like a bug! Do you have a stack trace?
Comment 17 talby 2011-10-10 11:44:05 PDT
(In reply to comment #16)
> Hrm. Perhaps you'll have better luck handling the load-started signal. The documentation claims it's deprecated, but I'm in favor of undeprecating it.
> 
> Also the fact that calling load_data in a signal handler causes a crash, sounds like a bug! Do you have a stack trace?

"load-started" on the WebKitWebView only seems to fire for the main frame.  WebKitWebFrame doesn't emit a "load-started" signal, so I don't think that can help me.

I have since been able to call webkit_web_frame_load_string() from within "navigation-policy-decision-requested".  On the first pass I didn't notice that calling webkit_web_frame_load_string() from within the signal handler emits a second "navigation-policy-decision-requested" signal, and my naive attempt was simply blowing the stack.  If I don't call _load_string() in the second emission, it seems to load smoothly.

I still have a segfault attempting to use webkit_web_view_load_string() from within "resource-request-starting", and can provide the stack trace if it's still interesting to you.  It may not be worth investigating because there's a workaround, but that one is not a handler recursion issue, it's something more complex.
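
The handler-recursion workaround can be illustrated with a small stand-alone simulation in C. Nothing here is WebKit code: emit_policy_decision() and fake_load_string() are hypothetical stand-ins that only mimic the behaviour described above, where a load started inside the handler synchronously emits the same signal again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*policy_handler)(const char *uri);

static policy_handler g_handler = NULL;
static int g_emissions = 0;   /* how many times the signal fired */
static int g_loads = 0;       /* how many substitute loads were started */

/* Stand-in for the signal emission. */
static void emit_policy_decision(const char *uri)
{
    g_emissions++;
    if (g_handler != NULL)
        g_handler(uri);
}

/* Stand-in for a *_load_string() call: starting a load re-emits the signal. */
static void fake_load_string(const char *html, const char *base_uri)
{
    (void)html;
    g_loads++;
    emit_policy_decision(base_uri);
}

/* Handler with a re-entrancy guard. Without the guard, each nested emission
 * would start another load and the recursion would never terminate. */
static void on_policy_decision(const char *uri)
{
    static bool substituting = false;

    if (substituting)
        return;               /* nested emission: let the load proceed */

    substituting = true;
    fake_load_string("<p>replacement content</p>", uri);
    substituting = false;
}

/* Run one simulated navigation; returns the number of loads started and
 * optionally reports how many emissions occurred. */
int simulate_navigation(const char *uri, int *emissions_out)
{
    assert(uri != NULL);
    g_handler = on_policy_decision;
    g_emissions = 0;
    g_loads = 0;
    emit_policy_decision(uri);
    if (emissions_out != NULL)
        *emissions_out = g_emissions;
    return g_loads;
}
```

A static flag is enough for this single-threaded simulation; a real handler would more likely keep the guard in per-frame state so that unrelated frames do not suppress each other.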
Comment 18 Martin Robinson 2014-04-08 17:56:10 PDT
We have an API to implement custom protocols now.