Implement the mechanism needed to download files. I'm working on it.
Created attachment 18479 [details] First attempt Ok, this is a first try. This implementation has the advantage of providing an API to download files, but it also has drawbacks. One of them is that the curl backend it uses is separate from the one used to download the page the file was on. So if you need a cookie or authentication to get that file, it won't work with this version (i.e. no cookie or connection reuse). There is also a bug in the UI (main.c) part of this patch: for reasons out of my control, the UI thread seems to lock while a download is in progress. The download itself will finish without error, but the browser is blocked. Known workaround: to revive it, just hover over a link. Does a Gtk+ master know what's happening here? This patch requires the latest patch in http://bugs.webkit.org/show_bug.cgi?id=16562

Reviewer's suggested checklist:
- Could we better integrate the actual download (curl) part into ResourceHandleManager?
- Is the wording of the API function correct?
- In WebKit/gtk/* should we use Gtk Coding styles or WebKit's?
webkitprivate.h should only contain private data that has to be shared with other files. In this case, it should be moved to the .cpp file.
This patch might no longer be needed now that a libsoup backend has been introduced.
Created attachment 21021 [details] File downloading This is the same patch as before, updated to apply to the current webkit revision. The main problem with the patch is that it uses curl directly. How do I ask an HTTP back-end to download a file somewhere? Another problem is that it's not gio-friendly. We should probably add functions accepting/returning GFiles if webkit is compiled against glib 2.16+ (always the case when using the soup back-end).

Other things I would like to change:
- I don't like the error handling
- Most things should become properties instead of having just _get_foo/_set_foo
Created attachment 23795 [details] Add download support Updated patch depending on the new patch for policy implementation. Before it is ready for commit we need to update the signal names after a patch for #17066 lands. The patch was mainly tested with CURL, so it could have some problems with the soup back-end.
Comment on attachment 23795 [details] Add download support Assigning to Alp for review...
Marco, I have tried the attached patch and below are a couple of comments from me:

1) File: WebKit/gtk/WebCoreSupport/FrameLoaderClientGtk.cpp
Function: FrameLoaderClient::download
We are calling g_object_unref on the created WebKitWebDownload object at the end of this function itself, which would cause the download of the file to fail, because in most cases the download will not have started, or at least not completed, by the end of the call. This is because the download proceeds asynchronously from this function, driven by the main loop. So that unref needs to be removed. Instead, we need to move the g_object_unref() call to the else block of the if (handled)/else statement: since we are canceling the download in the else block, we need to free the object there.

2) File: WebKit/gtk/webkit/webkitwebdownload.cpp
Function: webkit_web_download_class_init
Currently, we are storing the signal info for both the "progress-update" and "error" signals at the index of the "started" signal. The index needs to be changed.

3) File: WebKit/gtk/webkit/webkitwebdownload.cpp
Function: webkit_web_download_finished_loading, webkit_web_download_error & webkit_web_download_cancel
We need to unref the download object created at the time of sending the "download-created" signal. Currently, we are not deleting the object in the error and cancel use cases.

If I find any more issues, I will update.
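Point 2 above can be illustrated with a small stand-alone model. This is plain C rather than GObject code: `register_signal` is a hypothetical stand-in for whatever registers a signal and returns its ID, and the enum names are assumptions modelled on the signal names mentioned in the comment. The only point of the sketch is that each signal ID has to be stored at its own enum index instead of all of them overwriting the "started" slot.

```c
/* One enum entry per signal; LAST_SIGNAL sizes the ID table. */
enum {
    SIGNAL_STARTED,
    SIGNAL_PROGRESS_UPDATE,
    SIGNAL_ERROR,
    LAST_SIGNAL
};

static unsigned int download_signals[LAST_SIGNAL] = { 0 };

/* Hypothetical stand-in for signal registration: hands out a fresh,
 * non-zero ID for every call. */
static unsigned int register_signal(const char *name)
{
    static unsigned int next_id = 1;
    (void)name;
    return next_id++;
}

/* Each ID is stored at the index of the signal it belongs to.
 * The reported bug was equivalent to using SIGNAL_STARTED for all three. */
static void download_class_init(void)
{
    download_signals[SIGNAL_STARTED] = register_signal("started");
    download_signals[SIGNAL_PROGRESS_UPDATE] = register_signal("progress-update");
    download_signals[SIGNAL_ERROR] = register_signal("error");
}
```

With the buggy indexing, the "progress-update" and "error" slots would stay 0, so emitting those signals by table lookup could never reach the right signal.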
I have tested some sites with the recent patch, and I found a bug: the download-created signal is emitted automatically when we are loading a web page that has an unknown MIME type. Example site: http://media.daum.net/society/nation/seoul/view.html?cateid=100004&newsid=20081201161910429&p=yonhap&RIGHT_COMM=R5 On this site, the 'http://clixad.daum.net/clix_ad/Find' link has an unknown MIME type.
(In reply to comment #7)
> We are calling g_object_unref on the created WebKitWebDownload object at the
> end of this function itself, which would cause failure in the downloading of
> the file, because in most of the cases, the download would not have started or
> at least completed by the end of the call. This is because, download gets
> underway async from this function using the main loop of the g_object. So, that
> unref needs to be removed. Instead, we need to move that g_object_unref() call
> to the else block of if(handled)-else block, since we are canceling the
> download in the else block. Hence, we need to free the object.

Yes, but I disagree with your other comments on unref'ing:

> 3) File: WebKit/gtk/webkit/webkitwebdownload.cpp
> Function: webkit_web_download_finished_loading, webkit_web_download_error &
> webkit_web_download_cancel
>
> We need to unref the download object created at the time of sending signal
> "download-created". Currently, we are not deleting the object in error & cancel
> usecases.

I believe what we might need is a default signal handler for all of those that does an _unref on the object. Does that make sense?

> 2) File: WebKit/gtk/webkit/webkitwebdownload.cpp
> Function: webkit_web_download_class_init
>
> Currently, we are storing the signal info for both "progress-update" & "error"
> signals at the signal index of signal "started" itself. The index needs to
> be changed.

I have fixed this, will post the updated patch soonish.
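The ownership rule being discussed can be sketched with a toy reference-counted object. This is a simplified model in plain C, not the patch's code: `Download`, `client_download` and `download_default_end_handler` are hypothetical names, and the "default handler" simply stands for whatever runs once when the transfer finishes, fails or is cancelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    int ref_count;
    bool running;
} Download;

/* Number of Download objects currently alive, for illustration only. */
static int live_downloads = 0;

static Download *download_new(void)
{
    Download *d = calloc(1, sizeof *d);
    if (!d)
        return NULL;
    d->ref_count = 1;   /* the creator owns the initial reference */
    live_downloads++;
    return d;
}

static void download_unref(Download *d)
{
    assert(d->ref_count > 0);
    if (--d->ref_count == 0) {
        live_downloads--;
        free(d);
    }
}

/* Model of a default handler for the end-of-transfer signals: it drops
 * the reference taken when the download was created. */
static void download_default_end_handler(Download *d)
{
    d->running = false;
    download_unref(d);
}

/* Model of the download() entry point. The transfer is asynchronous, so
 * when the application accepts it the object must outlive this call. */
static Download *client_download(bool handled)
{
    Download *d = download_new();
    if (!d)
        return NULL;
    if (handled) {
        d->running = true;  /* still in flight when we return: no unref */
        return d;
    }
    download_unref(d);      /* nobody wants it: release immediately */
    return NULL;
}
```

In this model the reference is released in exactly one place per code path, which is the property both commenters are after, whichever side ends up doing the unref.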
Created attachment 26200 [details] reworked to apply on current webkit I am attaching my unfinished work on this patch so that people are able to look at it. I believe the approach taken by the patch is flawed: the download should already be started when FrameLoaderClient::download is called. It is called only as a means of letting the application know what is happening, not to perform the actual download. I'm still not positive about where the download is actually started, but I'll investigate it. Also, we should probably implement FrameLoaderClient::startDownload and make the download item in the context menu work.
Created attachment 27079 [details] proposed patch This patch is an evolution of (pierlux/barisione)'s patch. It implements more download code paths (context menu, and policy decision for navigation action), and ports the file saving to GIO. This allows the user to save to a 'mounted' sftp, for instance (tested! hehe). I am not sure if there should be any kind of code dealing with mounting the "volumes" in this functionality, but this is an implementation detail we can work out later. There are probably other weak points and errors: download_get_total_size seems to always return 0 in my tests, for instance. I would like to have feedback on the general idea, though.

The previous patch saved the downloaded content to a temporary file semi-automatically while the user was selecting where to save it. I decided to simplify this and let client applications handle it (since you can easily emulate the same functionality by using set_destination_uri). I tried to follow our discussions on how to handle progress, replacing start/progress-update/finished signals with a single 'progress' property.

Clearly missing functionality: the download doesn't happen yet when you navigate (click) to unhandled mime types, such as an ISO file. I'm still trying to figure this out. Fire away.
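As a rough illustration of the single 'progress' property, here is a minimal sketch of how a fraction could be derived from the current and total sizes. It is an assumption about the arithmetic, not the patch's code: `download_progress` is a hypothetical helper, and treating an unknown total (0, which is what download_get_total_size reportedly returned in testing) as 0.0 progress is one possible choice among several.

```c
#include <stdint.h>

/* Returns the completed fraction in the range [0.0, 1.0].
 * A total_size of 0 is treated as "size not known yet". */
static double download_progress(uint64_t current_size, uint64_t total_size)
{
    if (total_size == 0)
        return 0.0;
    if (current_size >= total_size)
        return 1.0;
    return (double)current_size / (double)total_size;
}
```

A client watching one property can then infer "started" (first change), "in progress" (any value below 1.0) and "finished" (1.0) without three separate signals, although an unknown total size would still need a separate way to report completion.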
Created attachment 27090 [details] simplified approach A very simple approach to handling Downloads.
(In reply to comment #11)
> Created an attachment (id=27079) [review]
> proposed patch
>
> This patch is an evolution of (pierlux/barisione)'s patch. It implements more
> download code paths (context menu, and policy decision for navigation action),
> and ports the file saving to GIO. This allows the user to save to a 'mounted'
> sftp, for instance (tested! hehe).

That sounds great. I think if it includes an optional file chooser it would be perfect for clients that just need to handle downloading with minimal effort.

> I tried to follow our discussions on how to handle progress, replacing
> start/progress-update/finished signals with a single 'progress' property.

And I like that :)

(In reply to comment #12)
> Created an attachment (id=27090) [review]
> simplified approach
>
> A very simple approach to handling Downloads.

I very much like that simple approach; from the point of view of Midori I actually prefer it over the WebDownload. If there's a way to obtain a SoupMessage or a GFile depending on the source, that would be perfect.

Now I think both the WebDownload and the 'direct' access are attractive depending on the use case. I tend to think WebDownload falls in the category of what could be in an extension library of libSoup, rather than tied to WebKit; we might give that some thought. In short, something that generally abstracts away GIO and libSoup could be used outside of WebKit. But I don't have a concrete idea, so for what I want, we can have both the WebDownload and the manual access.

Btw maybe call it simply WebKitDownload?
I called it WebKitWebDownload to be consistent with WebKitWebView and all, but a download is indeed web related...
Created attachment 27104 [details] follow-up patch OK, so WebKitDownload it is. I didn't add a file chooser yet. I have an idea regarding SoupMessage/GFile/blah: after adding my patch for NetworkRequest we will be able to get the soup message from it; if we use GIO, all the user needs is the URI. So I believe we could add a "handler" property to WebKitNetworkRequest with WEBKIT_NETWORK_REQUEST_HANDLER_{SOUP,GIO,DATA} as possible values (see ResourceHandle::start in ResourceHandleSoup.cpp). If SOUP is the handler, you grab the SoupMessage and keep going; if GIO, you get the URI, create your GFile and go on from there; if DATA, you do what you like (you are probably getting some private data from your browser). The only thing you need to do to tell WebKitDownload not to handle it is return FALSE. If you decide to let WebKitDownload handle it, you only need to return TRUE when handling the download-requested signal, and you're done.
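The proposed "handler" idea can be sketched as a small dispatch. This is a model in plain C under stated assumptions: the `HANDLER_*` enum is a shortened, hypothetical stand-in for the proposed WEBKIT_NETWORK_REQUEST_HANDLER_{SOUP,GIO,DATA} values, and `on_download_requested` only mimics the TRUE/FALSE convention described in the comment. It does not call WebKit, libsoup or GIO.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    HANDLER_SOUP,
    HANDLER_GIO,
    HANDLER_DATA
} RequestHandler;

/* What an application handling the download itself would pull out of
 * the request, depending on which backend is serving it. */
static const char *manual_source_for(RequestHandler handler)
{
    switch (handler) {
    case HANDLER_SOUP:
        return "SoupMessage taken from the request";
    case HANDLER_GIO:
        return "URI used to create a GFile";
    case HANDLER_DATA:
        return "application-defined data";
    }
    return "unknown handler";
}

/* Model of a download-requested callback.
 * Returns true to let WebKitDownload perform the download, false when
 * the application takes over; in that case *source_out describes what
 * the application would work from. */
static bool on_download_requested(RequestHandler handler,
                                  bool app_handles_it,
                                  const char **source_out)
{
    if (!app_handles_it) {
        *source_out = NULL;
        return true;
    }
    *source_out = manual_source_for(handler);
    return false;
}
```

The design choice here is that the same callback serves both audiences: simple clients return TRUE and follow progress, while clients with their own download manager return FALSE and branch on the handler type.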
(In reply to comment #15)
> OK, so WebKitDownload; I didn't add a file chooser yet; I have an idea
> regarding SoupMessage/GFile/blah: after adding my patch for NetworkRequest we
> will be able to get the soup message from it; if we use GIO all the user needs
> is the URI. So I believe we could add a "handler" property to
> WebKitNetworkRequest with WEBKIT_NETWORK_REQUEST_HANDLER_{SOUP,GIO,DATA} as
> possible values (see ResourceHandle::start in ResourceHandleSoup.cpp).

It's worth noting that by doing it this way we are making the "normal" page loading more powerful, and we could probably close https://bugs.webkit.org/show_bug.cgi?id=17147 too.
Created attachment 27353 [details] proposed patch I believe this patch is correct, and fits the requirements of the stakeholders I consulted. This patch allows clients wanting to handle the download themselves to retrieve the WebKitNetworkRequest and go on from there, or provides a simple interface for them to follow the download progress.
I did a basic implementation of this in Epiphany. While doing so, I realized that the patch I proposed is not really ready for review (some GObject problems and some fault tolerance to investigate). Here is the epiphany bug: http://bugzilla.gnome.org/show_bug.cgi?id=570735
(In reply to comment #8)
> I have tested some sites with recent patch, but I found a bug that
> download-created signal is automatically called when we are loading webpage
> having unknown MIME type.
>
> example site:
> http://media.daum.net/society/nation/seoul/view.html?cateid=100004&newsid=20081201161910429&p=yonhap&RIGHT_COMM=R5
>
> In this site, 'http://clixad.daum.net/clix_ad/Find' link has unknown MIME type.
>

Well, I didn't notice any unusual behavior on that site (possibly the error was triggered by an ad that is no longer running?), but I have found some weird behavior on another site while testing this patch with a patched epiphany: if you visit http://washingtonindependent.com/5517/predatory-practices, a file chooser will automatically open after the page is about 2/3 loaded, and it will offer to save a file with the suggested name "?u=https%3A%2F%2F2.gy-118.workers.dev/%3A443%2Fhttp%2Fwashingtonindependent.com%2F5517%2Fpredatory-practices&r=https%3A%2F%2F2.gy-118.workers.dev/%3A443%2Fhttp%2Fimages.google.com%2Fimgres%3Fimgurl%3Dhttps%3A%2F%2F2.gy-118.workers.dev/%3A443%2Fhttp%2Fwww.washingtonindependent.com%2Fwp-content%2Fuploads%2F2008%2F09%2Fwolf.jpg%26imgrefurl%3Dhttps%3A%2F%2F2.gy-118.workers.dev/%3A443%2Fhttp%2Fwashingtonindependent". If I attempt to save this file, I end up with a 0-byte file saved to disk.
Created attachment 27618 [details] download support and API This is my proposed implementation for download support in WebKit/GTK+.
Created attachment 27621 [details] download support and API Last patch was missing the API files. This is my proposal for download support.
(In reply to comment #21)
> Last patch was missing the API files. This is my proposal for download support.
>

Here is what is different from the previous version: this version cleans up some issues xan and I discussed on IRC, and adds tracking of elapsed time. It no longer includes the GtkLauncher test code, and it includes ChangeLog entries. Since I believe it is now close to being ready for inclusion, I'm officially requesting review now.
(In reply to comment #19)
> Well, I didn't notice any unusual behavior on that site (possibly the error was
> triggered by an ad that is no longer running?), but I have found some weird
> behavior on another site while testing this patch with a patched epiphany: if
> you visit http://washingtonindependent.com/5517/predatory-practices, a file
> chooser will automatically open after the page is about 2/3 loaded, and it will
> offer to save a file with the suggested name
> "?u=https%3A%2F%2F2.gy-118.workers.dev/%3A443%2Fhttp%2Fwashingtonindependent.com%2F5517%2Fpredatory-practices&r=https%3A%2F%2F2.gy-118.workers.dev/%3A443%2Fhttp%2Fimages.google.com%2Fimgres%3Fimgurl%3Dhttps%3A%2F%2F2.gy-118.workers.dev/%3A443%2Fhttp%2Fwww.washingtonindependent.com%2Fwp-content%2Fuploads%2F2008%2F09%2Fwolf.jpg%26imgrefurl%3Dhttps%3A%2F%2F2.gy-118.workers.dev/%3A443%2Fhttp%2Fwashingtonindependent".
> If I attempt to save this file, I end up with a 0-byte file saved to disk.

This problem is unrelated to this patch, as I understand it. It looks like a policy decision caveat being exposed by the fact that we now have download support =).
(In reply to comment #21)
> Created an attachment (id=27621) [review]
> download support and API
>
> Last patch was missing the API files. This is my proposal for download support.
>

+        Reviewed by NOBODY (OOPS!).
+
+        Make the Soup backend able to handle requests without a frame,
+        since we may have such things now that we support downloads.
+
+        * platform/network/soup/ResourceHandleSoup.cpp:
+        (WebCore::ResourceHandle::start):

Tabs vs Spaces.

+    if (handled) {
+        webkit_download_start(download);
+    }
+    else {
+        webkit_download_cancel(download);

braces? (and same thing in FrameLoaderClient::startDownload, which is almost the same code)

+    // we could reuse the same handle, but our replacing of the
+    // client seems to make this impossible; the main load fails
+    // and is stopped
+    handle->cancel();
+    startDownload(request);

Mmmm.

+    /* We don't call webkit_download_cancel() because we don't want to emit
+     * signals when finalising an object. */

s/finalising/finalizing/

+WebKitDownload* webkit_download_new(const gchar* uri)
+{
+    g_return_val_if_fail(uri, 0);
+
+    WebKitNetworkRequest* request = webkit_network_request_new(uri);
+    WebKitDownload* download = webkit_download_new_from_request(request);
+    g_object_unref(request);
+    return download;

_new functions that do something other than g_object_new (...) (or equivalent) are evil for bindings. Maybe create a property like URI_FOR_REQUEST that does this automatically?

+    WebKitDownload* download = WEBKIT_DOWNLOAD(g_object_new(WEBKIT_TYPE_DOWNLOAD, "network-request", request, NULL));
+    return download;

No need for variable.

+    gboolean handled;
+    g_signal_emit_by_name(download, "error", 0, WEBKIT_DOWNLOAD_ERROR_CANCELLED_BY_USER, "User cancelled the download", &handled);

I wonder, what happens if you do not pass a variable for the return variable? If it's OK you could omit the dummy variables in a couple of places.
+/**
+ * webkit_download_get_destination_uri:
+ * @download: the #WebKitDownload
+ * @destination_uri: the destination URI
+ *
+ * Defines the URI that should be used to save the downloaded file to.
+ *
+ * Since: 1.1.1
+ */
+void webkit_download_set_destination_uri(WebKitDownload* download, const gchar* destination_uri)

The doc says _get_destination_uri, but the function is _set_destination_uri

+    if (error) {
+        gboolean handled;
+        g_signal_emit_by_name(download, "error", 0, WEBKIT_DOWNLOAD_ERROR_DESTINATION, error->message, &handled);
+        g_error_free(error);
+        return;
+    }
+
+    if (downloading) {
+        if (!webkit_download_open_stream_for_uri(download, destination_uri, TRUE)) {
+            webkit_download_cancel(download);
+            return;
+        }
+    }

In these two cases you don't actually set the new destination_uri. Shouldn't you?

+ * webkit_download_get_progress:
+ * @web_view: a #WebKitDownload

s/web_view/download/ (happens in a few places)

+struct _WebKitDownload {
+    GObject parent;

parent_instance

+struct _WebKitDownloadClass {
+    GObjectClass parent;

parent_class
Created attachment 27983 [details] implements download New patch with Xan's suggestions applied.
(In reply to comment #24)
> _new functions that do something other than g_object_new (...) (or equivalent)
> are evil for bindings. Maybe create a property like URI_FOR_REQUEST that does
> this automatically?

Just to make clear how I addressed this: I removed the _new function, and renamed _new_from_request to be the _new function, since all it does is call g_object_new, and there is no point in making things too complex here.

> I wonder, what happens if you do not pass a variable for the return variable?
> If it's OK you could omit the dummy variables in a couple of places.

In my experience not passing the boolean variable makes it crash. I've been hit by this elsewhere.

> In these two cases you don't actually set the new destination_uri. Shouldn't
> you?

I am setting it now, but only emitting the signal when everything goes right. I think this is the correct approach.
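One reading of "setting it now, but only emitting the signal when everything goes right" can be sketched as follows. This is a plain C model, not the patch: `Download`, `notify_count` (standing in for a change notification) and the `destination_ok` flag (standing in for the GFile and stream checks) are all hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *destination_uri;
    int notify_count;   /* stands in for "destination changed" notifications */
} Download;

/* Portable replacement for strdup(), which is not part of ISO C. */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy)
        memcpy(copy, s, n);
    return copy;
}

/* Always records the requested URI, but only notifies listeners when
 * the destination turned out to be usable. Returns true on success. */
static bool download_set_destination_uri(Download *d, const char *uri,
                                         bool destination_ok)
{
    char *copy = copy_string(uri);
    if (!copy)
        return false;

    free(d->destination_uri);
    d->destination_uri = copy;

    if (!destination_ok)
        return false;   /* stored, but no notification is emitted */

    d->notify_count++;
    return true;
}
```

Under this reading a getter always reflects the last requested destination, while listeners are only told about destinations that were actually usable.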
Created attachment 28111 [details] implements download and provides a nice API changes needed because of recent commits
Created attachment 28189 [details] proposed implementation This work has been reviewed by almost all major WebKitGTK+ contributors, and we have a consensus on what the API should look like. This version of the patch fixes some issues raised on IRC (mainly, it removes the unnecessary download_requested virtual class member in WebKitWebView) and contains some minor style convention corrections.
Created attachment 28190 [details] proposed implementation [I am resending the patch because there were some errors in the automatic merging git did to the main changelog, and I took the opportunity to remove a rogue modification that escaped my eye] This work has been reviewed by almost all major WebKitGTK+ contributors, and we have a consensus on what the API should look like. This version of the patch fixes some issues raised on IRC (mainly, it removes the unnecessary download_requested virtual class member in WebKitWebView) and contains some minor style convention corrections.
Created attachment 28191 [details] proposed implementation [I am resending the patch because there were some errors in the automatic merging git did to the main changelog, and I took the opportunity to remove a rogue modification that escaped my eye; sorry for the mess] This work has been reviewed by almost all major WebKitGTK+ contributors, and we have a consensus on what the API should look like. This version of the patch fixes some issues raised on IRC (mainly, it removes the unnecessary download_requested virtual class member in WebKitWebView) and contains some minor style convention corrections.
Comment on attachment 28191 [details] proposed implementation

Please add bug URL to all ChangeLogs.

+    // The frame could be null is the ResourceHandle is not associated to any
+    // Frame, i.e. if we are downloading a file.

I think that's "e.g.", not "i.e."

+    // we could reuse the same handle, but our replacing of the
+    // client seems to make this impossible; the main load fails
+    // and is stopped

I don't understand this comment - there is a setClient() method in ResourceHandle (and it is even used elsewhere in this patch). What is the reason that makes reusing the handle impossible? Also, we prefer full sentences in comments (starting with a capital letter, ending with a period).

 #include <webkit/webkitnetworkrequest.h>
+#include <webkit/webkitdownload.h>

Please keep the list sorted. This problem is repeated in several files.

+extern "C" {
+
+class DownloadClient : Noncopyable, public ResourceHandleClient {

How can a class be extern "C"? There are no classes in C. I don't think extern "C" is ever needed in .cpp files - API methods should have it on declarations, and everything else shouldn't be exported anyway.

+    WebCore::ResourceResponse* network_response;
+    RefPtr<WebCore::ResourceHandle> resource_handle;

There is "using namespace WebCore" at the top of this file, are explicit namespaces necessary here?

+static guint webkit_download_signals[LAST_SIGNAL] = { 0, };

I don't remember the standards precisely, but this trailing comma is either forbidden or discouraged in various C dialects, please remove it.

+    PROP_TOTAL_SIZE,
+};

Same comment here.

+    if(error) {

There should be a space after if.

+    priv->timer = g_timer_new ();

But no space here.

+    /* FIXME can we have a better check? */

A FIXME like this should explain what's wrong with the current check.

+    GFile* dest = g_file_new_for_uri(destination_uri);
+    GError *error = NULL;

Misplaced star here, and NULL instead of 0.
+  } else {
+    g_free(priv->destination_uri);
+    priv->destination_uri = g_strdup(destination_uri);
+  }

Two-space indentation here.

+WebKitDownloadState webkit_download_get_state (WebKitDownload* download)

An extra space again (and in other functions below).

+    if (priv->current_size == 0) {
+        priv->state = WEBKIT_DOWNLOAD_STATE_STARTED;
+    }

There should be no braces around single line blocks.

 #include <webkit/webkitwebframe.h>
 #include <webkit/webkitwebpolicydecision.h>
 #include <webkit/webkitwebnavigationaction.h>
+#include <webkit/webkitdownload.h>
 #include <webkit/webkitwebsettings.h>
 #include <webkit/webkitwebwindowfeatures.h>
 #include <webkit/webkitwebbackforwardlist.h>
@@ -44,9 +45,13 @@
 #include "InspectorClientGtk.h"
 #include "FrameLoaderClient.h"
 #include "WindowFeatures.h"
+#include "ResourceHandle.h"
+#include "ResourceResponse.h"

Please keep include lists sorted.

There are a number of C-style casts, C-style comments and NULL variables (instead of 0) in C++ files here. It is a bit of a border case, as the implemented functions are very C-style in nature, but our coding style asks for C++ style in C++ files. Are C-style comments necessary for documentation generator to work properly?

A number of misplaced stars, too.

Obviously, I cannot adequately review some Gtk-specific parts of the patch, but from the above comments, they have been extensively discussed, so that's OK.

I had many comments, but they are mostly style nitpicks, so I'll say r=me anyway. Please fix as many as you can when landing, and you can even consider submitting an updated patch for another quick review round.
(In reply to comment #31)
> +    // we could reuse the same handle, but our replacing of the
> +    // client seems to make this impossible; the main load fails
> +    // and is stopped
>
> I don't understand this comment - there is a setClient() method in
> ResourceHandle (and it is even used elsewhere in this patch). What is the
> reason that makes reusing the handle impossible?

When I tried implementing that, the load would always fail and stop, so I decided to simplify the approach. I modified the comment to make this statement instead.

> How can a class be extern "C"? There are no classes in C. I don't think extern
> "C" is ever needed in .cpp files - API methods should have it on declarations,
> and everything else shouldn't be exported anyway.

As discussed on IRC, I am moving the extern to after the C++ class, and I submitted a bug report for us to handle this for the GTK+ port as a whole, since this is repeated in all API implementation files: https://bugs.webkit.org/show_bug.cgi?id=24322

> Misplaced star here, and NULL instead of 0.

As discussed on IRC, I will keep using NULL in the API implementation because this was decided some time ago, and I have seen commits replacing 0 with NULL already.

> There are a number of C-style casts, C-style comments and NULL variables
> (instead of 0) in C++ files here. It is a bit of a border case, as the
> implemented functions are very C-style in nature, but our coding style asks for
> C++ style in C++ files. Are C-style comments necessary for documentation
> generator to work properly?

I'm fixing all of those that are not agreed GTK+ port conventions. The C-style comments for the documentation generator are really necessary, with the two stars starting them, even.

> A number of misplaced stars, too.

I believe you are talking mostly about webkitdownload.h here.
I decided to put the stars on the right side in this file because of current practice in other major public headers (webkitwebview.h and webkitwebframe.h, for instance), though there are smaller/newer files putting stars on the left side (some are mine, even). I guess I'll just keep it as it is for now, since it follows the major ones.

> I had many comments, but they are mostly style nitpicks, so I'll say r=me
> anyway. Please fix as many as you can when landing, and you can even consider
> submitting an updated patch for another quick review round.

Thanks for reviewing! I am available for fixing any issues post-landing, too. I believe we will actually need a task force to clean up style in the GTK+ port =).
Landed in r41401, with style issues fixed.