Bug 30187 - Implicit AtomicString(char*) considered harmful
Summary: Implicit AtomicString(char*) considered harmful
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc. (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P3 Enhancement
Assignee: Myles C. Maxfield
URL:
Keywords: EasyFix, GoodFirstBug
Depends on: 31749
Blocks:
  Show dependency treegraph
 
Reported: 2009-10-07 15:14 PDT by Jens Alfke
Modified: 2024-08-05 18:43 PDT (History)
4 users (show)

See Also:


Attachments
patch (96.89 KB, patch)
2009-12-02 11:23 PST, Jens Alfke
no flags Details | Formatted Diff | Diff
patch 2 (stylistic fixes) (97.00 KB, patch)
2009-12-03 15:05 PST, Jens Alfke
mjs: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jens Alfke 2009-10-07 15:14:47 PDT
The WebCore::AtomicString constructor that takes a const char* is not marked 'explicit' -- this means that the compiler will allow clients to pass a string constant to functions that take an AtomicString, and construct the object on the fly. But this is expensive: it computes the hashcode of the string, does a hashtable lookup, and often allocates a new StringImpl.

Having instrumented StringImpl, I'm seeing lots of AtomicString objects being created ephemerally this way, often in situations like loading resources and scanning CSS selectors. (The primary offender seems to be ResourceRequestBase, which has a bunch of convenience accessors for HTTP header fields that do this.) If there are AtomicStrings being used often, they should be allocated once and kept around as constants.

I think it would be a good idea to add the 'explicit' keyword to this constructor, and fix the resulting compile errors, so people don't end up accidentally allocating AtomicStrings this way without knowing it.
Comment 1 Jens Alfke 2009-11-20 13:29:05 PST
I've been working on this since Wednesday. All the work is in fixing the resulting compile errors, of course.
* I added a few very common string constants to AtomicString itself (trueAtom, falseAtom).
* I've used DEFINE_STATIC_LOCAL in some methods that might be performance-sensitive.
* In a few cases, I've added variants of methods that take an AtomicString parameter, which take a char* instead.

I initially created global AtomicString constants for HTTP header names, since they've very commonly passed to BaseResourceRequest/Response as C string constants. Unfortunately that isn't legal since AtomicString is not thread-safe but the resource classes are used on multiple threads (unlike the DOM classes.) Instead I've enabled efficient access by char* to the underlying HashMap of headers -- this required adding some code to HashMap to allow lookup by alternate key types. I shamelessly plagiarized this from the existing support in HashSet.
Comment 2 Jens Alfke 2009-11-20 15:10:20 PST
I've broken up the patch and put the more significant changes (HashMap, HTTPHeaderMap, etc.) into a new bug 31749 which I'm sending out for review now.
Comment 3 Jens Alfke 2009-12-02 11:23:19 PST
Created attachment 44166 [details]
patch

Here's the rest of the patch. This finally makes the AtomicString(const char*) constructor explicit.
It does two other nontrivial things to support this change:
1. Adds a HashMap::remove that takes an alternate key type, so HTTPHeaderMap can use it for char*.
2. Adds an HTTPHeaders namespace containing a bunch of string constants like ContentType, so we don't have to use literal C strings for these all over the code.

There are also little changes in lots of source files, to replace C string literals with either
a. AtomicString constants (like emptyAtom)
b. HTTPHeaders constants
c. New static AtomicString constants defined using DEFINE_STATIC_LOCAL
d. Explicit AtomicString("...") construction, in places where the time/space tradeoff didn't seem to make c. worthwhile
e. Calls to new method variants that explicitly take char*, such as Element::setCStringAttribute.
Comment 4 WebKit Review Bot 2009-12-02 11:26:56 PST
Attachment 44166 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/loader/DocumentThreadableLoader.cpp:44:  wtf includes should be <wtf/file.h> instead of "wtf/file.h".  [build/include] [4]
WebCore/platform/text/AtomicString.h:159:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/text/AtomicString.h:160:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:42:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:43:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:44:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:45:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:46:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:47:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:48:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:49:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:50:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:51:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:52:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:53:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:54:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:55:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.cpp:56:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.h:74:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/HTTPHeaderMap.h:74:  namespace should never be indented.  [whitespace/indent] [4]
JavaScriptCore/wtf/HashMap.h:302:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
WebCore/platform/network/ResourceRequestBase.cpp:368:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
WebCore/platform/network/ResourceRequestBase.cpp:369:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
WebCore/platform/network/ResourceRequestBase.cpp:370:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
WebCore/platform/network/ResourceRequestBase.cpp:371:  Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side.  [whitespace/operators] [4]
Total errors found: 25
Comment 5 Jens Alfke 2009-12-02 11:28:28 PST
Weird -- I just noticed that in that patch, the changes to JavaScriptCore/ChangeLog and WebCore/ChangeLog are reversed. I must have screwed this up somehow during some manual cut-and-pasting.
I've fixed this in my source tree; doesn't seem worth it to upload another patch just for that. Consider it fixed.

The automated style warnings below appear to refer to pre-existing code.
Comment 6 David Levin 2009-12-02 14:28:26 PST
This warning refers to newly added code:
    WebCore/loader/DocumentThreadableLoader.cpp:44:  wtf includes should be
    <wtf/file.h> instead of "wtf/file.h".  [build/include] [4]

These warnings 
   WebCore/platform/network/ResourceRequestBase.cpp:368:  Boolean expressions that
   span multiple lines should have their operators on the left side of the line
   instead of the right side.  [whitespace/operators] [4]

Refer to lines that you changed, so it would be nice to fix the style since you are changing the lines anyway.

The namespace indent warnings are unfortunate.
Comment 7 David Levin 2009-12-02 16:56:28 PST
I was looking at fixing the namespace indent warning, but it turns out that several of them are valid warnings for your patch like this one:

WebCore/platform/network/HTTPHeaderMap.cpp:42:  Code inside a namespace should
not be indented.  [whitespace/indent] [4]
Comment 8 Jens Alfke 2009-12-03 14:27:26 PST
I understand the namespace guideline, but in this case it's a _nested_ namespace, and I don't think the guideline should apply to those ... especially since if HTTPHeaders had been made a class instead (an alternate way to group related constants) it would of course be indented.
I've sent an email to webkit-dev asking for clarification.
Comment 9 Jens Alfke 2009-12-03 15:05:43 PST
Created attachment 44270 [details]
patch 2 (stylistic fixes)

This patch fixes the style warnings about #include and ||-at-EOL.
Comment 10 WebKit Review Bot 2009-12-03 15:09:37 PST
Attachment 44270 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
WebCore/platform/network/HTTPHeaderMap.cpp:42:  Code inside a namespace should not be indented.  [whitespace/indent] [4]
Total errors found: 1
Comment 11 Jens Alfke 2009-12-04 13:32:30 PST
FYI, the amendment to the style guide allowing indentation of nested namespaces is bug 32137.
Comment 12 Jens Alfke 2009-12-11 10:11:50 PST
*ping* This patch hasn't been reviewed in a week now. I'd like to get this off my plate before Xmas =)
There are a lot of diffs in it, but almost all of them are just simple replacements.
Comment 13 Eric Seidel (no email) 2009-12-14 12:26:03 PST
Comment on attachment 44270 [details]
patch 2 (stylistic fixes)

Most of this is a total no-brainer to review.  But the Hash* stuff I'm not confident on.

I would be happy to r+ the changes to move all the callsites off of "true" and to use explicit AtomicString(), etc, but the Hash* stuff will either take some in-person explanation, or someone more used to that code.
Comment 14 Maciej Stachowiak 2009-12-28 18:09:01 PST
Comment on attachment 44270 [details]
patch 2 (stylistic fixes)

This patch is kind of large and combines multiple logically separate changes. I think it would be better to split it up into the following independent components:

1) The HashMap changes. (Yeah, it sucks that they are not unit tested but they seem pretty separate from the rest).
2) One or more patches removing dependence of various pieces of code on implicit conversion from char* to AtomicString.
3) Final patch to make AtomicString(char*) explicit.

r- to consider such a split. But if that's impractical, just go ahead and reflag this patch for review.
Comment 15 Myles C. Maxfield 2012-07-19 15:18:04 PDT
Since this bug hasn't been touched in 3 years, mind if I claim it?

Thanks,
Myles
Comment 16 David Levin 2012-07-19 15:42:27 PDT
(In reply to comment #15)
> Since this bug hasn't been touched in 3 years, mind if I claim it?
> 
> Thanks,
> Myles

Go for it. Make sure to consider comment #14.