Improve heuristics whether `format_args` string is a source literal #106195

Noratrieb · 2022-12-27T21:20:17Z

Previously, it only checked whether there was a literal at the span of the first argument, not whether the literal actually matched up. This caused issues when a proc macro was generating a different literal with the same span.

This requires an annoying special case for literals ending in \n because otherwise println wouldn't give detailed diagnostics anymore which would be bad.

Fixes #106191

rustbot · 2022-12-27T21:20:22Z

r? @oli-obk

(rustbot has picked a reviewer for you, use r? to override)

compiler-errors

Cool, r=me with or without comments addressed since they're minor

compiler-errors · 2022-12-27T21:32:45Z

compiler/rustc_parse_format/src/lib.rs

+    unescape::unescape_literal(string, unescape::Mode::Str, &mut |_, unescaped_char| {
+        match unescaped_char {
+            Ok(c) => buf.push(c),
+            Err(_) => {}


Hm, should we handle escape errors and bail in find_width_map_from_snippet if we see any?

I dont think it matters much but doesnt hurt

compiler-errors · 2022-12-27T21:34:08Z

compiler/rustc_parse_format/src/lib.rs

+    // Macros like `println` add a newline at the end. That technically doens't make them "literals" anymore, but it's fine
+    // since we will never need to point our spans there, so we lie about it here by ignoring it.
+    // Since there might actually be newlines in the source code, we need to normalize away all trailing newlines.
+    let input_no_nl = input.trim_end_matches('\n');


Do we only want to trim only one \n off the literal to deal with the println! case, then check strict literal equality? Or am I missing something more subtle here?

You are just like I was when I tried this at first.
If we have format!("uwu\n"), then trimming it off the input will cause a mismatch, as it's still present in the snippet.
Doing "only trim it when the snippet doesn't end with a newline" also doesn't work because it could be println!("uwu\n").
We would need to actually count the trailing newlines and match and only trim it off when the input has one more, but doing it like I did is a little simpler.

I will explain this better in the comment.

Ah, yeah, I guess the important part is that we can't distinguish when the newline is added by the macro or already present in the literal. Oh well.

Noratrieb · 2022-12-27T22:01:19Z

I also added a new enum for find_width_map_from_snippet because I didnt like the tuple return type

Previously, it only checked whether there was _a_ literal at the span of the first argument, not whether the literal actually matched up. This caused issues when a proc macro was generating a different literal with the same span. This requires an annoying special case for literals ending in `\n` because otherwise `println` wouldn't give detailed diagnostics anymore which would be bad.

This makes the relationship between the vec and the boolean clearer.

compiler-errors · 2022-12-28T18:14:15Z

@bors r+

bors · 2022-12-28T18:14:17Z

📌 Commit 31b490d has been approved by compiler-errors

It is now in the queue for this repository.

bors · 2022-12-29T11:20:53Z

⌛ Testing commit 31b490d with merge 0c0b403...

bors · 2022-12-29T13:48:41Z

☀️ Test successful - checks-actions
Approved by: compiler-errors
Pushing 0c0b403 to master...

rust-timer · 2022-12-29T15:08:18Z

Finished benchmarking commit (0c0b403): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.6%	[1.6%, 1.6%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.2%	[-3.2%, -0.8%]	3
Improvements ✅ (secondary)	-2.2%	[-4.7%, -0.7%]	19
All ❌✅ (primary)	-2.2%	[-3.2%, -0.8%]	3

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.2%	[-1.2%, -1.2%]	1
All ❌✅ (primary)	-	-	0

…ther-it-really-is-a-literal, r=compiler-errors Revert "Improve heuristics whether `format_args` string is a source literal" This reverts commit e6c02aa (from rust-lang#106195). Keeps the code improvements from the PR and the test (as a known-bug). Works around rust-lang#106408 while a proper fix is discussed more thoroughly in rust-lang#106505, as proposed by `@tmandry.` Reopens rust-lang#106191 r? compiler-errors

…-episode-2, r=petrochenkov Properly allow macro expanded `format_args` invocations to uses captures Originally, this was kinda half-allowed. There were some primitive checks in place that looked at the span to see whether the input was likely a literal. These "source literal" checks are needed because the spans created during `format_args` parsing only make sense when it is indeed a literal that was written in the source code directly. This is orthogonal to the restriction that the first argument must be a "direct literal", not being exanpanded from macros. This restriction was imposed by [RFC 2795] on the basis of being too confusing. But this was only concerned with the argument of the invocation being a literal, not whether it was a source literal (maybe in spirit it meant it being a source literal, this is not clear to me). Since the original check only really cared about source literals (which is good enough to deny the `format_args!(concat!())` example), macros expanding to `format_args` invocations were able to use implicit captures if they spanned the string in a way that lead back to a source string. The "source literal" checks were not strict enough and caused ICEs in certain cases (see rust-lang#106191). So I tightened it up in rust-lang#106195 to really only work if it's a direct source literal. This caused the `indoc` crate to break. `indoc` transformed the source literal by removing whitespace, which made it not a "source literal" anymore (which is required to fix the ICE). But since `indoc` spanned the literal in ways that made the old check think that it's a literal, it was able to use implicit captures (which is useful and nice for the users of `indoc`). This commit properly seperates the previously introduced concepts of "source literal" and "direct literal" and therefore allows `indoc` invocations, which don't create "source literals" to use implicit captures again. Fixes rust-lang#106191 [RFC 2795]: https://rust-lang.github.io/rfcs/2795-format-args-implicit-identifiers.html#macro-hygiene

Improve debug logs of find_width_of_character_at_span

1322e47

rustbot assigned oli-obk Dec 27, 2022

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 27, 2022

compiler-errors approved these changes Dec 27, 2022

View reviewed changes

Noratrieb force-pushed the no-more-being-clueless-whether-it-really-is-a-literal branch from b2ab034 to 7e682cb Compare December 27, 2022 22:08

This comment has been minimized.

Sign in to view

Noratrieb added 2 commits December 28, 2022 17:43

Add enum for find_width_map_from_snippet

31b490d

This makes the relationship between the vec and the boolean clearer.

Noratrieb force-pushed the no-more-being-clueless-whether-it-really-is-a-literal branch from 7e682cb to 31b490d Compare December 28, 2022 16:43

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 28, 2022

compiler-errors assigned compiler-errors and unassigned oli-obk Dec 28, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 29, 2022

bors merged commit 0c0b403 into rust-lang:master Dec 29, 2022

rustbot added this to the 1.68.0 milestone Dec 29, 2022

Noratrieb deleted the no-more-being-clueless-whether-it-really-is-a-literal branch December 29, 2022 15:13

Noratrieb mentioned this pull request Jan 5, 2023

Properly allow macro expanded format_args invocations to uses captures #106505

Merged

Noratrieb mentioned this pull request Jan 18, 2023

Revert "Improve heuristics whether format_args string is a source literal" #107041

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve heuristics whether `format_args` string is a source literal #106195

Improve heuristics whether `format_args` string is a source literal #106195

Noratrieb commented Dec 27, 2022

rustbot commented Dec 27, 2022

compiler-errors left a comment

compiler-errors Dec 27, 2022

Noratrieb Dec 27, 2022

compiler-errors Dec 27, 2022

Noratrieb Dec 27, 2022

compiler-errors Dec 27, 2022 •

edited

Loading

Noratrieb commented Dec 27, 2022

This comment has been minimized.

compiler-errors commented Dec 28, 2022

bors commented Dec 28, 2022

bors commented Dec 29, 2022

bors commented Dec 29, 2022

rust-timer commented Dec 29, 2022

Improve heuristics whether format_args string is a source literal #106195

Improve heuristics whether format_args string is a source literal #106195

Conversation

Noratrieb commented Dec 27, 2022

rustbot commented Dec 27, 2022

compiler-errors left a comment

Choose a reason for hiding this comment

compiler-errors Dec 27, 2022

Choose a reason for hiding this comment

Noratrieb Dec 27, 2022

Choose a reason for hiding this comment

compiler-errors Dec 27, 2022

Choose a reason for hiding this comment

Noratrieb Dec 27, 2022

Choose a reason for hiding this comment

compiler-errors Dec 27, 2022 • edited Loading

Choose a reason for hiding this comment

Noratrieb commented Dec 27, 2022

This comment has been minimized.

compiler-errors commented Dec 28, 2022

bors commented Dec 28, 2022

bors commented Dec 29, 2022

bors commented Dec 29, 2022

rust-timer commented Dec 29, 2022

Overall result: ❌ regressions - no action needed

Instruction count

Max RSS (memory usage)

Cycles

Improve heuristics whether `format_args` string is a source literal #106195

Improve heuristics whether `format_args` string is a source literal #106195

compiler-errors Dec 27, 2022 •

edited

Loading