fix: completion generation inside $()#1403
fix: completion generation inside $()#1403richardkmichael wants to merge 2 commits intoscop:mainfrom
Conversation
2b82879 to
550e0cd
Compare
yedayak
left a comment
There was a problem hiding this comment.
In main echo $(ip <TAB> completes all files (the default completion, it should complete sub-commands), but with this PR it doesn't complete anything. Maybe you can use _comp_command_offset for this?
|
@yedayak Thanks for the quick review. I'll investigate. |
scop
left a comment
There was a problem hiding this comment.
Just a note to set "changes requested" per earlier comments so this shows up as such in PR list.
c0c36c5 to
c77070a
Compare
ed95258 to
6286e30
Compare
Thanks for mentioning All completions should now be correct, e.g., Handling nesting and completion position is a little complicated, so I commented the code somewhat more than other areas. Let me know if I should reduce them -- I'd find it helpful being new to the code base, but it might be verbose for experts.
|
|
|
||
| # Rebuild inner_line from the nested word onward, then | ||
| # strip its leading $( for the next iteration. | ||
| inner_line="${COMP_WORDS[*]:nested_idx}" |
There was a problem hiding this comment.
| inner_line="${COMP_WORDS[*]:nested_idx}" | |
| inner_line="${COMP_WORDS[@]:nested_idx}" |
There was a problem hiding this comment.
I believe either will work for the scalar assignment. However, pre-commit shellcheck fails with @. Should it be disabled?
# shellcheck disable=SC2124
inner_line="${COMP_WORDS[@]:nested_idx}"There was a problem hiding this comment.
[*] is affected by IFS, while [@] always joins the words by whitespace. Actually, one can set local IFS=$' \t\n' and use [*].
There was a problem hiding this comment.
Thanks for the explanation. Given the preference for @, what should be done about shellcheck?
There was a problem hiding this comment.
Actually, we have never used @ in this context. I think we just used IFS and * for this type of processing.
There was a problem hiding this comment.
Yeah sorry, I didn't know about the shellcheck rule. [*] with setting IFS locally is fine
There was a problem hiding this comment.
I searched the codebase. It seems we haven't had similar cases (with [*] or [@] in the right-hand side of a scalar assignment) so far. The following cases explicitly set IFS to a special value:
./completions-core/gnokii.bash-229- local IFS='|'
./completions-core/gnokii.bash:230: local regex_main_cmd="(${main_cmd[*]})($|[^_[:alnum:]])"
--
./completions-core/invoke-rc.d.bash-20- local IFS='|'
./completions-core/invoke-rc.d.bash:21: local exclude="@(${words[*]})"
--
./completions-core/lintian.bash-33- local IFS='|'
./completions-core/lintian.bash:34: todisable="@(${todisable[*]})"
--
./completions-core/mr.bash-23- local IFS='|'
./completions-core/mr.bash:24: local glob_commands="@(${commands[*]})"The following case is not accompanied by any adjustment of IFS, so it's broken with a custom IFS:
./completions-core/tree.bash:37: local line="${words[*]}"If we search for ${a[*]} not restricted in the right-hand sides of assignments, there seem to be many cases affected by custom IFS, which I think we can work on in another PR.
There was no occurrence of "${a[@]}" in the right-hand sides of assignments.
There was a problem hiding this comment.
Thank you 👍
I did a similar exploration after this feedback.
I was unsure about using IFS, due to scope effects elsewhere -- especially since there is no RHS scalar assignment like this in bash_completion, and _comp_initialize is a function called quite early. For example, it will affect _comp_command_offset.
I also wondered if, when IFS is used, it's set at the top of a function (i.e., for reader visibility), or instead near the site needed / affected. The approaches seems evenly split. For example, _comp_split is at the top, so is ssh-keygen.bash. Usage in _comp_compgen is at the usage site with a comment, so I've set it just above usage as well. (I also considered a new function, to confine IFS scope, but it seemed unnecessary.)
Aside from this PR --
Since lint can't be used for this, an addition to the quoting section of the style guide could be helpful. Perhaps something like:
diff --git a/doc/styleguide.md b/doc/styleguide.md
index cff6d95b..436a631b 100644
--- a/doc/styleguide.md
+++ b/doc/styleguide.md
@@ -171,6 +171,10 @@ exceptions where the quoting is still needed for other reasons.
suggests that e.g. `var= cmd` is confusing with `var=cmd`.
- `$*` and `${array[*]}` need to be always quoted because they can be affected
by the word splitting in bash <= 4.2 even in the above contexts.
+- When joining array elements into a scalar variable, prefer `"${array[*]}"`
+ over `"${array[@]}"`, and ensure `IFS` is set to the intended separator
+ (typically `local IFS=$' \t\n'`), because `[*]` joins by the first character
+ of `IFS`.
- In the following contexts, double-quoting of shell expansions is needed
unless the result of expansions is intentionally treated as glob patterns or
regular expressions.I used an LLM for code exploration and review (help ensuring adherence to the style guide and API naming documentation), and such an explanation would have caught this usage.
There was a problem hiding this comment.
For example,
_comp_splitis at the top,
That is because IFS is determined by the options specified to _comp_split, and we need to give the default value at the top of _comp_split before starting the argument parsing.
so is
ssh-keygen.bash.
I think that is because ${words[*]} and ${protocols[*]} are used many times (8 times) in the function, and the smallest common scope seems to be the function scope.
However, there doesn't seem to be a particular reason for IFS set at the top of _comp_load. Git Blame reveals that it was introduced in commit d9082d2, but I don't remember the reason that I put it at the top. The IFS is necessary for the line $(complete -p "$cmd" 2>/dev/null || echo false) "\\$cmd" && return 0, so it can be set just above this line technically.
(I also considered a new function, to confine IFS scope, but it seemed unnecessary.)
Yeah, that is another possibility, something like _comp_join.
Added. I noticed the fixture directory, so I used it for stable
Added, and thank you, this caught a bug. I hadn't considered the |
6286e30 to
72dbc5c
Compare
Since bash 4.3, $(cmd args is treated as a single opaque word in COMP_WORDS, so the outer command's completion function produces nothing. Detect when cur starts with $( in _comp_initialize, rewrite COMP_LINE / COMP_POINT / COMP_WORDS / COMP_CWORD to describe the command line inside the substitution, and dispatch to the inner command's completion function via _comp_command_offset 0. Handle nested substitutions (e.g. echo $(cmd1 arg $(cmd2 <TAB>) by looping: after splitting inner_line into words, scan backward with cumulative $( and ) counts to find the rightmost unclosed $(. Counting across words is necessary because $( and ) may land in different words after splitting (e.g. "$(echo" "hi)"). If found, rebuild inner_line from that word onward, strip its $(, and repeat until the innermost level is reached. Tests: - Closed $() does not interfere with normal completion - Single-level: echo $(csub_cmd <TAB> - Single-level partial: echo $(csub_cmd a<TAB> - Nested: echo $(echo foo $(csub_cmd <TAB> - Nested partial: echo $(echo foo $(csub_cmd b<TAB> - Closed-then-open: echo $(echo hi) $(csub_cmd <TAB> - Open-then-closed: echo $(csub_cmd $(echo hi) <TAB> Closes scop#630
72dbc5c to
325ccd0
Compare
| nested_idx="" | ||
| open_count=0 | ||
| close_count=0 | ||
| for ((i = COMP_CWORD; i >= 0; i--)); do | ||
| local word=${COMP_WORDS[i]} | ||
| local without_closes=${word//)/} | ||
| ((close_count += ${#word} - ${#without_closes})) | ||
| if [[ $word == *'$('* ]]; then | ||
| local without_opens=${word//\$\(/} | ||
| ((open_count += (${#word} - ${#without_opens}) / 2)) | ||
| if ((open_count > close_count)); then | ||
| nested_idx=$i | ||
| # re-parse from this word onward; | ||
| # inner_line is assembled below | ||
| break | ||
| fi | ||
| fi | ||
| done |
There was a problem hiding this comment.
The current implementation falls into an infinite loop with the following completion attempt:
$ cmd1 $(cmd2 a$(cmd3)b$(cmd4 [tab]Also, the command name is wrongly extracted to be pwd)/$(cmd3 with the following completion attempt:
$ cmd1 $(cmd2 $(pwd)/$(cmd3 [tab]Actually, I have difficulty understanding the choice of the present algorithm. Even if the above issues are fixed somehow, is this algorithm stable and robust in general? Since I currently don't see a clear path or picture to understand the construction of the current algorithm, the algorithm appears to be a random trial currently.
Can you explain what is the assumptions of the algorithm, and how it is ensured to work under the given assumptions?
There was a problem hiding this comment.
Thank you!
Given the failures, yes, one assumption that words do not contain unbalanced pairs is a mistake, and another being that the first $( in the COMP_WORD is correct one. Evidently not robust. :) I was trying to find the word right-adjacent to the last open $( but the pair-wise counting by line length difference won't work, and the strip is to the wrong $(. I wouldn't say "random trial" :-) but definitely too naive. I was experimenting with test cases on my command line, and hoping to avoid writing a parens matcher with character position.
I'll go through it more carefully, for another approach.
There was a problem hiding this comment.
Thank you for the explanation!
Thank you for thinking about another approach! If it turns out to be difficult, we don't necessarily have to support all possible cases. I think the important thing is that we want to gracefully fail in the unsupported cases (instead of causing an infinite loop or producing a strange result); for example, if the required assumptions are clear, we may examine the contents of cur and give up the special treatment when the assumptions are unsatisfied.
Closes #630
python -m pytest test/t/unithas two failing tests and two skipped tests intest_unit_compgen_filedir.py. This is the same as onmain, so I do not believe they are related to this patch.python -m mypy .andpython -m ruff checpassI am unsure what to make of the
runLintoutput, but it has exit status0, so I presume it is OK