Skip to content

Conversation

@bartolomej
Copy link

@bartolomej bartolomej commented Aug 4, 2024

Description

POC for preserving comments in Cadence AST. This should enable us to make a fully working, pretty printer for Cadence.

This is a follow-up to my discussion in Discord some time ago: https://discord.com/channels/613813861610684416/1108479699732152503/1240749220357607475

Closes #308

Design

I decided to start tracking comments in the form of leading/trailing comment structs attached to lexer tokens and AST nodes as discussed in #308 (comment).

I believe this approach (instead of the old one where comments were tracked as AST nodes) could simplify comment parsing logic, as the lexer already computes which tokens are associated with which comments + it removes the need to skip comments when parsing.

Notes

  • This change will touch quite a lot of the parsing code, as we'd need to update how we parse declaration/statement/expression (and possibly other) AST nodes to attach leading/trailing comments to them. These changes shouldn't be difficult to implement but are pretty sensitive regardless, so updating and expanding the test suite will be important.
  • We should also test that the comments in the real-world code on mainnet/testnet are preserved as expected + possibly add a runtime check to see if all the comments are correctly preserved when doing prettifying, similar as it's done in prettier: https://github.com/prettier/prettier/blob/251ecabbdee509954f7b0d33b38da9ec4ae93ad8/src/main/comments/print.js#L206-L222
  • I'm tracking some of the leftover work with TODO(preserve-comments) in code

Definition of Done

  • Add AST elements for comments:
    • Block comment
    • Line comment
  • Add comments to AST elements when parsing
    • Declarations
      • Variable declaration (let/var)
      • Function declaration
      • Import declaration
      • Event declaration
      • Struct declaration
      • Resource declaration
      • Entitlement declaration
      • Attachment declaration
      • Contract declaration
      • Enum declaration
      • Transaction declaration
      • Pragma declaration
    • Statements
      • Return statement
      • Break statement
      • Continue statement
      • If statement
      • Switch statement
      • While statement
      • For statement
      • Emit statement
      • Remove statement
      • Function declaration statement
      • Expression statement (skip for now, tackle when/if expression support is needed)
      • Assignment statement (skip for now, tackle when/if op token support is needed)
      • Swap statement (skip for now, tackle when//if op token support is needed)
    • Expressions
      • Binary expression
      • Unary expression
      • Function expression
      • Identifier expression
      • Array expression
      • Dictionary expression
      • Member expression
      • Index expression
      • Conditional expression
      • Invocation expression
      • Integer literal expression
      • Fixed-point literal expression
      • String literal expression
      • Composite literal expression
      • Create expression
      • Destroy expression
      • Reference expression
      • Force expression
      • Path expression
      • Type expression
  • Tests

TODOs

Description


  • Targeted PR against master branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work
  • Code follows the standards mentioned here
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

@bartolomej bartolomej changed the title Retain comments in AST [WIP] Retain comments in AST Aug 4, 2024
Copy link
Member

@turbolent turbolent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Had another pass through the code with some suggestions. Will review the tests next

},
},
Source: "*/",
Source: "\000",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come the source is the null character here?

Copy link
Author

@bartolomej bartolomej Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is expected, since the error occurs at EOF (which doesn't have a position within the source code) - added a clarifying comment now.

The EOF token has the same source, but its just not checked, in unit tests: https://github.com/bartolomej/cadence/blob/43b4adf252468a0580c0adc635c7715d956777e2/parser/lexer/lexer_test.go#L74-L81

Do you think null source for error tokens could cause any unexpected issues downstream?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Retain comments in AST

4 participants