Conversation

@ben-dz ben-dz commented Dec 19, 2025

Summary of Changes

This RFC defines the changes required to Solana validators' and transaction senders' use of QUIC to support edge filtering.

@ben-dz ben-dz force-pushed the bdz/quic-rfc branch 2 times, most recently from a9dc9c6 to 5c9fead Compare December 19, 2025 17:03
@ben-dz ben-dz marked this pull request as ready for review December 19, 2025 17:12
Copilot AI left a comment

Pull request overview

This RFC proposes modifications to Solana validators' and transaction senders' use of the QUIC protocol to enable FPGA-based edge filtering in the DoubleZero network. The changes aim to overcome QUIC's encryption, flow control, and packet formatting challenges while minimizing modifications to existing Solana validator and QUIC library code.

Key Changes:

  • Introduction of in-band encrypted session key sharing mechanism via modified HANDSHAKE_DONE packets
  • Frame substitution approach using RESET_STREAM to handle dropped traffic while maintaining flow control
  • Enforcement of specific packet formatting requirements including fixed 8-byte CIDs, 1232-byte stream frames, and standardized encryption

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| rfcs/rfcx-quic-changes-for-edge-filtering.md | New RFC document detailing QUIC protocol modifications for FPGA edge filtering support |
| CHANGELOG.md | Added changelog entry for the new RFC |

Co-authored-by: Copilot <[email protected]>
@alexpyattaev alexpyattaev left a comment

Thanks for putting this together. Some initial feedback below:

```
└─────────────────────────────────────────────────────────────────┘
```

This initial architecture passes through any non-QUIC traffic, and any traffic for which the FPGA does not have the keys. A future improvement may add a join-time option for a server to drop 1RTT traffic for which the FPGA does not have keys. The drawback of this would be an increased connection spin-up latency, for a benefit of further reducing bad traffic reaching the server.

definition of join-time?

Author

I'll update it to say "an option to be selected at the time of connection to DoubleZero"


### 2. Flow Control

Assuming that the cryptographic problem is solved, the FPGA needs a way to handle the QUIC connection once it determines that it wants to drop a stream frame due to edge filtering logic. Unlike in UDP, the FPGA cannot drop the packet or frame. First, the client will retry sending until the packet is acknowledged. Second, if the packet is delivered but shortened by removing the frame, QUIC’s built-in flow control will eventually stall the connection: the server will have received less data than the client has sent, so it will stop advancing the `MAX_DATA` window.

rm "Unlike in UDP, the FPGA cannot drop the packet or frame." ?
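
The stall described in the quoted paragraph can be illustrated with a toy model of connection-level flow control. This is a simplified sketch, not real QUIC: the function `simulate_sender`, the fixed `window`, the rule that the server always advertises "bytes accounted for plus window", and the one-frame-per-stream simplification are all assumptions made for illustration.

```python
def simulate_sender(frame_sizes, window, filtered, substitute_reset=False):
    """Return the index of the first frame the client cannot send, or None.

    Toy model (an assumption, not real QUIC): every frame is its own stream,
    and the server always advertises MAX_DATA = accounted bytes + window.
    """
    sent = 0            # bytes the client has charged against MAX_DATA
    accounted = 0       # bytes the server knows the client has sent
    max_data = window   # connection-level limit currently advertised
    for i, size in enumerate(frame_sizes):
        if sent + size > max_data:
            return i    # client is blocked by flow control
        sent += size
        if i not in filtered or substitute_reset:
            # Either the frame arrived, or a RESET_STREAM carrying the
            # stream's final size arrived in its place.
            accounted += size
            max_data = accounted + window
    return None         # every frame could be sent


frames = [1000] * 10
print(simulate_sender(frames, 3000, filtered=set()))
print(simulate_sender(frames, 3000, filtered={1, 3, 5}))
print(simulate_sender(frames, 3000, filtered={1, 3, 5}, substitute_reset=True))
```

In this model each silently removed frame widens the gap between what the client has charged and what the server has credited, so the sender eventually blocks. Substituting a frame that still reports the stream's final size keeps the two sides in step.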


----

As a result of this discrepancy between the two validator software stacks, there are two proposed options for the `FINAL_SIZE`. If Agave is changed to match Firedancer with a 2^62 `MAX_DATA`, then the FPGA will always set `FINAL_LENGTH` to 4k. If Agave continues to use the `MAX_DATA` backstop, then the FPGA will make a best guess based on offset+len, and it is recommended (but not required) that Quinn be modified based on the recommendation in #1 above.

agave change is trivial

Author

It's technically trivial to change, but would folks like to see the MAX_DATA backstop remain?

It is not required for correct operation => not an issue worth debating here IMO. Just state the fact - MAX_DATA will be messed up and our servers will ignore it =)
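
The two options discussed in this thread can be written down as a small decision function. This is only a sketch of the rule as quoted: the name `guess_final_size`, the flag `max_data_unlimited`, and reading "4k" as 4096 bytes are assumptions, not part of the RFC.

```python
MAX_STREAM_BYTES = 4096  # assumption: "4k" in the RFC text read as 4096 bytes


def guess_final_size(offset, length, max_data_unlimited):
    """Pick the final size for a substituted RESET_STREAM.

    offset, length: position and payload length of the stream frame being
    replaced. max_data_unlimited: True if the server runs with an effectively
    unlimited MAX_DATA (the 2^62 option), so the exact value is not relied on.
    """
    if max_data_unlimited:
        return MAX_STREAM_BYTES
    # Best guess: the highest stream byte the filter has seen so far.
    return offset + length


print(guess_final_size(1232, 500, max_data_unlimited=False))
print(guess_final_size(0, 1232, max_data_unlimited=True))
```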


- if Quinn receives a `RESET_STREAM` with a different `FINAL_LENGTH` than it has already determined, Quinn must not issue a `FINAL_SIZE_ERROR`. This is slightly different from the previous point, and addresses an edge case where the `RESET_STREAM` replaced a packet without the FIN bit, but that packet has already been received by the server.

**4k Transactions & Fragmentation:** Depending on the criteria that trigger a drop, a transaction fragmented across multiple packets may not be identifiable as droppable until its last packet arrives. There isn’t a workaround for this other than storing and forwarding, which is not practicable for the amount of traffic a single edge filtering node might be handling. The FPGA must issue the `RESET_STREAM` as soon as it knows that a drop is desired.

not clear how fragmented would be handled

Author

I'm not 100% sure how we're going to handle it in the FPGA either, other than that we have to find a way to do so, since fragmentation is inevitable. I have a few different ideas, but I think the handling system will partly depend on the fragmentation rules settled on. I'm coming around to the idea that a store-and-forward system could be performant enough if we place enough limits on fragmentation.

The point here is that the FPGA will issue the RESET_STREAM as soon as it knows the transaction is bad, which could mean substituting for any frame in a multi-frame stream. So in making modifications to a receiving validator ahead of edge filtering support, software needs to account for three possibilities:

  • FPGA subs reset_stream for first frame, but delivers the rest of the frames.
  • FPGA subs reset_stream for a middle frame, having delivered some frames already, and delivering some additional one(s) after.
  • FPGA subs reset_stream for last frame, having already delivered all preceding data in the stream.

The first and last are the most likely scenarios.

delivering anything for a stream after reset is a protocol breaking change. hacking the server to ignore it is possible but not desirable. I'd prefer if FPGA zeroed out the payload or just dropped the last fragment. Granted it would not help with bandwidth much, but the server is welcome to blacklist the peer to save on bandwidth.
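
For reference, the tolerant receiver behavior the author describes (and which this comment argues against) could look roughly like the sketch below. `TolerantStream` and its methods are hypothetical names for illustration only. They are not Quinn's or Agave's API, and nothing here reflects how either project actually handles resets.

```python
class TolerantStream:
    """Hypothetical receiver-side state for a single stream."""

    def __init__(self):
        self.chunks = {}    # offset -> bytes delivered before any reset
        self.reset = False

    def on_stream_frame(self, offset, data):
        """Return True if the frame was buffered, False if it was ignored."""
        if self.reset:
            return False    # frame arriving after RESET_STREAM: ignore it
        self.chunks[offset] = data
        return True

    def on_reset_stream(self, final_size):
        # Accept whatever final size the filter supplied rather than raising
        # FINAL_SIZE_ERROR, then discard everything buffered for the stream.
        self.reset = True
        self.chunks.clear()


# Reset substituted for a middle frame: earlier data is discarded,
# and the later frame is ignored.
s = TolerantStream()
s.on_stream_frame(0, b"a" * 1232)
s.on_reset_stream(final_size=2464)
print(s.on_stream_frame(2464, b"c" * 100), s.chunks)
```

The same object covers the first-frame and last-frame cases: a reset that arrives first causes every later frame to be ignored, and a reset that arrives last simply discards what was buffered.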


**Packet Fragmentation**: Stream frames must be 1232 bytes, except for the last frame in a stream. If a frame is shorter than 1232 bytes and does not have the FIN flag, then the FPGA must replace it with a `RESET_STREAM` to prevent abuse of the connection.

this is way more than needed to parse tx header. why?

Author

imo, it makes reassembly and buffer management simpler all around: send max-size frames until you don't have that much data left, then send the rest. On the Rx side, you have four windows into which new data gets slotted, and it always lands at one of those four addresses.

It's similar to the approach USB uses, but USB has to do it that way because there's no equivalent of the FIN flag. The short frame defines the end. We don't have to do it, but it does make for a clean RX side.

problem is, this requires client compliance
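
The fragmentation rule under discussion, from both the sender's and the filter's side, can be sketched as follows. The 1232-byte figure and the FIN exception come from the quoted RFC text. The function names `split_transaction` and `must_replace_with_reset` are made up for this example.

```python
FRAME_SIZE = 1232  # required stream-frame payload size from the quoted RFC text


def split_transaction(payload):
    """Sender side: full-size frames first, then the remainder with FIN set.

    Returns a list of (offset, chunk, fin) tuples.
    """
    frames = []
    for offset in range(0, len(payload), FRAME_SIZE):
        chunk = payload[offset:offset + FRAME_SIZE]
        fin = offset + len(chunk) == len(payload)
        frames.append((offset, chunk, fin))
    return frames


def must_replace_with_reset(length, fin):
    """Filter side: a short frame without FIN violates the rule."""
    return length < FRAME_SIZE and not fin


frames = split_transaction(bytes(3000))
print([(offset, len(chunk), fin) for offset, chunk, fin in frames])
print(must_replace_with_reset(200, fin=False))
```

With fixed-size frames, every chunk lands at an offset that is a multiple of 1232, which is the "four windows" property the author mentions for a 4k transaction.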


> 💡 This ensures the FPGA has the most information possible as early as possible, that transactions are broken up into a predictable pattern, and that a malicious sender does not break a transaction into tiny pieces to get around filtering. Currently Agave allows smaller frames, but only four total fragments. Since there are already rules about fragmentation, this adjustment of those rules allows the Edge Filtration to be more useful and efficient. A normal sender usually already meets this requirement.

> 👉 There is an option to instead enforce some smaller (but reasonable) minimum size for the first frame in a stream. For example, requiring all the signatures, or signatures + header of a transaction, to be in the first frame. However, once some size constraint must be enforced, we might as well enforce something that will make both Validator and FPGA code paths more efficient (thus optimizing for processor time, rather than network bandwidth).

far more reasonable!


**Coalescing:** Short header packets must not be coalesced, even after a long-header packet. If the UDP datagram contains a QUIC short header packet, then that packet must be the first and only thing in the datagram.

**Frame Ordering:** Stream Frames must be the first thing in a QUIC packet.

this may be very hard to enforce...


**Frame Ordering:** Stream Frames must be the first thing in a QUIC packet.

**Single Stream Frame per Packet**: There must be only one Stream frame in a QUIC packet.

smallest TX is <200 bytes. epic waste of IOPS...

Author

This was one that had been suggested (don't recall by who) back in September and everyone seemed on board with it. In a bare-metal embedded world or an FPGA world, the added processor overhead is workable, but perhaps running on top of Linux makes that concerning?

Were transactions in separate UDP packets before QUIC?
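
The Frame Ordering and Single Stream Frame rules quoted above amount to a simple check over the frame types in a decrypted packet. The sketch below assumes the frame types have already been parsed into a list. The helper name `stream_frame_layout_ok` is hypothetical, and it assumes packets with no STREAM frame at all (for example ACK-only packets) are acceptable, which the quoted text does not state explicitly. The STREAM frame type range 0x08 to 0x0f is from RFC 9000.

```python
STREAM_FRAME_TYPES = range(0x08, 0x10)  # STREAM frame types 0x08..0x0f (RFC 9000)


def stream_frame_layout_ok(frame_types):
    """Check a packet's frame types against the two quoted rules.

    Passes if the packet has no STREAM frame, or exactly one STREAM frame
    and it is the first frame in the packet.
    """
    positions = [i for i, t in enumerate(frame_types) if t in STREAM_FRAME_TYPES]
    return positions in ([], [0])


ACK = 0x02
print(stream_frame_layout_ok([0x0F, ACK]))   # one STREAM frame, first
print(stream_frame_layout_ok([ACK, 0x0F]))   # STREAM frame not first
print(stream_frame_layout_ok([0x0A, 0x0A]))  # two STREAM frames
```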

Session secrets will be passed to the FPGA encrypted by an FPGA pubkey so that any other snoopers of network traffic cannot intercept them. If the FPGA's private key is compromised, then it can be rotated, and the validator software updated to match the new key. Since session secrets are ephemeral, previously captured secrets have no future value. Any validator or sender with concerns that a particular session may have been compromised needs only to disconnect and reconnect to establish new secrets.

### FPGA Access to Transactions
Some in the Solana community may be concerned that the DoubleZero FPGA will have access to transaction data as it passes through. The Solana Core Dev community has agreed that since DoubleZero is a trusted contributor to the Solana ecosystem, this is acceptable. A developer of Validator software would have similar access to transaction flow. Additionally, until recently the transactions were not encrypted in the first place, and the change to QUIC for TPU was for the purpose of flow control, not encryption. Any validator who does not wish to allow the DoubleZero FPGA this access can choose not to opt into edge filtering.

A developer of Validator software has no access to TX flow, the operator does. Devs do not distribute binaries, only source.

Author

@ben-dz ben-dz Dec 30, 2025

Fair. I will just cut this. My point was more that there were other places someone could add code to do untoward things, but not really relevant.


3 participants