Description
The context
Historically, the SHA3-512 integrity signature has been applied directly on the Open Ethics Label server to the disclosure built from the data users submit at https://openethics.ai/label/generate/.
This approach is fine in a centralized environment where every step of the process is controlled within the Open Ethics infrastructure. However, it fails to deliver on one important aspect: decentralization. There is no standard way to validate a disclosure independently, because hashes computed for the same disclosure JSON object by different tools may not match.
How it works right now (see files attached, or generate one yourself)
- The user fills in the disclosure form on https://openethics.ai/label/generate/
- The entered data is stored in the JSON object's key `snapshot`
- The JSON object gets serialized in JavaScript and is converted into a string
- The string is sent to the server for the signature
- The server injects a Unix timestamp into the `snapshot` as the `timestamp` key
- The server generates a SHA3-512 signature on the serialized timestamped `snapshot`
- The server responds with the timestamped and signed JSON
- The signed JSON (timestamped and augmented with the integrity signature) is stored in the DB
- The signed JSON is displayed to the user as `oetp.json`
- The `oel.html` file with the `iframe` visualization of the Open Ethics Label is displayed to the user; it also contains the integrity signature in the iframe src URL
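The server-side part of this flow can be sketched roughly as follows. This is a minimal illustration, not the actual Open Ethics server code: the function name `sign_snapshot`, the `integrity` key and the use of Python's default `json.dumps` settings are all assumptions made for the example.

```python
import hashlib
import json
import time
from typing import Optional


def sign_snapshot(serialized_snapshot: str, now: Optional[int] = None) -> dict:
    """Mimic the described steps: parse, timestamp, hash, return signed JSON."""
    snapshot = json.loads(serialized_snapshot)

    # Inject a Unix timestamp into the snapshot as the "timestamp" key.
    snapshot["timestamp"] = int(time.time()) if now is None else now

    # Re-serialize the timestamped snapshot. The exact string depends on the
    # serializer's defaults, which is why the hash is hard to reproduce elsewhere.
    payload = json.dumps(snapshot)
    digest = hashlib.sha3_512(payload.encode("utf-8")).hexdigest()

    # "integrity" is a hypothetical key name for the signature.
    return {"snapshot": snapshot, "integrity": digest}


signed = sign_snapshot('{"key1": "value"}', now=1740917935)
print(signed["snapshot"]["timestamp"], len(signed["integrity"]))
```

The digest is a 128-character hexadecimal string, since SHA3-512 produces 512 bits of output.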
The problems
A. Scalability
Currently, the disclosure server managed by Open Ethics has limited capacity and is not ready for higher loads or large disclosure files.
B. Interoperability
Different programming languages perform JSON serialization differently. Anyone who tries to validate an existing disclosure JSON manually may therefore get a different hash from the one produced when the disclosure was signed. This is a problem because it leaves no option other than trusting Open Ethics. It also rules out interoperability with external providers, which the Open Ethics Transparency Protocol is designed to support.
Specifically,
- Non-ASCII characters in the JSON file can change the signature: some serializers escape them (for example as `\uXXXX` sequences), which also breaks readability.
- The order of the keys in the disclosure JSON can change the signature if it differs from the order used on the Open Ethics Label page.
- Spaces used as additional separators in the serialized string change the signature.
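The three effects listed above can be reproduced with a few lines of Python. The sample object and variable names are made up for illustration; the only library behavior relied on is that of `json.dumps` and `hashlib.sha3_512` from the standard library.

```python
import hashlib
import json


def sha3_hex(text: str) -> str:
    """SHA3-512 of the UTF-8 encoding of a string, as a hex digest."""
    return hashlib.sha3_512(text.encode("utf-8")).hexdigest()


doc = {"key2": "✅ non-ASCII", "key1": "plain"}
reordered = {"key1": "plain", "key2": "✅ non-ASCII"}

variants = {
    # Python default: non-ASCII escaped, ", " and ": " as separators.
    "escaped": json.dumps(doc),
    # Non-ASCII characters kept readable.
    "unescaped": json.dumps(doc, ensure_ascii=False),
    # No spaces after separators.
    "compact": json.dumps(doc, ensure_ascii=False, separators=(",", ":")),
    # Same content, different key order.
    "other_order": json.dumps(reordered, ensure_ascii=False),
}

hashes = {name: sha3_hex(text) for name, text in variants.items()}
for name, text in variants.items():
    print(f"{name:12} {hashes[name][:16]}...  {text}")
```

All four strings parse back to the same object, yet each one yields a different SHA3-512 digest.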
Proposed solution
To address these issues, the proposal is to (A) decouple the disclosure website from the signature infrastructure and (B) define transparent criteria for how integrity signatures should be formed, given that the current explanation in section 4.1.1 of the OETP is not concrete enough.
Acceptance criteria
IMPORTANT: There is no requirement for the new implementation to generate signatures identical to those of the current implementation. However, all future implementations should rely on the standard description and be backwards-compatible both for signature generation and for signature validation.
- Documentation: The criteria for the serialized string that is hashed with the SHA3-512 algorithm should be explicit and documented.
- The resulting signature should not depend on the order of the keys in the input JSON file
- Non-ASCII characters should remain unescaped (UTF-8 encoded) in the serialized JSON
- The demo API `signature-generate` endpoint for signature should be created
- The demo API endpoint should accept POST requests with the request body containing the disclosure JSON, but not signed yet
- The demo API endpoint should return the timestamped and signed JSON
- The code implementation in Python should be published in the OETP repository in `examples/implementation`
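One possible way to meet the serialization criteria is sketched below. The rules used here (keys sorted recursively, compact separators, non-ASCII characters left unescaped, UTF-8 encoding before hashing) are assumptions for illustration, since the issue does not yet fix them. The names `canonicalize`, `sign_disclosure` and the `integrity` key are hypothetical, and the assumed request body shape `{"disclosure": {...}}` is taken from the test data in this issue.

```python
import hashlib
import json
import time
from typing import Any, Optional


def canonicalize(obj: Any) -> str:
    """Serialize to one deterministic string (assumed canonical form)."""
    return json.dumps(
        obj,
        sort_keys=True,         # key order in the input no longer matters
        ensure_ascii=False,     # keep non-ASCII characters unescaped
        separators=(",", ":"),  # no optional whitespace
    )


def sign_disclosure(request_body: dict, now: Optional[int] = None) -> dict:
    """Timestamp an unsigned disclosure and attach its SHA3-512 digest."""
    disclosure = dict(request_body["disclosure"])
    disclosure["timestamp"] = int(time.time()) if now is None else now
    timestamped = {"disclosure": disclosure}

    # The digest covers the whole timestamped object, without the signature key.
    digest = hashlib.sha3_512(canonicalize(timestamped).encode("utf-8")).hexdigest()

    # "integrity" is a hypothetical key name for the signature.
    return {**timestamped, "integrity": digest}


a = sign_disclosure({"disclosure": {"b": 1, "a": "✅"}}, now=1740917935)
b = sign_disclosure({"disclosure": {"a": "✅", "b": 1}}, now=1740917935)
print(a["integrity"] == b["integrity"])
```

A `signature-generate` endpoint would then only need to parse the POST body, call a function like `sign_disclosure`, and return the result as JSON. Note that this sketch does not apply Unicode normalization or special number formatting; a published specification would need to state whether it does.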
Testing
The integrity signatures issued for Disclosure #1 and Disclosure #2 should be equal.
Disclosure #1
{
"disclosure": {
"timestamp": 1740917935,
"key1": "This is the key1 content",
"key2": "✅ this is content with non-ASCII characters",
"key3": "Цей конте́нт містить кирилічні символи",
"key4": {}
}
}
Disclosure #2
{"disclosure": {"key3": "Цей конте́нт містить кирилічні символи", "key1": "This is the key1 content","timestamp": 1740917935,"key4": {},"key2": "✅ this is content with non-ASCII characters"}}
Equivalence
The generated integrity hash should be equal to the SHA3-512 hash that https://emn178.github.io/online-tools/sha3_512.html or https://www.strerr.com/en/sha3_512.html produces for the same serialized string.
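The test can be expressed as a short script. It assumes a canonical form with sorted keys, compact separators and unescaped non-ASCII characters, which is an illustrative choice and not a rule fixed by this issue; the helper names `canonical_string` and `integrity` are hypothetical.

```python
import hashlib
import json

DISCLOSURE_1 = """
{
  "disclosure": {
    "timestamp": 1740917935,
    "key1": "This is the key1 content",
    "key2": "✅ this is content with non-ASCII characters",
    "key3": "Цей конте́нт містить кирилічні символи",
    "key4": {}
  }
}
"""

DISCLOSURE_2 = '{"disclosure": {"key3": "Цей конте́нт містить кирилічні символи", "key1": "This is the key1 content","timestamp": 1740917935,"key4": {},"key2": "✅ this is content with non-ASCII characters"}}'


def canonical_string(raw_json: str) -> str:
    """Parse the input and re-serialize it in the assumed canonical form."""
    return json.dumps(
        json.loads(raw_json),
        sort_keys=True,
        ensure_ascii=False,
        separators=(",", ":"),
    )


def integrity(raw_json: str) -> str:
    """SHA3-512 hex digest of the UTF-8 encoded canonical string."""
    return hashlib.sha3_512(canonical_string(raw_json).encode("utf-8")).hexdigest()


assert integrity(DISCLOSURE_1) == integrity(DISCLOSURE_2)

# Paste this string into one of the online SHA3-512 tools and compare the
# result with the digest printed on the next line.
print(canonical_string(DISCLOSURE_1))
print(integrity(DISCLOSURE_1))
```

Hashing the two raw inputs directly would give different digests, because their whitespace and key order differ; only the canonical strings are identical.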
Benefits of the implementation
- Ability to decouple the front-end and the back-end of the disclosure process during the signature phase.
- Ability to decentralize disclosure process.
- Publishing the documentation on the signature process will make validation more transparent, bring trust, and allow others to implement it too.
Future steps
- Implementation of the `signature-validate` endpoint, which will be backwards compatible with the disclosures made to date.
- Transition to the new infrastructure with the decoupled form and the signature
- Update to the OETP documentation
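The core of a `signature-validate` step could look like the sketch below. It only covers disclosures signed under an assumed canonical form (sorted keys, compact separators, unescaped non-ASCII characters) with the signature stored under a hypothetical `integrity` key. Validation of disclosures signed by the current implementation is not covered, because their exact serialization is not specified in this issue.

```python
import hashlib
import json


def validate_disclosure(signed: dict, signature_key: str = "integrity") -> bool:
    """Recompute the digest over everything except the signature and compare."""
    if signature_key not in signed:
        return False

    # Drop the signature itself before re-serializing.
    unsigned = {k: v for k, v in signed.items() if k != signature_key}
    canonical = json.dumps(
        unsigned,
        sort_keys=True,
        ensure_ascii=False,
        separators=(",", ":"),
    )
    expected = hashlib.sha3_512(canonical.encode("utf-8")).hexdigest()
    return signed[signature_key] == expected


body = {"disclosure": {"timestamp": 1740917935, "key1": "This is the key1 content"}}
digest = hashlib.sha3_512(
    json.dumps(body, sort_keys=True, ensure_ascii=False, separators=(",", ":")).encode("utf-8")
).hexdigest()

signed = {**body, "integrity": digest}
tampered = {"disclosure": {"timestamp": 1740917935, "key1": "changed"}, "integrity": digest}

print(validate_disclosure(signed), validate_disclosure(tampered))
```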