Disclosure Integrity Signature as a separate API #31

@nikiluk

The context

oetp_current.json

Historically, we have applied the SHA3-512 integrity signature to disclosures built from user-submitted data at https://openethics.ai/label/generate/ directly on the Open Ethics Label server.

This approach is fine in a centralized environment where every step of the process is controlled within the Open Ethics infrastructure. However, it fails to deliver on one important aspect: decentralization. There is no standard way to validate a disclosure independently, because hashes computed by different parties for the same disclosure JSON object may not match.

How it works right now (see files attached, or generate one yourself)

  1. The user fills in the disclosure form on https://openethics.ai/label/generate/
  2. The entered data is stored under the snapshot key of a JSON object
  3. The JSON object is serialized to a string in JavaScript
  4. The string is sent to the server for the signature
  5. The server injects a Unix timestamp into the snapshot as timestamp key
  6. The server generates a SHA3-512 signature on the serialized timestamped snapshot
  7. The server responds with the timestamped and signed JSON
  8. The signed JSON (timestamped and augmented with the integrity signature) is stored in the DB
  9. The signed JSON is displayed to the user as oetp.json
  10. The oel.html file with the iframe visualization of the Open Ethics Label is displayed to the user; the iframe src URL also contains the integrity signature
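
Steps 5 to 7 above can be sketched roughly as follows. This is a minimal illustration, not the code running on the Open Ethics Label server: the function name sign_snapshot, the integrity key name, and the use of Python's default json.dumps output as the signed string are all assumptions made for the example.

```python
import hashlib
import json
import time
from typing import Optional


def sign_snapshot(serialized: str, now: Optional[int] = None) -> dict:
    """Timestamp a client-serialized object and attach a SHA3-512 digest.

    Hypothetical helper: the real server-side serialization is not documented.
    """
    document = json.loads(serialized)
    # Step 5: inject a Unix timestamp into the snapshot.
    document["snapshot"]["timestamp"] = int(time.time()) if now is None else now
    # Step 6: hash the serialized, timestamped snapshot.
    # Assumption: the signed string is whatever json.dumps emits by default.
    payload = json.dumps(document["snapshot"])
    digest = hashlib.sha3_512(payload.encode("utf-8")).hexdigest()
    # Step 7: respond with the timestamped and signed JSON.
    # "integrity" is a hypothetical key name for the signature.
    document["integrity"] = digest
    return document


signed = sign_snapshot('{"snapshot": {"key1": "This is the key1 content"}}',
                       now=1740917935)
```

Because the signed string depends on serializer defaults, a validator written in another language would have to reproduce those defaults exactly, which is the interoperability problem described in the next section.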

The problems

A. Scalability

Currently, the disclosure server managed by Open Ethics has limited capacity and is not ready for higher loads or large disclosure files.

B. Interoperability

Different programming languages serialize JSON differently. Anyone who tries to validate an existing disclosure JSON manually may therefore get a different hash from the one produced when the disclosure was signed. As a result, there is no way to verify a disclosure other than trusting Open Ethics. This also rules out the interoperability with external providers that the Open Ethics Transparency Protocol is designed for.

Specifically,

  1. Non-ASCII characters in the JSON file can affect the signature: depending on the serializer, UTF-8 characters may be converted to escape sequences, which changes the signed string and hurts readability.
  2. The order of the keys in the disclosure JSON can affect the signature if it differs from the order used on the Open Ethics Label page.
  3. Extra whitespace used as separators in the serialized string affects the signature.
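
The three effects listed above can be reproduced with Python's standard json module. The sample object below is made up for illustration; each serialization option changes the output string, so each yields a different SHA3-512 digest.

```python
import hashlib
import json


def sha3_hex(text: str) -> str:
    """SHA3-512 hex digest of the UTF-8 encoding of text."""
    return hashlib.sha3_512(text.encode("utf-8")).hexdigest()


# Illustrative data only, not a real disclosure.
data = {"key2": "✅ non-ASCII content", "key1": "plain content"}

variants = {
    # Default: non-ASCII characters become \uXXXX escapes.
    "escaped": json.dumps(data),
    # Same key order and separators, but characters are left as-is.
    "unescaped": json.dumps(data, ensure_ascii=False),
    # Keys sorted alphabetically instead of insertion order.
    "sorted": json.dumps(data, ensure_ascii=False, sort_keys=True),
    # No spaces after "," and ":".
    "compact": json.dumps(data, ensure_ascii=False, separators=(",", ":")),
}

digests = {name: sha3_hex(text) for name, text in variants.items()}
distinct = len(set(digests.values()))
print(distinct)  # 4: every variant hashes differently
```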

Proposed solution

To address these issues, the proposal is to (A) decouple the disclosure website from the signature infrastructure and (B) define transparent criteria for how integrity signatures are formed, given that the current explanation in section 4.1.1 of the OETP is not concrete enough.

Acceptance criteria

IMPORTANT: The new implementation is not required to generate the same signatures as the current implementation. However, all future implementations should rely on the standard description and be backwards-compatible for both signature generation and signature validation.

  • Documentation: The criteria for the serialized string that is hashed with SHA3-512 should be explicitly documented
  • The resulting signature should not depend on the order of the keys in the input JSON file
  • Non-ASCII (UTF-8) characters should remain unescaped in the serialized JSON
  • A demo API signature-generate endpoint should be created
  • The demo API endpoint should accept POST requests whose body contains the unsigned disclosure JSON
  • The demo API endpoint should return the timestamped and signed JSON
  • The Python implementation should be published in the OETP repository in examples/implementation
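
One way to satisfy these criteria is sketched below, using only the Python standard library. The serialization rules chosen here (sorted keys, unescaped UTF-8, no whitespace between tokens) are a candidate, not the OETP definition. The names canonicalize, sign and handle_signature_generate, the integrity key, and the placement of the timestamp inside the disclosure object are assumptions for the example.

```python
import hashlib
import json
import time
from typing import Optional


def canonicalize(obj) -> str:
    """Serialize with one fixed, explicit rule set (a candidate, not the standard)."""
    return json.dumps(
        obj,
        sort_keys=True,          # key order in the input no longer matters
        ensure_ascii=False,      # keep UTF-8 characters unescaped
        separators=(",", ":"),   # no optional whitespace
    )


def sign(obj) -> str:
    """SHA3-512 hex digest of the canonical UTF-8 string."""
    return hashlib.sha3_512(canonicalize(obj).encode("utf-8")).hexdigest()


def handle_signature_generate(body: bytes, now: Optional[int] = None) -> bytes:
    """Framework-agnostic body of a hypothetical POST signature-generate handler."""
    document = json.loads(body.decode("utf-8"))
    # Assumption: the timestamp lives inside the "disclosure" object.
    document["disclosure"]["timestamp"] = int(time.time()) if now is None else now
    # The digest covers the timestamped document before "integrity" is added,
    # so a validator must strip that key before re-hashing.
    signed = dict(document, integrity=sign(document))
    return canonicalize(signed).encode("utf-8")


response = handle_signature_generate(
    '{"disclosure": {"key2": "✅ content", "key1": "text"}}'.encode("utf-8"),
    now=1740917935,
)
```

Keeping the handler independent of any web framework means the same function could be wired into whichever server the demo API ends up using.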

Testing

The integrity signatures issued for Disclosure #1 and Disclosure #2 should be equal.

Disclosure #1

{
    "disclosure": {
        "timestamp": 1740917935,
        "key1": "This is the key1 content",
        "key2": "✅ this is content with non-ASCII characters",
        "key3": "Цей конте́нт містить кирилічні символи",
        "key4": {}
    }
}

Disclosure #2

{"disclosure": {"key3": "Цей конте́нт містить кирилічні символи", "key1": "This is the key1 content","timestamp": 1740917935,"key4": {},"key2": "✅ this is content with non-ASCII characters"}}
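
The equality test for the two disclosures above could be automated along these lines. The canonicalize function repeats an assumed rule set (sorted keys, unescaped UTF-8, compact separators) and is not taken from the OETP; the expected digest is deliberately not hard-coded because the final serialization rules are not yet fixed.

```python
import hashlib
import json

DISCLOSURE_1 = """
{
    "disclosure": {
        "timestamp": 1740917935,
        "key1": "This is the key1 content",
        "key2": "✅ this is content with non-ASCII characters",
        "key3": "Цей конте́нт містить кирилічні символи",
        "key4": {}
    }
}
"""

DISCLOSURE_2 = '{"disclosure": {"key3": "Цей конте́нт містить кирилічні символи", "key1": "This is the key1 content","timestamp": 1740917935,"key4": {},"key2": "✅ this is content with non-ASCII characters"}}'


def canonicalize(obj) -> str:
    # Assumed rules: sorted keys, unescaped UTF-8, no optional whitespace.
    return json.dumps(obj, sort_keys=True, ensure_ascii=False, separators=(",", ":"))


def sign(text: str) -> str:
    """Parse the JSON text, canonicalize it, and return the SHA3-512 hex digest."""
    canonical = canonicalize(json.loads(text))
    return hashlib.sha3_512(canonical.encode("utf-8")).hexdigest()


signature_1 = sign(DISCLOSURE_1)
signature_2 = sign(DISCLOSURE_2)
assert signature_1 == signature_2
```

This sketch does not apply Unicode normalization, so two visually identical strings that differ in composed versus decomposed form (for example the accented letter in key3) would still produce different signatures. Whether normalization belongs in the standard is a separate decision.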

Equivalence

The generated integrity hash should be equal to the output of https://emn178.github.io/online-tools/sha3_512.html or https://www.strerr.com/en/sha3_512.html for the same serialized string.

Benefits of the implementation

  • Ability to decouple the front-end and the back-end of the disclosure process during the signature phase.
  • Ability to decentralize disclosure process.
  • Publishing the documentation on the signature process will make validation more transparent, bring trust, and allow others to implement it too.

Future steps

  • Implementation of the signature-validate endpoint, which will be backwards-compatible with the disclosures made to date.
  • Transition to the new infrastructure with the decoupled form and the signature
  • Update to the OETP documentation
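
For the signature-validate step, a minimal sketch of the core check might look like this. It assumes the signature is stored under a top-level integrity key and was computed over the rest of the document with sorted keys, unescaped UTF-8, and compact separators; none of these details are fixed by the OETP yet. It only covers signatures produced under those assumed rules and does not address disclosures signed by the current implementation.

```python
import hashlib
import hmac
import json


def canonicalize(obj) -> str:
    # Assumed rules: sorted keys, unescaped UTF-8, no optional whitespace.
    return json.dumps(obj, sort_keys=True, ensure_ascii=False, separators=(",", ":"))


def validate(signed: dict) -> bool:
    """Recompute the digest over everything except the hypothetical 'integrity' key."""
    claimed = signed.get("integrity")
    if not isinstance(claimed, str):
        return False
    unsigned = {key: value for key, value in signed.items() if key != "integrity"}
    expected = hashlib.sha3_512(canonicalize(unsigned).encode("utf-8")).hexdigest()
    # Compare as bytes in constant time.
    return hmac.compare_digest(expected.encode("utf-8"), claimed.encode("utf-8"))


document = {"disclosure": {"timestamp": 1740917935, "key1": "This is the key1 content"}}
digest = hashlib.sha3_512(canonicalize(document).encode("utf-8")).hexdigest()
signed_document = dict(document, integrity=digest)

is_valid = validate(signed_document)
tampered = json.loads(json.dumps(signed_document))
tampered["disclosure"]["key1"] = "edited"
is_tampered_valid = validate(tampered)
```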

Labels

documentation, enhancement
