Marcodaum/Distributed-Privacy-Preserving-Data-Sharing

Distributed Privacy-Preserving Data Sharing Prototype Application

Introduction

The prototype application for distributed, privacy-preserving data sharing combines the benefits of decentralized file distribution with strong anonymity. By utilizing the widely used Sphinx Packet Protocol, it becomes nearly impossible for outsiders to determine the addresses of distributors (seeders) or downloaders (leechers), provided that a sufficient number of mix nodes is configured.

This prototype is built on the Drasyl overlay network and has a simple structure consisting of one or more trackers and a set of peers. Every peer-to-peer (P2P) message is routed through a mix network composed of randomly selected peers, which hides the communication endpoints from outside observers.

Terminology

The prototype application incorporates multiple concepts from popular file-sharing protocols, such as BitTorrent. As a result, there is significant overlap in terminology. Below are some key terms to be aware of when using the application:

  • Seeder: A node that distributes a file.
  • Leecher: A node that downloads a file.
  • Tracker: A node responsible for providing the addresses of seeders for a requested file to a leecher.
  • Piece: A segment of a file requested by the leecher from a seeder.
  • Chunk: A smaller part of a piece, split to fit within a Sphinx message.
  • Intermediate Node: The node that receives the seeder’s response before forwarding it.
  • Final Destination Node: The node to which the intermediate node forwards the data.
  • Session Key: A symmetric AES key generated by the leecher and used by the seeder to encrypt the piece and message header.
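The session key can be illustrated with a minimal sketch. It builds the 48-byte PIECEREQUEST payload described in the Protocol Description (16-byte IV followed by a 32-byte AES key, no divider) and uses it to encrypt a piece. The class name SessionKeySketch and all method names are hypothetical, and the cipher mode AES/CBC/PKCS5Padding is an assumption: the README only states that AES is used with a 16-byte IV.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, not part of the prototype's code base.
final class SessionKeySketch {
    static final int IV_LENGTH = 16;   // bytes, from the PIECEREQUEST payload description
    static final int KEY_LENGTH = 32;  // bytes, i.e. AES-256
    // Assumption: the cipher mode is not named in the README; CBC is used for illustration only.
    static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    // Leecher side: generate a fresh IV and key and concatenate them as IV || key.
    static byte[] newPieceRequestPayload() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(KEY_LENGTH * 8);
            byte[] key = generator.generateKey().getEncoded();
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv);
            byte[] payload = new byte[IV_LENGTH + KEY_LENGTH];
            System.arraycopy(iv, 0, payload, 0, IV_LENGTH);
            System.arraycopy(key, 0, payload, IV_LENGTH, KEY_LENGTH);
            return payload;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Seeder side: encrypt a piece with the key material taken from the payload.
    static byte[] encrypt(byte[] payload, byte[] piece) {
        return run(Cipher.ENCRYPT_MODE, payload, piece);
    }

    // Leecher side: decrypt the received piece with the same key material.
    static byte[] decrypt(byte[] payload, byte[] ciphertext) {
        return run(Cipher.DECRYPT_MODE, payload, ciphertext);
    }

    private static byte[] run(int mode, byte[] payload, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode,
                    new SecretKeySpec(payload, IV_LENGTH, KEY_LENGTH, "AES"),
                    new IvParameterSpec(payload, 0, IV_LENGTH));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the leecher would call newPieceRequestPayload() once per request and keep the payload to decrypt the returned piece.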

Getting Started

Setting Up

Since every message is routed through n mix nodes, the network requires at least n initial peers to function properly. Without a sufficient number of mix nodes, message routes may not be sufficiently obfuscated, potentially allowing P2P communication between nodes to be traced.
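The requirement of at least n peers can be sketched as follows. The example picks n distinct mix nodes at random from the known peers and fails if fewer than n are available. MixRouteSketch and selectRoute are hypothetical names; how the prototype actually selects its route (for example, whether sender and receiver are excluded) is not described here and is left out.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the prototype's code base.
final class MixRouteSketch {
    // Picks `hops` distinct mix nodes uniformly at random from the known peers.
    static List<String> selectRoute(List<String> knownPeers, int hops) {
        if (hops > knownPeers.size()) {
            throw new IllegalArgumentException(
                    "need at least " + hops + " peers, but only " + knownPeers.size() + " are known");
        }
        List<String> shuffled = new ArrayList<>(knownPeers);
        Collections.shuffle(shuffled, new SecureRandom());
        // Copy the sublist so the result does not keep a view on the shuffled list.
        return new ArrayList<>(shuffled.subList(0, hops));
    }
}
```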

Make sure you have the following directory structure set up:

.
├── README.md
├── config_files
│   ├── torrent-files.json
│   └── tracker-files.json
├── identities
├── input
├── output
├── pom.xml
└── src
    └──...

Before launching the application, it is essential to configure the prototype with the necessary initial state parameters for trackers and peers. Currently, all configuration files are stored in the config_files/ directory.

  • If a peer is initially providing a file, it must be placed inside the input/ directory.
  • Any files downloaded later will be stored in the output/ directory.
  • All Drasyl identity files are stored in the identities/ directory.

The application also depends on the following library, which implements the Sphinx packet format: https://github.com/rsoultanaev/java-sphinx. A compiled JAR file is included in the project's root directory. After installing the Maven dependencies specified in pom.xml, add the compiled library manually to the project's dependency settings.

tracker-files.json

This configuration defines parameters for the centralized tracker(s). It includes an entry for each file initially known to the tracker. Since the torrent application supports multiple trackers, the DrasylAddress of the specific tracker managing a file must be specified. Additionally, the DrasylAddress of all initially seeding peers can be configured, though this is often unnecessary, as seeders automatically register with the tracker.

In large networks, a tracker may be aware of numerous seeders. Sending the full list to a peer requesting a file can lead to excessive network traffic and complicate file management. To mitigate this, the number of seeders shared by the tracker can be limited.

Furthermore, each peer can specify a maximum number of seeders to download pieces from, which can be configured when instantiating a new peer.

For example, the tracker configuration for a file named testfile might look like this:

{
  "files": [
    {
      "filename": "testfile",
      "trackerAddress": "6dec92333f223ae52ab13735a7dcba3b3fc21faf208d81036aea9107a3a640a3",
      "seederIds": [
        "0fad3056af191ce7a5f51395139b59d8063061f51d961afa4bc640863c7bdda3"
      ],
      "maxNoOfSeeders": 50
    }
  ]
}

Additionally, the tracker accepts peers that register themselves as seeders for a file not previously known to the tracker. In such cases, the tracker assigns a default value for maxNoOfSeeders, which is a constant defined in the main class and can be configured as needed.
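A minimal sketch of how a tracker could cap the seeder list at maxNoOfSeeders and build the comma-separated FILEINFO payload is shown below. SeederListSketch and its methods are hypothetical names, and choosing the subset at random is an assumption; the README does not say how the tracker picks which seeders to share.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the prototype's code base.
final class SeederListSketch {
    // Returns at most maxNoOfSeeders entries. Assumption: a random subset is shared.
    static List<String> limitSeeders(List<String> seederIds, int maxNoOfSeeders) {
        List<String> copy = new ArrayList<>(seederIds);
        Collections.shuffle(copy);
        int limit = Math.max(0, Math.min(maxNoOfSeeders, copy.size()));
        return new ArrayList<>(copy.subList(0, limit));
    }

    // Joins the selected seeder ids with "," and encodes them as UTF-8.
    static byte[] encodeFileInfoPayload(List<String> seederIds) {
        return String.join(",", seederIds).getBytes(StandardCharsets.UTF_8);
    }
}
```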

torrent-files.json

Similar to other P2P file-sharing applications, each peer requires a torrent file containing essential information about the file to be downloaded or shared. In addition to the filename and the DrasylAddress of the responsible tracker, the total file size (in bytes) must be specified. This information enables peers to divide the file into manageable pieces and apply padding as needed.

In this prototype application, each torrent file is represented as an entry in the torrent-files.json configuration. Currently, filenames must be unique and used only once, as they serve as unique keys for identifying file-related parameters. However, this requirement may change in future versions.

For example, the torrent configuration for a file named testfile might look like this:

{
  "files": [
    {
      "filename": "testfile",
      "trackerAddress": "6dec92333f223ae52ab13735a7dcba3b3fc21faf208d81036aea9107a3a640a3",
      "sizeInBytes": 17428
    }
  ]
}
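To show why sizeInBytes is needed, the following sketch divides a file into pieces by offset and length. PiecePlanSketch is a hypothetical name, and the even split with ceiling division is an assumption; the prototype's actual piece and padding scheme may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the prototype's code base.
final class PiecePlanSketch {
    /** One planned piece: its index, byte offset in the file, and unpadded length. */
    static final class Piece {
        final int index;
        final int offset;
        final int length;

        Piece(int index, int offset, int length) {
            this.index = index;
            this.offset = offset;
            this.length = length;
        }
    }

    // Splits the file into pieces of equal size; only the last piece may be shorter.
    // For very small files this can yield fewer pieces than requested.
    static List<Piece> plan(int sizeInBytes, int requestedPieces) {
        if (sizeInBytes <= 0 || requestedPieces <= 0) {
            throw new IllegalArgumentException("size and piece count must be positive");
        }
        // Ceiling division in long arithmetic so the intermediate sum cannot overflow.
        int pieceSize = (int) (((long) sizeInBytes + requestedPieces - 1) / requestedPieces);
        List<Piece> pieces = new ArrayList<>();
        for (long offset = 0; offset < sizeInBytes; offset += pieceSize) {
            int length = (int) Math.min(pieceSize, sizeInBytes - offset);
            pieces.add(new Piece(pieces.size(), (int) offset, length));
        }
        return pieces;
    }
}
```

For the 17428-byte testfile and five requested pieces, this plan yields four pieces of 3486 bytes and a final piece of 3484 bytes, so the last piece would need 2 bytes of padding to reach the common piece size.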

Starting Trackers and Peers

First, the tracker must be started by creating a new instance of the tracker object. A unique trackerId must be assigned, though it does not affect the program’s logic. Instead, it is used to locate the corresponding identity file within the identities/ directory.

Since the tracker currently replies only through Single-Use Reply Blocks (SURBs), the number of mixes on the reply route cannot be configured on the tracker; it is predetermined by the requesting peer. Once instantiated, the tracker starts automatically.

After initializing the tracker, ensure that the correct tracker DrasylAddress is configured in both the torrent-files.json and tracker-files.json files.

int trackerId = 0;
Tracker tracker = new Tracker(trackerId);

Second, peers can be created, with two available options:

  • Single Peer
  • Peer Pool (recommended): A group of peers serving the same purpose—either exclusively seeding or downloading (and subsequently distributing the file). Before creating peers, the torrent files must be loaded. A peer pool can then be initialized by calling the createPeerPool() function. Like the tracker, peers start automatically upon creation.

At least one seeding peer must exist initially. Its DrasylAddress can either be preconfigured in the tracker-files.json configuration or dynamically registered with the tracker if not already known.

The FileOwnership.ownsFile option indicates that the peer already has the file locally, stored in the input/ directory. This peer will immediately register as a seeder at the tracker.

The FileOwnership.requestsFile option directs the peer to request file pieces from existing seeders. Once the peer has downloaded the entire file, it registers itself as a distributor for that file on the tracker.

For example, creating and starting four seeders and five leechers could look like this:

HashMap<String, TorrentFile> torrentFiles = TorrentFiles.getTorrentFiles();

int noOfSeedingPeers = 4;
int noOfDownloadingPeers = 5;
// Specifying 0 for maxSeedersPerFile divides a file into as many pieces as there are seeders available
int maxSeedersPerFile = 0;
ArrayList<Peer> seeder = PeerPool.createPeerPool(noOfSeedingPeers, torrentFiles.get("testfile"), maxSeedersPerFile, FileOwnership.ownsFile);
ArrayList<Peer> leecher = PeerPool.createPeerPool(noOfDownloadingPeers, torrentFiles.get("testfile"), maxSeedersPerFile, FileOwnership.requestsFile);

Protocol Description

Each message is transmitted as a standard Drasyl message. Its payload contains a Sphinx message, encrypted multiple times using symmetric keys derived from a Diffie-Hellman key exchange with each mix node. The encapsulated data includes a message wrapper and a chunk wrapper. The following message types are available:

| Value | Enum | Payload | Description |
|-------|------|---------|-------------|
| 0 | FILEINFOREQUEST | NymTuple (see description below) | A peer uses this message type to send a request to the tracker, asking for seeding peers. |
| 1 | FILEINFO | Seeder addresses (UTF-8 encoded SphinxNodeIds of seeders for a file, each 4 bytes long), concatenated using , as divider | The tracker responds to the FILEINFOREQUEST with the seeders for the corresponding file. |
| 2 | PIECEREQUEST | AES IV (16 bytes) and key (32 bytes) for the session key, concatenated without divider | The seeder receives a request to provide one piece of a file. |
| 3 | PIECEDATA | Pure data of one piece (might be chunked) | The leecher receives one piece of a file, previously requested through PIECEREQUEST. |
| 4 | FILEPUBLICATION | - | A peer sends a message with MessageType FILEPUBLICATION to the tracker, indicating it will serve as a seeder for the specified file. |
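The mapping between the numeric values and the message types can be sketched as a Java enum. The enum name MessageTypeSketch and its methods are hypothetical; only the constant names and values are taken from the table above.

```java
// Hypothetical illustration of the value-to-type mapping; not the prototype's own enum.
enum MessageTypeSketch {
    FILEINFOREQUEST(0),
    FILEINFO(1),
    PIECEREQUEST(2),
    PIECEDATA(3),
    FILEPUBLICATION(4);

    private final int value;

    MessageTypeSketch(int value) {
        this.value = value;
    }

    // The value as it would appear in the first byte of the message header.
    byte toByte() {
        return (byte) value;
    }

    // Looks up the type for a header byte and rejects unknown values.
    static MessageTypeSketch fromByte(byte raw) {
        for (MessageTypeSketch type : values()) {
            if (type.value == raw) {
                return type;
            }
        }
        throw new IllegalArgumentException("unknown message type: " + raw);
    }
}
```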

Message Wrapper

When nodes communicate with each other, they always wrap their payload with a Message Wrapper, adding additional header information. The header is structured as follows:

Message Header

| Byte No. | Length | Type | Unit | Name | Description |
|----------|--------|------|------|------|-------------|
| 0 | 1 | Byte | MessageType | MessageType | The message type of the message. |
| 1 | 1 | Byte | Bytes | Filename length | The length of the filename. |
| 2 | 24 | String (UTF-8 encoded) | - | Filename | The filename of the associated file. |
| 26 | 4 | Int | Bytes | Payload Length | The payload length. |
| 30 | 4 | Int | - | Mix Node ID | A mix node ID. For FILEINFOREQUEST, the entry mix node ID is set so the tracker sends the SURB to that mix node. For PIECEREQUEST, the intermediate mix node ID is set so the seeder addresses the piece to the intermediate node. |
| 34 | 4 | Int | - | Total Pieces per File | The total number of pieces requested for one file. It is only set for PIECEREQUEST and PIECEDATA. |
| 38 | 4 | Int | Bytes | Piece Size | The requested piece size. It is only set for PIECEREQUEST and PIECEDATA. |
| 42 | 4 | Int | Bytes | Offset | The offset in bytes from the beginning of the file where the requested piece is located. It is only set for PIECEREQUEST and PIECEDATA. |
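The byte offsets above add up to a 46-byte header, which the following sketch encodes with a ByteBuffer. MessageHeaderSketch and its methods are hypothetical. Big-endian integers and zero padding of the unused filename bytes are assumptions; the README does not state either.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder for the header layout in the table; not the prototype's own code.
final class MessageHeaderSketch {
    static final int FILENAME_FIELD = 24; // fixed width of the filename field in bytes
    static final int HEADER_LENGTH = 46;  // 1 + 1 + 24 + 5 * 4

    static byte[] encode(byte messageType, String filename, int payloadLength,
                         int mixNodeId, int totalPieces, int pieceSize, int offset) {
        byte[] name = filename.getBytes(StandardCharsets.UTF_8);
        if (name.length > FILENAME_FIELD) {
            throw new IllegalArgumentException("filename exceeds " + FILENAME_FIELD + " bytes");
        }
        // Assumption: ByteBuffer's default big-endian byte order matches the prototype.
        ByteBuffer buffer = ByteBuffer.allocate(HEADER_LENGTH);
        buffer.put(messageType);        // byte 0
        buffer.put((byte) name.length); // byte 1
        buffer.put(name);               // bytes 2..25, remainder stays zero (assumption)
        buffer.position(2 + FILENAME_FIELD);
        buffer.putInt(payloadLength);   // byte 26
        buffer.putInt(mixNodeId);       // byte 30
        buffer.putInt(totalPieces);     // byte 34
        buffer.putInt(pieceSize);       // byte 38
        buffer.putInt(offset);          // byte 42
        return buffer.array();
    }

    // Reads the filename back using the length stored in byte 1.
    static String decodeFilename(byte[] header) {
        int length = header[1] & 0xFF;
        return new String(header, 2, length, StandardCharsets.UTF_8);
    }
}
```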

Chunk Wrapper

Sometimes, messages may exceed the predefined Sphinx packet size, requiring additional fragmentation. To accommodate this, encoded message data can be divided into separate chunks, enabling the transmission of even large messages.
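The fragmentation step can be sketched as a plain split of the encoded message into slices that fit one packet, followed by concatenation on the receiving side. ChunkSplitSketch is a hypothetical name, and maxChunkPayload stands for the Sphinx body length minus any wrapper overhead, a value the README does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the prototype's code base.
final class ChunkSplitSketch {
    // Splits a message into consecutive slices of at most maxChunkPayload bytes.
    static List<byte[]> split(byte[] message, int maxChunkPayload) {
        if (maxChunkPayload <= 0) {
            throw new IllegalArgumentException("chunk payload size must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int from = 0; from < message.length; from += maxChunkPayload) {
            int to = Math.min(from + maxChunkPayload, message.length);
            chunks.add(Arrays.copyOfRange(message, from, to));
        }
        return chunks;
    }

    // Reassembles the original message from chunks given in chunk-number order.
    static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] chunk : chunks) {
            total += chunk.length;
        }
        byte[] message = new byte[total];
        int position = 0;
        for (byte[] chunk : chunks) {
            System.arraycopy(chunk, 0, message, position, chunk.length);
            position += chunk.length;
        }
        return message;
    }
}
```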

Chunking is particularly important for PIECEDATA messages, as pieces often surpass the body length of a Sphinx packet. Since the seeder does not send chunks directly to the final destination but instead routes them through an intermediate node designated by the leecher, additional header information is required.

Because the intermediate node serves as a cryptographic endpoint, it would not know whether to process the encrypted chunks or forward them to another peer. To prevent this, the payload is encapsulated within a chunk wrapper, which contains only the essential parameters needed by the intermediate node to forward the message, along with the encrypted message (message wrapper and payload). Furthermore, the chunk wrapper includes parameters necessary for reconstructing the original message from its individual chunks.

The chunk wrapper consists of the following fields:

| Byte No. | Length | Type | Unit | Name | Description |
|----------|--------|------|------|------|-------------|
| 0 | 4 | Int | - | Chunk No | The number of the current chunk. |
| 4 | 1 | Boolean | - | Last Chunk Flag | A flag indicating that the chunk is the last one for the specific piece. Once a chunk with this flag arrives, the application knows that the piece consists of the chunks from 0 up to that number and can tell which ones are still missing. |
| 5 | 1 | Boolean | - | Encryption Flag | A flag indicating an encrypted payload. It is set both for the intermediate node and the final destination (leecher). |
| 6 | 1 | Boolean | - | Final Destination Flag | A flag indicating whether the current peer is the final destination or simply an intermediate node. |
| 7 | 1 | Boolean | - | Encrypted Destination Drasyl Address Set Flag | A flag indicating whether the encrypted destination address is set. It is unset after the intermediate node has processed the chunk. |
| 8 | 256 | String (UTF-8 encoded) | - | Encrypted Destination Drasyl Address | The final destination address, encrypted with the RSA public key of the intermediate node. The intermediate node decrypts it with its private key and forwards the chunk to that address. |
| 264 | 4 | Int | - | Request ID | The request ID of the piece. Through the request ID, chunks can be associated with one piece. It is only set for PIECEREQUEST and PIECEDATA message types. |
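The fields above form a 268-byte chunk header. The sketch below encodes it and shows how the Last Chunk Flag lets a receiver list the chunks that are still missing. ChunkWrapperSketch and its methods are hypothetical. Encoding booleans as 0 or 1, big-endian integers, and treating the encrypted address as 256 raw bytes are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical encoder for the chunk wrapper layout; not the prototype's own code.
final class ChunkWrapperSketch {
    static final int ADDRESS_FIELD = 256; // size of the encrypted address field in bytes
    static final int HEADER_LENGTH = 268; // 4 + 4 * 1 + 256 + 4

    // Pass null as encryptedAddress when the address flag should be unset.
    static byte[] encodeHeader(int chunkNo, boolean lastChunk, boolean encrypted,
                               boolean finalDestination, byte[] encryptedAddress, int requestId) {
        boolean addressSet = encryptedAddress != null;
        if (addressSet && encryptedAddress.length != ADDRESS_FIELD) {
            throw new IllegalArgumentException("encrypted address must be " + ADDRESS_FIELD + " bytes");
        }
        ByteBuffer buffer = ByteBuffer.allocate(HEADER_LENGTH);
        buffer.putInt(chunkNo);                                  // byte 0
        buffer.put(flag(lastChunk)).put(flag(encrypted));        // bytes 4 and 5
        buffer.put(flag(finalDestination)).put(flag(addressSet)); // bytes 6 and 7
        if (addressSet) {
            buffer.put(encryptedAddress);                        // bytes 8..263
        }
        buffer.position(8 + ADDRESS_FIELD);
        buffer.putInt(requestId);                                // byte 264
        return buffer.array();
    }

    // Assumption: booleans are encoded as a single byte holding 0 or 1.
    private static byte flag(boolean value) {
        return (byte) (value ? 1 : 0);
    }

    // Once the last chunk number is known, every number in 0..lastChunkNo must arrive.
    static List<Integer> missingChunks(Set<Integer> received, int lastChunkNo) {
        List<Integer> missing = new ArrayList<>();
        for (int chunkNo = 0; chunkNo <= lastChunkNo; chunkNo++) {
            if (!received.contains(chunkNo)) {
                missing.add(chunkNo);
            }
        }
        return missing;
    }
}
```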

NymTuple

The NymTuple is used to embed data into a SURB. Therefore, it must be transferred between peers. Since encoding the NymTuple object is not straightforward, the following table shows the structure of a serialized NymTuple.

Note: Since reply blocks are sent directly between peers, no nymserver is needed, making the Ktilde entry obsolete. However, the java-sphinx library always requires the Ktilde entry to be set. For simplicity, the prototype application also transmits this value.

| Element No. | Length | Type | Unit | Name |
|-------------|--------|------|------|------|
| 0 | 4 | Int | Bytes | Node Length |
| 1 | Node Length | Byte[] | - | Node |
| 2 | 4 | Int | Bytes | Header (Alpha) Length |
| 3 | Header (Alpha) Length | ECPoint | - | Header (Alpha) |
| 4 | 4 | Int | Bytes | Header (Beta) Length |
| 5 | Header (Beta) Length | Byte[] | - | Header (Beta) |
| 6 | 4 | Int | Bytes | Header (Gamma) Length |
| 7 | Header (Gamma) Length | Byte[] | - | Header (Gamma) |
| 8 | 4 | Int | Bytes | Ktilde Length |
| 9 | Ktilde Length | Byte[] | - | Ktilde |
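The serialized form is a sequence of length-prefixed fields, which can be sketched generically as below. LengthPrefixedSketch is a hypothetical name. The sketch treats every field as an already encoded byte array (so the Alpha ECPoint would have to be encoded to bytes beforehand) and assumes big-endian 4-byte length prefixes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the length-prefixed layout; not the prototype's own code.
final class LengthPrefixedSketch {
    // Writes each field as a 4-byte length followed by the field bytes.
    // For a NymTuple the order would be: node, alpha, beta, gamma, ktilde.
    static byte[] serialize(List<byte[]> fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] field : fields) {
            // Assumption: lengths are big-endian, ByteBuffer's default byte order.
            out.write(ByteBuffer.allocate(4).putInt(field.length).array(), 0, 4);
            out.write(field, 0, field.length);
        }
        return out.toByteArray();
    }

    // Reads fields back until the input is exhausted.
    static List<byte[]> deserialize(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        List<byte[]> fields = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte[] field = new byte[buffer.getInt()];
            buffer.get(field);
            fields.add(field);
        }
        return fields;
    }
}
```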

Limitations

Since the application is designed to demonstrate the concepts developed in the corresponding paper, its functionality is intentionally limited in certain aspects. The following details outline these limitations:

  1. The relative paths for files must be the same on all peers; they are currently set to the input/ directory for files to be seeded and the output/ directory for downloads.
  2. A peer can only load one torrent file at a time, meaning it can only download and distribute a single file.
  3. A peer distributing one file cannot download another file.
  4. All peers must be configured with the same Sphinx body length.
  5. Chunking is only supported for the PIECEDATA message type. All other messages must fit within the predefined Sphinx body length. However, configuring a body length greater than 1024 bytes eliminates the need for chunking with the other message types.
  6. Anonymity is currently only guaranteed for the leecher. Full anonymity for all participants, as described in the corresponding paper, is currently not supported. However, the application should be extendable to that anonymity level without great effort.
  7. The maximum supported file size is 2³² bytes (4 GB), as the file size is represented by an integer value (4 Bytes).
  8. Seeders can only distribute a file if they have fully downloaded it.
  9. Filenames must be unique and no longer than 24 characters (including the file extension).
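The filename constraints from the last item can be checked before starting any peers, as in this minimal sketch. TorrentEntryCheckSketch is a hypothetical name, and measuring the limit in UTF-8 bytes rather than characters is an assumption based on the 24-byte filename field in the message header.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check, not part of the prototype's code base.
final class TorrentEntryCheckSketch {
    static final int MAX_FILENAME_BYTES = 24; // width of the filename field in the header

    // Throws if a filename is empty, too long for the header field, or used twice.
    static void validateFilenames(List<String> filenames) {
        Set<String> seen = new HashSet<>();
        for (String name : filenames) {
            int length = name.getBytes(StandardCharsets.UTF_8).length;
            if (length == 0 || length > MAX_FILENAME_BYTES) {
                throw new IllegalArgumentException("filename must be 1 to 24 bytes: " + name);
            }
            if (!seen.add(name)) {
                throw new IllegalArgumentException("duplicate filename: " + name);
            }
        }
    }
}
```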
