Kushal Das

FOSS and life. Kushal Das talks here.

kushal76uaid62oup5774umh654scnu5dwzh4u2534qxhcbi4wbab3ad.onion

Targeted WebID for privacy in Solid

In my last post I talked about the privacy issues caused by static, public WebIDs in Solid. In this post I try to explain one way to preserve privacy. Later I will submit a proposal (after figuring out how) to update the original specifications as required.

Targeted WebID for each unique client

Instead of returning the same WebID every time, the OP (OpenID Provider) can return a targeted WebID based on the client asking for the information. This targeted WebID stays the same for each unique client and user pair, and it can be recomputed later. This way every service accessing a Solid Pod server will see a different URL as the WebID, and those URLs cannot be used to correlate information across services.

We will have to update the OP (IDP) so that it can either calculate the targeted WebID itself or ask a different service for it every time.
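One way an OP could compute such a targeted WebID, so that it is stable for each (user, client) pair and can be recomputed later without storing anything, is a keyed hash. Below is a minimal sketch assuming HMAC-SHA256 with a server-side secret. The function name `targeted_webid`, the base URL `https://pod.example/targeted/` and the `/profile/card#me` path layout are hypothetical illustrations, not part of the Solid specification or of the proposal itself.

```python
import hashlib
import hmac


def targeted_webid(secret_key: bytes, user_webid: str, client_id: str,
                   base_url: str = "https://pod.example/targeted/") -> str:
    """Derive a stable, per-client WebID for one user (hypothetical scheme).

    The same (user_webid, client_id) pair always yields the same URL,
    while different clients get unrelated-looking URLs.
    """
    # Bind user and client together; the NUL separator keeps
    # ("ab", "c") and ("a", "bc") from producing the same message.
    message = f"{user_webid}\x00{client_id}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    # Use the first 32 hex characters (128 bits) as the path segment.
    return f"{base_url}{digest[:32]}/profile/card#me"


# The secret stays on the OP side and is never shared with clients.
key = b"server-side secret"
me = "https://kushaldas.solidcommunity.net/profile/card#me"
a = targeted_webid(key, me, "https://app-one.example")
b = targeted_webid(key, me, "https://app-two.example")
print(a == targeted_webid(key, me, "https://app-one.example"))  # True
print(a == b)  # False
```

Without the secret key, a client cannot link two targeted WebIDs back to each other or to the original one.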

Below, I modified the official example flow to show (in steps 19 and 20) how this can be achieved.

Sequence diagram

This raises the question of how users will see all the WebIDs that were issued for them.

That can be done by marking one client as the primary viewer/editor for the user; you can think of it as a wallet. This Solid application will be able to get the original WebID, and using that, the wallet can find all the issued WebIDs in the user's Pod. This depends on the implementation details of the Pod server. Maybe all targeted WebIDs (and related Pods) will be stored in a different namespace, maybe not.
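Where this record lives is left open above, but the lookup the wallet needs is simple. Here is a minimal in-memory sketch of it. The `TargetedWebIDRegistry` class and its methods are hypothetical; a real Pod server would persist this data in the Pod and restrict access to the wallet client.

```python
from collections import defaultdict


class TargetedWebIDRegistry:
    """Hypothetical record of which targeted WebIDs went to which clients.

    Keyed by the user's original WebID, so only a client that knows the
    original WebID (the "wallet") can look up the full list.
    """

    def __init__(self) -> None:
        # original WebID -> {client_id: targeted WebID}
        self._issued: dict[str, dict[str, str]] = defaultdict(dict)

    def record(self, original_webid: str, client_id: str, targeted: str) -> None:
        """Remember that `targeted` was handed out to `client_id`."""
        self._issued[original_webid][client_id] = targeted

    def issued_for(self, original_webid: str) -> dict[str, str]:
        """Return a copy of every targeted WebID issued for this user."""
        return dict(self._issued.get(original_webid, {}))


registry = TargetedWebIDRegistry()
me = "https://kushaldas.solidcommunity.net/profile/card#me"
registry.record(me, "https://app-one.example",
                "https://pod.example/targeted/1a2b/profile/card#me")
print(registry.issued_for(me))
```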

I will write more in the next post.

Solid Project, WebID and privacy

In my last post I mentioned the Solid Project, and while digging more into it I ran into more questions about privacy. Let us break it down from the beginning:

What is Solid Project & WebID?

Solid is a specification that lets people store their data securely in decentralized data stores called Pods. Pods are like secure personal web servers for data.

If you dig into the actual specification, you will find this paragraph about WebID.

In line with Linked Data principles, a WebID is a HTTP URI that, when dereferenced, resolves to a profile document that is structured data in an RDF 1.1 format. This profile document allows people to link with others to grant access to identity resources as they see fit. WebIDs underpin Solid and are used as a primary identifier for Users in this specification.
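To make "dereferenced" concrete: a WebID usually carries a fragment such as `#me` that names the person, and the part before `#` is the URL of the RDF profile document. The sketch below uses only the standard library and builds the HTTP request without sending it, asking for Turtle, one RDF 1.1 serialization. The WebID `https://kushaldas.solidcommunity.net/profile/card#me` is an assumed example of the usual solidcommunity.net layout.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def profile_request(webid: str) -> Request:
    """Build (but do not send) an HTTP request for a WebID's profile document."""
    # Split "https://.../card#me" into the document URL and the "me" fragment.
    document_url, _fragment = urldefrag(webid)
    # Ask the server for a Turtle representation of the RDF profile.
    return Request(document_url, headers={"Accept": "text/turtle"})


req = profile_request("https://kushaldas.solidcommunity.net/profile/card#me")
print(req.full_url)              # https://kushaldas.solidcommunity.net/profile/card
print(req.get_header("Accept"))  # text/turtle
# urllib.request.urlopen(req) would then fetch the profile document.
```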

One person can have more than one WebID (say, one for work, one for personal details, and one from the government). Services will use the WebID you provide, or the one provided by your digital wallet (some Solid application running somewhere), which in turn may come from a government-provided service. WebIDs provided by an agency (private or government) can be verified based on the issuer.

This WebID is the unique identifier in the Solid world, the core of the Linked Data. If one service can get someone's WebID and identify the person, that service (or any other service) can correlate uses of the same WebID across all other services. You don’t need magical code; just look for the same WebID.
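How little code that takes is easy to show. If a few services share or leak their logs, a static WebID works as a ready-made join key. The services, WebIDs and log entries in this toy sketch are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical access logs from three unrelated services:
# (service, WebID presented, what the user did).
logs = [
    ("bank.example", "https://alice.pod.example/profile/card#me", "opened account"),
    ("clinic.example", "https://alice.pod.example/profile/card#me", "booked appointment"),
    ("shop.example", "https://bob.pod.example/profile/card#me", "bought a book"),
    ("shop.example", "https://alice.pod.example/profile/card#me", "bought medicine"),
]

# Group activity by WebID: no matching heuristics needed, just equality.
profile = defaultdict(list)
for service, webid, action in logs:
    profile[webid].append(f"{service}: {action}")

for webid, actions in profile.items():
    print(webid)
    for action in actions:
        print("   ", action)
```

If each service had instead seen a different URL for Alice, this grouping would find nothing to join.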

For government-issued WebIDs this becomes even easier, as the person has no choice about which ID to provide. Whatever mechanism the agency uses for identification will provide the same WebID every time (after the person identifies to the IDP service). One similar usage is documented in the flow diagram here.

In my mind this is a privacy nightmare. The WebID spec has a section about security considerations, but nothing about privacy implications.

One way of dealing with this could be a separate service that provides random pseudo WebIDs to the IDP (unique to each application asking for the resource, based on the aud claim), which the IDP then passes back to the client (wallet). I will write a separate blog post with a sequence diagram to explain it better. Maybe it will work, maybe not.
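A rough sketch of such a pseudonym service, assuming it keeps a table keyed on the real WebID and the token's `aud` value and fills it with random identifiers on first use. The class name `PseudoWebIDService`, its method, and the `https://pseudo.example/` base URL are hypothetical; nothing here is an existing Solid API.

```python
import secrets


class PseudoWebIDService:
    """Hypothetical service handing out random pseudo WebIDs to an IDP.

    Each (real WebID, aud) pair gets one random pseudo WebID, created on
    first request and reused afterwards, so one application always sees
    the same identifier while different applications see unrelated ones.
    """

    def __init__(self, base_url: str = "https://pseudo.example/") -> None:
        self.base_url = base_url
        self._table: dict[tuple[str, str], str] = {}

    def pseudo_webid(self, real_webid: str, aud: str) -> str:
        key = (real_webid, aud)
        if key not in self._table:
            # 16 random bytes, URL-safe encoded; unrelated to the real WebID.
            token = secrets.token_urlsafe(16)
            self._table[key] = f"{self.base_url}{token}#me"
        return self._table[key]


service = PseudoWebIDService()
me = "https://kushaldas.solidcommunity.net/profile/card#me"
first = service.pseudo_webid(me, "https://app-one.example")
again = service.pseudo_webid(me, "https://app-one.example")
other = service.pseudo_webid(me, "https://app-two.example")
print(first == again)  # True
print(first == other)  # False
```

Because the values are random, the service must persist this table (here only an in-memory dict). Losing it would change every application's view of the user.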

Using Python to access a Solid Pod


From the project website:

Solid is a specification that lets people store their data securely in decentralized data stores called Pods. Pods are like secure personal web servers for your data.

We can host these Pods on personal servers or at any provider. Everything is tied to the user's identity, called a WebID: an HTTP URI that resolves to an RDF document.

You decide who or what can access your data. Applications and humans can use Solid authentication to identify themselves to the Pod server and access the data using open protocols.

How to get a Pod?

The website lists current vendors who provide Pod services. If you want to play around locally, you can run the Community Solid Server on your local system. Or just create an account at solidcommunity.net and use that to learn more.

Using Python with Solid

We already have a solid-file Python module. You can install it via pip in a virtual environment.

$ python3 -m venv .venv
$ source .venv/bin/activate
$ python3 -m pip install solid-file

For the rest of the example code, I am going to use my Pod at solidcommunity.net; feel free to replace the username, password, and URL values in the code as required.

from solid.auth import Auth
from solid.solid_api import SolidAPI

# Replace these with your own account details.
USERNAME = "kushaldas"
PASSWORD = "******************************************"

IDP = 'https://solidcommunity.net'
POD_ENDPOINT = "https://kushaldas.solidcommunity.net/public/"

# Create the auth object, wrap it in the API client, then log in.
auth = Auth()
api = SolidAPI(auth)
auth.login(IDP, USERNAME, PASSWORD)

Here we import the module, create an Auth object, and identify ourselves using the username and password.

Then we check whether a folder exists (it does not exist yet), and create it in that case.

# POD_ENDPOINT already ends with "/", so don't add another one.
folder_url = f"{POD_ENDPOINT}languages/"
if not api.item_exists(folder_url):
    print(api.create_folder(folder_url))

The output is <Response [201 Created]>.

Next, we create two text files.

import io

# Wrap the file contents in an in-memory binary stream.
data = io.BytesIO("I ❤️ 🦀".encode("utf-8"))
file_url = f"{folder_url}hello.txt"
print(api.put_file(file_url, data, 'text/plain'))

data = io.BytesIO(b"Already 10 years of SOPA blackout")
msg_url = f"{folder_url}message.txt"
print(api.put_file(msg_url, data, 'text/plain'))
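Building these URLs with f-strings means tracking trailing slashes by hand. A small alternative sketch uses the standard library's `urllib.parse.urljoin`, which resolves a relative name against a container URL. It assumes, as Solid does, that container (folder) URLs end with `/`; without it, `urljoin` replaces the last path segment instead of appending to it.

```python
from urllib.parse import urljoin

POD_ENDPOINT = "https://kushaldas.solidcommunity.net/public/"

# Containers end with "/", so relative names are appended to them.
folder_url = urljoin(POD_ENDPOINT, "languages/")
file_url = urljoin(folder_url, "hello.txt")
msg_url = urljoin(folder_url, "message.txt")

print(folder_url)  # https://kushaldas.solidcommunity.net/public/languages/
print(file_url)    # https://kushaldas.solidcommunity.net/public/languages/hello.txt

# Without the trailing slash the last segment is replaced, not extended:
print(urljoin("https://kushaldas.solidcommunity.net/public", "languages/"))
# https://kushaldas.solidcommunity.net/languages/
```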

We can then list all of the items under our subfolder.

folder_data = api.read_folder(folder_url)
# Collect the name of every file in the container, one per line.
files = "\n".join(item.name for item in folder_data.files)
print(f'Files in the folder: \n{files}')

Then we can read one of the files we just created.

resp = api.get(file_url)
print(f"**{resp.text}**")

Output:

Files in the folder: 
hello.txt
message.txt

Why am I looking at Solid?

Solid, as a specification, is evolving along with its community. One use case for any government organization would be letting citizens control access to their own data and authorize who gets to see it. Solid can become the answer to that question. The specification is loose enough to allow building things on top of it easily.

I would love to see Python as a major part of this ecosystem. The solid-file project maintainers are doing a great job. But we need more related projects, including proper examples of various use cases. Maybe a server too.