My phone has more friends than me. It talks to more peers (computers) than the number of human beings I talk on an average. In this age of smartphones and mobile apps for A-Z things, we are dependent on these technologies. However, at the same time, we don’t know much of what is going on in the computers equipped with powerful cameras, GPS device, microphone we are carrying all the time. All these apps are talking to their respective servers (or can we call them masters?), but, there is no easy way to track them.
These questions bothered me for a long time: I wanted to see the servers my phone is connecting to, and I want to block those connections as I wish. However, I never managed to work on this. A few weeks ago, I finally sat down to start working to build up a system by reusing already available open source projects and tools to create the system, which will allow me to track what my phone is doing. Maybe not in full details, but, at least shed some light on the network traffic from the phone.
I tried to create a wifi hotspot at home using a Raspberry Pi and then started
capturing all the packets from the device using standard tools (
later reading through the logs using Wireshark. This procedure meant that I
could only capture when I am connected to the network at home. What about when I
am not at home?
This time I took a bit different approach. I chose algo to create a VPN server. Using WireGuard, it became straightforward to connect my iPhone to the VPN. This process also allows capturing all the traffic from the phone very easily on the VPN server. A few days in the experiment, Kashmir started posting her experiment named Life Without the Tech Giants, where she started blocking all the services from 5 big technology companies. With her help, I contacted Dhruv Mehrotra, who is a technologist behind the story. After talking to him, I felt that I am going in the right direction. He already posted details on how they did the blocking, and you can try that at home :)
Looking at the data after 1 week
After capturing the data for the first week, I moved the captured pcap files into my computer. Wrote some Python code to put the data into a SQLite database, enabling me to query the data much faster.
Domain Name System (DNS) data
The Domain Name System (DNS) is a decentralized system which helps to translate the human memory safe domain names (like kushaldas.in) into Internet Protocol (IP) addresses (like 192.168.1.1 ). Computers talk to each other using these IP addresses, we, don’t have to worry to remember so many names. When the developers develop their applications for the phone, they generally use those domain names to specify where the app should connect.
If I plot all the different domains (including any subdomain) which got queried at least 10 times in a week, we see the following graph.
The first thing to notice is how the phone is trying to find servers from Apple, which makes sense as this is an iPhone. I use the mobile Twitter app a lot, so we also see many queries related to Twitter. Lookout is a special mention there, it was suggested to me by my friends who understand these technologies and security better than me. The 3rd position is taken by Google, though sometimes I watch Youtube videos, but, the phone queried for many other Google domains.
There are also many queries to Akamai CDN service, and I could not find any easy way to identify those hosts, the same with Amazon AWS related hosts. If you know any better way, please drop me a note.
You can see a lot of data analytics related companies were also queried.
dev.appboy.com is a major one, and thankfully algo already blocked that domain
in the DNS level. I don’t know which app is trying to connect to which all
servers, I found about a few of the apps in my phone by searching about the
client list of the above-mentioned analytics companies. Next, in coming months,
I will start blocking those hosts/domains one by one and see which all apps stop
Looking at data flow
The number of DNS queries is an easy start, but, next I wanted to learn more about the actual servers my phone is talking to. The paranoid part inside of me was pushing for discovering these servers.
If we put all of the major companies the phone is talking to, we get the following graph.
Apple is leading the chart by taking 44% of all the connections, and the number is 495225 times. Twitter is in the second place, and Edgecastcdn is in the third. My phone talked to Google servers 67344 number of times, which is like 7 times less than the number of times Apple itself.
In the next graph, I removed the big players (including Google and Amazon).
Then, I can see that analytics companies like
have 31% of the connections, which is a lot. Most probably I will start with
blocking these two first. The 3 other CDN companies, Akamai, Cloudfront, and
Cloudflare has 8%, 7%, and 6% respectively. Do I know what all things are these
companies tracking? Nope, and that is scary enough that one of my friend
commented “It makes me think about throwing my phone in the garbage.”
What about encrypted vs unencrypted traffic? What all protocols are being used? I tried to find the answer for the first question, and the answer looks like the following graph. Maybe the number will come down if I try to refine the query and add other parameters, that is a future task.
As I said earlier, I am working on creating a set of tools, which then can be deployed on the VPN server, that will provide a user-friendly way to monitor, and block/unblock traffic from their phone. The major part of the work is to make sure that the whole thing is easy to deploy, and can be used by someone with less technical knowledge.
How can you help?
The biggest thing we need is the knowledge of “How to analyze the data we are capturing?”. It is one thing to make reports for personal user, but, trying to help others is an entirely different game altogether. We will, of course, need all sorts of contributions to the project. Before anything else, we will have to join the random code we have, into a proper project structure. Keep following this blog for more updates and details about the project.
Note to self
Do not try to read data after midnight, or else I will again think a local address as some random dynamic address in Bangkok and freak out (thank you reverse-dns).