Peccavi: 2010-06

Saturday 26 June 2010

Apple and location based tracking

Kim Cameron just published an interesting blog post about the latest changes to Apple’s iTunes Terms of Use. He noticed a rather alarming change to the Privacy Policy which I’m ashamed to say I missed entirely when I blithely accepted the changes earlier this week. My bad. Anyway, these terms of use are a no-opt-out agreement that you have to accept if you want to use iTunes, activate your shiny new iPhone or deploy any apps to it.

The kicker is this part of the revised document:

Apple and our partners and licensees may collect, use, and share precise location data, including the real-time geographic location of your Apple computer or device. This location data is collected anonymously in a form that does not personally identify you and is used by Apple and our partners and licensees to provide and improve location-based products and services. For example, we may share geographic location with application providers when you opt in to their location services.

Note the weasel words at the end – they may share it with those people, but the policy says nothing about whether they can also share it with others and doesn’t clarify that at all. Also note that it is not just Apple – it is Apple and its partners and licensees.

Kim’s follow-up on a Consumerist piece from June 21st indicates that the change was made a number of days before I saw the Terms of Use update on my iPhone, so the timing is a bit of a puzzle. Still, the point remains – Apple is building a huge database of participants who have “consented” to being put in a global location tracking database. Kim’s right in pointing out that the timing of this change is a bit suspect given the high-profile attention being focussed on Google’s location tracking practices of late. A key reason for doing this must be that they hope to defend their location tracking practices against the legal challenges they expect now that Google’s WiFi ID scanning has become such a serious issue.

However, there is another timing issue that should be borne in mind. The reason Apple are now more interested in location tracking, and precise location tracking at that, seems pretty obvious to me – accurate location data makes the new iOS 4 iAd feature* a killer advertising platform. Minority Report’s directed advertising only skims the surface of the possibilities – linking individually directed advertising to locations, and more specifically to location patterns, makes the sort of things we’ve seen before (Google adverts for BP when I’m reading about oil slicks, for example) seem trivial. Imagine the power of an iAd that knows what your location patterns are, and the sort of pre-emptive advertising that could support – trivially, we’re talking about inserting an advert for Burger King as your phone realises you are following a regular route to McDonald’s. The problem here is that for this to work Apple has to give this detailed location data to a whole bunch of people you probably do not want watching your every move. No doubt Google hoped to gather similar data (and possibly already do, with their Google Latitude product and Android phones), but Apple have cut directly to the chase as far as their customers are concerned.

The “partners and licensees may collect, use, and share precise location…” phrase got me thinking – if “licensees” were to include your employer, could they use the data to track your specific location at all times? What if a private investigator wanted to be a licensee? Could they just pull in anyone’s location data they wished? How about PETA, Greenpeace or someone like the BNP in the UK? I’m pretty hopeful there will be serious controls to prevent those specific scenarios but, honestly, how can you be sure?

Kim also points out that when someone figures out how to map this data onto a larger uber-database maintained by one of the global WiFi identifier scanning operations, it’s really hard not to see this as a major privacy threat. The problem comes back once again to the use of globally unique identifiers and how they can be used to make undesirable connections between data sets. However, if the iAd motivation is behind this then Apple really do need a globally unique identifier. The value in this data for advertising is that it is globally unique and personally identifying. Apple’s claims that it is not are absolute rubbish – the globally unique device ID of someone’s phone is just as personally identifying as a real fingerprint.

I certainly think this is an issue (and clearly Kim does) but we seem to be in a fairly small minority at the moment. Looking at the coverage of the Google WiFi scanning debacle it’s interesting (and depressing) to note that there is almost no attention being paid to the privacy problems of “just” scanning for device identifiers.

* For some limited interpretation of the term “feature” – not one that’s really useful for end users but great for advertisers, obviously.

Wednesday 23 June 2010

Progress

From iFixit – via ArsTechnica

The always cool folks at iFixit have provided a nice disassembly and exploded final view of Apple’s latest phone.

That’s the entire logic board of the new iPhone 4. Similar in style to the equally minute logic board on the iPad and not far off actual size (at least on my screen). It’s astonishing to think that embedded in this we have:

1 GHz CPU
Memory and IO controllers
Power management and Systems Management circuitry
512MB RAM
16-32GB of Solid State Flash Storage
3-axis Accelerometer \ 3-axis Gyroscope – a full 6-axis IMU \ Compass
Bluetooth \ 802.11b\g\n Radio + FM Receiver (and Transmitter, but disabled thus far)
Tri Band GSM Radio
PentaBand WCDMA 3G Radio
GPS (12 Channel)
Proximity sensor
Ambient light sensor
Multi-touch screen controller
USB Controller
960x640 video controller
MicroSIM reader
Stereo Microphone \ Speaker Hi Fi Audio
5 Megapixel Still \ 720p HD Video Camera
0.5 Megapixel Video Camera

OK, so there are a few other peripheral bits that actually play some part in those roles, but the level of integration is incredible in any case. That’s 6 different radios (18 if you count each GPS receiver channel separately), 4 environmental sensors and more processing, storage and graphics power than a high-end PC from 2000\2001, all in a strip that doesn’t take up much more volume than a credit card.

Whatever way you look at it that’s an amazing level of progress. If the same rate of change continues we’ll have the same sort of capabilities available in fingernail sized devices by 2020.

Sunday 20 June 2010

Mobile Fingerprinting

Kim Cameron has been following through with some additional musings on the issues that have emerged from the Google WiFi Geolocation database debate and gives us a personal example from 2005 that shows how Bluetooth isn’t necessarily all that safe and how a simple behaviour (discoverability) can turn into a powerful tracking technology. It’s notable that even in 2005, when the idea of building a global database of identifiers was just a pipe dream, the problems were fairly clear as far as Kim was concerned.

I’d made the point in my earlier post that, because these issues had been highlighted fairly early on in the commercial proliferation of Bluetooth, the manufacturers had pretty much sorted things out by adopting much safer defaults and implementing features like timeouts for discoverability. Newer devices are, by and large, better at keeping themselves quiet. Out of curiosity I just enabled Bluetooth on my iPhone and laptop, scanned for nearby devices and found a total of 4 – my own two obviously showed up for each other, but apparently someone called Danielle* has a phone nearby and there’s some other Bluetooth device that I could probably identify if I tried to connect to it, but I’m so not going there now. So even though there have been improvements in the field, there are still some problems. As an example of how this can be done intelligently, the iPhone’s Bluetooth is only “discoverable” while the Bluetooth menu is open; discoverability is disabled once you close that menu.

There’s also the entire field of malicious interception of “secured” Bluetooth comms. It’s a sad fact that many devices use very poor pairing techniques, and compromising the integrity of many supposedly secure Bluetooth connections isn’t particularly hard. From a casual user’s point of view the safer defaults still serve a useful purpose – an entity like Google could never launch a global project to harvest Bluetooth IDs using those techniques. That doesn’t stop some random attacker targeting individuals or small groups, but at least it prevents large-scale abuse, as I pointed out in my earlier post. As a healthy reminder that my casual remark about the Bluetooth folks making some good decisions shouldn’t be taken as a statement that Bluetooth is safe in any way, here’s a link to a presentation at this year’s ShmooCon about Bluetooth keyboards which is really disturbing, especially (but not only) if you are still using XP.

*Names have been changed to protect those whose devices are poorly configured. :)

Friday 18 June 2010

This ain't going to end well

Given my recent focus on WiFi scanning, geolocation and the potential for abuses of such data, I was intrigued and horrified in equal measure when I read this Gizmodo article on the upcoming Nintendo 3DS.

I'm not sure that it significantly adds anything new or increases the risks above and beyond those caused by the WiFi capability of current-generation Nintendo DS\DSi handhelds, but the "always on even when powered off" aspect seems a lot less like a good idea to me today than it would have a week or two ago, and the silent integration with a WiFi geolocation system doesn't make me feel all warm and fuzzy, I have to say.

Tuesday 8 June 2010

So how much does a MAC address tell about you?

Kim Cameron responded to my previous post, making some very good points – you shouldn’t just dismiss the parts that can’t be solved by fixing the technology.

I’d discounted the payload snooping issue as a distraction because I’d believed (and still do) that it was almost certainly an unfortunate error. I’d then made the point that a legal barrier to a technical problem was insufficient to stop the bad guys doing bad things, but I used that as an excuse to ignore the problem. Small-scale abuses of this sort of thing are not good, but systematic large-scale abuses “benefit” from network scaling effects. You might not be able to prevent small-scale\illegal abuse through legal means, but that does not mean you can’t control large-scale abuse this way. The benefits and dangers inherent in this data become disproportionately worse as the scale of the database that contains it increases. Large scale means companies, and companies react to regulation by being much more careful about what they do. If a technology that is already out there has major privacy issues, the regulatory approach is the only way to keep a lid on the problem while the technologists argue about how to fix the bits. Even if we assume that the law was OK with companies creating geo-location databases using WiFi SSID\MAC mapping, effective regulation would have made the additional mistake made by Google (assuming it was a mistake) much less likely.

Now the obvious question is should scanning for identifiers that are broadcast openly by all WiFi radio signals be acceptable and legal?

802.11 WiFi signals are pretty complex things – Wikipedia has a brief overview here for those who want to see the alphabet soup of standards involved. Despite the range of encoding\modulation schemes and the number of frequency bands and channels, almost all 802.11 devices revert to a couple of basic communication modes. This makes it easy for devices to connect to each other, and it’s what makes public WiFi hotspots practical. However, it also makes configuring a device to monitor WiFi traffic trivially easy – the hardware does all the heavy lifting and the standards don’t really do anything to stop it happening. An important feature of WiFi is that, even though the payload encryption standards can now be pretty robust, the data link layer is not protected from snooping. This means that the content (my Google searches, the video clip I’m streaming down from YouTube etc.) can be kept pretty well away from prying eyes, but at what the Ethernet folks call layer 2, the logical structures called frames that carry your encrypted data transmit some control data in the open.
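To make the layer 2 point concrete, here is a minimal Python sketch that pulls the address fields out of the standard 24-byte 802.11 MAC header. It assumes you already have the raw frame bytes (say from a capture file) with no radiotap or other capture header in front, and it only handles the common 3-address layout. The demonstration frame is hypothetical: made-up addresses, and zero bytes standing in for the encrypted payload. Note that setting the Protected Frame flag changes nothing about what the header reveals.

```python
import struct
from dataclasses import dataclass


def fmt_mac(raw: bytes) -> str:
    """Render 6 raw bytes as the usual colon-separated hex string."""
    return ":".join(f"{b:02x}" for b in raw)


@dataclass
class MacHeader:
    frame_type: int   # 0 = management, 1 = control, 2 = data
    subtype: int
    to_ds: bool       # frame is heading towards the distribution system (AP side)
    from_ds: bool     # frame is coming from the distribution system
    protected: bool   # payload is encrypted (e.g. WPA2)
    addr1: str
    addr2: str
    addr3: str


def parse_mac_header(frame: bytes) -> MacHeader:
    """Parse the fixed part of a 3-address 802.11 MAC header.

    The header itself is sent in the clear even when `protected` is True.
    """
    if len(frame) < 24:
        raise ValueError("frame too short for a 24-byte 802.11 header")
    # Frame Control and Duration/ID are little-endian 16-bit fields.
    fc, _duration = struct.unpack_from("<HH", frame, 0)
    return MacHeader(
        frame_type=(fc >> 2) & 0x3,
        subtype=(fc >> 4) & 0xF,
        to_ds=bool(fc & 0x0100),
        from_ds=bool(fc & 0x0200),
        protected=bool(fc & 0x4000),
        addr1=fmt_mac(frame[4:10]),
        addr2=fmt_mac(frame[10:16]),
        addr3=fmt_mac(frame[16:22]),
    )


# Hypothetical client-to-AP data frame: type 2 (data), ToDS set, Protected set.
AP = bytes.fromhex("020000000001")      # made-up address standing in for the AP
CLIENT = bytes.fromhex("020000000002")  # made-up address standing in for a phone
fc = (2 << 2) | 0x0100 | 0x4000
frame = struct.pack("<HH", fc, 0) + AP + CLIENT + AP + struct.pack("<H", 0)
frame += bytes(16)  # stand-in for the encrypted payload

hdr = parse_mac_header(frame)
print(hdr.protected, hdr.addr1, hdr.addr2)  # True 02:00:00:00:00:01 02:00:00:00:00:02
```

For a frame going from a client to its AP (ToDS set, FromDS clear), addr1 is the AP’s BSSID and addr2 is the client’s own MAC address, so a passive listener learns who is talking to which network without touching the encrypted payload at all.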

So even with WPA2’s thorough key management and AES encryption, your WiFi traffic still contains quite a bit of chatter that isn’t hidden away. The really critical thing for me is that the layer 2 addresses, the Media Access Control (MAC) addresses, of the sender and receiver (generally your PC\Phone’s WiFi adaptor and your Access Point) are always visible in every frame. And remember that MAC addresses are globally unique identifiers by design. Individual WiFi networks are defined by another identifier, the Service Set Identifier or SSID – when you set up your home WiFi AP and call the network “MyWLAN” you are choosing an SSID. SSIDs are very important (you can’t connect to a wireless LAN without knowing the relevant SSID), but they are not secret: even though they can be sort of hidden, they are never protected and can always be seen by someone just watching your wireless traffic. Interestingly, SSIDs are not globally unique – there’s generally no real issue so long as my chosen SSID doesn’t match that of another network that’s relatively close by.
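The “sort of hidden” part is easy to illustrate. The SSID travels as an ordinary tagged element (element ID 0) in management frames: a “hidden” access point usually just sends an empty SSID element in its beacons, but a client that already knows the network will typically ask for it by name in its probe requests, in the clear. The Python sketch below assumes you already have the management frame body with the 24-byte header stripped off; it walks the tagged elements and returns the SSID. The byte strings are hand-built hypothetical examples, not real captures.

```python
from typing import Iterator, Optional, Tuple

SSID_ELEMENT_ID = 0
BEACON_FIXED_FIELDS = 12  # timestamp (8) + beacon interval (2) + capability info (2)


def iter_elements(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (element_id, value) pairs from a run of 802.11 tagged elements."""
    i = 0
    while i + 2 <= len(buf):
        eid, length = buf[i], buf[i + 1]
        value = buf[i + 2:i + 2 + length]
        if len(value) < length:  # truncated element: stop rather than guess
            return
        yield eid, value
        i += 2 + length


def find_ssid(elements: bytes) -> Optional[str]:
    """Return the SSID ('' if hidden/empty), or None if no SSID element exists."""
    for eid, value in iter_elements(elements):
        if eid == SSID_ELEMENT_ID:
            return value.decode("utf-8", errors="replace")
    return None


# Hypothetical beacon body from a "hidden" AP: zeroed fixed fields, empty SSID
# element, then a one-byte Supported Rates element (ID 1).
beacon_body = bytes(BEACON_FIXED_FIELDS) + b"\x00\x00" + b"\x01\x01\x82"
# Hypothetical probe request body from a client looking for its home network.
probe_body = b"\x00\x06MyWLAN" + b"\x01\x01\x82"

print(repr(find_ssid(beacon_body[BEACON_FIXED_FIELDS:])))  # ''
print(repr(find_ssid(probe_body)))                          # 'MyWLAN'
```

Hiding the SSID only blanks it in the beacon; anyone listening while a known client reconnects gets the name anyway.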

So SSIDs are possibly visible but MAC addresses are definitely visible, and MAC addresses are unique. While driving along a street or sitting in a coffee shop, hotel lobby or conference room, your WiFi adaptor will see dozens if not hundreds of WiFi packets, all of which will contain globally unique MAC addresses. It is possible to hack some WiFi hardware to change the MAC address, but that practice is rare. Your PC has a couple (one for the wired Ethernet adaptor, which isn’t important here, and usually one for WiFi these days), your Wii\PS3\Xbox 360 has one, and so does your Nintendo DS, iPhone, PSP … you get the picture. Another feature of MAC addresses is that it is very easy to tell the MAC address of a Linksys Access Point from that of an iPhone or a Nintendo DS, because the first three bytes identify the manufacturer – network protocol analyzers have been doing that trick for decades.
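The classification trick is simple: the first three bytes of a universally administered MAC address are the Organizationally Unique Identifier (OUI) that the IEEE assigns to the manufacturer, and the two low-order bits of the first byte say whether the address is a group (multicast) address and whether it was locally assigned rather than burned in by the manufacturer. Here is a small Python sketch of that; the vendor table contains placeholder entries for illustration, not entries looked up in the IEEE registry, and a real tool would load the published registry instead.

```python
from typing import Dict, NamedTuple


class MacInfo(NamedTuple):
    oui: str
    vendor: str
    multicast: bool
    locally_administered: bool


def parse_mac(text: str) -> bytes:
    """Accept 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' and return 6 bytes."""
    raw = bytes.fromhex(text.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"not a 48-bit MAC address: {text!r}")
    return raw


def classify(text: str, vendors: Dict[str, str]) -> MacInfo:
    mac = parse_mac(text)
    oui = mac[:3].hex(":").upper()   # first three bytes, e.g. '00:11:22'
    local = bool(mac[0] & 0x02)      # U/L bit: 1 = locally administered
    multicast = bool(mac[0] & 0x01)  # I/G bit: 1 = group address
    if local:
        vendor = "unknown (locally administered)"
    else:
        vendor = vendors.get(oui, "unknown")
    return MacInfo(oui, vendor, multicast, local)


# Placeholder entries for illustration; not looked up in the IEEE registry.
VENDORS = {
    "00:11:22": "Example access point maker",
    "00:33:44": "Example handheld console maker",
}

print(classify("00:11:22:aa:bb:cc", VENDORS).vendor)
print(classify("02:00:00:00:00:02", VENDORS).locally_administered)
```

A changed MAC address usually shows up with the locally administered bit set, which is itself a hint that someone has deliberately overridden the burned-in address.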

So the systematic scanners out there (Google, Navizon, Skyhook and the rest) can drive around, or recruit volunteers, to gather location data and build databases of unique identifiers, device types, timestamps, signal strengths and possibly other data. The simplest (and most benign) use of that would be to pull out the IDs of devices that are known to be fixed to one place (Access Points, say) and use them to enable geo-location.
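I don’t know what algorithms Google, Skyhook or Navizon actually use, but a very simple technique that shows the principle is a signal-strength-weighted centroid: look up the surveyed positions of the access points a device can hear and average them, giving more weight to stronger signals. This Python sketch assumes a hypothetical database mapping BSSIDs to (latitude, longitude), treats latitude and longitude as flat over the small area involved, and uses made-up coordinates.

```python
from typing import Dict, Iterable, Optional, Tuple

LatLon = Tuple[float, float]


def weighted_centroid(
    scan: Iterable[Tuple[str, float]],  # (BSSID, RSSI in dBm) pairs from one scan
    ap_db: Dict[str, LatLon],           # hypothetical survey database
) -> Optional[LatLon]:
    """Estimate position as the RSSI-weighted mean of known AP positions."""
    total = lat_sum = lon_sum = 0.0
    for bssid, rssi_dbm in scan:
        pos = ap_db.get(bssid)
        if pos is None:
            continue  # AP never surveyed: ignore it
        weight = 10 ** (rssi_dbm / 10)  # dBm -> milliwatts, so stronger = heavier
        lat_sum += weight * pos[0]
        lon_sum += weight * pos[1]
        total += weight
    if total == 0:
        return None
    # Plain averaging of lat/lon is only reasonable over a few hundred metres.
    return lat_sum / total, lon_sum / total


AP_DB = {  # made-up survey data
    "02:00:00:00:00:01": (51.5000, -0.1200),
    "02:00:00:00:00:03": (51.5010, -0.1180),
}
scan = [("02:00:00:00:00:01", -60.0), ("02:00:00:00:00:03", -60.0),
        ("02:00:00:00:00:99", -40.0)]
print(weighted_centroid(scan, AP_DB))  # roughly (51.5005, -0.119)
```

The point is how little is needed: once the fixed identifiers have been surveyed, any device that can hear a couple of them can be placed on a map.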

It’s not a big leap to also track the MAC addresses that are more mobile. Get enough data points over a couple of months or years and the database will certainly contain many repeat detections of mobile MAC addresses at many different locations, with a decent chance of being able to identify a home or work address to go with each one. Kim Cameron describes the start of this cascade effect in his most recent post: mapping the attendees at a conference to home addresses, even when they’ve never consented to any such tracking, is not going to be hard if you’ve gone to the trouble of scanning every street in every city in the country. With a little further analysis the same techniques could be used to get a good idea of the travel or shopping habits of almost everyone sitting in an airport departure lounge, or the home addresses of everyone participating in a Stop The War protest.
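To show how little analysis this needs, here is a hypothetical Python sketch that guesses a home location for one device: count where its MAC address has been seen during night-time hours and return the most frequent place if it turns up often enough. The sighting records, place labels, night-time window and threshold are all invented for illustration; a real scanning database would hold coordinates and far more data, which only makes the inference stronger.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple

Sighting = Tuple[str, datetime, str]  # (MAC address, when seen, place label)
NIGHT_HOURS = frozenset(range(0, 6)) | frozenset(range(22, 24))  # assumed window


def likely_home(sightings: Iterable[Sighting], mac: str,
                min_count: int = 3) -> Optional[str]:
    """Most common night-time place for `mac`, if seen there at least min_count times."""
    counts = Counter(
        place for seen_mac, when, place in sightings
        if seen_mac == mac and when.hour in NIGHT_HOURS
    )
    if not counts:
        return None
    place, n = counts.most_common(1)[0]
    return place if n >= min_count else None


PHONE = "02:00:00:00:00:02"  # made-up MAC address
SIGHTINGS = [
    (PHONE, datetime(2010, 6, 1, 23, 10), "street-17"),
    (PHONE, datetime(2010, 6, 3, 23, 45), "street-17"),
    (PHONE, datetime(2010, 6, 9, 1, 5), "street-17"),
    (PHONE, datetime(2010, 6, 8, 14, 0), "conference-centre"),
    ("02:00:00:00:00:05", datetime(2010, 6, 1, 23, 0), "street-40"),
]
print(likely_home(SIGHTINGS, PHONE))  # street-17
```

Join the result with the daytime sighting at the conference centre and you have exactly the attendee-to-home-address mapping Kim describes.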

And remember that even though you can only effectively use WiFi to send and receive data over a range of a few tens of metres to maybe 100m, you can detect and read WiFi signals easily from hundreds to thousands of metres away without any special equipment.
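The asymmetry between usable range and detectable range falls out of the radio arithmetic. The standard free-space path loss formula (distance in km, frequency in MHz) is FSPL = 20·log10(d) + 20·log10(f) + 32.44 dB, so every tenfold increase in distance costs only another 20 dB. The Python sketch below assumes an idealised line-of-sight path, a 20 dBm transmitter on channel 6 (2437 MHz) and no antenna gain. Under those assumptions a frame arrives at about -80 dBm at 1 km, which is weak for high-rate traffic but still above the sensitivity figures often quoted for the lowest 802.11 data rates (around -90 dBm). Real buildings lose far more, and a directional antenna on the listener’s side claws a lot of it back.

```python
import math


def fspl_db(distance_m: float, freq_mhz: float) -> float:
    """Free-space path loss in dB (standard form with d in km and f in MHz)."""
    return 20 * math.log10(distance_m / 1000) + 20 * math.log10(freq_mhz) + 32.44


def received_dbm(tx_dbm: float, distance_m: float, freq_mhz: float,
                 rx_antenna_gain_db: float = 0.0) -> float:
    """Idealised received power: transmit power plus gain minus free-space loss."""
    return tx_dbm + rx_antenna_gain_db - fspl_db(distance_m, freq_mhz)


CHANNEL_6_MHZ = 2437.0
for d in (10, 100, 1000, 3000):
    print(f"{d:>5} m: {received_dbm(20.0, d, CHANNEL_6_MHZ):6.1f} dBm")
```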

The plans to blanket London with “Free WiFi” start to sound quite disturbing when you think about those possibilities.

To answer my own title question – MAC addresses can tell far more about you than you think and keeping databases of where and when they’ve been seen can be extremely dangerous in terms of privacy.

What about Bluetooth?

Bluetooth is a slightly different animal. It’s also a short-range radio standard for data communications, but it was developed from the ground up to replace wires and the folks building the standard got a lot of stuff right. It doesn’t appear to be all that bad from a privacy leakage perspective – when implemented correctly nothing is sent in clear text (the entire frame is encoded, not just the payload) and the frequency-hopping RF behaviour makes it much harder to casually snoop on specific conversations. Bluetooth devices have a Bluetooth Device Address (BD_ADDR) that is very like a MAC address (48 bits), with a manufacturer ID that enables broad classification of devices if the address can be discovered, but most Bluetooth devices keep it hidden most of the time by defaulting to a “not visible” mode even when Bluetooth is enabled. When actively communicating (paired) all data is encrypted, so the device IDs are not visible to a third party. Almost all modern Bluetooth devices only allow themselves to remain openly visible for a short period of time before they revert to a safer non-broadcasting mode. The main weakness is that when devices are set to “visible” the unique identifiers and other data can be scanned remotely and used in just the same way as scanned WiFi MAC addresses. That’s not to say that Bluetooth doesn’t have its share of security problems, but its designers made an attempt to get some of the fundamentals right. It also shows that there is a practical way to approach the wireless privacy challenge, which is good to see.
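For comparison with the WiFi MAC address, the Bluetooth specification splits the 48-bit BD_ADDR into a 24-bit Lower Address Part (LAP), an 8-bit Upper Address Part (UAP) and a 16-bit Non-significant Address Part (NAP); the NAP and UAP together form the 24-bit company identifier, which is where the manufacturer ID comes from. A minimal Python sketch of that split, assuming the usual colon-separated notation with the most significant byte first; the address is made up.

```python
from typing import NamedTuple


class BdAddr(NamedTuple):
    nap: int  # Non-significant Address Part, 16 bits (most significant)
    uap: int  # Upper Address Part, 8 bits
    lap: int  # Lower Address Part, 24 bits (least significant)

    @property
    def company_id(self) -> int:
        """NAP + UAP: the 24-bit manufacturer identifier."""
        return (self.nap << 8) | self.uap


def split_bd_addr(text: str) -> BdAddr:
    """Split 'NN:NN:UU:LL:LL:LL' into its NAP, UAP and LAP fields."""
    digits = text.replace(":", "")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit BD_ADDR: {text!r}")
    value = int(digits, 16)
    return BdAddr(nap=value >> 32, uap=(value >> 24) & 0xFF, lap=value & 0xFFFFFF)


addr = split_bd_addr("00:11:22:33:44:55")  # made-up address
print(f"{addr.company_id:06X} {addr.lap:06X}")  # 001122 334455
```

The classification only works if the address can be seen in the first place, which is exactly why the non-discoverable default matters.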

Sunday 6 June 2010

Kim Cameron takes on Google’s StreetView

I’ve been following Kim Cameron’s increasingly critical analysis of Google’s StreetView WiFi mapping data privacy debacle with some interest of late.

Some background might be in order for those interested in reading where he’s been coming from – start here and work forward. He’s been quite vocal and directed in his criticism, and I have been surprised that his focus has been almost entirely on Google rather than on the underlying technical root cause. My initial view on the issue was that it was a stupid over-reaction to something that everyone has been doing for years, and that at least Google were being open about having logged too much data. I’m still of the opinion that targeting Google specifically is off base, although I think Kim is right that there is a fundamental problem here.

Kim is probably the pre-eminent proponent and defender of strong authentication and privacy on the net at the moment. His Laws of Identity should be mandatory reading for anyone working with user data in any sort of context but especially for anyone working with online systems. He’s a hugely influential thought leader for doing the right thing and as a key technical leader within Microsoft he’s doing more than almost anyone else to lay the groundwork for a move away from our current reliance on insecure, privacy leaking methods of authentication. Let’s just say that I’m a fan.

For obvious reasons he has spotted the huge privacy problems associated with the practice of gathering WiFi SSIDs and MAC addresses and using them to create large-scale geo-location databases. There are serious privacy issues here and, despite my initial cynicism about this, perhaps it’s a good thing that there has been a huge furore over what Google were doing.

Note that there were two issues in play here – the intentional data (the SSID’s, MAC addresses and geo-location info) and the unintentional data (actual user payloads). I’m only going to talk about the intentionally harvested data right now because that is the much trickier problem – few people would argue that having Google (or anyone) logging actual WiFi traffic from their homes is OK.

The problem that I see with Kim’s general position on this and the focus on Google’s activities alone is that he’s not seeing the wood for the trees. The problem of companies or individuals harvesting this data is minor compared to the problem that enables it. The technical standards that we all use to connect wirelessly with the endless array of devices that we all now have in our homes, use at work and carry on our person every day are promiscuous communicators of identifiers that can be easily and extensively misused. Even if Google are prevented by law from doing it, if the standards aren’t changed then someone else will.

First, some history is in order. Google aren’t the first to do this, not by a long shot. Google are the first to admit that they have harvested more data from these signals than just the base identifiers, but you can be certain that all the other players did too, and many are probably still doing it. Skyhook were the first to exploit the idea commercially as far as I can tell, and they have been partnering with Apple (and Yahoo amongst others) since at least 2008 to provide the fruits of this data to iPhone users. The geo-location capability that iPhones had before the release of the 3G, and that is still used when GPS data is poor, uses that data. Navizon provide Microsoft Live with their data – conveniently described as crowd-sourced – which has a similarly dubious provenance. All three systems use a combination of WiFi SSID\MAC and cellular base station IDs to provide geo-location in the ±100m range. Cisco provide techniques for companies to leverage their WiFi infrastructure to do something similar in reverse – their WiFi management consoles allow administrators to track the physical location of individual devices (i.e. people) within large sites with a high level of accuracy; IIRC Cisco claim to be able to give location data down to the sub-5m range. This is an excellent covert tracking mechanism and it is certainly in common use. Wardriving tools (like Kismet, which Google modified for their StreetView scanners, AirScanner, NetStumbler …) have been around since WiFi first became practical. That the technology enables these sorts of uses is not a sudden revelation. None of this is to claim that any of it is OK, mind you, just that it is a blatantly obvious side effect of the technical standard and it will be used like this as a result.

Kim very rightly points out that just because Skyhook and others did it before them does not absolve Google of responsibility if what they did was an invasion of privacy. It would help, though, if he also pointed out that all the major OS vendors are using exactly the same techniques and are all equally guilty of the same crimes here. That Google have patent applications based on novel [ab]uses of broadcast WiFi signals is no real surprise – I know Intel have had similar things in the pipeline in the past, and I’d be shocked if all the other major companies had missed that trick.

Anyway, the reason all of these companies have used this data is that the 802.11 (and 3GPP\WCDMA\EV-DO cellular) standards make no attempt to secure these things. In fact, the current software\hardware stacks go so far as to actively discourage users from disabling the broadcast features. Kim’s employer’s flagship end-user OS, Windows 7, goes so far as to warn you that you are taking a security risk if you set up an access point that does not broadcast its SSID.

Wireless technologies work by broadcasting data. WiFi uses frequencies that have an effective range of tens of metres in congested, dense buildings and tens of miles in open air. That in itself is no excuse for unauthorised third parties to intercept that data, but if the standard is implemented so poorly that even a blind chicken could not fail to have the data presented to it, then there is no use telling anyone not to make use of it; if it is problematic, the technology should not enable it in the first place. While laws can prevent the likes of Google and Skyhook from harvesting this data in countries that care to be strict about such things, that is no solution to the problem. Kim makes a sincere point about this that, in my view, totally misses the point – under a strict reading of some fairly outdated legislation, anyone logging their neighbour’s WiFi SSID could be guilty of a criminal offence in some jurisdictions. That may or may not be true, I’m not a lawyer, but in any event the fundamental problem is that I cannot prevent you logging that data, even if I wanted to, without denying myself the benefits of using the technology I’ve paid for. And a law that says you shouldn’t is no way to protect me from you.

Kim’s hypothetical child molester can stalk a child using a WiFi adaptor’s MAC address because the people who wrote the operating systems and defined the WiFi standards allow the device to leak that data over the air to untrusted systems. Google’s [alleged] misuse of the data is a minor issue compared to the failure of those who invented and ratified the standards.

The reasons why this is the case are not trivial. It’s not simply that the people involved didn’t know how to make things secure, or that they didn’t care. The reality is that the WiFi standard we have now is a trade-off in which those security aspects were not a priority. It’s probably about time that we made them one, and I’d be very happy to see the current crusade move on from focussing on just Google to going after the IEEE and the 802.11 standards body.
