Ideally there’s not a whole lot of data that needs to be kept.
Legitimately all that needs to be stored is a few things:
- Location (GPS)
- SSIDs (Wifi APs only)
- Cell ID & MCC/MNC (Cell Towers only)
And there are things the provider MUST NOT STORE OR SHARE, like:
- IPs of contributors for longer than a few days
- un-hashed BSSIDs (Wifi/BT)
- MAC addresses (Wifi/BT)
- IMEI/IMSIs (or other cellular identifiers derived from them)
- APs that don’t exist in a fixed location (think mobile hotspot SSIDs) for longer than a fixed amount of time
- BT devices
- Non-unique SSIDs or IDs that suggest no user configuration took place and the manufacturer did not differentiate the device ID. (Things like “SETUP” with no unique suffix, or “[ISP]@HOME” or “[ISP]Wifi”, provide no meaningful discriminators. SSIDs like “SETUP-be3fd34d” would be valid.)
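To make the two lists above a bit more concrete, here is a rough sketch of what filtering a single submitted observation could look like. Everything in it is an assumption for illustration: the record layout, the field names, the `GENERIC_SSIDS` set and the choice of a keyed HMAC-SHA256 for hashing BSSIDs are mine, not something any existing provider is known to use. I picked a keyed hash because the MAC address space is small enough that a plain unsalted hash could likely be brute-forced.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical server-side secret used to key the BSSID hash (assumption).
SECRET_KEY = b"replace-with-a-real-secret"

# Hypothetical examples of SSIDs that carry no unique discriminator.
GENERIC_SSIDS = {"setup", "exampleisp@home", "exampleispwifi"}


def hash_bssid(bssid: str) -> str:
    """Return a keyed hash of the BSSID so the raw MAC is never stored."""
    normalized = bssid.strip().lower()
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def sanitize(obs: dict) -> Optional[dict]:
    """Keep only the fields that need storing; return None to drop the record."""
    kind = obs.get("kind")
    if kind == "wifi":
        ssid = obs.get("ssid", "")
        if ssid.strip().casefold() in GENERIC_SSIDS:
            return None  # no meaningful discriminator, so not worth keeping
        return {
            "kind": "wifi",
            "lat": obs["lat"],
            "lon": obs["lon"],
            "ssid": ssid,
            "bssid_hash": hash_bssid(obs["bssid"]),  # raw BSSID is discarded
        }
    if kind == "cell":
        # Only location, Cell ID and MCC/MNC survive; IMEI/IMSI are never copied.
        return {
            "kind": "cell",
            "lat": obs["lat"],
            "lon": obs["lon"],
            "cell_id": obs["cell_id"],
            "mcc": obs["mcc"],
            "mnc": obs["mnc"],
        }
    return None  # Bluetooth and anything unrecognized is dropped
```

The point of building a new dict from an allow-list of fields, rather than deleting known-bad fields from the input, is that anything unexpected a client sends is dropped by default.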
No.
It is hard to have both, but not impossible. You can still be privacy friendly without sacrificing necessary functionality.
However, it will require that the “provider” of such a dataset constantly scrub, sanitize and remove data that would cause privacy hazards, as quickly as reasonably possible. That in and of itself is a technical challenge, though not an impossible one.
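As a rough idea of what one such scrubbing pass might look like, here is a minimal sketch. The record fields (`seen_at`, `contributor_ip`, `is_mobile`) and both retention windows are assumptions picked purely for illustration. In particular, deciding that an AP is mobile is the hard part and is not shown here.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention windows; the real values are a policy decision.
IP_RETENTION = timedelta(days=3)          # "a few days" for contributor IPs
MOBILE_AP_RETENTION = timedelta(days=30)  # fixed limit for non-stationary APs


def scrub(records: list, now: datetime) -> list:
    """Return the records that may be kept, with expired data removed."""
    kept = []
    for rec in records:
        age = now - rec["seen_at"]
        # APs flagged as mobile are dropped entirely once past the limit.
        if rec.get("is_mobile") and age > MOBILE_AP_RETENTION:
            continue
        cleaned = dict(rec)  # copy so the caller's records are not mutated
        # Contributor IPs are stripped once they are older than the window.
        if age > IP_RETENTION:
            cleaned.pop("contributor_ip", None)
        kept.append(cleaned)
    return kept
```

Something like this would have to run on a schedule over the whole dataset, which is where most of the real engineering effort would go.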