Privacy regulators unite against ‘data scraping’ on social media
Posted: September 26, 2023
Twelve data protection and privacy regulators have signed a joint statement about “data scraping”: the automated collection of publicly available personal data from social media platforms and the open web.
The privacy dangers of data scraping have been highlighted by cases involving Meta, which was fined €265 million last year after a scraping-related data breach, and Clearview AI, a data-scraping and biometric information firm that faces legal action worldwide.
This article explores the main data-scraping risks and mitigations identified in the regulators’ statement.
What is the data-scraping statement about?
The statement is aimed at social media companies and website providers. It outlines 12 data protection and privacy regulators’ concerns about data scraping.
The statement has three objectives:
- To explain the privacy risks of data scraping.
- To guide websites and platform providers on preventing data scraping.
- To inform individuals about data scraping.
The statement was signed by the following regulators, who are all members of the Global Privacy Assembly (GPA) International Enforcement Cooperation Working Group (IEWG):
- Argentina: Agency for Access to Public Information (AAIP)
- Australia: Office of the Australian Information Commissioner (OAIC)
- Canada: Office of the Privacy Commissioner of Canada (OPC)
- Colombia: Superintendencia de Industria y Comercio (SIC)
- Hong Kong: Office of the Privacy Commissioner for Personal Data (PCPD)
- Jersey: Jersey Office of the Information Commissioner (JOIC)
- Mexico: National Institute for Transparency, Access to Information and Personal Data Protection (INAI)
- Morocco: Commission Nationale de Contrôle de la Protection des Données à Caractère Personnel (CNDP)
- New Zealand: Office of the Privacy Commissioner (OPC)
- Norway: Datatilsynet
- Switzerland: Federal Data Protection and Information Commissioner (FDPIC)
- United Kingdom: Information Commissioner’s Office (ICO)
Privacy risks of data scraping
The regulators identify five privacy risks associated with data scraping.
1. Targeted cyberattacks
Data scrapers can collect personal data such as names, addresses, email addresses, and phone numbers, and use it for targeted cyberattacks such as phishing and social engineering.
Here are a few examples of how this type of activity could occur via data scraped from the web:
- A scammer calls an individual pretending to be an employee of the individual’s bank. The scammer confirms detailed information about the individual, which was scraped from the web, to persuade them that the call is genuine.
- A scammer texts an individual pretending to be their son. Using scraped data, the scammer provides sufficiently detailed information to persuade the individual that the text is genuine.
- A scammer pretends to be an individual’s travel agent, using scraped data about their travel plans. The scammer persuades the individual to visit a phishing site and enter their password.
2. Identity fraud
Sometimes, people reveal more sensitive information online than they should. Financial data and other sensitive information represent a goldmine for fraudsters, and certain data-scraping tools can be configured to search for specific data types.
The more personal information a fraudster has about a person, the more convincingly the fraudster can imitate that person.
A fraudster who scrapes sensitive data such as identity numbers, payment information, and “secret answers” could fraudulently set up financial accounts in another person’s name.
3. Monitoring, profiling, and surveilling individuals
Data scrapers can collect data about individuals’ online activities, such as the websites they visit, the products they purchase, and the people they communicate with.
This information could be used to monitor, profile, and surveil individuals without their knowledge or consent.
Of course, information about our online activity is already collected and used in this way via cookies and similar technologies. But privacy controls can—at least in theory—prevent such monitoring.
Scraping might enable companies to circumvent people’s privacy preferences in an age when increasingly strict regulations are intended to give people more control over their personal data.
4. Unauthorized political or intelligence-gathering purposes
Data scrapers can collect large amounts of information about individuals, including political views, religious beliefs, and personal relationships.
This data can then be used by governments for malicious purposes, such as suppressing dissent, targeting political opponents, or blackmail.
Here are a few examples of how this type of activity could occur via data scraped from the web:
- A government scrapes data about political activists abroad and uses this information to intimidate the activists.
- A political party scrapes data about people’s voting habits or political beliefs and uses the data for targeted advertising campaigns (note that such activity might not be illegal).
- An intelligence agency scrapes data about criminal suspects. Such an activity might be a legitimate counter-terrorism tactic, but certain intelligence agencies are not subject to appropriate democratic safeguards.
5. Unwanted direct marketing or spam
Data scrapers can collect email addresses and other contact information. This information can then be used to make unwanted phone calls or send spam emails.
Scraping social media sites for publicly available email addresses is a common tactic among marketing teams.
While a person might be happy for individuals to contact them via published contact details for certain purposes, they might not wish to receive unsolicited marketing emails as part of a mass email campaign.
This example demonstrates one of the issues with using personal data—even publicly available data—without a proper legal basis. People provide their personal data in one context and might reasonably expect it not to be used for unrelated purposes.
Steps to prevent data scraping
The regulators propose the following steps to prevent data scraping:
- Organizational measures: Designating a team or roles to implement data-scraping monitoring and controls.
- Rate-limiting: Restricting how many times one user’s account can visit other users’ profiles in a given period.
- Monitoring: Implementing controls to monitor how each new account starts interacting with other users.
- IP logging: Logging IP addresses associated with suspicious activity.
- Bot detection: Using CAPTCHAs to detect suspicious login attempts.
- Legal action: Sending “cease and desist” letters and other legal action against confirmed scrapers.
- Breach notification: Notifying affected individuals and regulators about data scraping if required by law.
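To make the rate-limiting and IP-logging steps more concrete, here is a minimal sketch in Python of a sliding-window limit on profile visits per account. It is an illustration only and does not come from the regulators’ statement: the class name `ProfileVisitLimiter`, its parameters, and the thresholds are all hypothetical, and a real platform would likely need shared storage and tuning based on its own traffic.

```python
import time
from collections import defaultdict, deque


class ProfileVisitLimiter:
    """Hypothetical sliding-window limit on profile visits per account."""

    def __init__(self, max_visits, window_seconds, clock=time.monotonic):
        self.max_visits = max_visits          # assumed threshold, not from the statement
        self.window_seconds = window_seconds  # length of the sliding window
        self.clock = clock                    # injectable so the limiter can be tested
        self._visits = defaultdict(deque)     # account_id -> timestamps of recent visits
        self.flagged = []                     # (account_id, ip_address, timestamp) of refused visits

    def allow_visit(self, account_id, ip_address):
        """Return True if the visit is within the limit, otherwise log it and return False."""
        now = self.clock()
        recent = self._visits[account_id]
        # Forget visits that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_visits:
            # Over the limit: keep the IP address for later review (IP logging).
            self.flagged.append((account_id, ip_address, now))
            return False
        recent.append(now)
        return True


# Example: at most 100 profile visits per account per hour (illustrative numbers).
limiter = ProfileVisitLimiter(max_visits=100, window_seconds=3600)
allowed = limiter.allow_visit("account-123", "203.0.113.7")
```

When `allow_visit` returns `False`, a site might respond with an HTTP 429 status or present a CAPTCHA, and the entries in `flagged` could feed the monitoring and legal-action steps listed above.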
The regulators sent the letter to social media companies and said they expected a response within one month (by 24 September 2023).
However, the letter makes clear that the above steps can form part of the risk-based security program of any publicly accessible website, depending on the resources available to the provider and the relevant risk of data scraping.