LinkedIn latest platform to scrape user data for AI training
Posted: September 19, 2024
It looks like LinkedIn is the latest social media platform to train AI models on user data – and reportedly without updating its terms of service before the big change.
Will this be the modus operandi for social media platforms going forward: using the data they already capture to train AI models that will “enhance” the user experience?
While LinkedIn has now updated its privacy policy, 404 Media reported that the new privacy setting and opt-out form were available on LinkedIn before the company released the updated policy. The policy now states that data from the platform is used to train AI models. Elaborating on its Q&A pages, LinkedIn says AI powers features such as writing suggestions and post recommendations.
“As with most features on LinkedIn, when you engage with our platform we collect and use (or process) data about your use of the platform, including personal data,” the Q&A reads. “This could include your use of the generative AI (AI models used to create content) or other AI features, your posts and articles, how frequently you use LinkedIn, your language preference, and any feedback you may have provided to our teams. We use this data, consistent with our privacy policy, to improve or develop the LinkedIn services.”
Users have an opt-out toggle in their settings, which discloses that LinkedIn uses personal data to train “content creation AI models.” This excludes users in the EU, EEA, and Switzerland, whose data will not be used to train generative AI.

UPDATE: On Friday, September 20, the Information Commissioner’s Office (ICO) said it had received confirmation from the professional networking platform that it had “suspended” AI model training on data from UK users as well.
This change may only just be coming to light, but the nonprofit Open Rights Group (ORG) has already called on the ICO, the UK’s independent regulator for data protection rights, to investigate LinkedIn and other social networks (like X) that train on user data by default.
Open Rights Group’s Legal and Policy Officer Mariano delli Santi said:
“LinkedIn is the latest social media company found to be processing our data without asking for consent. The opt-out model proves once again to be wholly inadequate to protect our rights: the public cannot be expected to monitor and chase every single online company that decides to use our data to train AI. Opt-in consent isn’t only legally mandated, but a common-sense requirement.”
As the ICO has so far not taken action, people in the UK have weaker data protection rights than people in Europe, where data protection authorities have already taken a stand against Meta and X.
delli Santi added:
“The ICO’s failure to take action sends a message to social media companies that they can ignore the data protection rights of people in the UK. The ICO need to investigate and take urgent action against Meta, X, LinkedIn and other companies that think they are above the law.”
Ireland’s Data Protection Commission (DPC), the supervisory authority responsible for monitoring compliance with the General Data Protection Regulation (GDPR), told TechCrunch that LinkedIn informed it last week that clarifications to its global privacy policy would be issued on September 18.
“LinkedIn advised us that the policy would include an opt-out setting for its members who did not want their data used for training content generating AI models,” a spokesperson for the DPC said. “This opt-out is not available to EU/EEA members as LinkedIn is not currently using EU/EEA member data to train or fine-tune these models.”
Demand for data to train generative AI keeps growing, and much of that data is user-generated content from platforms such as social media. Some companies have moved to monetize this content – Tumblr owner Automattic, Photobucket, Reddit, and Stack Overflow are among those licensing data to AI model developers.
So what does this mean for users who do not want their content used for this purpose? Does using a platform automatically give companies the right to use the data that users upload? And how should companies inform users of that fact?
Nicky Watson, Founder and Architect at Cassie, says:
“Compliance is good – ethics are better. Do it for yourself (prioritizing your customers is proven to create stronger relationships, increased brand loyalty and higher sales). But on a moral level, do it for the millions of people placing their trust in you.”