Majority of Leading News Websites Bar AI Bots, While Right-Wing Media Embraces Them

According to reporting by Kate Knibbs of WIRED, nearly 90% of top news sites, including The New York Times, block AI data collection bots from companies such as OpenAI. Prominent right-leaning outlets such as Newsmax and Breitbart, by contrast, largely permit them.

Even as media conglomerates negotiate licensing agreements with AI giants like OpenAI, which are hungry for training data, they are also erecting digital barriers. Recent data shows that over 88% of the top-ranked news sites in the US now block the web crawlers that AI firms use to gather data for chatbots and similar projects. One segment of the news industry stands out, however: right-wing outlets are significantly less likely to block such bots than their liberal counterparts.

A mid-January survey of 44 prominent news websites by Originality AI, an Ontario-based AI detection startup, found that nearly all of them block AI web crawlers, including The Washington Post and The Guardian. OpenAI’s GPTBot is the most frequently blocked crawler. However, none of the top right-wing news outlets surveyed, including Fox News and Breitbart, block the most prominent AI web scrapers, a group that also includes Google’s AI data collection bot.

Most right-wing sites declined to comment on their AI crawler strategy, which has left room for speculation about their motives. Could the disparity be a deliberate attempt to counter perceived political bias in AI models? Experts note that AI models reflect the biases present in their training data, so by allowing AI crawlers access to right-leaning content, these outlets may be hoping to balance the scales.

Originality AI determined which sites block GPTBot and other AI scrapers by analyzing each site's robots.txt file, and used Internet Archive data to identify when each website began blocking AI crawlers. Notably, the right-leaning news sites surveyed did not block GPTBot.
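A check of this kind can be approximated with Python's standard-library robots.txt parser. The sketch below is illustrative only and is not Originality AI's actual method: the helper name `blocked_crawlers`, the list of crawler tokens, and the sample robots.txt are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list of AI crawler user-agent tokens (an assumption; the
# study's exact list of crawlers is not given in this article).
AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot"]


def blocked_crawlers(robots_txt, crawlers=AI_CRAWLERS, path="/"):
    """Return the crawlers that the given robots.txt text bars from `path`."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return [bot for bot in crawlers if not parser.can_fetch(bot, path)]


# Hypothetical robots.txt that singles out GPTBot and allows everyone else.
EXAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_crawlers(EXAMPLE))
```

Running the same function over archived copies of a site's robots.txt, oldest first, would be one way to estimate when a block first appeared, which is roughly what the Internet Archive step describes.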

The debate over AI bias has gained traction, particularly among conservative leaders in the US, some of whom have raised concerns about perceived liberal bias in leading AI tools. Opinions differ, however, on whether allowing AI scraping will make a difference for right-wing sites. Some believe it could influence AI outputs; others are skeptical, given the sheer volume of data already collected from mainstream outlets.

OpenAI says it is committed to diverse training data in order to keep its AI models neutral. The company asserts that including or excluding data from any single sector or site has a negligible impact on what a model learns and outputs.

The discrepancy in crawler-blocking strategies may also reflect ideological differences over copyright. Mainstream media outlets, including The New York Times, have pursued legal action against OpenAI for copyright infringement, whereas right-wing media have been less confrontational on the issue.

Some right-leaning outlets cited oversight or technical reasons for permitting AI scraping, while others are considering stricter measures to protect their intellectual property. Either way, the contrast underscores how closely media, technology, and ideology are now intertwined.
