Proxy servers aren’t exactly a high-profile tool – and yet, they power much of AI infrastructure. A proxy server is another device with its own IP address that you use to access the web. In aggregate, they let you open many webpages through automated means without encountering CAPTCHAs or other roadblocks. Lacking proxies, companies wouldn’t be able to collect as much training data for large language models, and AI agents would stumble halfway through every third task.
However, all this power brings tremendous responsibility. Sourced carelessly, proxies turn people’s computers into unwitting botnets. Used maliciously, they can overwhelm websites, create fake social media profiles, or even help steal yours. Like any powerful tool, they can either make or maim, which is why proper governance is so important.
Proxyway, a website covering web data collection infrastructure, makes it their job to follow the proxy server market closely; their findings are presented in an annual, publicly available proxy server market report. This article, which draws from the report, examines the risks of choosing an unethical provider and offers advice on how to avoid doing so.
Proxy servers in times of AI
Proxies have been around for a long time. Their use goes back to the early noughties, maybe even earlier, mostly as anonymity tools. In the last decade, though, proxy servers have turned into a burgeoning industry. They form the backbones of companies that compare flight prices, perform market research, help businesses see where they stand in Google search, and more. The largest proxy server providers today make hundreds of millions in revenue, while the market as a whole is worth billions of dollars.
It’s fair to say that the industry was doing well even before AI. However, the billions invested into OpenAI, Anthropic, Perplexity, and other AI startups have proven to be a force multiplier. Language models need huge amounts of data for every training run; the web is the largest source of data; and proxies speed up the data collection process a hundredfold. The demand has allowed one of the biggest proxy providers, Bright Data, to reach $300M in annualized recurring revenue, growing by 50% year over year.
The dark side of residential proxy networks
Residential proxies are the most desirable type of proxy server. They’re valuable because websites aren’t keen on giving away their data, even if it’s public, so they implement measures like Cloudflare to limit automated access. Unlike proxies hosted in data centers, residential proxies are much less likely to get blocked – that’s because they appear as home computers connected to internet service providers like Comcast or Verizon.
Here’s the interesting part: residential proxies look like home computers because they are home computers. They come from users’ laptops, phones, and other connected devices. A proxy provider borrows a user’s IP address and a small amount of data so that its clients can open websites relevant to their business. That IP is a proxy, and the users’ devices are servers.
At this point, some readers may wonder whether they are involved without having knowingly agreed to participate. Ideally, the people sharing their connection should know about and benefit from the relationship. Unfortunately, that’s not always the case. Unscrupulous proxy server operators recruit devices by installing malware, repackaging pirated software, offering free VPNs, or even selling vulnerable smart devices like picture frames. In other words, they create botnets.
In fact, multiple large-scale botnets have surfaced in recent years, some spanning tens of millions of devices. BADBOX affected millions of cheap Android TV boxes, and so did Aisuru. Most recently, authorities in the Netherlands disrupted the ASOCKS botnet, which comprised over 17 million devices.
Many of these botnets circulate on the dark web, where they’re weaponized for malicious activities. For instance, Aisuru caused some of the largest website takedown attacks (known as distributed denial-of-service, or DDoS, attacks) the web has seen. However, botnets are also often monetized as commercial proxy services, making it hard to distinguish them from legitimate businesses. In January 2026, Google shut down 10 Hong Kong-based proxy server brands, and the ASOCKS botnet had a storefront with the same name.
Malicious proxy networks violate the trust and property of unwilling people, which is despicable and potentially dangerous in its own right. Commercial entities that unknowingly buy from such vendors risk damaging their reputation and compromising their network security. And the botnet operators gamble with jail time, but at least they have the benefit of taking such a wager knowingly.
Doing it right
How, then, do you distinguish between a reputable proxy server business and a storefront of a botnet? It’s not always easy, but major market participants have taken serious steps to self-govern the procurement and use of their infrastructure.
Residential proxy acquisition is the first of the two steps determining legitimacy. The gold standard for sourcing such IPs ensures that the source knows about it, consents to it, and gets something in return. Bandwidth-sharing apps like Honeygain or TraffMonetizer come closest to that standard; their sole purpose is exchanging money for the user’s traffic.
Another avenue involves SDKs – small pieces of code inserted into popular desktop or mobile applications. Developers usually treat them as an alternative monetization method to subscriptions or ads. SDKs, however, are much more morally ambiguous. There’s a big difference between hiding the SDK in the terms of service versus presenting a consent screen in clear terms; offering nothing or a disproportionate reward versus compensating fairly; putting the SDK into apps for kids versus approaching people who can make an informed decision.
Checking if the provider enforces proper residential proxy use is the second step. Reputable vendors will never give you full access to their networks due to the risks involved. First of all, they limit the available ports to those necessary for opening websites. This rules out malicious use cases like engaging in email spam. Second, they block risky targets by default, such as login pages, banks, or government agencies. And third, they actively monitor for potential abuse. The most proactive proxy providers even observe website health to prevent accidentally causing damage. A commercial proxy service is a business, not an anonymity tool.
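To make the first two controls more concrete, here is a minimal sketch in Python of the kind of check a provider might run before forwarding a request. It is an illustration only, not any vendor’s actual implementation: the function name is_request_allowed, the allowed ports, and the blocked suffixes and path hints are all hypothetical values chosen for the example.

```python
from typing import Tuple
from urllib.parse import urlsplit

# All values below are illustrative assumptions, not a real provider's policy.
ALLOWED_PORTS = {80, 443}                    # web traffic only
BLOCKED_HOST_SUFFIXES = (".gov", ".bank")    # hypothetical risky targets
BLOCKED_PATH_HINTS = ("/login", "/signin")   # hypothetical login-page hints
DEFAULT_PORTS = {"http": 80, "https": 443}


def is_request_allowed(url: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a URL a customer wants to open."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        return False, "unsupported scheme"

    host = (parts.hostname or "").lower()
    if not host:
        return False, "missing host"

    try:
        # .port raises ValueError when the port is malformed or out of range
        port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    except ValueError:
        return False, "invalid port"

    # Control 1: only ports needed for opening websites
    if port not in ALLOWED_PORTS:
        return False, "port not allowed"

    # Control 2: risky targets are blocked by default
    if host.endswith(BLOCKED_HOST_SUFFIXES):
        return False, "blocked target"
    if any(hint in parts.path.lower() for hint in BLOCKED_PATH_HINTS):
        return False, "blocked path"

    return True, "ok"
```

Under these assumptions, a request to an ordinary product page passes, while one aimed at port 25 (commonly used for email) or at a login page is refused. Real providers presumably rely on far richer signals, including the ongoing abuse monitoring described above, which a static check like this cannot capture.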
Major services also implement know-your-customer procedures, requiring higher-risk customers to verify their identities and use cases before accessing the network. Finally, market participants have begun organizing into self-governing entities, such as the Ethical Web Data Collection Initiative. They create guidelines for automated access and help shape policies involving the broader web.
Proxyway has observed that some proxy server vendors have started posturing as ethical without actually providing any backing for their claims. It’s important to look more than skin deep, even if that’s not always easy or obvious. Proxyway’s market report lists major proxy server providers. Together with performance benchmarks, the research piece describes their IP sourcing methods and policies to prevent abuse.
To conclude
There will always be bad seeds – such is the nature of proxy servers. Businesses and consumers have the power to influence how the market evolves by understanding and choosing legitimate businesses.
TNW newsroom and editorial staff were not involved in the creation of this content.