Defending Against Bot Attacks: Art of War

Whether you are building a large application or a small form for your running club, bot attacks are an annoying reality. If you’ve found your way to this page, it’s likely that your beloved sites and APIs are under some form of attack. I’ve written this guide to teach you some of the defensive moves I’ve learned over the last decade working as a software engineer for security companies (BlackBerry), companies with high-volume bot detection issues (Facebook), and most recently a small startup with a direct security focus (UnifyID).

Understand the Terrain / Vocabulary

  • You — The reader of this document. I’ve written it to be useful for software developers like myself, but kept it simple enough that brave PMs or non-engineering types should be able to follow along ;)
  • App — You’ve got an app, whether it is a Web-App, a Desktop App, a Mobile App or some combination.
  • Data — Your App needs this to function, and it’s either too dynamic, user specific, private, or too large to bundle with your App.
  • API — This is the battlefield. Your App needs to get to your data, and so it must travel the Internet to get it. Your API is the gate that protects your data and ideally, only lets your App in.
  • User — A legitimate user of your App and Data, whether authenticated (logged in) or not, through a public App

Forms to Perceive / Ways You Can Be Attacked

  • Targeted — An individual or group is focusing their attention on your API and business specifically. The hallmark will likely be extra high or unusual web traffic from a specific IP or Region.
  • BotNet — Attackers often have access to BotNets, distributed groups of computers from which they can attack. The hallmark here is an increase in API traffic from a broad variety of IPs and Regions, which is harder to differentiate from genuine traffic.
  • DOS — Denial of Service. This attack often comes from a BotNet, but if your routing is unsophisticated it can be pulled off by a single bad actor. The goal here is to bring down your service and cause your business pain or make a point. It is accomplished by “filling” your API’s capacity with bogus or meaningless requests to the point that regular users’ requests take too long or fail.
  • Exploit — Your API is built on some framework, and both the framework itself and your API on top of it are made with code that very likely has some bug that could allow an attacker more access than they should have, either as another user, or as a global or “root” user.
  • Credential Stuffing — Another point of vulnerability is your users themselves. They reuse passwords and email logins, and those leak to the web. Credential Stuffing is a rapid attack against your API using a known list of leaked email / password combinations, in the hope that one of them belongs to one of your users and gives the attacker that user’s privileged access to your API.
  • Scraping — The goal here is to steal your company data: data that attackers have access to through login (Trusted User) or through a public API, but that they can’t get at quickly or at scale using your App. These hackers will set up a bot to rapidly make API calls to fetch this data and “crawl” your information for their own usage. Examples here would be Airbnb bookings and their market prices, or Facebook public posts and related friend graph information.
  • Scanning — If an existing API, API Stack, or Plugin has a known vulnerability, or you have a public website that exposes a login page, your website might get swept up in a scanning attack, where bots automatically attempt to exploit the vulnerability and sometimes even run Credential Stuffing attacks.
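The hallmark of a Targeted attack described above, unusually high traffic from one IP, can be watched for with a per-IP sliding window. The following is only a minimal sketch under my own assumptions: the class name `SlidingWindowCounter`, its parameters, and the idea of passing in the current time are all illustrative, and it will not catch a BotNet that spreads its requests across many IPs.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Tracks recent requests per client IP within a rolling time window.

    Hypothetical helper for illustration; names and limits are assumptions.
    """

    def __init__(self, window_seconds: float, max_requests: int):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now`; return False if the IP is over budget."""
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In a real API you would pass something like `time.monotonic()` as `now` and decide what a rejected request gets (an error, a CAPTCHA, a delay). Keeping the clock as a parameter simply makes the behavior easy to reason about.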

In addition to your API, your App itself is a vulnerability and a surface that can be attacked. The good news here is that Apple and Google do take some steps to make it hard for attackers to directly manipulate your App code, but the bad news is, it’s not impossible, and in the case of browser Web Apps the code is pretty easy to get to. You must feel comfortable with a user having access to ANY data that is present on their own device, as well as access to all signals into or out of it, and if you need to control the access to data it must reside behind your more defensible API.

Waging War / Choosing Your Tools

Scale for values is 0–5

Google reCAPTCHA

That’s just what a robot would say isn’t it?

If your user fails that test, or happens to be stuck on a network with a bunch of suspected bad actors, they will be sent to the older and more painful test:

I can’t pull over any further, I’m already pulled over

Traditional CAPTCHAs like reCAPTCHA suffer from a number of specific weaknesses:

  • Weak to DOS — Your server will have to validate the captcha token sent. This means that while you can protect some of your really “heavy-weight” API Calls by requiring a valid captcha token, your backend server can still be tasked with thousands of captcha checks per second.
  • Only moderate protection against stuffing, spam and botnets — By being the “big dog” in the market, Google has a pretty advanced bot detection algorithm, but the dark side is that it also has TONS of enemies and lots of known exploits, including entire networks of humans working for pennies per click to “solve” CAPTCHAs. Going with a lesser-known CAPTCHA means that spam attacks and botnets will have to be dedicated to attacking your server or that particular CAPTCHA type.
  • Privacy weak — Google offers their service for “Free”, but the price they extract is visibility into your server’s web traffic. They use this to strengthen their partner advertiser network, as well as to strengthen their service. For privacy sensitive sites, this may be a deal breaker. There are other options (read on).
  • Little protection against dedicated assault — Unless you enforce the highest user friction option (Always stoplights) and make users enter them repeatedly for attacked surfaces, reCAPTCHA will allow users who have passed a CAPTCHA access again without having to re-do credentials. For a dedicated attack (non-botnet) this means that the attacker solves the captcha once manually on their machine, and they can spam your interface for hours from that computer.
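One way to blunt the “solve once, spam for hours” problem in the last bullet is to make each solved CAPTCHA worth only a limited number of requests for a limited time. This is a rough sketch, not part of reCAPTCHA itself: `CaptchaGate`, `CaptchaPass`, and the session id are hypothetical names, and the actual check of the token with your CAPTCHA provider is assumed to happen before `record_solved` is called.

```python
from dataclasses import dataclass


@dataclass
class CaptchaPass:
    """Server-side record created after a client solves a CAPTCHA."""
    issued_at: float
    uses_left: int


class CaptchaGate:
    """Hypothetical helper: limits how far a single CAPTCHA solve can be stretched."""

    def __init__(self, max_uses: int, ttl_seconds: float):
        self.max_uses = max_uses
        self.ttl_seconds = ttl_seconds
        self._passes = {}  # session id -> CaptchaPass

    def record_solved(self, session_id: str, now: float) -> None:
        """Call after the CAPTCHA provider confirms a solve (verification not shown)."""
        self._passes[session_id] = CaptchaPass(now, self.max_uses)

    def consume(self, session_id: str, now: float) -> bool:
        """Spend one use; False means the client must solve a new CAPTCHA."""
        p = self._passes.get(session_id)
        if p is None or now - p.issued_at > self.ttl_seconds or p.uses_left <= 0:
            # Missing, expired, or exhausted: forget the pass and ask again.
            self._passes.pop(session_id, None)
            return False
        p.uses_left -= 1
        return True
```

The trade-off is the one the bullet already names: a smaller `max_uses` or `ttl_seconds` means more friction for real users on the surfaces being attacked.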

CloudFlare

Seen here, stopping robots from building more muscle

The advantage here compared to some form of CAPTCHA alone, is that BotNet and SPAM attackers will often have unusual traffic signatures and will be blocked before they can DOS your API. CloudFlare and services like it take the brunt of the assault, and have been designed to be able to weather it. There are still some drawbacks:

  • Cost — All of your API traffic now hits a secondary cloud service first, and this can end up costing a lot of $$ to maintain.
  • Not really usable for native mobile apps — While your mobile API could connect through CloudFlare, if the user is on a suspect IP or network (often through no fault of their own) they’ll get rejected, and the only way to get them authorized is to open up a web browser window inside your app and have them solve a CAPTCHA (super high friction!)
  • Unfriendly to Tor Users — Tor is an anonymity network, usually accessed through the Tor Browser, that is preferred by a small but privacy-focused group of users. Tor sends each network request through a different endpoint to help users (often in countries that have restrictive or fascist internet blocking in place) access sites that would otherwise be blocked, and remain anonymous from their governments while doing so. CloudFlare and others block most connections from Tor networks, and solving the CAPTCHA only temporarily clears the latest endpoint, so your next request results in a blocked message (again). This is a minor point, but important if these users are part of your audience.

AWS Cognito / Firebase Auth

  • No Public API — Many Apps can’t be successful if they force users to login before accessing services, or have some portion of their app that needs to function in a non-authenticated way.
  • Trusted User Attacks — Even if your App can make users register and log in, that doesn’t protect against trusted user attacks. If attackers are able to sign up for an account, then they can sign up bot accounts and use those credentials to scrape or attack your API. These services alone won’t prevent bots from accessing your website through a hacked or created account, and would have to be combined with a CAPTCHA or other service.
  • Credential Stuffing — Traditional logins are vulnerable to password stuffing attacks, which have become more prevalent. As with Google reCAPTCHA, there are crawlers on the net looking specifically for these sorts of logins to attack, and their operators are in an ongoing cat-and-mouse battle with Amazon and Google to get around the DOS and stuffing protections.
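Since a stuffing run works through a leaked list of email / password pairs, one signal you can add on your own API is how many different accounts a single IP has failed to log in to. The sketch below is only an illustration under my own assumptions: `StuffingDetector` and its threshold are made-up names, it never expires old entries, and a BotNet that rotates IPs would slip under it.

```python
from collections import defaultdict


class StuffingDetector:
    """Hypothetical helper: flags IPs that fail logins across many distinct accounts."""

    def __init__(self, max_distinct_accounts: int):
        self.max_distinct_accounts = max_distinct_accounts
        self._failed = defaultdict(set)  # ip -> usernames with failed logins

    def record_failure(self, ip: str, username: str) -> None:
        """Remember that `ip` failed to log in as `username`."""
        self._failed[ip].add(username.lower())

    def is_suspicious(self, ip: str) -> bool:
        """True once the IP has failed against more accounts than the threshold."""
        return len(self._failed.get(ip, ())) > self.max_distinct_accounts
```

A real user who keeps mistyping their own password only ever touches one account, so they stay under the threshold no matter how many times they retry. That is the reason for counting distinct usernames rather than raw failures.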

Twilio / SendGrid / SMS or Email

  • Expensive — Sending a message through an SMS service costs less than a penny. However, with thousands or tens of thousands of logins per day this adds up quickly, and the SMS bill can become a major part of your operating costs. Email is less expensive, but suffers the same problem at scale.
  • More friction — Requiring MFA is a pretty steep drop off to user adoption, and adds at least a few seconds to every login flow, and thus is having trouble gaining traction with mobile apps or apps with non-captive audiences.
  • Regional Limitations — Many countries and communities don’t have ready access to SMS, and some countries still charge a significant amount for SMS messaging, making SMS MFA a non-starter if your business is in these markets.
  • Weak Botnet protection — While email MFA is more available and less expensive, it is trivially easy for attackers to set up thousands of fake but usable addresses for coordinated attacks.
  • Copy Pasta — The (often 6 digit) code sent to email must be typed into your website or app, which is kind of a pain for the users.
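For readers who have not built this flow before, the server side of a one-time code is small. This is a minimal sketch under stated assumptions: `new_code` and `PendingCode` are hypothetical names, the five-minute lifetime and five-attempt cap are arbitrary choices of mine, and actually sending the code through Twilio, SendGrid, or another provider is left out.

```python
import hmac
import secrets


def new_code(digits: int = 6) -> str:
    """Return a random zero-padded numeric code from a cryptographic RNG."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


class PendingCode:
    """Hypothetical record of a code that was sent to a user and awaits entry."""

    def __init__(self, code: str, issued_at: float,
                 ttl_seconds: float = 300, max_attempts: int = 5):
        self.code = code
        self.issued_at = issued_at
        self.ttl_seconds = ttl_seconds
        self.attempts_left = max_attempts
        self.used = False

    def verify(self, submitted: str, now: float) -> bool:
        """Check a submitted code; each code works once and only until it expires."""
        if self.used or now - self.issued_at > self.ttl_seconds or self.attempts_left <= 0:
            return False
        self.attempts_left -= 1
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(submitted.encode(), self.code.encode()):
            self.used = True
            return True
        return False
```

The attempt cap matters because a 6 digit code has only a million possibilities, which a bot can walk through quickly if nothing stops it.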

UnifyID HumanDetect

App -> API <-> UnifyID HumanDetect Service

The service is currently free, and is simple to set up for iOS Applications, with support for Android and Browser Apps in the works. Additionally, because it doesn’t rely on tracking user device or browser information, your users’ privacy remains intact. There are limitations to this new product that I should call out:

  • Only protects mobile API at the present — Adding a requirement for HumanDetect Token to your API will prevent anything but your Mobile iOS App from being able to use your API. If your users access from a website or Android App, then you will have to have a “back door” authentication for those users which would utilize one of the mechanisms above, and leave you exposed to the relevant vulnerabilities. This still allows you to close off the iOS surface, while providing a better 0-friction experience for your mobile users, so a hybrid solution may be ideal.
  • Dedicated Account Theft — Because bots are a key part of Credential Stuffing, HumanDetect provides free protection against this type of attack when combined with authentication. However, a dedicated attack on a secure account (for instance a bank or other high-value account) can still be attempted manually from your app at a slower rate. Fortunately, there is a solution from UnifyID for this as well.

UnifyID PushAuth

Notification based MFA

Variations of Tactics

They’ll be back. But you’ll be ready

Kevin Lohman, Software Engineer, Father, Story Teller, and former US Navy Sailor (who never set foot on a ship)
