Whether you are building a large application or a small form for your running club, bot attacks are an annoying reality. If you’ve found your way to this page, it’s likely that your beloved sites and APIs are under some form of attack. I’ve written this guide to teach you some of the defensive moves I’ve learned over the last decade working as a software engineer for security companies (BlackBerry), companies with large-volume bot detection issues (Facebook), and most recently a small startup with a direct security focus (UnifyID).
Understand the Terrain / Vocabulary
Let’s quickly square away some definitions that I’ll use below. If your use case doesn’t fit into these definitions, this post may not help you; however, feel free to comment below, as I’d like to include most variations in my guide.
- You — The reader of this document. I’ve written it to be useful for software developers like myself, but kept it simple enough that brave PMs or non-engineering types should be able to follow along ;)
- App — You’ve got an app, whether it is a Web-App, a Desktop App, a Mobile App or some combination.
- Data — Your app needs this to function, and it’s either too dynamic, too user-specific, too private, or too large to bundle with your app.
- API — This is the battlefield. Your App needs to get to your data, and so it must travel the Internet to get it. Your API is the gate that protects your data and ideally, only lets your App in.
- User — A legitimate user of your App and Data, whether authenticated (logged in) or not, through a public App
Forms to Perceive / Ways You Can Be Attacked
A little more vocabulary for a common understanding. Here is a broad (and probably incomplete) list of the ways that your API can be attacked over the internet:
- Targeted — An individual or group is focusing their attention on your API and business specifically. The hallmark will likely be unusually high or otherwise unusual web traffic from a specific IP or region.
- BotNet — Attackers often have access to BotNets: distributed groups of computers from which they can launch attacks. The hallmark here is an increase in API traffic from a broad variety of IPs and regions, which is harder to differentiate from genuine traffic.
- DOS — Denial of Service. This attack often comes from a BotNet, but if your routing is unsophisticated it can be pulled off by a single bad actor. The goal is to bring down your service, either to cause your business pain or to make a point. It is accomplished by “filling” your API’s capacity with bogus or meaningless requests to the point that regular users’ requests take too long or fail.
- Exploit — Your API is built on some framework, and both the framework itself and your API on top of it are made with code that very likely has some bug that could allow an attacker more access than they should have, either as another user, or as a global or “root” user.
- Credential Stuffing — Another point of vulnerability is your users themselves: they reuse passwords and email logins, and those leak to the web. Credential Stuffing is a rapid attack against your API using a known list of leaked email / password combinations, in the hope that one of them belongs to one of your users and gives the attacker that user’s privileged access to your API.
- Scraping — The goal here is to steal your company’s data: data that attackers can reach through a login (Trusted User) or a public API, but can’t get at quickly or at scale using your App. These hackers will set up a bot to rapidly make API calls to fetch this data and “crawl” your information for their own use. Examples would be Airbnb bookings and their market prices, or Facebook public posts and related friend graph information.
- Scanning — If an existing API, API stack, or plugin has a known vulnerability, or you have a public website that exposes a login page, your website might get swept up in a scanning attack, where bots automatically attempt to exploit the vulnerability and sometimes even run Credential Stuffing attacks.
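To make the “hallmark” of a Targeted attack a bit more concrete, here is a minimal Python sketch that looks for single IPs responsible for an outsized share of recent requests. The function name `top_talkers` and the 20% threshold are my own assumptions for illustration, not values from any particular product, and you would tune them against your own traffic.

```python
from collections import Counter


def top_talkers(request_ips, share_threshold=0.2):
    """Return {ip: share} for every IP sending more than share_threshold of all requests.

    request_ips is any iterable of source-IP strings, e.g. pulled from an access log.
    The 0.2 default is an illustrative assumption, not a recommended value.
    """
    counts = Counter(request_ips)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        ip: count / total
        for ip, count in counts.items()
        if count / total > share_threshold
    }


# Example: one address dominates the sample, the others look like ordinary users.
sample = ["203.0.113.7"] * 8 + ["198.51.100.1", "198.51.100.2"]
suspects = top_talkers(sample)
```

Note that this only helps with the Targeted case. A BotNet spreads its requests over many addresses, so no single IP stands out, which is exactly why that traffic is harder to tell apart from the real thing.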
In addition to your API, your App itself is a vulnerability and a surface that can be attacked. The good news here is that Apple and Google do take some steps to make it hard for attackers to directly manipulate your App code, but the bad news is, it’s not impossible, and in the case of browser Web Apps the code is pretty easy to get to. You must feel comfortable with a user having access to ANY data that is present on their own device, as well as access to all signals into or out of it, and if you need to control the access to data it must reside behind your more defensible API.
Waging War / Choosing Your Tools
So, hopefully you’ve got a handle on who is attacking, how they are attacking, and what they are hoping to achieve, as well as some idea of your own product API and App surfaces. Let’s go over some of the commercially or publicly available tools to defend against the bot assault, as well as their strengths and limitations.
reCAPTCHA is one of the most common tools for stopping certain bot attacks on the internet. It is provided for free as a service by Google, and does a pretty good job of stopping spam and botnet-type attacks. It works by gathering data about the user’s browser and IP address, as well as some of the user’s passive behavioral data, and, if any of that information is suspicious, asking the user to click a familiar “I am not a robot” button.
Failing that test, or being stuck on a network with a bunch of suspected bad actors, will send your user to the older and more painful test.
Traditional CAPTCHAs like reCAPTCHA suffer from a number of specific weaknesses:
- Weak to DOS — Your server will have to validate the captcha token sent. This means that while you can protect some of your really “heavy-weight” API Calls by requiring a valid captcha token, your backend server can still be tasked with thousands of captcha checks per second.
- Only moderate protection against stuffing, spam and botnets — By being the “big dog” in the market, Google has a pretty advanced bot detection algorithm, but the dark side is that it also has TONS of enemies and lots of known exploits, including entire networks of humans working for pennies per click to “solve” CAPTCHAs. Going with a lesser-known CAPTCHA means that spam attacks and botnets have to be dedicated to attacking your server or that particular CAPTCHA type.
- Privacy weak — Google offers their service for “Free”, but the price they extract is visibility into your server’s web traffic. They use this to strengthen their partner advertiser network, as well as to strengthen their service. For privacy-sensitive sites, this may be a deal breaker. There are options (read on).
- Little protection against dedicated assault — Unless you enforce the highest user friction option (Always stoplights) and make users enter them repeatedly for attacked surfaces, reCAPTCHA will allow users who have passed a CAPTCHA access again without having to re-do credentials. For a dedicated attack (non-botnet) this means that the attacker solves the captcha once manually on their machine, and they can spam your interface for hours from that computer.
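The “Weak to DOS” point is easier to see once you look at what the server-side check involves. Below is a rough Python sketch of validating a reCAPTCHA token against Google’s `siteverify` endpoint using only the standard library. The endpoint and the `secret`, `response` and `remoteip` fields come from Google’s documented verification API; the function names, the timeout, and the decision to reject empty tokens before making a network call are my own assumptions.

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret_key, captcha_token, remote_ip=None):
    """Build the form-encoded POST that asks Google whether a token is valid."""
    fields = {"secret": secret_key, "response": captcha_token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional field
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(SITEVERIFY_URL, data=body, method="POST")


def is_token_valid(response_body):
    """Interpret the JSON reply; anything other than success == true is a failure."""
    try:
        payload = json.loads(response_body)
    except (ValueError, TypeError):
        return False
    return isinstance(payload, dict) and payload.get("success") is True


def verify_captcha(secret_key, captcha_token, remote_ip=None, timeout=5):
    """Return True only if Google confirms the token. Makes one outbound HTTP call."""
    if not captcha_token:
        return False  # reject obviously empty tokens without spending a network call
    request = build_verify_request(secret_key, captcha_token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_token_valid(response.read())
```

Every token that reaches `verify_captcha` costs your backend an outbound request, so an attacker spraying junk tokens can still keep your servers busy even though none of them pass.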
A different approach, often combined with a CAPTCHA product, is network-based detection. These are router front ends that sit in front of your API and block suspicious traffic, deflecting it to a page where a CAPTCHA is required.
The advantage here compared to some form of CAPTCHA alone, is that BotNet and SPAM attackers will often have unusual traffic signatures and will be blocked before they can DOS your API. CloudFlare and services like it take the brunt of the assault, and have been designed to be able to weather it. There are still some drawbacks:
- Cost — All of your API traffic now hits a secondary cloud service first, and this can end up costing a lot of $$ to maintain.
- Not really usable for native mobile apps — While your mobile API could connect through CloudFlare, if the user is on a suspect IP or network (often through no fault of their own) they’ll get rejected, and the only way to get them authorized is to open a web browser window inside your app and have them solve a CAPTCHA (super high friction!).
- Unfriendly to Tor Users — Tor is an anonymizing browser and network preferred by a small but privacy-focused group of users. Tor routes requests through changing endpoints to help users (often in countries that have restrictive or fascist internet blocking in place) access sites that would otherwise be blocked, and remain anonymous from their governments while doing so. CloudFlare and others block most connections from the Tor network, and solving the CAPTCHA only temporarily clears the latest endpoint, so a later request can result in a blocked message (again). A minor point, but important if this is part of your audience.
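Commercial network filters use far richer signals than I can show here, but the simplest building block behind “block suspicious traffic” is a per-IP rate limit. Here is a minimal sliding-window limiter in Python as a sketch of the idea. The class name, the limits, and the injectable `clock` parameter are all assumptions of mine for illustration; this is not how CloudFlare or any specific vendor implements it.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within the last window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client IP -> timestamps of accepted requests

    def allow(self, client_ip):
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: block, or deflect to a CAPTCHA page
        hits.append(now)
        return True


# Example: at most 100 requests per minute per IP (illustrative numbers).
limiter = SlidingWindowLimiter(max_requests=100, window_seconds=60)
first_request_allowed = limiter.allow("203.0.113.7")
```

Keying on IP is also where the mobile and Tor drawbacks above come from: innocent users who share an address with a noisy neighbor hit the same limit.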
Forcing your users to log in can also be a good way to stop bot attacks. While not every service can expect authentication of its users, those that can have a nice place to block the bots. There are a number of services out there that can make adding authentication to your API or App easy, but these authentication interfaces are themselves vulnerable to Credential Stuffing and DOS attacks. That’s where services like Amazon Cognito and Google’s Firebase Authentication come into play. Both are able to integrate with your existing authentication setup or host your user database, and because the login action is managed on their own servers, you get some network filtering and DOS protection as part of the package. Key limitations:
- No Public API — Many Apps can’t be successful if they force users to login before accessing services, or have some portion of their app that needs to function in a non-authenticated way.
- Trusted User Attacks — Even if your App can make users register and log in, that doesn’t protect against trusted user attacks. If attackers are able to sign up for an account, they can sign up bot accounts and use those credentials to scrape or attack your API. These services alone won’t prevent bots from accessing your website through a hacked or created account, and would have to be combined with a CAPTCHA or other service.
- Credential Stuffing — Traditional logins are vulnerable to password stuffing attacks, which have become more prevalent. As with Google’s CAPTCHA, there are crawlers out there on the net looking specifically for these sorts of logins to attack, and their operators are in an ongoing cat-and-mouse battle with Amazon and Google to get around the DOS and stuffing protections.
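One cheap signal for spotting Credential Stuffing on your own login endpoint is a single source failing logins against many different accounts, as opposed to one user fumbling their own password. The Python sketch below tracks that. The class name and the threshold are hypothetical, and a real deployment would also expire old entries and persist state somewhere shared.

```python
from collections import defaultdict


class StuffingDetector:
    """Flag sources whose failed logins span an unusual number of distinct accounts."""

    def __init__(self, max_distinct_accounts=5):
        # Illustrative threshold (an assumption): tune it against real traffic.
        self.max_distinct_accounts = max_distinct_accounts
        self._failed = defaultdict(set)  # source IP -> usernames that failed from it

    def record_failure(self, source_ip, username):
        # Normalize case so "Alice@..." and "alice@..." count as one account.
        self._failed[source_ip].add(username.lower())

    def is_suspicious(self, source_ip):
        return len(self._failed.get(source_ip, ())) > self.max_distinct_accounts


detector = StuffingDetector(max_distinct_accounts=3)
for attempt in range(10):
    # One user retrying their own password: a single distinct account, not suspicious.
    detector.record_failure("198.51.100.9", "alice@example.com")
for i in range(4):
    # One source walking through a leaked list: four distinct accounts, suspicious.
    detector.record_failure("203.0.113.7", f"user{i}@example.com")
```

As with IP rate limiting, a stuffing run spread across a BotNet will dilute this signal, so I would treat it as one input among several rather than a complete defense.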
For those sites that can require it (either because they have a captive user base, like corporate users, or because the service is financial in nature so that users actually demand greater protection), login is a good first step. But because of the stuffing vulnerability above, many of these sites find themselves needing more protection. Enter multi-factor authentication (MFA): the service sends an email or SMS to the user’s device whenever they attempt to log in from an unfamiliar browser or App. Twilio sends an SMS to the user’s phone (assuming you have their number) and SendGrid sends an email (often an easier factor to get, but also much easier to generate usable fakes for). Adding MFA will prevent login stuffing attacks, as your real users won’t approve the MFA request even if the hacker has their password in a database. It will also raise the bar (only a little bit) for trusted user attacks, as attackers now have to integrate one of the many SMS / email response mechanisms into their bot system for the login process. Additional limitations:
- Expensive — Sending an SMS through an SMS service costs less than a penny, but with enough traffic this adds up quickly: at thousands or tens of thousands of logins per day, the SMS bill can become a major part of your operating costs. Email is less expensive, but suffers the same problem at scale.
- More friction — Requiring MFA causes a pretty steep drop-off in user adoption and adds at least a few seconds to every login flow, so it is having trouble gaining traction with mobile apps or apps with non-captive audiences.
- Regional Limitations — Many countries and communities don’t have ready access to SMS, and some countries still charge a significant amount for SMS messaging, making SMS MFA a non-starter if your business is in these markets.
- Weak Botnet protection — While email is more available and less expensive than SMS, it is trivially easy to set up thousands of fake but usable addresses for coordinated attacks, so email MFA does little to stop them.
- Copy Pasta — The (often 6 digit) code sent to email must be typed into your website or app, which is kind of a pain for the users.
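For readers who haven’t built one, the code-based MFA flow described above boils down to: generate a short random code, deliver it over SMS or email, then check what the user types back. Here is a small Python sketch of the generate-and-check half. The delivery step (Twilio, SendGrid, or anything else) is deliberately left out, and the five-minute lifetime, the attempt limit, and all names are assumptions of mine.

```python
import hmac
import secrets
import time


def generate_code(digits=6):
    """Return a zero-padded random numeric code from a cryptographically secure source."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


class MfaChallenge:
    """A single-use code that expires and tolerates only a few wrong guesses."""

    def __init__(self, ttl_seconds=300, max_attempts=5, clock=time.monotonic):
        self.code = generate_code()  # send this to the user via SMS or email
        self._clock = clock
        self._expires_at = clock() + ttl_seconds
        self._attempts_left = max_attempts

    def verify(self, submitted):
        if self._attempts_left <= 0 or self._clock() >= self._expires_at:
            return False
        self._attempts_left -= 1
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(self.code.encode(), submitted.encode()):
            self._attempts_left = 0  # a code can only be redeemed once
            return True
        return False
```

The attempt limit matters more than it looks: a 6-digit code has only a million possibilities, so without it a bot could simply guess its way through.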
The company I’m currently working for has made easier authentication for the user one of its main goals. We’ve recently launched a couple of products aimed at filling some of the gaps in this space, providing tools that can be used alongside or in place of some of the options I’ve listed above. The first launch was HumanDetect, a “passive” bot detection tool that you add to your iOS App. HumanDetect uses motion data collected from the device’s sensors to determine that it is being held by a human, and passively generates a token that can be sent along with any API call and then verified as legitimate by UnifyID servers. This prevents any sort of Stuffing, Scraping, or bot-related attack without introducing any friction for your end users.
The service is currently free and is simple to set up for iOS Applications, with support for Android and Browser Apps in the works. Additionally, because it doesn’t rely on tracking user device or browser information, your users’ privacy remains intact. There are limitations to this new product that I should call out:
- Only protects mobile API at the present — Adding a requirement for HumanDetect Token to your API will prevent anything but your Mobile iOS App from being able to use your API. If your users access from a website or Android App, then you will have to have a “back door” authentication for those users which would utilize one of the mechanisms above, and leave you exposed to the relevant vulnerabilities. This still allows you to close off the iOS surface, while providing a better 0-friction experience for your mobile users, so a hybrid solution may be ideal.
- Dedicated Account Theft — Because bots are a key part of Credential Stuffing, HumanDetect combined with authentication provides free protection against this type of attack. However, a dedicated attack on a secure account (for instance a bank or other high-value account) can still be attempted manually from your app at a slower rate. Fortunately, there is a solution from UnifyID for this as well.
If you have a mobile iOS (or Android) App as your main customer entry point, then PushAuth can be added from our modular SDK, alongside or separate from HumanDetect support. This adds easy-to-configure and secure push-notification-based multi-factor authentication to your API security. Currently also a free service, PushAuth uses the notification services built into Apple and Android devices to send messages at a much cheaper rate than SMS or email, and is much harder to fake, as the underlying notification security from Apple or Android is robust and included with every device sold. Also, with native operating system integration, accepting can be as easy as a button tap. The downside here is that PushAuth is currently only supported on iOS and Android based Apps.
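The hybrid setup mentioned in the HumanDetect limitations (passive token for iOS, a “back door” through one of the earlier mechanisms for everyone else) can be sketched as a small routing function on the API side. To be clear, everything named here is hypothetical: the `X-Human-Token` header, the platform strings, and the two verifier callables are placeholders I made up for illustration, and this does not show the real HumanDetect SDK or server API.

```python
def authorize_request(platform, headers, verify_passive_token, verify_fallback):
    """Decide whether an API call may proceed, based on the client platform.

    verify_passive_token: callable(token) -> bool, a stand-in for a server-side
        check of a passively generated token (hypothetical).
    verify_fallback: callable(headers) -> bool, a stand-in for whichever higher
        friction mechanism covers other clients, e.g. a CAPTCHA check (hypothetical).
    """
    if platform == "ios":
        token = headers.get("X-Human-Token")  # hypothetical header name
        if not token:
            return False  # iOS clients must present a token; no fallback for them
        return bool(verify_passive_token(token))
    # Web and Android clients go through the fallback path.
    return bool(verify_fallback(headers))
```

The one design choice worth calling out is that a passive token presented on the non-iOS path is ignored instead of trusted. The fallback path stays exposed to whatever weaknesses its mechanism has, which is the trade-off described above.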
Variations of Tactics
A robust bot defense will combine one or more of the above solutions, chosen based on your particular App and API. If you think I’ve missed or mischaracterized something, or would like me to add a use case, let me know in the comments below. Otherwise, if you found this guide useful, a clap and a follow will encourage further guides.