There is no such thing as being negative SEO-proof, says contributor Joe Sinkwitz. All you can do is take steps to lessen the probability of becoming a victim. Here’s how to reduce attack vectors and protect your site.
In previous articles, we discussed what is and isn’t negative SEO and how to determine if you’ve actually been hit by negative SEO. With the basics out of the way, it’s now time to look at how you can keep your website safe from negative SEO (search engine optimization) campaigns.
To start, I have some bad news: There is no such thing as being hackproof.
And there is no such thing as being negative SEO-proof!
All you can reasonably do is take action to lessen the probability of becoming a victim by reducing attack vectors. This way, anyone seeking to do harm has to be more sophisticated and put forth a greater effort than they would against an average website.
In this installment of our negative SEO series, we will segment SEO into three areas (content, links and user signals) and focus on protecting each, as well as your site overall, from negative SEO.
Content and infrastructure
Hosting. What can your host do to keep you out of trouble? Quite a bit, actually. I debated including hosting as a user signal vector, but there’s another critical factor at play with this specific recommendation: reputation.
If you were to address every issue in this article yet happened to be on a shared IP with a dozen other domains that are flagged for distributing malware, blocked by email spam detection services or subject to manual link actions from Google, you’d be in for a bad time.
You will, at a minimum, want to ensure you have a dedicated IP for a domain you care about, and ideally, have the site on its own dedicated server.
Another advantage of not sharing a hosting server? It becomes one fewer attack vector anyone trying to execute negative SEO can employ. If an attacker cannot gain access to your hosting through a less security-minded domain on the same host, you are a little safer.
CMS considerations. Not all content management systems (CMS) are equal. Some will automatically spawn regular, archive and separate image pages when you attempt to create a single page. Some will automatically allow dofollow commenting on posts, which is an open invitation to spam.
Since so many of the world’s websites run on WordPress, disabling comments and adding noindex to tag pages, author archive pages and category pages makes sense to me. Some will disagree, but my focus is on attempting to index and rank high-value pages only, a hurdle that tag, archive and category pages rarely clear.
With certain content management systems, it is important to ensure proper canonicalization is used to keep duplicate content from being indexed due to pagination and other query-string nonsense.
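As a rough illustration of the canonicalization idea, the Python sketch below strips query-string parameters that do not change page content and builds the matching canonical tag. The parameter list and the function names are assumptions made for this example; which parameters are safe to ignore depends entirely on your CMS.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: parameters assumed never to change what the page shows.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sort", "replytocom"}


def canonical_url(url: str) -> str:
    """Return the URL with ignorable parameters and the fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'
```

Every variant of a URL that differs only by the ignored parameters then points at a single canonical address.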
Robots.txt. I find robots.txt manipulation to be a double-edged sword. It’s not only because it’s common to find a mistake that results in an entire domain being deindexed, but also because of what happens when crawling rules are too strict.
It’s possible to rank a page which contains an undesirable phrase in the URL string given how Google treats a domain’s inherent authority and the keywords used in the URL. For example:
exampledomain.com/directory/undesirablekeywordphrase
Since the robots.txt rules prevent Google from actually crawling the page, Google has to trust that the page might be “good” (or exist at all) and then (usually) ranks it.
This tends to plague large media sites more than those of other industries. For the rest of us, one of the biggest risk reductions comes in the form of disallowing search pages from becoming crawled and indexed. Without knowing which CMS you use, here’s some generic advice for you to pick and choose from:
Disallow: /search/
Disallow: /*?s=
Disallow: /*?q=
Proper robots.txt setup isn’t just for keeping poor-quality pages out of the index. To fine-tune your crawl budget, it can also be important to tell search engines not to crawl preview pages, so that crawl bots don’t waste time getting caught in a spider trap. This is relatively easy to do in WordPress, as these are the typical constructions for those pages:
Disallow: /*&preview=
Disallow: /*?p=
Disallow: /*&p=
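Before deploying rules like these, it can help to check which URLs they would actually block. Python’s built-in robots.txt parser does not handle wildcards, so the sketch below is a simplified matcher written for this example. It treats `*` as “any run of characters” and a trailing `$` as “end of URL,” matches from the start of the path, and ignores Allow rules and rule precedence. The rules are written with a leading slash, and the function names are illustrative.

```python
import re

# The Disallow patterns discussed above, written with a leading slash.
RULES = ["/search/", "/*?s=", "/*?q=", "/*&preview=", "/*?p=", "/*&p="]


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate one Disallow pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_rules=RULES) -> bool:
    """True if any rule matches from the start of the path plus query string."""
    return any(
        rule_to_regex(rule).match(path_and_query)
        for rule in disallow_rules
        if rule  # an empty Disallow line blocks nothing
    )
```

Running a sample of real URLs from your logs through `is_disallowed` is one way to spot a rule that blocks more than intended before a search engine does.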
Scraping. No, I’m not going to suggest you start scraping content as a means to protect yourself; quite the opposite. You’ll need to be proactive in using a content protection service to ensure your images and writing are not used elsewhere on the web without your authorization.
While Google is better now at figuring out which site is the original source, there are still issues with using authoritative domains as parasitic hosts.
An attacker will purposefully and continuously crawl a target domain by watching its sitemap, then post any new content to a parasitic host within seconds of your pushing it live.
Use a service such as Copyscape or Plagium to find these content thieves. If they are successful in stealing your content, you may need to contact the hosting company with a takedown request or file a Digital Millennium Copyright Act (DMCA) takedown notice.
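Once a service has surfaced a suspect page, you may want a quick measure of how much of your text it reuses. The sketch below is not how Copyscape or Plagium work internally (that is not described here). It is a simple word-shingle comparison, and the function names and the five-word window are assumptions chosen for illustration.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Split text into overlapping runs of `size` lowercase words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original: str, suspect: str, size: int = 5) -> float:
    """Fraction of the original's shingles that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)
```

A value near 1.0 suggests the suspect page reproduces most of the original, which is useful evidence to attach to a takedown request.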
Bad links
Outbound links via user-generated content (UGC). As stated in the CMS section above, I’m not a fan of open comments because they are abused. But what about other sources of UGC?
If you add a community/forum section on your site where members can interact, I recommend doing one of four things:
- Apply nofollow attributes on all external links.
- Force all external links to redirect through an internal page to strip outbound link equity.
- Noindex all threads.
- Moderate all external links.
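The first option in the list above can be sketched with Python’s standard HTML parser. This is a minimal example that assumes user content is available as an HTML fragment and that an “external” link is one whose host differs from your own. The class and function names are made up for this illustration, and the output formatting may differ slightly from the input.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit


class NofollowRewriter(HTMLParser):
    """Re-emit an HTML fragment, forcing rel="nofollow" on external links."""

    def __init__(self, own_host: str):
        super().__init__(convert_charrefs=False)
        self.own_host = own_host.lower()
        self.out = []

    def _is_external(self, href: str) -> bool:
        host = urlsplit(href).hostname  # None for relative URLs
        return host is not None and host.lower() != self.own_host

    def _emit_tag(self, tag, attrs, closing=""):
        attrs = list(attrs)
        if tag == "a" and self._is_external(dict(attrs).get("href") or ""):
            # Drop any rel value the user supplied and set our own.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}{closing}>")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def add_nofollow(html_fragment: str, own_host: str) -> str:
    rewriter = NofollowRewriter(own_host)
    rewriter.feed(html_fragment)
    rewriter.close()
    return "".join(rewriter.out)
```

Most forum and CMS packages offer this as a setting, so a custom rewriter like this is mainly useful for home-grown community software.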
Injected outbound links. This is a trickier issue to be proactive about because, by definition, you are really being reactive. However, you should frequently monitor your Google Search Console for outbound links found on your site that you did not put there.
Another method to check for injected outbound links on your site involves a consistent crawling script with multiple user agents (Google and not Google) to determine if any links or content exist that should not. This is essentially handled by reverse-engineering cloaking software to attempt to decloak injected issues.
To do this, set your user agent in either Chrome or Firefox to mimic Googlebot, either manually or using a user agent switching plug-in. If you view pages on your site as both Googlebot and a normal user, you can visually determine whether certain links are only visible to Googlebot, effectively decloaking the injected links.
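The same comparison can be scripted. The Python sketch below fetches a page with two user agent strings and reports links that appear only in the Googlebot version. The function names are assumptions for this example, and the browser string is just a representative value. Note that this only catches cloaking keyed on the user agent; injected code that checks the visitor’s IP address against Google’s ranges will not be fooled by a spoofed header.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Representative desktop browser string, chosen for this example.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)


def extract_links(html_text: str) -> set:
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Download a page while presenting the given User-Agent header."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def cloaked_links(html_as_bot: str, html_as_user: str) -> set:
    """Links present in the Googlebot version but not in the user version."""
    return extract_links(html_as_bot) - extract_links(html_as_user)


def check_page(url: str) -> set:
    return cloaked_links(fetch(url, GOOGLEBOT_UA), fetch(url, BROWSER_UA))
```

Calling `check_page` on a schedule for your most important URLs, and alerting on any non-empty result, is one way to turn the manual check into routine monitoring.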
Inbound links. Inbound links from sites other than your own are far more likely to be your problem than your internal links. Why? Because you cannot control what other people do.
There are only a few things you can do to try and protect yourself from bad inbound links:
- Get a lot of links. Always work to get as many quality inbound links as possible and make quality links a high percentage of your overall link count. I know it sounds trite, but it’s true: If you are consistently focused on producing the best content, you’ll consistently earn good links. If you have only a few decent links, and someone practicing negative SEO toward you decides to point a few hundred thousand bad links at you, Google will almost certainly treat you unfavorably. The more uneconomical you can make that attack by increasing your quality links, the better.
- Watch your anchor text. One easy filter to trip is still the overoptimization of anchor text, so even if you’re attracting great links, be sure not to rely on a limited set of anchor text phrases. If you do see your anchor text starting to get too concentrated, look for other signs of a negative SEO attack. Pointing a lot of same-phrase anchors is one of the easier and cheaper ways to get a negative campaign started.
- Disavow. I’ve gone on record as saying I don’t like the disavow tool, as I feel it is indicative of a guilty-until-proven-innocent environment within Google. But since it does exist, you’ll want to proactively disavow based on your risk scoring solution. Remember, it is not just the overseas counterfeit porn and gambling links you’ll need to address, but also those that appear to be part of any nuanced attack.
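To make the anchor text check concrete, here is a small Python sketch that measures how concentrated a link profile is. It assumes you have exported the anchor text of your inbound links from whatever backlink tool you use. The 30 percent threshold is a placeholder for this example, not a figure published by Google.

```python
from collections import Counter


def anchor_shares(anchors):
    """Map each normalized anchor phrase to its share of all inbound links."""
    normalized = [a.strip().lower() for a in anchors if a.strip()]
    total = len(normalized)
    counts = Counter(normalized)
    return {text: count / total for text, count in counts.most_common()}


def over_concentrated(anchors, threshold=0.30):
    """Return anchor phrases whose share exceeds the (hypothetical) threshold."""
    return [
        text
        for text, share in anchor_shares(anchors).items()
        if share > threshold
    ]
```

Tracking these shares over time matters more than any single snapshot, since a sudden jump in one phrase is the pattern a cheap same-anchor campaign tends to leave behind.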
User signals
There are only a few factors that come into play here, and sadly, there isn’t much you can do about one of them.
Metrics. Click-through rate (CTR), time on site and bounce metrics are consistently being folded in as more trusted signals by Google. Knowing your baseline stats in Google Search Console and Google Analytics is important here because it is easy to hire a botnet and a few thousand micro workers to click a result and bounce away a second later.
The micro workers can also file a suggestion that the domain they visited wasn’t a quality site. All you can really hope to do is notice strange trends and then attempt to compensate; if it is an obvious botnet, block it at the server or content delivery network (CDN) level. If it is a bunch of incentivized users, however, all you can really hope to do is handle the situation like you would your inbound links, by aiming to provide a satisfactory experience and acquiring traffic that you know will offset the poor metrics.
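Noticing “strange trends” is easier with a baseline to compare against. The sketch below flags a daily metric, such as bounce rate, that sits far outside its recent history. It assumes you have exported daily values from your analytics tool. The function name and the three-standard-deviation cutoff are illustrative choices, not an established rule.

```python
from statistics import mean, stdev


def is_anomalous(baseline, today, z_threshold=3.0):
    """Flag today's value if it lies far from the baseline mean.

    baseline: recent daily values of one metric (for example, bounce rate).
    today: the value to test.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline values")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is unusual.
        return today != mu
    return abs(today - mu) / sigma > z_threshold
```

A flag from a check like this is only a prompt to look closer at referrers, IP ranges and user agents; seasonal swings and marketing campaigns can trip it too.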
Speed. To prevent a potentially slow site being used against you, don’t host it on a shaky setup. If possible, consider using a CDN to protect yourself from distributed denial-of-service (DDoS) attacks, and make sure your server environment is up to date to reduce your exposure to newly discovered vulnerabilities and to attacks such as user datagram protocol (UDP) amplification and Slowloris.
Beyond that, you’ll want to investigate any way an individual could leech bandwidth from you by locking down inline linking of your images at the server level, removing any unused CMS plug-ins and establishing proper caching.
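Locking down inline linking (hotlinking) usually comes down to a Referer check. In practice this is typically configured in the web server or CDN rather than in application code, so the Python sketch below only illustrates the decision logic. The function name and the choice to allow requests with no Referer are assumptions for this example.

```python
from urllib.parse import urlsplit


def allow_image_request(referer, allowed_hosts):
    """Decide whether to serve an image based on the Referer header.

    referer: the raw Referer header value, or None if it was absent.
    allowed_hosts: set of lowercase hostnames permitted to embed your images.
    """
    if not referer:
        # Many browsers and privacy tools omit the header; blocking these
        # requests would break images for legitimate visitors.
        return True
    host = urlsplit(referer).hostname
    return host is not None and host.lower() in allowed_hosts
```

Because the Referer header is easy to forge or strip, treat this as a way to deter casual bandwidth leeching, not as a security control.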
Malware. Malware as a user signal? Absolutely, though you could argue this is more of a content issue. Nothing contributes to a poor experience quite like getting auto-redirected to a scam site by some injected JavaScript. To prevent such situations, it is healthy to periodically run a malware scanner on your site’s server to seek out and remove malware.
The sooner you can find problems, the better. Thankfully, Google is pretty forgiving when addressing known malware issues, but they don’t catch them all and will see poor user data as normal usage when they miss it.
To close
It is impossible to cover all the dynamics here, but I believe what we’ve covered is a solid overview of how to check and protect your site if you think you’re a victim of negative SEO.
In next month’s article for this series, we’ll discuss what to do when you discover you’re in the middle of an ongoing negative SEO attack.
Source: SearchEngineLand