My name is Andy. I own OnPage Rocks, I’m a bit of a data geek, and I love analyzing server logs: they hold some of the most useful information I have ever found. Analyzing them will really help your on-page SEO. One of the examples I give does help with off-site signals, but I mainly use the data to improve on-page factors.
Log analysis is useful for both SEO and PPC practitioners, but more on this later.
What are Server Logs?
Every time somebody, or more importantly something (i.e. a bot), hits your website, it requests data from the server. Each request is logged as a row in a file, and these rows together make up a log of server requests. OK, this is a simple explanation and there are far more in-depth details out there, but I am here to explain why you should do the analysis, not to bore you with technical information.
99% of the time I am only really interested in what bots are doing on my site. Tools like Google Analytics give you far more information about what users are doing on your website than the logs will. The only exception to this rule is fraud: Google Analytics doesn’t show you personally identifiable information (PII), but the logs let you see the IP addresses of fraudsters (yes, smart fraudsters mask their IPs, but not all fraudsters are smart).
Why do Server log analysis?
A very good question, and one I get asked all the time. It’s usually followed by “I use Screaming Frog or Deepcrawl, so what’s the point?” (more on them later). But what if I told you that while Google Search Console (Webmaster Tools to those of us who have been doing this long enough) gives you some really good information, especially the new beta version, Google doesn’t always give you all the data?
If you really want to know what Google is doing on your website, your server logs are the only place to truly get this data.
I have a quote I use in most of my talks:
“You wouldn’t build a million dollar house on quicksand, so don’t build million dollar websites on poor foundations”.
And I really mean it. I’ve lost count of the great-looking websites, or websites with brilliant links, that I come across with poor on-page SEO. It’s like they are trying to make it harder for themselves to rank.
Getting the data
There are a few ways to get the data, but I am going to cover the two main ways I come across.
If you work for a medium to large business with an in-house development team, you will need them to give you this data.
This article gives you everything you need to ask a developer. The good news is that you can be specific: ask just for particular bots over a particular time frame, and say you want the file as a .txt or .csv.
The downside is that first you have to venture into the development cave and speak with them. As well as asking for the data, you usually need to spend ages convincing them to hand it over, but once you’ve got it the first time, getting it again is usually a lot easier.
The other way to get the data applies if you’re a small company or blog owner with access to cPanel. First, you need to tick the one box that keeps historical logs.
Once you have done this, downloading the file is easy: no speaking with developers, just log in and download it. While that part of the process is easier, the file you download isn’t readable by any of your normal programmes, you have to download everything, and if your site gets a lot of traffic this could be a fairly large file.
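If you’d rather not rely on a dedicated converter, a short script can turn the raw file into something Excel will open. The sketch below assumes your host writes the common Apache/Nginx “combined” log format (check a few lines of your own file first, as many hosts customize it); the regex, field names and function names are my own illustration, not part of any tool.

```python
import csv
import re

# One line of the "combined" format, e.g.
# 66.249.66.1 - - [10/Dec/2017:13:55:36 +0000] "GET /shoes HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
FIELDS = ["ip", "time", "method", "path", "status", "bytes", "referrer", "agent"]


def parse_line(line):
    """Return the line's fields as a dict, or None if it isn't in combined format."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def log_to_csv(lines, out):
    """Write each parseable log line as a CSV row; return how many lines were skipped."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    skipped = 0
    for line in lines:
        row = parse_line(line)
        if row is None:
            skipped += 1
        else:
            writer.writerow(row)
    return skipped
```

For example: `with open("access.log", encoding="utf-8", errors="replace") as f, open("access.csv", "w", newline="") as out: log_to_csv(f, out)`. Unmatched lines are counted rather than silently dropped, so a high skip count is a hint that your host uses a different format.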
I am an Apple fanboy, I’m not afraid to admit it. I’m one of those idiots who spent £160 on AirPods, but my favourite tool for converting the data isn’t available for Mac, and they refuse to write a version. It’s called Web Log Explorer and it’s a great tool with a lot of cool features, but as I said earlier, I love data, so I mainly use it to extract the data into Excel so I can run my own analysis.
Over Christmas I got tired of keeping a Windows machine just to run this one task, so I decided to review all the other options available for Mac, hoping to leave Web Log Explorer behind and get rid of my Windows laptop.
Unfortunately, I wasn’t able to find a decent equivalent without paying a super expensive monthly price, and since most of my clients have development teams that send me the logs in Excel, the price was hard to justify. For now, I will have to stick with Web Log Explorer.
It wouldn’t be fair to leave this section out: it doesn’t happen a lot, but the data in the logs might not be 100% accurate. Here are two ways your data could be compromised, and what you can do about it:
Fake Googlebot users: I have no idea why, but some wonderful people in this world think it’s cool to create bots that pretend to be Google. The one thing they can’t fake is the IP address the request comes from, so also look at the IP address behind the user agent. In my experience, Google crawls from a range of IPs based in California.
SEO tools: the other source of spoofed data in your server logs can easily be yourself.
If you use a tool like Screaming Frog or Deepcrawl (and you should be), when you get them to crawl your site as Google they will use Google’s user agent strings, because they want to get the same results as Google. Again, just exclude those tools’ IPs from the analysis you are doing.
Sometimes I don’t bother if I am purely looking for errors. Who cares whether it was the real Google or a Google impersonator that found the error? The job is to fix it. Filtering only matters for some of the more in-depth analysis I will cover later.
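Google documents a way to check whether a visitor claiming to be Googlebot is genuine: do a reverse DNS lookup on the IP, check the hostname ends in googlebot.com or google.com, then do a forward DNS lookup on that hostname and confirm it resolves back to the same IP. Here is a minimal Python sketch of that check, plus a filter that also drops your own crawler’s IPs. The row layout (dicts with “ip” and “agent” keys) and the `own_crawler_ips` set are assumptions; the lookup functions are passed in as parameters so the logic can be tested without network access.

```python
import socket

# Hostname suffixes Google documents for genuine Googlebot reverse DNS names.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse DNS on the IP, check the domain, then forward DNS back to the same IP."""
    try:
        hostname = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not hostname.endswith(GOOGLE_DOMAINS):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses


def genuine_googlebot_rows(rows, own_crawler_ips=frozenset(), verify=is_real_googlebot):
    """Keep rows that claim to be Googlebot, aren't from your own crawler, and pass verification."""
    verified = {}  # cache: one DNS check per IP rather than per log line
    kept = []
    for row in rows:
        if "Googlebot" not in row["agent"] or row["ip"] in own_crawler_ips:
            continue
        if row["ip"] not in verified:
            verified[row["ip"]] = verify(row["ip"])
        if verified[row["ip"]]:
            kept.append(row)
    return kept
```

Caching per IP matters on busy sites, where the same Googlebot address can appear thousands of times in a month.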
I know there is a lot of talk of Big Data in our industry, but with server logs you want the opposite.
As covered earlier, you are really focusing your time and effort on the small details here. You can ignore a lot of the data in the logs; if you have limited time and resources, even just focusing on Googlebot is a good start.
Difference to crawling the site
I mentioned this earlier: one of the most common questions I get is why bother analyzing the logs when you can crawl the website.
It’s a very valid point, and there is around an 80% crossover between what crawling your website and analyzing your logs will give you. However, it’s the additional 20%, which is unique to you, that I am about to show you, with real examples of why it’s worth it.
I would still highly recommend crawling your site with a crawling tool and combining the data to get a complete analysis.
For example, a programme like Screaming Frog (SF) or Deepcrawl (DC) usually starts at the home page and follows all the available links to crawl the entire site.
If a page doesn’t have any internal links, these tools won’t find it, but it could still show up in the server logs. There could be external links pointing to the page that Google is following, or (and I am still testing this, so I have nothing concrete yet) Googlebot may have a memory of old pages it has crawled.
If you accidentally removed the link today and crawled the site, SF or DC wouldn’t crawl that section, but it seems Google will still try to. I need to run a few more tests to confirm this, but it’s interesting.
Where SF or DC have an advantage, especially on large websites, is that Googlebot might not hit every page in the time frame you are looking at, so you might miss some errors, whereas SF or DC are likely to find them because they crawl the entire site.
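Combining the two data sets is a simple set comparison once the URLs are in the same shape. A minimal sketch, assuming you have a list of paths Googlebot requested (from the logs) and a list of full URLs exported from your crawler; it reduces both to bare paths and, as a simplifying assumption, ignores query strings and trailing slashes, which may not suit every site.

```python
from urllib.parse import urlsplit


def normalise(url_or_path):
    """Reduce a full URL or a logged path to a bare path: 'https://example.com/shoes/?p=2' -> '/shoes'."""
    path = urlsplit(url_or_path).path
    return path.rstrip("/") or "/"


def compare_log_and_crawl(log_paths, crawl_urls):
    """Pages seen only in the logs (possible orphans) and pages seen only in the crawl."""
    logged = {normalise(p) for p in log_paths}
    crawled = {normalise(u) for u in crawl_urls}
    return {
        "only_in_logs": sorted(logged - crawled),
        "only_in_crawl": sorted(crawled - logged),
    }
```

Pages in “only_in_logs” are the candidates for orphaned or legacy pages; pages in “only_in_crawl” are ones Googlebot didn’t request during the period you exported.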
There are other differences as well, which will be highlighted in the examples listed below.
Before we get into this section, let me put a big * against all the graphs I am about to show.
When I say “all”, I mean the 17 bots I monitor and track, and the data comes from 12 sites that regularly send me data. These range from a few small blogs that don’t get much traffic to a few sites that get quite a bit of organic traffic. In exchange for giving me their data to use for trend analysis, they get an overview of their activity against the group.
Which bots are crawling
As you can see from the graph (all December data for the 12 sites), there is a little lull over the Christmas period; it’s nice to know Google even gives its bots a rest on Christmas Day and Boxing Day. In the grand scheme of things this report isn’t that useful; it’s more for my own monitoring, to check whether there are any huge spikes I need to dig into.
Which Search Engine Bots are Crawling
This is probably one of the key bits, and if you haven’t looked in your logs before, it will surprise you. Most of the sites in the data set are in the UK, where Bing accounts for around 5% of their organic traffic, but look at how much Bing hits the websites: on some days, 80% of the hits from the 17 bots I monitor come from Bing. That’s crazy for a search engine that delivers very little SEO traffic but crawls a lot of your site.
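To reproduce this kind of per-bot, per-day breakdown from your own parsed logs, a sketch might look like the one below. The user-agent substrings in `BOT_TOKENS` are a small hypothetical watch-list, not the 17 bots tracked here, and the rows are assumed to be dicts carrying the combined-log “time” and “agent” fields.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical watch-list: user-agent substring -> label. Extend with the bots you track.
BOT_TOKENS = {
    "AdsBot-Google": "Google Ads",
    "Googlebot": "Google",
    "bingbot": "Bing",
    "AhrefsBot": "Ahrefs",
    "SemrushBot": "Semrush",
}
LOG_TIME = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 25/Dec/2017:10:00:00 +0000


def bot_name(agent):
    """Return the label of the first watch-list token found in the user agent, else None."""
    lowered = agent.lower()
    for token, name in BOT_TOKENS.items():
        if token.lower() in lowered:
            return name
    return None


def daily_bot_counts(rows):
    """Return {date: Counter({bot label: hits})} for bots on the watch-list."""
    by_day = defaultdict(Counter)
    for row in rows:
        name = bot_name(row["agent"])
        if name:
            day = datetime.strptime(row["time"], LOG_TIME).date()
            by_day[day][name] += 1
    return by_day


def share(counts, bot):
    """Fraction of one day's watched-bot hits that came from `bot`."""
    total = sum(counts.values())
    return counts[bot] / total if total else 0.0
```

Calling `share(counts[day], "Bing")` for each day gives the kind of Bing share described above; user-agent matching alone doesn’t prove the bot is genuine, so combine it with IP verification for in-depth work.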
If you have Bing Webmaster Tools set up, you can ask Bing to crawl your site in off-peak times. In my experience, however, they don’t tend to respect this, and they also don’t tend to respect instructions you put on your site, like nofollow. My opinion is that they do this to crawl as much of the site as possible, and that their bots probably aren’t as sophisticated as Google’s, which have been around a lot longer.
This data is useful to know: if you know how frequently they are crawling, it can help you make sure you have enough server capacity.
I want to run an experiment where I block Bing completely in the robots.txt file. I currently have a test site live that has now been getting crawled by Bing for two months, and in the next month or so I will block Bing completely to see what happens over the following weeks and months. For obvious reasons, no one who gave me access to their logs is willing to do this.
One key thing here: in early December 2017, John Mueller said that while you wouldn’t get a notification when your site was moved to the new mobile-first index, you would be able to tell from your logs.
He said that traditionally around 80% of your Googlebot activity would come from the desktop bot and 20% from the mobile bot, and that as soon as you had made the switch, this would flip to 20% desktop and 80% mobile.
The graph doesn’t show this clearly, because not all the sites are mobile friendly, but he was largely telling the truth: after the Christmas period, the mobile bot is clearly more active across all the sites combined.
I will never share specific website stats, as I agree to this before I am even given the data, but what I can say is that on the sites now believed to be in the mobile index, it wasn’t the overnight switch John might have made out, and it’s not quite 80/20. In the data I am analyzing it’s more like 70/30 once a site has moved over, and it usually takes around 2-3 weeks for the gap to close and flip towards mobile.
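To watch for this flip on your own site, you could track the smartphone share of Googlebot hits per day. This sketch assumes, as a heuristic, that smartphone Googlebot user agents contain the word “Mobile” while desktop ones don’t, and that rows carry the combined-log “time” and “agent” fields.

```python
from collections import defaultdict
from datetime import datetime

LOG_TIME = "%d/%b/%Y:%H:%M:%S %z"


def mobile_share_by_day(rows):
    """Return {date: share of Googlebot hits from the smartphone crawler}, in date order."""
    tallies = defaultdict(lambda: [0, 0])  # date -> [smartphone hits, all Googlebot hits]
    for row in rows:
        agent = row["agent"]
        if "Googlebot" not in agent:
            continue
        day = datetime.strptime(row["time"], LOG_TIME).date()
        tallies[day][1] += 1
        if "Mobile" in agent:  # heuristic: smartphone Googlebot UAs include "Mobile"
            tallies[day][0] += 1
    return {day: mobile / total for day, (mobile, total) in sorted(tallies.items())}
```

Looking at this by day, rather than as a monthly total, is what lets you see the gradual 2-3 week shift rather than a single before/after number.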
This is why I monitor it by day so I can see this level of detail.
Which Ads bots are crawling
As I mentioned earlier, this data is useful for PPC practitioners as well as SEOs. Knowing what the Google Ads bot is doing is just as important. The bots are different, but they are still evaluating your site, in this case to give you a Quality Score, so some of the things to look out for matter just as much for Ads bots as for search engine bots.
Not all of the 12 sites run AdWords or Bing Ads, but a few do, so the sample set is smaller. Again, I graph it out in a very similar way to the search engine bots, and you can see that on days when a large number of changes are made in the account, Google comes to check them.
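One way to spot those busy days without eyeballing a graph is to flag any day well above the typical level. In this sketch the input is a hypothetical mapping of day to AdsBot-Google hits, and the 2× median threshold is an arbitrary assumption you would tune to your own account.

```python
from statistics import median


def spike_days(daily_hits, factor=2.0):
    """Return the days whose hit count is at least `factor` times the median day, sorted."""
    if not daily_hits:
        return []
    threshold = factor * median(daily_hits.values())
    return sorted(day for day, hits in daily_hits.items() if hits > 0 and hits >= threshold)
```

The median is used rather than the mean so that one very busy day doesn’t raise the bar and hide itself.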
Which other bots are crawling
I also monitor the crawlers of the main SEO tools. This is more out of interest than anything, but around November someone in one of the SEO groups I’m in mentioned that Mozbot was hitting sites more frequently. Overall it didn’t make much of a spike on the graph, but because my reports are built in Data Studio, I could filter and drill down, and I realised that sites with a low DA were getting hit the hardest. Why, I am not sure.
It could be a move by Moz to improve their tools and crawl more of the web, or it could be because they realise small sites tend to be run by people new to SEO, which is how they acquire most of their new customers.
I am not sure, but Moz bot was a lot more active in November on smaller DA sites.
Now into the analysis:
So you have got the data and done some top-line analysis to see if you’re part of the new mobile index, but there is a lot more to be found in the logs.
This is very important: you want to know where Google and other search engines are crawling and, just as importantly, where they are not crawling.
If Google doesn’t crawl a page, it can’t add it to its index, so no matter what you do, the page won’t appear in the SERPs.
A few simple techniques here:
Around the 1st of every month, I get a list of every page Googlebot has crawled, download the sitemap and run a simple VLOOKUP both ways: first to see if Googlebot has crawled every page in the sitemap, and then to see if Googlebot has crawled any pages that aren’t in the sitemap.
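The two-way VLOOKUP can also be done in a few lines of Python. A minimal sketch, assuming a single standard sitemap file (not a sitemap index) using the sitemaps.org namespace, and a list of paths Googlebot requested during the month; it compares paths only, so protocol and hostname differences are ignored, and logged paths with query strings would need stripping first.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_paths(xml_text):
    """Return the set of URL paths listed in the <loc> elements of a single sitemap."""
    root = ET.fromstring(xml_text)
    return {
        urlsplit(loc.text.strip()).path or "/"
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
    }


def sitemap_gaps(xml_text, crawled_paths):
    """Both lookups: sitemap pages Googlebot missed, and crawled pages missing from the sitemap."""
    in_sitemap = sitemap_paths(xml_text)
    crawled = set(crawled_paths)
    return {
        "in_sitemap_not_crawled": sorted(in_sitemap - crawled),
        "crawled_not_in_sitemap": sorted(crawled - in_sitemap),
    }
```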
If Googlebot hasn’t crawled a section, why not? Are there no internal links? Have developers accidentally added nofollow tags to the section?
If Googlebot has crawled pages that aren’t in the sitemap, why? If the sitemap is automatically generated, has something broken? If it’s manually created, has someone forgotten to add the page? Sometimes the reason is simple: the page was active at the start of the month so got crawled, then became inactive and was removed from the site and the sitemap. This isn’t uncommon, especially for ecommerce businesses, but you’re looking for pages that are still active and not in the sitemap.
If you have SF, take the list of URLs crawled by Google but not in the sitemap, paste them into SF’s list mode (crawl from a list) and get the response codes. Then I just look at the URLs returning 200.
The key thing here is that without this information, you don’t know where to start.
If I can (and it’s not always possible), I do a third lookup: I ask the developers for a list of every URL they expect to be live. This is just so I’ve covered every option, because Google might not have crawled a section and it might not be in the sitemap either.
When is Google crawling, and more importantly, how frequently is it visiting your site? Here is an example I usually give at conferences; it doesn’t work exactly like this, but it helps to explain it.
Let’s say Google crawls your site once a week, on a Monday. If you write a brilliant piece of new content and publish it on Tuesday, it’s going to take 6 days before Google even finds the piece, let alone adds it to its index. That’s 6 days of potential organic traffic wasted.
Knowing this information means you could either publish the post on Monday, or if that’s not possible – go into Search Console and ask Google to come and crawl the URL.
Like I said, this is just an example and Googlebot doesn’t operate under such strict conditions, but when it crawls, in terms of time of day, is important.
If you need to take your website down for maintenance, even for just a few minutes or an hour or so, doing it when Googlebot isn’t hitting the site is quite important.
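To find when Googlebot is least active, you can count its hits per hour of day. A sketch assuming combined-log rows with “time” and “agent” fields; note the hours are in whatever time zone your server writes to the log, and the quietest hour over one period is only a rough guide to future behaviour.

```python
from collections import Counter
from datetime import datetime

LOG_TIME = "%d/%b/%Y:%H:%M:%S %z"


def googlebot_hits_by_hour(rows):
    """Count Googlebot requests per hour of day (0-23), in the log's own time zone."""
    hours = Counter()
    for row in rows:
        if "Googlebot" in row["agent"]:
            hours[datetime.strptime(row["time"], LOG_TIME).hour] += 1
    return hours


def quietest_hour(hours):
    """Earliest hour with the fewest Googlebot hits; hours with no hits count as zero."""
    return min(range(24), key=lambda hour: hours[hour])
```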
Simply put: how deep are they crawling? When looking at which pages they crawl, I also count the number of times they visit each specific page.
For most sites, the most visited page will be the home page, followed by the robots.txt file. After that, you want to see how far into the site they are crawling.
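Counting hits per page, and per level of URL depth, is a quick way to see how far Googlebot gets. In this sketch, “depth” is simply the number of path segments in the URL, an assumption that only approximates how many clicks a page is from the home page; rows are assumed to have “path” and “agent” fields.

```python
from collections import Counter


def is_googlebot(row):
    return "Googlebot" in row["agent"]


def most_crawled(rows, top=10):
    """The `top` paths Googlebot requested most often, with hit counts."""
    return Counter(row["path"] for row in rows if is_googlebot(row)).most_common(top)


def url_depth(path):
    """Number of path segments, ignoring any query string: '/' -> 0, '/shoes/red/' -> 2."""
    return len([segment for segment in path.split("?")[0].split("/") if segment])


def hits_by_depth(rows):
    """How many Googlebot requests landed at each URL depth."""
    return Counter(url_depth(row["path"]) for row in rows if is_googlebot(row))
```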
This is where having a clear sitemap giving instructions is important – especially to get Google to crawl where you need them to crawl.
Some of these errors you will see if you use a tool like SF or DC, but others might only be found by looking in your logs.
First, let me say you may come across status codes you haven’t seen before, like 508 (Loop Detected), which means the server hit an infinite loop while processing the request. There is a great Wikipedia article that lists all the HTTP status codes.
But you will also see a lot of the other codes too: 301s, 404s and so on. A 301 redirect isn’t necessarily a bad sign; after all, it has its place.
I can usually find quite a few 404s here, either from old CSS code where a developer removed the file but not the line referencing it in the site’s code, or from links on third-party sites pointing to dead pages.
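A simple count of status codes, plus the URLs behind a given code, gives you the error list to work through. A sketch assuming rows with “status” and “path” fields as strings, as they come out of a CSV export.

```python
from collections import Counter


def status_summary(rows):
    """Count how often each status code appears."""
    return Counter(row["status"] for row in rows)


def paths_with_status(rows, status="404"):
    """Paths that returned `status`, most frequently hit first."""
    return Counter(row["path"] for row in rows if row["status"] == status).most_common()
```

Sorting by hit count puts the errors bots run into most often at the top of the action list.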
Slow loading pages
We all know that slow-loading pages are as bad for SEO as they are for UX, and in the logs you can quickly identify large pages. Google recently confirmed that slow-loading pages will suffer in rankings; time will tell what impact this has. It could end up like Mobilegeddon, which turned out to be nothing.
No report gives you a perfect measure of how quickly Google downloads a page, and the logs are the same: they don’t give you a specific time, but they can help you quickly identify large pages on the site. Typically, the larger the page, the longer it’s going to take to load.
This isn’t really a big deal on a small site; you would probably pick up on issues very quickly, as you can spend more time analyzing every page.
Where this comes in super useful is on a large site where lots of people have the ability to add content.
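Sorting by the response size recorded in each log line surfaces the heaviest responses. A sketch assuming combined-log rows where “bytes” is the response body size (or “-” when nothing was sent); bear in mind that each image or stylesheet a page loads is logged as its own request, so the size on a page’s own row doesn’t include them.

```python
def largest_responses(rows, top=10):
    """Largest successful responses by logged size in bytes, biggest first."""
    sizes = {}
    for row in rows:
        if row["status"] != "200" or row["bytes"] == "-":
            continue
        size = int(row["bytes"])
        # Keep the largest size seen for each path across the period.
        sizes[row["path"]] = max(sizes.get(row["path"], 0), size)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:top]
```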
I won’t reveal the name of the company, but this example clearly shows why using your logs for this data is super important.
The client was an ecommerce client, quite a large client in fact. Multiple people across the business could edit the content, upload items and so on.
During the monthly check of the server logs, I filtered to see the largest pages by size, expecting to see the usual suspects. What surprised me was a page at the top that didn’t really do a lot for the business: a nice-to-have category page that didn’t drive much revenue.
However, it was now the largest page on the website (in terms of size), and by some way.
A little bit of digging revealed that they had given the junior designer, who was coming to the end of his placement, a task to improve the page by adding a site skin and possibly some category banners. I personally think they had nothing left for him to do and wanted to keep him busy, but anyway, off he went and created this beautiful-looking page.
I am not a designer or super technical, so this is the explanation I was given for why the page was so large.
Instead of using one compressed image to make the site skin, he had used multiple uncompressed images to create this beautiful “image”. The problem was that all these small images bloated the page.
The Head of Design at the time made a few small changes: he turned all the images into one image and compressed it.
Once he had done this, the page looked no different to the user but was drastically smaller. To make matters worse, this was one of ten category pages, and all ten had the same issue, so our 10 largest pages were pages that wouldn’t revolutionize the business but were wasting Google’s resources.
Don’t waste crawl budget
Talking of wasting Google’s resources: while Google might seem to have an infinite amount of cash, it doesn’t have infinite server capacity, which means there is a crawl budget for each website. Every website has a different budget, and it isn’t fixed.
As a general rule of thumb, the more popular a website, the larger its crawl budget, but even if you have a large budget and aren’t using all of it, you don’t want to waste it.
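One rough way to gauge wasted crawl budget is the share of Googlebot requests that hit redirects or errors rather than a normal page. This sketch counts 3xx (except 304 Not Modified), 4xx and 5xx responses as wasted; that definition is an assumption, not an official measure of crawl budget, and rows are assumed to have “status” and “agent” fields.

```python
def is_wasted(status):
    """Redirects (except 304 Not Modified), client errors and server errors."""
    return status[0] in "345" and status != "304"


def crawl_waste_share(rows):
    """Fraction of Googlebot requests that returned a 'wasted' status code."""
    statuses = [row["status"] for row in rows if "Googlebot" in row["agent"]]
    if not statuses:
        return 0.0
    return sum(is_wasted(status) for status in statuses) / len(statuses)
```

Tracking this number month by month is more useful than any single reading: if it climbs, something new is sending Googlebot to redirects or dead ends.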
Find your unicorns
Unlike crawling your website, which a competitor can also do to identify issues, analyzing your logs is unique to you (unless you start sharing your log data with your competitors), so start using this treasure trove of information to get a boost in the SERPs.
I am British, and over the last couple of Olympics we have had a very successful cycling team. I take a lot of my SEO inspiration from them: they are not looking for some magic miracle that will get them 1st place, they are looking for tiny gains to constantly improve.
None of the things I mentioned above is going to make you jump to position 1 and stay there on its own, but combine even just a few of them and you will start to see a difference.
Sometimes it’s the marginal gains that make all the difference. They are a lot easier to achieve than decent backlinks, and you have total control over them.
Dos and Don’ts of Analyzing Your Server Logs
One of the main things I get asked is what are some do’s and don’ts when it comes to analyzing your logs.
Here are my top 3 for each.
Do:
- Check the logs frequently
- If you’re short on time, just focus on Googlebot
- Count the error codes
Don’t:
- Try to use all the information at once; small, manageable chunks make it so much easier to analyze and less frightening
- Waste your time looking at user activity; there are better tools out there (unless you’re looking at fraud cases)
- Look once and forget
How Often Do You Need To Analyze Your Logs?
This is the million dollar question: how often should you be looking in your logs? It all comes down to your priorities. If you’re a one-man band doing all the tasks associated with running a website, I wouldn’t expect you to check the logs daily; it would be too much work. Checking once a month, at the start of every month, would probably be sufficient and let you create an action list off the back of it.
However, if your day-to-day job is purely SEO (or you’re looking to learn SEO) and you look after a large account, maybe it’s going to be daily analysis.
I usually start a new client with daily analysis, get a lot of the quick wins spotted and addressed, and reduce the frequency over time. Once you have made a fix, it’s done; then it just comes down to maintenance and checking nothing new has broken. The things I keep checking are:
- 508 errors: these can really hurt your reputation with Google and account for a lot of wasted crawling
- All other errors: even if you don’t have time to fix them all, at least knowing about them will help you create an action list
- Mobile vs desktop: this is important. If you’re still not part of the mobile index, I would be asking why and looking at what you need to do to be included
- Frequency of crawls
- Depth of crawl
Just think of server log analysis as a bit of Excel data work, nothing more. Most people assume you have to be a complete nerd and understand coding to get the data, but it’s not that difficult.
Once you have the data, do small, bite-sized pieces of analysis; it can be daunting looking at huge Excel files.
Start off by just trying to fix one error at a time. Unless, that is, you love data and huge Excel files, in which case jump right in and start analyzing.
Share any wins you’ve had from server log analysis in the comments below.