How to Audit your website’s Knowledge Graph

Knowledge Graph audits can fix a major content audit problem

Topic Detail Chart
Entities used on SEMRush

Most content audits out there are so granular in their analysis that they miss the main actionable point: what should you write about to fill in the holes? So we have created a website Knowledge Graph Audit, based around an Excel spreadsheet, that we think does the job a whole lot better. It will be interesting to know if the SEO community agrees.

A big problem with most content audits is that they operate at a keyword level. Why look at how many times a word is used without considering its synonyms? Consider analyzing how the word “stallion” is used without also seeing how “horse” is used. These two words potentially mean the same thing in the context of a site. Then the phrase “Horse shoe” might also be used, which is an entirely different entity! Existing SEO content analysis tools do not recognize the distinction. We know that Google does (or at least tries), because Paul Haahr of Google tells us so. He used the example “New York” as being neither “New” nor “York” (Google WMC, 2019).

Fortunately, an entity-first approach gets around much of the synonym issue.

A secondary problem is that many content analysis tools work on a page level. We wanted an audit tool that could work on a site level.

Want the TLDR Version? Download the Excel Template HERE. It is pre-populated with dummy data. You will need to then update the data fields with exports from your own projects.

(but you’ll miss out on the “why” if you choose not to read first)

KG Auditing using Entities and Excel

Whilst the InLinks “Graph” tab gives an excellent visual overview of a website’s knowledge graph, you can now create detailed Knowledge Graph audits using our free Excel template, by exporting entity data directly into the Excel Workbook. Knowledge Graph audits allow you to find gaps in your site’s content by looking at entities that you are under-exploiting, perhaps in relation to your competition, or by comparing the entities you use in your text to the entities you use in your page titles. The tool lets you dive into the heart of modern, entity SEO, without focussing on keywords.

Search engines, like Google, are relying more and more heavily on semantics to derive meaning from text. This means a significant increase in the use of synonyms in search query interpretation and in indexing. Websites do not necessarily need the same keywords that the searcher has used to rank well – they need instead to be about the same THINGS that the user is searching for. In the 2020s, the information behind the words is often more important than the words themselves. InLinks uses data from the Wikidata foundation as a dictionary of these ideas – or entities. It has built out one of the largest independent knowledge graphs on the planet. Entity SEO is the future.

Technology has taken to learning patterns in snippets of text at an enormous scale to try to establish what entities are semantically close to each other and in what context. To exploit this, SEOs moving forward can now analyse what entities are missing in a website and use this to enhance the website’s OWN Knowledge graph to better reflect its areas of expertise.

Ready to see an awe-inspiring Excel workbook and – more importantly – how to do an entity audit? Then let’s jump right in!

Knowledge Graph Vs Knowledge Panel Vs Semantic Web

A Knowledge Graph is not a Knowledge Panel… although there are similarities. The “Knowledge Panel” is a visual representation of the data surrounding an entity, as displayed in the SERPs. A Knowledge Graph is a database of all the underlying topics (Entities) being used within the text of a page, site or corpus of text. It could be a book or a white paper, but for the knowledge graph audits, we are looking at the content within a website. A Knowledge Graph audit is a website entity audit. This means that whilst you cannot easily change a Knowledge Panel in Google unless you have claimed it, you CAN affect your own site’s knowledge graph and, in doing so, you will effect changes in Google’s overall Knowledge Graph of all topics and entities it sees on the Internet.

It makes sense that the more you demonstrate expertise in a narrow vertical of knowledge, the more the entities seen in your text align with those that Google sees as needing to be covered in any content used to answer a searcher’s query. The Semantic Web is something different – but is again related and important. Most of us would simply call the semantic web “schema”, but it is markup used to help machines understand the text more easily. Entities are important in this endeavour, as we can use schema (a form of semantic markup) to tell Google that this content is about THIS entity and use a Wikipedia URL (for example) to reference the entity definition. Wikipedia itself is the very definition of structure. Almost every page defines an entity.
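To make the schema idea concrete, here is a minimal sketch in Python that builds a JSON-LD snippet saying "this article is about this entity". The `about` and `sameAs` properties are standard schema.org vocabulary; the function name `about_markup`, the headline and the chosen Wikipedia URL are illustrative assumptions, not anything generated by InLinks.

```python
import json


def about_markup(headline: str, entity_name: str, wikipedia_url: str) -> str:
    """Return a JSON-LD string declaring that an article is about one entity.

    Hypothetical helper for illustration only.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # "about" names the entity; "sameAs" points at its reference definition.
        "about": {
            "@type": "Thing",
            "name": entity_name,
            "sameAs": wikipedia_url,
        },
    }
    return json.dumps(data, indent=2)


# Example values are assumptions chosen for this article.
markup = about_markup(
    "How to Audit your website's Knowledge Graph",
    "Knowledge Graph",
    "https://en.wikipedia.org/wiki/Knowledge_graph",
)
print(markup)
```

The printed JSON would normally sit inside a `<script type="application/ld+json">` tag in the page's HTML.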

Your Website is a mirror of your business

There is little room for artistic license in modern AI systems. Whilst you can train an AI to appear creative, Google’s AI – at least in its core search algorithms – is made up of pattern matching algorithms, which start with Natural Language Processing of the content. The content is then split into constituent meaningful snippets… entities where possible… and then stitched back together when the user queries, based on semantic relationships – vectors between entities and words help Google to see which words are closely related to a concept and which are not. If you use hyperbole or metaphor in your web content, you may not rank well. But if you cover the right concepts, you can rank just fine.

How to Create your Website Knowledge Base Audit

1: You first need to create your project in InLinks. It is likely you have already done this if you have a free InLinks account. However, free users are limited to 20 pages. It is really likely that you need to analyze more of the site’s content if you want to see what entities have been missed or underplayed. The number of pages you will need depends on the size of the site, but in general, you do not need every URL variation of your content – but make sure all of your informative content is included.

2: Consider setting up a project for a competitor. You may not know which URLs they consider most important, but by using the “Add Pages” Tab you can bring in the pages most visible in the SERPs.

3: Export the knowledge graph of both projects into local files on your computer. The Knowledge Graph Export button is on the Graph tab of each project.

4: If you have not done it already you need to download the Excel Template. 

5: The Excel template has many sheets. You need to paste the data from the main site you are auditing into the “Data” sheet and the competitor into the “Comp. Data” sheet. I did a video to make sure…

How to insert Knowledge Graph exports into the template

This is all you need to do to create all the charts and graphs for the knowledge graph audit, although you would be wise to save a backup right now, because I can bet you’ll start tinkering with the spreadsheet, which is full of pivot tables and charts based on that table. Here at InLinks, we use OneDrive to keep version control, but this is one of those times where not clicking “save” could cost you time!

Explanation of the raw data provided

The raw data provided by the InLinks graph export is made up of 14 columns summarizing, for each entity / topic:

  • the Topic name, corresponding to a Wikipedia identifier
  • Topic ID (internal to InLinks)
  • Topic Type (Person, Organization, Place, Event, Thing, …)
  • general Topic Category
  • related Topic Sector, especially useful for B2B
  • a status indicating if the Topic has been selected as a target topic
  • a Topic Search Volume (US market)
  • Topic frequency
  • Nb of occurrences of this topic in the analysed pages
  • Nb of potential internal links to this topic
  • Nb of InLinks-generated internal links
  • a Cannibalization factor indicating how many times the topic has been found in page titles
  • A status indicating whether or not the topic has been found in titles
  • A Search Engine Understanding factor (SEU)
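If you would rather inspect the export in code than in Excel, the 14 columns above could be modelled roughly as follows. This is only a sketch: the field names, the Python types, the column order and the "yes"/"true" flag values are all assumptions made for illustration, and the sample row is dummy data, so check them against your own export before relying on it.

```python
from dataclasses import dataclass, fields


@dataclass
class TopicRow:
    """One row of the graph export. All field names are hypothetical."""
    topic: str               # Topic name (Wikipedia identifier)
    topic_id: str            # Topic ID, internal to InLinks
    topic_type: str          # Person, Organization, Place, Event, Thing, ...
    category: str            # general Topic Category
    sector: str              # related Topic Sector
    is_target: bool          # selected as a target topic?
    search_volume: int       # Topic Search Volume (US market)
    frequency: float         # Topic frequency
    occurrences: int         # occurrences in the analysed pages
    potential_links: int     # potential internal links to this topic
    generated_links: int     # InLinks-generated internal links
    cannibalization: int     # times the topic was found in page titles
    in_titles: bool          # found in titles at all?
    seu: float               # Search Engine Understanding factor


def _flag(cell: str) -> bool:
    # Assumed encoding of yes/no cells; adjust to match the real export.
    return cell.strip().lower() in {"yes", "true", "1"}


def parse_row(cells: list[str]) -> TopicRow:
    """Convert 14 text cells (in the order listed above) into a TopicRow."""
    if len(cells) != len(fields(TopicRow)):
        raise ValueError(f"expected 14 cells, got {len(cells)}")
    return TopicRow(
        topic=cells[0], topic_id=cells[1], topic_type=cells[2],
        category=cells[3], sector=cells[4], is_target=_flag(cells[5]),
        search_volume=int(cells[6]), frequency=float(cells[7]),
        occurrences=int(cells[8]), potential_links=int(cells[9]),
        generated_links=int(cells[10]), cannibalization=int(cells[11]),
        in_titles=_flag(cells[12]), seu=float(cells[13]),
    )


# Dummy values, not real SEMRush data.
row = parse_row([
    "Search_Engine_Optimization", "12345", "Thing",
    "Marketing and Advertising", "Marketing", "yes",
    "1000", "0.5", "412", "30", "12", "3", "yes", "18",
])
print(row.topic, row.occurrences)
```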

Important reminder: each topic / entity acts as a cluster of keywords.

It means that synonymous keywords like “SEO”, “SEOs”, “searchability optimization”, or “Search optimization marketing” (among others) will all be considered as referring to a single topic: “Search_Engine_Optimization”.
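A tiny sketch of what "a cluster of keywords" means in practice: several surface forms are counted under one topic. The lookup table below is hand-written for the example and is an assumption; InLinks derives its clusters from its own knowledge graph, not from a dictionary like this.

```python
from collections import Counter

# Hypothetical synonym table: surface form (lower-cased) -> topic identifier.
SYNONYMS = {
    "seo": "Search_Engine_Optimization",
    "seos": "Search_Engine_Optimization",
    "searchability optimization": "Search_Engine_Optimization",
    "search optimization marketing": "Search_Engine_Optimization",
}


def count_topics(mentions: list[str]) -> Counter:
    """Count mentions per topic, ignoring phrases that are not in the table."""
    counts: Counter = Counter()
    for mention in mentions:
        topic = SYNONYMS.get(mention.strip().lower())
        if topic is not None:
            counts[topic] += 1
    return counts


counts = count_topics(["SEO", "SEOs", "Search optimization marketing", "seo", "horse"])
print(counts)
```

A keyword-level audit would report four different strings here; the entity-level count reports one topic mentioned four times.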

Interpreting the Data

The data is now transformed into a series of hopefully actionable charts. The example here looks at the content on SEMRush.com and compares it to the content on Moz.com. You are welcome, SEMRush.

What is this site talking about at a macro level?

SEMRush’s content is mainly in the channel: “Marketing and Advertising”

The “Main Topics cats” tab has summarised all of the underlying entities into categories (or verticals). Not surprisingly, we see SEMRush entities point overwhelmingly to Marketing and Advertising and Technology. This makes sense. If you find a site that uses alliteration and metaphor, this is where the underlying meaning gets distorted out of the gate. Probably not something that most SEOs need telling, but occasionally the clients need to see it in a chart to see how their imaginative journey down metaphor mews meanders into SEO oblivion. (See? you CAN use metaphors… just don’t get carried away).
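The roll-up behind a chart like this can be sketched in a few lines: add up the occurrences of every entity per category, then express each category as a share of the total. The row layout (`category`, `occurrences`) and the numbers are dummy assumptions for illustration, not the template's actual pivot logic.

```python
from collections import defaultdict


def category_share(rows: list[dict]) -> dict[str, float]:
    """Return each category's share of all entity occurrences (0.0 to 1.0)."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["category"]] += row["occurrences"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: n / grand_total for category, n in totals.items()}


# Dummy rows, not real export data.
rows = [
    {"topic": "Search_Engine_Optimization", "category": "Marketing and Advertising", "occurrences": 30},
    {"topic": "Branding", "category": "Marketing and Advertising", "occurrences": 30},
    {"topic": "Web_crawler", "category": "Technology", "occurrences": 40},
]
shares = category_share(rows)
print(shares)
```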

Most other charts below have drilled into the Marketing and advertising category. This cuts out background noise and focuses on the content most important to SEMRush’s business model.

What is the site talking about in detail?

Topical Detail: The biggest winner is SEO

The Most Used Topics tabs drill down into the content in more detail. Note that our system uses underscores for spaces. The radar chart helps you to see which concepts the site is REALLY about and which it is merely paying lip service to. We now see the emphasis that SEMRush content writers have placed on “Search Engine Optimisation” (obviously) and “Google Search”. It also helps to see significant entities that may be underrepresented… perhaps more discussion on “blogging” or “Branding” might be helpful, although we could get more low-hanging fruit in this regard on the next chart.

What is the site almost missing?

Make sure no important topics are being missed

We thought this might be useful in some audits as a backstop for sites that pay lip service to important things, but hardly talk about them at all. The radar chart looks at entities that are only mentioned (literally) once or twice. There are no really important entities in this list for SEMRush (good job, SEMRush), but you could probably look at talking more about how “URL redirection” or “language markup” are important for SEO. Other sites may have more glaring holes. We have all seen those marketing companies that can’t seem to call a spade a spade! Some of these are topics which… if you were Google… might appear to show gaps in the site’s understanding.
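The filter behind this chart is simple enough to sketch: keep only the topics mentioned once or twice. The function name, the threshold parameter and the counts are assumptions and dummy data for the example.

```python
def rarely_mentioned(occurrences: dict[str, int], max_mentions: int = 2) -> list[str]:
    """Return topics mentioned at least once but no more than max_mentions times."""
    return sorted(
        topic for topic, n in occurrences.items() if 1 <= n <= max_mentions
    )


# Dummy counts, not real SEMRush data.
site_topics = {
    "Search_Engine_Optimization": 412,
    "URL_redirection": 1,
    "Markup_language": 2,
    "Blog": 37,
}
rare = rarely_mentioned(site_topics)
print(rare)
```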

Are our important topics appearing in page Titles?

Most important Entities should also be in page titles

The “Entities used in page Titles” chart helps us make sure we are not going off point straight from the page title. Generally, entities appearing in titles are even more important to a site than ones appearing in the main text. At least – that’s what you would hope!

What are our best content opportunities by search volume?

The “Content Opp.” sheet looks really hopeful. Again displayed as a radar chart, it involves some quite clever data extraction. We have introduced search volume as a datapoint, then computed the chart by sifting for topics that are mentioned in existing content in the marketing category (the main category for SEMRush) but not addressed with a specific page. That is to say, the entity does not appear in any page titles. I really like the information in this chart, as it is quantified and actionable. I think the action is also meaningful. For me, it screams “Write more about _____ or just change some page titles, as the existing content may just need a stronger signpost for Google.”
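In code, the sifting described above might look something like this: keep topics in the chosen category that never appear in a page title, then rank them by search volume. The dictionary keys and all values are dummy assumptions; the Excel template does this with pivot tables rather than Python.

```python
def content_opportunities(rows: list[dict], category: str) -> list[dict]:
    """Topics in `category` that are mentioned in content but absent from titles,
    ordered from highest to lowest search volume."""
    candidates = [
        row for row in rows
        if row["category"] == category and not row["in_titles"]
    ]
    return sorted(candidates, key=lambda row: row["search_volume"], reverse=True)


# Dummy rows, not real export data.
rows = [
    {"topic": "Search_Engine_Optimization", "category": "Marketing and Advertising",
     "search_volume": 1000, "in_titles": True},
    {"topic": "Branding", "category": "Marketing and Advertising",
     "search_volume": 500, "in_titles": False},
    {"topic": "Blog", "category": "Marketing and Advertising",
     "search_volume": 800, "in_titles": False},
    {"topic": "Web_crawler", "category": "Technology",
     "search_volume": 900, "in_titles": False},
]
opportunities = content_opportunities(rows, "Marketing and Advertising")
print([row["topic"] for row in opportunities])
```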

Note: For this graph as for the others, the results will vary depending on the number of content pages taken into account in the graph (and therefore analyzed by InLinks). The higher the number of pages analyzed, the better the audit results will be.

Can I find better content opportunities by benchmarking a competitor?

What is your Competitor writing about that’s good?

The next tab shows the topics used by your competitor, but which have not been detected in your content.

By selecting the main topic categories you’re interested in (above is Marketing and Advertising, but it could be also Software or Technology), the graph helps you highlight gaps and quantifies their potential using search volumes.

So in the chart above we see a rich seam. Moz seems to be talking with purpose around the entity “Google Webmaster Tools”, but this is a blind spot in SEMRush’s content. “Google My Business” is another possible opportunity for more content.
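The competitor comparison boils down to a set difference, which can be sketched as follows: take the competitor's topics, drop any you already use, keep the categories you care about, and sort by search volume. Names, keys and numbers are hypothetical dummy data, not figures from the SEMRush or Moz exports.

```python
def competitor_gaps(
    own_topics: set[str],
    competitor_rows: list[dict],
    categories: set[str],
) -> list[dict]:
    """Competitor topics in the selected categories that your site never uses,
    ordered from highest to lowest search volume."""
    gaps = [
        row for row in competitor_rows
        if row["topic"] not in own_topics and row["category"] in categories
    ]
    return sorted(gaps, key=lambda row: row["search_volume"], reverse=True)


# Dummy data for illustration only.
own_topics = {"Search_Engine_Optimization", "Google_Search"}
competitor_rows = [
    {"topic": "Search_Engine_Optimization", "category": "Marketing and Advertising", "search_volume": 1000},
    {"topic": "Google_My_Business", "category": "Marketing and Advertising", "search_volume": 300},
    {"topic": "Google_Webmaster_Tools", "category": "Marketing and Advertising", "search_volume": 700},
    {"topic": "Web_crawler", "category": "Technology", "search_volume": 900},
]
gaps = competitor_gaps(own_topics, competitor_rows, {"Marketing and Advertising"})
print([row["topic"] for row in gaps])
```

Passing more than one category (for example adding "Technology") widens the comparison in the same way as selecting extra categories on the tab.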

Cannibalisation report

Topic Cannibalisation

Cannibalisation seems to be interpreted in different ways by different tools. I asked the community who does a cannibalisation report and got lots of answers. The actionable takeaway from these reports seems to vary quite a lot. Some reports look at duplicate content and rectify it through canonicals. That’s great, but that is a duplicate content problem. Others look at page titles and suggest deleting content and redirecting. The problem here is that you WILL talk about important entities on more than one page! Obviously! InLinks resolves the issue by internally linking entities within the text through to the most significant page on the topic. However the reports work, you need to record where and when you repeat concepts before you can fix it.

(We might go further into cannibalisation if asked by the community. We may have something rather special using links to fix cannibalisation.)

Find where internal links may reinforce content

InLinks are Internal links from other pages on the site. This is core to the InLinks tool, as it assesses where you have talked about concepts within the site and – where cornerstone content exists – injects a link over the semantically accurate anchor text.

Google’s Understanding of the Entities

Do the entities appear in Google’s NLU API?

InLinks runs regular (weekly) comparisons of Google’s understanding of Entities by industry, using different examples every week in multiple countries. As of the summer of 2020, Google only identifies 20% or less of all the entities in its own public NLP API (or Natural Language Understanding, as they sometimes prefer to describe it). We find Google to be good at identifying brands and proper nouns (those with capitalization) but still weak on concepts and ideas that are entities.

We can see in this graph that an essential entity such as Search Engine Optimization is only detected by Google in 18% of cases, while the detection rate of this same entity reaches 67% for Moz! This is a significant difference and underlines the importance of Knowledge Graph audits.

Download the Template for Free

(Not got an InLinks account yet? Sign up for free.)

What are your thoughts?

Thank you for reading all the way through! Does this approach to audits look different? Does it look sensible? Are there any particular charts that you think will fit into your own audit process? We would love your feedback in the comments.

22 replies
  1. Jono
    Jono says:

    Oh, this looks really interesting. I think that the SEO community has been on board with the idea that we should be thinking about the relationship between pages and topics/entities, rather than pages and keywords for a while now – but still hard to visualise what that looks like, to define workflows, etc. This should help!

    • Dixon
      Dixon says:

      Thanks, Jono. Delighted to get a positive comment from you! There seem to be so many ways to interpret the idea of “Entities” and “Topics” and “Knowledge Graphs”. Having built a very big one, InLinks hopes to create some meaningful clarity to the whole approach.

  2. Shelley Walsh
    Shelley Walsh says:

    An interesting development at tool level. I’ve been working with topics and clusters a lot recently.

    The biggest part of my content audit process is the qualitative assessment where I manually review to pick up what a tool can’t do – bring knowledge, experience and intricacy into play. Supplementing with tools that can fill in the gaps around that helps.

    I still think defining the obvious questions the reader has and giving a clear and direct answer to the question in the text helps the most.

    I’m really interested in what you are developing here and will have a play with the tool and see how it works.

    • Dixon
      Dixon says:

      Thank you Shelley. Absolutely agree that engaging “Brain 2.0” will always be a core SEO tool! 🙂 You can’t beat a layer of human sanity and interpretation overlaying the data a machine (albeit quite a clever one) churns out. That is particularly important at an entity level because we see lots of instances where an incongruous topic or entity appears in a SERP – but that does not mean that you should start talking about it in your own content. Being able to see it, though, helps to understand how a machine is interpreting language (and extracting concepts, turning it into the emerging “machine language”. I don’t mean ones and zeros there, I mean a string of entities).

  3. Nino Knetemann
    Nino Knetemann says:

    Hello Dixon,

    nice read. I was thinking about a good way to dive more into entities and the audit of content. Your approach sounds like the next thing I try this week. Thank you for the Excel file.

  4. Roman Berezhnoi
    Roman Berezhnoi says:

    I think it is a nice try and it’s not a tool for SEO purposes or content optimization. It seems really helpful to get ideas but it is still far from Google’s reality, because Google uses several “systems” which are complex and consist of many subsystems. Well, thanks anyway.

    • Dixon
      Dixon says:

      Yes, of course, Google uses hundreds or thousands of algorithms. However, the more those algorithms are successful, the more they reflect topics and entities used in real life. Knowledge Graphs reflect a human in the loop, so you could argue that Google is aspiring to understand this stuff. Anyway, thanks for reading.

