What is Google’s approach to text comprehension?

Are we sure Google understands text?

We already know that Google comprehends text to some extent. Think about it for a moment. One of the most important things Google has to do is match what a user types into the search field with a relevant search result. User signals (such as click-through and bounce rates) alone are not enough to help Google do this. Furthermore, we know that you can rank for a phrase you don’t use in your content (though it’s still a good idea to identify and include one or more specific keywords). So it’s clear that Google does something to read and evaluate your text.

How Google understands text

To return to our original question: how does Google interpret text? To be honest, we don’t know much about this. Unfortunately, that information isn’t freely available. And, judging by the search results, we also know that there is still a lot of work to be done. However, there are a few clues from which we can draw conclusions. We know Google has made significant progress in understanding context. We also know that the search engine tries to work out how words and concepts are related. How do we know this? On the one hand, by keeping an eye on any news about Google’s algorithm. On the other hand, by looking at how the actual search results pages have changed.

Word embeddings

Word embedding is an intriguing technique that Google has researched and filed patents for. We’ll save the technical details for a later article, but the aim is to work out which words are related to each other. This is how it works: a computer program is given a certain amount of text to process. It then analyses the words in that text to see which ones tend to occur together. The program then converts each word into a series of numbers (a vector). This allows each word to be plotted as a point in space, much like in a scatter plot. The resulting diagram shows how words are related to each other. More precisely, it shows the distance between words, a bit like a galaxy made entirely of words. For example, the word “keywords” would sit much closer to “copywriting” than to “kitchen utensils.”

Interestingly, this works not only for words but for phrases, sentences, and paragraphs as well. The more data you feed the program, the better it becomes at classifying words and understanding how they are used and what they mean. What’s more, Google maintains an index of an enormous portion of the web. With a dataset like that, it’s feasible to build very reliable models that predict and assess the value of text and context.
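To make the idea described above more concrete, here is a minimal sketch in Python. It is a toy, not Google’s method: it simply counts which words share a sentence, treats those counts as each word’s vector, and compares vectors with cosine similarity. Real word embeddings are learned from vastly more text and are far more sophisticated. The corpus and the function names (cooccurrence_vectors, cosine_similarity) are made up for this illustration.

```python
import math
from collections import Counter, defaultdict

# Toy corpus invented for this illustration; a real system would use far more text.
CORPUS = [
    "keywords help copywriting rank",
    "copywriting uses keywords well",
    "good copywriting needs keywords",
    "kitchen utensils include spoons",
    "spoons and forks are kitchen utensils",
]


def cooccurrence_vectors(sentences):
    """Map each word to counts of the other words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (higher = more alike)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = cooccurrence_vectors(CORPUS)
sim_copy = cosine_similarity(vectors["keywords"], vectors["copywriting"])
sim_kitchen = cosine_similarity(vectors["keywords"], vectors["utensils"])
print(f"keywords vs copywriting: {sim_copy:.2f}")
print(f"keywords vs utensils:    {sim_kitchen:.2f}")
```

In this tiny corpus, “keywords” and “copywriting” appear alongside the same neighbouring words, so their vectors point in a similar direction, while “keywords” and “utensils” share no neighbours at all.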

Related entities

It’s only a small step from word embeddings to the concept of related entities. Let’s look at the search results to see what related entities are. If you search for “types of pasta,” you’ll see a heading called “pasta variations” at the top of the SERP, along with rich results showing many different kinds of pasta. These pasta variations are further divided into “ribbon pasta,” “tubular pasta,” and other subtypes. There are many similar SERPs that reflect the relationships between words and concepts.

Google now displays this entity-based rich result after entering [types of pasta].

The related entities patent that Google has filed mentions a related entities index database. This is a database in which concepts or entities, such as pasta, are stored. These entities also have characteristics. Lasagna, for example, is a pasta. It’s also made of dough. And it’s food. By analysing the characteristics of entities, they can be grouped and categorised in all kinds of ways. This allows Google to better understand how words are related and, therefore, to better understand context.
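The grouping idea described above can be sketched in a few lines of Python. This is an illustrative assumption about how entities and their characteristics might be represented, not the structure described in Google’s patent. The entity names, the traits, and the helper functions (group_by_trait, trait_overlap) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical entities and traits, made up for illustration only.
ENTITIES = {
    "lasagna": {"pasta", "made of dough", "food"},
    "tagliatelle": {"pasta", "made of dough", "food", "ribbon pasta"},
    "penne": {"pasta", "made of dough", "food", "tubular pasta"},
    "rigatoni": {"pasta", "made of dough", "food", "tubular pasta"},
    "pizza": {"made of dough", "food"},
    "fork": {"kitchen utensil"},
}


def group_by_trait(entities):
    """Return a mapping from each trait to the sorted entities that have it."""
    groups = defaultdict(list)
    for name, traits in entities.items():
        for trait in traits:
            groups[trait].append(name)
    return {trait: sorted(names) for trait, names in groups.items()}


def trait_overlap(entities, first, second):
    """Share of traits two entities have in common (Jaccard similarity)."""
    a, b = entities[first], entities[second]
    return len(a & b) / len(a | b)


groups = group_by_trait(ENTITIES)
print("tubular pasta:", groups["tubular pasta"])
print("penne vs rigatoni:", trait_overlap(ENTITIES, "penne", "rigatoni"))
print("penne vs pizza:   ", trait_overlap(ENTITIES, "penne", "pizza"))
print("penne vs fork:    ", trait_overlap(ENTITIES, "penne", "fork"))
```

The more traits two entities share, the more closely related they appear, which is one simple way to picture how a search for pasta could surface its subtypes.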

Google is heavily investing in NLP

Getting machines to understand human language is the goal of natural language processing (NLP). It’s one of the most difficult areas of computer science, but it’s also one where a great deal of progress is being made. Proper language understanding is crucial in a world that is increasingly powered by AI-driven technologies. Google recognises this and devotes significant resources to developing NLP models. One of the most important of these was BERT, a model that looks at the words that come both before and after a given word. As a result, the system has the full context of a sentence to work out its meaning. BERT is impressive, but Google is doing much more. Meet MUM.

MUM: Google’s upcoming language model

Google announced a new language model, MUM, during an event this year. According to Google, it is 1,000 times more powerful than BERT. How? MUM, it appears, is capable of multitasking. This means that the model can read text, understand its meaning, build deeper knowledge of the subject, reinforce that knowledge with other media, draw insights from more than 75 languages, and turn everything into content that answers complex search queries. And it does all of this at the same time.

(Image from Google’s blog) A graphic illustration of how Google MUM works.

Practical conclusions

So, how does Google understand text exactly? What we know leads us to two very important points:

1. Context is key

If Google understands context, it is likely to assess and judge context as well. The more closely your copy matches Google’s understanding of the context, the better its chances of ranking well. As a result, thin copy with a narrow scope will be at a disadvantage. You need to cover your topics thoroughly. On a broader scale, covering related topics and providing a complete body of work on your website can strengthen your authority on the subject you write about and specialise in.

2. Write for your reader

Simpler sentences that clearly show how concepts relate to each other help not just your readers, but also Google. Both people and machines have a harder time understanding complex, inconsistent, and poorly structured text. You can help the search engine understand your texts by focusing on:

  • Readability: making your text as easy to read as possible without compromising your message.
  • Good structure: adding clear subheadings and using transition words.
  • Good content: adding clear explanations that show how what you’re saying relates to what’s already known about a topic.

The better you do, the easier it will be for your users and Google to understand your text and what it is trying to achieve. This also helps you rank with the right pages when a user types in a search query, particularly because Google is essentially developing a model that mimics how people process language and information.
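As a rough self-check for the readability and structure points above, a writer could measure a draft with a small script. The following Python sketch is only an illustration: the transition word list is a short, hypothetical sample, the sentence splitting is deliberately naive, and the numbers it reports are aids for the writer, not signals that Google is known to use.

```python
import re

# Small, hypothetical sample of transition words; extend it for real use.
TRANSITION_WORDS = {"because", "however", "therefore", "also", "first", "finally", "so"}


def split_sentences(text):
    """Naively split text into sentences on ., ! and ? (good enough for a sketch)."""
    return [part.strip() for part in re.split(r"[.!?]+", text) if part.strip()]


def readability_report(text):
    """Return sentence count, average sentence length, and transition word usage."""
    sentences = split_sentences(text)
    word_lists = [re.findall(r"[A-Za-z']+", sentence.lower()) for sentence in sentences]
    total_words = sum(len(words) for words in word_lists)
    with_transition = sum(1 for words in word_lists if TRANSITION_WORDS & set(words))
    count = len(sentences)
    return {
        "sentences": count,
        "avg_words_per_sentence": total_words / count if count else 0.0,
        "transition_share": with_transition / count if count else 0.0,
    }


sample = "Pasta is made of dough. Therefore, it cooks quickly. However, shapes differ!"
report = readability_report(sample)
print(report)
```

Shorter average sentence length and a reasonable share of sentences with transition words are only rough indicators, so treat the output as a prompt to reread a passage, not as a score to optimise.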

Google wants to be a reader

In the end, it comes down to this: Google is becoming more and more like an actual reader. You can improve your chances of ranking high in the search results by writing substantial content that is well structured, easy to read, and clearly embedded in the context of the topic at hand.

Need help getting your business found online? Stridec is a top SEO agency in Singapore that can help you achieve your objectives with a structured and effective SEO programme that will get you more customers and sales. Contact us for a discussion now.