Natural language processing is not a newcomer in the tech arena. Yet with the progress it has undergone in the past few years, this branch of artificial intelligence has taken on a whole new dimension, with countless computer programs that we use every day (such as automated translators) now incorporating it.
Google embraced this technology by introducing natural language processing into its ranking algorithm and, more recently, by offering a dedicated API for businesses: Google NLP. What is the Google NLP API? How does it work? Let’s delve into the link between Google and NLP (Natural Language Processing) and see how this technology impacts indexing and web page positioning and, as a consequence, search engine optimisation strategies.
What Is Natural Language Processing?
Before we go into more detail about Google’s NLP efforts, we first need to understand what Natural Language Processing consists of. This subfield of artificial intelligence aims to give a computer program the ability to understand and interpret language as it is spoken and written by human beings, in all its nuances and complexity. Thus, an algorithm that uses NLP is capable of analysing sentences, grasping the meaning of words in context and, ultimately, generating language in order to communicate with the user.
Natural language processing leans on computer science, math, and linguistics. In the field of artificial intelligence, it stands at the crossroads between machine learning and deep learning (two methods for autonomous learning), as you can see on the diagram below:
The aim is to achieve a better flow of communication between humans and machines, by helping the latter “speak” the same language as the former. This has two immediate and tangible effects: it makes using technology more straightforward, and it speeds up the automation of tedious tasks thanks to the programs’ ability to process enormous volumes of data at record-breaking speeds. Once structured, this information can be used efficiently.
In actual fact, natural language processing is already a prominent part of certain applications that individuals and businesses alike use every single day (and that’s before we even begin to consider search engines and the Google NLP API). Here are a few real-world examples:
- Automated translators, such as Google Translate, which reproduce a text in the desired language instantaneously.
- Voice assistants, software assistants integrated into smartphones and computers (Siri, Cortana) and connected speakers (Google Home, Amazon’s Alexa).
- Chatbots, programs that simulate human conversation and are capable of answering simple questions users have (they are all over the place on corporate websites and web shops).
- Automated spelling and grammar checkers, such as the one integrated in Microsoft Word, or the Antidote software.
Which means that if you’ve ever used an automated translator or clicked the button to spell-check your Word doc, then you’ve used a tool that calls upon machine learning models and strives to understand natural language. Other, more specific applications are often used by professionals in various fields. These include automated speech-to-text transcription (and text-to-speech synthesis), automated summarisation with rephrasing, analysis of the emotions conveyed by a piece of content (sentiment analysis), natural language generation in the form of complete sentences, tools for analysing text content (such as Google NLP), etc.
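To make the sentiment-analysis example above concrete, here is a deliberately naive, lexicon-based sketch. The word lists and scoring rule are illustrative assumptions only; production tools rely on trained machine learning models, not hand-written lists:

```python
# A toy lexicon-based sentiment scorer. The word lists below are
# illustrative assumptions, not a real sentiment lexicon.
POSITIVE = {"great", "love", "excellent", "helpful", "fast"}
NEGATIVE = {"bad", "hate", "poor", "slow", "broken"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: negative, neutral (0.0), or positive."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [(w in POSITIVE) - (w in NEGATIVE) for w in words]
    matched = [h for h in hits if h != 0]
    return sum(matched) / len(matched) if matched else 0.0

print(sentiment_score("The support was fast and helpful"))   # 1.0 (positive)
print(sentiment_score("Slow delivery and a broken product")) # -1.0 (negative)
```

Even this crude approach illustrates the core idea: mapping words to an emotional polarity and aggregating the result into a score for the whole text.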
More broadly, every program that relies on an understanding of natural language uses NLP technology. They all have one thing in common, which is to make human tasks easier… And we are therefore still very far from the fantasies sometimes associated with the progress of artificial intelligence and machines taking over the world!
The History of Natural Language Processing
Google’s work on NLP is little more than further progress in a field that has been around for a long time, and whose origin almost coincides with that of computers. The earliest experiments around Natural Language Processing date all the way back to the 1950s, with the development of instant translation tools. The challenges of the political context at the time (that of the Cold War) were especially conducive to this type of research. Back then, the notion of conversational programs was at the heart of the work undertaken by several scientists. This was also when Alan Turing introduced his now famous test in his article “Computing Machinery and Intelligence”.
The very first conversational robot in history, ELIZA, was created by Joseph Weizenbaum at MIT between 1964 and 1966. Later, in the 1980s, following a series of programs capable of structuring information into data that computers could understand, improvements in processing capacity opened the way for unprecedented uses of NLP. This was made possible by the introduction of machine learning algorithms: computers gained the ability to “learn” and to define their own rules.
Since the beginning of the 21st century, every (technological) green light came on to encourage the development of natural language processing: a better understanding of deep learning, an exponential increase in computer processing power, a boom in the volume of data… And, with that, real-life applications accessible to the average user, such as the first virtual assistants on smartphones (Siri launched on the iPhone 4S in late 2011), followed by connected speakers (Amazon Echo in 2014, Google Home in 2016).
How Does Natural Language Processing Work?
There is nothing new about the idea of machines understanding natural language. But what made it progress so rapidly was deep learning. This methodology relies on the use of artificial neural networks that “imitate” the human brain. But it just so happens that “natural” language is a complex thing, with countless subtleties that machines find difficult to interpret, including undertones and innuendo, humour, metaphors, euphemisms, etc. The ambition of NLP technology is therefore to capture all these nuances and to combine them with autonomous learning to convert language into raw data, generate interaction with users, and create intelligent conversations.
To this end, the algorithm relies on recurrences, patterns, and correlations as it decomposes human language, then infers meaning. The constituent elements of speech are categorised and segmented; words and phrases are considered separately so that a function may be attributed to each one based on its morphology. This allows the program to distinguish between a noun phrase, a conjugated verb, various clauses, complements, subjects, genders, numbers, etc. Several methodologies come into play, such as frequency analysis for every term, comparison of proportional keyword occurrences across texts from the same corpus, contextual analysis, etc. Language is thus processed on multiple levels:
- Lexical analysis,
- Syntactical analysis,
- Semantic analysis,
- Pragmatic analysis.
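The frequency analysis and corpus comparison mentioned above can be sketched in a few lines. This is a classic TF-IDF computation, not Google’s actual algorithm, and the tokenisation rules and sample corpus are illustrative assumptions:

```python
import math

def tokenize(text: str) -> list[str]:
    # Lexical analysis in miniature: lowercase, strip punctuation, split.
    return [w.strip(".,;:!?").lower() for w in text.split() if w.strip(".,;:!?")]

def tf_idf(term: str, doc: str, corpus: list[str]) -> float:
    # Term frequency in one document, discounted when the term is common
    # across the whole corpus -- the "proportional occurrence" comparison.
    tokens = tokenize(doc)
    tf = tokens.count(term) / len(tokens)
    docs_with_term = sum(term in tokenize(d) for d in corpus)
    idf = math.log(len(corpus) / docs_with_term) if docs_with_term else 0.0
    return tf * idf

corpus = ["The cat sat.", "The dog ran.", "The cat ran."]
print(tf_idf("the", corpus[0], corpus))  # 0.0 -- "the" appears in every document
print(tf_idf("cat", corpus[0], corpus))  # > 0 -- distinctive for this document
```

Words that appear everywhere (“the”) score zero, while words that distinguish one document from the rest of the corpus score higher, which is exactly the intuition behind keyword-occurrence comparisons.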
This is exactly the type of processing at work when Google uses NLP, as in its BERT algorithm, for instance.
What Is Google NLP: A Search Engine With Integrated Natural Language Processing
As far as natural language processing is concerned, Google is something of a reference. What we’re going to look into here is how this technology is used to transform indexing processes and the ranking of web pages.
In order to understand how Google’s algorithm has been evolving over time, one must always consider user experience. To the Mountain View firm, the aim is always to ensure the satisfaction of those Internet users who rely on its search engine by offering them results that are as relevant as possible. This implies that they are continuously improving the quality of the pages highlighted in the SERP.
In this context, understanding the queries formulated by the users is practically the name of the game. And it’s no longer about grasping the overall meaning of the words, but about identifying every intent concealed behind the phrasing of the search to provide a better answer. To this end, it is imperative not only to be aware of the nuances a query may contain, but also to detect any terms that express an “emotion”.
Google’s work on NLP has resulted in the launch of the BERT algorithm in 2019. This was the most important update in five years for the firm (according to its own statement about it) and an undeniable leap forward in terms of how search engines operate. This is because BERT no longer processes queries word for word. Instead, it weaves connections between the terms being used in order to take the context into account and to grasp the “deep meaning” of the query. With this in mind, it looks at every single term, including operative words and prepositions, and assesses the “emotions” that transpire from the query, giving it a positive, negative, or neutral score.
At the time of its launch, the BERT algorithm (for Bidirectional Encoder Representations from Transformers) was the technological culmination of Google’s research on NLP. At its base, there are two main cornerstones:
- data (the large text corpora on which pre-trained models are built, then analysed via natural language processing);
- and methodology (the way the algorithm uses these models).
In other words, with BERT, Google meant to “read” the user’s thoughts by understanding not only the query itself, but what isn’t explicitly said. This also helps grasp unprecedented queries, those being formulated for the very first time, which Google then estimated to account for about 15% of daily searches.
In 2021, Google’s work on NLP intensified, eventually leading to the rise of MUM (Multitask Unified Model). This algorithm update improved the search engine’s understanding of natural language even further which, consequently, also improved the relevance of the results offered to the users as answers. More specifically, MUM focuses on what Google calls “complex search queries”, which are characterised by their length and the occurrence of several prepositions. MUM aims to provide an immediate answer to such queries thanks to several advanced functionalities. For example, it extracts information from several content formats, displays resources extracted from results in 75 different languages (using machine translation) and can process several tasks simultaneously.
Google’s Integration of Natural Language Processing: What Real Difference Does It Make in Terms of SEO?
What one needs to understand is that Google chose to incorporate NLP into its search engine in order to provide better services for Internet users. Natural Language Processing technologies help algorithms better understand the users’ search queries, so that the answers can be made more relevant, and potentially more satisfactory.
Google finds this all the more important given that the search engine’s needs are dictated by changes in user behaviour, especially with regard to voice search – itself made possible by NLP applications – being ever more widely adopted. An extensive survey carried out by Uberall in 2019 showed that 21% of respondents used voice search on a weekly basis. And it just so happens that spoken search queries call upon natural language, making them much more complex for search engines to grasp than generic queries made up of a few keywords instead of full sentences.
Consequently, as Google’s work around NLP continues to intensify and the algorithm integrates more and more criteria connected with natural language, it is becoming crucial for webmasters to optimise their pages in light of these changes. Ever since the launch of BERT, search engine optimisation experts have been very vocal about adjustments you can make for more suitable content:
- writing for the users rather than for the crawler bots,
- learning to better understand your audience to provide answers that are more relevant to their needs,
- simplifying the language and using a more conversational tone,
- working on the page’s semantic field to cement the context and help the algorithm understand the issues pertaining to the topic in question.
More recently, Google launched a dedicated tool that integrates natural language processing and helps users extract insights from unstructured text. This tool, casually called Google NLP, is an API that allows you to examine text content and extract data that can be leveraged for the benefit of an SEO strategy. Google NLP gives you an idea of how the algorithm perceives the text and what it understands from analysing its keywords, semantics, syntax, overall emotional weight, and “entities” (words or expressions representing identifiable objects that can be subjected to content classification). Here is an example of the results provided by Google’s Natural Language Processing tool:
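For readers who want to call the API directly, its `analyzeEntities` endpoint accepts a plain-text document and returns the entities it finds, each with a salience score indicating how central it is to the text. The sketch below shows the shape of the request body and a truncated response; the field names match the public Cloud Natural Language API, but the sample text and salience values are illustrative assumptions:

```python
# Shape of the request body the analyzeEntities endpoint expects.
request_body = {
    "document": {"type": "PLAIN_TEXT", "content": "Google launched BERT in 2019."},
    "encodingType": "UTF8",
}

# Truncated example response (field names match the API; values are illustrative).
response = {
    "entities": [
        {"name": "Google", "type": "ORGANIZATION", "salience": 0.62},
        {"name": "BERT", "type": "OTHER", "salience": 0.38},
    ]
}

# Rank entities by salience: how central each one is to the text.
for entity in sorted(response["entities"], key=lambda e: -e["salience"]):
    print(f'{entity["name"]:<8} {entity["type"]:<14} salience={entity["salience"]:.2f}')
```

The salience ranking is what makes the tool useful for SEO work: it tells you which concepts the algorithm considers dominant on a page, which may not be the ones you intended.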
The principle is quite straightforward: Google NLP provides a means of comparing the results of the analysis with the pages that rank at the very top of the SERP. You can then apply the same recipes to your own optimisation, for example by targeting a combination of keywords that matches a particular user intent. All other SEO criteria being equal, it is theoretically possible to get your content to rank alongside the top pages on Google, as long as it complies with the search engine’s expectations as far as natural language is concerned.
Another crucial point to consider is links, both internal and external. These take on a whole new dimension if Google’s work with NLP is anything to go by. Now more than ever, SEO experts need to consider the context of a page when deciding on the placement of links and the relevance of their anchors. The goal of links should be to improve the user’s experience, and nothing more. This, by the way, doesn’t have to take anything away from their SEO value.

In other words, the deeper your understanding of Google and NLP, the better you can anticipate what the algorithm expects from the most relevant web pages, and what it is going to put forward for its users. This doesn’t call into question the importance of more traditional ranking factors, but it does emphasise the weight given to relevance and content quality, user experience, and the right kind of optimisation for various types of content (text, images, video, and audio files). It’s about time to embrace natural language!