Automated Content Generation for SEO: GPT-3 Possibilities & Pitfalls

Since the arrival of GPT-3, content generators have multiplied the use cases for SEO. A bi-monthly review of new progress in language models seems to be in order.

First of all, at the end of 2021, the club of very large language models grew significantly.

Each country has tried to showcase its technologies and make them accessible through research papers and public or private demonstrations.

Here are the main competitors in the race:

  • US: OpenAI (GPT-3) – Microsoft (Turing NLG).
  • China: Wu Dao 2.0 – PanGu-Alpha.
  • South Korea: HyperCLOVA.
  • Israel: AI21 (Jurassic-1).
  • Europe: Aleph Alpha.
  • Open Source: EleutherAI.

Each model has its strengths and weaknesses.

Many SEO software vendors and agencies are now testing these models.

How to Choose a GPT-3 Model?

You may think that the more parameters the model has, the better it would be (Editor’s note: a parameter is a numerical weight that the model adjusts during training).

But you would be wrong.

The number one criterion is absolutely not the number of parameters, because you can obtain great results with lighter models.

Rather, it is the data on which the model was trained.

In fact, to be effective, a model must be able to understand a large number of disparate domains.

The first thing to do is to find out how the model was trained. For GPT-3, the following diagram helps:

Screenshot from GPT-3, October 2021

We can see that GPT-3 was mainly trained with data from:

  • Common Crawl, a web archive, with data from 2016 to 2019.
  • WebText, a corpus of pages scraped from the web.
  • Wikipedia.
  • Books in English (Books1).
  • Books in other languages (Books2).

Now, if we look at how the open-source models are trained, we see that the sources are quite different.

Screenshot from GPT-3, October 2021

Everything is based on The Pile, an 825 GB data set of diverse English texts that is free and accessible to the public.

The Pile contains very varied data such as books, GitHub repositories, webpages, discussion logs, and articles in medicine, physics, mathematics, computer science, and philosophy.

In general, it will be important to test the language model in your language and especially on your website’s specific vocabulary.

Before we look at specific SEO use cases, let’s look at the pitfalls.

GPT-3 Content Generation Pitfalls for SEO

To generate quality texts that interest your users, it is important to know the pitfalls to avoid.

First of all, whatever model you choose, you must provide it with quality examples as input so that it can imitate them and, above all, respect a specific type of text.

If you ask a language model to generate content on “New York plumbers,” the model will head down various and often unsuitable paths:

  • Should it create a made-up directory?
  • Should it create content about a New York plumber?
  • Should it create a dialogue between plumbers in Paris?
  • Maybe a poem about plumbing in New York?

In short, the model will be lost.
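One way to keep the model on track is to show it a few worked examples before the request. The sketch below only builds such a prompt as a string; the example texts, the "Subject:" and "Text:" labels, and the "###" separator are all illustrative assumptions, not a format required by any particular model.

```python
# Hypothetical examples: these subjects and texts are placeholders
# written for illustration, not real campaign copy.
EXAMPLES = [
    ("plumber in Boston",
     "Burst pipe in the middle of the night? Our Boston plumbers are on call."),
    ("electrician in Chicago",
     "Flickering lights? Book a licensed Chicago electrician in two clicks."),
]

# Assumed separator between examples; any rare, consistent marker would do.
SEPARATOR = "\n\n###\n\n"


def build_prompt(examples, subject):
    """Join the worked examples, then leave the last 'Text:' open for the model."""
    blocks = [f"Subject: {s}\nText: {t}" for s, t in examples]
    blocks.append(f"Subject: {subject}\nText:")
    return SEPARATOR.join(blocks)


prompt = build_prompt(EXAMPLES, "plumber in New York")
print(prompt)
```

Because every example follows the same pattern, the model has a much narrower set of plausible continuations than it would with the bare request "New York plumbers".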

Second, language models do not check for duplicate content at all.

Therefore, whatever text you generate, you will have to use a third-party tool to check that the model has not reproduced something it learned during training, and in particular that the text is unique and does not already exist elsewhere.

There are many tools available to confirm whether your content is unique. If it is not, simply regenerate the content.
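A third-party tool checks against the whole web, which a short script cannot do. As a first filter, though, you can compare a generated text against texts you already own. The following is a minimal sketch using word shingles and Jaccard similarity; the shingle size of 3 and the 0.5 threshold are arbitrary assumptions to tune on your own content.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b, size=3):
    """Jaccard similarity of the two texts' shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0  # too short to compare
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(candidate, known_texts, threshold=0.5):
    """True if the candidate overlaps heavily with any already-known text."""
    return any(jaccard(candidate, known) >= threshold for known in known_texts)


# Placeholder texts for illustration only.
existing = ["Our plumbers in New York are available day and night for any leak."]
generated = "Our plumbers in New York are available day and night for any repair."

print(round(jaccard(generated, existing[0]), 2))
print(is_near_duplicate(generated, existing))
```

If the check flags a text, regenerate it; if it passes, the external uniqueness check is still needed.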

In addition, content generation models do not optimize text for search at all.

Again, they are trained on a wide variety of sources so you’ll have to guide them with all the semantic tools that exist on the market.

You can also ask them to emphasize keywords, and to explain your concepts in more detail.

Finally, the model can invent data. Models have a creativity setting, often called temperature, that controls how much randomness goes into each word choice.

If the model is set to allow high creativity, it can sometimes invent characteristics for an object, for example, which creates inconsistencies in your texts.
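To see why a high creativity setting produces inventions, here is a small sketch, assuming the setting behaves like the temperature parameter that many text generation APIs expose. The scores for the three candidate words are made up for illustration; a real model scores tens of thousands of candidates.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words describing a raincoat.
scores = {"waterproof": 3.0, "durable": 2.0, "edible": 0.5}

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(list(scores.values()), temperature)
    rounded = {word: round(p, 3) for word, p in zip(scores, probs)}
    print(f"temperature={temperature}: {rounded}")
```

At a low temperature almost all of the probability goes to the most likely word. As the temperature rises, unlikely words such as the nonsensical one here get picked more often, which is where invented characteristics come from.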

Content Generation Use Cases for SEO

For the past eight months, I have been testing many different content generation use cases.

With free models like GPT-Neo and GPT-J, we have had very good results on the following use cases:

  • Extraction of concepts (extraction of places, people, etc.).
  • Extraction of questions from a text.
  • Answers to questions based on a text.
  • Generation of short texts, in English only, to create titles or meta descriptions.

On the other hand, with an affordably priced paid model like GPT-3, we have managed not only to generate long and specific texts but also to create sophisticated textual summaries.

Screenshot from GPT-3, October 2021

The image is not easy to read but shows the input examples which are given in bold, then the text generated by the model which is in italic.

Here, the goal is to create a persuasive text that makes people want to buy a product.

GPT-3 perfectly understands the required tone and length of the text and includes the words to be used. The text is generated in less than 10 seconds!

Here is the generated text:

Give in to your childhood dreams. Do you still remember your first video game? Thanks to technological progress, you can live them all over again. You can play with another player on the road via the Nintendo DS, PlayStation, or Xbox consoles!

With the Nintendo Switch, try out fun games like Mario Kart 8 Deluxe – rise up against your rivals and enjoy magical adventures with Mario, Zelda, and other characters. Enjoy an incredible range of AAA games; whether you prefer fighting games or adventure games or multiplayer games, PS4 offers PS Plus memberships to match your needs!

Xbox also features unlimited access to entertainment apps like Netflix and Youtube for little money. Whether you choose PS4, xbox or the new Nintendo Switch, it will be easy to connect with family or friends via applications such as Skype for a great time with your friends.

Make sure not to miss out on this opportunity!

Of course, proofreading is still required to improve the text, but this technology saves a lot of time and allows humans to focus on high value-added problems.

Let’s move on to two revolutions in content generation.

Two Revolutions for SEO

Before, GPT-3 might not have been able to generate specific content because it was not trained to do so, as we saw earlier.

Now, you have the ability to re-train (fine-tune) the model.

Just give it between 20 and 100 examples with your instructions as input and the ideal completion you want it to produce as output:

{"prompt":"CTA:\n\nsubject: smartphone\n\n words: smartphone, Samsung, mobile phones, Xiaomi, Samsung, Apple, Huawei, Sony, iPhone, intelligent mobile phones, Android\n\n ","completion":"Looking for a new smartphone? You’ve shortlisted a Samsung Galaxy or a Huawei model, but you haven’t made a decision yet? Choose a smartphone in our product range and enjoy the best of the new mobile phones at an unbeatable price! You’re guaranteed to find a smartphone at the best price, so don’t miss out! It’s never easy to find the best option among a multitude of smartphones: Xiaomi, Samsung, Apple, Huawei, Sony, or even Honor. Smartphones – intelligent mobile phones – aren’t just synonymous with Apple’s iPhone, but include many models. Smartphones using the Android operating system are the most common."}

The operation is time-consuming, but you end up with an optimized model that fits your use case, especially if you need to use a specific vocabulary for a particular niche, industry, or theme when optimizing your SEO.

This takes only two lines of Python code. The most time-consuming part is, of course, creating this example file.

Screenshot from GPT-3, October 2021
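Since building the example file is the slow part, it helps to generate it from a script rather than by hand. The sketch below writes records in the prompt/completion shape shown above, one JSON object per line, and rejects incomplete records. The two records are shortened placeholders, and the fine-tuning call itself is not shown because it depends on the provider.

```python
import json

# Shortened placeholder records in the prompt/completion shape shown above.
examples = [
    {
        "prompt": "CTA:\n\nsubject: smartphone\n\nwords: smartphone, Android\n\n",
        "completion": "Looking for a new smartphone? Browse our Android range today!",
    },
    {
        "prompt": "CTA:\n\nsubject: headphones\n\nwords: headphones, wireless\n\n",
        "completion": "Cut the cord: discover our wireless headphones at a great price!",
    },
]


def to_jsonl(records):
    """Serialize records as JSON Lines, checking each one is complete."""
    lines = []
    for number, record in enumerate(records, start=1):
        if set(record) != {"prompt", "completion"}:
            raise ValueError(f"record {number}: expected prompt and completion keys")
        if not record["prompt"].strip() or not record["completion"].strip():
            raise ValueError(f"record {number}: empty prompt or completion")
        # json.dumps escapes the line breaks inside each string as \n.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


jsonl_text = to_jsonl(examples)
print(jsonl_text, end="")
```

Save the printed text to a file and pass that file to the fine-tuning tool of your provider. Letting the JSON library do the escaping avoids the broken quotes and lost line breaks that creep in when such a file is edited by hand.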

Finally, let’s move on to the last topic I was particularly excited about this month: code generation!

In fact, OpenAI has released a new engine, Codex, that takes instructions in plain text and generates Python code to solve our problems.

Let’s start by pointing out that these are simple problems: it cannot replace developers, because we would need to provide the AI with the entire existing code base as well as all the technical constraints.

On the other hand, from a pedagogical point of view and especially in a no-code approach, it is great to be able to ask it to connect to a data source (MySQL, Excel, CSV, API, etc.) and generate the right views in a few seconds.

Screenshot from GPT-3, October 2021

Here’s a mini-example where I fetch the NASA log file for August 1, 1995, and ask for a bar graph with the total number of URLs visited per hour.

Then you can copy and paste the generated code into a simple text editor and run it to see the result.
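For readers who want to know what such a script involves, here is a hand-written sketch of the same task. It is not the code Codex produced. It assumes the log uses the Common Log Format, works on a few made-up sample lines instead of the real file, and draws the bars as text rather than with a plotting library.

```python
import re
from collections import Counter

# Made-up sample lines in Common Log Format; a real run would read the log file.
SAMPLE_LOG = """\
host1.example.com - - [01/Aug/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1839
host2.example.com - - [01/Aug/1995:00:12:45 -0400] "GET /images/logo.gif HTTP/1.0" 200 786
host1.example.com - - [01/Aug/1995:01:03:10 -0400] "GET /history/ HTTP/1.0" 200 6245
host3.example.com - - [01/Aug/1995:01:30:00 -0400] "GET /index.html HTTP/1.0" 304 0
host2.example.com - - [01/Aug/1995:01:59:59 -0400] "GET /missing.html HTTP/1.0" 404 -
"""

# Captures day, month, year, and hour from a timestamp like [01/Aug/1995:00:00:01 -0400].
TIMESTAMP = re.compile(r"\[(\d{2})/(\w{3})/(\d{4}):(\d{2}):\d{2}:\d{2} ")


def hits_per_hour(log_text, day="01", month="Aug", year="1995"):
    """Count requested URLs per hour for one day of the log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMESTAMP.search(line)
        if match and match.group(1, 2, 3) == (day, month, year):
            counts[int(match.group(4))] += 1
    return counts


def text_bar_chart(counts):
    """Render one row per hour, with one '#' per request."""
    return "\n".join(
        f"{hour:02d}h | {'#' * counts[hour]} {counts[hour]}" for hour in sorted(counts)
    )


counts = hits_per_hour(SAMPLE_LOG)
print(text_bar_chart(counts))
```

Swapping the text bars for a graphical chart is a small change once the hourly counts exist, which is the kind of step a code generation engine handles well.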

In order to take the no-code concept even further, I’m preparing a web application where everything will be driven by text.

The only limit in the use of language models in SEO is your imagination. You can certainly create an entire SEO dashboard this way by breaking down each of the views you want, step by step.

Language models still have a lot of surprises in store and there are a lot of new uses coming for marketing.

