Google Research Paper Reveals a Shortcoming in Search

A recent Google research paper on Long Form Question Answering illustrates how difficult it is to answer questions that need longer and nuanced answers. While the researchers were able to improve the state of the art of this kind of question answering, they also admitted that their results needed significant improvements.

I read this research paper last month when it was published and have been wanting to share it because it focuses on solving a shortcoming in search that isn’t discussed much at all.

I hope you find it as fascinating as I did!

What Search Engines Get Right

This research centers on Long Form Open-Domain Question Answering, an area in which natural language processing continues to see improvements.

What search engines are good at is called Factoid Open-domain Question Answering, or simply Open-domain Question Answering.

Open Domain Question Answering is a task wherein an algorithm responds with an answer to a question in natural language.

For example: What color is the sky? The sky is blue.

Long Form Question Answering (LFQA)

The research paper states that Long-form Question Answering (LFQA) is important but challenging, and that progress on it lags behind progress on Open-domain Question Answering.

According to the research paper:

“Open-domain long-form question answering (LFQA) is a fundamental challenge in natural language processing (NLP) that involves retrieving documents relevant to a given question and using them to generate an elaborate paragraph-length answer.

While there has been remarkable recent progress in factoid open-domain question answering (QA), where a short phrase or entity is enough to answer a question, much less work has been done in the area of long-form question answering.

LFQA is nevertheless an important task, especially because it provides a testbed to measure the factuality of generative text models. But, are current benchmarks and evaluation metrics really suitable for making progress on LFQA?”
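The two steps the paper describes, retrieving relevant documents and then generating an answer from them, can be illustrated with a toy sketch. This is not the paper's model. The function names, the word-overlap scoring and the sample documents are all assumptions made for illustration, and the "generation" step is only a placeholder that joins the retrieved text where a real system would use a trained language model.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, documents, k=2):
    """Return the k documents sharing the most distinct words with the question."""
    question_terms = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_terms & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def generate_answer(question, retrieved):
    """Placeholder generator (assumption): simply joins the retrieved text.

    A real LFQA system would condition a language model on the question
    and the retrieved documents to write a paragraph-length answer.
    """
    return " ".join(retrieved)

# Hypothetical mini-corpus, invented for this example.
docs = [
    "Fire is a chemical reaction called combustion that releases heat and light.",
    "Lakes are bodies of water surrounded by land.",
    "The light from fire comes from hot gas and glowing soot particles.",
]

question = "What exactly is fire and how can light and heat come from it?"
answer = generate_answer(question, retrieve(question, docs, k=2))
print(answer)
```

Even in this toy version, the answer can only be as good as what the retrieval step hands to the generator.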

Search Engine Question Answering

Question answering by search engines typically consists of a searcher asking a question and the search engine returning a relatively short text of information.

A question like “What’s the phone number of XYZ store?” is an example of a typical question that search engines are good at answering, especially because the answer is objective rather than subjective.

Long Form Question Answering is harder because the questions demand answers in the form of paragraphs, not short texts.

Facebook is also working on long form question answering and came up with interesting solutions, such as building a dataset called ELI5 from a question and answer subreddit called Explain Like I’m 5. Facebook also admits that there is more work to do. (Introducing Long-form Question Answering)

Examples of Long Form Questions

Once you read these examples of long form questions, it’s going to be clearer how we’ve been trained by search engines to ask a limited set of queries. It might even seem shocking how almost infantile our questions are compared to long form questions.

The Google research paper offers these examples of long form questions:

  • What goes on in those tall tower buildings owned by major banks?
  • What exactly is fire, in detail? How can light and heat come from something we can’t really touch?
  • Why do Britain and other English empire countries still bow to monarchs? What real purpose does the queen serve?

Facebook offers these examples of long form questions:

  • Why are some restaurants better than others if they serve basically the same food?
  • What are the differences between bodies of water like lakes, rivers, and seas?
  • Why do we feel more jet lagged when traveling east?

Are Searchers Trained to Ask Short Questions for Factoids?

Google (and Bing) have a difficult time answering these long form types of questions. This may impact their ability to surface content that provides complex answers for complex questions.

Maybe people don’t ask these questions because poor responses have trained them not to. But if search engines were able to answer these kinds of questions, people would begin to ask them.

It’s a whole wide world of questions and answers that are missing from our search experience.

If I shorten the question “Why are some restaurants better than others if they serve basically the same food?” to “Why are some restaurants better than others?”, Google and Bing still fail to provide an adequate answer.

The top Google search result for that question comes from the (HTTP insecure) blog of a Canadian Indian restaurant.

Google cites this section of the Indian restaurant’s blog in the SERP:

“People pay for the overall experience and not just the food and that is why some restaurants charge much more than others. Restaurant customers expect the prices to reflect the type of food, level of service and the overall atmosphere of the restaurant.”

What if the person had Popeye’s Fried Chicken versus KFC in mind when they asked that question?

There’s a certain amount of subjectivity that can creep into answering these kinds of questions, which demand long and coherent answers.

I can’t help thinking that there’s a better answer out there somewhere. But Google and Bing are unable to surface that kind of content.

Google Uses Signals to Identify High Quality Content

In a How Search Works explainer that Google published in September 2020, Google admits that it often cannot tell from the content itself whether that content is reliable or trustworthy.

Google explains that it uses signals in a blog post titled, “How Google Delivers Reliable Information in Search.”

“…when it comes to high-quality, trustworthy information… We often can’t tell from the words or images alone if something is exaggerated, incorrect, low-quality or otherwise unhelpful.

Instead, search engines largely understand the quality of content through what are commonly called “signals.” You can think of these as clues about the characteristics of a page that align with what humans might interpret as high quality or reliable.

For example, the number of quality pages that link to a particular page is a signal that a page may be a trusted source of information on a topic.”

Unfortunately, that part of Google’s algorithm is unable to provide a correct answer to these kinds of long form questions.

And that’s an interesting and important fact to understand because it helps to be aware of the limits of search technology today.

What About Passage Ranking?

Passage Ranking is about ranking long web pages that contain the short answers for normal short queries needing an objective answer.

Martin Splitt used the example of finding a relevant answer about tomatoes in a web page that is mostly about gardening in general.
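To make the tomato example concrete, here is a minimal sketch of the general idea of picking one relevant passage out of a longer page. It is not how Google implements Passage Ranking. The word-overlap scoring, the `best_passage` function and the sample gardening page are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def best_passage(query, page_text):
    """Return the passage that shares the most distinct words with the query.

    Passages are assumed to be separated by blank lines. On a tie,
    the earliest passage wins.
    """
    passages = [p.strip() for p in page_text.split("\n\n") if p.strip()]
    query_terms = tokenize(query)
    return max(passages, key=lambda p: len(query_terms & tokenize(p)))

# Hypothetical page that is mostly about gardening in general.
page = (
    "Good soil is the foundation of any garden. Add compost every spring.\n\n"
    "Tomatoes need six to eight hours of direct sun and deep, regular watering.\n\n"
    "Prune shrubs in late winter before new growth appears."
)

answer = best_passage("how much sun do tomatoes need", page)
print(answer)
```

The sketch only finds a short passage for a short, objective query. It does nothing to compose a paragraph-length answer to a long form question.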

Passage ranking cannot solve the hard questions that Google currently cannot answer.

Both Google and Bing generally fail to answer LFQA type queries because this is an area that search engines still need to improve.

Hurdles to Progress

The research paper itself acknowledges that shortcoming in the title:

Hurdles to Progress in Long-form Question Answering

The research paper concludes by stating that its approach to solving this task “achieves state of the art performance” but that there are still issues to resolve and more research that needs to be done.

This is how the paper concludes:

“We present a “retrieval augmented” generation system that achieves state of the art performance on the ELI5 long-form question answering dataset. However, an in-depth analysis reveals several issues not only with our model, but also with the ELI5 dataset & evaluation metrics. We hope that the community works towards solving these issues so that we can climb the right hills and make meaningful progress.”

Questions and Speculation

It’s not possible to provide a definitive answer, but one has to wonder whether there are web pages out there that are missing out on traffic because neither Google nor Bing is able to surface their long form content in answer to long form questions.

Also, some publishers mistakenly over-write their articles in a quest to be authoritative. Is it possible that those publishers are over-writing themselves out of search traffic from queries that demand shorter answers, since search engines can’t deliver the nuanced answers available in longer documents?

There’s no way of knowing these answers for certain.

But one thing this research paper makes clear is that long-form question answering is a shortcoming in search engines today.


Google AI Blog Post
Progress and Challenges in Long-Form Open-Domain Question Answering

PDF Version of Research Paper
Hurdles to Progress in Long-form Question Answering

Facebook Web Page About LFQA
Introducing Long-form Question Answering