Neural network approaches for NLP / NLU / dialog / chatbots

A number of you mentioned in your intros that you’re interested in chatbots, so I’m starting this discussion.

There have been a number of recent papers on generating text / dialog with GANs, reinforcement learning, etc.

Three questions:

  1. Are there any recent NLP / NLU papers that are essential reading? It’s hard to tell, just from searching arXiv and the like, what is or isn’t a breakthrough paper.

  2. Have any of these been implemented in beta or production systems and shown to lead to an actual improvement in UX for end users? (e.g. increased engagement duration, higher satisfaction / resolution scores)

  3. Lack of real world context / knowledge stunts the effectiveness of most open-ended conversational chatbots. Developers of semi-decent ones compensate by hardcoding thousands of objects and their relationships and attributes. Can this concept modeling be done automatically / implicitly with an NN approach? Seems like you already acquire some level of semantic / topical relatedness just by looking at word vectors.
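
To make the word-vector point concrete, here is a minimal sketch of how topical relatedness falls out of vector similarity. The three-dimensional vectors are made up for illustration (they do not come from any trained model), and `most_related` is a hypothetical helper name.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# Real word vectors (word2vec, GloVe, ...) have hundreds of dimensions.
TOY_VECTORS = {
    "dream": [0.9, 0.1, 0.3],
    "sleep": [0.8, 0.2, 0.4],
    "invoice": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_related(word, vectors=TOY_VECTORS):
    """Return the other word whose vector is closest to `word` by cosine."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return max(scored, key=lambda pair: pair[1])[0]


print(most_related("dream"))
```

This only captures "these words show up in similar contexts"; it says nothing about attributes or typed relationships, which is the part that currently gets hardcoded.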

Some papers on adversarial learning / RL for dialog
Let me know others we can add.

Policy Networks With Two-Stage Training For Dialog Systems (from Maluuba / acquired by Microsoft)

Generating Text With Adversarial Training (Zhang, Gan, Carin)

Adversarial Learning for Neural Dialog Generation

Deep Reinforcement Learning For Dialog Generation

Adversarial Methods For Semi-Supervised Text Classification (Miyato, Dai, Goodfellow)

Adversarial Examples in NLP Contexts


Later in the course we’ll be looking at memory networks for chatbot / Q&A, FYI…


@jeremy We have clients we’re meeting with at SxSW who are actively interested in this topic, so I’m jumping ahead and doing a bit of research. Let me know of any “must-read” papers I should know about.

The punch-line, which we’ll see when we cover this, is that current technology doesn’t really work for any kind of true “chat bot”, and is just a fairly basic dictionary lookup. Given how poorly Facebook’s chat bot is performing, you could probably guess that’s the state of play! However, no academic seems likely to write that in a paper…

It’s true! Chatbots are on average so terrible, although you can significantly improve the experience with design hacks and deploy them effectively (e.g. Autodesk reduced customer support load by 90% with their pure chatbot powered by Watson - so you know it wasn’t very clever :stuck_out_tongue: - just by being smart about the right questions to address).

Another “hack” that can improve a chatbot UX when it is an entertainment / open-ended conversation bot is to proactively offer commentary when it doesn’t quite understand a user comment. Kind of like how a politician answers the questions they want rather than the question you asked, but hopefully not as smarmy.


BOT: Do you ever dream?

YOU: (proceed to explain some complicated dream you had that the bot cannot hope to understand)

BOT: I only dream in zeros and ones.

So as long as the bot generates some plausible sentence that is topically related, this can improve the conversation UX. That’s why I’m interested in looking into GANs, which would eliminate the need for a bot designer to manually program in 20 permutations of a response to a bucket of related inputs, which is what we do now.

Seems like this is a simple enough use case that GANs or current NN approaches would be able to address it, and would actually boost end user experience and performance.
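
For reference, the manual version of this hack looks roughly like the sketch below: a bucket of related inputs mapped to a handful of hand-written responses. Everything here is hypothetical (the bucket names, keywords, responses other than the "zeros and ones" line, and the `fallback_reply` helper); it is only meant to show the part a generative model would replace.

```python
import random

# Hypothetical topic buckets: trigger keywords plus hand-written lines.
# In a real scripted bot each bucket would hold many more permutations.
TOPIC_BUCKETS = {
    "dreams": {
        "keywords": {"dream", "dreamed", "dreamt", "nightmare", "asleep"},
        "responses": [
            "I only dream in zeros and ones.",
            "My dreams are mostly stack traces.",
        ],
    },
    "food": {
        "keywords": {"eat", "ate", "dinner", "lunch", "hungry"},
        "responses": ["I run on electricity, but that sounds tasty."],
    },
}

GENERIC_FALLBACK = "Tell me more about that."


def fallback_reply(user_text, rng=random):
    """Pick a canned line from the bucket sharing the most keywords."""
    words = {w.strip(".,!?").lower() for w in user_text.split()}
    best_topic, best_overlap = None, 0
    for topic, bucket in TOPIC_BUCKETS.items():
        overlap = len(words & bucket["keywords"])
        if overlap > best_overlap:
            best_topic, best_overlap = topic, overlap
    if best_topic is None:
        # Nothing topical matched, so deflect generically.
        return GENERIC_FALLBACK
    return rng.choice(TOPIC_BUCKETS[best_topic]["responses"])


print(fallback_reply("I had a really strange dream last night!"))
```

The bot never parses the dream itself; it only needs to land in the right bucket and say something plausible.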

An experiential / entertainment bot is designed to replace a static ad or video. The engagement rates we’ve seen for them have topped 10 mins on avg (multiples of normal ad engagement rates) and these are stupid, scripted bots. On the flip side, a user once chatted with Mitsuku (2016 Loebner Prize winner) for over 9 HOURS - also a completely scripted bot - although it has over 3000 objects in its knowledge graph and 300,000 templated responses. About 80% of her 5+ mil users come back to chat with her over multiple sessions.

The takeaway is that while chatbots suck, people are lonely and bored, so at least some narrow use cases of chatbots are outperforming previous UI.


I’ve recently worked on chatbots that are designed for answering questions on structured data, and rule-based methods worked really well (I don’t have the raw metrics, but the bot was able to handle many variations of around 50 difficult questions we aimed for). The bot was built using ChatScript, a very powerful natural language processing engine that provides out-of-the-box support for essentials like spell correction, parsing, WordNet etc.

If you are:

  • Working with a restricted domain.
  • Not particular about a machine learning based solution.
  • Looking to optimize the results / time ratio.

Rule-based chatbots are your friend.
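
As a rough illustration of the idea (in plain Python, not ChatScript syntax), here is a minimal sketch of rules answering questions over structured data. The product table, the regex rules, and the `answer` function are all hypothetical.

```python
import re

# Hypothetical structured data the bot answers questions about.
PRODUCTS = {
    "widget": {"price": 9.99, "stock": 12},
    "gadget": {"price": 24.50, "stock": 0},
}


def answer_price(name):
    if name not in PRODUCTS:
        return f"I don't have data on {name}."
    return f"The {name} costs ${PRODUCTS[name]['price']:.2f}."


def answer_stock(name):
    if name not in PRODUCTS:
        return f"I don't have data on {name}."
    stock = PRODUCTS[name]["stock"]
    if stock > 0:
        return f"Yes, {stock} in stock."
    return f"No, the {name} is out of stock."


# Each rule pairs one regex, covering several phrasings of a question,
# with the handler that looks up the answer. The first match wins.
RULES = [
    (re.compile(r"(?:how much is|what does|price of)\s+(?:a |an |the )?(\w+)"),
     answer_price),
    (re.compile(r"(?:is|are)\s+(?:the )?(\w+?)s? in stock"), answer_stock),
]


def answer(question):
    """Try each rule in order against the lowercased question."""
    text = question.lower()
    for pattern, handler in RULES:
        match = pattern.search(text)
        if match:
            return handler(match.group(1))
    return "Sorry, I can't answer that yet."


print(answer("What is the price of a gadget?"))
print(answer("Are widgets in stock?"))
```

Covering a new phrasing means widening a pattern, which is tedious but predictable; that is the results / time trade-off in a restricted domain.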


Thanks @mariya and @amanmadaan for these insights - really interesting.

a) Two other interesting papers w/ different flavors of QA:

b) like Jeremy says, chatbots in general are a toy right now (though toys can be useful too - engagement doesn’t have to be tied to helping with a specific task, etc.).

c) You can currently acquire some shallow semantic/topical information from word vectors - not enough to substitute for knowledge graph construction though.
There is some interesting work being done on extending ideas from word vectors to “concept”/“relation” vectors but it’s more fun/academic than something you’d want to do in a production setting because knowledge graphs and task models are powerful.
Successful products like Alexa, etc. use a ton of different technologies (automatic knowledge acquisition, semi-automatic, DL for intent analysis and so on). Outside of a few efforts, what’s going on is that neural approaches are being substituted for older approaches, one by one, in the typical building blocks of a Q&A system (e.g. query processing, entity recognition, etc.).

Also, like @amanmadaan says - for specific verticals, with a mixture of techniques (ML, manually-specified templates, etc.) you can have good results right now.

I am currently working on a review of QA systems - I’ll post a link to the post on Twitter when it’s done in a couple of weeks, if you’re interested. (@AMP_SV)


Thanks for your insights! Definitely interested in your review of QA systems when you’re ready to publish. Also downloaded those two papers you recommended to add to my very long reading queue.

Yes, there are 1000 papers to read, and like most research a lot are incremental / academic toys / etc. :) And now with arXiv, the sheer number is overwhelming…

I know! It’s stressful just trying to decide what to read. Definitely let me know if any of these toy papers actually catch your eye though.

@anamariapopescug can you recommend any papers / systems / learning resources for automatic knowledge acquisition and semi-automatic knowledge acquisition?

Also would be interested to hear more about the work for extracting ‘concepts’ from text.

Looking forward to reading your review.

Nice to have an actual NLP expert here to help us! :slight_smile: Thanks @anamariapopescug that was all very interesting.

For getting vectors out of graphs are you thinking of something like ? What are the more recent advances in this area that are of potential interest?

yes exactly - TransE and the various follow-ups on it (to deal with different types of relations, and so on). I’ll find some more up-to-date references.
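
For anyone who hasn’t seen it, TransE treats a relation as a translation vector, so a true triple (head, relation, tail) should satisfy head + relation ≈ tail. A minimal sketch of the scoring step is below; the 2-d embeddings are hand-picked so the arithmetic works out (they are not trained), and `transe_distance` / `predict_tail` are hypothetical helper names.

```python
import math

# Hypothetical 2-d embeddings, hand-picked for illustration (not trained).
ENTITY = {
    "paris": [1.0, 2.0],
    "france": [2.0, 4.0],
    "berlin": [3.0, 1.0],
    "germany": [4.0, 3.0],
}
RELATION = {"capital_of": [1.0, 2.0]}


def transe_distance(head, relation, tail):
    """L2 distance ||h + r - t||; smaller means a more plausible triple."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def predict_tail(head, relation):
    """Rank every other entity as a candidate tail and return the closest."""
    candidates = [e for e in ENTITY if e != head]
    return min(candidates, key=lambda tail: transe_distance(head, relation, tail))


print(predict_tail("paris", "capital_of"))
print(predict_tail("berlin", "capital_of"))
```

Because head + relation lands on a single point, one offset per relation has trouble when the same head has many valid tails, which is where the follow-up models come in.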

It seemed like the TransE approach of assuming a linear offset is pretty simplistic - is it good enough in practice? Or are there more recent papers that use a non-linear model (like a neural net, for instance! :wink: )

yes, there’s work like below - but note that for e.g. the first paper, it’s not a huge increase over previous methods. Still, a promising direction.
It’s pretty clear that for something like KB completion a mixture of approaches will do best in the short term (because you have to deal with functional vs. non-functional relations, and so on). I’ll see what the SOTA is in the literature.

Somewhat related:

I found this OpenAI article interesting: they train agents to invent their own language based on real-world experiences. It highlights the difference between “grounded” language (based on real experience) and “inferred” language (a la John Searle’s Chinese Room experiment, where you map text to large dictionaries). Traditional ML approaches use the latter, but the result is not a real understanding of what the text means.

The related papers are here:


I found this interesting. Here’s an elaboration from

Language is grounded in experience. Unlike dictionaries which define words in terms of other words, humans understand many basic words in terms of associations with sensory-motor experiences. People must interact physically with their world to grasp the essence of words like “red,” “heavy,” and “above.” Abstract words are acquired only in relation to more concretely grounded terms. Grounding is thus a fundamental aspect of spoken language, which enables humans to acquire and to use words and sentences in context.

This was one of the most satisfying paragraphs I’ve ever read.

Full text:


You’re right. That passage is so satisfying. Words are just labels to index into real experiences.