With large amounts of unlabeled training data available and model sizes increasing, the AI community has developed what is known as zero-shot and few-shot learning. Here I will walk you through what each of these terms means, show which NLP tasks they can tackle, and present some libraries you can experiment with.


The overall idea is to take a natural language processing model that was pre-trained in a different setting or domain and either apply it directly to an unseen task (zero-shot) or fine-tune it on a very small sample (few-shot).

A common use case is classification. Here, the model is trained on a certain set of classes and evaluated on another set of labels that it has either never seen before (zero-shot) or seen only a few examples of (few-shot).

Some approaches in this family use language models trained on a large text corpus which, given descriptive class names, can adapt to the proposed classification task. Next, I will show some examples of tasks that zero-shot and few-shot models can handle.


Consider the natural language inference (NLI) task where, given a premise and a hypothesis, we decide whether the hypothesis is true (entailment), false (contradiction), or undetermined (neutral) in light of the premise. For example, if the premise is “I have a dog” and the hypothesis is “I have an animal”, the premise entails the hypothesis.

The work by Yin et al. shows a way to use a pre-trained NLI model for tackling the zero-shot text classification problem. For that, the authors proposed the following method: 

Treat the text you would like to classify as the premise, and build the hypothesis as a sentence with this structure: “This text is about {label_name}”, where label_name is the label you want to test. The NLI model then predicts probabilities for entailment, contradiction, and neutral, and the entailment probability tells you how likely it is that the text belongs to the provided label.
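The method above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `toy_nli` is a hypothetical stand-in for a real pre-trained NLI model (it only checks whether the label word occurs in the text), and the function and variable names are my own. In practice you would plug in a real model that returns the three NLI probabilities.

```python
from typing import Callable, Dict, List

# An NLI scorer maps (premise, hypothesis) to probabilities for the three NLI classes.
NliScorer = Callable[[str, str], Dict[str, float]]

HYPOTHESIS_TEMPLATE = "This text is about {label_name}."


def toy_nli(premise: str, hypothesis: str) -> Dict[str, float]:
    """Hypothetical stand-in for a pre-trained NLI model (an assumption, not a real model).

    It predicts entailment when the last word of the hypothesis occurs in the premise.
    """
    label_word = hypothesis.rstrip(".").split()[-1].lower()
    if label_word in premise.lower():
        return {"entailment": 0.90, "neutral": 0.08, "contradiction": 0.02}
    return {"entailment": 0.05, "neutral": 0.25, "contradiction": 0.70}


def zero_shot_classify(text: str, labels: List[str], nli: NliScorer) -> Dict[str, float]:
    """Score each candidate label by the entailment probability of its hypothesis."""
    scores = {}
    for label in labels:
        # The text is the premise; the hypothesis is built from the label name.
        hypothesis = HYPOTHESIS_TEMPLATE.format(label_name=label)
        scores[label] = nli(text, hypothesis)["entailment"]
    return scores


scores = zero_shot_classify(
    "The new football season starts on Saturday.",
    ["football", "politics"],
    toy_nli,
)
best_label = max(scores, key=scores.get)
print(scores)
print(best_label)
```

Because the scorer is passed in as a function, the same loop works unchanged once the toy scorer is replaced by a real NLI model.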

Zero-shot and few-shot learning in natural language processing

Source: Yin, Wenpeng, Jamaal Hay, and Dan Roth. “Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach.” arXiv preprint arXiv:1909.00161 (2019).

The image above showcases one of the major advantages of zero-shot text classification: the same text can be labeled in many different ways, without a rigid constraint on the number of predefined classes. Depending on how you interpret the text, you are free to define whatever labels you need, so the same model can serve multiple purposes.

For example, in the image above, the sentence “The plague in Mongolia, occurring last week, has caused more than a thousand isolation” can be assigned the “health” label when categorized by news section, or the “sadness” label when considering which sentiment the sentence evokes.
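To make the multiple-aspect idea concrete, here is a small sketch that builds the premise and hypothesis pairs for two label sets applied to the same sentence. The aspect names, the candidate labels other than "health" and "sadness", and the second template ("This text expresses ...") are illustrative assumptions of mine, not taken from the image.

```python
from typing import Dict, List, Tuple

text = (
    "The plague in Mongolia, occurring last week, "
    "has caused more than a thousand isolation"
)

# Each aspect has its own hypothesis template and candidate labels
# (both are illustrative assumptions).
ASPECTS: Dict[str, Tuple[str, List[str]]] = {
    "news_section": ("This text is about {label_name}.", ["health", "sports", "finance"]),
    "emotion": ("This text expresses {label_name}.", ["sadness", "joy", "anger"]),
}


def build_nli_pairs(premise: str) -> Dict[str, List[Tuple[str, str]]]:
    """Return, for every aspect, the (premise, hypothesis) pairs an NLI model would score."""
    pairs = {}
    for aspect, (template, labels) in ASPECTS.items():
        pairs[aspect] = [
            (premise, template.format(label_name=label)) for label in labels
        ]
    return pairs


pairs = build_nli_pairs(text)
for aspect, aspect_pairs in pairs.items():
    for _, hypothesis in aspect_pairs:
        print(f"{aspect}: {hypothesis}")
```

Adding a new way of labeling the text only means adding an entry to `ASPECTS`; no retraining is involved.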

Available libraries and code examples

There are many zero-shot and few-shot NLP models available. The NLI approach described above is available in Hugging Face for zero-shot classification, as a pre-trained BART model. The documentation for using it is available here.
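As a rough sketch of what this can look like, the snippet below wraps the Hugging Face `zero-shot-classification` pipeline. I am assuming the `facebook/bart-large-mnli` checkpoint (a BART model fine-tuned on MultiNLI) and a "This text is about ..." hypothesis template; the helper names `classify_zero_shot` and `top_label` are mine. The model is downloaded on first use, which is why the library is imported inside the function.

```python
from typing import Any, Dict, List


def classify_zero_shot(text: str, candidate_labels: List[str]) -> Dict[str, Any]:
    """Run zero-shot classification with an NLI-based BART model."""
    # Imported here because building the pipeline downloads a large model on first use.
    from transformers import pipeline

    classifier = pipeline(
        "zero-shot-classification",
        model="facebook/bart-large-mnli",  # assumed checkpoint
    )
    return classifier(
        text,
        candidate_labels=candidate_labels,
        hypothesis_template="This text is about {}.",
    )


def top_label(result: Dict[str, Any]) -> str:
    """Pick the highest-scoring label from the pipeline output."""
    # The pipeline returns a dict with "sequence", "labels" and "scores";
    # this helper does not rely on the labels being pre-sorted.
    scores = result["scores"]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return result["labels"][best_index]
```

Calling `top_label(classify_zero_shot("I have a dog", ["animals", "finance"]))` returns the label the model finds most strongly entailed. I have left out an expected output because it depends on the model.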

Additionally, Flair has made available the TARS classifier and documentation on how to use it for zero-shot and few-shot classification. It is simple to load a base model and use that for prediction, as shown by the following code:

# In recent Flair releases, TARSClassifier is importable directly from flair.models
from flair.models import TARSClassifier
from flair.data import Sentence

# 1. Load our pre-trained TARS model for English
tars = TARSClassifier.load('tars-base')

# 2. Prepare a test sentence
sentence = Sentence("I am so glad you liked it!")

# 3. Define some classes that you want to predict using descriptive names
classes = ["happy", "sad"]

# 4. Predict for these classes
tars.predict_zero_shot(sentence, classes)

# Print sentence with predicted labels
print(sentence)
