Hugging Face + FastAI - Session 1



FastAI + HF Learnings - Week 1

Setting up a work environment:

  • For Colab, installation is straightforward and can be done with pip as shown below:

    ```
    !pip install transformers[sentencepiece]
    ```

    • The [sentencepiece] extra makes sure that all the necessary libraries are included.
  • For local installations, use conda / mamba and then the fastconda channel to grab all the necessary fastai-related libraries. (All the necessary links are at the end of the blog post.)

Source of this blogpost:

  • The great course (part 1/3 - Introduction) released and offered by Hugging Face is the source of this blogpost.

  • This blogpost is my understanding of the Transformers library after participating in and learning from the HuggingFace Course with FastAI bent - session 1, generously organized by Weights & Biases, with the heavy lifting done by some great folks like Wayde Gilliam, Sanyam Bhutani, Zach Mueller, Andrea & Morgan. Sorry if I have missed anyone, and thank you all for your hard work in bringing this to the masses.


  • “Natural language processing is a field of linguistics and machine learning focused on understanding everything related to human language”, and some of the common tasks include sequence classification, token classification, text generation and extractive Q&A.
  • The state-of-the-art techniques for the above tasks come from deep learning, and transformer models are among them.

Insights into 🤗 Transformers library:

  • “The 🤗 Transformers library provides the functionality to create and use those shared models.” “The Model Hub contains thousands of pretrained models that anyone can download and use.”
  • The pipeline function is the highest level of the Transformers API and it returns an end-to-end object that can perform a specific NLP task on one or several texts.
  • The pipeline is the most basic object in the 🤗 Transformers library and can connect a model with its necessary preprocessing and post-processing steps, allowing us to directly input any text and get an intelligible answer.
  • Some of the key steps a pipeline can perform:
    • Preprocess the text into a format the model can understand.
    • Pass the preprocessed inputs to the model.
    • Post-process the predictions of the model so they are humanly understandable.
  • For tokenization, the Transformers library encourages using the Auto classes (such as AutoTokenizer), which work out under the hood which exact class to instantiate for the chosen checkpoint.
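The three pipeline steps listed above can be sketched with a toy example. This only illustrates the preprocess / model / post-process flow and is not how the library implements it: the vocabulary, the hand-set weights and all function names are made up for the sketch.

```python
import math

# Hypothetical toy vocabulary and hand-set weights, made up for illustration.
VOCAB = {"[UNK]": 0, "i": 1, "love": 2, "hate": 3, "this": 4, "course": 5}
# One (negative, positive) logit contribution per vocabulary id.
WEIGHTS = {0: (0.0, 0.0), 1: (0.0, 0.0), 2: (-2.0, 2.0),
           3: (2.0, -2.0), 4: (0.0, 0.0), 5: (0.0, 0.0)}
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into ids the 'model' understands."""
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]


def model(input_ids):
    """Step 2: produce raw scores (logits), one per label."""
    negative = sum(WEIGHTS[i][0] for i in input_ids)
    positive = sum(WEIGHTS[i][1] for i in input_ids)
    return [negative, positive]


def postprocess(logits):
    """Step 3: softmax the logits and map the best one to a readable label."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    return postprocess(model(preprocess(text)))
```

Calling `toy_pipeline("I love this course")` returns a dictionary with a `label` and a `score`, which is the same shape of answer a real sentiment pipeline gives back.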

  • Example of the tokenization:

    Step 1: Create the Tokenizer and choose the architecture

    • The AutoTokenizer knows which exact tokenizer object to create. In this case it has created the BertTokenizerFast Object.

    • When creating these objects, the Transformers library exposes some useful details under the hood, such as vocab_size (the total size of the model's vocabulary), the maximum sequence length of the model, which special tokens are included, and whether a mask token is defined.

    • The mask token is where some of the key capabilities of the Transformers library come in, such as the fill-mask task.
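A minimal sketch of Step 1, assuming the bert-base-uncased checkpoint (the checkpoint name is an assumption; other BERT checkpoints on the Model Hub should behave similarly). `load_tokenizer` and `describe_tokenizer` are hypothetical helper names, while the attributes they read are standard tokenizer attributes.

```python
def load_tokenizer(checkpoint="bert-base-uncased"):
    """Let AutoTokenizer pick the right tokenizer class for the checkpoint.

    The import is done lazily so the helper below can be used without
    transformers installed. The checkpoint name is an assumption.
    """
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(checkpoint)


def describe_tokenizer(tokenizer):
    """Collect the details mentioned above into a plain dictionary."""
    return {
        "class": type(tokenizer).__name__,            # e.g. BertTokenizerFast
        "vocab_size": tokenizer.vocab_size,           # size of the vocabulary
        "model_max_length": tokenizer.model_max_length,
        "special_tokens": tokenizer.all_special_tokens,
    }
```

Running `describe_tokenizer(load_tokenizer())` downloads the tokenizer files on first use and then reports which class AutoTokenizer chose.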

Step 2: Inputting a text to the tokenizer

  • The tokenizer creates the numerical ids (input ids) of the sentence we provide, along with the token type ids, which BERT models rely on.

  • The token type ids are essentially a way for BERT models to tell apart the two sentences of a sentence pair.

  • The attention mask tells the model which tokens it should attend to (1) and which it should ignore (0), such as padding tokens.
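To make the three outputs concrete, here is a toy encoder for a sentence pair. It is a sketch that mimics the BERT-style layout with a made-up vocabulary and whitespace splitting; `VOCAB` and `encode_pair` are hypothetical names, and a real tokenizer would use sub-word splitting instead.

```python
# Hypothetical toy vocabulary, made up for illustration.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "hello": 4, "world": 5, "how": 6, "are": 7, "you": 8}


def encode_pair(first, second, max_length):
    """Build input_ids, token_type_ids and attention_mask for two sentences."""
    a = [VOCAB.get(w, VOCAB["[UNK]"]) for w in first.lower().split()]
    b = [VOCAB.get(w, VOCAB["[UNK]"]) for w in second.lower().split()]

    # Layout: [CLS] first sentence [SEP] second sentence [SEP]
    input_ids = [VOCAB["[CLS]"]] + a + [VOCAB["[SEP]"]] + b + [VOCAB["[SEP]"]]
    # 0 marks the first sentence (with [CLS] and its [SEP]), 1 marks the second.
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    # 1 means "real token, attend to it".
    attention_mask = [1] * len(input_ids)

    padding = max_length - len(input_ids)
    if padding < 0:
        raise ValueError("max_length is too small for this pair")
    # Padding positions get attention_mask 0 so the model ignores them.
    input_ids += [VOCAB["[PAD]"]] * padding
    token_type_ids += [0] * padding
    attention_mask += [0] * padding

    return {"input_ids": input_ids,
            "token_type_ids": token_type_ids,
            "attention_mask": attention_mask}
```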

Additional Step: We can decode these tokens to see exactly how the words were split into tokens.

  • [CLS] and [SEP] are special tokens added by the tokenizer, and they vary with the model we are using. Some tokens also carry a marker such as ##, depending on how the tokenizer implements sub-word tokenization.

  • Sub-word tokenization is a strategy that splits rare words into smaller, more frequent pieces, which keeps the vocabulary small while still letting the model represent words it has not seen before.
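The ## marker comes from WordPiece-style splitting, which can be sketched as a greedy longest-match-first search. This is a simplified illustration with a tiny made-up vocabulary (`TOY_VOCAB` and `wordpiece` are hypothetical names), not the library's own implementation.

```python
# Hypothetical toy vocabulary; "##" marks a piece that continues a word.
TOY_VOCAB = {"token", "##ization", "##izer", "un", "##related"}


def wordpiece(word, vocab=TOY_VOCAB, unk="[UNK]"):
    """Split one word into the longest pieces found in the vocabulary."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches: the whole word becomes the unknown token.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces
```

With this vocabulary, "tokenization" is split into "token" and "##ization", so two known pieces stand in for a word the vocabulary does not contain as a whole.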

Example of using a Pipeline:

The beauty of the pipeline API lies in how few steps it takes to apply a model and get a result.

Step 1: Create a classifier

  • Create a classifier according to the task, for instance a sentiment-analysis classifier, and provide a text to classify its sentiment.

Step 2: Utilizing the classifier object
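Steps 1 and 2 might look like the sketch below. `classify_sentiment` is a hypothetical wrapper; the optional `classifier` argument only exists so that an already created pipeline can be reused, and the default model is whatever the library picks for the task.

```python
def classify_sentiment(texts, classifier=None):
    """Return (label, score) pairs for a list of texts."""
    if classifier is None:
        # Step 1: create the classifier for the task (downloads a default model).
        from transformers import pipeline

        classifier = pipeline("sentiment-analysis")
    # Step 2: call the classifier; it returns one dict per input text.
    return [(r["label"], round(r["score"], 4)) for r in classifier(texts)]
```

For example, `classify_sentiment(["I've been waiting for this course my whole life."])` returns a list with one label and its confidence score.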

Additional Step 1: Know its information.

  • The distil models of BERT are smaller versions of the architecture for faster performance.

Additional Step 2: Using the user-contributed models

  • To use user-contributed models, we need to specify the model while creating the classifier object in step 1.

Additional Step 3: Using Zero-shot classification

  • Zero-shot classification refers to a specific use case of machine learning (and therefore deep learning) where you want the model to classify data based on very few or even no labeled examples, which means classifying on the fly.
  • The figure below shows the difference between the transfer learning technique that we utilize and zero-shot learning.

[Figure: transfer learning vs. zero-shot learning]

  • A classic way to start with zero-shot classification is the pipeline's zero-shot-classification task, which takes a text and a list of candidate labels.
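A minimal sketch of zero-shot classification with the pipeline API. `zero_shot` is a hypothetical wrapper and the example labels in the usage note are made up; the `candidate_labels` argument and the `labels` / `scores` keys of the result are part of the pipeline's interface.

```python
def zero_shot(text, candidate_labels, classifier=None):
    """Score a text against labels the model was never fine-tuned on."""
    if classifier is None:
        from transformers import pipeline

        classifier = pipeline("zero-shot-classification")
    result = classifier(text, candidate_labels=candidate_labels)
    # Labels come back ordered from most to least likely.
    return list(zip(result["labels"], result["scores"]))
```

A call such as `zero_shot("This is a course about the Transformers library", ["education", "politics", "business"])` returns the labels paired with their scores.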

Example of Text-Generation capabilities of Pipeline API:

  • Create the pipeline object with the text-generation task and provide the piece of text that you want the generator to complete.

  • We can also control the text generation via parameters like max_length, which caps the length of the generated text, and num_return_sequences, which sets how many sequences are returned.
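A sketch of text generation with the two parameters mentioned above. `generate` is a hypothetical wrapper; `do_sample=True` is added as an assumption so that several different sequences can be returned, since plain greedy decoding yields only one.

```python
def generate(prompt, max_length=30, num_return_sequences=2, generator=None):
    """Complete a prompt and return the generated texts."""
    if generator is None:
        from transformers import pipeline

        generator = pipeline("text-generation")
    outputs = generator(
        prompt,
        max_length=max_length,                      # cap on total length, in tokens
        num_return_sequences=num_return_sequences,  # how many completions to return
        do_sample=True,                             # sample so the completions differ
    )
    return [o["generated_text"] for o in outputs]
```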

Example of Language Modeling:

  • Language modeling (LM) is the use of various statistical and probabilistic techniques to determine the probability of a given sequence of words occurring in a sentence.

  • We can utilize the “fill-mask” task from the library to achieve this NLP capability.
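A sketch of the fill-mask task. `fill_mask` is a hypothetical wrapper that takes a template with a `{mask}` placeholder, because the actual mask token differs between models; the sketch reads it from the pipeline's tokenizer instead of hard-coding it.

```python
def fill_mask(template, top_k=2, unmasker=None):
    """Fill the {mask} placeholder and return (token, full sentence) pairs."""
    if unmasker is None:
        from transformers import pipeline

        unmasker = pipeline("fill-mask")
    # Use the model's own mask token, e.g. [MASK] or <mask>.
    text = template.format(mask=unmasker.tokenizer.mask_token)
    predictions = unmasker(text, top_k=top_k)
    return [(p["token_str"], p["sequence"]) for p in predictions]
```

For example, `fill_mask("This course will teach you all about {mask} models.")` returns the two most likely words for the blank along with the completed sentences.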

Example of Token classification:

  • Token classification has several use cases, and one of the most useful is named-entity recognition (NER).
  • One capability of the NER pipeline is that it can group tokens after tokenization to work out which ones belong together and whether they are names or not.
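A sketch of NER with grouped tokens. `find_entities` is a hypothetical wrapper; the grouping is requested here with `aggregation_strategy="simple"`, which is an assumption about the installed version (older releases used a `grouped_entities` flag for the same purpose).

```python
def find_entities(text, ner=None):
    """Return (entity type, text span) pairs with sub-word tokens merged."""
    if ner is None:
        from transformers import pipeline

        # "simple" merges the pieces of one entity back into a single span.
        ner = pipeline("ner", aggregation_strategy="simple")
    return [(e["entity_group"], e["word"]) for e in ner(text)]
```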

Example of Question-Answering Task:

  • The Transformers library is pretty good at question-answering tasks, such as extracting the answer to a question from the provided context.
  • We create a pipeline with the “question-answering” task and then pass the question and context to the pipeline object, which extracts the answer from the context and returns it.
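A sketch of extractive question answering. `answer_question` is a hypothetical wrapper; the `question` and `context` arguments and the `answer`, `start` and `end` keys are part of the pipeline's interface.

```python
def answer_question(question, context, qa=None):
    """Extract the answer span for a question from the given context."""
    if qa is None:
        from transformers import pipeline

        qa = pipeline("question-answering")
    result = qa(question=question, context=context)
    # The answer is a slice of the context: context[start:end].
    return result["answer"], result["start"], result["end"]
```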

Example of Summarization capabilities of the library:

  • We can summarize text with the Pipeline API by providing the “summarization” task.

  • The summarizer object then returns the summary of the provided text.
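A sketch of summarization. `summarize` is a hypothetical wrapper, and the length limits are arbitrary example values rather than recommended settings.

```python
def summarize(text, max_length=60, min_length=10, summarizer=None):
    """Return a short summary of the given text."""
    if summarizer is None:
        from transformers import pipeline

        summarizer = pipeline("summarization")
    outputs = summarizer(text, max_length=max_length, min_length=min_length)
    # One dict per input text, with the summary under "summary_text".
    return outputs[0]["summary_text"]
```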

Example of Translation capabilities of the API:

  • We can use the translation task with the pipeline object to create a translator that translates the provided text from the source language to the target language; the language pair is determined by the model we specify.
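A sketch of translation. `translate` is a hypothetical wrapper, and the French-to-English checkpoint Helsinki-NLP/opus-mt-fr-en is an assumption chosen for illustration; a different checkpoint would give a different language pair.

```python
def translate(text, model_name="Helsinki-NLP/opus-mt-fr-en", translator=None):
    """Translate text; the language pair is fixed by the chosen model."""
    if translator is None:
        from transformers import pipeline

        translator = pipeline("translation", model=model_name)
    return translator(text)[0]["translation_text"]
```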

Pretty Cool 🤗.

We have looked at the Transformers library and its extraordinary capabilities for different types of NLP tasks. Now, this is my understanding of what a transformer is and how it actually works.

  • A transformer is a deep learning model that adopts the mechanism of attention, differentially weighing the significance of each part of the input data. It is used primarily in the field of natural language processing and in computer vision.
  • In the case of HF 🤗, Transformer models such as GPT, BERT, BART and T5 have all been trained as language models.
  • If a model excels at the language modeling task, we can use transfer learning, where these pre-trained language models are fine-tuned for a specific task.
  • The transformer architecture was originally designed for translation.
  • In short, the basic transformer has two parts under the hood, an encoder and a decoder. The encoder focuses on understanding the input, and the decoder focuses on generating the output, or a higher-level representation of the output that we can use for predictions.

  • One of the key components of how transformers work is “Attention”.
  • Attention allows the model to focus on the relevant parts of the input sequence as needed.
  • When the model is processing the text (words), self attention allows it to look at other positions in the input sequence for clues that can help lead to a better encoding for this word.
  • Some of the transformer models are encoder only like ALBERT, BERT, DistilBERT, ELECTRA and RoBERTa.
  • Some of the transformer models are decoder only and focus on text generation like CTRL, GPT, GPT-2 & Transformer XL.
  • Some of the transformer models are sequence-to-sequence and utilize both encoder and decoder. These models work well for tasks where the input distribution is different from the output distribution, such as summarization, translation and generative Q&A. Some examples are BART / mBART, M2M100, MarianMT, Pegasus, ProphetNet and T5 / mT5.
  • One of the key considerations to take into account while using these models is bias. We should be aware of it and try to reduce it as much as possible.
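The self-attention idea described in the bullets above can be sketched in plain Python as scaled dot-product attention. This is a simplified illustration under stated assumptions: the same vectors are used as queries, keys and values, whereas real transformer layers first apply learned projections and use several attention heads. The function names are made up for the sketch.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # How relevant is every position to the current one?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        # New encoding of this position: weighted mix of all value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, values))
                        for i in range(len(values[0]))])
    return outputs, weights
```

Each row of `weights` shows how much one position looks at every other position, which is the "look at other positions for clues" behaviour described above.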