Prediction scores #4

TheophileBlard · 2020-12-27T14:22:03Z

With 🤗 Transformers pipelines, it's very easy to get the prediction score for each class.

Hello,
My question concerns the code you used to compute the percentage (score) for each sentiment (positive and negative). I can't find it on your GitHub page. The output I am referring to is shown on this page: https://huggingface.co/tblard/tf-allocine?text=Je+t%27appr%C3%A9cie+beaucoup.+Je+t%27aime.
I want to run this code in my own script, not just test the result on the website.
Could you please add it to your GitHub page or send it to me directly by email?
Thank you for your help.

Originally posted by @hodhoda in #3 (comment)

1. First, instantiate the tokenizer and model

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine")
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")

2. Then, create the pipeline

Do not forget to set the return_all_scores parameter to True; otherwise, the pipeline will only output the score of the predicted class.

nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer, return_all_scores=True)

3. Finally, feed the pipeline

result = nlp("J'aime le camembert")
result

[[{'label': 'NEGATIVE', 'score': 0.21667276322841644},
  {'label': 'POSITIVE', 'score': 0.7833272218704224}]]
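The pipeline returns one inner list of {label, score} dicts per input text. As a minimal sketch, assuming the nested-list shape shown above, here is one way to turn that output into a label-to-score mapping and to recover the single top prediction (which is what you get without return_all_scores=True). The helper names scores_by_label and top_prediction are hypothetical, not part of Transformers.

```python
# Output copied from the example above: one inner list per input text.
result = [[{'label': 'NEGATIVE', 'score': 0.21667276322841644},
           {'label': 'POSITIVE', 'score': 0.7833272218704224}]]

def scores_by_label(all_scores):
    """Hypothetical helper: map each label to its score for one input."""
    return {d["label"]: d["score"] for d in all_scores}

def top_prediction(all_scores):
    """Hypothetical helper: keep only the highest-scoring entry."""
    return max(all_scores, key=lambda d: d["score"])

scores = scores_by_label(result[0])
print(scores["POSITIVE"])              # 0.7833272218704224
print(top_prediction(result[0]))       # {'label': 'POSITIVE', 'score': 0.7833272218704224}
print(round(sum(scores.values()), 6))  # 1.0 (softmax scores sum to one)
```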


emiliepicardcantin · 2021-10-25T15:41:23Z

Hello!
Thank you for this model.
I am having some problems with the labels. The returned score does not seem to match the labels in my case, and I don't know why. Here are some examples:

Je suis tres satisfaite du service . Prix correct. Assurance que je recommande a tous. En cas de souci très réactif et a l'écoute des clients. Pas déçue . [{'label': 'POSITIVE', 'score': 0.9916808009147644}]

L'établissement et la réalisation contrat sont faciles à mettre en place, tout comme les appels à la plateforme. Plus qu'à voir en cas de sinistre,mais je ne suis pas pressé !! [{'label': 'POSITIVE', 'score': 0.5242959260940552}]

Le prix m'a convenu pour un jeune conducteur. Il est par contre excessif si l'on rajoute un conducteur secondaire. Le service téléphonique est plutôt assez rapide à répondre. [{'label': 'NEGATIVE', 'score': 0.9145129919052124}]

Un assureur qui assure tant qu'il n'y a pas de sinistre... Un assureur qui résilie le contrat et qui ferme les accès aux informations personnelles avant la fin du contrat en indiquant au 24/09/2021 le message suivant "votre contrat est résilié depuis le 3/12/2021" :-) Donc plus de 2 mois avant l'échéance, le client n'a plus accès à son contrat, ni à la liste des sinistres enregistrés sur son compte, ne serait-ce que pour vérifier qu'il n'y a pas d'erreur... Le service client explique qu'effectivement, l'espace personnel est fermé à partir du jour d'envoi du courrier de résiliation, et non pas à la date de fin de contrat. [{'label': 'NEGATIVE', 'score': 0.9356728196144104}]

TheophileBlard · 2021-10-26T17:42:34Z

Hi @emiliepicardcantin, I believe you are assuming that, because the model was trained on a binary classification task, its output is a single neuron with a sigmoid activation. In fact, this model has two output neurons, to which we apply a softmax activation in order to get (pseudo-)probabilities. Because this is a binary classification task, if the "negative" probability is > 0.5, the predicted label is "negative", and if the "positive" probability is > 0.5, the predicted label is "positive". That's why, in your examples, the output scores are always > 0.5. If you pass return_all_scores=True to the pipeline object, you will get the (probability) score for both outputs.
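To illustrate the point above, here is a small NumPy sketch (not code from this repository) showing that, with two logits, the softmax probability of the positive class equals a sigmoid applied to the logit difference, and that the predicted (argmax) class always has a score of at least 0.5. The logit values are the ones printed later in this thread for "J'aime le camembert"; the index order (0 = NEGATIVE, 1 = POSITIVE) is assumed from the label order the pipeline prints.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability, then normalize.
    shifted = np.exp(logits - np.max(logits, axis=-1, keepdims=True))
    return shifted / shifted.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two output neurons; assumed order: index 0 = NEGATIVE, index 1 = POSITIVE.
logits = np.array([-0.6336924, 0.65147054])
probs = softmax(logits)

# With two classes, softmax reduces to a sigmoid of the logit difference.
p_pos_via_sigmoid = sigmoid(logits[1] - logits[0])
print(probs)              # approximately [0.2167 0.7833]
print(p_pos_via_sigmoid)  # same value as probs[1]

# The predicted label is the argmax, so its score can never drop below 0.5.
predicted = int(np.argmax(probs))
print(predicted, probs[predicted] >= 0.5)  # 1 True
```

This is why a single reported score below 0.5 never appears when only the predicted class is returned.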

That said, the predicted labels seem OK in your examples (the 3rd example is arguable), but I'd advise you to fine-tune the model for your task, as it was only trained on movie review data.

TheophileBlard · 2021-10-26T17:52:09Z

If you bypass the pipeline and use the model directly, you can get its raw outputs (logits):

review = "J'aime le camembert"
inputs = tokenizer(review, return_tensors="tf")
model_outputs = model(inputs)
outputs = model_outputs["logits"][0]
print(outputs) # => tf.Tensor([-0.6336924   0.65147054], shape=(2,), dtype=float32)

You can then apply the softmax manually:

import numpy as np

def softmax(_outputs):
    maxes = np.max(_outputs, axis=-1, keepdims=True)
    shifted_exp = np.exp(_outputs - maxes)
    return shifted_exp / shifted_exp.sum(axis=-1, keepdims=True)

scores = softmax(outputs)
print(scores) # => [0.21667267 0.7833273 ]

You will get the same results as with the pipeline:

nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
result = nlp(review, return_all_scores=True)
result

[[{'label': 'NEGATIVE', 'score': 0.2166726142168045},
  {'label': 'POSITIVE', 'score': 0.7833273410797119}]]
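If you want the raw-logit route to produce the same structure as the pipeline, a minimal sketch is to apply the softmax row by row over a batch of logits (for example, model_outputs["logits"].numpy() for a batch of reviews) and map each column index to a label name. With a real model, that mapping is available as model.config.id2label; here it is hard-coded as a hypothetical {0: "NEGATIVE", 1: "POSITIVE"}, assumed from the label order in the outputs above. The helper logits_to_all_scores is hypothetical, and the second row of logits is made-up data for illustration only.

```python
import numpy as np

# Assumed mapping; with a real model, read it from model.config.id2label.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

def softmax(logits):
    # Row-wise, numerically stable softmax over the last axis.
    shifted = np.exp(logits - np.max(logits, axis=-1, keepdims=True))
    return shifted / shifted.sum(axis=-1, keepdims=True)

def logits_to_all_scores(logits, id2label=ID2LABEL):
    """Hypothetical helper: mimic the pipeline's all-scores output format."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    return [[{"label": id2label[i], "score": float(p)} for i, p in enumerate(row)]
            for row in probs]

# Row 0: logits printed above for "J'aime le camembert".
# Row 1: made-up logits, for illustration only.
batch_logits = [[-0.6336924, 0.65147054],
                [1.2, -0.8]]
for scores in logits_to_all_scores(batch_logits):
    print(scores)
```

Keeping the softmax over axis=-1 means the same function handles a single review (shape (2,)) or a batch (shape (batch_size, 2)).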

emiliepicardcantin · 2021-10-26T18:42:17Z

Thank you!

TheophileBlard added the tutorial label Dec 27, 2020

TheophileBlard pinned this issue Dec 27, 2020

TheophileBlard mentioned this issue Dec 27, 2020

A propos du data set #3

Closed

TheophileBlard closed this as completed Nov 11, 2021
