Transformers Spanish Summarizer

Hi everyone,

I am starting a project that aims to summarise text (in Spanish), and I would like to leverage Transformers. I am not sure which model I should start with or how to approach this problem. I imagine I would need to fine-tune a model on my dataset, but I do not fully understand how to pick a model: some support multiple languages, and some are better suited to certain tasks.

Thanks for your help!

Charles

This is how I would start.

Good luck!! Happy learning!!

Hi Charles,

If you want to do extractive summarization, you could start with a pre-trained Spanish BERT model. A good place to look is the Hugging Face model hub: https://huggingface.co/models?search=spanish
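To make "extractive" concrete, here is a minimal sketch of one simple unsupervised approach (centroid scoring), assuming plain Python and no model download: each sentence is turned into a vector, and the sentences closest to the document's average vector are kept in their original order. The `bag_of_words` embedding and the naive regex sentence splitter are hypothetical stand-ins for illustration; with a Spanish BERT model you would pass an `embed` function built from the model's pooled hidden states instead.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def split_sentences(text: str) -> List[str]:
    # Naive splitter on ., ! or ? followed by whitespace (assumption: a real
    # pipeline would use a proper Spanish sentence tokenizer).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def bag_of_words(sentence: str) -> Vector:
    # Hypothetical placeholder embedding: lowercase word counts.
    # Replace with BERT-based sentence vectors for real use.
    return dict(Counter(re.findall(r"\w+", sentence.lower())))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(text: str, k: int = 2,
              embed: Callable[[str], Vector] = bag_of_words) -> List[str]:
    sentences = split_sentences(text)
    vectors = [embed(s) for s in sentences]
    # The centroid (average vector) stands in for the whole document.
    centroid: Vector = {}
    for vec in vectors:
        for key, val in vec.items():
            centroid[key] = centroid.get(key, 0.0) + val / len(vectors)
    scores = [cosine(v, centroid) for v in vectors]
    # Keep the k highest-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize("El gato come pescado. El gato duerme en casa. Mañana lloverá mucho en Madrid.", k=2)` keeps the two sentences about the cat. A fine-tuned model would replace this fixed scoring rule with a learned one.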

For abstractive summarization, I don’t know if there is any pre-trained model available in Spanish. If you have a large training corpus and the resources, you could try training a model from scratch. To my knowledge, BART and T5 have shown some promising results for summarization.

If you aren’t worried about building the model yourself, you could try a library like OpenNMT or Fairseq to train a summarization model. It could also serve as a baseline if you do decide to build your own. OpenNMT-py has a summarization example: https://opennmt.net/OpenNMT-py/examples/Summarization.html
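Toolkits like OpenNMT-py train from line-aligned source/target text files (one article per line in the source file, its summary on the same line of the target file). A minimal sketch of preparing such files from article/summary pairs, assuming plain Python; the function names and file paths are hypothetical, and the linked example should be checked for the exact preprocessing and config it expects.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def clean(text: str) -> str:
    # One example per line: collapse newlines and repeated whitespace.
    return " ".join(text.split())


def to_parallel_lines(pairs: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    src_lines: List[str] = []
    tgt_lines: List[str] = []
    for article, summary in pairs:
        article, summary = clean(article), clean(summary)
        if not article or not summary:
            continue  # drop incomplete pairs so both files stay aligned
        src_lines.append(article)
        tgt_lines.append(summary)
    return src_lines, tgt_lines


def write_parallel(pairs: Iterable[Tuple[str, str]], src_path: Path, tgt_path: Path) -> int:
    # Hypothetical paths, e.g. Path("train.src") and Path("train.tgt").
    src_lines, tgt_lines = to_parallel_lines(pairs)
    src_path.write_text("".join(line + "\n" for line in src_lines), encoding="utf-8")
    tgt_path.write_text("".join(line + "\n" for line in tgt_lines), encoding="utf-8")
    return len(src_lines)
```

Keeping the cleaning step separate from the file writing makes it easy to inspect the pairs before training.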

@nn.Charles
I just discovered a live project. It seems like a fun one to work on if you are new to summarization (if not, please ignore).

Hi Stefan,
Could you suggest how to get started with a pre-trained Spanish BERT model for extractive summarization?
Thank you so much in advance.

Hi Wilfredo,

I would start with some pre-trained Spanish language model, e.g. https://huggingface.co/Geotrend/bert-base-es-cased, and then fine-tune it on an extractive summarization dataset. I’m not sure which datasets exist in Spanish for this task, but this one could be interesting.
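If the only dataset available has reference summaries written in free text, you still need a keep/drop label per sentence to fine-tune a model like the one above as a sentence classifier. One common approach in extractive summarization work is a greedy "oracle": add whichever sentence most improves overlap with the reference, and stop when nothing helps. A minimal sketch, assuming plain Python and a simplified ROUGE-1 (unigram F1) rather than an official ROUGE implementation; `oracle_labels` and `max_sents` are hypothetical names.

```python
import re
from collections import Counter
from typing import List, Optional


def tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def rouge1_f1(candidate: List[str], reference: List[str]) -> float:
    # Simplified ROUGE-1: F1 over clipped unigram overlap.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def oracle_labels(sentences: List[str], summary: str, max_sents: int = 3) -> List[int]:
    reference = tokens(summary)
    selected: List[int] = []
    best = 0.0
    while len(selected) < max_sents:
        best_idx: Optional[int] = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Score the selection plus this candidate, in document order.
            candidate = [t for j in sorted(selected + [i]) for t in tokens(sentences[j])]
            score = rouge1_f1(candidate, reference)
            if score > best:
                best, best_idx = score, i
        if best_idx is None:
            break  # no remaining sentence improves the score
        selected.append(best_idx)
    # 1 = include the sentence in the summary, 0 = leave it out.
    return [1 if i in selected else 0 for i in range(len(sentences))]
```

For instance, with the sentences "El gobierno aprobó la nueva ley.", "Hace sol en Sevilla." and "La ley entra en vigor en enero." and the reference "El gobierno aprobó una ley que entra en vigor en enero.", the labels come out as `[1, 0, 1]`. These labels would then be the per-sentence targets when fine-tuning the BERT model.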