Business idea:
Offer a service that “cleans” the list of materials a company keeps in its ERP system, so that strings like:
Example 1 (English)
CAS SYSTEM DUAL FREQ LIGHT VEHICLE CAS SYSTEM DUAL FREQ LIGHT VEHICLE -GEN 2 PROD0240 G1 SYSTDFL01-VI GENERAL ELECTRIC
should be converted into something like this:
CAS SYSTEM DUAL FREQ LIGHT VEHICLE GENERAL ELECTRIC PROD0240 GEN 2 PROD0240 G1 SYSTDFL01-VI
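Here is a minimal sketch of removing repeated words, applied to the first example. It assumes that keeping the first occurrence of each token, compared case-insensitively, is good enough. The function name is illustrative only.

```python
def remove_repeated_words(description: str) -> str:
    """Keep only the first occurrence of each whitespace-separated token.

    Matching is case-insensitive, so "86X295X492MM" and "86x295x492MM"
    count as the same token; the spelling seen first is the one kept.
    """
    seen: set[str] = set()
    kept: list[str] = []
    for token in description.split():
        key = token.upper()
        if key not in seen:
            seen.add(key)
            kept.append(token)
    return " ".join(kept)


raw = (
    "CAS SYSTEM DUAL FREQ LIGHT VEHICLE CAS SYSTEM DUAL FREQ LIGHT VEHICLE "
    "-GEN 2 PROD0240 G1 SYSTDFL01-VI GENERAL ELECTRIC"
)
print(remove_repeated_words(raw))
# CAS SYSTEM DUAL FREQ LIGHT VEHICLE -GEN 2 PROD0240 G1 SYSTDFL01-VI GENERAL ELECTRIC
```

A per-token rule is blunt. It also drops legitimate repeats, such as the second X in a dimension written as 10 X 20 X 30. Two possible refinements are limiting it to repeated multi-word phrases or exempting very short tokens.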
Example 2 (Spanish)
ALUMBRADO PUBLICO 86X295X492MM 120WATTS ALUMBRADO PUBLICO 86x295x492MM MODELO GREEN VISION XCEED N/P PH643120W MARCA PHILIPS
should be converted into something like this:
ALUMBRADO PUBLICO PHILIPS PH643120W MODELO GREEN VISION XCEED 86X295X492MM 120W
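The second example also implies two normalizations. The unit in 120WATTS becomes 120W, and the Spanish field labels N/P (part number) and MARCA (brand) are dropped, while MODELO is kept. The sketch below shows one way to do this. The lookup tables and the function name are hypothetical. A real service would need much larger tables for each language.

```python
import re

# Hypothetical lookup tables for illustration; a real service would need
# much larger, per-language tables built from the customer's catalog.
UNIT_ALIASES = {"WATTS": "W", "WATT": "W", "VOLTS": "V", "VOLT": "V"}
LABELS_TO_DROP = {"N/P", "MARCA"}  # Spanish labels for "part number" and "brand"

# A number glued to a unit word, e.g. "120WATTS" -> ("120", "WATTS").
NUMBER_WITH_UNIT = re.compile(r"^(\d+(?:[.,]\d+)?)([A-Z]+)$")


def normalize_labels_and_units(description: str) -> str:
    """Uppercase, drop field labels, and abbreviate known unit words."""
    out: list[str] = []
    for token in description.upper().split():
        if token in LABELS_TO_DROP:
            continue  # dropped, as in the example's target output
        match = NUMBER_WITH_UNIT.match(token)
        if match and match.group(2) in UNIT_ALIASES:
            token = match.group(1) + UNIT_ALIASES[match.group(2)]
        out.append(token)
    return " ".join(out)


# Example 2 after its repeated words have been removed.
deduped = (
    "ALUMBRADO PUBLICO 86X295X492MM 120WATTS MODELO GREEN VISION XCEED "
    "N/P PH643120W MARCA PHILIPS"
)
print(normalize_labels_and_units(deduped))
# ALUMBRADO PUBLICO 86X295X492MM 120W MODELO GREEN VISION XCEED PH643120W PHILIPS
```

The rule only rewrites tokens that start with a number. Part numbers like PH643120W and dimensions like 86X295X492MM therefore pass through unchanged.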
A good start would be to support these operations:
- Remove repeated words
- Remove extra whitespace
- Complete truncated words
- Order words in a meaningful way (noun + modifier + part number)
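The whitespace and truncation steps in the list above could be sketched as follows. Two assumptions are involved. First, stray hyphens at token edges are treated as noise, since the first example's target writes -GEN as GEN. Second, truncated words are completed from a hypothetical vocabulary of full words, for example one collected from the rest of the company's catalog. A token is expanded only when exactly one known word starts with it.

```python
def clean_whitespace(text: str) -> str:
    """Collapse whitespace runs and strip stray hyphens at token edges."""
    # str.split() with no argument splits on any run of Unicode whitespace.
    tokens = (t.strip("-") for t in text.split())
    return " ".join(t for t in tokens if t)  # drop tokens that were only "-"


def complete_truncated(text: str, vocabulary: set[str], min_len: int = 3) -> str:
    """Expand an alphabetic token that is a prefix of exactly one known word."""
    out: list[str] = []
    for token in text.split():
        if token in vocabulary or len(token) < min_len or not token.isalpha():
            out.append(token)
            continue
        candidates = [word for word in vocabulary if word.startswith(token)]
        # Only expand when the completion is unambiguous.
        out.append(candidates[0] if len(candidates) == 1 else token)
    return " ".join(out)


# Hypothetical vocabulary of full words, e.g. mined from the rest of the catalog.
vocab = {"VEHICLE", "SYSTEM", "LIGHT", "LIGHTING"}
print(clean_whitespace("  -GEN   2  PROD0240 "))  # GEN 2 PROD0240
print(complete_truncated("LIGHT VEHIC", vocab))  # LIGHT VEHICLE
print(complete_truncated("LIG VEHIC", vocab))    # LIG VEHICLE (LIG is ambiguous)
```

Requiring a unique match keeps the step conservative. An ambiguous fragment is left for a human, or for a model that can use the surrounding context.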
I know of companies that offer this service, but they do it manually. With NLP (perhaps transformers), we might be able to compete in that market.
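The ordering operation is arguably where NLP would help most. One way to frame it is as token classification. A model, such as a fine-tuned transformer, labels each token with a field like noun, brand, or part number. A fixed rule then emits the fields in order. The sketch below covers only the second half. Tags are assigned by hand in place of a model's output. The tag set and its order are assumptions, chosen so that the result matches the second example's target.

```python
from enum import IntEnum


class Tag(IntEnum):
    """Hypothetical field tags; the integer value is the output position."""
    NOUN = 0
    BRAND = 1
    PART_NUMBER = 2
    MODEL = 3
    SPEC = 4  # dimensions, ratings and similar attributes


def reorder(tagged: list[tuple[str, Tag]]) -> str:
    """Sort tokens by tag; sorted() is stable, so ties keep their input order."""
    return " ".join(token for token, _ in sorted(tagged, key=lambda pair: pair[1]))


# Example 2, after cleanup, with tags assigned by hand in place of a model.
tagged = [
    ("ALUMBRADO", Tag.NOUN), ("PUBLICO", Tag.NOUN),
    ("86X295X492MM", Tag.SPEC), ("120W", Tag.SPEC),
    ("MODELO", Tag.MODEL), ("GREEN", Tag.MODEL),
    ("VISION", Tag.MODEL), ("XCEED", Tag.MODEL),
    ("PH643120W", Tag.PART_NUMBER), ("PHILIPS", Tag.BRAND),
]
print(reorder(tagged))
# ALUMBRADO PUBLICO PHILIPS PH643120W MODELO GREEN VISION XCEED 86X295X492MM 120W
```

Keeping the ordering rule separate from the tagger means the model only has to learn what each token is. The output layout can then be changed per customer without retraining.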
Who’s in?