NLP + Computer Vision task

I picked up a project where I want to integrate an NLP task with computer vision. The idea is that I ask questions via speech and the model answers them. I have the computer vision model, but I just don’t know how to integrate speech to text and text to speech on top of it. @jeremy HELP!

Hi Aamir,

Looks like you’re working on an interesting project, but also a pretty complex one. I don’t think it will be straightforward to offer a solution here, especially since the question is very generic. You probably need to develop each component separately, either using a custom model or a service (cloud providers like GCP or AWS may offer speech-to-text and text-to-speech services), and then connect everything together in your code. Once you get going, you may be able to start asking more specific questions that are easier to answer.
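To make the "connect everything together" part a bit more concrete, here is a rough sketch of how the glue code might look. Everything in it is an assumption for illustration: `VoiceQAPipeline`, `fake_stt`, `fake_vqa`, and `fake_tts` are made-up names, and the three placeholder functions only stand in for whatever speech-to-text engine, vision model, and text-to-speech engine you end up choosing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceQAPipeline:
    """Chains three independent components; each one can be swapped out."""
    speech_to_text: Callable[[bytes], str]         # audio -> question text
    answer_question: Callable[[bytes, str], str]   # (image, question) -> answer text
    text_to_speech: Callable[[str], bytes]         # answer text -> audio

    def run(self, image: bytes, question_audio: bytes) -> bytes:
        question = self.speech_to_text(question_audio)
        answer = self.answer_question(image, question)
        return self.text_to_speech(answer)


# Hypothetical placeholders so the sketch runs end to end.
# Replace each with a real model or cloud service call.
def fake_stt(audio: bytes) -> str:
    # Pretends the "audio" is just UTF-8 encoded text.
    return audio.decode("utf-8")


def fake_vqa(image: bytes, question: str) -> str:
    # Your existing computer vision model would go here.
    return f"You asked: {question}"


def fake_tts(text: str) -> bytes:
    # Pretends the "audio" output is just UTF-8 encoded text.
    return text.encode("utf-8")


pipeline = VoiceQAPipeline(fake_stt, fake_vqa, fake_tts)
reply_audio = pipeline.run(b"image-bytes", b"what is in the picture?")
```

The point of keeping the three steps behind plain function signatures is that you can build and test each one on its own, then replace a placeholder without touching the other two.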

Good luck!

BTW, I’d recommend against tagging Jeremy in your questions. The purpose of the forum is for the community to help each other out :slight_smile:


Thank you!