Edit to clarify a misconception in the comments: this is an Instagram post, so “caption” refers to the description under the image or video.
as an example, this text i am typing now is also a “caption”
just saying because someone started a debate misunderstanding this to be about subtitles (aka “closed captions”) and that’s just not the case 👍
If you use any generic LLM then yes, but there are LLMs (like I said in another reply, it’s probably not an LLM, but as there is no ‘real’ AI that’s what I’m calling all this AI bullshit) that are trained specifically for captioning/transcripts, just not necessarily done in real time.
Doing it “live” is what increases the error rate.
LLMs are large language models. They’re a specialized category of artificial neural network, which is one way of doing machine learning. All of those topics fall under artificial intelligence, a discipline within academic computer science.
AI, neural net, or ML model are all way more accurate to say than LLM in this case.