I have a program that receives a mono audio stream of bits over TCP/IP. I am wondering whether the speech-recognition API in Mac OS X would be able to do a speech-to-text transform for me.
(I don't mind saving the audio to a .wav file first and reading it from there, as opposed to doing the transform on the fly.)
I have read the official docs online, but they are a bit confusing, and I couldn't find any good examples on this topic.
Also, should I do it in Cocoa/Carbon/Java or Objective-C?
There are a number of examples that get copied under /Developer/Examples/Speech/Recognition when you install Xcode.
The Cocoa class for speech recognition is NSSpeechRecognizer. I've not used it, but as far as I know, speech recognition requires you to build a grammar to help the engine choose from a number of choices, rather than allowing you to pass free-form input. This is all explained in the examples referred to above.
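To give a rough idea, a minimal NSSpeechRecognizer setup looks something like the sketch below. The command strings and the `CommandListener` class name are made up for illustration; the point is that the engine only matches against the fixed `commands` list you supply:

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical delegate class, for illustration only.
@interface CommandListener : NSObject <NSSpeechRecognizerDelegate>
@property (strong) NSSpeechRecognizer *recognizer;
@end

@implementation CommandListener
- (instancetype)init {
    if ((self = [super init])) {
        _recognizer = [[NSSpeechRecognizer alloc] init];
        // The engine recognizes only these phrases, not free-form dictation.
        _recognizer.commands = @[@"play", @"stop", @"next track"];
        _recognizer.delegate = self;
        [_recognizer startListening];
    }
    return self;
}

// Called when one of the commands above is recognized.
- (void)speechRecognizer:(NSSpeechRecognizer *)sender
     didRecognizeCommand:(NSString *)command {
    NSLog(@"Recognized: %@", command);
}
@end
```

You would keep an instance of this object alive (e.g. in your app delegate) for as long as you want to listen.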
This comes a bit late perhaps, but I'll chime in anyway.
The speech recognition facilities in OS X (on both the Carbon and Cocoa sides) are for speech command recognition, which means that they will recognize words (or phrases, commands) that have been loaded into the speech system's language model. I've done some stuff with small dictionaries and it works pretty well, but if you want to recognize arbitrary speech, things may get hairy.
Something else to keep in mind is that the functionality provided by the speech APIs in OS X is not one-to-one. The Carbon side provides functionality that has not made it into NSSpeechRecognizer (the docs make some mention of this).
I don't know about Cocoa, but the Carbon Speech Recognition Manager does allow you to specify inputs other than a microphone so a sound stream would work just fine.
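As a sketch of what that looks like on the Carbon side: when you create a recognizer with the Speech Recognition Manager you choose a speech source, and there is a canned (pre-recorded) source constant in addition to the live-microphone default. The outline below is from memory and should be checked against `SpeechRecognition.h` (the exact signatures, the source constant, and the mechanism for feeding audio into a canned source are assumptions to verify against the header and the Speech Recognition Manager docs):

```objc
#include <Carbon/Carbon.h>

// Rough Carbon Speech Recognition Manager sketch; error handling elided.
static void RecognizeFromStream(void) {
    SRRecognitionSystem system;
    SRRecognizer recognizer;
    SRLanguageModel model;

    if (SROpenRecognitionSystem(&system, kSRDefaultRecognitionSystemID) != noErr)
        return;

    // Ask for a canned (non-microphone) speech source instead of the live mic.
    SRNewRecognizer(system, &recognizer, kSRCanned22kHzSpeechSource);

    // Build a tiny language model: the engine picks among these phrases.
    SRNewLanguageModel(system, &model, "commands", 8);
    SRAddText(model, "play", 4, 0);
    SRAddText(model, "stop", 4, 0);
    SRSetLanguageModel(recognizer, model);

    SRStartListening(recognizer);
    // ... feed your decoded audio to the recognizer here; the exact feeding
    // mechanism for canned sources is covered in the Speech Recognition
    // Manager documentation ...
    SRStopListening(recognizer);

    SRReleaseObject(model);
    SRReleaseObject(recognizer);
    SRCloseRecognitionSystem(system);
}
```

Since your stream arrives over TCP/IP, the canned-source route (possibly after writing the audio to a file, as you suggested) is the piece that lets you avoid the microphone entirely.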