Voice recognition to AHK

flyingcrap
Posts: 3
Joined: 06 Jan 2020, 13:24

Voice recognition to AHK

06 Jan 2020, 14:36

Hiya,

I'm looking for a dependable way to send commands via voice/speech recognition which is then recognized by AHK. I want to use this in the office in real time to help dictate/write out a block of text.

An example would involve me saying "consent" and AHK would trigger the command ::consent::.

I'm open to using my Android phone as I prefer Google's voice recognition over MS. I have almost no background in coding except playing around with AHK. Thanks.
gregster
Posts: 4252
Joined: 30 Sep 2013, 06:48

Re: Voice recognition to AHK

06 Jan 2020, 15:00

Welcome to the forums!

For pre-defined words and word combos, like you seem to have in mind, you could use HotVoice by evilC https://www.autohotkey.com/boards/viewtopic.php?f=6&t=34288, to trigger certain actions. Potential advantage: it could constantly listen for you to activate it via voice, afaik.

For dictation (I'm not sure whether you would want to do that via the internet) and/or triggering actions, you could probably create a Telegram bot (using the Telegram Bot API and AHK) that saves the messages you send to it. You can create those messages with Google voice recognition (it can be used in Telegram, but you'll need to press buttons to start it and to finish/send the message).
Or perhaps, on your computer, use this website (https://dictation.io) that uses Google Voice recognition in the Google Chrome browser - that website could probably be automated via the Chrome.ahk library and potentially be combined with HotVoice.

HotVoice is probably the easiest option.

Edit:
Related thread that outlines the Telegram bot approach: https://www.autohotkey.com/boards/viewtopic.php?f=76&t=49362
scriptor2016
Posts: 630
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

07 Jan 2020, 00:10

I would REALLY love to get this HotVoice working, but I keep getting this error when I run the script:

HotVoice.dll failed to load

Dll may be blocked. Try running the powershell command Get-ChildItem -Path '.' -Recurse | Unblock-File in the script folder


So then I right-click the file named "Unblocker.ps1" and select 'Run With Powershell'. It opens up an ms-dos window and hangs for a few seconds.

But then same problem all over again, and I get the same error message.

I'm dying to get this script working, anyone have a solution by chance?
gregster
Posts: 4252
Joined: 30 Sep 2013, 06:48

Re: Voice recognition to AHK

07 Jan 2020, 00:31

I would recommend looking through the HotVoice thread (I think this problem has come up before), if you haven't already, and especially asking your question there if it persists. HotVoice's creator evilC is regularly active on the forum and might be able to troubleshoot this with you.

I am also not sure whether it can work on Win7. I used it on Win10. So you should probably add information about your AHK and Windows versions.
scriptor2016
Posts: 630
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

07 Jan 2020, 01:40

yes, you're right - I remember windows 10 being the OS that HotVoice was tested on, not Win7. Maybe that's the issue - I'm running Win7 still.

I'll bring it up in that thread anyways. Thanks :)
Martin
Posts: 26
Joined: 24 Jun 2017, 02:54

Re: Voice recognition to AHK

07 Jan 2020, 10:27

I've done some tests in the past with Python and AutoHotkey.
Python with SpeechRecognition.
Maybe have a look at https://pypi.org/project/SpeechRecognition/

Something like a voice assistant...
Basically, it listens all the time, to all the words, then sends them to Google for recognition, and if it recognizes a hotword (defined by you, in your language, this is important!), it activates and replies that it is ready to receive orders.
Then I can ask it for something. Whatever I have taught it: like reading the news, telling a joke, telling the weather, or whatever.
Really fast recognition.

If my files weren't so disorganized, I could have helped you a lot more!
Martin
Posts: 26
Joined: 24 Jun 2017, 02:54

Re: Voice recognition to AHK

07 Jan 2020, 11:01

I found my files :)
Oh, yes, it was an interesting project!
They are mixed and disorganized, but I remember the project.

You need Python 2.7 and a lot of modules to install from the command line.
If you are a beginner, I do not advise you to go the Python way.....
scriptor2016
Posts: 630
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

09 Jan 2020, 00:07

So I just searched through my collection of scripts and sure enough, I have a different Voice Recognition script which I forgot all about - and this one works fantastic on my system. Hopefully it will work for everyone else too.

I don't know who wrote it, so unfortunately I wouldn't know who to credit - so I'll just paste it here and maybe the author will chime in at some point.

Be sure to read the beginning of the code as it has a few instructions on what needs to be installed on your system for this to work.

Then, simply add your voice commands in the middle and then again at the bottom of the script - this is working 100% for me. I believe that phrases with multiple words require underscores in between the words.

Also, this is just for speaking commands, not dictating - so it can be useful for software like graphics applications, sound/audio applications, that use keyboard shortcuts, etc.

I hope this works for you guys:

Code:

#Persistent
#SingleInstance

; For voice recognition to work you need Microsoft SAPI installed in your PC, some versions of Windows don't support voice recognition though.
; You may also need to train voice recognition in Windows so that it will understand your voice.

pspeaker := ComObjCreate("SAPI.SpVoice")

;plistener := ComObjCreate("SAPI.SpSharedRecognizer") 

plistener:= ComObjCreate("SAPI.SpInprocRecognizer") ; For not showing Windows Voice Recognition widget.


paudioinputs := plistener.GetAudioInputs() ; For not showing Windows Voice Recognition widget.

plistener.AudioInput := paudioinputs.Item(0)   ; For not showing Windows Voice Recognition widget.

paudioinputs := "" ; Free the wrapper object, it is not needed anymore. (ObjRelease() expects a raw pointer, not a wrapper object.)

pcontext := plistener.CreateRecoContext()

pgrammar := pcontext.CreateGrammar()

pgrammar.DictationSetState(0)

prules := pgrammar.Rules()

prulec := prules.Add("wordsRule", 0x1|0x20)

prulec.Clear()

pstate := prulec.InitialState()



; Add here the words to be recognized! Looks like it understands the null pointer.
; ComObjParameter(13,0) is the null value for AHK_L (a null IUnknown pointer, passed as the destination state).
pstate.AddWordTransition( ComObjParameter(13,0) , "Hello")
pstate.AddWordTransition( ComObjParameter(13,0) , "Goodbye")
pstate.AddWordTransition( ComObjParameter(13,0) , "Yes")
pstate.AddWordTransition( ComObjParameter(13,0) , "No")
pstate.AddWordTransition( ComObjParameter(13,0) , "Maybe_So")



prules.Commit()

pgrammar.CmdSetRuleState( "wordsRule", 1)

prules.Commit()

ComObjConnect(pcontext, "On")

If (pspeaker && plistener && pcontext && pgrammar && prules && prulec && pstate)
{
   ;pspeaker.speak("Voice recognition initialisation succeeded. Available voice commands:")
   tooltip, READY
   sleep, 1000
   tooltip
   ;MsgBox, Available Voice recognition initialisation succeeded. Available Voice Commands:`nApple`nBanana
}
Else
{
   pspeaker.speak("Starting voice recognition initialisation failed")
   MsgBox, Starting voice recognition initialisation failed
}
return

OnRecognition(StreamNum,StreamPos,RecogType,Result)
{
	
 
	  
   Global pspeaker    
 
   ;Msgbox Command Recognised.

   ; Grab the text we just spoke and go to that subroutine
   
   pphrase := Result.PhraseInfo()
 
   sText := pphrase.GetText()
  
   ;pspeaker.Speak("You said " sText)
   
   ;MsgBox, Command is %sText%
   
   voice_command = %sText%
   
      
   ; Send voice command to execute a code block.
 
   
   ; check if it is a Label 
   if(IsLabel(voice_command)) 
      gosub, %voice_command% 

   
   ; pphrase and sText are local variables, so AHK frees them automatically when
   ; the function returns. ObjRelease() is only for raw pointers, so it is not used here.
   
   }






;==================================================================================================
;ADD YOUR VOICE COMMANDS BELOW:

Hello:
msgbox, You Just Said Hello
Return

Goodbye:
msgbox, You Just Said Goodbye
Return

Yes:
msgbox, You Just Said Yes
Return

No:
msgbox, You Just Said No
return

Maybe_So:
msgbox, You Just Said Maybe So
return
;==================================================================================================
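
The underscores are needed because the script uses each recognized phrase directly as a label name, and label names cannot contain spaces. One possible way around that, sketched here and untested against SAPI, is to register the phrase with normal spaces and convert it to a label name only when dispatching. `PhraseToLabel` is a hypothetical helper that is not part of the posted script, and the sketch assumes `AddWordTransition` accepts a space-separated phrase:

```autohotkey
; Hypothetical helper, not part of the posted script: converts a recognized
; phrase such as "Maybe So" into the label name "Maybe_So".
PhraseToLabel(phrase)
{
    return StrReplace(Trim(phrase), " ", "_")
}

; Quick check of the mapping (remove once it works for you).
MsgBox % PhraseToLabel("Open Notepad Now")   ; shows Open_Notepad_Now

; Assumed changes in the posted script (two places):
;   pstate.AddWordTransition( ComObjParameter(13,0) , "Open Notepad Now")
;   ...
;   voice_command := PhraseToLabel(sText)
;   if IsLabel(voice_command)
;       gosub, %voice_command%
```

The label at the bottom of the script would still be written as `Open_Notepad_Now:`, so only the registered phrase and the dispatch line change.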





flyingcrap
Posts: 3
Joined: 06 Jan 2020, 13:24

Re: Voice recognition to AHK

09 Jan 2020, 22:22

THANK YOU so much scriptor2016!!

I also could not get HotVoice to work at all! Your script totally works, although Microsoft SAPI has terrible voice recognition. I really wish I could incorporate Google's API but I'm ecstatic about the possibilities with this script. Thanks again!
scriptor2016
Posts: 630
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

09 Jan 2020, 23:54

Glad it works for you - just remember though, it's someone else's code!! Maybe they'll show up soon :)

This script works excellent for me since I like to dictate a lot of commands.

But the problem for me with this is that it is constantly picking up noises and/or background noise and somehow interpreting it as a command.

So for example if I'm wearing a windbreaker and I rub the jacket up and down with my hand (that sounds bad lol), then it picks up that noise and might interpret it as a command (such as the word "Inside") or something like that. I think this is why I stopped using it a while back, it was a little too sensitive.

I'm thinking the only way around this confusion is to add some kind of a number or something after the word - so for example if one of your voice commands is "Rotate" then make it "Rotate_One" - my hope is that the additional "One" will prevent any errors. Even clearing my throat or coughing confuses it and it converts these noises into a command somehow. For example, clearing my throat might make it think I said "In". So I'm going to try adding additional numbers/letters after each voice command and will report back.

Other than that I find the script to be ultra-fast and ultra-responsive so far.

I'm using a webcam microphone which is sitting across my desk (so no headphones, no cables, etc) - and it still recognises just about everything I say.
scriptor2016
Posts: 630
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

10 Jan 2020, 01:31

Yeah I dunno. I remember now why I stopped using this one.


Let's say you set up a voice command which is "Open_Notepad_Now"

..as soon as it hears you say the word "Open", it will execute the command without waiting to hear the rest.

So basically you can't have the following voice commands together in your code:

"Open_Notepad_Now"
"Open_Chrome_Now"
"Open_Explorer"
..and so on

because all it's going to hear is the word "Open" and then carry out whatever actions are associated to your first instance of the word.

Or if you have the word "Apple" as a command, all you have to say is "App" and it will translate it to "Apple". Or all you have to say is "Ban" and it will accept it as "Banana".. I'm not sure I understand why it's working this way but it's not going to work for dictating commands.

I'm not sure how to get this to listen to multiple words in a single command and also wait for the entire word/phrase to be spoken before carrying out the action...
flyingcrap
Posts: 3
Joined: 06 Jan 2020, 13:24

Re: Voice recognition to AHK

10 Jan 2020, 19:48

I'll also play around with it for a bit but I'm having the exact same issues with background noises confusing the script. I dropped my phone on my desk and it picked up "Hello"..... Like seriously? Go home Microsoft Speech, you're drunk.

The two word commands also do not work for me. It'll confuse it with a completely unrelated 1 word.

It seems like it works better with more vocabulary I manually input (tedious), but some very common words/phrases will never be recognized. Microsoft Speech API suckssssss
scriptor2016
Posts: 630
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

11 Jan 2020, 00:09

Lucky enough, I found another one. This script is a little better and was written by a user named UBERI.

The difference between this one and the other one is that this one allows for phrases.

So if you have a command entitled "Happy Birthday", then you need to say the entire thing for the script to recognize it - much better than the previous one. If you just say "Happy", nothing will happen. You need to say "Happy Birthday".
The same problem applies here though - you can say "Happ Birth" and it will recognize it as Happy Birthday. That I do not understand. Something to do with SAPI for sure.

Anyhow, I use this script for dictating commands to software like Photoshop - so my commands are like "Rotate", "New Layer", "Delete Layer", etc.



So what I do is make the commands "Rotate Apply", or "New Layer Apply", or "Delete Layer Apply" - adding the same word at the end of each command kind of helps that this script doesn't become confused by other noises in the background. One-word commands are the ones which are causing me problems, so a simple command like "In" is sure to get confused when it hears some kind of noise like breathing or coughing - but adding the additional word after it so it becomes "In Apply" means that there is less chance of some kind of sound being confused for those two words. That's why this code is better, because it allows for multiple-word commands unlike the first one which listens only for the first word.

But after testing this out for a while, it's almost as useless as the first one - it still picks up any noise and converts it to a command somehow. Even with additional words at the end. Try adding "One Two Three" at the end of each command and it eliminates the problem even further but not entirely. I can say "Fi Th SS Bll" or some garbage and it'll still pick it up as a command. So I don't know where to go from here. This one's a little better though, so I'd go with this one over the other one.

Too bad HotVoice won't work - I can't for the life of me get it going :(

Code:

 

#NoEnv
#Persistent
#SingleInstance
MySpeechListener := new CustomSpeech

 
MySpeechListener.Recognize(["Mineral Spirits Smell Bad", "Detroit After Dark", "Happy Birthday"])                                                        

return
 

Class CustomSpeech extends SpeechRecognizer
{
	OnRecognize(Text)
	{
		;============================================================================================================================================
		If Text = Mineral Spirits Smell Bad
		{
			Msgbox, You said Mineral Spirits Smell Bad
		}
		;============================================================================================================================================
		If Text = Detroit After Dark
		{
			Msgbox, You said Detroit After Dark
		}
		;============================================================================================================================================
		If Text = Happy Birthday
		{
			Msgbox, You said Happy Birthday
		}
		;============================================================================================================================================
	}
}

/*
	UBERI's SAPI Speech Wrapper for AHK
	Speech Recognition
	==================
	A class providing access to Microsoft's SAPI. Requires the SAPI SDK.
	Reference
	---------
	### Recognizer := new SpeechRecognizer
	Creates a new speech recognizer instance.
	The instance starts off listening to any phrases.
	### Recognizer.Recognize(Values = True)
	Set the values that can be recognized by the recognizer.
	If `Values` is an array of strings, the array is interpreted as a list of possible phrases to recognize. Phrases not in the array will not be recognized. This provides a relatively high degree of recognition accuracy compared to dictation mode.
	If `Values` is otherwise truthy, dictation mode is enabled, which means that the speech recognizer will attempt to recognize any phrases spoken.
	If `Values` is falsy, the speech recognizer will be disabled and will stop listening if currently doing so.
	Returns the speech recognizer instance.
	### Recognizer.Listen(State = True)
	Set the state of the recognizer.
	If `State` is truthy, then the recognizer will start listening if not already doing so.
	If `State` is falsy, then the recognizer will stop listening if currently doing so.
	Returns the speech recognizer instance.
	### Text := Recognizer.Prompt(Timeout = -1)
	Obtains the next phrase spoken as plain text.
	If `Timeout` is a positive number, the function will stop and return a blank string after this amount of time, if the user has not said anything in this interval.
	If `Timeout` is a negative number, the function will wait indefinitely for the user to speak a phrase.
	Returns the text spoken.
	### Recognizer.OnRecognize(Text)
	A callback invoked immediately upon any phrases being recognized.
	The `Text` parameter receives the phrase spoken.
	This function is meant to be overridden in subclasses. By default, it does nothing.
	The return value is discarded.
*/

class SpeechRecognizer
{ ;speech recognition class by Uberi
	static Contexts := {}
	
	__New()
	{
		try
		{
			this.cListener := ComObjCreate("SAPI.SpInprocRecognizer") ;obtain speech recognizer (ISpeechRecognizer object)
			cAudioInputs := this.cListener.GetAudioInputs() ;obtain list of audio inputs (ISpeechObjectTokens object)
			this.cListener.AudioInput := cAudioInputs.Item(0) ;set audio device to first input
		}
		catch e
			throw Exception("Could not create recognizer: " . e.Message)
		
		try this.cContext := this.cListener.CreateRecoContext() ;obtain speech recognition context (ISpeechRecoContext object)
		catch e
			throw Exception("Could not create recognition context: " . e.Message)
		try this.cGrammar := this.cContext.CreateGrammar() ;obtain phrase manager (ISpeechRecoGrammar object)
		catch e
			throw Exception("Could not create recognition grammar: " . e.Message)
		
        ;create rule to use when dictation mode is off
		try
		{
			this.cRules := this.cGrammar.Rules() ;obtain list of grammar rules (ISpeechGrammarRules object)
			this.cRule := this.cRules.Add("WordsRule",0x1 | 0x20) ;add a new grammar rule (SRATopLevel | SRADynamic)
		}
		catch e
			throw Exception("Could not create speech recognition grammar rules: " . e.Message)
		
		this.Phrases([""])
		this.Dictate(True)
		
		SpeechRecognizer.Contexts[&this.cContext] := &this ;store a weak reference to the instance so event callbacks can obtain this instance
		this.Prompting := False ;prompting defaults to inactive
		
		ComObjConnect(this.cContext, "SpeechRecognizer_") ;connect the recognition context events to functions
	}
	
	Recognize(Values = True)
	{
		If Values ;enable speech recognition
		{
			this.Listen(True)
			If IsObject(Values) ;list of phrases to use
				this.Phrases(Values)
			Else ;recognize any phrase
				this.Dictate(True)
		}
		Else ;disable speech recognition
			this.Listen(False)
		Return, this
	}
	
	Listen(State = True)
	{
		try
		{
			If State
				this.cListener.State := 1 ;SRSActive
			Else
				this.cListener.State := 0 ;SRSInactive
		}
		catch e
			throw Exception("Could not set listener state: " . e.Message)
		Return, this
	}
	
	Prompt(Timeout = -1)
	{
		this.Prompting := True
		this.SpokenText := ""
		If Timeout < 0 ;no timeout
		{
			While, this.Prompting
				Sleep, 0
		}
		Else
		{
			StartTime := A_TickCount
			While, this.Prompting && (A_TickCount - StartTime) < Timeout ;keep waiting until a phrase arrives or the timeout (in milliseconds) elapses
				Sleep, 0
		}
		Return, this.SpokenText
	}
	
	Phrases(PhraseList)
	{
		try this.cRule.Clear() ;reset rule to initial state
		catch e
			throw Exception("Could not reset rule: " . e.Message)
		
		try cState := this.cRule.InitialState() ;obtain rule initial state (ISpeechGrammarRuleState object)
		catch e
			throw Exception("Could not obtain rule initial state: " . e.Message)
		
        ;add rules to recognize
		cNull := ComObjParameter(13,0) ;null IUnknown pointer
		For Index, Phrase In PhraseList
		{
			try cState.AddWordTransition(cNull, Phrase) ;add a no-op rule state transition triggered by a phrase
			catch e
				throw Exception("Could not add rule """ . Phrase . """: " . e.Message)
		}
		
		try this.cRules.Commit() ;compile all rules in the rule collection
		catch e
			throw Exception("Could not update rule: " . e.Message)
		
		this.Dictate(False) ;disable dictation mode
		Return, this
	}
	
	Dictate(Enable = False)
	{
		try
		{
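			If Enable ;dictation mode requested (as posted, the calls that would enable it are commented out below, so this branch also keeps dictation off)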
			{
				this.cGrammar.DictationSetState(0) ;disable dictation mode (SGDSInactive)
				this.cGrammar.CmdSetRuleState("WordsRule", 1) ;enable the rule (SGDSActive)
				/*
					this.cGrammar.DictationSetState(1) ;enable dictation mode (SGDSActive)
					this.cGrammar.CmdSetRuleState("WordsRule", 0) ;disable the rule (SGDSInactive)
				*/
			}
			Else ;disable dictation mode
			{
				this.cGrammar.DictationSetState(0) ;disable dictation mode (SGDSInactive)
				this.cGrammar.CmdSetRuleState("WordsRule", 1) ;enable the rule (SGDSActive)
			}
		}
		catch e
			throw Exception("Could not set grammar dictation state: " . e.Message)
		Return, this
	}
	
	OnRecognize(Text)
	{
        ;placeholder function meant to be overridden in subclasses
	}
	
	__Delete()
	{
        ;remove weak reference to the instance
		this.base.Contexts.Remove(&this.cContext, "")
	}
}

SpeechRecognizer_Recognition(StreamNumber, StreamPosition, RecognitionType, cResult, cContext) ;speech recognition engine produced a recognition
{
	try
	{
		pPhrase := cResult.PhraseInfo() ;obtain detailed information about recognized phrase (ISpeechPhraseInfo object from ISpeechRecoResult object)
		Text := pPhrase.GetText() ;obtain the spoken text
	}
	catch e
		throw Exception("Could not obtain recognition result text: " . e.Message)
	
	Instance := Object(SpeechRecognizer.Contexts[&cContext]) ;obtain reference to the recognizer
	
    ;handle prompting mode
	If Instance.Prompting
	{
		Instance.SpokenText := Text
		Instance.Prompting := False
	}
	
	Instance.OnRecognize(Text) ;invoke callback in recognizer
}
return
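
For the use case that opened the thread (say "consent", get a block of text typed out), the `OnRecognize` callback could look the phrase up in a table instead of showing a message box. This is only a minimal sketch: `SnippetFor`, `SnippetSpeech` and the sample phrases and texts are hypothetical placeholders, not part of Uberi's wrapper.

```autohotkey
; Hypothetical lookup: spoken phrase -> block of text to type.
; The phrases and texts are placeholders, not taken from the thread.
SnippetFor(phrase)
{
    static Snippets := {"consent": "Risks, benefits and alternatives were discussed and consent was obtained."
                      , "follow up": "Follow up in two weeks."}
    return Snippets.HasKey(phrase) ? Snippets[phrase] : ""
}

; Quick check without a microphone (remove once it works for you).
MsgBox % SnippetFor("consent")

; Assumed use together with the SpeechRecognizer class posted above
; (the class must be in the same file for this part to load):
;   class SnippetSpeech extends SpeechRecognizer
;   {
;       OnRecognize(Text)
;       {
;           snippet := SnippetFor(Text)
;           if (snippet != "")
;               SendInput {Raw}%snippet%
;       }
;   }
;   Listener := new SnippetSpeech
;   Listener.Recognize(["consent", "follow up"])
```

Keeping the texts in one table means a new voice snippet needs one table entry plus one entry in the phrase list passed to `Recognize`, rather than another `If Text = ...` block.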

scriptor2016
Posts: 630
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

11 Jan 2020, 03:55

Been using it for a few hours now and although the second script works better, it still has too many problems to be useful. Also of note, it might have more to do with Microsoft SAPI as opposed to problems with the ahk code itself.


It's just picking up too many subtle noises in the background and converting them to commands. Clearing your throat is converted to the command "Rotate Canvas Apply Right Now" apparently. How clearing your throat can be converted to those 5 random words is beyond me.

Really disappointed, I was hoping this would work. I do have Dragon Naturally Speaking, which works flawlessly, but it's a major resource hog and very complicated and is extremely tedious to add new voice commands. Loading it up takes a half-minute alone and the disk space it reserves is insane. And even then, it can still pick up random noise and convert it to random commands that you don't expect.

I guess the search continues.......
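
Neither script above checks how confident the recognizer was before acting on a result. SAPI's automation interface reports a coarse `Confidence` rating and an engine-specific `EngineConfidence` score on the recognized phrase's rule, so one thing worth trying is to ignore results below a cut-off. The sketch below is untested against the noise problems described in this thread; `IsConfidentResult` is a hypothetical helper and the 0.9 cut-off is an assumed starting value to tune.

```autohotkey
; Hypothetical helper, not part of either posted script.
; cResult is the result object (ISpeechRecoResult) that SAPI passes to the
; Recognition event handler.
IsConfidentResult(cResult, MinScore = 0.9)
{
    cRule := cResult.PhraseInfo.Rule   ; rule information for the recognized phrase
    ; Confidence is a coarse rating: -1 = low, 0 = normal, 1 = high.
    if (cRule.Confidence < 0)
        return false
    ; EngineConfidence is an engine-specific score; 0.9 is an assumed cut-off.
    return cRule.EngineConfidence >= MinScore
}

; Possible use at the top of the event handler (OnRecognition in the first
; script, SpeechRecognizer_Recognition with cResult in the second):
;   if !IsConfidentResult(Result)
;       return
```

If real commands start getting rejected, lowering `MinScore` or dropping the `EngineConfidence` comparison and keeping only the coarse rating would be the first things to adjust.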
