Voice recognition to AHK

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
flyingcrap
Posts: 3
Joined: 06 Jan 2020, 13:24

Voice recognition to AHK

06 Jan 2020, 14:36

Hiya,

I'm looking for a dependable way to send commands via voice/speech recognition which is then recognized by AHK. I want to use this in the office in real time to help dictate/write out a block of text.

An example would involve me saying "consent" and AHK would trigger the command ::consent::.
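A hotstring along those lines might look like this in AHK v1 (the expansion text is just a made-up placeholder):

```autohotkey
; Typing (or otherwise sending) "consent" plus an ending character
; expands it into a block of boilerplate text. The text below is a
; placeholder - substitute your own.
::consent::
(
The patient has been informed of the risks and benefits of the
procedure and has given informed consent.
)
```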

I'm open to using my Android phone as I prefer Google's voice recognition over MS. I have almost no background in coding except playing around with AHK. Thanks.
gregster
Posts: 9020
Joined: 30 Sep 2013, 06:48

Re: Voice recognition to AHK

06 Jan 2020, 15:00

Welcome to the forums!

For pre-defined words and word combos, like you seem to have in mind, you could use HotVoice by evilC (https://www.autohotkey.com/boards/viewtopic.php?f=6&t=34288) to trigger certain actions. Potential advantage: afaik, it listens constantly, so you could activate it via voice.

For dictation (not sure if you would want to do that via the internet) and/or triggering actions, you could probably create a Telegram bot (using the Telegram Bot API and AHK) that saves the messages you send to it. You can create the messages with Google's voice recognition (it can be used in Telegram, but you'll need to press buttons to start and to finish/send).
Or perhaps, on your computer, use this website (https://dictation.io), which uses Google voice recognition in the Google Chrome browser - that website could probably be automated via the Chrome.ahk library and potentially be combined with HotVoice.
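A rough sketch of the Telegram polling idea (the bot token is a placeholder you would get from @BotFather; JSON parsing and error handling are left out):

```autohotkey
; Minimal sketch: poll the Telegram Bot API for new messages.
; BOT_TOKEN is a placeholder - create a bot via @BotFather for a real one.
BOT_TOKEN := "123456:ABC-your-token-here"
offset := 0

Loop
{
    whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    whr.Open("GET", "https://api.telegram.org/bot" BOT_TOKEN "/getUpdates?offset=" offset, false)
    whr.Send()
    response := whr.ResponseText
    ; A real script would parse the JSON here (e.g. with an AHK JSON library),
    ; advance the offset, and Send the message text into the active window.
    Sleep, 2000
}
```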

HotVoice is probably the easiest option.

Edit:
Related thread that outlines the Telegram bot approach: https://www.autohotkey.com/boards/viewtopic.php?f=76&t=49362
scriptor2016
Posts: 860
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

07 Jan 2020, 00:10

i would REALLY love to get this HotVoice working, but I keep getting this error when I run the script:

HotVoice.dll failed to load

Dll may be blocked. Try running the powershell command Get-ChildItem -Path '.' -Recurse | Unblock-File in the script folder


So then I right-click the file named "Unblocker.ps1" and select 'Run With Powershell'. It opens up an ms-dos window and hangs for a few seconds.

But then same problem all over again, and I get the same error message.

I'm dying to get this script working, anyone have a solution by chance?
gregster
Posts: 9020
Joined: 30 Sep 2013, 06:48

Re: Voice recognition to AHK

07 Jan 2020, 00:31

I would recommend looking through the HotVoice thread (I think this problem has come up before), if you haven't already - and especially, asking your question there if it persists. HotVoice's creator evilC is regularly active on the forum and might be able to troubleshoot this with you.

I am also not sure if it can work on Win7; I used it on Win10. So you should probably add information about your AHK and Windows versions.
scriptor2016
Posts: 860
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

07 Jan 2020, 01:40

yes, you're right - I remember windows 10 being the OS that HotVoice was tested on, not Win7. Maybe that's the issue - I'm running Win7 still.

I'll bring it up in that thread anyways. Thanks :)
Martin
Posts: 26
Joined: 24 Jun 2017, 02:54

Re: Voice recognition to AHK

07 Jan 2020, 10:27

I've done some tests in the past with python and autohotkey.
Python with SpeechRecognition.
Maybe you will look at https://pypi.org/project/SpeechRecognition/

Something like a voice assistant...
Basically, it listens all the time to every word, then sends them to Google for recognition, and if it recognizes a hotword (defined by you, in your language - this is important!), it activates and replies that it is ready to receive orders.
Then I can ask it for something - whatever I have taught it: read the news, tell a joke, give the weather, or whatever.
Really fast recognition.

If my files weren't so disorganized, I could have been a big help to you!
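The hotword part of such an assistant is just string matching on whatever the recognizer returns. A minimal sketch (function names are my own; the actual audio capture via the SpeechRecognition package is left out):

```python
# Minimal hotword check: the recognizer returns a transcript string; the
# assistant activates only when a user-defined hotword appears in it.
HOTWORD = "jarvis"  # placeholder hotword, defined by you, in your language

def contains_hotword(transcript, hotword=HOTWORD):
    """Return True if the hotword occurs as a word in the transcript."""
    return hotword in transcript.lower().split()

def handle(transcript):
    """Dispatch: wake up on the hotword, otherwise ignore the speech."""
    if contains_hotword(transcript):
        return "ready to receive orders"
    return None

# Example: only the second phrase wakes the assistant.
print(handle("what a nice day"))     # -> None
print(handle("hey jarvis wake up"))  # -> ready to receive orders
```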
Martin
Posts: 26
Joined: 24 Jun 2017, 02:54

Re: Voice recognition to AHK

07 Jan 2020, 11:01

I found my files :)
Oh, yes, it was an interesting project!
They are mixed and disorganized, but I remember the project.

You need Python 2.7 and a lot of modules to install at the command line.
If you are a beginner, I do not advise you to take the Python route.
scriptor2016
Posts: 860
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

09 Jan 2020, 00:07

So I just searched through my collection of scripts and sure enough, I have a different Voice Recognition script which I forgot all about - and this one works fantastic on my system. Hopefully it will work for everyone else too.

I don't know who wrote it, so unfortunately I wouldn't know who to credit - so I'll just paste it here and maybe the author will chime in at some point.

Be sure to read the beginning of the code as it has a few instructions on what needs to be installed on your system for this to work.

Then, simply add your voice commands in the middle and then again at the bottom of the script - this is working 100% for me. I believe that phrases with multiple words require underscores in between the words.

Also, this is just for speaking commands, not dictating - so it can be useful for software that relies on keyboard shortcuts: graphics applications, sound/audio applications, etc.

I hope this works for you guys:

Code: Select all

#Persistent
#SingleInstance

; For voice recognition to work you need Microsoft SAPI installed in your PC, some versions of Windows don't support voice recognition though.
; You may also need to train voice recognition in Windows so that it will understand your voice.

pspeaker := ComObjCreate("SAPI.SpVoice")

;plistener := ComObjCreate("SAPI.SpSharedRecognizer") 

plistener:= ComObjCreate("SAPI.SpInprocRecognizer") ; For not showing Windows Voice Recognition widget.


paudioinputs := plistener.GetAudioInputs() ; For not showing Windows Voice Recognition widget.

plistener.AudioInput := paudioinputs.Item(0)   ; For not showing Windows Voice Recognition widget.

ObjRelease(paudioinputs) ; Release object from memory, it is not needed anymore.

pcontext := plistener.CreateRecoContext()

pgrammar := pcontext.CreateGrammar()

pgrammar.DictationSetState(0)

prules := pgrammar.Rules()

prulec := prules.Add("wordsRule", 0x1|0x20)

prulec.Clear()

pstate := prulec.InitialState()



; Add the words to be recognized here. ComObjParameter(13,0) is a null IUnknown pointer in AHK_L.
pstate.AddWordTransition( ComObjParameter(13,0), "Hello" )
pstate.AddWordTransition( ComObjParameter(13,0), "Goodbye" )
pstate.AddWordTransition( ComObjParameter(13,0), "Yes" )
pstate.AddWordTransition( ComObjParameter(13,0), "No" )
pstate.AddWordTransition( ComObjParameter(13,0), "Maybe_So" )



prules.Commit()

pgrammar.CmdSetRuleState( "wordsRule", 1)

prules.Commit()

ComObjConnect(pcontext, "On")

If (pspeaker && plistener && pcontext && pgrammar && prules && prulec && pstate)
{
    ;pspeaker.Speak("Voice recognition initialisation succeeded. Available voice commands:")
    ;MsgBox, Voice recognition initialisation succeeded.
    ToolTip, READY
    Sleep, 1000
    ToolTip
}
Else
{
    pspeaker.Speak("Starting voice recognition initialisation failed")
    MsgBox, Starting voice recognition initialisation failed
}
return

OnRecognition(StreamNum, StreamPos, RecogType, Result)
{
    Global pspeaker

    ; Grab the text we just spoke and jump to the matching subroutine.
    pphrase := Result.PhraseInfo()
    sText := pphrase.GetText()

    ;pspeaker.Speak("You said " sText)
    ;MsgBox, Command is %sText%

    voice_command := sText

    ; If a label with this name exists, run it as the voice command.
    if (IsLabel(voice_command))
        gosub, %voice_command%

    ObjRelease(pphrase) ; release the COM object; sText is a plain string and needs no release
}






;==================================================================================================
;ADD YOUR VOICE COMMANDS BELOW:

Hello:
msgbox, You Just Said Hello
Return

Goodbye:
msgbox, You Just Said Goodbye
Return

Yes:
msgbox, You Just Said Yes
Return

No:
msgbox, You Just Said No
return

Maybe_So:
msgbox, You Just Said Maybe So
return
;==================================================================================================





flyingcrap
Posts: 3
Joined: 06 Jan 2020, 13:24

Re: Voice recognition to AHK

09 Jan 2020, 22:22

THANK YOU so much scriptor2016!!

I also could not get HotVoice to work at all! Your script totally works, although Microsoft SAPI has terrible voice recognition. I really wish I could incorporate Google's API but I'm ecstatic about the possibilities with this script. Thanks again!
scriptor2016
Posts: 860
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

09 Jan 2020, 23:54

Glad it works for you - just remember though, it's someone else's code!! Maybe they'll show up soon :)

This script works excellent for me since I like to dictate a lot of commands.

But the problem for me with this is that it is constantly picking up noises and/or background noise and somehow interpreting it as a command.

So for example if I'm wearing a windbreaker and I rub the jacket up and down with my hand (that sounds bad lol), then it picks up that noise and might interpret it as a command (such as the word "Inside") or something like that. I think this is why I stopped using it a while back, it was a little too sensitive.

I'm thinking the only way around this confusion is to add some kind of a number or something after the word - so for example if one of your voice commands is "Rotate" then make it "Rotate_One" - my hope is that the additional "One" will prevent any errors. Even clearing my throat or coughing confuses it and it converts these noises into a command somehow. For example, clearing my throat might make it think I said "In". So I'm going to try adding additional numbers/letters after each voice command and will report back.

Other than that I find the script to be ultra-fast and ultra-responsive so far.

I'm using a webcam microphone which is sitting across my desk (so no headphones, no cables, etc) - and it still recognises just about everything I say.
scriptor2016
Posts: 860
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

10 Jan 2020, 01:31

Yeah I dunno. I remember now why I stopped using this one.


Let's say you set up a voice command which is "Open_Notepad_Now"

..as soon as it hears you say the word "Open", it will execute the command without waiting to hear the rest.

So basically you can't have the following voice commands together in your code:

"Open_Notepad_Now"
"Open_Chrome_Now"
"Open_Explorer"
..and so on

because all it's going to hear is the word "Open" and then carry out whatever actions are associated to your first instance of the word.

Or if you have the word "Apple" as a command, all you have to say is "App" and it will translate it to "Apple". Or all you have to say is "Ban" and it will accept it as "Banana".. I'm not sure I understand why it's working this way but it's not going to work for dictating commands.

I'm not sure how to get this to listen to multiple words in a single command and also wait for the entire word/phrase to be spoken before carrying out the action...
flyingcrap
Posts: 3
Joined: 06 Jan 2020, 13:24

Re: Voice recognition to AHK

10 Jan 2020, 19:48

I'll also play around with it for a bit but I'm having the exact issues with background noises confusing the script. I dropped my phone on my desk and it picked up "Hello"..... Like seriously? Go home Microsoft Speech, you're drunk.

The two-word commands also do not work for me - it confuses them with a completely unrelated one-word command.

It seems like it works better with more vocabulary I manually input (tedious), but some very common words/phrases will never be recognized. Microsoft Speech API suckssssss
scriptor2016
Posts: 860
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

11 Jan 2020, 00:09

Lucky enough, I found another one. This script is a little better and was written by a user named UBERI.

The difference between this one and the other one is that this one allows for phrases.

So if you have a command entitled "Happy Birthday", then you need to say the entire thing for the script to recognize it - much better than the previous one. If you just say "Happy", nothing will happen. You need to say "Happy Birthday".
The same problem applies here though - you can say "Happ Birth" and it will recognize it as Happy Birthday. That I do not understand. Something to do with SAPI for sure.

Anyhow, I use this script for dictating commands to software like Photoshop - so my commands are like "Rotate", "New Layer", "Delete Layer", etc.



So what I do is make the commands "Rotate Apply", or "New Layer Apply", or "Delete Layer Apply" - adding the same word at the end of each command helps keep this script from being confused by other noises in the background. One-word commands are the ones causing me problems: a simple command like "In" is sure to get confused when it hears some kind of noise like breathing or coughing, but adding the extra word after it so it becomes "In Apply" means there is less chance of some sound being mistaken for those two words. That's why this code is better - it allows for multiple-word commands, unlike the first one, which listens only for the first word.

But after testing this out for a while, it's almost as useless as the first one - it still picks up any noise and converts it to a command somehow, even with additional words at the end. Try adding "One Two Three" at the end of each command and it reduces the problem even further, but not entirely. I can say "Fi Th SS Bll" or some garbage and it'll still pick it up as a command. So I don't know where to go from here. This one's a little better though, so I'd go with this one over the other one.

Too bad HotVoice won't work - I can't for the life of me get it going :(

Code: Select all

 

#NoEnv
#Persistent
#SingleInstance
MySpeechListener := new CustomSpeech

 
MySpeechListener.Recognize(["Mineral Spirits Smell Bad", "Detroit After Dark", "Happy Birthday"])                                                        

return
 

Class CustomSpeech extends SpeechRecognizer
{
	OnRecognize(Text)
	{
		If (Text = "Mineral Spirits Smell Bad")
			Msgbox, You said Mineral Spirits Smell Bad

		If (Text = "Detroit After Dark")
			Msgbox, You said Detroit After Dark

		If (Text = "Happy Birthday")
			Msgbox, You said Happy Birthday
	}
}

/*
	UBERI's SAPI Speech Wrapper for AHK
	Speech Recognition
	==================
	A class providing access to Microsoft's SAPI. Requires the SAPI SDK.
	Reference
	---------
	### Recognizer := new SpeechRecognizer
	Creates a new speech recognizer instance.
	The instance starts off listening to any phrases.
	### Recognizer.Recognize(Values = True)
	Set the values that can be recognized by the recognizer.
	If `Values` is an array of strings, the array is interpreted as a list of possible phrases to recognize. Phrases not in the array will not be recognized. This provides a relatively high degree of recognition accuracy compared to dictation mode.
	If `Values` is otherwise truthy, dictation mode is enabled, which means that the speech recognizer will attempt to recognize any phrases spoken.
	If `Values` is falsy, the speech recognizer will be disabled and will stop listening if currently doing so.
	Returns the speech recognizer instance.
	### Recognizer.Listen(State = True)
	Set the state of the recognizer.
	If `State` is truthy, then the recognizer will start listening if not already doing so.
	If `State` is falsy, then the recognizer will stop listening if currently doing so.
	Returns the speech recognizer instance.
	### Text := Recognizer.Prompt(Timeout = -1)
	Obtains the next phrase spoken as plain text.
	If `Timeout` is a positive number, the function will stop and return a blank string after this amount of time, if the user has not said anything in this interval.
	If `Timeout` is a negative number, the function will wait indefinitely for the user to speak a phrase.
	Returns the text spoken.
	### Recognizer.OnRecognize(Text)
	A callback invoked immediately upon any phrases being recognized.
	The `Text` parameter receives the phrase spoken.
	This function is meant to be overridden in subclasses. By default, it does nothing.
	The return value is discarded.
*/

class SpeechRecognizer
{ ;speech recognition class by Uberi
	static Contexts := {}
	
	__New()
	{
		try
		{
			this.cListener := ComObjCreate("SAPI.SpInprocRecognizer") ;obtain speech recognizer (ISpeechRecognizer object)
			cAudioInputs := this.cListener.GetAudioInputs() ;obtain list of audio inputs (ISpeechObjectTokens object)
			this.cListener.AudioInput := cAudioInputs.Item(0) ;set audio device to first input
		}
		catch e
			throw Exception("Could not create recognizer: " . e.Message)
		
		try this.cContext := this.cListener.CreateRecoContext() ;obtain speech recognition context (ISpeechRecoContext object)
		catch e
			throw Exception("Could not create recognition context: " . e.Message)
		try this.cGrammar := this.cContext.CreateGrammar() ;obtain phrase manager (ISpeechRecoGrammar object)
		catch e
			throw Exception("Could not create recognition grammar: " . e.Message)
		
        ;create rule to use when dictation mode is off
		try
		{
			this.cRules := this.cGrammar.Rules() ;obtain list of grammar rules (ISpeechGrammarRules object)
			this.cRule := this.cRules.Add("WordsRule",0x1 | 0x20) ;add a new grammar rule (SRATopLevel | SRADynamic)
		}
		catch e
			throw Exception("Could not create speech recognition grammar rules: " . e.Message)
		
		this.Phrases([""])
		this.Dictate(True)
		
		SpeechRecognizer.Contexts[&this.cContext] := &this ;store a weak reference to the instance so event callbacks can obtain this instance
		this.Prompting := False ;prompting defaults to inactive
		
		ComObjConnect(this.cContext, "SpeechRecognizer_") ;connect the recognition context events to functions
	}
	
	Recognize(Values = True)
	{
		If Values ;enable speech recognition
		{
			this.Listen(True)
			If IsObject(Values) ;list of phrases to use
				this.Phrases(Values)
			Else ;recognize any phrase
				this.Dictate(True)
		}
		Else ;disable speech recognition
			this.Listen(False)
		Return, this
	}
	
	Listen(State = True)
	{
		try
		{
			If State
				this.cListener.State := 1 ;SRSActive
			Else
				this.cListener.State := 0 ;SRSInactive
		}
		catch e
			throw Exception("Could not set listener state: " . e.Message)
		Return, this
	}
	
	Prompt(Timeout = -1)
	{
		this.Prompting := True
		this.SpokenText := ""
		If Timeout < 0 ;no timeout
		{
			While, this.Prompting
				Sleep, 0
		}
		Else
		{
			StartTime := A_TickCount
			While, this.Prompting && (A_TickCount - StartTime) < Timeout ;loop while still waiting and the timeout has not elapsed
				Sleep, 0
		}
		Return, this.SpokenText
	}
	
	Phrases(PhraseList)
	{
		try this.cRule.Clear() ;reset rule to initial state
		catch e
			throw Exception("Could not reset rule: " . e.Message)
		
		try cState := this.cRule.InitialState() ;obtain rule initial state (ISpeechGrammarRuleState object)
		catch e
			throw Exception("Could not obtain rule initial state: " . e.Message)
		
        ;add rules to recognize
		cNull := ComObjParameter(13,0) ;null IUnknown pointer
		For Index, Phrase In PhraseList
		{
			try cState.AddWordTransition(cNull, Phrase) ;add a no-op rule state transition triggered by a phrase
			catch e
				throw Exception("Could not add rule """ . Phrase . """: " . e.Message)
		}
		
		try this.cRules.Commit() ;compile all rules in the rule collection
		catch e
			throw Exception("Could not update rule: " . e.Message)
		
		this.Dictate(False) ;disable dictation mode
		Return, this
	}
	
	Dictate(Enable = False)
	{
		try
		{
			If Enable ;enable dictation mode
			{
				this.cGrammar.DictationSetState(0) ;disable dictation mode (SGDSInactive)
				this.cGrammar.CmdSetRuleState("WordsRule", 1) ;enable the rule (SGDSActive)
				/*
					this.cGrammar.DictationSetState(1) ;enable dictation mode (SGDSActive)
					this.cGrammar.CmdSetRuleState("WordsRule", 0) ;disable the rule (SGDSInactive)
				*/
			}
			Else ;disable dictation mode
			{
				this.cGrammar.DictationSetState(0) ;disable dictation mode (SGDSInactive)
				this.cGrammar.CmdSetRuleState("WordsRule", 1) ;enable the rule (SGDSActive)
			}
		}
		catch e
			throw Exception("Could not set grammar dictation state: " . e.Message)
		Return, this
	}
	
	OnRecognize(Text)
	{
        ;placeholder function meant to be overridden in subclasses
	}
	
	__Delete()
	{
        ;remove weak reference to the instance
		this.base.Contexts.Remove(&this.cContext, "")
	}
}

SpeechRecognizer_Recognition(StreamNumber, StreamPosition, RecognitionType, cResult, cContext) ;speech recognition engine produced a recognition
{
	try
	{
		pPhrase := cResult.PhraseInfo() ;obtain detailed information about recognized phrase (ISpeechPhraseInfo object from ISpeechRecoResult object)
		Text := pPhrase.GetText() ;obtain the spoken text
	}
	catch e
		throw Exception("Could not obtain recognition result text: " . e.Message)
	
	Instance := Object(SpeechRecognizer.Contexts[&cContext]) ;obtain reference to the recognizer
	
    ;handle prompting mode
	If Instance.Prompting
	{
		Instance.SpokenText := Text
		Instance.Prompting := False
	}
	
	Instance.OnRecognize(Text) ;invoke callback in recognizer
}
return

scriptor2016
Posts: 860
Joined: 21 Dec 2015, 02:34

Re: Voice recognition to AHK

11 Jan 2020, 03:55

Been using it for a few hours now and although the second script works better, it still has too many problems to be useful. Also of note, this probably has more to do with Microsoft SAPI than with problems in the AHK code itself.


It's just picking up too many subtle noises in the background and converting them to commands. Clearing your throat is converted to the command "Rotate Canvas Apply Right Now" apparently. How clearing your throat can be converted to those 5 random words is beyond me.

Really disappointed, I was hoping this would work. I do have Dragon Naturally Speaking, which works flawlessly, but it's a major resource hog and very complicated and is extremely tedious to add new voice commands. Loading it up takes a half-minute alone and the disk space it reserves is insane. And even then, it can still pick up random noise and convert it to random commands that you don't expect.

I guess the search continues.......
Johnny Cardel
Posts: 1
Joined: 24 Jun 2020, 08:37

Re: Voice recognition to AHK

24 Jun 2020, 09:39

How can I store the input command into a global variable that I can use? I've been trying for days now - hope someone can help. TIA
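For what it's worth, a sketch based on the first SAPI script in this thread: declare a global in the auto-execute section and assign the recognized text to it inside OnRecognition (the variable name here is my own):

```autohotkey
; Sketch: keep the last recognized phrase in a global so other parts
; of the script can read it later. Based on the SAPI script above.
LastVoiceCommand := ""  ; assigned in the auto-execute section, so it is global

OnRecognition(StreamNum, StreamPos, RecogType, Result)
{
    global LastVoiceCommand
    pphrase := Result.PhraseInfo()
    LastVoiceCommand := pphrase.GetText()  ; readable anywhere via "global LastVoiceCommand"
    ObjRelease(pphrase)
}
```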
Sticky
Posts: 10
Joined: 12 Apr 2017, 22:08

Re: Voice recognition to AHK

13 Mar 2021, 18:47

flyingcrap wrote:
06 Jan 2020, 14:36
I'm looking for a dependable way to send commands via voice/speech recognition which is then recognized by AHK.
Did anybody reply to this already? If so, I haven't seen the reply, although I would like to.

Anyway, I do most of my work using voice recognition, Nuance Dragon, on Microsoft Windows. Over the past week or two I have converted nearly all of the voice commands that I wrote using KnowBrainer or Dragon's own Basic scripting languages to AutoHotKey.

The basic idea is that I have a Dragon/KnowBrainer speech command that invokes AutoHotKey, as follows:

SPEECH-COMMAND-NAME: foobar

Code: Select all

Sub Main
	u = Environ("UserProfile")
	ahk_cmd_path_nospace = u&"\Stuff\bin\AutoHotKey.exe"
	ahk_script= """"&u&"\Stuff\Windows Stuff\Dragon Stuff\ahk\AHK-script-for-Dragon.ahk"&""""
	ShellExecute  ahk_cmd_path_nospace &" " &ahk_script &" foobar args for ahk script", 7 ' default window size not active
End Sub
That example should work, assuming you've got the appropriate scripts at the appropriate places in your file system - and assuming that the edits I made when I removed the crap you're not interested in did not break it.

(BTW, I have broken out the speech command name separately because of Dragon/KnowBrainer conventions. It doesn't really appear in the Basic code, although it appears in the KnowBrainer and Dragon command browsers, and in the XML files they save it to.)

But scripts that look like that would be painful, because you would need one such script for every combination of arguments that you send to AutoHotKey.

So instead, I go further: I have a generic command PUFF <dictation>, where <dictation> is anything - at least anything up until the point where Dragon decides that you have stopped speaking and it should invoke the speech command. "Puff", because it amuses me to think that I am talking to "Puff the magic Dragon". (I also have aliases for this and other commands using other Dragon names like "Smaug".)


SPEECH-COMMAND-NAME: PUFF <dictation>

Code: Select all

Sub Main
	u = Environ("UserProfile")
	ahk_cmd_path_nospace = u&"\Stuff\bin\AutoHotKey.exe"
	ahk_script= """"&u&"\Stuff\Windows Stuff\Dragon Stuff\ahk\AHK-script-for-Dragon.ahk"&""""
	ShellExecute  ahk_cmd_path_nospace &" " &ahk_script & ListVar1, 7 ' default window size not active
End Sub
where ListVar1 corresponds to the first argument to the speech command, <dictation>.


Once I have gotten into my AHK-script-for-Dragon.ahk, I parse A_Argv in the usual way - the only slightly unusual thing is that one usually uses AutoHotKey for hotkeys, whereas here I am using AutoHotKey just like a normal command-line scripting language. But I am taking advantage of AutoHotKey's non-hotkey facilities, like Send/SendRaw/SendPlay, convenient functions to manipulate apps and windows, etc. Plus, I am more and more sharing code between this Dragon-speech==>command-line==>AHK-script path and conventional AHK hotkeys and GUIs. I like being able to do the same thing by voice and by mouse and keyboard, whichever is most convenient.


At the moment I am parsing the commandline arguments passed to the AutoHotKey script by concatenating them and using regular expressions. This is a little bit embarrassing, but it was quick and easy. I expect that I will need to optimize it at some time in the future, but I haven't needed to yet. Here is an example of one of my regular expressions[*]:

Code: Select all

"Oi)^PUFF (?P<cmd>keyboard (status|state)|(?<press>press( and)? )?release( all)? keys?)$"
allowing me to say PUFF keyboard status or PUFF keyboard state to pop up a message box showing which of the modifier keys like Ctrl/Alt/Shift/Win/ScrollLock/CapsLock/NumLock, etc. are currently considered to be pressed,
and
PUFF release all keys or PUFF press and release all keys to get them released.
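Applied with RegExMatch, that pattern might be used along these lines (the variable names are illustrative, not from the actual script; the "O)" option makes RegExMatch return a match object with named subpatterns):

```autohotkey
; Sketch: match the concatenated command line against the pattern and
; pull out the named subpatterns via the match object.
cmdline := "PUFF press and release all keys"
pattern := "Oi)^PUFF (?P<cmd>keyboard (status|state)|(?<press>press( and)? )?release( all)? keys?)$"

if RegExMatch(cmdline, pattern, m)
{
    ; m["cmd"] holds the whole command; m["press"] is non-empty only
    ; when the "press and" variant was spoken.
    MsgBox % "command: " m["cmd"] "`npress variant: " (m["press"] != "" ? "yes" : "no")
}
```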

While this regular expression handles three different commands (keyboard status, release all keys, and press and release all keys), you may also note that I have several different aliases for the same command, like "state" and "status". I find it easier to have a few such aliases than to force myself to remember the exact command syntax. Also, I use these regular expressions to handle common misrecognitions - e.g. (red|read) when I want the color. Dragon does a fairly good job of recognizing dictation of an entire document, but when you say a word in isolation it may guess wrong and provide a homophone.

---

Note *: This seemed an appropriate example, since in my ordinary AutoHotKey scripts I sometimes end up with some of the modifier keys like CapsLock or NumLock or LWin logically stuck, making it difficult to type or to use AHK bindings to fix. When this happened, before I started using speech recognition, I fell into the habit of pressing the physical keys in quick succession to try to clear such problems up, but that is a problem when your keyboard does not actually have a physical key for ScrollLock or NumLock. Now I can use this speech command to un-press all keys when the keyboard does not work. (I also have a mouse command to do the same thing, but that requires dedicating a mouse key, ignoring all possible modifiers.)

---

Like I said, I do this on Windows using the Dragon speech recognition software.

Note that I am not using Windows Speech Recognition. I won't compare the quality of WSR and Dragon, but IMHO the most important thing is that in Dragon Professional (Individual or Group) you can write your own speech commands, whereas in WSR and the cheaper home versions of Dragon you cannot. I would not be able to do the above if I could not write a Dragon speech command that invoked AutoHotKey.

However, it is worth noting that you don't need to write a speech command to do what you ask. Dragon, like some other speech recognition systems, has simple "speech macros" rather like AutoHotKey hotstrings - say "comment" and get ":comment:" entered as if from the keyboard. But macros are just one-to-one expansions - you need a real command language to handle parameters and control flow. Still, you might be able to use such a macro to generate a hotkey that you already have an existing AHK binding for.

I know that Dragon has such simple macros. Wikipedia says that Windows Speech Recognition has such macros, although I have never used that facility: https://en.wikipedia.org/wiki/Windows_Speech_Recognition#Macros. Given that WSR is free on Windows, if the macro extension is also free, that might be convenient for you. But overall, if you want to do anything serious with speech recognition, at the moment the best way seems to be to use Dragon on Windows.

---

Again, like I said, AFAIK if you want to do anything serious with speech recognition at the moment the best way seems to be to use Dragon on Windows.

Probably one of the professional versions of Dragon, to get speech commands.

Windows, because Dragon does not really run well on any other platform. Dragon may still be available on Mac OS, but I believe that software may have been EOL'ed after not having been maintained for many years. For that matter, Dragon was at one point available on Linux I believe, but certainly not anymore.

Some people run Dragon under WINE emulation of windows on top of Linux, but many people on the web report it crashing all the time.

More people report success running Dragon in a Windows guest on top of Parallels virtualization on Mac OS, but that has the usual issues with running virtualized software.

You don't need to live with the atrocious Basic scripting languages used by Dragon and KnowBrainer. I did when I started using Dragon a bit more than a year ago, but more and more I am using Dragon's Basic scripting language merely as a dispatcher to scripts that I write in AutoHotKey. Also, there is a facility, NatLink, that allows e.g. Python to be used to implement the speech commands, leveraging Dragon's speech recognition - look for Dragonfly or Vocola, amongst others. I have not used these yet, but keep meaning to. In my dreams, Linux-based speech recognition eventually gets good enough, and when it does it would be nice to have portable Python code rather than non-portable AutoHotKey code. However, on Windows, AutoHotKey is damned convenient.

Although the BKM seems to be to run Dragon on Windows, you can certainly network via SSH or the like from Windows to UNIX/Linux systems. Myself, I often do this from a shell mode window in emacs on Windows, SSH'ing Linux systems to run commands and makefiles, while editing the files across shared file systems on my Windows PC.

The downside of that approach is that stuff built into Dragon that supposedly improves speech control, like parsing menus and so on, may not work when accessing a Linux app. But then again, that sort of speech integration mainly only works with Microsoft apps. And not even all Microsoft apps. E.g. on Windows I spend most of my time using Microsoft OneNote, open source Thunderbird email, and my web browser (Chrome or Edge or Firefox). Of those, only Chrome and Edge make it easy for Dragon to parse their menus. Nevertheless, I am fairly happy using speech commands and dictating into OneNote and Thunderbird and emacs and ... and I suppose this chrome web browser text box.
bpassan2010
Posts: 1
Joined: 28 Jun 2017, 14:45

Re: Voice recognition to AHK

11 Apr 2021, 03:15

You might consider VoiceAttack, which is available on Steam as a gaming utility. It sits on top of the free, offline Windows Speech Recognition (WSR). It allows any macro or program (thus any AutoHotkey script) to be launched or controlled by voice command in Windows. The trick to high accuracy is to train WSR three times (yep, like the Holy Hand Grenade in Monty Python) for each microphone source you routinely use. The advantage is that WSR's dictation mode also becomes more accurate. If you need a private or offline voice recognition method, this is the best current option.
superpeter
Posts: 115
Joined: 18 Dec 2020, 05:17

Re: Voice recognition to AHK

11 Apr 2021, 14:36

@scriptor2016 Oh, man, thank you so much!! (And thanks to the original creator!).

I've been looking forever for this kind of voice functionality. Dragon NaturallySpeaking is too heavy, and it takes too long to load all the voice commands individually.
