Speech to text//Dictation with Google Api

Posted: 21 May 2018, 05:32
by jekko1976

In your opinion, is it possible for AHK to hook the Google speech to text API and make a dictation script?

I cannot use the SAPI.Spvoice object because I am Italian and I use Windows 7 (Italian is available there only for TTS, not for STT), so I thought I could use Google's API, but I don't really know where to start :cry:

Re: Speech to text//Dictation with Google Api

Posted: 21 May 2018, 08:25
by A_AhkUser
Hi jekko1976,
jekko1976 wrote: In your opinion, is it possible for AHK to hook the Google speech to text API and make a dictation script?
It depends on what you mean by possible. If it's for your personal use, I guess it is... deep in hack country > Dictation-interface. The repo includes a showcase script providing a basic interface that puts the recognized speech, if any, on the clipboard. Just tested with Chrome version 66.0.3359.181 (64-bit) and AHK v1.1.28.00 (Windows 8.1) in Russian, Spanish, French and... Italian:

...and it still seems to work. You can optionally hide the Chrome instance, which is automatically closed when the script exits. The Dictation class can call a user-defined callback for the following events: onInterimResult and onResult. I made it before GeekDude pulled his outstanding chrome.ahk library out of his magician's hat, so by now it should be possible to execute the JavaScript contained in Dictation.injection.js without using any extension.

Hope this helps

Re: Speech to text//Dictation with Google Api

Posted: 22 May 2018, 00:02
by gregster
As an alternative - if you are willing to dictate into your smartphone/tablet/Android device - you could create a personal bot in Telegram messenger and use the Google speech API that is integrated with it. (I assume the iPhone's speech recognition could be used in the same way with a bot.)

The bot, running on your computer via AHK and using the Telegram API, could automatically process everything sent to it (by you or also by others, if you allow it) and store it in a text/word/email/whatever file (everything you can do with AHK, basically).
For interfacing with the bot this way, you will need just a tiny part of the Telegram bot API (I can help you with that - I think I posted a basic example some time ago of how to connect to a bot and read out all the messages it gets). And with dictation, there is not even (much) parsing involved (only of the API's JSON responses, obviously), which should make it easy. Let me know if you are interested - it shouldn't take too long to set up :shifty: .

You could even dictate messages while on the road and while your bot is offline. You could still extract these messages at your computer via the bot at a later time (you will have 24 hours). After that, you could still copy the text from the Telegram Desktop app and paste it somewhere by hand (or by means of AHK). So, the messages are not lost.

You could probably do something similar with your smartphone in combination with the 'WhatsApp Desktop' app (same Google speech recognition) - but it will be less flexible and reliable, because they don't have a bot API. The biggest problem is probably that you simply cannot send a WhatsApp message to yourself (if I am not mistaken - perhaps if you could create a one-person group :think: or by just sending it to your secretary instead). But with a personal Telegram bot - no problem; it is much cooler and no secretary is needed :)

But A_AhkUser's discovery looks very interesting, too.

Re: Speech to text//Dictation with Google Api

Posted: 22 May 2018, 03:57
by jekko1976
Dear all,

First of all, thank you all for the hints. I didn't expect that this post could become such a treasure chest for me.

I know GeekDude and his immense programming capabilities very well. I would like to ask him for some hints on using the chrome.ahk library to drive dictation features with AHK.

As for gregster's suggestion, I am a big fan of Telegram! It would be a big improvement for me to drive it via AHK! Could you send me some examples of how to do it?

Thank you very much for all!

Re: Speech to text//Dictation with Google Api

Posted: 22 May 2018, 23:30
by gregster
Alright, I will try to put something up here, tonight.

Re: Speech to text//Dictation with Google Api

Posted: 23 May 2018, 05:05
by jekko1976
gregster wrote:Alright, I will try to put something up here, tonight.
No, no, wait, wait, wait...
The last thing I want is to waste your time.

I have already managed to connect AHK to Telegram with this:

It works fine, BUT for now I am only able to send messages
from AHK ----> to Telegram.

What I want to do is read messages in Telegram and store them in AHK variables. This is more complicated.

Thank you for the interest in my issue

Re: Speech to text//Dictation with Google Api

Posted: 24 May 2018, 02:19
by gregster
Ok, then I can skip the introductory stuff on how to set up a bot and get a bot ID and chat ID ;)

Now, here is a stripped down version of this script: ... am#p192355

Just add your bot ID and chat ID, and include Coco's JSON library for easier parsing of the bot's responses. Add the name and path of a text file to save the messages to (if you don't, there are some msgboxes, too, which can be used for checking):

Code: Select all

#Include json.ahk								; Coco's JSON library, get it here:
botToken := "xxxxxxxxx:yyyyyyyyyyyyyyyyyyy"		; add your Telegram bot token
chatID := 000000001								; add your chat ID
textfile := ""									; add file (path and) name to save the Telegram messages
oCustomers := {}								; object holding the user IDs that are allowed to send messages to your bot
oCustomers[chatID] := "My Name"					; add your chat ID (and name, if you want) to the customer object for testing purposes
offset := ""									; Telegram message offset (update_id of the last processed message)

; Check for new updates
SetTimer, UpdateTimer, 15000					; every 15 s; set to 1000 ms = 1 second or similar, if you want (first comment out the msgboxes and add a textfile path)
return											; end of the auto-execute section

Esc::ExitApp									; hit Escape to stop the script

UpdateTimer:									; runs on every timer tick and checks for new messages sent to your bot
	stack := {}									; message stack
	oUpdates := ""								; forget the previous response in case parsing fails

	updates := GetUpdates(botToken, (offset+1))	; get (new) updates from your bot as JSON string; a blank offset stays blank on the first call
	MsgBox % "JSON response:`n" updates			; remove, if you use a textfile
	try oUpdates := JSON.Load(updates)			; create an AHK object from the JSON string
	if (oUpdates.ok) {							; check if the JSON answer was "ok" : true
		Loop % oUpdates.result.MaxIndex()		; number of new messages (updates); skipped if there are none
			stack.Push(oUpdates.result[A_Index])	; add all updates (= messages) to the stack
	}
	for key, msg in stack {
		from_id := first_name := mtext := last_name := username := ""
		from_id := msg.message.from.id			; which ID sent the message?
		mtext := msg.message.text				; what was the message text?
		;first_name := msg.message.from.first_name
		;last_name := msg.message.from.last_name
		;username := msg.message.from.username
		MsgBox % "userId:  " from_id "`n" mtext	; remove when you add a textfile to collect the messages
		offset := msg.update_id					; keep track of processed messages -> gets updated on the Telegram server only with the next call of GetUpdates(...)
		if (textfile != "") {					; is there a textfile to save to?
			if (oCustomers.HasKey(from_id))		; check for known users... optional
				FileAppend, %mtext% `n`n, %textfile%	; appends the message to the textfile and adds two linefeeds
		}
	}
return

;------------------------------------------  Telegram functions  --------------------------------------------------------------------------------------------------------
GetUpdates(token, offset="", updlimit=100, timeout=0) {
	if (updlimit > 100)
		updlimit := 100
	; Offset = Identifier of the first update to be returned.
	; Base URL of the Telegram Bot API: https://api.telegram.org/bot<token>/<method>
	url := "https://api.telegram.org/bot" token "/getupdates?offset=" offset "&limit=" updlimit "&timeout=" timeout
	updjson := URLDownloadToVar(url)
	return updjson
}
;----------------------------------- additional functions ------------------------------------------------------------------------------------------------------------------
URLDownloadToVar(url, ByRef variable="") {		; function originally by Maestrith, I think
	; The body below is a reconstruction that uses the WinHttpRequest COM object.
	try {										; keep the script from breaking if the API is down or not reacting
		hObject := ComObjCreate("WinHttp.WinHttpRequest.5.1")
		hObject.Open("GET", url)
		hObject.Send()
		variable := hObject.ResponseText
	}
	return variable
}
The msgboxes are mainly for testing/debugging, or in case you haven't added a path to a textfile yet, that collects all the text messages sent to the bot. They can be removed, if it works (and then the update frequency of the timer can be increased, if you like).
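The offset bookkeeping in the script can be looked at in isolation. The following is only a minimal sketch, assuming the Bot API behaviour that an update counts as confirmed once getUpdates is called with an offset higher than its update_id. NextOffset() is a hypothetical helper, not part of the script above, and unlike the script's offset variable it already returns the value to send (highest update_id plus one). The update IDs are made up.

```autohotkey
; Sketch only: NextOffset() is a hypothetical helper, the update IDs are made up.
; Hand-built stand-in for oUpdates.result as produced by JSON.Load()
result := []
result.Push({update_id: 100, message: {text: "first"}})
result.Push({update_id: 101, message: {text: "second"}})

offset := NextOffset(result)		; value to pass as "offset" on the next getUpdates call
MsgBox % "Next offset: " offset		; shows 102 for the sample data above
return

; Returns one more than the highest update_id seen,
; or lastOffset unchanged when result is empty.
NextOffset(result, lastOffset := "") {
	for index, update in result
		if (lastOffset = "" || update.update_id >= lastOffset)
			lastOffset := update.update_id + 1
	return lastOffset
}
```

Passing that value on the next call tells the server that everything up to update 101 has been handled, so those messages are not delivered again.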

Since anybody can enter a private chat with your bot and send messages, I added a check so that only the messages of known chat IDs are added to the text file. That's why you will have to add your own chat ID (the msgboxes will show it for checking); of course, you can add your co-worker's/wife's/second phone's/whoever's chat ID as well, or remove this check completely.
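As a rough illustration of that check, here is a small sketch with more than one allowed sender. The chat IDs and names are invented placeholders; in the real script, from_id would come from msg.message.from.id.

```autohotkey
; Sketch only: the chat IDs and names are invented placeholders.
oCustomers := {}
oCustomers[111111111] := "Me"
oCustomers[222222222] := "Second phone"

from_id := 222222222				; in the real script: msg.message.from.id
if (oCustomers.HasKey(from_id))
	MsgBox % "Accepted message from " oCustomers[from_id]
else
	MsgBox % "Ignored message from unknown ID " from_id
```

Removing the check completely would simply mean appending every message regardless of from_id.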

If this script is not running on your computer while you send a message from your phone, it will still get all messages from the last 24 hours, when you start it next time. Older messages will be discarded by the API.
But there is also a webapp and a desktop app that can be used on Windows to copy the messages by hand later, if necessary (unfortunately, Google Speech is not accessible from these Windows apps).
Of course, you can change how the messages are processed by AHK, for example, if you want to save them separately. Let me know, if something is unclear.
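For instance, saving each message to its own file could look roughly like this. It is only a sketch: SaveMessage(), the folder name, and the file naming scheme are assumptions made for illustration, not part of the script above.

```autohotkey
; Sketch only: SaveMessage(), the folder name and the file naming are hypothetical.
folder := A_ScriptDir "\telegram_messages"
FileCreateDir, %folder%				; make sure the target folder exists

saved := SaveMessage("Hello from my phone", 101, folder)
MsgBox % "Saved to:`n" saved
return

; Writes one message to its own UTF-8 file, named after the current time and the update_id.
SaveMessage(mtext, update_id, folder) {
	FormatTime, stamp, , yyyyMMdd_HHmmss
	file := folder "\" stamp "_" update_id ".txt"
	FileAppend, %mtext%, %file%, UTF-8
	return file
}
```

In the timer routine, such a call would take the place of the single FileAppend line, with msg.update_id passed as the second argument.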

Btw, Google speech can be started with the key just to the left of the space bar on the keyboard in Telegram; you might have to hold it for a short time until a small popup appears. Choose the microphone icon there and it will be remembered as the default action (at least for some time). At least, it was like this on the Android phones I have seen...

Re: Speech to text//Dictation with Google Api

Posted: 24 May 2018, 03:22
by jekko1976
gregster wrote: Ok, then I can skip the introductory stuff on how to set up a bot and get a bot ID and chat ID ;)
Dear gregster,
I set up everything as you described and it works like a charm, and all this is uber-cool!! :bravo: :bravo: :bravo: :dance: :dance:
Now I can interface Telegram with AHK with full features!
Thank you very much for your assistance

Re: Speech to text//Dictation with Google Api

Posted: 24 May 2018, 04:02
by gregster
I am glad that I could help :) . I am planning to post a few more Telegram-related scripts as soon as I have finished my (object-oriented) Telegram API wrapper (but recently I haven't had time to work on this). There are a lot of other things that can be done - custom buttons and keyboards, automated responses, up- and download of files, images etc.

Don't hesitate to ask if you want to expand your Telegram bot script!