Translate PowerShell to AHK - possible?

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Translate PowerShell to AHK - possible?

22 Jun 2019, 17:22

Is it possible / complicated to translate a PowerShell script to AHK?

I found a PowerShell function that automatic analyze which delimiter a CSV-file have, and my desire is to translate this program to AHK.
(or write an AHK program that does the same) Some instructions do not seem so difficult to translate.

The description from .: Get-CSVDelimiter
"The PowerShell function uses a heuristic approach which proves to be very fast and dependable.
The assumption is that a delimiter must be a character that occurs in each line of the file with the very same frequency.
To speed up this process, the function also uses an exclude list with characters like plain letters or numbers that are generally not used as delimiter. Typically, Get-CSVDelimiter can identify the delimiter after parsing only three to five lines of data.
" (Nice)

One problem I can see is .:
In the PS program is a list of ascii codes that typically cannot be a delimiter: 0-9, A-Z, a-z, and space is defined:
  • $excluded = ([Int][Char]'0'..[Int][Char]'9') + ([Int][Char]'A'..[Int][Char]'Z') + ([Int][Char]'a'..[Int][Char]'z') + 32
I would also like to exclude ÅÄÖ and åäö (but, ascii code does not come directly after Z or z)..
it returns a hash table with ascii code as key and frequency as value

Here is the PowerShell function .: (I think the comments feel easy to understand (But how to solve this in AHK))

Code: Select all

<#
; 
.SYNOPSIS
  Autodetects delimiter used in CSV files and number of rows
.DESCRIPTION
  Uses heuristics to determine the delimiter character used in a CSV file
.PARAMETER Path
  Path name to CSV file
can be submitted as string or as File object
.EXAMPLE
  Get-CSVDelimiter -Path c:\somefile.csv
  Returns delimiter used in file c:\somefile.csv
.EXAMPLE
  Get-ChildItem $home -Filter *.csv -recurse -ErrorAction SilentlyContinue | Get-CSVDelimiter
  Returns delimiter used in any CSV-file found in the user's home folder or one of its subfolders
#>



function Get-CSVDelimiter {
[CmdletBinding()]

    param
    (
        # Path name to CSV file
        # can be submitted as string or as File object
        [Parameter(ValueFromPipelineByPropertyName=$true,ValueFromPipeline=$true)]
        [Alias('FullName')]
        [String]
        $Path
    )

    begin
    {
        # list of ascii codes that typically cannot be a delimiter: 0-9, A-Z, a-z, and space:
        $excluded = ([Int][Char]'0'..[Int][Char]'9') + ([Int][Char]'A'..[Int][Char]'Z') + ([Int][Char]'a'..[Int][Char]'z')  + 32

        # private function that gets all possible delimiters in a given text line
        # it excludes any text that is quoted
        # it excludes any character codes listed in $excluded
        # it returns a hash table with ascii code as key and frequency as value
        function Get-DelimitersFromLine
        {
            param($TextLine)

            $quoted = $false
            $result = @{}

            # examine line character by character
            foreach($char in $line.ToCharArray())
            {
                # if a double-quote is detected, toggle quoting flag:
                if ($char -eq '"')
                { $quoted = -not $quoted }
                elseif ($quoted -eq $false) 
                # else, if not excluded, add to result hash table:
                { if ($excluded -notcontains [Int]$char) { $result.$([Int]$char) ++ } }
            }

            # return result hash table            
            $result
        }
    }

    # this gets repeated for each submitted path or CSV file:
    process
    {
        # initialize variables
        $oldcandidates = $null

        # examine each line in CSV file:
        $file = [System.IO.File]::OpenText($Path)
        While (-not $file.EndOfStream)
        {
            $line = $file.ReadLine()

            # examine current line and get possible delimiters:
            $candidates = Get-DelimitersFromLine $line

            # if this is the first line, skip analysis
            if ($oldcandidates -eq $null) 
            { 
                # if first line starts with "#", ignore
                if (-not $line.StartsWith('#')) 
                { $oldcandidates = $candidates } 
            }
            # else, identify ascii codes that have the same frequency in both this line
            # and the previous line, and store in hash table $new:
            else
            {
                $new = @{}
                $keys = $oldcandidates.Keys
                foreach($key in $keys)
                {
                    if ($candidates.$key -eq $oldcandidates.$key)
                    {
                        $new.$key = $candidates.$key
                    }
                }
                $oldcandidates = $new

                # if only 1 possible delimiter is left, we are done
                # exit loop, no necessity to examine the remaining lines:
                if ($oldcandidates.keys.count -lt 2)
                {
                    break
                }
            }
        }
        $file.Close()

        # create return object
        $rv = New-Object PSObject | Select-Object -Property Name, Path, Delimiter, FriendlyName, ASCII, Rows, Status
        $rv.Path = $Path
        $rv.Name = Split-Path -Path $Path -Leaf

        # no keys found:
        if ($oldcandidates.keys.count -eq 0)
        {
            $rv.Status = 'No delimiter found'
        }
        # exactly one key found, good:
        elseif ($oldcandidates.keys.count -eq 1)
        {
            $ascii = $oldcandidates.keys | ForEach-Object { $_ }
            $rv.ASCII = $ascii
            # convert ascii to real character:
            $rv.Delimiter = [String][Char]$ascii
            # number of rows is frequency of delimiter plus 1:
            $rv.rows = $oldcandidates.$ascii + 1
            # add friendly names for the most common delimiter types:
            switch ($ascii)
            {
                9    { $rv.FriendlyName = 'TAB' }
                44   { $rv.FriendlyName = 'Comma' }
                59   { $rv.FriendlyName = 'Semicolon' }
            }
            $rv.Status = 'Found'
        }
        # ambiguous delimiters detected, list ambiguous delimiters
        else
        {
            # convert delimiter ascii keys in a comma-separated list of quoted characters:
            $delimiters = (($oldcandidates.keys | ForEach-Object { ('"{0}"' -f [String][Char]$_) }) -join ',')
            $rv.Status =  "Ambiguous separator keys: $delimiters"
        }

        # ETS: set default visible properties on return object (applies to PS 3.0 only, no effect in PS 2.0):
        [String[]]$properties = 'Name','Delimiter','FriendlyName', 'Rows'
        [System.Management.Automation.PSMemberInfo[]]$PSStandardMembers = New-Object System.Management.Automation.PSPropertySet DefaultDisplayPropertySet,$properties

        # an Rückgabeobjekt anhängen:  
        $rv | Add-Member -MemberType MemberSet -Name PSStandardMembers -Value $PSStandardMembers
        
        # return result:
        $rv
    }

}
The return is .: "a hash table with ascii code as key and frequency as value"
But I only want to know, if the delimiter in the CSV-file is ";" or "," (or TAB or...)
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 04:32

I think it's easy and possible to translate the PowerShell script to AHK.

But, I don't think it will save you from "manual" reviewing the results.
With every iteration of you moving the goal-posts, you need to reinvest brain-cycles.

If I understand correctly, the script just spits out a table, where each character is "linked" to its frequency.
AHK can do that as well.

=> Yes, it's possible to translate, it's impossible to read future variations of your requirements. Use brain-cycles to compensate.
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 05:23

wolf_II wrote:
23 Jun 2019, 04:32
... I don't think it will save you from "manual" reviewing the results....
why? (explain)

My problem is to automatically analyze CSV files.
The most of my CSV-files have the delimiter ";" or "," (but it happens the delimiter is a TAB)
Managing a CSV file in AHK is much easier with "," as delimiter than ";".
From AHK - Ex4 - Parse CSV
Start doing my analysis as follows, and for simple CSV files it works.

Code: Select all

#SingleInstance Force

; Test strings!
StrVar = 302;"Emelie";"174,50"`n303;"Jenny";"19,90"	; ";" as delimiter
; StrVar = 302,"Emelie","174,50"`n303,"Jenny","19,90"	; "," as delimiter

; StrVar = "Emelie";"174,50",542`n"Jenny";"19,90";303	; ";" as delimiter
; StrVar = "Emelie","174,50",542`n"Jenny","19,90";303	; "," as delimiter

; StrVar = "19,90";Emelie;302;"Row 1"`n"18";Erica;201;"Row 2"	; ";" as delimiter
; StrVar = "19,90",Emelie,302,"Row 1"`n"18",Erica,201,"Row 2"	; "," as delimiter
MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % StrVar

; StringLength := StrLen(StrVar)
ChrCount = 0
Loop % StrLen(StrVar)
{	ChrCount += 1
	TestStr := SubStr(StrVar, ChrCount, 1)
	
	If ( TestStr = """" )
	{	; MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % """ is found! (First position)`nPos .: " ChrCount
		Loop % StrLen(StrVar) - A_Index
		{	If ( A_Index = 1 )
				Continue
			
			ChrCount += 1
			TestStr := SubStr(StrVar, A_Index, 1)
			; MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % "TestStr (pos " ChrCount ") .: " TestStr
			
			If ( TestStr = """" )
			{	; MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % """ is found! (Second position)`nPos .: " ChrCount
				Break
			}
		}
	}
	
	If ( TestStr = "`;" )
	{	; MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % """"";"""" is found! `nPos .: " ChrCount
		Break
	}

	If ( TestStr = "," )
	{	; MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % """"","""" is found! `nPos .: " ChrCount
		Break
	}
		
	; MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % "TestStr (pos " ChrCount ") .: " TestStr
}

MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % "The delimiter must be " TestStr
ExitApp
And I intended to improve it slightly.
(Started a thread on this topic, but I never got an answer/help Analyze field dividers in CSV-file)
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 06:28

You could, if you wanted, introduce a "base-line" requirement: E.g. All my data respects some simple but strict rules:
1) every table is saved in Albireo's approved format:
2) every row in the table is contained in its own line of text, Rows are delimited by `n or `r`n.
3) every field in the row is separated by , or ; or `t.
4) every field which wants to contain a previous used delimiter must "escape" it.
5) every field which wants to use the "escape" character, must escape the "escape" character.

With that set of rules in place, we can easily "auto"-decide what is data and what is delimiter.
Whenever you find Data with commas being used as delimiter AND as part of data ("123,75"), the data is invalid.
This would be the "entry-exam" to the main bulk of your code, where we can do whatever you desire.
Such as auto-detect one of several possible delimiters, and convert accordingly.

Make your data pass the "entry-exam" and all is well. Accept other data and you are stuck.
just me
Posts: 9425
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 06:52

This is a valid CSV line:

Code: Select all

Line := "1,5;2,5;3,5;4,5;5,5;6,5;7,5;8,5;9"
The field separator is semicolon ; and therefore commas , don't need to be "escaped". Which separator will your 'automatic analyzer' choose?
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 07:06

Which separator will your 'automatic analyzer' choose?
Touchée. I would have to make up more rules for this idea to work. :(
@Albireo: please ignore my previous post. There is more "manual" examination of data required.
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 07:38

Thanks for your time!
wolf_II wrote:
23 Jun 2019, 06:28
...
1) every table is saved in Albireo's approved format:
The hardest thing to handle is, exported CSV files from Excel or Calc.
Some of these files have come "," as delimiters others have semicolon ";" ... (something I can't change)
Some field have have quotation marks and others not...

wolf_II wrote:
23 Jun 2019, 06:28
2) every row in the table is contained in its own line of text, Rows are delimited by `n or `r`n.
It's no problem

wolf_II wrote:
23 Jun 2019, 06:28
3) every field in the row is separated by , or ; or `t.
It's my desire!

wolf_II wrote:
23 Jun 2019, 06:28
4) every field which wants to contain a previous used delimiter must "escape" it.
That's one problem. Autohotkey already handles CSV files well, with the CSV option
An CSV example with AHK
Perhaps it is best to first automatically convert all delimiters to "," (comma) - but I must know if the field delimiter must be changed.
  1. Which field delimiter?
  2. Change the delimiter from ?? to ","
(The next step is .: How to fastest / easiest can change the delimiter from eg. ";" or "`t" to "," and use Loop Parse, CSV


wolf_II wrote:
23 Jun 2019, 06:28
5) every field which wants to use the "escape" character, must escape the "escape" character.
Some example?. ( I don't understand what you mean)

wolf_II wrote:
23 Jun 2019, 06:28
Whenever you find Data with commas being used as delimiter AND as part of data ("123,75"), the data is invalid.
Must be managed - see point 4 above.


I still believe that the accuracy of delimiter, that the CSV-file has, can be raised very much by doing an analysis of the first 5-10 rows.
  • All records consist only of one row.
  • All rows have the same field divider.
  • All rows have the same number of field dividers.
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 07:53

0,001; 0,002; 0,003
0,004; 0,005; 0,006
0,007; 0,008; 0,009
How do you decide? comma or semicolon? How do you decide automatically, when there is 3 sets of data one way, and 4 sets the other way?
There is no automatic fool-proof way, as just me pointed out.
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 08:07

example for escaping the escape character: here I choose \ as escape character. To use it for User[3] see example:
user, number, address
James Bond, 007, London
John\c the baptist, 001, Jerusalem
Jack\c the ripper (\\Cartman), 666, Southpark
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 08:12

just me wrote:
23 Jun 2019, 06:52
This is a valid CSV line:

Code: Select all

Line := "1,5;2,5;3,5;4,5;5,5;6,5;7,5;8,5;9"
The field separator is semicolon ; and therefore commas , don't need to be "escaped". Which separator will your 'automatic analyzer' choose?
Nice example ;)
  1. I have never encountered this type of information in my CSV-file (but it absolutely possible)
  2. In cases where headlines are included, the headline, not in my case, will look like the above.
But if this happens, and the 5-10 first lines have this field structure - then the program stops and a message "select delimiter" or something opens. (no problem)
The possibility that the .csv file is saved in unwanted format from Calc / Excel is much larger.
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 08:29

wolf_II wrote:
23 Jun 2019, 07:53
0,001; 0,002; 0,003
0,004; 0,005; 0,006
0,007; 0,008; 0,009
How do you decide? comma or semicolon? How do you decide automatically, when there is 3 sets of data one way, and 4 sets the other way?
There is no automatic fool-proof way, as just me pointed out.
I reply the same thing to you, as to @just me
It is possible to find examples that this type of automation does not handle, but stop the program and give a message. Then you can try to improve the hit (but if it never happens you have succeeded)
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 08:53

wolf_II wrote:
23 Jun 2019, 08:07
example for escaping the escape character: here I choose \ as escape character. To use it for User[3] see example:
user, number, address
James Bond, 007, London
John\c the baptist, 001, Jerusalem
Jack\c the ripper (\\Cartman), 666, Southpark
Is the escape characters used in Excel / Calc? (and then exported with the csv-file?)
What does this "esc" do in this case? (have not encountered this type of challenge ;) )
just me
Posts: 9425
Joined: 02 Oct 2013, 08:51
Location: Germany

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 09:08

Albireo wrote:
23 Jun 2019, 05:23
...
My problem is to automatically analyze CSV files.
...
Your problem is that you believe to need to automatically analyze CSV files. It sounds as if you have to process thousands of CSV files of unknown and/or changing formats. In your other thread you talk about 10 CSV files. It doesn't make sense for me to write a 'one size fits all' analyzer for such a little amount of files, although the format of most files seems to be well known.

Do you want to process known files or do you want to write an 'Universal CSV Manager'?
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 09:50

My wish remains to be able to analyze the delimiter in CSV files.
I think the structure/strategy of the PowerShell program would work well for me.

It is right that today there are about 10 CSV files that are affected. These come from four different sources (some "," and some ";").
But if the handling of CSV files becomes easier, the need will increase. (mainly from eg spreadsheets).

I want to keep down the number of AHK-script to be maintained.
One way is that the same program / function can be used for CSV files with different delimiters.

I have not intended to create a "Universal CSV Manager".
(There is one that I think is quite powerful.: CSV-Buddy)
just me wrote:
23 Jun 2019, 09:08
It doesn't make sense for me to write a 'one size fits all' analyzer for such a little amount of files
Nothing I can help, my desire remains - Thank you anyway!

Maybe someone else can help me translate from PowerShell to AHK?
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 11:03

Here is a thought:
Do you have access to the original data? If so, and you can not force your program to use a consistent data-format, use some way of auto-identifying when saving.
Such as: Comma_003-015-999.csv for csv-files that use comma.
And Semicolon_12.65.12457.csv for csv-files that use semicolon.
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 16:45

wolf_II wrote:
23 Jun 2019, 11:03
...Do you have access to the original data? If so, and you can not force your program to use a consistent data-format, use some way of auto-identifying when saving...
yes (or no)
When data comes from one program, it is always delimited with comma, when data comes from another program it is always delimited with semicolon (on one computer). On the other computer, the same data does not have the same delimitation.
Now I must have one program for one computer and another program to the next computer - If a change in an AHK-program occurs, two AHK programs must be changed, then copy the correct program to different computers.)
When data comes from a spreadsheet, it is more uncertain.

When I can decide, I always try to enclose fields with quotation marks. Because, if this CSV-file is opened by Calc or Excel,
and a field is not enclosed with quotation marks, it's often 001234 is translated to numeric 1234. I feel that "001234" is processed more securely.
70-80% of the fields I could describe the delimiter as ";" or "," in the following way .: "001234";"ABCDE" or "001234","ABCDE". (but not 100%)

I could use my simple example above to check the field divider. It works on maybe 98-100% of the files (maybe 100% for 6 months and then 98-99%).
an analysis of characters would improve my CSV- function, to maybe always 100%.
Trying to count / handle different characters on a line.
Will check (in my case).
  • How often is it true that the character that occurs most often on one line / 5 lines / 10 lines is the delimiter?
  • Is it possible to count ";", "," and "` t "and the character that has the higest number is the field divider?
  • When the delimiter is "found" - Is it the same number of this character on all rows?
  • Will the search be faster if all characters that cannot be a field delimiter disappear?
  • Is it difficult (in my case) to ignore all fields encircled with "" (quote characters) on the row?
  • Is there other ways to check which delimiter?
(and in your examples - give a message if the AHK-program not can give the delimiter)
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 17:25

Albireo wrote:
23 Jun 2019, 16:45
When data comes from one program, it is always delimited with comma, when data comes from another program it is always delimited with semicolon (on one computer). On the other computer, the same data does not have the same delimitation.
Now I must have one program for one computer and another program to the next computer - If a change in an AHK-program occurs, two AHK programs must be changed, then copy the correct program to different computers.)
When data comes from a spreadsheet, it is more uncertain.

I could use my simple example above to check the field divider. It works on maybe 98-100% of the files (maybe 100% for 6 months and then 98-99%).
an analysis of characters would improve my CSV- function, to maybe always 100%.
Trying to count / handle different characters on a line.
It seems like your program, no matter the language used, must determine the delimiter used by the CSV. That looks like priority number one. So maybe the question you should be asking or trying to solve is, "How do you programmatically determine the delimiter used by any CSV file?"

You mentioned CSV Buddy (written in AutoHotkey). If it's author has found a way, then you might want to study the method used. And if not found, do a search on if anybody has found a way, regardless of language. Whoever has found a solution, study on the method(s) they used, then see if you can implement it in AutoHotkey.
swagfag
Posts: 6222
Joined: 11 Jan 2017, 17:59

Re: Translate PowerShell to AHK - possible?

23 Jun 2019, 18:29

Code: Select all

simple = 302;"Emelie";"111,174,50"
misidentified := "0,001; 0,002; 0,003"
ambiguous := "1,5;2,5;3"

MsgBox % mostLikelyDelimiter(simple) ; ';'
MsgBox % mostLikelyDelimiter(misidentified) ; ','
MsgBox % mostLikelyDelimiter(ambiguous) ; throws

mostLikelyDelimiter(line) {
	Delimiters := {}

	for each, char in StrSplit(line)
	{
		if (char ~= "(*UCP)[[:alnum:]]")
			continue

		if !insideQuotes
		{
			if (char == """")
			{
				insideQuotes := true
				continue
			}

			if Delimiters.HasKey(char)
				++Delimiters[char]
			else
				Delimiters[char] := 1
		}
		else
		{
			if (char == """")
				insideQuotes := false
		}
	}

	if !Delimiters.Count()
		throw "no delimiters found"

	last := highest := 0
	for delimiter, frequency in Delimiters
	{
		if (frequency >= highest)
		{
			last := highest
			highest := frequency
			rv := delimiter
		}
	}

	if (highest = last)
		throw "ambiguous delimiters found"

	return rv
}
e: forgot examples
Last edited by swagfag on 24 Jun 2019, 10:18, edited 1 time in total.
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 01:21

I seem to have failed to communicate my most recent idea, I start again:

Let's start with introducing the term "rumour" and attach it to whatever you get from your CSV-files.
Let's, for the moment, reserve the term "data" for the useful real data.

So you have calc/Excel programs or whatever that can produce rumours about Data. Those rumours are useless/difficult to process.
Do you have access to the real/useful/original data? that was used by whatever program to produced all those rumours?
If you do, dump all the rumours and work with data.
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 09:58

wolf_II wrote:
24 Jun 2019, 01:21
... that can produce rumours about Data. Those rumours are useless/difficult to process.
Excuse me, but I still do not understand how to implement your ideas in my wish.
wolf_II wrote:
24 Jun 2019, 01:21
Do you have access to the real/useful/original data?
I think so (what do you mean by "original data")? Some data comes from different database tables in some way, other data comes from eg. Calc / Excel. So in another way it's not original data...
wolf_II wrote:
24 Jun 2019, 01:21
...If you do, dump all the rumours and work with data...
Sounds simple, but how is this thought implemented, in my case?

Return to “Ask for Help (v1)”

Who is online

Users browsing this forum: No registered users and 157 guests