Translate PowerShell to AHK - possible?

Get help with using AutoHotkey (v1.1 and older) and its commands and hotkeys
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 11:07

how is this thought implemented, in my case?
You can go back to the original data and try to find a way to produce consistent output from it, rejecting the "rumors" which do not pass the "entry-exam" to Albireo's approved format.

For example: suppose you have data in *.xls format. You would write a some Excel-code to write the data to a text file that respects any rule you make.
I've chosen the *.xls format, since I have software that can read such files, in case you want me to give an example for a file you have already, or made-up.

From there, you can choose to make even more strict rules:
added) every row must be identifiable: all delimiters must be delimiters. If some data needs to use a delimiter (one of three possibilities , ; or `t), those must be escaped with \c for comma, \s for semicolon, \t for tabs.
to be able to write a backslash, use \b. Choose your own encoding if you like.

With those rules, you only have to check for the presence of comma, semicolon or tab, and if it is found that is your delimiter.
Extra checks could be performed to assert equal number of the delimiter in every row.
Extra checks could be performed to assert the absence of all the other delimiters, if found, => reject invalid data.
The rules are yours, make some up for yourself. I invented some, and fine-tuned them. If there are still ambiguous cases, make the rules stricter still.
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 13:05

wolf_II wrote:
24 Jun 2019, 11:07
how is this thought implemented, in my case?
You can go back to the original data and try to find a way to produce consistent output from it, rejecting the "rumors" which do not pass the "entry-exam" to Albireo's approved format.

For example: suppose you have data in *.xls format. You would write a some Excel-code to write the data to a text file that respects any rule you make.
I've chosen the *.xls format, since I have software that can read such files, in case you want me to give an example for a file you have already, or made-up.

From there, you can choose to make even more strict rules:
added) every row must be identifiable: all delimiters must be delimiters. If some data needs to use a delimiter (one of three possibilities , ; or `t), those must be escaped with \c for comma, \s for semicolon, \t for tabs.
to be able to write a backslash, use \b. Choose your own encoding if you like.

With those rules, you only have to check for the presence of comma, semicolon or tab, and if it is found that is your delimiter.
Extra checks could be performed to assert equal number of the delimiter in every row.
Extra checks could be performed to assert the absence of all the other delimiters, if found, => reject invalid data.
The rules are yours, make some up for yourself. I invented some, and fine-tuned them. If there are still ambiguous cases, make the rules stricter still.
OK! (Thanks - now I understand) - This is a safe way, but...

1) I don't know how to read xls / ods files quickly. I was thinking / searching and it seems possible to read cell for cell. But I think it takes quite some time to read about 35000 x 72 cells ... (do you know how it works?)

2) Other CSV-files is automatic exported from database tables, and I have problem (I don't know how to read cells through ODBC with AHK) - but it has been the best...
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 13:08

Thanks!
Test of Function .: mostLikelyDelimiter(line)
I have tested the function from @swagfag with some CSV files. It seems to work without problems. The function gave the right delimiter all times.
Then I have tried to construct "special cases", to see what happens.
  1. char ~= "(*UCP)[[:alnum:]]"
    If I understand correctly, this only excludes (a-z, A-Z, 0-9). Even the characters åäöÅÄÖ and Space can be excluded (if possible). (I do not understand how to solve this)
    .
    .
  2. If StrVar = "Test" (no delimiter) I got an Error!
    ---------------------------
    Error: no delimiters found

    Line#
    072: Else
    073: {
    074: if (char == """")
    075: insideQuotes := false
    076: }
    077: }
    079: if !Delimiters.Count()
    ---> 080: Throw,"no delimiters found"
    082: last := highest := 0
    083: For delimiter,frequency in Delimiters
    084: {
    085: if (frequency > highest)
    086: {
    086: last := highest
    087: highest := frequency

    The current thread will exit.

    ---------------------------
    Maybe, solved best by giving a message "no delimiter found" or only one field or... is shown?
    .
    .
  3. I tested with these special CSV strings, to see what happens
    StrVar := "1,5;2,5;3,5;4,5;5,5;6,5;7,5;8,5;9"
    Delimiter .: , (I don't know if the delimiter is correct)
    1,5;2,5;3,5;4,5;5,5;6,5;7,5;8,5;9

    Only one field of all fields (is not expected)

    But this work (as expected) .:
    StrVar = "1,5";"2,5";"3,5";"4,5";"5,5";"6,5";"7,5";"8,5";9
    ---------------------------

    StrVar = 0,001; 0,002; 0,003; 0,004; 0,005; 0,006; 0,007; 0,008; 0,009
    Delimiter .: , (I don't know if the delimiter is correct)
    0,001; 0,002; 0,003; 0,004; 0,005; 0,006; 0,007; 0,008; 0,009

    Only one field of all fields (is not expected)

    This also doesn't work .:
    StrVar = "0,001"; "0,002"; "0,003"; "0,004"; "0,005"; "0,006"; "0,007"; "0,008"; "0,009"
    The "Space" is selected as delimiter. eg. 0,001"; as one field
    ---------------------------
    I have no idea how this could be solved. (Space couldn't be a delimiter, see Point 1 above)
    Either a message where the user can choose the field divider or just a message "cannot be resolved"
    .
    .
  4. Test .: StrVar = 302;"Emelie" 125 "Erica";"174,50";asd";"asd Give the following result .:
    • Field1 302
    • Field2 Emelie" 125 "Erica
    • Field3 174,50
    • Field4 asd (correct value is asd")
    • Field5 asd (correct value is "asd)
    Don't know if this problem can be handled?
    .
    .
  5. Delimiter .: TAB
    I have done a simple test with these two strings, with TAB as a delimiter .:
    StrVar = 302`t"Erica"`t"174,50"`t"asd"
    StrVar = 302 "Erica" "174,50" "asd"

    And it worked without problems
As I see it, there are only some cosmetic errors in this function.
If these can be solved, I think the function will do what I want - analyze the field delimiter in a CSV file :-)

The other problem that now arises, is that, in most "normal" tests, I get the right delimiter - fine.
But, to handle this correct - maybe I must change all delimiters from ";" or "`t" or .. to ","
How to do it fastest and in the best way? (it's about hundreds of thousands in some CSV files ...)
(Maybe best to open a new thread? - for this question...)
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 13:24

1) I don't know how to read xls / ods files quickly. I was thinking / searching and it seems possible to read cell for cell. But I think it takes quite some time to read about 35000 x 72 cells ... (do you know how it works?)
What is your threshold? how quickly do you need 35.000 x 72 cells? converted to Albireo's format and then converted to AHK-object?
I could do some tests with some mock up data if you want. I suppose 2.5 million fields processed at 1 million per minute would be 2 and a half minutes?
I have not done any test yet, but I want to know your threshold for acceptable speed.
Odlanir
Posts: 659
Joined: 20 Oct 2016, 08:20

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 13:37

I've played a bit with the possible solutions and I want to add my contribute tho this thread.
Some comments in the script. You can give it a try. Tested with various CSV seems to do the work.
Cheers.

Code: Select all

maskKO := Object()
ini := 0x02B0 			; Start of unicode Spacing modifier letters ( hopefully not present in the file )
validDelim := ";,`t "   ; <---- String of valid delimiter accepted ( can be modified by the user )

for i,v in StrSplit(validDelim)
	maskKO.Push(chr(format("0x{:X}",ini+i))) ; build an array of substitute chars based on validDelim

FileSelectFile,fname,3,A_ScriptDir\, Open a file, Text Documents (*.csv) ; choose a file to analyze
Loop, Read, %fname% ; ten records should be enough to analize the file structure
{   
   if A_Index > 10
      break
   totRows := A_Index
   ; call the function to avoid valid delimiters in quoted string
   StrVar .= MaskValidDelimitersInQuote(A_LoopReadLine) "`n" 
}   
StrVar := substr(RegExReplace(StrVar, """"),1,-1)   ; no need qouted string in analysis

Obj := Object()
   loop, parse, StrVar
   {
      if ( RegExMatch(A_LoopField, "`n|`r") or RegExMatch(A_LoopField, "[^" validDelim "]")) ; grab only valid delimiters
         continue
      if (!Obj.haskey(Asc(A_LoopField))) {
         obj[Asc(A_LoopField)] := 1
      } else {
         obj[Asc(A_LoopField)] += 1
         MaxDLM := MaxDLM > obj[Asc(A_LoopField)] ? MaxDLM : obj[Asc(A_LoopField)]  ; Max delimiter presence in file
      }   
   }

;------------------------------------------------------------------------------------------------***
; save the delimiter only if is present in all record in equal or multiple count
;------------------------------------------------------------------------------------------------***
Candidate := Object(), cnt:= 0
for k, v in Obj {   
   charx := ((k < 32 or k > 127) and k <> 9 ) ? "N/A" : chr(k)
   if (Mod(v,totRows) = 0 and charx <> "N/A") 
      Candidate[++cnt] := chr(k) 
}

if (cnt = 0) {
   MsgBox % "No one of the selected delimiters are present in file"
   ExitApp
}
   
;------------------------------------------------------------------------------------------------***
; a simple gui to show the result
;------------------------------------------------------------------------------------------------***
gui, font, s8
gui, add, text, ,% "Delimiter"
gui, font, s12 bold
gui, add, text, w70 ym-5 cred border vTx1 center
gui, font, s8 normal
gui, add, text, w400 ym vTx2
Loop, % MaxDLM/TotRows ; create an header based on max fields 
   Header .= Header ? "|" a_index : a_index
gui, add, ListView, x10 vLV w800 h200, %Header%
gui,Show,,AutoParser
idx := 1
gosub, ShowResult
return

;------------------------------------------------------------------------------------------------***
; two hotkeys for navigate between the possible solution if ambiguos delimiters are found
;------------------------------------------------------------------------------------------------***
left::
   dmy := idx+1 > Candidate.MaxIndex() ? idx :=1 : ++idx
   gosub Showresult
return
right::
   dmy := idx-1 = 0 ? idx := Candidate.MaxIndex() : --idx
   gosub Showresult
return

;------------------------------------------------------------------------------------------------***
; populate the listview with the result/s restoring the qouted contents if changed before
;------------------------------------------------------------------------------------------------***
ShowResult:
   thisChar := Candidate[idx], cnd := Candidate.MaxIndex() 
   GuiControl,, Tx1, %  Asc(ThisChar) = 9 ? "TAB" : Asc(ThisChar) = 32 ? "SPACE" : ThisChar
   GuiControl,, Tx2, %  Cnd = 1 ? "The delimiter has been Found!" : "Found " cnd " possible delimiters, use left/right arrow to evaluate."
   LV_Delete()
   Loop, Parse, StrVar, `n,`r
   {
      Arr := StrSplit(A_loopfield, thisChar)
      ;~ Restore original data inside quotations mark
      for k, field in Arr
         for i, v in StrSplit(validDelim) 
            Arr[k] := RegExReplace(Arr[k], maskKO[i], v)
      LV_Add("",Arr*)
   }
   LV_ModifyCol()
return

;------------------------------------------------------------------------------------------------***
; all the valid delimiters are masked in quoted string so the remaining delimiters should be 
; taked in account
;------------------------------------------------------------------------------------------------***
MaskValidDelimitersInQuote(line) {
	global validDelim, maskKO
	;~ result := Object()
	for each, char in StrSplit(line) {
		if !inQuotes 		{
			if (char = """") {
				inQuotes := true
			}
		} else {
			if (char ~= "[" validDelim "]") {
				;~ result.modif := true
				for k, v in StrSplit(validDelim) {
					char := RegExReplace(char, v, maskKO[k])
				}
			}				
			if (char = """")
				inQuotes := false
		}
		;~ result.line .= char
		result .= char
	}
	return result
}	

GuiClose:
GuiEscape:
esc::
    ExitApp
EDIT No more object returned but forgotten to modify the Function call..
Last edited by Odlanir on 24 Jun 2019, 16:56, edited 1 time in total.
____________________________________________________________________________
Windows 10 Pro 64 bit - Autohotkey v1.1.30.01 64-bit Unicode
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 14:12

wolf_II wrote:
24 Jun 2019, 13:24
...What is your threshold? how quickly do you need 35.000 x 72 cells? converted to Albireo's format and then converted to AHK-object?
I could do some tests with some mock up data if you want. I suppose 2.5 million fields processed at 1 million per minute would be 2 and a half minutes?
I have not done any test yet, but I want to know your threshold for acceptable speed.
(The example is one of the larger files ;) )
Does the solution works with both xls (Excel) and ods (OpenOffice / LibreOffice Calc)? (most of the files is ods)
Is it an external program that scans the spreadsheets? or Is the solution solved with AHK?
Does it cost anything? ( I can do test on my real files )
The time to read the cells probably works. (I have not finished all structures, because I put more focus on CSV-files that are exported daily from the databases - they are more fixed in the format)
wolf_II
Posts: 2688
Joined: 08 Feb 2015, 20:55

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 15:09

(The example is one of the larger files ;) )
Does the solution works with both xls (Excel) and ods (OpenOffice / LibreOffice Calc)? (most of the files is ods)
Is it an external program that scans the spreadsheets? or Is the solution solved with AHK?
Does it cost anything? ( I can do test on my real files )
The time to read the cells probably works. (I have not finished all structures, because I put more focus on CSV-files that are exported daily from the databases - they are more fixed in the format)
I have not written a test yet, but If you were interested, the test will only work for *.xls files.
I would start with ComObjCreate("Excel.Application") in a AHK-script. requirement for this to work is an installed Excel version.
I can only test with old version called "OFFICE11". I have no need or money to upgrade, I own this version and AHK can talk to it.

I have no idea if other free versions would work also, I will only test with what I got.

I tell you what: I will write a test and come back with the result, then you can decide whether you like or not.
I don't know *.ods file format, maybe Excel knows? or maybe calc.exe knows how to write to *.xls-format?
I want to know myself how fast/slow I can make 2.5 million data-fields with Excel, and how fast I can process them, All from within a simple AHK-script.
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 16:31

Odlanir wrote:
24 Jun 2019, 13:37
I've played a bit with the possible solutions and I want to add my contribute tho this thread.
Some comments in the script. You can give it a try...
I have try to run your script, with a CSV-file like this (with 65 rows)

Code: Select all

"100";"Test ";"200";"abcdef";"0";"0";"0";"9";"100";"0";;"0";"20170401";"0"
"101";"Test2";"200";"999999";"0";"0";"0";"1";"0";"0";;"0";;"0"
I don't know what happen everywhere.
Before return in the function MaskValidDelimitersInQuote(line) the content of the variable result, is the same as original line.
But, after the function call StrVar .= MaskValidDelimitersInQuote(A_LoopReadLine).line "`n" the variable StrVar is empty. (should it be so?)MaskValidDelimitersInQuote(A_LoopReadLine).line Can you explain .line?
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 16:49

wolf_II wrote:
24 Jun 2019, 15:09
...I would start with ComObjCreate("Excel.Application") in a AHK-script. requirement for this to work is an installed Excel version.
I can only test with old version called "OFFICE11"...
OK!

wolf_II wrote:
24 Jun 2019, 15:09
I tell you what: I will write a test and come back with the result
Great!
Do you need to open Excel with the data file before reading the cells? or can you read the cells directly from the xls file?

wolf_II wrote:
24 Jun 2019, 15:09
I don't know *.ods file format, maybe Excel knows? or maybe calc.exe knows how to write to *.xls-format?
No, ods-files have anoter structure than xls-files - I know!

wolf_II wrote:
24 Jun 2019, 15:09
I want to know myself how fast/slow I can make 2.5 million data-fields with Excel, and how fast I can process them, All from within a simple AHK-script.
Nice! but I'm not sure that Excel can handle 35000 rows...
Odlanir
Posts: 659
Joined: 20 Oct 2016, 08:20

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 16:52

My bad. Should be :

Code: Select all

MaskValidDelimitersInQuote(A_LoopReadLine) "`n"
without the .line.
And the var result should be different only if inside the quotation marks are present valid delimiters.
____________________________________________________________________________
Windows 10 Pro 64 bit - Autohotkey v1.1.30.01 64-bit Unicode
SOTE
Posts: 1426
Joined: 15 Jun 2015, 06:21

Re: Translate PowerShell to AHK - possible?

24 Jun 2019, 20:22

Looking a various applications and ways to make CSVs, the "text delimiter" is usually " or `. So you could usually ignore anything between this. Though the ` might be a bit trickier to setup a rule for.

The "separator delimiter", is usually "comma", "tab", "semicolon", or "space". The killer will always be the other option. That is, people can put in | as the separator delimiter or other odd characters. The only thing that it seems you could do, is recognize the special separator delimiter if it's outside of the text delimiters of " or `.

So, I don't know if you can ever have a program that is 100% accurate at detecting the separator delimiter, because of odd ball stuff people might select in the "other" option. But you could cover all of the standard use situations. That is text delimiter of " or `, and separator delimiter of "comma", "tab", "semicolon", or "space". Of the separator delimiters used outside of standard, that is likely |

The most common issue will be that a custom or non-standard CSV does not use a text delimiter of " or `, but they will often use a separator delimiter of "comma", "tab", "semicolon", "space", or |.

Your program could be coded in such a way that in non-standard cases or when the separator delimiter couldn't be determined, it would throw up an error. Thus you would need to visually inspect such exceptions. So the main issue would be how accurate at determining the separator delimiter for CSVs must your program be? We are starting to push a bit into the realm of machine learning or AI, if you need some absurdly high accuracy, and it's not been stated why such accuracy is absolutely necessary.

If your program can determine 9 out of 10 cases or any standard CSV, and spits back just 1 out of 10 or any non-standard CSV for you to manually review, it still has saved you a tremendous amount of time and effort.
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

25 Jun 2019, 03:18

Thanks for your information!
SOTE wrote:
24 Jun 2019, 20:22
... the "text delimiter" is usually " or `. So you could usually ignore anything between this...
In the CSV files I intend to handle, there is only " as text delimiter.

SOTE wrote:
24 Jun 2019, 20:22
...The "separator delimiter", is usually "comma", "tab", "semicolon", or "space"...
In the files I come across, it is essentially ; and , which are the field delimiters, but it have happened that there are CSV-files with TAB as delimiter. (so my case is enough to analyze these three delimiters)

SOTE wrote:
24 Jun 2019, 20:22
...So, I don't know if you can ever have a program that is 100% accurate at detecting the separator delimiter...
...you could cover all of the standard use situations...
I agree, in my case I think it is possible that 99.9% (maybe 100%) could analyze the right delimiters.
To always manually select the right delimiter, I think for a user, will more often be wrong than that. (Today I must perform these steps)

SOTE wrote:
24 Jun 2019, 20:22
...Your program could be coded in such a way that in non-standard cases or when the separator delimiter couldn't be determined, it would throw up an error. Thus you would need to visually inspect such exceptions...
It had been very good, but the problems can be difficult to analyze - in some cases.
But as I said, if it is possible - great!

SOTE wrote:
24 Jun 2019, 20:22
...If your program can determine 9 out of 10 cases or any standard CSV...
Of the files that are normally handled, I think any of the analysis functions that are in the thread, can analyze my CSV files to 100%. (with possibly minor changes and improvements)
Maybe sometime every year, there is some "unknown file" but they are never added directly to the automatic flow.
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

25 Jun 2019, 18:04

Thank you for all the ideas and opinions. (Now it works as desired)
interesting if new thoughts appear on the subject
Odlanir wrote:
24 Jun 2019, 16:52
And the var result should be different only if inside the quotation marks are present valid delimiters.
What do you mean by that? (I have no problem with "19,90",Emelie,302,"Row 1")
  • First, I redesigned the script from @Odlanir to an AHK function.
  • Then I tried to write messages when there are several delimiters or no delimiters. (These controls seem to work well)
  • Finally, I tested some different CSV combinations with successful results. (either the right delimiter or a message explaining the problem) Great!
Here is the modified AHK function, for analysis of CSV delimiters

Code: Select all

#NoEnv
#SingleInstance Force
SetWorkingDir %A_ScriptDir%  ; Ensures a consistent starting directory.

/*
FileSelectFile FileName, 3, A_ScriptDir\, Open a file, Text Documents (*.csv; *.txt)	; choose a file to analyze
FileRead FileContent, %FileName%
If ErrorLevel
{	MsgBox 16, Row %A_LineNumber% -> %A_ScriptName%, % "The file could not be read (maybe locked)`nErrorLevel .: " ErrorLevel
	ExitApp
}
; MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, %FileContent%
*/

/*
FileContent =
( LTrim
	"100";"Test ";"200";"abcdef";"0";"0";"0";"9";"100";"0";;"0";"20170401";"0"
	"101";"Test2";"200";"999999";"0";"0";"0";"1";"0";"0";;"0";;"0"
)
*/

; FileContent =	; No delimiter	 - Give an Error message (no delimiter)

; FileContent := "1,5;2,5;3,5;4,5;5,5;6,5;7,5;8,5;9"	; "," and ";" as delimiter - Give an Error message (two delimiters)
; FileContent := "1,5	2,5	3,5	4,5	5,5	6,5	7,5	8,5	9"	; Tab and "," as delimiter - Give an Error message (two delimiters)

/*
; "," and ";" and " " as delimiter - Give an Error message (three delimiters)
FileContent = 
( LTrim
	0,001; 0,002; 0,003
	0,004; 0,005; 0,006
	0,007; 0,008; 0,009
)
*/

; FileContent = 302;"Emelie";"174,50"`n303;"Jenny";"19,90"	; ";" as delimiter
; FileContent = 302,"Emelie","174,50"`n303,"Jenny","19,90"	; "," as delimiter

; FileContent = "Emelie";"174,50";542`n"Jenny";"19,90";303	; ";" as delimiter
; FileContent = "Emelie","174,50",542`n"Jenny","19,90",303	; "," as delimiter

; FileContent = "19,90";Emelie;302;"Row 1"`n"18";Erica;201;"Row 2"	; ";" as delimiter

; FileContent = 302`t"Erica"`t"174,50"`t"asd"	; TAB as delimiter
; FileContent = 302	"Erica"	"174,50"	"asd"	; TAB as delimiter
; FileContent = 302 "Erica" "174,50" "asd"	; Space as delimiter

; FileContent = "19,90",Emelie,302,"Row 1"`n"18",Erica,201,"Row 2"	; "," as delimiter
FileContent = "19,90",Emelie,302,"Row 1"

MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % "Delimiter " CheckDelim(FileContent)

ExitApp

CheckDelim(InputString)
{	; Version 26 juni 2019
	; Idea from Odlanir
	; https://www.autohotkey.com/boards/viewtopic.php?f=76&t=65648&p=282172#p282172
	maskKO := Object()
	ini := 0x02B0				; Start of unicode Spacing modifier letters ( hopefully not present in the file )
	; validDelim := ";,`t "	; <---- String of valid delimiter accepted ( can be modified by the user )
	validDelim := ";,`t"	; <---- String of valid delimiter accepted ( can be modified by the user )
	for i, v in StrSplit(validDelim)
		maskKO.Push(chr(format("0x{:X}",ini+i)))	; build an array of substitute chars based on validDelim
	
	StrLine := StrSplit(InputString, "`n", "`r")
	If ( StrLine.Count() < 10 )
		CheckNo := StrLine.Count()
	else
		CheckNo = 10
		
	; MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%, % StrLine[1] "`n`n" CheckNo
	; ExitApp
	
	; Loop Read, %fname% ; ten records should be enough to analize the file structure
	Loop % CheckNo ; ten records should be enough to analize the file structure
	{	; if A_Index > 10
		;	Break
		totRows := A_Index
		
		; Avoid valid delimiters in quoted string	
		Result =		; Clear the result
		for each, char in StrSplit(StrLine[A_Index])
		{	if !inQuotes
			{	if (char = """")
					inQuotes := true
			}
			else 
			{	if (char ~= "[" validDelim "]")
				{	;~ result.modif := true
					for k, v in StrSplit(validDelim)
						char := RegExReplace(char, v, maskKO[k])
				}				
				
				if (char = """")
					inQuotes := false
			}
			;~ result.line .= char
			StrVar .= char
		}
	}
	StrVar := substr(RegExReplace(StrVar, """"),1,-1)	; no need qouted string in analysis
	
	Obj := Object()
	Loop Parse, StrVar
	{	if ( RegExMatch(A_LoopField, "`n|`r") or RegExMatch(A_LoopField, "[^" validDelim "]"))	; grab only valid delimiters
		Continue
		
		if (!Obj.haskey(Asc(A_LoopField)))
			obj[Asc(A_LoopField)] := 1
		else
		{	obj[Asc(A_LoopField)] += 1
			MaxDLM := MaxDLM > obj[Asc(A_LoopField)] ? MaxDLM : obj[Asc(A_LoopField)]	; Max delimiter presence in file
		}
	}
	
	;------------------------------------------------------------------------------------------------***
	; save the delimiter only if is present in all record in equal or multiple count
	;------------------------------------------------------------------------------------------------***
	Candidate := Object(), cnt:= 0
	for k, v in Obj
	{	charX := ( (k < 32 or k > 127) and k <> 9 ) ? "N/A" : chr(k)
		if ( Mod(v, totRows) = 0 and charX <> "N/A") 
			Candidate[++cnt] := chr(k) 
	}
	
	if ( cnt = 0 )
	{	ResDelim := "  "
		Loop % StrLen(validDelim)
		{	Count += 1
			DelimChr := SubStr(validDelim, A_Index , 1)
			If Asc(DelimChr) = 9
				DelimChr = {TAB}
			If Asc(DelimChr) = 32
				DelimChr = {Space}
			If ( A_Index > 1 )
				ResDelim .= DelimChr "  "
			else
				ResDelim .= "  " DelimChr "  "
		}
		MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%,
		( LTrim
		 %	"No one of the possible (" Count ") delimiters [" ResDelim "]
			are present in the CSV-file `n
			This program will be closed"
		)
		ExitApp
	}
	
	if ( cnt > 1 )
	{	Loop % cnt
			cntAll .= "Delimiter " A_Index " ( " Candidate[A_Index] " - chr(" asc(Candidate[A_Index]) "))`n"
		MsgBox ,, Row %A_LineNumber% -> %A_ScriptName%,
		( LTrim 
		 %	"This program can't automatically analyze the delimiter 
			" cnt " delimiters is present `n
			" cntAll "
			can't choose any of these, and this program will be closed."
		)
		ExitApp
	}

	Return % Candidate[1]
}
Odlanir
Posts: 659
Joined: 20 Oct 2016, 08:20

Re: Translate PowerShell to AHK - possible?

26 Jun 2019, 04:57

@Albireo You don't need to mask the delimiter in your version. In my script was necessary to split the record for show the result in a Litview but if you don't need the gui part you can simplify the script in this way.

Code: Select all

validDelim := "[;,`t]"
;~ validDelim := "(*UCP)[^[:alnum:]]" 
fname := "yourfilename"
Obj := Object()
Loop, Read, % fname
{   
   if A_Index > 10
      break
   totRows := A_Index
   FindValidDelimitersNotInQuote(A_LoopReadLine)
}   

Candidate := Object(), cnt:= 0
for k, v in Obj {   
   if (Mod(v,totRows) = 0 ) 
      Candidate[++cnt] := chr(k)
}
names := {"`t" : "Tab" , "," : "Comma", ";":"Semicolon"}
strOUT := "Found " Cnt " delimiter(s).`n"
for each, delim in Candidate
   strOUT .=  delim "`t" names[delim] "`n"
MsgBox %strOUT%
exitapp 

FindValidDelimitersNotInQuote(line) {
	global validDelim, Obj
	for each, char in StrSplit(line) {
		if !inQuotes 		{
			if (char = """") {
				inQuotes := true
                continue
			}
            if (char ~= validDelim) {
               if (!Obj.haskey(Asc(char))) {
                  obj[Asc(char)] := 1
               } else {
                  obj[Asc(char)] += 1         
               } 
            }

		} else {
			if (char = """")
				inQuotes := false
		}
	}
}	
exitapp
____________________________________________________________________________
Windows 10 Pro 64 bit - Autohotkey v1.1.30.01 64-bit Unicode
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

30 Jun 2019, 06:35

Odlanir wrote:
26 Jun 2019, 04:57
@Albireo You don't need to mask the delimiter in your version. In my script was necessary to split the record for show the result in a Litview but if you don't need the gui part you can simplify the script in this way.

Code: Select all

validDelim := "[;,`t]"
;~ validDelim := "(*UCP)[^[:alnum:]]" 
fname := "yourfilename"
Obj := Object()
Loop, Read, % fname
{   
   if A_Index > 10
      break
   totRows := A_Index
   FindValidDelimitersNotInQuote(A_LoopReadLine)
}   

Candidate := Object(), cnt:= 0
for k, v in Obj {   
   if (Mod(v,totRows) = 0 ) 
      Candidate[++cnt] := chr(k)
}
names := {"`t" : "Tab" , "," : "Comma", ";":"Semicolon"}
strOUT := "Found " Cnt " delimiter(s).`n"
for each, delim in Candidate
   strOUT .=  delim "`t" names[delim] "`n"
MsgBox %strOUT%
exitapp 

FindValidDelimitersNotInQuote(line) {
	global validDelim, Obj
	for each, char in StrSplit(line) {
		if !inQuotes 		{
			if (char = """") {
				inQuotes := true
                continue
			}
            if (char ~= validDelim) {
               if (!Obj.haskey(Asc(char))) {
                  obj[Asc(char)] := 1
               } else {
                  obj[Asc(char)] += 1         
               } 
            }

		} else {
			if (char = """")
				inQuotes := false
		}
	}
}	
exitapp
I like your solution!
But a problem arose, when I did a test.
If the file in which the delimiter is to be checked, have (many) empty rows in or in the begin of the file, these must be handled.
A CSV datafile, something like this .:




Row5;Info1;Info2;Info3
;;;
Row7;Info1a;Info2a;Info3a
;;;
;;;
;;;
;;;
(in my case is also ";;;" empty rows - but its harder to handle)
I don't know the delimiter and number of fields... (in the begining of the delimiter function ;) )

Is the best way to handle blank rows, before checking of the delimiter?
Something similar to this .:

Code: Select all

validDelim := "[;,`t]"
;~ validDelim := "(*UCP)[^[:alnum:]]" 
FileSelectFile fname, 3, A_ScriptDir\, Open a file, Text Documents (*.csv; *.txt)	; choose a file to analyze
; fname := "yourfilename"
; fname := "yourfilename"
Obj := Object()
MaxRows = 10

Loop Read, % fname
{	If ( Count < MaxRows )
	{	If ( strLen(A_LoopReadLine) > 3 )
		{	Data .= A_LoopReadLine "`n"
			Count += 1
		}
	}
}
; MsgBox % Data "`n`nRows .: " Count


Loop, Parse, Data, `n
{   
   totRows := A_Index
   FindValidDelimitersNotInQuote(A_LoopField)
}...
Odlanir
Posts: 659
Joined: 20 Oct 2016, 08:20

Re: Translate PowerShell to AHK - possible?

30 Jun 2019, 06:49

Replace the Loop Read with this:

Code: Select all

Loop, Read, % fname
{   
	if ( trim(A_LoopReadLine) = "" )
		continue
	++cnt
	if ( cnt > 10 )
		break
	totRows := cnt
	FindValidDelimitersNotInQuote(A_LoopReadLine)
} 
BTW what kind of program writes a CSV with so many blank lines ?
Any program I know in absence of values writes the delimiter to ensure the stucture is kept.
____________________________________________________________________________
Windows 10 Pro 64 bit - Autohotkey v1.1.30.01 64-bit Unicode
Albireo
Posts: 1747
Joined: 16 Oct 2013, 13:53

Re: Translate PowerShell to AHK - possible?

30 Jun 2019, 13:05

Odlanir wrote:
30 Jun 2019, 06:49
... BTW what kind of program writes a CSV with so many blank lines ? ...
I don't know either ;) I think it's better to handle empty lines than not to do it - if it happens…
Have rewritten the delimiter function() so it can handle both CSV-strings and CSV-objects.
Search the delimiters in CSV file

Return to “Ask for Help (v1)”

Who is online

Users browsing this forum: Google [Bot], mikeyww and 174 guests