AutoHotkey Community


All times are UTC [ DST ]




Posted: June 10th, 2008, 2:37 pm
Offline

Joined: March 6th, 2007, 4:35 pm
Posts: 64
Location: Columbus, OH, USA
Is there a character limit with A_LoopField?

I have 122301 characters in one field, and I'm parsing the fields from a text file. But this field gets cut off and the remaining part is treated as another field...

If there is a limit, is there a workaround?

-Thanks

_________________
My startup is Telesaur - a telecommuting job site.


Posted: June 10th, 2008, 3:23 pm
This works for me...
Code:
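; Build a 150000-character string, then end it with a linefeed.
Loop, 150000
   a .= "b"

a .= "`n"

; Start from an empty file so repeated runs do not append to old data.
FileDelete, test.txt
FileAppend, %a%, test.txt
a=
FileRead, c, test.txt

; Report the length of each field; the first should be 150000.
Loop, Parse, c, `n
   MsgBox % StrLen(A_LoopField)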


Are you sure you aren't encountering an extra delimiter in that large field?


Posted: June 10th, 2008, 3:34 pm
Offline

Joined: March 6th, 2007, 4:35 pm
Posts: 64
Location: Columbus, OH, USA
Zippo, thanks for testing that...

I'm pretty sure the delimiters aren't found within the field. I'm using ÿ (alt+0255) as the delimiter, but I'm going to try to isolate the problem with your example... since your test works, it must be something in the field.

One other thought: do newlines affect parsing at all? I'm using UltraEdit, and it wraps the line at around 4100 characters...

Posted: June 10th, 2008, 3:43 pm

Joined: December 26th, 2005, 4:40 pm
Posts: 8776
elchapin wrote:
Is there a character limit with A_LoopField?


No! The #maxmem directive affects it - whose default value is 65MB per variable:

Code:
#MaxMem 65 ; Default Value
VarSetCapacity( A,(32*1024*1024-1),65 ), VarSetCapacity( B,(32*1024*1024-0),66 ), L := A "`n" B
Loop, Parse, L, `n
  MsgBox, 0, % StrLen(A_LoopField), %A_LoopField%

_________________
URLGet - Internet Explorer based Downloader
StartEx - Portable Shortcut Link


Posted: June 10th, 2008, 3:54 pm
Mmm-hmm, it's something in the field. I don't think using a Unicode character as a delimiter is going to work too well.

Is it possible to swap that out with a regular ANSI character (like @ or ^ or something)?

@SKAN: But the line is only just over 120kb :D


Posted: June 10th, 2008, 3:55 pm
Offline

Joined: March 6th, 2007, 4:35 pm
Posts: 64
Location: Columbus, OH, USA
SKAN wrote:
Code:
#MaxMem 65 ; Default Value
VarSetCapacity( A,(32*1024*1024-1),65 ), VarSetCapacity( B,(32*1024*1024-0),66 ), L := A "`n" B
Loop, Parse, L, `n
  MsgBox, 0, % StrLen(A_LoopField), %A_LoopField%


Just to make sure I understand...
Does that mean that the character limit is 33554431 and MB limit is 65 for variable "A"?

Thanks for pointing out VarSetCapacity, SKAN.

Posted: June 10th, 2008, 3:58 pm
Offline

Joined: March 6th, 2007, 4:35 pm
Posts: 64
Location: Columbus, OH, USA
Zippo() wrote:
Mmm-hmm, it's something in the field. I don't think using a Unicode character as a delimiter is going to work too well.

Is it possible to swap that out with a regular ANSI character (like @ or ^ or something)?

@SKAN: But the line is only just over 120kb :D


Unfortunately, I don't think I can swap it out. The field contains a bunch of OCR text from emails. But it doesn't hurt to try...

Is there an incompatibility with Unicode characters?

Posted: June 10th, 2008, 4:05 pm

Joined: December 26th, 2005, 4:40 pm
Posts: 8776
elchapin wrote:
Just to make sure I understand...
Does that mean that the character limit is 33554431 and MB limit is 65 for variable "A"?


A is 32 MB - 1 byte
and
B is 32 MB

If I did not subtract 1 from A, variable L would be longer than 64 MB, because I am including a linefeed when concatenating A and B.

:)
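
The arithmetic behind those sizes can be checked with a short sketch. This is only an illustration in AutoHotkey v1 syntax; the names MB, LenA, LenB and LenL are made up for the example and do not appear in the script above.

```autohotkey
; Illustrative names only; sizes are counted in characters (bytes in an ANSI build).
MB := 1024 * 1024
LenA := 32 * MB - 1      ; length of A in the example
LenB := 32 * MB          ; length of B
LenL := LenA + 1 + LenB  ; A, one linefeed, then B
MsgBox % "Length of L: " LenL "`n64 MB: " 64 * MB
```

Both numbers come out as 67108864, so with the linefeed counted L lands exactly on 64 MB.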

Posted: June 10th, 2008, 4:12 pm
I don't know how much of an incompatibility there really is now, as many workarounds have been posted. Natively, AHK has problems with Unicode.

A quick search on it might fix you up if it is too much of a pain to change the delimiters. :)
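
One way to rule the delimiter itself in or out is to test it on a string built in memory, away from the file. The sketch below is a minimal example in AutoHotkey v1 syntax; Delim, Data and Count are hypothetical names, and it assumes the delimiter is the single character with code 255 (the one typed with Alt+0255).

```autohotkey
; Illustrative names only. The delimiter is passed through a variable.
Delim := Chr(255)
Data := "title" Delim "body" Delim "end"
Count := 0
Loop, Parse, Data, %Delim%
    Count++
MsgBox % "Fields found: " Count
```

If this reports three fields, the parsing loop handles the character, and the file is the next thing to check. A file saved as UTF-8, for example, would store ÿ as two bytes, so the bytes on disk may not match a one-byte delimiter.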


Posted: June 10th, 2008, 9:19 pm
Offline

Joined: March 6th, 2007, 4:35 pm
Posts: 64
Location: Columbus, OH, USA
I tried to pinpoint the problem by creating a text file with "^" for delimiters. Two fields are used: title^body^.

Here's a link to the text file named "sample6.txt": http://drop.io/7rqgfsl

If you don't want to download the text file, this is basically what it is:

title^body^
test^zzzzzzzzzz...(z 65519 times)...zzzzzzzzzz This is where the problem starts^


And here's the script:

Code:
Loop, read, sample6.txt
{
If A_LoopReadLine  1
   FileAppend, `n, output.csv

    ; Loop, parse, current line being read, character that divides field, character to omit
    Loop, parse, A_LoopReadLine, ^
    {
        If A_Index = 1
           {
              CurrentField = `"%A_LoopField%`",
              FileAppend, %CurrentField%, output.csv
           }
        else
        If A_Index = 2
           {
                  continue
           }
    }
}
return


It should write the first field of each line to a file named "output.csv", but this is what I get instead:

Quote:

"title",
"test",
"ere the problem starts",


The end of the field on line two is added! Agh! :(

Posted: June 11th, 2008, 3:00 am
Now the problem is the soft line breaks I think.

Is this what you want?
Code:
FileRead, OutputVar, sample6.txt
FileAppend, `n, output.csv

Loop, Parse, OutputVar, `n
{
   Loop, Parse, A_LoopField, ^
   {
      If A_Index > 1
         Continue

      FileAppend, %A_LoopField%`n, output.csv
   }
}

OutputVar=
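
A possible reason the file-reading loop split the field, offered as a guess rather than a confirmed diagnosis: the AutoHotkey v1 documentation for that loop says lines of up to 65,534 characters can be read, and the rest of a longer line is delivered on the next iteration. The sketch below is one way to check that on a given build. The file name longline.txt and the variable names are hypothetical.

```autohotkey
; Hypothetical file and variable names. Writes one 70000-character line,
; then reports how many pieces the file-reading loop delivers.
FileDelete, longline.txt
Line := ""
Loop, 70000
    Line .= "z"
FileAppend, %Line%`n, longline.txt

Pieces := ""
Loop, Read, longline.txt
    Pieces .= "Iteration " A_Index ": " StrLen(A_LoopReadLine) " characters`n"
MsgBox % Pieces
```

If more than one iteration is listed, the limit applies, and reading the whole file with FileRead before parsing it, as in the script above, avoids the split.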


Posted: June 11th, 2008, 12:26 pm
Offline

Joined: October 17th, 2006, 4:15 pm
Posts: 7502
Location: Australia
SKAN wrote:
elchapin wrote:
Is there a character limit with A_LoopField?

No! The #maxmem directive affects it

  1. #MaxMem affects the automatic expansion of variables.
  2. Think of a built-in variable as a function: you cannot assign to it, only retrieve a value. The concept of automatic expansion does not apply, so neither does #MaxMem.
  3. When Loop, Parse begins, it creates a copy of the input variable's contents. I haven't delved very deeply, but it seems A_LoopField actually points to a location within this copy. It makes sense for performance: the string is copied only once, at the beginning of the loop. This also means that no restriction should be applied, since the text is already in memory.
Code:
; Set maximum to 1MB.
#MaxMem 1
MB:=1024*1024

; Create a 4MB variable.
VarSetCapacity(A, 4*MB, 65)
MsgBox % "StrLen(A) = " StrLen(A)

; Insert a delimiter at 2MB.
NumPut(Asc("|"), A, 2*MB, "char"), VarSetCapacity(A,-1)

; A_LoopField is not restricted by #MaxMem.
Loop, Parse, A, |
    MsgBox % "StrLen(A_LoopField) = " StrLen(A_LoopField)

; B must be expanded to fit A_LoopField, but #MaxMem causes it to fail.
Loop, Parse, A, |
    B := A_LoopField

SKAN wrote:
whose default value is 65MB per variable:
The default is 64MB.


Posted: June 11th, 2008, 12:30 pm

Joined: December 26th, 2005, 4:40 pm
Posts: 8776
Lexikos wrote:
The default is 64MB.


Uh! sorry.. that was a typo. Thanks for the clarification. :)

Posted: June 11th, 2008, 1:27 pm
Offline

Joined: March 6th, 2007, 4:35 pm
Posts: 64
Location: Columbus, OH, USA
Zippo() wrote:
Now the problem is the soft line breaks I think.

Is this what you want?
Code:
FileRead, OutputVar, sample6.txt
FileAppend, `n, output.csv

Loop, Parse, OutputVar, `n
{
   Loop, Parse, A_LoopField, ^
   {
      If A_Index > 1
         Continue

      FileAppend, %A_LoopField%`n, output.csv
   }
}

OutputVar=


Zippo(), thanks! That worked! It never occurred to me that I could use `n to parse a variable! Wow, that is beautiful!

SKAN / Lexikos - thanks for taking the time to explain things. That helps a lot. :D
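
For anyone finding this later, here is a small self-contained sketch of the same idea: split a variable on linefeeds, then take the first ^-delimited field of each line. It uses AutoHotkey v1 syntax. The sample text and the names Text and FirstFields are made up for the example, and the carriage-return omit character is an addition that assumes Windows line endings.

```autohotkey
; Illustrative sample data and names.
Text := "title^body^`r`ntest^zzz^`r`n"
FirstFields := ""
; Split on linefeeds and strip carriage returns from each line.
Loop, Parse, Text, `n, `r
{
    ; Skip the empty piece that follows the final linefeed.
    If (A_LoopField = "")
        Continue
    Loop, Parse, A_LoopField, ^
    {
        ; Keep only the first field of the line, then stop the inner loop.
        FirstFields .= A_LoopField "`n"
        Break
    }
}
MsgBox % FirstFields
```

This should list title and test, one per line.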
